diff --git a/lib/README.md b/lib/README.md
index 4cf7703f27f335a84c3676edb004530a658c96b9..c12d9d347a626ccd2f5d5d5be4a00cf07a2a7e0b 100644
--- a/lib/README.md
+++ b/lib/README.md
@@ -1,10 +1,10 @@
 # CHERTOY - Creating language model with sent2vec
 
-This is an implementation to provide necessary pre-processing steps for modeling an own sent2vec model which is used in the experiments. The two language models we built are a uni-gram and a bi-gram model over the wikipedia 2017 corpus.
+This is an implementation to provide necessary preprocessing steps for modeling an own sent2vec model which is used in the experiments. The two language models we built are a uni-gram and a bi-gram model over the wikipedia 2017 corpus.
 
 ## RUNNING INSTRUCTIONS
 
-## Pre-Processing Wikipedia Dump
+## Preprocessing Wikipedia Dump
 
 Download Wikipedia Dump
 - Wikipedia Dumps for the english language is provided on https://meta.wikimedia.org/wiki/Data_dump_torrents#English_Wikipedia