Commit 3e4e9c27 authored by toyota

fix typo

parent 1aa9b631
@@ -95,7 +95,7 @@ After running the system you'll have the output file in your project folder.
### RUN THE SYSTEM:
    git clone https://gitlab.cl.uni-heidelberg.de/semantik_project/wsi_chernenko_schindler_toyota.git
    cd bin
@@ -108,7 +108,7 @@ python3 chertoy.py /home/tatiana/Desktop/FSemantik_Projekt /test /topics.txt /re
### Other files:
* Performances_Table.pdf - a performance table with F1, RI, ARI and JI values of the baseline and 40 experiments (incl. CHERTOY) on the trial data.
* bin
@@ -120,7 +120,7 @@ The folder experiments contains an implementation of the baseline and 40 differe
* lib
The folder contains code for preprocessing the Wikipedia dataset to train our own sent2vec models for the experiments, plus a README file. Our preprocessed Wikipedia 2017 dataset and the two self-trained models of the Wikipedia 2017 dataset that we used in our sent2vec experiments are provided in /proj/toyota on the server of the Institute of Computational Linguistics, Heidelberg.
Other models that we used during our experiments can be found in the sense2vec and sent2vec repositories; a short usage sketch follows after this list.
* experiments
...
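As an illustration of how one of these pre-trained sent2vec models could be queried from Python: this is only a sketch, assuming the Python bindings from the epfml/sent2vec repository are installed; the model filename below is a placeholder, not one of the files on /proj/toyota.

    # Sketch only: load a trained sent2vec model and embed sentences.
    # The model path is a placeholder, not an actual file from /proj/toyota.
    import sent2vec

    model = sent2vec.Sent2vecModel()
    model.load_model("wiki_2017_model.bin")  # placeholder file name

    sentences = [
        "apple released a new phone",
        "the apple fell from the tree",
    ]
    embeddings = model.embed_sentences(sentences)  # one vector per sentence
    print(embeddings.shape)

The resulting sentence vectors can then be clustered to group occurrences of a target word by sense.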
@@ -8,15 +8,14 @@ This is an implementation to provide necessary pre-processing steps for modeling
Download Wikipedia Dump
- Wikipedia dumps for the English language are provided at https://meta.wikimedia.org/wiki/Data_dump_torrents#English_Wikipedia
- For our model we used enwiki-20170820-pages-articles-multistream.xml.bz2 (14.1 GiB)
Dependencies:
- wikiExtractor: http://attardi.github.io/wikiextractor
- fasttext: https://github.com/facebookresearch/fastText
- sent2vec: https://github.com/epfml/sent2vec
First, the Wikipedia text needs to be extracted from the provided XML.
- extracted file: enwiki-20170820-pages-articles-multistream.xml (21.0 GB)
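A minimal sketch of this decompression step, in case you want to do it from Python rather than with a command-line tool (the file names are the ones listed above):

    # Sketch: stream-decompress the .bz2 dump to the raw XML without
    # loading the ~21 GB result into memory.
    import bz2
    import shutil

    with bz2.open("enwiki-20170820-pages-articles-multistream.xml.bz2", "rb") as src, \
            open("enwiki-20170820-pages-articles-multistream.xml", "wb") as dst:
        shutil.copyfileobj(src, dst, length=1024 * 1024)  # copy in 1 MiB chunks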
From the XML, the plain text is extracted using WikiExtractor:
@@ -25,9 +24,9 @@ WikiExtractor.py -o OUTPUT-DIRECTORY INPUT-XML-FILE
_Example_
    WikiExtractor.py -o /wikitext enwiki-20170820-pages-articles-multistream.xml
WikiExtractor will create several directories AA, AB, AC, ..., CH with a total size of 6.2 GB. Each directory contains 100 .txt documents (except CH, which contains 82).
Each article begins with an ID tag such as <doc id="12" url="https://en.wikipedia.org/wiki?curid=12" title="Anarchism">. The text also contains comments in parentheses.
Using preprocess_wikitext.py we delete all IDs, parentheses together with their content, and quotes such as ' or " to obtain plain Wikipedia text. The output text file contains one sentence per line.
_Usage_
    python3 preprocess_wikitext.py input_directory_path output_txt_file_path
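For illustration, here is a simplified sketch of the kind of cleaning preprocess_wikitext.py performs. It is not the actual script: the regexes are assumptions, and the sentence splitting that produces one sentence per line is omitted for brevity.

    # Simplified sketch of the cleaning steps described above; NOT the actual
    # preprocess_wikitext.py. Sentence splitting is omitted for brevity.
    import os
    import re
    import sys

    def clean_line(line):
        line = re.sub(r"</?doc[^>]*>", "", line)       # drop <doc id=...> / </doc> tags
        line = re.sub(r"\([^)]*\)", "", line)          # drop parentheses with their content
        line = line.replace('"', "").replace("'", "")  # drop quotes
        return line.strip()

    def preprocess(input_dir, output_file):
        with open(output_file, "w", encoding="utf-8") as out:
            for root, _, files in os.walk(input_dir):  # walk the AA, AB, ... directories
                for name in sorted(files):
                    with open(os.path.join(root, name), encoding="utf-8") as f:
                        for line in f:
                            cleaned = clean_line(line)
                            if cleaned:
                                out.write(cleaned + "\n")

    if __name__ == "__main__":
        preprocess(sys.argv[1], sys.argv[2])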
...