diff --git a/Senseval_Prep/README.md b/Senseval_Prep/README.md
index a650bf425f1261ab94495beafbb41923ca37a5a6..f937bc660ffa7059f9f73f1d165c13943a5028cd 100644
--- a/Senseval_Prep/README.md
+++ b/Senseval_Prep/README.md
@@ -6,7 +6,12 @@ Softwareprojekt WS2018/19
 Betreuerin: Prof. Dr. Anette Frank
 Graph Embedding Propagation
 
-# Senseval Preprocessing
+# Senseval Preprocessing for Method 1
+
+This is an implementation to provide preprocessed data for our Word Sense Disambiguation Method 1. The skript will produce json-files for SensEval-2 and 3. This files include sentence splitted lists with lemmatized lowered words in a tuple together with the according WordNet3.0 POS-tag.
+The output will be two JSON-files with preprocessed data from SensEval-2 respectively SensEval-3 datasets.
+
+# Senseval Preprocessing for Method 2
 
 This is an implementation to provide preprocessed data for our Word Sense Disambiguation Method 2. The skript will produce pkl-files for each document in Senseval2/3 named as the document name.
 From provided Senseval-english-allword-test-data and their Penntree Bank annotations only the useful information will be filtered out. Lemmas which are not included in glossmappings or listed in stopwords will be deleted. For multiword-expressions, only the tag for the head-token will be saved. Information about their satellites will be discarded.
@@ -29,16 +34,21 @@ gloss_mapping.txt
 stopwords.txt
 - includes stopwords, which will be filtered out
 
-Python3 skript
+Python3 skripts
 - senseval_preprocessing.py
+- preprocess_senseval_method1.py
 
 ## Dependencies
 re - for regular expression matching
-pickle - for saving the resulting lists in a pkl-file
+json - for saving the results for WSD method 1
+pickle - for saving the resulting lists in a pkl-file for WSD method 2
 nltk - WordNetLemmatizer from NLTK for lemmatizing
 
-## Running Instructions
-python3 senseval_preprocessing.py [-s] [-g] [-v]
+## Running Instructions Method 1
+python[3] preprocess_senseval_method1.py
+
+## Running Instructions Method 2
+python[3] senseval_preprocessing.py [-s] [-g] [-v]
 -s / --stopwords Path to txt-file with stopwords
 -g / --gloss Path to txt-file with gloss mappings
 -v / --version valid input: 2 or 3 for senseval 2 / 3
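
Note on the Method 1 output described in the added README text: the sketch below illustrates what "sentence-split lists of (lemma, POS-tag) tuples" could look like when saved to JSON and read back. The sample sentences, the single-letter tags, the file name `senseval2_sample.json`, and the helper names `save_and_reload` and `as_tuples` are all illustrative assumptions; none of them is taken from `preprocess_senseval_method1.py`. The one point that holds regardless of the actual implementation is that JSON has no tuple type, so tuples come back as lists after loading.

```python
import json
import os
import tempfile

# Hypothetical sample: one dataset as a list of sentences, each sentence a
# list of (lowercased lemma, WordNet POS tag) tuples. The tags use WordNet's
# single-letter convention (n, v, a, r); the real script's output may differ.
sentences = [
    [("art", "n"), ("change", "v")],
    [("bell", "n"), ("ring", "v"), ("loudly", "r")],
]


def save_and_reload(data, path):
    """Write the nested lists to a JSON file and read them back."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def as_tuples(data):
    """Restore (lemma, pos) tuples, since JSON stores them as lists."""
    return [[tuple(pair) for pair in sentence] for sentence in data]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "senseval2_sample.json")
    reloaded = save_and_reload(sentences, path)

restored = as_tuples(reloaded)
```

If downstream code for Method 1 relies on tuple semantics (for example, using the pairs as dictionary keys), a conversion step like `as_tuples` would be needed after loading; it may be worth stating in the README whether consumers should expect lists or tuples.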