Commit f9995c98 authored by toyota

update README with preprocessing for method 1

parent f9879f72
@@ -6,7 +6,12 @@ Softwareprojekt WS2018/19
Supervisor: Prof. Dr. Anette Frank
Graph Embedding Propagation
# Senseval Preprocessing
# Senseval Preprocessing for Method 1
This is an implementation that provides preprocessed data for our Word Sense Disambiguation Method 1. The script produces JSON files for SensEval-2 and SensEval-3. These files contain sentence-split lists in which each lemmatized, lowercased word is stored in a tuple together with its corresponding WordNet 3.0 POS tag.
The output consists of two JSON files with preprocessed data, one for the SensEval-2 dataset and one for the SensEval-3 dataset.
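The exact layout of the JSON files is not spelled out here, so the following is only a minimal sketch under assumptions: a top-level list of sentences, each sentence a list of (lemma, POS) pairs. The sample words, the POS letters, and the helper names `dump_sentences` and `load_sentences` are hypothetical. Since JSON has no tuple type, tuples are written as arrays and have to be restored on loading.

```python
import json

# Hypothetical sample data: two sentences, each a list of
# (lowercased lemma, WordNet POS tag) tuples.
sentences = [
    [("art", "n"), ("change", "v")],
    [("bell", "n"), ("ring", "v"), ("loudly", "r")],
]


def dump_sentences(sentences):
    """Serialize the sentence-split tuples to a JSON string.

    json writes tuples as JSON arrays.
    """
    return json.dumps(sentences)


def load_sentences(text):
    """Parse the JSON string and turn each two-element array back into a tuple."""
    return [[tuple(pair) for pair in sentence] for sentence in json.loads(text)]


encoded = dump_sentences(sentences)
decoded = load_sentences(encoded)
```

A consumer that reads the files with plain `json.load` would see two-element lists instead of tuples, which is why the sketch converts them back explicitly.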
# Senseval Preprocessing for Method 2
This is an implementation that provides preprocessed data for our Word Sense Disambiguation Method 2. The script produces one pkl file for each document in SensEval-2/3, named after the document.
From the provided Senseval English all-words test data and their Penn Treebank annotations, only the useful information is kept. Lemmas that are not included in the gloss mappings or that are listed in the stopwords are deleted. For multiword expressions, only the tag of the head token is saved; information about their satellites is discarded.
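The filtering rule and the per-document pkl output described above could look roughly like the sketch below. It is not the code from `senseval_preprocessing.py`: the function names `filter_tokens` and `save_document`, the sample tokens and tags, and the document name `d00` are all hypothetical, and the handling of multiword expressions is left out because the input format is not shown here.

```python
import os
import pickle
import tempfile


def filter_tokens(tokens, gloss_lemmas, stopwords):
    """Keep (lemma, tag) pairs whose lemma has a gloss mapping and is not a stopword."""
    return [
        (lemma, tag)
        for lemma, tag in tokens
        if lemma in gloss_lemmas and lemma not in stopwords
    ]


def save_document(doc_name, tokens, out_dir):
    """Pickle the filtered tokens to '<doc_name>.pkl' inside out_dir and return the path."""
    path = os.path.join(out_dir, doc_name + ".pkl")
    with open(path, "wb") as handle:
        pickle.dump(tokens, handle)
    return path


# Hypothetical inputs for illustration only.
gloss_lemmas = {"bell", "ring", "the"}
stopwords = {"the"}
tokens = [("the", "DT"), ("bell", "NN"), ("ring", "VB"), ("xyzzy", "NN")]

# "the" is dropped as a stopword, "xyzzy" because it has no gloss mapping.
kept = filter_tokens(tokens, gloss_lemmas, stopwords)

# Write to a temporary directory and read the file back to check the round trip.
with tempfile.TemporaryDirectory() as out_dir:
    path = save_document("d00", kept, out_dir)
    with open(path, "rb") as handle:
        restored = pickle.load(handle)
```

Using sets for the gloss lemmas and stopwords keeps each membership check constant-time, which matters when every token of every document is tested.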
@@ -29,16 +34,21 @@ gloss_mapping.txt
stopwords.txt
- includes stopwords, which will be filtered out
Python3 script
Python3 scripts
- senseval_preprocessing.py
- preprocess_senseval_method1.py
## Dependencies
re - for regular expression matching
pickle - for saving the resulting lists in a pkl-file
json - for saving the results for WSD method 1
pickle - for saving the resulting lists in a pkl-file for WSD method 2
nltk - WordNetLemmatizer from NLTK for lemmatizing
## Running Instructions
python3 senseval_preprocessing.py [-s] [-g] [-v]
## Running Instructions Method 1
python[3] preprocess_senseval_method1.py
## Running Instructions Method 2
python[3] senseval_preprocessing.py [-s] [-g] [-v]
-s / --stopwords Path to txt-file with stopwords
-g / --gloss Path to txt-file with gloss mappings
-v / --version valid input: 2 or 3 for senseval 2 / 3
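For illustration, the three options listed above could be declared with `argparse` as in the following sketch. It only mirrors the documented flags; whether the real script uses `argparse`, whether the options are required, and which defaults apply are assumptions, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser with the three documented options (defaults are assumed to be None)."""
    parser = argparse.ArgumentParser(prog="senseval_preprocessing.py")
    parser.add_argument("-s", "--stopwords", help="Path to txt-file with stopwords")
    parser.add_argument("-g", "--gloss", help="Path to txt-file with gloss mappings")
    # Restrict the value to the two documented Senseval versions.
    parser.add_argument(
        "-v", "--version", type=int, choices=[2, 3],
        help="valid input: 2 or 3 for senseval 2 / 3",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["-s", "stopwords.txt", "-g", "gloss_mapping.txt", "-v", "3"]
)
```

With `choices=[2, 3]`, any other version value makes the parser exit with a usage error instead of reaching the preprocessing code.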