From f9995c9863829a6d3ef18d06e7a1d9a3d1859463 Mon Sep 17 00:00:00 2001
From: Utaemon Toyota <toyota@cl.uni-heidelberg.de>
Date: Wed, 27 Feb 2019 20:56:04 +0100
Subject: [PATCH] update README with preprocessing for method 1

---
 Senseval_Prep/README.md | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/Senseval_Prep/README.md b/Senseval_Prep/README.md
index a650bf4..f937bc6 100644
--- a/Senseval_Prep/README.md
+++ b/Senseval_Prep/README.md
@@ -6,7 +6,12 @@ Softwareprojekt WS2018/19
 Betreuerin: Prof. Dr. Anette Frank
 Graph Embedding Propagation
 
-# Senseval Preprocessing
+# Senseval Preprocessing for Method 1
+
+This is an implementation to provide preprocessed data for our Word Sense Disambiguation Method 1. The script produces JSON files for SensEval-2 and SensEval-3. These files contain sentence-split lists in which each lemmatized, lowercased word is paired in a tuple with its corresponding WordNet 3.0 POS tag.
+The output is two JSON files with preprocessed data, one for the SensEval-2 dataset and one for the SensEval-3 dataset.
+
+# Senseval Preprocessing for Method 2
 
 This is an implementation to provide preprocessed data for our Word Sense Disambiguation Method 2. The skript will produce pkl-files for each document in Senseval2/3 named as the document name.
 From provided Senseval-english-allword-test-data and their Penntree Bank annotations only the useful information will be filtered out. Lemmas which are not included in glossmappings or listed in stopwords will be deleted. For multiword-expressions, only the tag for the head-token will be saved. Information about their satellites will be discarded.
@@ -29,16 +34,21 @@ gloss_mapping.txt
 stopwords.txt
 - includes stopwords, which will be filtered out
 
-Python3 skript
+Python3 scripts
 - senseval_preprocessing.py
+- preprocess_senseval_method1.py
 
 ## Dependencies
 re - for regular expression matching
-pickle - for saving the resulting lists in a pkl-file
+json - for saving the results for WSD method 1
+pickle - for saving the resulting lists in a pkl-file for WSD method 2
 nltk - WordNetLemmatizer from NLTK for lemmatizing
 
-## Running Instructions
-python3 senseval_preprocessing.py [-s] [-g] [-v]
+## Running Instructions Method 1
+python[3] preprocess_senseval_method1.py
+
+## Running Instructions Method 2
+python[3] senseval_preprocessing.py [-s] [-g] [-v]
 
 -s / --stopwords Path to txt-file with stopwords
 -g / --gloss Path to txt-file with gloss mappings
 -v / --version valid input: 2 or 3 for senseval 2 / 3
-- 
GitLab
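
The Method 1 output format described in the README (sentence-split lists of lowercased lemmas paired with a WordNet POS tag, saved as JSON) could look roughly like the sketch below. This is not the code of `preprocess_senseval_method1.py`. The function names, the mapping from Penn Treebank tag prefixes to WordNet POS tags, and the choice to drop tokens without a WordNet POS are all assumptions made for illustration. The lemmatizer is passed in as a callable, so the sketch runs without NLTK data.

```python
import json

# Assumed mapping from Penn Treebank tag prefixes to WordNet POS tags.
PTB_TO_WORDNET = {"N": "n", "V": "v", "J": "a", "R": "r"}


def to_wordnet_pos(ptb_tag):
    """Return the WordNet POS for a Penn Treebank tag, or None if unmapped."""
    return PTB_TO_WORDNET.get(ptb_tag[:1].upper())


def preprocess(sentences, lemmatize=lambda word, pos: word):
    """Convert [[(token, ptb_tag), ...], ...] to [[(lemma, wn_pos), ...], ...].

    `lemmatize` is any callable taking (word, pos). The default returns the
    word unchanged; NLTK's WordNetLemmatizer().lemmatize could be passed here.
    """
    result = []
    for sentence in sentences:
        pairs = []
        for token, ptb_tag in sentence:
            pos = to_wordnet_pos(ptb_tag)
            if pos is None:
                continue  # assumption: tokens without a WordNet POS are dropped
            pairs.append((lemmatize(token.lower(), pos), pos))
        result.append(pairs)
    return result


# Hypothetical input: one sentence of (token, Penn Treebank tag) pairs.
example = [[("The", "DT"), ("Dogs", "NNS"), ("barked", "VBD")]]
data = preprocess(example)

# JSON has no tuple type, so each (lemma, pos) tuple is stored as a 2-item list.
serialized = json.dumps(data)
```

Because JSON stores the tuples as lists, a consumer that needs tuples has to convert them back after `json.loads`.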