From c7d0266772aecc32b01b3292086e70f23f9f8107 Mon Sep 17 00:00:00 2001
From: Nadia <nwarslan@cl.uni-heidelberg.de>
Date: Wed, 27 Feb 2019 23:18:01 +0100
Subject: [PATCH] wn readme

---
 scripts/preprocessing/wordnet/README.md | 37 +++++++++++++++++++++++++
 1 file changed, 37 insertions(+)
 create mode 100644 scripts/preprocessing/wordnet/README.md

diff --git a/scripts/preprocessing/wordnet/README.md b/scripts/preprocessing/wordnet/README.md
new file mode 100644
index 0000000..30c771c
--- /dev/null
+++ b/scripts/preprocessing/wordnet/README.md
@@ -0,0 +1,37 @@
+# AUTHORS
+Lyuba Dimitrova, Nadia Arslan, Nicolas Weber, Utaemon Toyota
+
+# PROJECT
+Softwareprojekt WS2018/19
+Betreuerin: Prof. Dr. Anette Frank
+Graph Embedding Propagation
+
+# WordNet Preprocessing
+To build a WordNet graph for Embedding Propagation from the WordNet data.* files (data.noun, data.verb, data.adj, data.adv), preprocessing is split into two steps.
+
+Each step creates a graph (wn_graph_step1.pkl and wn_graph_step2.pkl, respectively), which is saved as a pickle file.
+
+Step 1 (make_wn_graph_step1.py) creates a graph with string information, reading all synsets from the data.* files as nodes of the graph. Label information such as synset_id, pos, lexical file number, lemmata and gloss is added as node attributes. Relations between synsets are added as edges between the corresponding nodes. Nodes without edges are removed from the graph.
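The step-1 construction can be sketched as follows. The two synset records and the single relation here are hypothetical stand-ins for what the script actually parses out of the data.* files:

```python
import pickle

import networkx as nx

# Hypothetical parsed synset records (the real script reads these from the
# WordNet data.* files): synset_id, pos, lexical file number, lemmata, gloss.
synsets = [
    {"synset_id": "00001740", "pos": "n", "lex_filenum": "03",
     "lemmata": ["entity"], "gloss": "that which is perceived or known"},
    {"synset_id": "00001930", "pos": "n", "lex_filenum": "03",
     "lemmata": ["physical_entity"], "gloss": "an entity that has physical existence"},
]
relations = [("00001740", "00001930")]  # e.g. a hyponym pointer

# one node per synset, label information as node attributes
G = nx.Graph()
for s in synsets:
    G.add_node(s["synset_id"], pos=s["pos"], lex_filenum=s["lex_filenum"],
               lemmata=s["lemmata"], gloss=s["gloss"])

# one edge per synset relation
G.add_edges_from(relations)

# drop nodes without edges, as step 1 does
G.remove_nodes_from(list(nx.isolates(G)))

with open("wn_graph_step1.pkl", "wb") as f:
    pickle.dump(G, f)
```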
+
+Step 2 (make_wn_graph_step2.py) takes the graph created in step 1. Each gloss is tokenized, POS-tagged and lemmatized with spaCy, and stopwords are removed. Finally, every item of every label type is mapped to an index. Node attributes are saved as arrays. To facilitate working with embeddings in TensorFlow, information about the label lengths and vocabulary size is added as graph attributes.
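The step-2 index mapping can be sketched as below. A plain whitespace tokenizer and a tiny hand-picked stopword set stand in for the spaCy pipeline and the WordNet stopwords file that the script actually uses:

```python
import numpy as np

# stand-in stopword set (the script reads the WordNet 3.0 stopwords file)
stopwords = {"a", "an", "the", "that", "which", "is", "has", "or"}

# hypothetical glosses taken from the step-1 node attributes
glosses = {
    "00001740": "that which is perceived or known",
    "00001930": "an entity that has physical existence",
}

# tokenize, lowercase and filter stopwords
# (spaCy would additionally POS-tag and lemmatize each token)
tokens = {sid: [t for t in g.lower().split() if t not in stopwords]
          for sid, g in glosses.items()}

# map every remaining item to an index over the whole vocabulary
vocab = {tok: i for i, tok in
         enumerate(sorted({t for ts in tokens.values() for t in ts}))}

# node attributes become index arrays
gloss_ids = {sid: np.array([vocab[t] for t in ts])
             for sid, ts in tokens.items()}

# graph attributes needed for the TensorFlow embeddings
max_gloss_len = max(len(ts) for ts in tokens.values())
vocab_size = len(vocab)
```

The same mapping is built per label type (lemmata, pos, ...), each with its own vocabulary.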
+
+
+# Required Data
+- data.* files from WordNet 3.0
+- stopwords file from WordNet 3.0
+
+# Dependencies
+For make_wn_graph_step1.py and make_wn_graph_step2.py:
+- networkx: for building the graph
+- re: for regular expression matching
+- os: for path joining
+- pickle: to save data in a pickle file
+- spacy: for tokenization, POS-tagging and lemmatization
+- numpy: for arrays
+- json: to dump the mapping
+
+
+# Running instructions
+    python[3] make_wn_graph_step1.py
+    python[3] make_wn_graph_step2.py
+
-- 
GitLab