Lyuba Dimitrova, Nadia Arslan, Nicolas Weber, Utaemon Toyota
# PROJECT
Software Project, Winter Semester 2018/19
Supervisor: Prof. Dr. Anette Frank
Graph Embedding Propagation
# WordNet Preprocessing
To build a WordNet graph for Embedding Propagation from the WordNet data.* files, preprocessing is split into two steps.
Each step creates a graph and saves it in a pickle file (wn_graph_step1.pkl and wn_graph_step2.pkl, respectively).
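The following minimal sketch shows how a graph object could be written to and read back from a pickle file. The dictionary used here is only a stand-in for the networkx graph, and the temporary output directory is an assumption made for illustration.

```python
import os
import pickle
import tempfile

# Stand-in for the graph object (the project stores a networkx graph instead).
graph = {
    "nodes": {"n00000001": {"pos": "n"}, "n00000002": {"pos": "n"}},
    "edges": [("n00000001", "n00000002")],
}

# Hypothetical output location; any writable directory works.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "wn_graph_step1.pkl")

# Save the graph in binary mode.
with open(path, "wb") as f:
    pickle.dump(graph, f)

# Load it again, e.g. at the start of step 2.
with open(path, "rb") as f:
    restored = pickle.load(f)
```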
Step 1 (make_wn_graph_step1.py) creates a graph with string information. It reads all synsets from the data.* files and adds each one as a node of the graph. The labels synset_id, pos, lexical file number, lemmata and gloss are added as node attributes. Relations between synsets are added as edges between the corresponding nodes. Nodes without edges are removed from the graph.
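Step 1 could be sketched roughly as follows. This is not the code of make_wn_graph_step1.py: the function names, the node id scheme (part of speech plus offset) and the sample lines are assumptions, and plain dictionaries and sets stand in for the networkx graph so that the sketch has no dependencies. With networkx, the same information would be passed to `add_node` and `add_edge`.

```python
def parse_data_line(line):
    """Parse one synset line of a WordNet data.* file into a dict."""
    fields, _, gloss = line.partition("|")
    tokens = fields.split()
    offset, lex_filenum, pos = tokens[0], tokens[1], tokens[2]
    # Pointers refer to adjective satellites ("s") with pos "a".
    id_pos = "a" if pos == "s" else pos
    w_cnt = int(tokens[3], 16)  # the word count is hexadecimal
    # Words alternate with their lex_id, so take every second token.
    lemmata = [tokens[4 + 2 * i] for i in range(w_cnt)]
    p_start = 4 + 2 * w_cnt
    p_cnt = int(tokens[p_start])  # the pointer count is decimal
    relations = []
    for i in range(p_cnt):
        first = p_start + 1 + 4 * i
        symbol, target_offset, target_pos, _ = tokens[first:first + 4]
        relations.append((symbol, target_pos + target_offset))
    return {
        "synset_id": id_pos + offset,
        "pos": pos,
        "lex_filenum": lex_filenum,
        "lemmata": lemmata,
        "gloss": gloss.strip(),
        "relations": relations,
    }


def build_graph(lines):
    """Return (nodes, edges); nodes without any edge are dropped."""
    nodes, edges = {}, set()
    for line in lines:
        # License header lines start with spaces; skip them and blank lines.
        if line.startswith(" ") or not line.strip():
            continue
        synset = parse_data_line(line)
        relations = synset.pop("relations")
        nodes[synset["synset_id"]] = synset
        for symbol, target in relations:
            edges.add((synset["synset_id"], target, symbol))
    # Keep only edges whose endpoints were both read as synsets.
    edges = {e for e in edges if e[0] in nodes and e[1] in nodes}
    connected = {node for e in edges for node in e[:2]}
    nodes = {k: v for k, v in nodes.items() if k in connected}
    return nodes, edges


# Made-up lines in the data.* layout, for illustration only.
sample = [
    "  1 license header line",
    "00000001 03 n 01 entity 0 001 ~ 00000002 n 0000 | that which exists",
    "00000002 03 n 02 thing 0 object 0 001 @ 00000001 n 0000 | a physical entity",
    "00000003 03 n 01 orphan 0 000 | a synset without relations",
]
nodes, edges = build_graph(sample)
```

In this sketch the third synset has no relations, so it does not appear in the resulting node set.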
Step 2 (make_wn_graph_step2.py) takes the graph created in step 1. Each gloss is tokenized, POS-tagged and lemmatized with spaCy, and stopwords are removed. Finally, every item of every label type is mapped to an index, and the node attributes are saved as arrays. To facilitate working with embeddings in TensorFlow, information about the label lengths and vocabulary sizes is added as graph attributes.
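The index mapping of step 2 might look like the sketch below. It is an illustration under several assumptions: a regular expression and a tiny hand-written stopword set replace the spaCy pipeline and the stopwords file (so there is no POS tagging or lemmatization here), Python lists replace the arrays, and all function and attribute names are hypothetical.

```python
import re

# Stand-in stopword set; the project reads stopwords from a file.
STOPWORDS = {"a", "an", "the", "of", "that", "which"}


def preprocess_gloss(gloss):
    """Lowercase, split into alphabetic tokens and drop stopwords."""
    tokens = re.findall(r"[a-z]+", gloss.lower())
    return [t for t in tokens if t not in STOPWORDS]


def index_label(nodes, label):
    """Replace the items of one label type by integer indices.

    Returns the indexed values per node and the vocabulary that maps
    each distinct item to its index (in order of first appearance).
    """
    vocab, indexed = {}, {}
    for node_id, attrs in nodes.items():
        ids = []
        for item in attrs[label]:
            if item not in vocab:
                vocab[item] = len(vocab)
            ids.append(vocab[item])
        indexed[node_id] = ids
    return indexed, vocab


# Two nodes as they might come out of step 1 (made-up data).
nodes = {
    "n00000001": {"lemmata": ["entity"], "gloss": "that which exists"},
    "n00000002": {"lemmata": ["thing", "object"], "gloss": "a physical entity"},
}

for attrs in nodes.values():
    attrs["gloss"] = preprocess_gloss(attrs["gloss"])

graph_attrs = {}
for label in ("lemmata", "gloss"):
    indexed, vocab = index_label(nodes, label)
    for node_id, ids in indexed.items():
        nodes[node_id][label] = ids
    # Sizes needed later to set up embedding lookups.
    graph_attrs[label + "_vocab_size"] = len(vocab)
    graph_attrs[label + "_max_len"] = max(len(ids) for ids in indexed.values())
```

Storing the vocabulary size and the maximum label length alongside the graph means the embedding tables and input shapes can be set up without scanning all nodes again.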
# Required Data
- data.* files (data.noun, data.verb, data.adj, data.adv) from WordNet 3.0
- stopwords file from WordNet 3.0
# Dependencies
For make_wn_graph_step1.py and make_wn_graph_step2.py:
- networkx for building the graph
- re for searching with regular expressions
- os for path joining
- pickle to save data in a pickle file
- spacy for tokenization, POS tagging and lemmatization