# AUTHORS
Lyuba Dimitrova, Nadia Arslan, Nicolas Weber, Utaemon Toyota
# PROJECT
Software project, winter semester 2018/19
Supervisor: Prof. Dr. Anette Frank
Graph Embedding Propagation
# WordNet Preprocessing
Preprocessing builds a WordNet graph for Embedding Propagation from the WordNet data.* files (data.noun, data.verb, data.adj, data.adv) and is split into two steps.
Each step creates a graph and saves it as a pickle file (wn_graph_step1.pkl and wn_graph_step2.pkl, respectively).
Step 1 (make_wn_graph_step1.py) creates a graph with string-valued information: every synset read from the data.* files becomes a node of the graph. Label information (synset_id, pos, lexical file number, lemmata and gloss) is added as node attributes, and relations between synsets are added as edges between the corresponding nodes. Nodes without edges are then removed from the graph.
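The following is a minimal sketch of the idea behind step 1, not the code in make_wn_graph_step1.py. It assumes the standard WordNet data-file line layout, uses three shortened toy lines instead of the real files, and picks a directed graph and a `pos + offset` id scheme purely for illustration. The names `parse_data_line` and `build_graph` are hypothetical.

```python
import networkx as nx

# Shortened toy lines in the layout of a WordNet data.noun file
# (assumption: real lines carry more pointers and longer glosses).
TOY_LINES = [
    "00001740 03 n 01 entity 0 001 ~ 00001930 n 0000 | that which has its own distinct existence",
    "00001930 03 n 01 physical_entity 0 001 @ 00001740 n 0000 | an entity that has physical existence",
    "00002137 03 n 02 abstraction 0 abstract_entity 0 000 | a general concept",
]


def parse_data_line(line):
    """Split one data line into a synset record (hypothetical helper)."""
    fields, _, gloss = line.partition("|")
    tokens = fields.split()
    offset, lex_filenum, pos = tokens[0], tokens[1], tokens[2]
    word_count = int(tokens[3], 16)  # the word count is stored in hexadecimal
    lemmata = [tokens[4 + 2 * i] for i in range(word_count)]
    pointer_start = 4 + 2 * word_count
    pointer_count = int(tokens[pointer_start])  # the pointer count is decimal
    relations = []
    for i in range(pointer_count):
        base = pointer_start + 1 + 4 * i
        symbol, target_offset, target_pos = tokens[base:base + 3]
        relations.append((symbol, target_pos + target_offset))
    return {
        "synset_id": pos + offset,  # id scheme chosen for this sketch
        "pos": pos,
        "lex_filenum": lex_filenum,
        "lemmata": lemmata,
        "gloss": gloss.strip(),
        "relations": relations,
    }


def build_graph(lines):
    """Build a graph with string attributes and drop nodes without edges."""
    records = [parse_data_line(line) for line in lines]
    graph = nx.DiGraph()  # directed graph is an assumption of this sketch
    for record in records:
        attributes = {k: v for k, v in record.items() if k != "relations"}
        graph.add_node(record["synset_id"], **attributes)
    for record in records:
        for symbol, target in record["relations"]:
            if target in graph:  # skip pointers to synsets that were not read
                graph.add_edge(record["synset_id"], target, relation=symbol)
    graph.remove_nodes_from(list(nx.isolates(graph)))
    return graph


toy_graph = build_graph(TOY_LINES)
print(sorted(toy_graph.nodes))
```

In this toy input the third synset has no pointers, so it ends up isolated and is removed, leaving two nodes connected by one edge in each direction.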
Step 2 (make_wn_graph_step2.py) takes the graph created in step 1. Each gloss is tokenized, POS-tagged and lemmatized with spaCy, and stopwords are removed. Finally, every item of every label type is mapped to an index, and the node attributes are saved as arrays. To facilitate working with embeddings in TensorFlow, information about the label lengths and vocabulary sizes is added as graph attributes.
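The index mapping in step 2 can be illustrated with a small sketch. It is not the code in make_wn_graph_step2.py: the glosses are assumed to be already tokenized, lemmatized and stopword-filtered (the spaCy part is left out), and the helper names as well as the graph attribute names `<label>_vocab_size` and `<label>_max_len` are hypothetical.

```python
import networkx as nx
import numpy as np


def build_vocab(graph, label):
    """Map every distinct item of one label type to an integer index."""
    items = sorted({item for _, data in graph.nodes(data=True) for item in data[label]})
    return {item: index for index, item in enumerate(items)}


def index_labels(graph, labels):
    """Replace string lists by index arrays and record sizes on the graph."""
    mappings = {}
    for label in labels:
        vocab = build_vocab(graph, label)
        max_len = max(len(data[label]) for _, data in graph.nodes(data=True))
        for _, data in graph.nodes(data=True):
            data[label] = np.array([vocab[item] for item in data[label]], dtype=np.int64)
        # Graph-level attributes (names are assumptions of this sketch).
        graph.graph[label + "_vocab_size"] = len(vocab)
        graph.graph[label + "_max_len"] = max_len
        mappings[label] = vocab
    return mappings


# Toy graph with preprocessed labels (hypothetical content).
toy = nx.Graph()
toy.add_node("n1", lemmata=["entity"], gloss=["distinct", "existence"])
toy.add_node("n2", lemmata=["physical_entity"], gloss=["entity", "physical", "existence"])
toy.add_edge("n1", "n2")

toy_mappings = index_labels(toy, ["lemmata", "gloss"])
print(toy.nodes["n2"]["gloss"], toy.graph)
```

The returned mappings are plain dictionaries from strings to integers, so they can be written to disk with json.dump.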
# Required Data
- data.* files from WordNet 3.0 (data.noun, data.verb, data.adj, data.adv)
- stopwords file from WordNet 3.0
# Dependencies
For make_wn_graph_step1.py and make_wn_graph_step2.py:
- networkx for building the graph
- re for searching regular expressions
- os for path joining
- pickle to save data in a pickle file
- spaCy for tokenization, POS-tagging and lemmatization
- numpy for arrays
- json to dump the mapping
# Running instructions
python[3] make_wn_graph_step1.py
python[3] make_wn_graph_step2.py
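
Once both scripts have run, the pickled graphs can be loaded again for inspection. The sketch below shows one way to do this with a round trip through a temporary file. The helpers `save_graph` and `load_graph` are hypothetical and not part of the scripts, and the toy graph stands in for wn_graph_step1.pkl or wn_graph_step2.pkl.

```python
import os
import pickle
import tempfile

import networkx as nx


def save_graph(graph, path):
    """Write a graph to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(graph, handle)


def load_graph(path):
    """Read a graph back from a pickle file (only use with trusted files)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip with a toy graph in a temporary directory.
original = nx.Graph()
original.add_node("n1", lemmata=["entity"])
original.add_node("n2", lemmata=["physical_entity"])
original.add_edge("n1", "n2")

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_graph.pkl")
    save_graph(original, toy_path)
    restored = load_graph(toy_path)

print(restored.number_of_nodes(), restored.number_of_edges())
```

With the real output, `load_graph("wn_graph_step2.pkl")` would be the corresponding call, assuming the file is in the current working directory and networkx is installed.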