This project provides our re-implementation of the Graph Embedding Propagation (EP) algorithm, based on the approach of Duran and Niepert (2017). We applied our algorithm, EP-SP, to two tasks: (1) node classification on the Cora dataset and (2) Word Sense Disambiguation (WSD) over WordNet 3.0.
(1) provides an evaluation of our graph and allows a comparison with the results presented in the original paper.
(2) provides two approaches to WSD that use the synset embeddings obtained by applying EP-SP to WordNet 3.0. The task uses the English All-Words datasets of SensEval-2 and SensEval-3.
For more details please read our project report.
# Project structure
|_ cli.py
| The command line interface for our project. Imports and runs the files in scripts/. See Usage for more info.
|
|_ data/
| All necessary inputs and the generated outputs.
| |
| |_ cora
| | |_embeddings - the output of EP on Cora
| | |_graph - the input for EP - a pickled networkX graph
| | |_models - the saved TF models; can be used to save the embeddings after 40 epochs, for instance
| | |_raw - the raw Cora data, input for the Cora preprocessing and the node classification
| | |_summaries - the TF loss summaries for the two Cora label types; used to produce the figures in the report
| |
| |_ other - files that don't belong to a particular dataset
| |
| |_ senseval2/senseval3
| | |_processed - the processed raw S2 or S3 data; input for the WSD
| | |_raw - the raw S2 or S3 data
| | |_wsd-answers - outputs of the WSD on the S2 or S3 data
| |
| |_ wordnet
| |_embeddings - the output of EP on WordNet
| |_graph - the input for EP - a pickled networkX graph
| |_models - the saved TF models; can be used to save the embeddings after 50 epochs, for instance
| |_raw - the raw WN data, input for the preprocessing scripts
| |_summaries - the TF loss summaries for the five WN label types; used to produce the figures in the report
| |_mappings - various mappings for the synset IDs, lemmata, WN3.0->WN1.7 etc.; used in the WSD
|
|_ __init__.py
|
|_ README.md
| This file.
|
|_ requirements.txt
| A snapshot of the virtualenv's packages. Install with: pip install -r requirements.txt
|
|_ scripts/
| Python scripts for EP, preprocessing, node classification and WSD.
| |
| |_ embedding_propagation - the EP algorithm
| |_ node_classification - the NC experiment on the Cora dataset
| |_ preprocessing - preprocessing scripts for Cora, SensEval, and WordNet. Not all of them are part of the CLI, so some come with auxiliary files.
| |_ scoring - the official S2 and S3 All Words Task scorer for the WSD
| |_ wsd - the two WSD methods
|
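The graph folders above hold the EP input as a pickled networkX graph. As a minimal sketch of how such a file could be inspected, plain `pickle` should be enough, assuming the file was written with `pickle.dump` and that networkx is installed when unpickling. The helper names `load_pickled_graph` and `describe` are hypothetical and not part of this repository.

```python
import pickle
from pathlib import Path


def load_pickled_graph(path):
    """Load a pickled object (here: a networkx graph) from disk.

    Hypothetical helper, not part of the project's scripts.
    """
    # Only unpickle files you trust: pickle can execute arbitrary code.
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def describe(graph):
    """Return (node count, edge count) for a networkx-style graph."""
    return graph.number_of_nodes(), graph.number_of_edges()
```

For example, `describe(load_pickled_graph(path))`, with `path` set to the file passed to `-i` in the Usage section, would report the size of the input graph.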
# Usage
cli.py ...
The CLI provides 6 different commands.
List the commands:
python[3] cli.py --help
1. embedding-propagation
Runs EP on an input graph and saves the learnt embeddings.
Help and options: python[3] cli.py embedding-propagation --help
Example usage: python[3] cli.py embedding-propagation // runs on the WordNet graph by default
python[3] cli.py embedding-propagation -i data/cora/graphs/cora_graph.pkl -o data/cora/ -b 64 // to run on Cora
2. node-classification
Runs the NC experiment and prints the results to the console.
Help and options: python[3] cli.py node-classification --help
Example usage: python[3] cli.py node-classification -s 10 // only one experiment, using 10 for the random seed
python[3] cli.py node-classification -i 50 // 50 runs with different random seeds
3. process-cora
Creates a networkx graph from the raw Cora data.
Help and options: python[3] cli.py process-cora --help
Example usage: python[3] cli.py process-cora -o test/graph.pkl
4. score-wsd
Calls the official scorer of the English All Words Task for WSD.
Help and options: python[3] cli.py score-wsd --help
Example usage: python[3] cli.py score-wsd -s 3 -a wsd2_5
5. wsd-1
Calls the WSD method #1.
Help and options: python[3] cli.py wsd-1 --help
Example usage: python[3] cli.py wsd-1 -o wsd1 // afterwards, run python[3] cli.py score-wsd -a wsd1 to score it
6. wsd-2
Calls the WSD method #2.
Help and options: python[3] cli.py wsd-2 --help
Example usage: python[3] cli.py wsd-2 -c 15 -o wsd2 // afterwards, run python[3] cli.py score-wsd -a wsd2_15 to score it
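In the wsd-2 example above, the scoring call uses the answer name wsd2_15, which looks like the output name followed by the value of -c. Assuming that naming convention holds in general (it is inferred from this one example only), a small hypothetical helper, `wsd2_then_score`, could build both commands so they can be run one after the other:

```python
import sys


def wsd2_then_score(context_size, output_name):
    """Build the argv lists for a wsd-2 run and the matching score-wsd call.

    Hypothetical helper; it only uses the flags shown in the examples above.
    """
    python = sys.executable or "python3"
    run = [python, "cli.py", "wsd-2", "-c", str(context_size), "-o", output_name]
    # Assumption: wsd-2 stores its answers under "<output_name>_<context_size>".
    answers = f"{output_name}_{context_size}"
    score = [python, "cli.py", "score-wsd", "-a", answers]
    return run, score
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the project root, so that scoring only starts if the WSD run finished without an error.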
# Data structure
...
# License
This software is distributed under the MIT License.