@@ -4,8 +4,7 @@ This is an implementation of the CHERTOY system for the Word Sense Induction tas
...
This project also contains an implementation of the baseline and 40 experiments with it.
We experiment with language models, specific features and clustering algorithms based on the sense2vec and the sent2vec systems.
After performing 40 carefully designed experiments, we obtained insights into the effects of several feature combinations, which resulted in our WSI system CHERTOY.
The system creates semantically related clusters from the given snippets (the text fragments returned by the search engine) for each predefined ambiguous topic.
It preprocesses the input data, builds a vector representation of each snippet with sense2vec and a vector mixture model (a BOW representation that sums the word vectors of each snippet), and creates semantic clusters with the Mean Shift clustering algorithm.
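
The pipeline described above (preprocess, build one vector per snippet, cluster with Mean Shift) can be illustrated with a minimal, dependency-free sketch. It is not the CHERTOY code: `TOY_VECTORS` is a made-up 2-dimensional stand-in for sense2vec embeddings, the tokenizer and the bandwidth value are assumptions, and the flat-kernel Mean Shift is written out by hand only to keep the example self-contained. A library implementation such as scikit-learn's `sklearn.cluster.MeanShift` would be the usual choice in practice.

```python
import math
import re

# Hypothetical 2-d word vectors; a real run would load sense2vec embeddings.
TOY_VECTORS = {
    "apple": (1.0, 0.0), "fruit": (0.9, 0.1), "juice": (0.8, 0.2),
    "iphone": (0.0, 1.0), "mac": (0.1, 0.9), "laptop": (0.2, 0.8),
}


def tokenize(snippet):
    """Assumed preprocessing: lowercase and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", snippet.lower())


def snippet_vector(snippet, vectors=TOY_VECTORS):
    """Vector mixture: sum the known word vectors, then L2-normalise."""
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    for token in tokenize(snippet):
        if token in vectors:  # out-of-vocabulary tokens are skipped
            total = [t + v for t, v in zip(total, vectors[token])]
    norm = math.sqrt(sum(t * t for t in total))
    return tuple(t / norm for t in total) if norm else tuple(total)


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel Mean Shift; returns one cluster label per point."""
    modes = []
    for start in points:
        x = start
        for _ in range(max_iter):
            near = [q for q in points if math.dist(x, q) <= bandwidth]
            if not near:
                break
            # Move to the mean of all points inside the window.
            new_x = tuple(sum(c) / len(near) for c in zip(*near))
            moved = math.dist(x, new_x)
            x = new_x
            if moved < tol:
                break
        modes.append(x)

    # Points whose modes lie within one bandwidth share a cluster.
    centers, labels = [], []
    for mode in modes:
        for index, center in enumerate(centers):
            if math.dist(mode, center) <= bandwidth:
                labels.append(index)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels


def cluster_snippets(snippets, bandwidth=0.5):
    """Embed each snippet and cluster the embeddings."""
    return mean_shift([snippet_vector(s) for s in snippets], bandwidth)


if __name__ == "__main__":
    snippets = [
        "Apple fruit juice",
        "Fresh apple juice",
        "iPhone and Mac laptop",
        "Mac laptop",
    ]
    print(cluster_snippets(snippets))
```

Mean Shift does not need the number of clusters in advance, which suits sense induction, where the number of senses per topic is unknown. The bandwidth is the main parameter to tune.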
...
@@ -111,10 +110,7 @@ The folder experiments contains an implementation of the baseline and 40 differe
...
* lib
The folder contains code for preprocessing the Wikipedia dataset to train our own sent2vec models for the experiments, and a README file. Our preprocessed Wikipedia 2017 dataset and the two models we trained on it, which we used in our sent2vec experiments, are provided under /proj/toyota on the institute's server.
Other models that we used during our experiments can be found in sense2vec and sent2vec repositories.