diff --git a/bin/chertoy.py b/bin/chertoy.py
index 81f0f8610c16af62460f8a542ce568af51df069b..70e9436d232aebc565a77be00e2b0eb7672133b1 100644
--- a/bin/chertoy.py
+++ b/bin/chertoy.py
@@ -11,9 +11,11 @@
 -------------------
 DESCRIPTION
 -------------------
-The pipeline system performs the 17th variant of the system for WSI (word sense induction) task (the Task 11 at SemEval 2013), which showed the best performance on the trial data.
+The pipeline system performs the 17th variant of the system for WSI (word sense induction) task (the Task 11 at SemEval 2013),
+which showed the best performance on the trial data.
 
-The system creates semantic related clusters from the given snippets (the text fragments we get back from the search engine) for each pre-defined ambigue topic.
+The system creates semantic related clusters from the given snippets (the text fragments we get back from the search engine)
+for each pre-defined ambigue topic.
 -------------------
 METHODS
 -------------------
@@ -22,7 +24,8 @@ For the WSI purposes it uses the following methods:
 - For pre-rpocessing: tokenization + remove punctuation
 - Language model: sense2vec (paper: https://arxiv.org/abs/1511.06388, code: https://github.com/explosion/sense2vec)
 - Compositional semantics: vector mixture model (BOW (bag-of-words) representation with summarization for each snippet)
-- Clustering: Mean Shift clustering with sklearn.cluster (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.MeanShift.html#sklearn.cluster.MeanShift) with default parameters.
+- Clustering: Mean Shift clustering with sklearn.cluster
+(http://scikit-learn.org/stable/modules/generated/sklearn.cluster.MeanShift.html#sklearn.cluster.MeanShift) with default parameters.
 """
 
 import sys
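
Review note: the METHODS docstring touched by this diff describes a pipeline of tokenization with punctuation removal, word-vector lookup, a per-snippet vector sum, and Mean Shift clustering. The sketch below illustrates that flow only and is not the code in bin/chertoy.py. `TOY_VECTORS` is a hypothetical 2-d stand-in for the sense2vec model, lowercasing the tokens is an assumption, and all function names are made up for illustration. Only the use of `MeanShift()` with default parameters comes from the docstring.

```python
import re

import numpy as np
from sklearn.cluster import MeanShift

# Hypothetical 2-d embeddings standing in for sense2vec vectors (assumption).
TOY_VECTORS = {
    "fruit": np.array([1.0, 0.0]),
    "juice": np.array([0.9, 0.1]),
    "tree": np.array([0.8, 0.1]),
    "pie": np.array([0.9, 0.0]),
    "iphone": np.array([0.0, 1.0]),
    "company": np.array([0.1, 0.9]),
    "stock": np.array([0.1, 0.8]),
    "mac": np.array([0.0, 0.9]),
}


def tokenize(snippet):
    """Keep runs of word characters, which also drops punctuation.

    Lowercasing is an assumption made so tokens match the toy vocabulary.
    """
    return re.findall(r"\w+", snippet.lower())


def snippet_vector(snippet, vectors, dim=2):
    """Vector mixture: sum the vectors of every known token in the snippet."""
    total = np.zeros(dim)
    for token in tokenize(snippet):
        if token in vectors:  # tokens outside the vocabulary are skipped
            total += vectors[token]
    return total


def cluster_snippets(snippets, vectors, dim=2):
    """Cluster snippet vectors with Mean Shift using default parameters."""
    X = np.vstack([snippet_vector(s, vectors, dim) for s in snippets])
    return MeanShift().fit_predict(X)


# Made-up snippets for the ambiguous topic "apple".
SNIPPETS = [
    "Apple pie, fresh fruit!",
    "Apple juice from the tree.",
    "Fruit juice: apple.",
    "Apple tree with fruit",
    "Apple company stock rises.",
    "New iPhone and Mac from Apple",
    "Apple stock; iPhone sales",
    "Apple company iPhone",
]

labels = cluster_snippets(SNIPPETS, TOY_VECTORS)
for label, snippet in zip(labels, SNIPPETS):
    print(label, snippet)
```

With default parameters, `MeanShift` estimates its bandwidth from the data, so the number of clusters is not fixed in advance. That fits sense induction, where the number of senses per topic is unknown.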