Commit 642789bc authored by Victor Zimmermann

Evaluate with this.

parent 414db2ce
Showing
with 34958 additions and 0 deletions
================================================================================
EVALUATION TOOL FOR SEMEVAL-2013 TASK #11:
Evaluating Word Sense Induction & Disambiguation within An End-User Application
http://www.cs.york.ac.uk/semeval-2013/task11/
================================================================================
The aim of this task is to provide a framework for the objective evaluation and
comparison of Word Sense Disambiguation and Induction algorithms in an end-user
application, namely Web Search Result Clustering.
Web Search Result Clustering is a task consisting of grouping into clusters
the snippet results returned by a search engine for an input query. Results
in a given cluster are assumed to be semantically related to each other and
each cluster is expected to represent a specific meaning of the input query.
A Word Sense Induction (WSI) system will be asked to identify the meanings of
the input query and cluster the snippets into semantically related groups
according to those meanings. A Word Sense Disambiguation (WSD) system, by
contrast, will be asked to sense-tag the snippets with the appropriate senses
of the input query, which again implicitly results in a clustering of
snippets (i.e., one cluster per sense).
WSD and WSI systems will then be evaluated in an end-user application, i.e.,
according to their ability to diversify the search results for the input query.
This evaluation scheme, previously proposed for WSI by Navigli and Crisafulli
(2010) and Di Marco and Navigli (2013), is extended here to WSD and WSI systems
and is aimed at overcoming the limitations of in vitro evaluations. In fact,
the quality of the output clusters will be assessed in terms of their ability
to diversify the snippets across the query meanings.
No training data will be provided.
---------------------------------------
IMPORTANT CONSTRAINTS ABOUT THE SYSTEMS
---------------------------------------
Since both WSI and WSD systems can participate, we will perform two separate
comparative evaluations: one involving sense induction systems, the other
involving disambiguation systems.
The following constraints apply:
- WSI systems can ONLY use RAW corpora for inducing the senses of a query;
no sense annotated corpus, existing sense inventory or lexical resource (such
as WordNet or BabelNet) is allowed.
- WSD systems can use ANY kind of information, as declared by the participants.
For instance, supervised systems will use training data, knowledge-rich
systems will use resources like WordNet, BabelNet, etc.
Since search results come with a URL, a title and a text snippet, systems can
exploit any of this information for performing the induction/disambiguation task.
If systems want to exploit the URL for retrieving additional information, they
will have to declare this at submission time.
All the information about the submitted systems (such as corpora, resources, URL,
web page, etc. used by the system) will be reported in the task paper.
------------------
ABOUT THIS PACKAGE
------------------
This package contains the evaluation tool for the SemEval-2013 task #11.
The tool performs:
1) an intrinsic evaluation in terms of Adjusted Rand Index, Jaccard Index and F1 measure
2) an extrinsic evaluation in terms of the degree of Web search snippet diversification,
calculated as Subtopic Recall@K and Precision@r
The above measures are calculated as explained in the following reference paper:
A. Di Marco, R. Navigli. Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction. Computational Linguistics, 39(4), MIT Press, 2013.
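As a rough illustration of the intrinsic measures, the following Python sketch
computes the Jaccard Index and the Adjusted Rand Index from pair counts, using
their standard textbook definitions. It is not taken from WSI-Evaluator.jar and
may differ from it in details (for example in how unassigned snippets are
handled). The function names and the cluster labels "c1"/"c2" are made up for
the example; the gold labels are four assignments from MORESQUE/STRel.txt.

```python
from itertools import combinations


def pair_counts(gold, system):
    """Classify every pair of items by whether gold and system co-cluster it.

    gold and system map an item ID to a single cluster label (hard clustering).
    Only items present in both mappings are compared (an assumption of this sketch).
    Returns (tp, fp, fn, tn).
    """
    items = sorted(set(gold) & set(system))
    tp = fp = fn = tn = 0
    for a, b in combinations(items, 2):
        same_gold = gold[a] == gold[b]
        same_sys = system[a] == system[b]
        if same_gold and same_sys:
            tp += 1
        elif same_sys:
            fp += 1
        elif same_gold:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def jaccard_index(gold, system):
    """Pairs together in both clusterings / pairs together in at least one."""
    tp, fp, fn, _ = pair_counts(gold, system)
    denom = tp + fp + fn
    # Undefined when no pair is co-clustered anywhere; 0.0 is an arbitrary choice here.
    return tp / denom if denom else 0.0


def adjusted_rand_index(gold, system):
    """Adjusted Rand Index written in terms of the four pair counts."""
    tp, fp, fn, tn = pair_counts(gold, system)
    denom = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    # Undefined for degenerate clusterings; 0.0 is an arbitrary choice here.
    return 2.0 * (tp * tn - fn * fp) / denom if denom else 0.0


# Gold labels for four snippets of query 45, as listed in STRel.txt.
gold = {"45.1": "45.2", "45.5": "45.2", "45.2": "45.6", "45.8": "45.6"}
# Hypothetical system output that groups the snippets the same way.
system = {"45.1": "c1", "45.5": "c1", "45.2": "c2", "45.8": "c2"}
print(jaccard_index(gold, system), adjusted_rand_index(gold, system))
```

Both measures depend only on which snippets are grouped together, so the
system's cluster labels do not need to match the gold subtopic IDs.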
--------
CONTENTS
--------
This package contains the following main components:
README.txt          # this file
datasets            # folder containing the MORESQUE development query dataset together with its gold standard assignments
                    # NOTE: this is NOT the test dataset (the test dataset will, instead, be used for evaluating the participating systems)
clustering_example  # folder containing an example of clustering output produced by a WSI algorithm
config              # configuration folder
WSI-Evaluator.jar   # WSI evaluation system
------------
INSTALLATION
------------
# unpack the archive
tar xvfz semeval-2013_task11_evaluator.tar.gz
---------------------------------
USING WSI-Evaluator.jar
---------------------------------
The evaluator needs as input:
1) a query dataset together with gold standard assignments, e.g. datasets/MORESQUE
(obtained from http://lcl.uniroma1.it/moresque/, and already included in the package)
Note that the MORESQUE dataset is provided for development purposes, but
a new dataset of ambiguous queries will be provided before the start of the competition.
2) the output of your Word Sense Induction/Disambiguation algorithm applied to the snippets returned
for each query. For an example, refer to the clustering_example folder.
To run the evaluator do the following:
# run
java -jar WSI-Evaluator.jar <query dataset folder> <WSI algorithm output file>
For instance, the following command evaluates the clustering example provided
in this package on the MORESQUE development dataset:
# output format from SemEval-2010 Word Sense Induction & Disambiguation Task #14
java -jar WSI-Evaluator.jar datasets/MORESQUE clustering_example/MORESQUE/output_2010.txt
# output format of SemEval-2013 Evaluating Word Sense Induction & Disambiguation within An End-User Application Task #11
java -jar WSI-Evaluator.jar datasets/MORESQUE clustering_example/MORESQUE/output_2013.txt
Note that two possible formats are allowed for the clustering output by your
Word Sense Induction/Disambiguation algorithm (recognized automatically by the evaluator):
- The same format used in the SemEval-2010 Word Sense Induction & Disambiguation Task #14
(http://www.cs.york.ac.uk/semeval2010_WSI/)
- A new format structured as follows. The first line is fixed:
subTopicID resultID
All other lines are pairs of the following kind:
<queryID.subtopicID> <queryID.snippetID>
- queryID is the ID for a given query, coming from the file datasets/<DATASET>/topics.txt
- queryID.subtopicID is a system ID for a certain cluster output by the system
- queryID.snippetID comes from the file datasets/<DATASET>/results.txt.
For example 80.1 denotes the air sense of liquid_air: Liquid air - Wikipedia, the free encyclopedia For the automobile, see Liquid Air. Liquid air is air that has been cooled to very low ... Liquid air can absorb heat rapidly and revert to its gaseous state. ...
The results.txt file contains the results returned by the search engine for the input queries. For instance, the following line in MORESQUE/results.txt:
80.1 http://en.wikipedia.org/wiki/Liquid_air Liquid air - Wikipedia, the free encyclopedia For the automobile, see Liquid Air. Liquid air is air that has been cooled to very low ... Liquid air can absorb heat rapidly and revert to its gaseous state. ...
is the first result returned for query 80 (liquid_air) and contains:
- URL (http://en.wikipedia.org/wiki/Liquid_air);
- title (Liquid air - Wikipedia, the free encyclopedia)
- snippet text (For the automobile, see Liquid Air. Liquid air is air etc...)
Note that a hard clustering is assumed, i.e. each snippet has to belong to at most one cluster.
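The new output format can be produced with a few lines of Python. The sketch
below is only an illustration: the function name is hypothetical, and it
assumes that the two columns are separated by a Tab, as in the STRel.txt gold
standard files (the separator is not stated explicitly above). It also enforces
the hard-clustering constraint by rejecting a snippet assigned to two clusters.

```python
def format_clustering(assignments):
    """Render (cluster_id, snippet_id) pairs in the two-column output format.

    assignments: iterable of pairs such as ("80.1", "80.7"), i.e.
    <queryID.subtopicID> and <queryID.snippetID>.
    Raises ValueError if a pair mixes two queries or if a snippet is
    assigned to more than one cluster (hard clustering).
    """
    header = "subTopicID\tresultID"  # fixed first line; Tab separator is an assumption
    cluster_of = {}
    rows = []
    for cluster_id, snippet_id in assignments:
        if cluster_id.split(".")[0] != snippet_id.split(".")[0]:
            raise ValueError(f"{cluster_id} and {snippet_id} belong to different queries")
        previous = cluster_of.get(snippet_id)
        if previous is None:
            cluster_of[snippet_id] = cluster_id
            rows.append(f"{cluster_id}\t{snippet_id}")
        elif previous != cluster_id:
            raise ValueError(f"{snippet_id} is in both {previous} and {cluster_id}")
        # An exact repeat of an earlier pair is silently skipped.
    return "\n".join([header] + rows) + "\n"


# Hypothetical clustering of three snippets of query 80 into two clusters.
text = format_clustering([("80.1", "80.1"), ("80.1", "80.7"), ("80.2", "80.3")])
print(text, end="")
```

The returned string can be written to a file and passed to the evaluator as the
<WSI algorithm output file> argument.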
------------------
EVALUATOR'S OUTPUT
------------------
As a result of running the evaluator, several operations are reported in the wsi_eval.log file.
The final evaluation output, with all the evaluation measures, can be found in result.log.
----------
CONCLUSION
----------
Good Luck!
-------
AUTHORS
-------
Roberto Navigli, Sapienza University of Rome
(navigli@di.uniroma1.it)
Daniele Vannella, Sapienza University of Rome
(vannella@di.uniroma1.it)
-------
CONTACT
-------
If you have any question or problem, please feel free to get in touch with us
through the following Google group:
http://groups.google.com/group/semeval-2013-wsi-in-application?hl=en
---------------
ACKNOWLEDGMENTS
---------------
The authors gratefully acknowledge the support of the MultiJEDI ERC Starting Grant No. 259234
(http://lcl.uniroma1.it/multijedi) and the CASPUR High-Performance Computing Grants 475/2011 and 117/2012.
############################
# WSI EVALUATION PROPERTIES
############################
evaluation.createBucketCluster = TRUE
##############
# EVALUATION #
##############
# K for Subtopic-Recall@K
evaluation.subtopicRecall.K = 100
# The minimum value of recall from which to begin calculating the precision (expressed in percentage)
evaluation.subtopicPrecision.r = 0.4
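To make the role of K concrete, here is a minimal Python sketch of Subtopic
Recall@K in its usual form: the fraction of a query's subtopics covered by the
first K results of a ranked list. How the evaluator turns a clustering into a
ranked list is described in the reference paper and is not reproduced here; the
function name and its arguments are hypothetical.

```python
def subtopic_recall_at_k(ranked_results, result_to_subtopic, num_subtopics, k):
    """Fraction of subtopics covered by the first k results of a ranked list.

    ranked_results: result IDs in the order shown to the user.
    result_to_subtopic: gold mapping from result ID to subtopic ID;
        results without a gold subtopic are simply absent from it.
    num_subtopics: total number of gold subtopics for the query.
    """
    covered = {
        result_to_subtopic[r]
        for r in ranked_results[:k]
        if r in result_to_subtopic
    }
    return len(covered) / num_subtopics


# Three gold assignments for query 45 taken from STRel.txt; query 45 has
# 9 subtopics in subTopics.txt. The ranking itself is made up.
gold = {"45.1": "45.2", "45.2": "45.6", "45.8": "45.6"}
ranking = ["45.1", "45.2", "45.8"]
print(subtopic_recall_at_k(ranking, gold, 9, 1))
print(subtopic_recall_at_k(ranking, gold, 9, 3))
```

In this example the third result adds nothing to recall because its subtopic is
already covered by the second one, which is the effect that diversification
tries to avoid.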
==========================================================
MORESQUE Dataset
==========================================================
MORESQUE (More Sense-tagged QUEries) is a dataset designed for evaluation of
subtopic information retrieval. For details, please refer to our EMNLP-2010
paper:
Roberto Navigli & Giuseppe Crisafulli. Inducing Word Senses to Improve Web
Search Result Clustering. In Proceedings of the 2010 Conference on
Empirical Methods in Natural Language Processing (EMNLP 2010), MIT Stata
Center, Massachusetts, USA, 9-11 October 2010, pp. 116-126.
CONTENTS
This package contains version 1.0 of MORESQUE which consists of 114 topics,
each with a set of subtopics and a list of 100 top-ranking documents.
The topics were selected from the list of ambiguous Wikipedia entries;
i.e., those with "disambiguation" in the title (see
http://en.wikipedia.org/wiki/Wikipedia:Links_to_%28disambiguation%29_pages)
Because the dataset has been developed as a complement to AMBIENT
(http://credo.fub.it/ambient/), it includes queries ranging in length
from 2 to 4 words, numbered from 45 onwards.
The 100 documents associated with each topic were collected from the Yahoo!
search engine as of early 2010, and they were subsequently annotated with
subtopic relevance judgments.
The MORESQUE dataset consists of four files where each row is terminated by
Linefeed (ASCII 10) and fields are separated by Tab (ASCII 9). The four
files are described below:
==================== topics.txt ========================
It contains topic ID and description
ID description
45 the_block
46 stephen_king
47 soul_food
.........
==========================================================
==================== subTopics.txt ========================
It contains subtopic ID (formed by topic ID and subtopic number) and
description;
ID description
45.1 The Block (Sydney), the first Aboriginal land handback
45.2 The Block at Orange, an open-air shopping and entertainment mall located in Southern California
45.3 The Block (Baltimore), an adult-entertainment area
.........
==========================================================
==================== results.txt ========================
It contains result ID (formed by topic ID and search engine rank of
result), URL, title, and snippet
ID url title snippet
45.1 http://www.blockatorange.com/ The Block at Orange
45.2 http://en.wikipedia.org/wiki/The_Block_(album) The Block (album) - Wikipedia, the free encyclopedia The Block was released on September 2, 2008 and debuted at number one on the ... New Kids on the Block · Hangin' Tough · Merry, Merry Christmas · Step by Step ...
45.3 http://www.blockattahoe.com/ The Block at Tahoe Hotel designed by and for snowboarders. Photo gallery of rooms, details of special events.
.........
==========================================================
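A row of results.txt can be split into its four fields as sketched below. This
is only an illustration with hypothetical names; it relies on the Tab separator
stated above and pads missing trailing fields with empty strings, on the
assumption that some rows (such as 45.1 in the excerpt) carry no snippet text.

```python
from typing import NamedTuple


class SearchResult(NamedTuple):
    result_id: str  # topic ID and search engine rank, e.g. "45.1"
    url: str
    title: str
    snippet: str


def parse_result_line(line):
    """Split one Tab-separated results.txt row into a SearchResult."""
    # maxsplit=3 keeps any further Tab characters inside the snippet field.
    fields = line.rstrip("\n").split("\t", 3)
    fields += [""] * (4 - len(fields))  # pad rows with missing trailing fields
    return SearchResult(*fields)


row = parse_result_line("45.1\thttp://www.blockatorange.com/\tThe Block at Orange\n")
topic_id, rank = row.result_id.split(".")
print(topic_id, rank, row.title, repr(row.snippet))
```

The header row (ID, url, title, snippet) should be skipped before calling the
parser on the remaining lines of the file.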
==================== STRel.txt ========================
It contains subtopic ID (formed by topic ID and subtopic number) and result
ID (formed by topic ID and search engine rank of result)
subTopicID resultID
45.2 45.45
45.2 45.24
45.2 45.5
.........
==========================================================
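The gold standard in STRel.txt can be loaded into one set of result IDs per
subtopic, as in the following Python sketch. The function name is hypothetical
and the code assumes only what is stated above: a header row followed by
Tab-separated pairs.

```python
from collections import defaultdict


def read_strel(lines):
    """Group result IDs by subtopic ID from the rows of STRel.txt.

    lines: iterable of text lines, header row first.
    Returns a dict mapping subtopic ID to the set of its result IDs.
    """
    clusters = defaultdict(set)
    rows = iter(lines)
    next(rows, None)  # skip the "subTopicID resultID" header row
    for row in rows:
        row = row.strip()
        if not row:
            continue  # tolerate blank lines
        subtopic_id, result_id = row.split("\t")
        clusters[subtopic_id].add(result_id)
    return dict(clusters)


sample = ["subTopicID\tresultID\n", "45.2\t45.45\n", "45.2\t45.24\n", "45.5\t45.28\n"]
gold_clusters = read_strel(sample)
print(sorted(gold_clusters["45.2"]))
```

With a real file, the same function can be called on an open file object, for
example read_strel(open("datasets/MORESQUE/STRel.txt", encoding="utf-8")),
where the encoding is a guess.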
COPYRIGHT AND LICENSE
This work is licensed under the Creative Commons
Attribution-Noncommercial-Share Alike 3.0 Unported License. To view a
copy of this license, visit
http://creativecommons.org/licenses/by-nc-sa/3.0/
or send a letter to
Creative Commons
171 Second Street, Suite 300
San Francisco, California
94105, USA
MORE INFO
If you have any questions or comments, please contact Roberto Navigli
<navigli@di.uniroma1.it> or Antonio Di Marco <dimarco@di.uniroma1.it>.
CHANGES
1.0 First release
id description
45 the_block
46 stephen_king
47 soul_food
48 cool_water
49 space_raiders
50 gay_bar
51 la_mancha
52 robert_watts
53 dive_bomber
54 manor_house
55 indy_500
56 new_england
57 heavy_rotation
58 jump_cut
59 dial_tone
60 fight_night
61 agent_blue
62 mount_huxley
63 junk_mail
64 sean_fallon
65 double_negative
66 richard_tracey
67 civil_war
68 volcanic_rock
69 space_opera
70 family_reunion
71 radioactive_man
72 tai_chi
73 heart_attack
74 gorky_park
75 mark_forster
76 bus_driver
77 bat_boy
78 james_bond
79 special_edition
80 liquid_air
81 little_brother
82 magic_lantern
83 alpha_dog
84 micro_chip
85 middle_ages
86 black_hole
87 civil_disobedience
88 american_beauty
89 iron_butterly
90 neutron_star
91 fata_morgana
92 death_by_chocolate
93 division_by_zero
94 rain_or_shine
95 before_the_storm
96 stranger_in_town
97 seduced_and_abandoned
98 ten_little_indians
99 fly_with_me
100 train_of_thought
101 the_blue_bird
102 catch_a_fire
103 amarillo_by_morning
104 the_whole_truth
105 ray_of_light
106 hall_of_justice
107 all_that_jazz
108 across_the_universe
109 lost_in_space
110 fast_food_nation
111 cats_and_dogs
112 heart_to_heart
113 speed_of_light
114 radius_of_curvature
115 food_for_thought
116 burden_of_proof
117 field_of_fire
118 crown_of_thorns
119 charles_de_gaulle
120 tower_of_babel
121 cassius_marcellus_clay
122 bed_of_roses
123 ace_of_spades
124 reign_of_terror
125 tree_of_knowledge
126 private_practice
127 soldier_of_fortune
128 citizen_of_the_world
129 the_da_vinci_code
130 freedom_of_the_seas
131 freedom_of_the_press
132 queen_of_the_night
133 another_day_in_paradise
134 in_the_name_of_love
135 bad_to_the_bone
136 the_glass_bead_game
137 beer_for_my_horses
138 attack_of_the_mutant
139 the_colour_of_magic
140 hair_of_the_dog
141 the_marquise_of_o
142 the_game_of_life
143 rookie_of_the_year
144 storm_in_a_teacup
145 lake_of_the_woods
146 sign_of_the_cross
147 music_of_the_spheres
148 trip_the_light_fantastic
149 twilight_of_the_gods
150 survival_of_the_fittest
151 the_edge_of_heaven
152 down_in_the_valley
153 look_back_in_anger
154 bullet_in_the_head
155 a_bell_for_adano
156 peter_and_the_wolf
157 top_of_the_pops
158 the_art_of_seduction
=================================================================================
SEMEVAL-2013 TASK #18:
Evaluating Word Sense Induction & Disambiguation within An End-User Application
=================================================================================
CONTENTS
This package contains the trial dataset of the SemEval-2013 task #18,
consisting of 4 topics (i.e., queries), each with a set of subtopics and a
list of 100 ranked documents.
This trial dataset is a subset of MORESQUE (More Sense-tagged QUEries), a
dataset designed for evaluating subtopic information retrieval. For details,
please refer to our EMNLP-2010 paper:
Roberto Navigli & Giuseppe Crisafulli. Inducing Word Senses to Improve Web
Search Result Clustering. In Proceedings of the 2010 Conference on
Empirical Methods in Natural Language Processing (EMNLP 2010), MIT Stata
Center, Massachusetts, USA, 9-11 October 2010, pp. 116-126.
The topics were selected from the list of ambiguous Wikipedia entries,
i.e., those with "(disambiguation)" in the title (see
http://en.wikipedia.org/wiki/Wikipedia:Links_to_%28disambiguation%29_pages)
The format of the trial dataset follows that of AMBIENT
(http://credo.fub.it/ambient/), a dataset of mainly 1-word and 2-word queries.
The 100 documents associated with each topic were collected from the Yahoo!
search engine as of early 2010, and they were subsequently annotated with
subtopic relevance judgments.
The trial dataset consists of four files where each row is terminated by
Linefeed (ASCII 10) and fields are separated by Tab (ASCII 9). The four
files are described below:
==================== topics.txt ========================
Contains topic ID and description
ID description
45 the_block
46 stephen_king
47 soul_food
.........
==========================================================
==================== subTopics.txt ========================
Contains subtopic ID (formed by topic ID and subtopic number) and
description;
ID description
45.1 The Block (Sydney), the first Aboriginal land handback
45.2 The Block at Orange, an open-air shopping and entertainment mall located in Southern California
45.3 The Block (Baltimore), an adult-entertainment area
.........
==========================================================
==================== results.txt ========================
Contains result ID (formed by topic ID and search engine rank of
result), url, title, and snippet
ID url title snippet
45.1 http://www.blockatorange.com/ The Block at Orange
45.2 http://en.wikipedia.org/wiki/The_Block_(album) The Block (album) - Wikipedia, the free encyclopedia The Block was released on September 2, 2008 and debuted at number one on the ... New Kids on the Block · Hangin' Tough · Merry, Merry Christmas · Step by Step ...
45.3 http://www.blockattahoe.com/ The Block at Tahoe Hotel designed by and for snowboarders. Photo gallery of rooms, details of special events.
.........
==========================================================
==================== STRel.txt ========================
Contains subtopic ID (formed by topic ID and subtopic number) and result
ID (formed by topic ID and search engine rank of result)
subTopicID resultID
45.2 45.45
45.2 45.24
45.2 45.5
.........
==========================================================
COPYRIGHT AND LICENSE
This work is licensed under the Creative Commons
Attribution-Noncommercial-Share Alike 3.0 Unported License. To view a
copy of this license, visit
http://creativecommons.org/licenses/by-nc-sa/3.0/
or send a letter to
Creative Commons
171 Second Street, Suite 300
San Francisco, California
94105, USA
MORE INFO
If you have any questions or comments, please contact Antonio Di Marco
<dimarco@di.uniroma1.it> or Roberto Navigli <navigli@di.uniroma1.it>.
CHANGES
1.0 First release
subTopicID resultID
45.2 45.45
45.2 45.24
45.2 45.5
45.2 45.11
45.2 45.1
45.5 45.28
45.5 45.25
45.5 45.56
45.6 45.64
45.6 45.2
45.6 45.94
45.6 45.82
45.6 45.43
45.6 45.8
45.6 45.97
45.6 45.55
45.6 45.79
45.6 45.12
45.6 45.21
45.6 45.29
45.6 45.91
45.6 45.66
45.6 45.100
45.6 45.76
45.6 45.53
45.6 45.75
45.6 45.67
46.1 46.40
46.1 46.3
46.1 46.52
46.1 46.77
46.1 46.13
46.1 46.4
46.1 46.7
46.1 46.6
46.1 46.37
46.1 46.42
46.1 46.85
46.1 46.41
46.1 46.19
46.1 46.84
46.1 46.30
46.1 46.62
46.1 46.97
46.1 46.39
46.1 46.83
46.1 46.100
46.1 46.86
46.1 46.2
46.1 46.94
46.1 46.49
46.1 46.65
46.1 46.38
46.1 46.96
46.1 46.14
46.1 46.75
46.1 46.53
46.1 46.8
46.1 46.74
46.1 46.78
46.1 46.66
46.1 46.35
46.1 46.76
46.1 46.67
46.1 46.18
46.1 46.87
46.1 46.26
46.1 46.29
46.1 46.63
46.1 46.92
46.1 46.56
46.1 46.64
46.1 46.25
46.1 46.81
46.1 46.21
46.1 46.57
46.1 46.10
46.1 46.60
46.1 46.1
46.1 46.27
46.1 46.28
46.1 46.93
46.1 46.90
46.1 46.22
46.1 46.80
46.1 46.70
46.1 46.71
46.1 46.16
46.1 46.82
46.1 46.17
46.1 46.34
46.1 46.91
46.1 46.59
46.1 46.69
46.1 46.33
46.1 46.45
46.1 46.58
46.1 46.48
46.1 46.31
46.1 46.72
46.1 46.11
46.1 46.15
46.1 46.36
46.1 46.73
46.1 46.47
46.1 46.46
46.1 46.88
46.1 46.68
46.1 46.44
46.1 46.98
46.1 46.50
46.1 46.32
46.1 46.43
46.1 46.20
46.1 46.79
46.1 46.55
46.1 46.23
46.1 46.24
46.1 46.12
46.1 46.99
46.1 46.89
46.1 46.5
46.1 46.51
46.1 46.54
46.1 46.9
46.1 46.95
47.1 47.52
47.1 47.83
47.1 47.32
47.1 47.20
47.1 47.50
47.1 47.37
47.1 47.87
47.1 47.35
47.1 47.100
47.1 47.56
47.1 47.29
47.1 47.36
47.1 47.99
47.1 47.80
47.1 47.51
47.1 47.12
47.1 47.46
47.1 47.54
47.1 47.70
47.1 47.96
47.1 47.53
47.1 47.30
47.1 47.63
47.1 47.2
47.1 47.38
47.1 47.8
47.1 47.43
47.1 47.14
47.1 47.27
47.1 47.67
47.1 47.39
47.1 47.17
47.1 47.90
47.1 47.33
47.1 47.86
47.1 47.26
47.1 47.74
47.1 47.75
47.1 47.66
47.1 47.40
47.1 47.88
47.1 47.11
47.1 47.76
47.1 47.42
47.1 47.55
47.1 47.98
47.1 47.64
47.1 47.49
47.1 47.21
47.1 47.4
47.1 47.45
47.1 47.65
47.1 47.79
47.1 47.60
47.1 47.94
47.1 47.71
47.1 47.44
47.1 47.15
47.1 47.5
47.1 47.7
47.1 47.10
47.1 47.62
47.1 47.78
47.1 47.41
47.1 47.58
47.1 47.85
47.1 47.48
47.1 47.28
47.1 47.72
47.1 47.1
47.1 47.93
47.1 47.57
47.1 47.82
47.1 47.31
47.1 47.91
47.1 47.95
47.1 47.69
47.1 47.84
47.2 47.77
47.2 47.19
47.2 47.6
47.2 47.24
47.2 47.34
47.2 47.3
47.2 47.55
47.2 47.34
47.3 47.13
47.3 47.9
47.3 47.25
47.3 47.23
47.3 47.97
47.3 47.22
47.5 47.73
47.5 47.18
48.1 48.8
48.1 48.40
48.2 48.5
48.2 48.89
48.2 48.73
48.2 48.31
48.2 48.42
48.2 48.23
48.2 48.48
48.2 48.43
48.2 48.16
48.2 48.69
48.2 48.51
48.2 48.2
48.2 48.46
48.2 48.35
48.2 48.21
48.2 48.47
48.2 48.14
48.2 48.63
48.2 48.28
48.2 48.27
48.2 48.83
48.3 48.4
48.3 48.34
ID description
45.1 The Block (Sydney), the first Aboriginal land handback
45.2 The Block at Orange, an open-air shopping and entertainment mall located in Southern California
45.3 The Block (Baltimore), an adult-entertainment area
45.4 The Block (Philippines), one of two annex buildings in SM City North EDSA located in North Avenue, Quezon City
45.5 A set of TV series of similar format:
45.6 The Block (album), the fifth studio album from the New Kids On The Block
45.7 The Block (American Football) - Jerry Kramer's block in the 1967 NFL Championship Game that led to the winning touchdown
45.8 Bloc Québécois, a Canadian political party often referred to as "The Bloc"
45.9 The wooden block used in the beheading of a condemned person with an axe
46.1 Stephen King is an American author.
46.2 Steve King (radio), American radio personality
46.3 Stephen King (conservationist), New Zealand conservationist
46.4 Steve King, U.S. Representative from Iowa
46.5 Steve King (Colorado legislator), Colorado state representative
46.6 Stephen King (soccer), American Major League Soccer player
46.7 Steve King (American football), American football player
46.8 Steve King (football manager), English football manager
46.9 Steven King (footballer), Australian football player
46.10 Steven King (ice hockey), American ice hockey player, born 1969
46.11 Steve King (ice hockey), Canadian ice hockey player, born 1948
46.12 Stephen King (paedophile), British convict
47.1 Soul food is a type of cuisine.
47.2 Soul Food (film)
47.3 Soul Food (TV series)
47.4 Soul Food (Def Jef album)
47.5 Soul Food (Goodie Mob album),
47.6 Soul Food (The Oblivians album)
48.1 "Cool Water" is a song written in 1936 by Bob Nolan
48.2 Cool Water is a famous perfume introduced in 1988 by Davidoff
48.3 Cool Water is the 1994 album released by Caravan.
id description
45 the_block
46 stephen_king
47 soul_food
48 cool_water