Absinth - A Small World of Semantic Similarity
Commit 6ba4aad4, authored 7 years ago by Victor Zimmermann
Spelling and lemmatisation.
parent 87812127
src/absinth.py: 13 additions, 13 deletions
@@ -168,7 +168,7 @@ def process_file(context_list, target_string, node_freq_dict, edge_freq_dict):
             # Add only tokens with allowed tags to nodes.
             elif token.tag_ in allowed_tag_list:
-                token_set.add(token.text)
+                token_set.add(token.lemma_)
         context_size = len(token_set)
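To illustrate what this hunk changes, here is a minimal sketch that uses a hypothetical stand-in `Token` tuple instead of real spaCy tokens. The attribute names `text`, `lemma_`, and `tag_` mirror the diff; the sample tokens, the tag list contents, and the helper `collect` are assumptions made for illustration. Collecting lemmas instead of surface forms folds inflected variants of a word into a single node candidate.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token; only the attributes used in the diff.
Token = namedtuple("Token", ["text", "lemma_", "tag_"])

# Assumed sample context and tag list, for illustration only.
context = [
    Token("graphs", "graph", "NNS"),
    Token("graph", "graph", "NN"),
    Token("built", "build", "VBN"),
    Token("the", "the", "DT"),
]
allowed_tag_list = ["NN", "NNS", "VBN"]


def collect(tokens, attribute):
    """Return the set of `attribute` values for tokens with an allowed tag."""
    return {getattr(t, attribute) for t in tokens if t.tag_ in allowed_tag_list}


surface_forms = collect(context, "text")  # behaviour before the commit
lemmas = collect(context, "lemma_")       # behaviour after the commit

print(sorted(surface_forms))
print(sorted(lemmas))
```

In this toy context, "graphs" and "graph" count as two distinct entries before the change and as one afterwards, which also makes `context_size` smaller.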
@@ -209,20 +209,20 @@ def build_graph(node_freq_dict, edge_freq_dict):
             tokens within every context the target occurs in.
     Returns:
-        cooccurence_graph: Filtered undirected dice weighted small word
-            cooccurence graph for a given target entity.
+        cooccurrence_graph: Filtered undirected dice weighted small word
+            cooccurrence graph for a given target entity.
     """
     min_node_freq = config.min_node_freq
     min_edge_freq = config.min_edge_freq
     max_weight = config.max_weight
-    cooccurence_graph = nx.Graph()
+    cooccurrence_graph = nx.Graph()
     for node, frequency in node_freq_dict.items():
         if frequency >= min_node_freq:
-            cooccurence_graph.add_node(node)
+            cooccurrence_graph.add_node(node)
     for node_tuple, frequency in edge_freq_dict.items():
@@ -230,11 +230,11 @@ def build_graph(node_freq_dict, edge_freq_dict):
             continue
-        elif node_tuple[0] not in cooccurence_graph.nodes:
+        elif node_tuple[0] not in cooccurrence_graph.nodes:
             continue
-        elif node_tuple[1] not in cooccurence_graph.nodes:
+        elif node_tuple[1] not in cooccurrence_graph.nodes:
             continue
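The filtering that `build_graph` performs in the two hunks above can be sketched without networkx, using a plain set of nodes and a dict of edges. This is only an illustration: the function name `filter_graph`, the sample frequencies, and the thresholds are hypothetical, and the first branch (skipping edges below `min_edge_freq`) is an assumption, because that condition is not visible in the diff.

```python
def filter_graph(node_freq_dict, edge_freq_dict, min_node_freq, min_edge_freq):
    """Return (nodes, edges) that pass the frequency filters.

    Pure-Python stand-in for the networkx graph used in the diff.
    """
    # Keep only nodes that occur often enough.
    nodes = {n for n, freq in node_freq_dict.items() if freq >= min_node_freq}

    edges = {}
    for node_tuple, freq in edge_freq_dict.items():
        # Assumption: the elided first branch skips rare edges.
        if freq < min_edge_freq:
            continue
        # Skip edges whose endpoints were filtered out above.
        elif node_tuple[0] not in nodes:
            continue
        elif node_tuple[1] not in nodes:
            continue
        edges[node_tuple] = freq
    return nodes, edges


# Assumed sample data, for illustration only.
node_freq = {"graph": 10, "node": 8, "rare": 1}
edge_freq = {("graph", "node"): 5, ("graph", "rare"): 4, ("node", "rare"): 1}

nodes, edges = filter_graph(node_freq, edge_freq, min_node_freq=5, min_edge_freq=2)
print(nodes, edges)
```

Here the edge between "graph" and "rare" is frequent enough on its own, but it is still dropped because "rare" never made it into the node set.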
@@ -247,25 +247,25 @@ def build_graph(node_freq_dict, edge_freq_dict):
         prob_0 = cooccurrence_frequency / node0_frequency
         prob_1 = cooccurrence_frequency / node1_frequency
-        #best_weight = 1 - max(prob_0, prob_1)
-        dice_weight = 1 - ((prob_0 + prob_1) / 2)
+        best_weight = 1 - max(prob_0, prob_1)
+        #dice_weight = 1 - ((prob_0 + prob_1) / 2)
-        if dice_weight <= max_weight:
+        if best_weight <= max_weight:
-            cooccurence_graph.add_edge(*node_tuple, weight=dice_weight)
+            cooccurrence_graph.add_edge(*node_tuple, weight=best_weight)
         else:
             pass
-    return cooccurence_graph
+    return cooccurrence_graph
 def root_hubs(graph, edge_freq_dict):
     """Identifies senses (root hubs) by choosing nodes with high degrees
     Selects root hubs according to the algorithm in Véronis (2004). Nodes with
-    high degree and neighbors with low weights (high cooccurence) are chosen
+    high degree and neighbors with low weights (high cooccurrence) are chosen
     until there are no more viable candidates. A root hub candidate is every
     node that is not already a hub and is not a neighbor of one.
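Besides the spelling fixes, the last hunk switches the active edge weight from `dice_weight` to `best_weight`. The sketch below compares the two formulas exactly as they appear in the diff. The wrapper function `edge_weights`, the sample frequencies, and the `max_weight` threshold of 0.5 are assumptions chosen for illustration, not values from the project's configuration.

```python
def edge_weights(cooccurrence_frequency, node0_frequency, node1_frequency):
    """Return (best_weight, dice_weight) as computed in the diff."""
    prob_0 = cooccurrence_frequency / node0_frequency
    prob_1 = cooccurrence_frequency / node1_frequency

    # Weight used after the commit: driven by the larger conditional probability.
    best_weight = 1 - max(prob_0, prob_1)
    # Weight used before the commit: driven by the mean of both probabilities.
    dice_weight = 1 - ((prob_0 + prob_1) / 2)
    return best_weight, dice_weight


# Assumed sample: a rare word (10 occurrences) that co-occurs 8 times
# with a frequent word (100 occurrences).
best, dice = edge_weights(8, 10, 100)

max_weight = 0.5  # hypothetical threshold
print("kept with best_weight:", best <= max_weight)
print("kept with dice_weight:", dice <= max_weight)
```

Because the maximum of two numbers is never smaller than their mean, `best_weight` is never larger than `dice_weight`. With the same `max_weight`, the new formula therefore keeps every edge the old one kept, plus edges between a rare and a frequent word such as the one in this sample (roughly 0.2 versus 0.56).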