Commit ef77038e authored by engel

Add new readme info

parent 768f410b
# Lsem-RC in nominal compounds
# Lexical Semantic Relation Classification for nominal compounds (Lsem-RC-NC) 🤖
A system is to be trained on noun compounds of the form NC = noun1 noun2 together with paraphrases
that describe the relation between w1 and w2. We then examine
whether semantic relations between the components of noun compounds
have been learned and can be reproduced. To this end, we test to what extent the components
masked in the paraphrases (the verbs) can be filled in automatically.
## Introduction 👋🏼
This repository is part of our project for the course `Formale Semantik` at Heidelberg University. The project task can be summarized as classifying the lexical semantic relations between the components of nominal compounds. The Project Report offers detailed insight into the project and its outcomes.
## Task 📝
A system is trained on noun compounds of the form NC = noun1 noun2 and on paraphrases describing the relation between noun1 and noun2. We then test whether the semantic relations between the two components of a noun compound, head and modifier, have been learned and can be reproduced. To this end, we examine to what extent the components masked in the paraphrases (the verbs) can be completed by a machine, and how well a fine-tuned model can predict the relation for a nominal compound occurring in a sentence.
## Prerequisites 🗂
To run the code in this repository, certain prerequisites need to be met. The required packages and modules are listed in the requirements file and can be installed with the following command:
```
pip install -r requirements.txt
```
## Structure of this Repository 🧶
| subdirectory | content | README |
| ---- | ---- | ---- |
| data | contains all data needed for probing and fine-tuning | [README]() |
| documents | contains the first plan for our [**Project Outline**]() and the final [**Project Report**](documents/Gruppe_9__NC-RC_-_Outline.pdf) | |
| fine_tuning | contains code to fine-tune models, the fine-tuned models, test results and evaluation | [README](fine_tuning/README.md) |
| probing | contains code for probing and its evaluation | [README](probing/README.md) |
## Authors & Contact 📮
Feel free to reach out to us! 💥 <br>
The ELSE-Team consists of
[**E**ike Burkhardt](mailto:eike.burkhardt@gmx.net), [Janine **L**üger](mailto:lueger@stud.uni-heidelberg.de), [**S**tella Wernicke](mailto:stella.wernicke@stud.uni-heidelberg.de) and [Katharina **E**ngel](mailto:a-katharina.engel@stud.uni-heidelberg.de).
\ No newline at end of file
@@ -73,13 +73,13 @@ Visualized results for fine grained relations:
</div></p>
For each batch size, a similar behavior can be seen. The training loss is highest after the first epoch of training and drops below 0.1 in every scenario after the second epoch. It decreases further with more epochs, whereas the validation loss is quite high after one epoch, stays at roughly the same level between the first and second epoch, and increases after that. This is a sign of overfitting, which means that our trained model fits the training data too closely. In our case the increase in validation loss is not drastic, but it shows that two epochs are sufficient for training our model. <br>
The accuracy the model achieves stays more or less the same over all epochs and for every batch size. <br>
The accuracy the model achieves stays more or less the same across all epochs, learning rates and batch sizes. <br>
Comparing the three plots shows that the results for the different learning rates are very close to one another, and they also vary with every execution of the training process. As a trade-off between economy (learning rate not too small) and performance (high accuracy), we decided on a **batch size of 32** and a **learning rate of 3e-05**, which led to a validation loss of around 1.89 and an accuracy of 0.66. <br>
The same goes for the coarse grained relations. A comparable behavior can be seen in the plots below.
<p><div align="center">
![](plots/model_sep_coarse_50_train_set_16.png)
![](plots/model_sep_coarse_50_train_set_32.png)
</div></p>
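The overfitting pattern described above (training loss still falling while validation loss starts to rise) can also be checked programmatically. The following is a minimal sketch, not the code used in this repository: the function names `best_epoch` and `overfits_after` are hypothetical, and the loss values are made-up numbers that only mimic the shape of the curves.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    return min(range(len(val_losses)), key=lambda i: val_losses[i]) + 1


def overfits_after(train_losses, val_losses):
    """Return the last 1-based epoch before training loss keeps falling
    while validation loss rises, or None if that never happens."""
    for i in range(1, len(val_losses)):
        train_falls = train_losses[i] < train_losses[i - 1]
        val_rises = val_losses[i] > val_losses[i - 1]
        if train_falls and val_rises:
            return i  # epoch i (1-based) precedes the divergence
    return None


# Hypothetical per-epoch losses, for illustration only.
train = [0.90, 0.08, 0.04, 0.02]
val = [1.90, 1.89, 1.95, 2.03]

print(best_epoch(val))             # 2
print(overfits_after(train, val))  # 2
```

With curves of this shape, both checks point to stopping after the second epoch.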
@@ -96,11 +96,11 @@ complement | book title | A {n1} {n2} is a {n2} that {mask} the {n1} | character
containment | cigarette pack | A {n1} {n2} is a {n2} that {mask} of {n1}s | consists
loc_part_whole | hotel lobby | A {n1} {n2} is a {n2} that is {mask} in {n1} | located
objective | pipeline explosion | A {n1} {n2} is a {n2} that {mask} on {n1} | focusses
other | fund system |
other | fund system | ?? | ??
owner_emp_use | navy commander | A {n1} {n2} is a {n2} that {mask} by a {n1} | owned
purpose | alarm clock | A {n1} {n2} is a {n2} that is {mask} for {n1} | used
time | night work | A {n1} {n2} is a {n2} that {mask} place at {n1} | takes
topical | student database |
topical | student database | A {n1} {n2} is a {n2} that {mask} {n1} | regards
</div>
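How a template from the table turns into a probing sentence can be sketched as follows. This is an illustrative assumption about the mechanics, not the repository's own code: the helper `fill_template` is hypothetical, and `[MASK]` is assumed as the mask token (BERT-style models use it, other models use a different one).

```python
MASK_TOKEN = "[MASK]"  # assumption: BERT-style mask token


def fill_template(template, compound, mask_token=MASK_TOKEN):
    """Insert modifier (n1), head (n2) and the mask token into a template."""
    parts = compound.split()
    if len(parts) != 2:
        raise ValueError("expected a compound of the form 'noun1 noun2'")
    n1, n2 = parts
    return template.format(n1=n1, n2=n2, mask=mask_token)


# Template and example taken from the 'containment' row above.
template = "A {n1} {n2} is a {n2} that {mask} of {n1}s"
print(fill_template(template, "cigarette pack"))
# A cigarette pack is a pack that [MASK] of cigarettes
```

The gold verb for this row (`consists`) is the word a model would be expected to predict at the masked position.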
@@ -116,39 +116,39 @@ Relation | Example | Template | Gold Verb
ADJ-LIKE_NOUN | mass destruction | A {n1} {n2} is a {n2} that is {mask} as a {n1} | considered
AMOUNT-OF | credit volume | A {n1} {n2} is a {n2} that {mask} {n1} | quantifies
CONTAIN | photo album | A {n1} {n2} is a {n2} that {mask} a {n1} | contains
CREATE-PROVIDE-GENERATE-SELL | drug office |
CREATE-PROVIDE-GENERATE-SELL | drug office | A {n1} {n2} is a {n2} that {mask} {n1} | creates
CREATOR-PROVIDER-CAUSE_OF | hate crime | A {n1} {n2} is a {n2} that is {mask} by {n1} | caused
EMPLOYER | government agent | A {n1} {n2} is a {n2} that is {mask} by the {n1} | employed
EQUATIVE | consultation process | A {n1} {n2} is a {n2} that {mask} for a {n1} | is
EXPERIENCER-OF-EXPERIENCE | consumer comfort | A {n1} {n2} is a {n2} that is {mask} by a {n1} | experienced
JUSTIFICATION | conspiracy trial |
JUSTIFICATION | conspiracy trial | A {n1} {n2} is a {n2} that is {mask} by a {n1} | justified
LEXICALIZED | pay phone | A {n1} {n2} is a {n2} that {mask} for {n1} | is
LOCATION | eye makeup | A {n1} {n2} is a {n2} that is {mask} in a {n1} | located
MEANS | hunger strike |
MEANS | hunger strike | A {n1} {n2} is a {n2} that {mask} {n1} | uses
MEASURE | mile track | A {n1} {n2} is a {n2} that {mask} over the {n1} | extends
MITIGATE&OPPOSE | crime unit | A {n1} {n2} is a {n2} that is used to {mask} the {n1} | mitigate
OBJECTIVE | poverty reduction |
OBJECTIVE | poverty reduction | ?? | ??
OBTAIN&ACCESS&SEEK | ownership right | A {n1} {n2} is a {n2} that has the function to {mask} a {n1} | achieve
ORGANIZE&SUPERVISE&AUTHORITY | government authority |
ORGANIZE&SUPERVISE&AUTHORITY | government authority | A {n1} {n2} is a {n2} that {mask} {n1} | supervises
OTHER | sports organization | A {n1} {n2} is a {n2} that is {mask} to {n1} | related
OWNER-USER | government property
PARTIAL_ATTRIBUTE_TRANSFER | mushroom lamp
OWNER-USER | government property | A {n1} {n2} is a {n2} that is {mask} by a {n1} | owned
PARTIAL_ATTRIBUTE_TRANSFER | mushroom lamp | A {n1} {n2} is a {n2} that {mask} a {n1} | resembles
PART&MEMBER_OF_COLLECTION&CONFIG&SERIES | citizen committee | A {n1} {n2} is a {n2} that {mask} of {n1}s | consists
PERFORM&ENGAGE_IN | fax machine
PERSONAL_NAME | Josh Gorbachev | -
PERSONAL_TITLE | Ms. Patterson | -
PERFORM&ENGAGE_IN | fax machine | A {n1} {n2} is a {n2} that {mask} a {n1} | performs
PERSONAL_NAME | Josh Gorbachev | -- | --
PERSONAL_TITLE | Ms. Patterson | -- | --
PURPOSE | court system | A {n1} {n2} is a {n2} that is {mask} for {n1} | used
RELATIONAL-NOUN-COMPLEMENT | eye color | A {n1} {n2} is a {n2} that is {mask} to a {n1} | related
SUBJECT | family violence |
SUBJECT | family violence | A {n1} {n2} is a {n2} that is {mask} by a {n1} | conducted
SUBSTANCE-MATERIAL-INGREDIENT | aluminum can | A {n1} {n2} is a {n2} that is {mask} of {n1} | made
TIME-OF1 | advance sale | A {n1} {n2} is a {n2} that {mask} during a {n1} | happens
TIME-OF2 | watermelon season |
TOPIC | soundtrack album |
TOPIC_OF_COGNITION&EMOTION | phone fanatic |
TIME-OF2 | watermelon season | A {n1} {n2} is a {n2} that {mask} the {n1} | dates
TOPIC | soundtrack album | A {n1} {n2} is a {n2} that {mask} with {n1} | deals
TOPIC_OF_COGNITION&EMOTION | phone fanatic | A {n1} {n2} is a {n2} that {mask} to the {n1} | relates
TOPIC_OF_EXPERT | technology expert | A {n1} {n2} is a {n2} that {mask} with {n1} | deals
USER_RECIPIENT | employee benefit | A {n1} {n2} is a {n2} that is {mask} for a {n1} | provided
VARIETY&GENUS_OF | grape variety | A {n1} {n2} is a {n2} that {mask} down {n1} | breaks
WHOLE+ATTRIBUTE&FEATURE& QUALITY_VALUE_IS_CHARACTERISTIC_OF | butter texture |
WHOLE+ATTRIBUTE&FEATURE& QUALITY_VALUE_IS_CHARACTERISTIC_OF | butter texture | A {n1} {n2} is a {n2} that {mask} {n1} | characterizes
WHOLE+PART_OR_MEMBER_OF | whale tongue | A {n1} {n2} is a {n2} that {mask} part of a {n1} | is
</div>
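One way to score a model against the gold verbs in the table is to check whether the gold verb appears among its top-k predictions for the masked position. The sketch below assumes this top-k criterion for illustration; it is not necessarily the metric used in this project. The function names are hypothetical and the prediction lists are invented.

```python
def gold_in_top_k(predictions, gold, k=5):
    """True if the gold verb is among the first k predicted tokens."""
    return gold.lower() in [p.lower() for p in predictions[:k]]


def accuracy_at_k(examples, k=5):
    """Share of examples whose gold verb is among the top-k predictions."""
    if not examples:
        raise ValueError("examples must not be empty")
    hits = sum(gold_in_top_k(preds, gold, k) for preds, gold in examples)
    return hits / len(examples)


# (ranked predictions, gold verb); the predictions are invented.
examples = [
    (["holds", "contains", "has"], "contains"),   # photo album
    (["is", "was", "made"], "made"),              # aluminum can
    (["works", "acts", "serves"], "employed"),    # government agent
]

print(accuracy_at_k(examples, k=1))  # 0.0
print(accuracy_at_k(examples, k=3))  # two of three gold verbs are found
```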
transformers
gensim
tensorflow
tqdm
seaborn
\ No newline at end of file