Add new readme info

Commit ef77038e authored 3 years ago by engel
--- a/README.md
+++ b/README.md
-# Lsem-RC in nominal compounds
+# Lexical Semantic Relation Classification for nominal compounds (Lsem-RC-NC) 🤖

-A system is to be trained with noun compounds of the form NC = noun1 noun2 and paraphrases
-that describe the relation between w1 and w2. The goal is to check
-whether semantic relations between the components of noun compounds
-were learned and can be reproduced. To this end, we test to what extent the components masked in the
-paraphrases - the verbs - can be completed by a machine.
+
+## Introduction 👋🏼
+This repository is part of our project for the course `Formale Semantik` at Heidelberg University. The project task can be summarized as the classification of lexical semantic relations between the components of nominal compounds. The Project Report offers detailed insight into the project and its outcomes.
+## Task 📝
+A system is trained on noun compounds of the form NC = noun1 noun2 together with paraphrases describing the relation between noun1 and noun2. We then test whether the semantic relations between the two components of a noun compound, head and modifier, have been learned and can be reproduced. For this purpose, we test to what extent the components masked in the paraphrases (the verbs) can be completed by a machine, and how well a fine-tuned model can predict the relation for a nominal compound occurring in a sentence.
+## Prerequisites 🗂
+To run the code in this repository, certain prerequisites need to be met. The required packages and modules are listed in  . The requirements can be installed using the following command:
+```
+pip install -r requirements.txt
+```
+## Structure of this Repository 🧶
+| subdirectory | content | README |
+| ---- | ---- | ---- |
+| data | contains all data needed for probing and fine-tuning | [README]() |
+| documents | contains the first plan for our [**Project Outline**]() and the final [**Project Report**](documents/Gruppe_9__NC-RC_-_Outline.pdf) | |
+| fine_tuning | contains code to fine-tune models, the fine-tuned models, test results and evaluation | [README](fine_tuning/README.md) |
+| probing | contains code for probing and its evaluation | [README](probing/README.md) |
+
+
+
+## Authors & Contact 📮
+Feel free to reach out to us! 💥 <br>
+The ELSE-Team consists of 
+[**E**ike Burkhardt](mailto:eike.burkhardt@gmx.net), [Janine **L**üger](mailto:lueger@stud.uni-heidelberg.de), [**S**tella Wernicke](mailto:stella.wernicke@stud.uni-heidelberg.de) and [Katharina **E**ngel](mailto:a-katharina.engel@stud.uni-heidelberg.de).
\ No newline at end of file
--- a/Gruppe_9__NC-RC_-_Outline.pdf
+++ b/Gruppe_9__NC-RC_-_Outline.pdf
--- a/fine_tuning/README.md
+++ b/fine_tuning/README.md
@@ -73,13 +73,13 @@ Visualized results for fine grained relations:
 </div></p>

 For each batch size, a similar behavior can be seen. The training loss is highest after the first epoch and drops below 0.1 in every scenario after the second epoch. It decreases further with more epochs, whereas the validation loss is quite high after one epoch, stays at roughly the same level between the first and second epoch, and increases after that. This is a sign of overfitting, meaning that our trained model fits the training data too closely. In our case the increase in validation loss is not drastic, but it shows that two epochs are sufficient for training our model. <br>
-The accuracy the model achieves stays more or less the same over all epochs and for every batch size. <br>
+The accuracy the model achieves stays more or less the same across all epochs, learning rates and batch sizes. <br>
 If the three plots are compared, one can see that the results for the different learning rates are very close to one another and that they also vary with every execution of the training process. As a trade-off between economy (learning rate not too small) and performance (high accuracy), we decided on a **batch size of 32** and a **learning rate of 3e-05**, which lead to a validation loss of around 1.89 and an accuracy of 0.66. <br>
 The same goes for the coarse grained relations. A comparable behavior can be seen in the plot below.

 <p><div align="center">

-![](plots/model_sep_coarse_50_train_set_16.png)
+![](plots/model_sep_coarse_50_train_set_32.png)

 </div></p>
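
The overfitting check described above (training loss keeps falling while validation loss starts to rise) can be sketched in a few lines. This is only an illustration: the function names `best_epoch` and `diverging_epochs` are hypothetical, and the loss values in the example are made up to mimic the described shape, not taken from the project's runs.

```python
def best_epoch(val_losses: list[float]) -> int:
    """Return the 1-based epoch with the lowest validation loss."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1


def diverging_epochs(train_losses: list[float], val_losses: list[float]) -> list[int]:
    """Return the 1-based epochs in which the training loss fell while the
    validation loss rose compared to the previous epoch."""
    if len(train_losses) != len(val_losses):
        raise ValueError("both loss lists must have the same length")
    return [
        i + 1
        for i in range(1, len(val_losses))
        if train_losses[i] < train_losses[i - 1] and val_losses[i] > val_losses[i - 1]
    ]


# Illustrative values only (NOT measured results from this repository).
train = [0.90, 0.08, 0.03, 0.01]
val = [1.90, 1.89, 2.05, 2.20]

print(best_epoch(val))               # epoch with the lowest validation loss
print(diverging_epochs(train, val))  # epochs that show the overfitting pattern
```

With values shaped like these, the lowest validation loss falls on the second epoch, which matches the reasoning for stopping training after two epochs.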


--- a/probing/README.md
+++ b/probing/README.md
@@ -96,11 +96,11 @@ complement | book title | A {n1} {n2} is a {n2} that {mask} the {n1} | character
 containment | cigarette pack | A {n1} {n2} is a {n2} that {mask} of {n1}s | consists
 loc_part_whole | hotel lobby | A {n1} {n2} is a {n2} that is {mask} in {n1} | located
 objective | pipeline explosion | A {n1} {n2} is a {n2} that {mask} on {n1} | focusses
-other | fund system |
+other | fund system | ?? | ??
 owner_emp_use | navy commander | A {n1} {n2} is a {n2} that {mask} by a {n1} | owned
 purpose | alarm clock | A {n1} {n2} is a {n2} that is {mask} for {n1} | used
 time | night work | A {n1} {n2} is a {n2} that {mask} place at {n1} | takes
-topical | student database |
+topical | student database | A {n1} {n2} is a {n2} that {mask} {n1} | regards

 </div>

@@ -116,39 +116,39 @@ Relation | Example | Template | Gold Verb
 ADJ-LIKE_NOUN | mass destruction | A {n1} {n2} is a {n2} that is {mask} as a {n1} | considered
 AMOUNT-OF | credit volume | A {n1} {n2} is a {n2} that {mask} {n1} | quantifies
 CONTAIN | photo album | A {n1} {n2} is a {n2} that {mask} a {n1} | contains
-CREATE-PROVIDE-GENERATE-SELL | drug office |
+CREATE-PROVIDE-GENERATE-SELL | drug office | A {n1} {n2} is a {n2} that {mask} {n1} | creates
 CREATOR-PROVIDER-CAUSE_OF | hate crime | A {n1} {n2} is a {n2} that is {mask} by {n1} | caused
 EMPLOYER | government agent | A {n1} {n2} is a {n2} that is {mask} by the {n1} | employed
 EQUATIVE | consultation process | A {n1} {n2} is a {n2} that {mask} for a {n1} | is
 EXPERIENCER-OF-EXPERIENCE | consumer comfort | A {n1} {n2} is a {n2} that is {mask} by a {n1} | experienced
-JUSTIFICATION | conspiracy trial | 
+JUSTIFICATION | conspiracy trial | A {n1} {n2} is a {n2} that is {mask} by a {n1} | justified
 LEXICALIZED | pay phone | A {n1} {n2} is a {n2} that {mask} for {n1} | is
 LOCATION | eye makeup | A {n1} {n2} is a {n2} that is {mask} in a {n1} | located
-MEANS | hunger strike | 
+MEANS | hunger strike | A {n1} {n2} is a {n2} that {mask} {n1} | uses
 MEASURE | mile track | A {n1} {n2} is a {n2} that {mask} over the {n1} | extends
 MITIGATE&OPPOSE | crime unit | A {n1} {n2} is a {n2} that is used to {mask} the {n1} | mitigate
-OBJECTIVE | poverty reduction | 
+OBJECTIVE | poverty reduction | ?? | ??
 OBTAIN&ACCESS&SEEK | ownership right | A {n1} {n2} is a {n2} that has the function to {mask} a {n1} | achieve
-ORGANIZE&SUPERVISE&AUTHORITY | government authority | 
+ORGANIZE&SUPERVISE&AUTHORITY | government authority | A {n1} {n2} is a {n2} that {mask} {n1} | supervises
 OTHER | sports organization | A {n1} {n2} is a {n2} that is {mask} to {n1} | related
-OWNER-USER | government property
-PARTIAL_ATTRIBUTE_TRANSFER | mushroom lamp
+OWNER-USER | government property | A {n1} {n2} is a {n2} that is {mask} by a {n1} | owned
+PARTIAL_ATTRIBUTE_TRANSFER | mushroom lamp | A {n1} {n2} is a {n2} that {mask} a {n1} | resembles
 PART&MEMBER_OF_COLLECTION&CONFIG&SERIES | citizen committee | A {n1} {n2} is a {n2} that {mask} of {n1}s | consists
-PERFORM&ENGAGE_IN | fax machine
-PERSONAL_NAME | Josh Gorbachev | -
-PERSONAL_TITLE | Ms. Patterson | -
+PERFORM&ENGAGE_IN | fax machine | A {n1} {n2} is a {n2} that {mask} a {n1} | performs
+PERSONAL_NAME | Josh Gorbachev | -- | --
+PERSONAL_TITLE | Ms. Patterson | -- | --
 PURPOSE | court system | A {n1} {n2} is a {n2} that is {mask} for {n1} | used
 RELATIONAL-NOUN-COMPLEMENT | eye color | A {n1} {n2} is a {n2} that is {mask} to a {n1} | related
-SUBJECT | family violence | 
+SUBJECT | family violence | A {n1} {n2} is a {n2} that is {mask} by a {n1} | conducted
 SUBSTANCE-MATERIAL-INGREDIENT | aluminum can | A {n1} {n2} is a {n2} that is {mask} of {n1} | made
 TIME-OF1 | advance sale | A {n1} {n2} is a {n2} that {mask} during a {n1} | happens
-TIME-OF2 | watermelon season | 
-TOPIC | soundtrack album |
-TOPIC_OF_COGNITION&EMOTION | phone fanatic |
+TIME-OF2 | watermelon season | A {n1} {n2} is a {n2} that {mask} the {n1} | dates
+TOPIC | soundtrack album | A {n1} {n2} is a {n2} that {mask} with {n1} | deals
+TOPIC_OF_COGNITION&EMOTION | phone fanatic | A {n1} {n2} is a {n2} that {mask} to the {n1} | relates
 TOPIC_OF_EXPERT | technology expert | A {n1} {n2} is a {n2} that {mask} with {n1} | deals
 USER_RECIPIENT | employee benefit | A {n1} {n2} is a {n2} that is {mask} for a {n1} | provided
 VARIETY&GENUS_OF | grape variety | A {n1} {n2} is a {n2} that {mask} down {n1} | breaks
-WHOLE+ATTRIBUTE&FEATURE& QUALITY_VALUE_IS_CHARACTERISTIC_OF | butter texture |
+WHOLE+ATTRIBUTE&FEATURE& QUALITY_VALUE_IS_CHARACTERISTIC_OF | butter texture | A {n1} {n2} is a {n2} that {mask} {n1} | characterizes
 WHOLE+PART_OR_MEMBER_OF | whale tongue | A {n1} {n2} is a {n2} that {mask} part of a {n1} | is 

 </div>
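
To make the tables above more concrete, the following minimal sketch shows how one template row could be turned into a masked sentence and how predicted fillers could be compared with the gold verb. It assumes a BERT-style mask token `[MASK]` and compounds consisting of exactly two whitespace-separated nouns; the helper names `fill_template` and `is_hit` are hypothetical and not taken from the probing code in this repository.

```python
def fill_template(template: str, compound: str, mask_token: str = "[MASK]") -> str:
    """Fill a probing template with the two nouns of a compound.

    Assumes the compound has exactly two whitespace-separated parts
    (modifier n1, head n2) and that `mask_token` suits the model in use.
    """
    n1, n2 = compound.split()
    return template.format(n1=n1, n2=n2, mask=mask_token)


def is_hit(predictions: list[str], gold_verb: str) -> bool:
    """Return True if the gold verb is among the predicted fillers
    (case-insensitive, ignoring surrounding whitespace)."""
    return gold_verb.lower() in (p.strip().lower() for p in predictions)


# Row "loc_part_whole | hotel lobby | ... | located" from the table above.
template = "A {n1} {n2} is a {n2} that is {mask} in {n1}"
sentence = fill_template(template, "hotel lobby")
print(sentence)  # A hotel lobby is a lobby that is [MASK] in hotel

# The predictions here are placeholders; in practice they would come from a
# masked language model queried with `sentence`.
print(is_hit(["found", "Located"], "located"))
```

The filled sentence could then be passed to a masked language model, and `is_hit` applied to its top-k predictions to score each relation.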

--- a/requirements.txt
+++ b/requirements.txt
+transformers
+gensim
+tensorflow
+tqdm
+seaborn
\ No newline at end of file