Commit 6dd4860e authored by kulcsar
Merge commit with parents 0cf0f9cc and 3b633885
@@ -110,6 +110,7 @@ def train(model, name,train_dataset, test_dataset, seed, batch_size, test_batch_
             'mixepoch': True,
             'mixlayer': mixlayer,
             'lambda_value': lambda_value}
+        labels=batch[5]
         if name[0] == "r":
......
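For context, the `mixepoch`, `mixlayer`, and `lambda_value` entries in this hunk configure interpolation-based augmentation. Below is a minimal sketch of the MixUp step these settings presumably drive; the function and variable names are illustrative stand-ins, not this repo's actual API:

```python
import torch

def mixup(hidden, labels, lam):
    """MixUp sketch: interpolate each example with a randomly paired one.

    `lam` plays the role of --lambda_value; `hidden` and `labels` are
    hypothetical stand-ins for a batch of encoder outputs and float
    one-hot label vectors.
    """
    perm = torch.randperm(hidden.size(0))                    # random pairing within the batch
    mixed_hidden = lam * hidden + (1 - lam) * hidden[perm]   # convex combination of inputs
    mixed_labels = lam * labels + (1 - lam) * labels[perm]   # labels mixed with the same weight
    return mixed_hidden, mixed_labels
```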
@@ -27,7 +27,7 @@ This README gives a rough overview of the project. The full documentation and ad
 - 📝 [Research Plan](documentation/organization/research_plan.pdf)
 - 🧭 [Specification Presentation](documentation/organization/specification_presentation.pdf)
-- 📖 [Project Report](LINK) ---------> ADD
+- 📖 [Project Report](documentation/organization/project_report.pdf)
 - 🎤 [Final Presentation](documentation/organization/final_presentation.pdf)
 ***
@@ -185,9 +185,9 @@ For `<COMMAND>` you must enter one of the commands you find in the list below, w
 | Command | Functionality | Arguments |
 | ------- | ------------- |-----------|
 | <center> **General** </center>|
-|🔛 **`--architecture`** | Defines which model is used. | Choose `bert-base-uncased` or `roberta` |
-|🔛 **`--model_type`** | How to initialize the Classification Model | Choose `separate` or `one` |
-|🔛 **`--tokenizer`**|Which tokenizer to use when preprocessing the datasets.|Choose `swp` for our tokenizer, `li ` for the tokenizer of Li et al. [^6], or `salami` for the tokenizer used by another [student project](https://gitlab.cl.uni-heidelberg.de/salami-hd/salami/-/tree/master/)|
+|🔛 **`--architecture`** | Defines which model is used. | Choose `bert-base-uncased` or `roberta`.|
+|🔛 **`--model_type`** | How to initialize the classification model. | Choose `separate` or `one`. ⚠️ *TMix* only works with `one`.|
+|🔛 **`--tokenizer`**|Which tokenizer to use when preprocessing the datasets.|Choose `swp` for our tokenizer, `li` for the tokenizer of Li et al. [^6], or `salami` for the tokenizer used by another [student project](https://gitlab.cl.uni-heidelberg.de/salami-hd/salami/-/tree/master/).|
 |**`-tc`**/**`--tcontext`**|Whether or not to preprocess the training set with context.||
 |**`-vc`**/**`--vcontext`**|Whether or not to preprocess the test set with context.||
 |🔛 **`-max`**/**`--max_length`**|Defines the maximum sequence length when tokenizing the sentences.|Typically choose $256$ or $512$.|
@@ -200,17 +200,17 @@ For `<COMMAND>` you must enter one of the commands you find in the list below, w
 |**`-sd`**/**`--save_directory`**|This option specifies the destination directory for the output results of the run.||
 |**`-msp`**/**`--model_save_path`**|This option specifies the destination directory for saving the model.|We recommend saving models in [Code/saved_models](Code/saved_models).|
 |**`--masking`**|Whether or not to mask the target word.||
-|🌐 **`--mixlayer`**| Specify in which `layer` the interpolation takes place. Only select one layer at a time. | Choose from ${0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}$ |
-|🍸, 🌐 **`-lambda`**/**`--lambda_value`**|Speficies the lambda value for interpolation of *MixUp* and *TMix*|Choose any value between $0$ and $1$, `type=float`|
+|🌐 **`--mixlayer`**| Specifies in which layer the interpolation takes place (see the sketch after this table). Only select one layer at a time. | Choose from $\{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11\}$.|
+|🍸, 🌐 **`-lambda`**/**`--lambda_value`**|Specifies the lambda value for the *MixUp* and *TMix* interpolation.|Choose any value between $0$ and $1$, `type=float`.|
 | <center> **MixUp** specific </center>|
 |🍸 **`-mixup`**/**`--mix_up`**| Whether or not to use *MixUp*. If yes, please specify `-lambda` and `-mixepoch`.| |
-|🍸 **`-mixepoch`**/**`--mixepoch`**|Specifies the epoch(s) in which to apply *MixUp*.|Default is `None`|
+|🍸 **`-mixepoch`**/**`--mixepoch`**|Specifies the epoch(s) in which to apply *MixUp*.|Default is `None`.|
 | <center> **TMix** specific </center>|
-|🌐 **`--tmix`**| Whether or not to use *TMix*. If yes, please specify `-mixlayer` and `-lambda`| |
+|🌐 **`--tmix`**| Whether or not to use *TMix*. If yes, please specify `-mixlayer` and `-lambda`.| |
 | <center> **Datasets** specific </center>|
-|🔛 **`-t`**/**`"--train_dataset`**|Defines which dataset is chosen for training.|Choose any of the datasets from [original_datasets](data/original_datasets), [fused_datasets](data/fused_datasets) or [paraphrases](data/paraphrases)|
-|🔛 **`-v`**/**`--test_dataset`**|Defines which dataset is chosen for testing.|Choose from ["semeval_test.txt"](data/original_datasets/semeval_test.txt), ["companies_test.txt"](data/original_datasets/companies_test.txt) or ["relocar_test.txt"](data/original_datasets/relocar_test.txt)|
-|**`--imdb`**| Whether or not to use the [IMDB](https://huggingface.co/datasets/imdb) dataset. Note that this is only relevant for validating our *TMix* implementation.||
+|🔛 **`-t`**/**`--train_dataset`**|Defines which dataset is chosen for training.|Choose any of the datasets from [original_datasets](data/original_datasets), [fused_datasets](data/fused_datasets) or [paraphrases](data/paraphrases).|
+|🔛 **`-v`**/**`--test_dataset`**|Defines which dataset is chosen for testing.|Choose from ["semeval_test.txt"](data/original_datasets/semeval_test.txt), ["companies_test.txt"](data/original_datasets/companies_test.txt) or ["relocar_test.txt"](data/original_datasets/relocar_test.txt).|
+|**`--imdb`**| Whether or not to use the [IMDB](https://huggingface.co/datasets/imdb) dataset. Note that this is only relevant for validating our *TMix* implementation.|⚠️ Only works with `BERT`.|
 |🔛 **`-b`**/**`--batch_size`**|Defines the batch size for the training process.|Default is $32$.|
 |🔛 **`-tb`**/**`--test_batch_size`**|Specifies the batch size for the test process.|Default is $16$.|
......
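The 🌐 options above select a transformer layer and an interpolation weight. As a rough sketch of the standard TMix scheme (the names here are illustrative, not this repo's API): both inputs are encoded separately up to the layer chosen via `--mixlayer`, their hidden states are interpolated with the `--lambda_value` weight, and the forward pass continues on the mixture alone.

```python
def tmix_hidden(layers, h_a, h_b, mix_layer, lam):
    """TMix sketch: `layers` stands in for the model's 12 transformer
    blocks (hence --mixlayer in {0, ..., 11}); h_a/h_b are the already
    embedded inputs of the two examples being mixed."""
    # Run both inputs separately through blocks 0..mix_layer.
    for layer in layers[: mix_layer + 1]:
        h_a, h_b = layer(h_a), layer(h_b)
    # Interpolate the hidden states at the chosen layer (--lambda_value).
    h = lam * h_a + (1 - lam) * h_b
    # Continue with the mixed representation only.
    for layer in layers[mix_layer + 1:]:
        h = layer(h)
    return h
```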
File added
@@ -3,7 +3,7 @@ numpy==1.23.5
pandas==1.5.2
torch==1.13.0+cu116
tqdm==4.64.1
python==3.9.*
evaluate==0.3.0
matplotlib==3.5.2
scikit-learn==1.2.1
......