For `<COMMAND>` you must enter one of the commands from the list below; all of them work with `train.py`.
| Command | Functionality | Arguments |
| ------- | ------------- |-----------|
| <center>**General**</center> | | |
|🔛 **`--architecture`** | Defines which model is used. | Choose `bert-base-uncased` or `roberta` |
|🔛 **`--model_type`** | How to initialize the classification model. | Choose `separate` or `one` |
|🔛 **`--tokenizer`**|Which tokenizer to use when preprocessing the datasets.|Choose `swp` for our tokenizer, `li` for the tokenizer of Li et al. [^6], or `salami` for the tokenizer used by another [student project](https://gitlab.cl.uni-heidelberg.de/salami-hd/salami/-/tree/master/)|
|**`-tc`**/**`--tcontext`**|Whether or not to preprocess the training set with context.||
|**`-vc`**/**`--vcontext`**|Whether or not to preprocess the test set with context.||
|🔛 **`-max`**/**`--max_length`**|Defines the maximum sequence length when tokenizing the sentences.|⚠️ Always choose 256 for *TMix* and 512 for the other models.|
|🔛 **`--train_loop`**|Defines which train loop to use.|Choose `swp` for our train loop implementation and `salami` for the one of the [salami](https://gitlab.cl.uni-heidelberg.de/salami-hd/salami) student project.|
|🔛 **`-e`**/**`--epochs`**|Number of epochs for training.||
|🔛 **`-lr`**/**`--learning_rate`**|Learning rate for training.|`type=float`|
|**`-lrtwo`**/**`--second_learning_rate`**| Separate learning rate for the multi-layer perceptron classifier.|Default is `None`.|
|**`--mlp`**| Whether or not to use a two-layer MLP as the classifier.| |
|🔛 **`-rs`**/**`--random_seed`**|Random seed for initialization of the model.|Default is $42$.|
|🔛 **`-sd`**/**`--save_directory`**|This option specifies the destination directory for the output results of the run.||
|**`-msp`**/**`--model_save_path`**|This option specifies the destination directory for saving the model.|We recommend saving models in [Code/saved_models](Code/saved_models).|
|**`--masking`**|Whether or not to mask the target word.||
|🌐 **`--mixlayer`**| Specifies in which layer the interpolation takes place. Only select one layer at a time. | Choose from $\{0, 1, \ldots, 11\}$ |
|🍸, 🌐 **`-lambda`**/**`--lambda_value`**|Specifies the lambda value for the interpolation in *MixUp* and *TMix*.|Default is $0.4$, `type=float`|
| <center>**MixUp** specific</center> | | |
|🍸 **`-mixup`**/**`--mix_up`**| Whether or not to use *MixUp*. If yes, please specify `-lambda` and `-mixepoch`.| |
|🍸 **`-mixepoch`**/**`--mixepoch`**|Specifies the epoch(s) in which to apply *MixUp*.|Default is `None`.|
| <center>**TMix** specific</center> | | |
|🌐 **`--tmix`**| Whether or not to use *TMix*. If yes, please specify `--mixlayer` and `-lambda`.| |
| <center>**Datasets** specific</center> | | |
|🔛 **`-t`**/**`--train_dataset`**|Defines which dataset is chosen for training.|Choose any of the datasets from [original_datasets](data/original_datasets), [fused_datasets](data/fused_datasets) or [paraphrases](data/paraphrases)|
|🔛 **`-v`**/**`--test_dataset`**|Defines which dataset is chosen for testing.|Choose from ["semeval_test.txt"](data/original_datasets/semeval_test.txt), ["companies_test.txt"](data/original_datasets/companies_test.txt) or ["relocar_test.txt"](data/original_datasets/relocar_test.txt)|
|**`--imdb`**| Whether or not to use the [IMDB](https://huggingface.co/datasets/imdb) dataset. Note that this is only relevant for validating our *TMix* implementation.||
|🔛 **`-b`**/**`--batch_size`**|Defines the batch size for the training process.|Default is $32$.|
|🔛 **`-tb`**/**`--test_batch_size`**|Specifies the batch size for the test process.|Default is $16$.|
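
To make the flag combinations concrete, here is a sketch of two possible `train.py` invocations: a plain baseline run and a *TMix* run. The training-set filename, epoch count, and learning-rate value are illustrative assumptions, not prescribed defaults; every flag is taken from the table above.

```bash
# Baseline: BERT, our tokenizer and train loop, no augmentation.
# "semeval_train.txt" is a placeholder; use any file from
# data/original_datasets, data/fused_datasets or data/paraphrases.
python train.py --architecture bert-base-uncased --model_type one \
    --tokenizer swp --train_loop swp --max_length 512 \
    --epochs 5 --learning_rate 2e-5 --random_seed 42 \
    --train_dataset semeval_train.txt --test_dataset semeval_test.txt \
    --batch_size 32 --test_batch_size 16 \
    --save_directory results/ --model_save_path Code/saved_models/

# TMix: interpolate in layer 3 with lambda = 0.4. Note the shorter
# maximum sequence length (256) that TMix requires.
python train.py --architecture bert-base-uncased --model_type one \
    --tokenizer swp --train_loop swp --max_length 256 \
    --epochs 5 --learning_rate 2e-5 \
    --tmix --mixlayer 3 --lambda_value 0.4 \
    --train_dataset semeval_train.txt --test_dataset semeval_test.txt \
    --batch_size 32 --test_batch_size 16 \
    --save_directory results/ --model_save_path Code/saved_models/
```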