@@ -89,7 +89,7 @@ As a comparatively safe (= label preserving) data augmentation strategy, we sele
- ✅ For each sentence, the top 5 paraphrases are kept. We use [nucleus (top-p) sampling](https://fairseq.readthedocs.io/en/latest/command_line_tools.html) as the sampling method, likewise for diversity reasons.
- 🔥 We test two versions: Generating paraphrases using a lower (0.8) and higher (1.2) **`temperature`**. This hyperparameter determines how *creative* the translation model becomes: higher `temperature` leads to more linguistic variety, lower `temperature` to results closer to the original sentence.
- 🔥 We test two versions: Generating paraphrases using a lower (0.8) and higher (1.2) `temperature`. This hyperparameter determines how *creative* the translation model becomes: higher `temperature` leads to more linguistic variety, lower `temperature` to results closer to the original sentence.
- 🌈 The diversity of the paraphrases is evaluated via the Longest Common Subsequence [(LCS)](https://docs.python.org/3/library/difflib.html#sequencematcher-objects) score of each paraphrase against its original sentence.
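
The sampling and scoring ideas in the list above can be illustrated with a small standard-library sketch. It is not the code used in this repository: the function names `softmax_with_temperature`, `top_p_sample`, and `similarity_to_original` are hypothetical, the logits are toy values, and the similarity score uses the word-level `SequenceMatcher.ratio()` from the linked `difflib` page as an assumed stand-in for the LCS score.

```python
import math
import random
from difflib import SequenceMatcher


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_sample(probs, top_p, rng):
    """Sample an index from the smallest set of tokens whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in ranked:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # rng.choices renormalises the weights inside the nucleus
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


def similarity_to_original(original, paraphrase):
    """Word-level SequenceMatcher ratio in [0, 1]; lower means more diverse."""
    return SequenceMatcher(None, original.split(), paraphrase.split()).ratio()


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.1]  # hypothetical next-token scores
    for temperature in (0.8, 1.2):
        probs = softmax_with_temperature(toy_logits, temperature)
        token = top_p_sample(probs, top_p=0.9, rng=random.Random(0))
        print(temperature, [round(p, 3) for p in probs], token)
    print(similarity_to_original("a b c d", "a b x d"))
```

With the lower temperature the probability mass concentrates on the highest-scoring token, so samples stay closer to the most likely output. With the higher temperature the distribution flattens and less likely tokens are drawn more often. A lower similarity score then indicates a paraphrase that differs more from its original sentence.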
...
...
@@ -191,18 +191,18 @@ For `<COMMAND>` you must enter one of the commands you find in the list below, w
## 🏯 Code-Structure <a name="code-structure"></a>
- ⚙️ `requirements.txt`: All necessary modules to install.
- 📱 `main.py`: Our main code file which does ...
- 💻 `code`: Here, you can find all code files for our different models and data augmentation methods.
- 📀 `data`: Find all datasets in this folder.
- 🗂️ `original_datasets`: *Semeval_loc*, *Semeval_org*, *Relocar* in their original form.