diff --git a/README.md b/README.md
index 925420046eb2c574ca5393a3509400fc3d8cacca..f72afc03c48cd528099d6db48b5b77c89c27af59 100644
--- a/README.md
+++ b/README.md
@@ -89,7 +89,7 @@ As a comparatively safe (= label preserving) data augmentation strategy, we sele
 - ✅ For each sentence, the top 5 paraphrases are kept, using [nucleus/topp](https://fairseq.readthedocs.io/en/latest/command_line_tools.html) as our sampling method, likewise for diversity reasons.
-  - 🔥 We test two versions: Generating paraphrases using a lower (0.8) and higher (1.2) **`temperature`**. This hyperparameter determines how *creative* the translation model becomes: higher `temperature` leads to more linguistic variety, lower `temperature` to results closer to the original sentence.
+  - 🔥 We test two versions: Generating paraphrases using a lower (0.8) and higher (1.2) `temperature`. This hyperparameter determines how *creative* the translation model becomes: higher `temperature` leads to more linguistic variety, lower `temperature` to results closer to the original sentence.
 - 🌈 The diversity of the paraphrases is evaluated via the Longest Common Subsequence [(LCS)](https://docs.python.org/3/library/difflib.html#sequencematcher-objects) score in comparison to their respective original sentence.
@@ -191,18 +191,18 @@ For `<COMMAND>` you must enter one of the commands you find in the list below, w
 ## Code-Structure <a name="code-structure"></a>
-- ⚙️ `requirements.txt`: All necessary modules to install.
-- 📱 `main.py`: Our main code file which does ...
-- 💻 `code`: Here, you can find all code files for our different models and data augmentation methods.
-- 📀 `data`: Find all datasets in this folder.
-  - 🗂️ `original_datasets`: *Semeval_loc*, *Semeval_org*, *Relocar* in their original form.
-  - 🗂️ `backtranslation`: Contains unfiltered generated paraphrases.
-  - 🗂️ `paraphrases`: Contains only filtered paraphrases.
-  - 🗂️ `fused_datasets`: Contains original datasets fused with filtered paraphrases. Ready to be used for training the models.
-- `documentation`: Contains our organizational data and visualizations.
-  - 🗂️ `organization`: Our research plan, presentation, final reports.
-  - 🗂️ `images`: Contains all relevant visualizations.
-  - 🗂️ `results`: Find tables of our results.
+- ⚙️ [`requirements.txt`](requirements.txt): All necessary modules to install.
+- 📱 [`main.py`](main.py): Our main code file which does ...
+- 💻 [`Code`](code): Here, you can find all code files for our different models and data augmentation methods.
+- 📀 [`data`](data): Find all datasets in this folder.
+  - 🗂️ [`backtranslations`](data/backtranslations): Contains unfiltered generated paraphrases.
+  - 🗂️ [`fused_datasets`](data/fused_datasets): Contains original datasets fused with filtered paraphrases. Ready to be used for training the models.
+  - 🗂️ [`original_datasets`](data/original_datasets): *Semeval_loc*, *Semeval_org*, *Relocar* in their original form.
+  - 🗂️ [`paraphrases`](data/paraphrases): Contains only filtered paraphrases.
+- [`documentation`](documentation): Contains our organizational data and visualizations.
+  - 🗂️ [`images`](documentation/images): Contains all relevant visualizations.
+  - 🗂️ [`organization`](documentation/organization): Our research plan, presentation, final reports.
+  - 🗂️ [`results`](documentation/results): Find tables of our results.
 ***