diff --git a/README.md b/README.md
index b1358a7f5b4e3055162dbf9a8b647baa1dcd0827..925420046eb2c574ca5393a3509400fc3d8cacca 100644
--- a/README.md
+++ b/README.md
@@ -7,16 +7,16 @@ Members of the project:
- Mira Umlauf [umlauf@cl.uni-heidelberg.de](mailto:umlauf@cl.uni-heidelberg.de)

# Table of contents
-1. 📚 [Project documents](#documents)
+1. 📚 [Project documents](#project-documents)
2. 🔎 [Metonymy Resolution](#metonymy)
-3. 📈 [Data Augmentation](#augmentation)
+3. 📈 [Data Augmentation](#data-augmentation)
4. 💡 [Methods](#methods)
    1. 📠 [Backtranslation](#backtranslation)
    2. 🔀 [MixUp](#mixup)
5. 🗃️ [Data](#data)
-6. 🛠️ [Set Up](#setup)
+6. 🛠️ [Set Up](#set-up)
7. ⚙️ [Usage](#usage)
-8. 🗂️ [Code Structure](#structure)
+8. 🗂️ [Code Structure](#code-structure)
9. 📑 [References](#references)

***

@@ -24,15 +24,15 @@ Members of the project:
-## 📚 Project documents <a name="documents"></a>
+## 📚 Project documents <a name="project-documents"></a>
This README gives a rough overview of the project. The full documentation and additional information can be found in the documents listed below.

-- 📠 [Research Plan](Organization/research_plan.pdf)
-- 🧠 [Specification Presentation](LINK)
-- 📖 [Project Report](LINK)
-- 🎤 [Final Presentation](LINK)
+- 📠 [Research Plan](documentation/organization/research_plan.pdf)
+- 🧠 [Specification Presentation](documentation/organization/specification_presentation.pdf)
+- 📖 [Project Report](LINK) ---------> ADD
+- 🎤 [Final Presentation](LINK) ---------> ADD

***

## 🔎 Metonymy Resolution <a name="metonymy"></a>
-A metonymy is the replacement of the actual expression by another one that is closely related to the first one.
+A metonymy is the replacement of the actual expression by another one that is closely associated with it [^4].

Metonymies use a contiguity relation between two domains.

@@ -52,11 +52,11 @@ Metonymies use a contiguity relation between two domains.
**Metonymy resolution** is about determining whether a potentially metonymic word is used metonymically in a particular context. In this project we focus on `metonymic` and `literal` readings for locations and organizations.

-ℹ️ Sentences that allow for mixed readings, where both a literal and metonymic sense is evoked, are considered `metonymic` in this project. This is true of the following sentence, in which the term *Nigeria* prompts both a metonymic and a literal reading.
+ℹ️ Sentences that allow for mixed readings, where both a literal and a metonymic sense are evoked, are considered `non-literal` in this project. This is true of the following sentence, in which the term *Nigeria* prompts both a metonymic and a literal reading.

-- "They arrived in **Nigeria**, hitherto a leading critic of [...]" [ZITAT: SemEval-2007 Task 08: Metonymy Resolution at SemEval-2007]
+- "They arrived in **Nigeria**, hitherto a leading critic of [...]" [^8]

-➡️ Hence, we use the two classes `non-literal` and `literal` for this binary classification task.
+➡️ Hence, we use the two classes `non-literal` and `literal` for our binary classification task.

***

@@ -75,13 +75,12 @@ Consequently, it is a vital technique for evaluating the robustness of models, a
## 💡 Methods <a name="methods"></a>
-When selecting methods for our task, the main goal was to find a tradeoff between label preserving methods and diversifying our dataset. Since the language models BERT [^6] and RoBERTa [^7] have not been found to profit from very basic augmentation strategies (e.g. 
-case changing of single characters or embedding replacements [ZITAT einfügen?]), we chose more innovative and challenging methods.
+When selecting methods for our task, the main goal was to find a trade-off between label-preserving methods and methods that diversify our dataset. Since the language models BERT [^3] and RoBERTa [^7] have not been found to profit from very basic augmentation strategies (e.g. case changing of single characters or embedding replacements [^1]), we chose more innovative and challenging methods.

To be able to compare the influence of augmentations in different spaces, we select one method for the data space and two methods for the feature space.

### 📠 1. Backtranslation (Data Space) <a name="backtranslation"></a>

-As a comparatively safe (= label preserving) data augmentation strategy, we selected *backtranslation* using the machine translation model Fairseq[^1] [[Ott et al., 2019] ]. Similar to Chen et al.[^2] we use the pre-trained single models :
+As a comparatively safe (i.e. label-preserving) data augmentation strategy, we selected *backtranslation* using the machine translation toolkit Fairseq [^9]. Adapting the approach of Chen et al. [^2], we use the pre-trained single models:

- [`transformer.wmt19.en-de.single_model`](https://huggingface.co/facebook/wmt19-en-de)
- [`transformer.wmt19.de-en.single_model`](https://huggingface.co/facebook/wmt19-de-en)

@@ -100,7 +99,7 @@ As a comparatively safe (= label preserving) data augmentation strategy, we sele
- **EN - DE:** *BMW und Nissan* bringen Elektroautos auf den Markt.
- **DE - EN:** *BMW and Nissan* **are bringing** electric cars **to the market**.

-🚮 **Filtering:**|
+🚮 **Filtering:**

- All paraphrases that did not contain the original (metonymic) target word or had syntactic variations were filtered out.
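+
+This round trip can be reproduced with fairseq's `torch.hub` interface. The snippet below is a minimal sketch, not our exact pipeline: the helper names, the example sentence, and the simple token-level filter are illustrative only, and the project's additional check for syntactic variations is omitted.
+
+```python
+import torch
+
+# Pre-trained WMT'19 single models, loaded as documented in the fairseq README
+# (Moses tokenization and fastBPE subwords).
+en2de = torch.hub.load("pytorch/fairseq", "transformer.wmt19.en-de.single_model",
+                       tokenizer="moses", bpe="fastbpe")
+de2en = torch.hub.load("pytorch/fairseq", "transformer.wmt19.de-en.single_model",
+                       tokenizer="moses", bpe="fastbpe")
+
+def backtranslate(sentence: str) -> str:
+    """EN -> DE -> EN round trip that yields a paraphrase."""
+    return de2en.translate(en2de.translate(sentence))
+
+def keep(paraphrase: str, target: str) -> bool:
+    """Simplified filter: keep a paraphrase only if the (metonymic)
+    target word survived the round trip."""
+    return target in paraphrase.split()
+
+paraphrase = backtranslate("BMW and Nissan launch electric cars.")
+print(paraphrase, keep(paraphrase, "BMW"))
+```
+
+Since loading the two hub models is expensive, it is preferable to generate and filter all paraphrases once and store them, rather than translating on the fly during training.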
@@ -108,7 +107,7 @@
### 🔀 2. MixUp (Feature Space) <a name="mixup"></a>

-Our method adopts the framework of the mixup transformer proposed by Sun et al. [^4]. This approach involves interpolating the representation of two instances on the last hidden state of the transformer model (in our case, BERT-base-uncased [^6]).
+Our method adopts the framework of the *MixUp* transformer proposed by Sun et al. [^10]. This approach interpolates the representations of two instances on the last hidden state of the transformer model (in our case, `BERT-base-uncased` [^3]).

To derive the interpolated hidden representation and corresponding label, we use the following formulas on the representations of two data samples:

@@ -120,23 +119,21 @@
$$\hat{x} = \lambda T(x_i) + (1- \lambda)T(x_j)$$

-$$\hat{y} = \lambda T(y_i) + (1- \lambda)T(y_j)$$
+$$\hat{y} = \lambda y_i + (1- \lambda) y_j$$

-Here, $T(x_i)$ and $T(x_j)$
-represent the hidden representations of the two instances, $T(y_i)$
-and $T(y_j)$ represent their corresponding labels, and $\lambda$ is a mixing coefficient that determines the degree of interpolation.
-We used a fixed $\lambda$ which was set for the entire training process. In the following the derived instances $\hat{x}$ with the derived label $\hat{y}$ as new true label are given into the classifier to generate a prediction.
-The MixUp process can be used dynamically during training at any epoch.
+Here, $T(x_i)$ and $T(x_j)$ represent the hidden representations of the two instances, and $y_i$ and $y_j$ are their corresponding labels (the labels are interpolated directly, without passing them through the model). $\lambda$ is a mixing coefficient that determines the degree of the interpolation.
+
+We used a fixed $\lambda$, set once for the entire training process. The interpolated instance $\hat{x}$, with the derived label $\hat{y}$ as its new true label, is then passed to the classifier to generate a prediction (see the sketch below).
+The *MixUp* process can be applied dynamically at any epoch during training.
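+
+A minimal sketch of this interpolation with the Hugging Face `transformers` library is given below. It is an illustration under assumptions, not our exact implementation: we let the final-layer `[CLS]` vector stand in for the instance representation $T(x)$, and the sentences, one-hot labels, and the fixed $\lambda = 0.4$ are made up for the example.
+
+```python
+import torch
+from transformers import AutoModel, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+encoder = AutoModel.from_pretrained("bert-base-uncased")
+
+def mixup(sent_i, sent_j, y_i, y_j, lam):
+    """Interpolate the last-hidden-state representations of two
+    instances and their labels with a fixed mixing coefficient lam."""
+    batch = tokenizer([sent_i, sent_j], padding=True, return_tensors="pt")
+    hidden = encoder(**batch).last_hidden_state   # shape: (2, seq_len, 768)
+    t_i, t_j = hidden[0, 0], hidden[1, 0]         # [CLS] vectors T(x_i), T(x_j)
+    x_hat = lam * t_i + (1 - lam) * t_j           # interpolated representation
+    y_hat = lam * y_i + (1 - lam) * y_j           # interpolated (soft) label
+    return x_hat, y_hat
+
+x_hat, y_hat = mixup(
+    "The meeting takes place in Vancouver.",   # literal
+    "Nigeria condemned the decision.",         # non-literal (metonymic)
+    torch.tensor([1.0, 0.0]),                  # one-hot label: literal
+    torch.tensor([0.0, 1.0]),                  # one-hot label: non-literal
+    lam=0.4,
+)
+```
+
+The pair $(\hat{x}, \hat{y})$ is then fed to the classification head in place of an ordinary training instance, so the classifier is trained against the soft label $\hat{y}$.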
***

## 🗃️ Data <a name="data"></a>

-The datasets used in this project will be taken from Li et al.[^5] We confine ourselves to the following three:
-
-- **SemEval:** [ZITAT: Markert and Nissim, 2007 ]
+The datasets used in this project are taken from Li et al. [^6] We confine ourselves to the following three:

-| 1. **SemEval: Locations** | 2. **SemEval: Companies & Organizations** | 3. **ReLocar: Locations** |
-| ---------------- | -------------------------------- | -----------------------------------------|
-|[ZITAT: Markert and Nissim, 2007 ]| [ZITAT: Markert and Nissim, 2007 ]|[ZITAT: Gritta et al., 2017 ]|
-| <img src="documentation/images/semeval_loc_metonym_ratio.png"> | <img src="documentation/images/semeval_org_metonym_ratio.png"> | <img src="documentation/images/relocar_metonym_ratio.png">
+| **SemEval: Locations** & **SemEval: Companies & Organizations** | **ReLocar: Locations** |
+| --------------------------------------------------------------- | -------------------------------------- |
+| 3800 sentences from the BNC corpus [^8] | Wikipedia-based dataset containing 2026 sentences [^5] |
+| <img src="documentation/images/semeval_loc_metonym_ratio.png"> <img src="documentation/images/semeval_org_metonym_ratio.png"> | <img src="documentation/images/relocar_metonym_ratio.png"> |
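+
+The class ratios plotted above can be recomputed in a few lines. The sketch below assumes a hypothetical file layout, a JSON list of instances each carrying a `label` field, and an illustrative path; the actual TWM files may be structured differently.
+
+```python
+import json
+from collections import Counter
+
+def label_ratios(path: str) -> dict:
+    """Return the share of each reading (e.g. literal vs. metonymic)."""
+    with open(path, encoding="utf-8") as f:
+        instances = json.load(f)                  # assumed: list of dicts
+    counts = Counter(inst["label"] for inst in instances)
+    total = sum(counts.values())
+    return {label: count / total for label, count in counts.items()}
+
+print(label_ratios("datasets/li_twm/relocar.json"))   # hypothetical path
+```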
["RoBERTa: A robustly optimized BERT pretraining approach."](https://dblp.org/rec/journals/corr/abs-1907-11692.bib) CoRR, 2019. - - [Downloaded TWM datasets](./swp/swp-data-augmentation-for-metonymy-resolution/datasets/li_twm) +[^8]: Markert, Katja & Nissim, Malvina. ["SemEval-2007 task 08: Metonymy resolution at SemEval-2007."](https://aclanthology.org/S07-1007) Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), 2007. -[^1]: Fairseq Tool. -[^2]: Backtranslation paper. -[^3]: Zhang, Hongyi, Cissé, Moustapha, Dauphin, Yann N. & Lopez-Paz, David. mixup: Beyond empirical risk minimization. *CoRR*, 2017. -[^4]: Sun, L., Xia, C., Yin, W., Liang, T., Yu, P. S., & He, L. (2020). Mixup-transformer: dynamic data augmentation for NLP tasks. arXiv preprint arXiv:2010.02394. -[^5]: Li et al. -[^6]: Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805. -[^7]: Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692. +[^9]: Ott, Myle, Edunov, Sergey, Baevski, Alexei, Fan, Angela, Gross, Sam, Ng, Nathan, Grangier, David & Auli, Michael. ["fairseq: A fast, extensible toolkit for sequence modeling."](https://aclanthology.org/N19-4009) Proceedings of NAACL-HLT 2019: Demonstrations, 2019. -[^note]: - not listed footnote - rfngkjn +[^10]: Sun, Lichao, Xia, Congying, Yin, Wenpeng, Liang, Tingting, Yu, Philip S. & He, Lifang. ["Mixup-transformer: dynamic data augmentation for NLP tasks."](https://arxiv.org/abs/2010.02394) 2020.