diff --git a/README.md b/README.md
index b1358a7f5b4e3055162dbf9a8b647baa1dcd0827..925420046eb2c574ca5393a3509400fc3d8cacca 100644
--- a/README.md
+++ b/README.md
@@ -7,16 +7,16 @@ Members of the project:
- Mira Umlauf [umlauf@cl.uni-heidelberg.de](mailto:umlauf@cl.uni-heidelberg.de)

# Table of contents
-1. 📚 [Project documents](#documents)
+1. 📚 [Project documents](#project-documents)
2. 🔎 [Metonymy Resolution](#metonymy)
-3. 📈 [Data Augmentation](#augmentation)
+3. 📈 [Data Augmentation](#data-augmentation)
4. 💡 [Methods](#methods)
    1. 📠 [Backtranslation](#backtranslation)
    2. 🔀 [MixUp](#mixup)
5. 🗃️ [Data](#data)
-6. 🛠️ [Set Up](#setup)
+6. 🛠️ [Set Up](#set-up)
7. ⚙️ [Usage](#usage)
-8. 🗂️ [Code Structure](#structure)
+8. 🗂️ [Code Structure](#code-structure)
9. 📑 [References](#references)

***

@@ -24,15 +24,15 @@ Members of the project:
-## 📚 Project documents <a name="documents"></a>
+## 📚 Project documents <a name="project-documents"></a>
This README gives a rough overview of the project. The full documentation and additional information can be found in the documents listed below.

-- 📠 [Research Plan](Organization/research_plan.pdf)
-- 🧠 [Specification Presentation](LINK)
-- 📖 [Project Report](LINK)
-- 🎤 [Final Presentation](LINK)
+- 📠 [Research Plan](documentation/organization/research_plan.pdf)
+- 🧠 [Specification Presentation](documentation/organization/specification_presentation.pdf)
+- 📖 [Project Report](LINK) ---------> ADD
+- 🎤 [Final Presentation](LINK) ---------> ADD

***

## 🔎 Metonymy Resolution <a name="metonymy"></a>
-A metonymy is the replacement of the actual expression by another one that is closely related to the first one.
+A metonymy is the replacement of the actual expression by another one that is closely associated with it [^4].

Metonymies use a contiguity relation between two domains.

@@ -52,11 +52,11 @@ Metonymies use a contiguity relation between two domains.
**Metonymy resolution** is about determining whether a potentially metonymic word is used metonymically in a particular context. In this project we focus on `metonymic` and `literal` readings for locations and organizations.

-ℹ️ Sentences that allow for mixed readings, where both a literal and metonymic sense is evoked, are considered `metonymic` in this project. This is true of the following sentence, in which the term *Nigeria* prompts both a metonymic and a literal reading.
+ℹ️ Sentences that allow for mixed readings, where both a literal and a metonymic sense are evoked, are considered `non-literal` in this project. This is true of the following sentence, in which the term *Nigeria* prompts both a metonymic and a literal reading.

-- "They arrived in **Nigeria**, hitherto a leading critic of [...]" [ZITAT: SemEval-2007 Task 08: Metonymy Resolution at SemEval-2007]
+- "They arrived in **Nigeria**, hitherto a leading critic of [...]" [^8]

-➡️ Hence, we use the two classes `non-literal` and `literal` for this binary classification task.
+➡️ Hence, we use the two classes `non-literal` and `literal` for our binary classification task.

***

@@ -75,13 +75,12 @@ Consequently, it is a vital technique for evaluating the robustness of models, a
## 💡 Methods <a name="methods"></a>
-When selecting methods for our task, the main goal was to find a tradeoff between label preserving methods and diversifying our dataset. Since the language models BERT [^6] and RoBERTa [^7] have not been found to profit from very basic augmentation strategies (e.g. 
-case changing of single characters or embedding replacements [ZITAT einfügen?]), we chose more innovative and challenging methods.
+When selecting methods for our task, the main goal was to find a trade-off between label-preserving methods and methods that diversify our dataset. Since the language models BERT [^3] and RoBERTa [^7] have not been found to profit from very basic augmentation strategies (e.g. case changing of single characters or embedding replacements [^1]), we chose more innovative and challenging methods.

To be able to compare the influence of augmentations in different spaces, we select one method for the data space and two methods for the feature space.

### 📠 1. Backtranslation (Data Space) <a name="backtranslation"></a>

-As a comparatively safe (= label preserving) data augmentation strategy, we selected *backtranslation* using the machine translation model Fairseq[^1] [[Ott et al., 2019] ]. Similar to Chen et al.[^2] we use the pre-trained single models :
+As a comparatively safe (i.e. label-preserving) data augmentation strategy, we selected *backtranslation* using the machine translation toolkit Fairseq [^9]. Adapting the approach of Chen et al. [^2], we use the pre-trained single models:

- [`transformer.wmt19.en-de.single_model`](https://huggingface.co/facebook/wmt19-en-de)
- [`transformer.wmt19.de-en.single_model`](https://huggingface.co/facebook/wmt19-de-en)

@@ -100,7 +99,7 @@ As a comparatively safe (= label preserving) data augmentation strategy, we sele
- **EN - DE:** *BMW und Nissan* bringen Elektroautos auf den Markt.
- **DE - EN:** *BMW and Nissan* **are bringing** electric cars **to the market**.

-🚮 **Filtering:**|
+🚮 **Filtering:**

- All paraphrases that did not contain the original (metonymic) target word or had syntactic variations were filtered out.
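+
+This round trip can be reproduced with fairseq's `torch.hub` interface. The snippet below is a minimal sketch, not our exact pipeline: the helper names, the example sentence, and the simple token-level filter are illustrative only, and the project's additional check for syntactic variations is omitted.
+
+```python
+import torch
+
+# Pre-trained WMT'19 single models, loaded as documented in the fairseq README
+# (Moses tokenization and fastBPE subwords).
+en2de = torch.hub.load("pytorch/fairseq", "transformer.wmt19.en-de.single_model",
+                       tokenizer="moses", bpe="fastbpe")
+de2en = torch.hub.load("pytorch/fairseq", "transformer.wmt19.de-en.single_model",
+                       tokenizer="moses", bpe="fastbpe")
+
+def backtranslate(sentence: str) -> str:
+    """EN -> DE -> EN round trip that yields a paraphrase."""
+    return de2en.translate(en2de.translate(sentence))
+
+def keep(paraphrase: str, target: str) -> bool:
+    """Simplified filter: keep a paraphrase only if the (metonymic)
+    target word survived the round trip."""
+    return target in paraphrase.split()
+
+paraphrase = backtranslate("BMW and Nissan launch electric cars.")
+print(paraphrase, keep(paraphrase, "BMW"))
+```
+
+Since loading the two hub models is expensive, it is preferable to generate and filter all paraphrases once and store them, rather than translating on the fly during training.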
@@ -108,7 +107,7 @@
### 🔀 2. MixUp (Feature Space) <a name="mixup"></a>

-Our method adopts the framework of the mixup transformer proposed by Sun et al. [^4]. This approach involves interpolating the representation of two instances on the last hidden state of the transformer model (in our case, BERT-base-uncased [^6]).
+Our method adopts the framework of the *MixUp* transformer proposed by Sun et al. [^10]. This approach interpolates the representations of two instances on the last hidden state of the transformer model (in our case, `BERT-base-uncased` [^3]).

To derive the interpolated hidden representation and corresponding label, we use the following formulas on the representations of two data samples:

@@ -120,23 +119,21 @@
$$\hat{x} = \lambda T(x_i) + (1- \lambda)T(x_j)$$

-$$\hat{y} = \lambda T(y_i) + (1- \lambda)T(y_j)$$
+$$\hat{y} = \lambda y_i + (1- \lambda) y_j$$

-Here, $T(x_i)$ and $T(x_j)$
-represent the hidden representations of the two instances, $T(y_i)$
-and $T(y_j)$ represent their corresponding labels, and $\lambda$ is a mixing coefficient that determines the degree of interpolation.
-We used a fixed $\lambda$ which was set for the entire training process. In the following the derived instances $\hat{x}$ with the derived label $\hat{y}$ as new true label are given into the classifier to generate a prediction.
-The MixUp process can be used dynamically during training at any epoch.
+Here, $T(x_i)$ and $T(x_j)$ represent the hidden representations of the two instances, and $y_i$ and $y_j$ are their corresponding labels (the labels are interpolated directly, without passing them through the model). $\lambda$ is a mixing coefficient that determines the degree of the interpolation.
+
+We used a fixed $\lambda$, set once for the entire training process. The interpolated instance $\hat{x}$, with the derived label $\hat{y}$ as its new true label, is then passed to the classifier to generate a prediction (see the sketch below).
+The *MixUp* process can be applied dynamically at any epoch during training.
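+
+A minimal sketch of this interpolation with the Hugging Face `transformers` library is given below. It is an illustration under assumptions, not our exact implementation: we let the final-layer `[CLS]` vector stand in for the instance representation $T(x)$, and the sentences, one-hot labels, and the fixed $\lambda = 0.4$ are made up for the example.
+
+```python
+import torch
+from transformers import AutoModel, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+encoder = AutoModel.from_pretrained("bert-base-uncased")
+
+def mixup(sent_i, sent_j, y_i, y_j, lam):
+    """Interpolate the last-hidden-state representations of two
+    instances and their labels with a fixed mixing coefficient lam."""
+    batch = tokenizer([sent_i, sent_j], padding=True, return_tensors="pt")
+    hidden = encoder(**batch).last_hidden_state   # shape: (2, seq_len, 768)
+    t_i, t_j = hidden[0, 0], hidden[1, 0]         # [CLS] vectors T(x_i), T(x_j)
+    x_hat = lam * t_i + (1 - lam) * t_j           # interpolated representation
+    y_hat = lam * y_i + (1 - lam) * y_j           # interpolated (soft) label
+    return x_hat, y_hat
+
+x_hat, y_hat = mixup(
+    "The meeting takes place in Vancouver.",   # literal
+    "Nigeria condemned the decision.",         # non-literal (metonymic)
+    torch.tensor([1.0, 0.0]),                  # one-hot label: literal
+    torch.tensor([0.0, 1.0]),                  # one-hot label: non-literal
+    lam=0.4,
+)
+```
+
+The pair $(\hat{x}, \hat{y})$ is then fed to the classification head in place of an ordinary training instance, so the classifier is trained against the soft label $\hat{y}$.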
***

## 🗃️ Data <a name="data"></a>

-The datasets used in this project will be taken from Li et al.[^5] We confine ourselves to the following three:
-
-- **SemEval:** [ZITAT: Markert and Nissim, 2007 ]
+The datasets used in this project are taken from Li et al. [^6] We confine ourselves to the following three:

-| 1. **SemEval: Locations** | 2. **SemEval: Companies & Organizations** | 3. **ReLocar: Locations** |
-| ---------------- | -------------------------------- | -----------------------------------------|
-|[ZITAT: Markert and Nissim, 2007 ]| [ZITAT: Markert and Nissim, 2007 ]|[ZITAT: Gritta et al., 2017 ]|
-| <img src="documentation/images/semeval_loc_metonym_ratio.png"> | <img src="documentation/images/semeval_org_metonym_ratio.png"> | <img src="documentation/images/relocar_metonym_ratio.png">
+| **SemEval: Locations** & **SemEval: Companies & Organizations** | **ReLocar: Locations** |
+| --------------------------------------------------------------- | -------------------------------------- |
+| 3800 sentences from the BNC corpus [^8] | Wikipedia-based dataset containing 2026 sentences [^5] |
+| <img src="documentation/images/semeval_loc_metonym_ratio.png"> <img src="documentation/images/semeval_org_metonym_ratio.png"> | <img src="documentation/images/relocar_metonym_ratio.png"> |
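+
+The class ratios plotted above can be recomputed in a few lines. The sketch below assumes a hypothetical file layout, a JSON list of instances each carrying a `label` field, and an illustrative path; the actual TWM files may be structured differently.
+
+```python
+import json
+from collections import Counter
+
+def label_ratios(path: str) -> dict:
+    """Return the share of each reading (e.g. literal vs. metonymic)."""
+    with open(path, encoding="utf-8") as f:
+        instances = json.load(f)                  # assumed: list of dicts
+    counts = Counter(inst["label"] for inst in instances)
+    total = sum(counts.values())
+    return {label: count / total for label, count in counts.items()}
+
+print(label_ratios("datasets/li_twm/relocar.json"))   # hypothetical path
+```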
["RoBERTa: A robustly optimized BERT pretraining approach."](https://dblp.org/rec/journals/corr/abs-1907-11692.bib) CoRR, 2019. - - [Downloaded TWM datasets](./swp/swp-data-augmentation-for-metonymy-resolution/datasets/li_twm) +[^8]: Markert, Katja & Nissim, Malvina. ["SemEval-2007 task 08: Metonymy resolution at SemEval-2007."](https://aclanthology.org/S07-1007) Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), 2007. -[^1]: Fairseq Tool. -[^2]: Backtranslation paper. -[^3]: Zhang, Hongyi, Cissé, Moustapha, Dauphin, Yann N. & Lopez-Paz, David. mixup: Beyond empirical risk minimization. *CoRR*, 2017. -[^4]: Sun, L., Xia, C., Yin, W., Liang, T., Yu, P. S., & He, L. (2020). Mixup-transformer: dynamic data augmentation for NLP tasks. arXiv preprint arXiv:2010.02394. -[^5]: Li et al. -[^6]: Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805. -[^7]: Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692. +[^9]: Ott, Myle, Edunov, Sergey, Baevski, Alexei, Fan, Angela, Gross, Sam, Ng, Nathan, Grangier, David & Auli, Michael. ["fairseq: A fast, extensible toolkit for sequence modeling."](https://aclanthology.org/N19-4009) Proceedings of NAACL-HLT 2019: Demonstrations, 2019. -[^note]: - not listed footnote - rfngkjn +[^10]: Sun, Lichao, Xia, Congying, Yin, Wenpeng, Liang, Tingting, Yu, Philip S. & He, Lifang. ["Mixup-transformer: dynamic data augmentation for NLP tasks."](https://arxiv.org/abs/2010.02394) 2020.