A metonymy is the replacement of the actual expression by another one that is closely associated with it [^4].
Metonymies use a contiguity relation between two domains.
...
...
**Metonymy resolution** is the task of determining whether a potentially metonymic word is used metonymically in a particular context. In this project, we focus on `metonymic` and `literal` readings for locations and organizations.
ℹ️ Sentences that allow for mixed readings, in which both a literal and a metonymic sense are evoked, are treated as `non-literal` in this project. This holds for the following sentence, in which the term *Nigeria* prompts both a metonymic and a literal reading.
- "They arrived in **Nigeria**, hitherto a leading critic of [...]" [ZITAT: SemEval-2007 Task 08: Metonymy Resolution at SemEval-2007]
- "They arrived in **Nigeria**, hitherto a leading critic of [...]" [^8]
➡️ Hence, we use the two classes `non-literal` and `literal` for our binary classification task.
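To make the labelling scheme explicit, a hypothetical mapping from annotated readings to our two classes could look like the following sketch (the reading names are illustrative, not the exact labels used in the datasets):

```python
# Metonymic and mixed readings both collapse into the `non-literal` class,
# as described above; only purely literal readings stay `literal`.
READING_TO_CLASS = {
    "literal": "literal",
    "metonymic": "non-literal",
    "mixed": "non-literal",
}
```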
***
...
...
***
## 💡 Methods <a name="methods"></a>
When selecting methods for our task, the main goal was to strike a balance between preserving labels and diversifying our dataset. Since the language models BERT [^3] and RoBERTa [^7] have not been found to profit from very basic augmentation strategies (e.g. case changing of single characters or embedding replacements [^1]), we chose more innovative and challenging methods.
To compare the influence of augmentations in different spaces, we select one method operating in the data space and two methods operating in the feature space.
As a comparatively safe (i.e. label-preserving) data augmentation strategy, we selected *backtranslation* using the machine translation toolkit Fairseq [^9]. Adapting the approach of Chen et al. [^2], we use the pre-trained single models:
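Purely as an illustration (the concrete models used in the project may differ), a backtranslation round trip with fairseq's pre-trained WMT'19 English-German single models loaded via `torch.hub` might look like this:

```python
import torch

# Assumed setup: fairseq's pre-trained WMT'19 English-German single models,
# loaded via torch.hub (requires fairseq, fastBPE and sacremoses).
en2de = torch.hub.load("pytorch/fairseq", "transformer.wmt19.en-de.single_model",
                       tokenizer="moses", bpe="fastbpe")
de2en = torch.hub.load("pytorch/fairseq", "transformer.wmt19.de-en.single_model",
                       tokenizer="moses", bpe="fastbpe")

def backtranslate(sentence: str) -> str:
    # A round trip through the pivot language yields a paraphrased sentence.
    return de2en.translate(en2de.translate(sentence))

print(backtranslate("The committee approved the new proposal yesterday."))
```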
Our method adopts the framework of the *MixUp* transformer proposed by Sun et al. [^10]. This approach interpolates the representations of two instances at the last hidden state of the transformer model (in our case, `BERT-base-uncased`).
To derive the interpolated hidden representation and the corresponding label, we apply the following formulas to the representations of two data samples:

$$\hat{x} = \lambda \cdot T(x_i) + (1 - \lambda) \cdot T(x_j)$$

$$\hat{y} = \lambda \cdot T(y_i) + (1 - \lambda) \cdot T(y_j)$$
Here, $T(x_i)$ and $T(x_j)$ represent the hidden representations of the two instances, $T(y_i)$ and $T(y_j)$ represent their corresponding labels, and $\lambda$ is a mixing coefficient that determines the degree of interpolation.
We used a fixed $\lambda$, which was set for the entire training process. The derived instance $\hat{x}$, with the derived label $\hat{y}$ as its new true label, is then passed to the classifier to generate a prediction.
The *MixUp* process can be applied dynamically during training at any epoch.
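As a minimal sketch of how this interpolation step might be implemented (assuming the Hugging Face `transformers` `BertModel` for `bert-base-uncased`, a hypothetical linear classification head, and a fixed $\lambda$; this is an illustration, not the project's exact training code):

```python
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer

# Assumed setup: bert-base-uncased as encoder, a linear head for the two
# classes `literal` (0) and `non-literal` (1).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(bert.config.hidden_size, 2)

lam = 0.5  # fixed mixing coefficient, kept constant for the whole training run

def encode(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    # [CLS] vector of the last hidden state serves as the instance representation T(x)
    return bert(**batch).last_hidden_state[:, 0, :]

def mixup_loss(sents_i, sents_j, y_i, y_j):
    h_i, h_j = encode(sents_i), encode(sents_j)
    x_hat = lam * h_i + (1.0 - lam) * h_j  # interpolated hidden representation
    y_hat = lam * F.one_hot(y_i, 2).float() + (1.0 - lam) * F.one_hot(y_j, 2).float()
    logits = classifier(x_hat)
    # cross entropy against the soft, interpolated label y_hat
    return -(y_hat * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

# Two toy batches: a literal and a metonymic location reading
loss = mixup_loss(["He moved to London last year."],
                  ["London voted against the proposal."],
                  torch.tensor([0]), torch.tensor([1]))
loss.backward()
```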
***
## 🗃️ Data <a name="data"></a>
The datasets used in this project are taken from Li et al. [^6] We confine ourselves to the following three:
- **SemEval** [^8]
## 📑 References <a name="references"></a>
[^1]:Bayer, Markus, Kaufhold, Marc-André & Reuter, Christian. ["A survey on data augmentation for text classification."](https://arxiv.org/abs/2107.03158) CoRR, 2021.
[^2]:Chen, Jiaao, Wu, Yuwei & Yang, Diyi. ["Semi-supervised models via data augmentation for classifying interactive affective responses."](https://arxiv.org/abs/2004.10972) 2020.
[^3]:Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton & Toutanova, Kristina. ["BERT: pre-training of deep bidirectional transformers for language understanding."](http://arxiv.org/abs/1810.04805) CoRR, 2018.
[^5]:Gritta, Milan, Pilehvar, Mohammad Taher, Limsopatham, Nut & Collier, Nigel. ["Vancouver welcomes you! Minimalist location metonymy resolution."](https://aclanthology.org/P17-1115) Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017.
[^6]:Li, Haonan, Vasardani, Maria, Tomko, Martin & Baldwin, Timothy. ["Target word masking for location metonymy resolution."](https://aclanthology.org/2020) Proceedings of the 28th International Conference on Computational Linguistics, December 2020.
[^7]:Liu, Yinhan, Ott, Myle, Goyal, Naman, Du, Jingfei, Joshi, Mandar, Chen, Danqi, Levy, Omer, Lewis, Mike, Zettlemoyer, Luke & Stoyanov, Veselin. ["RoBERTa: A robustly optimized BERT pretraining approach."](https://dblp.org/rec/journals/corr/abs-1907-11692.bib) CoRR, 2019.
[^8]:Markert, Katja & Nissim, Malvina. ["SemEval-2007 task 08: Metonymy resolution at SemEval-2007."](https://aclanthology.org/S07-1007) Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), 2007.
[^9]:Ott, Myle, Edunov, Sergey, Baevski, Alexei, Fan, Angela, Gross, Sam, Ng, Nathan, Grangier, David & Auli, Michael. ["fairseq: A fast, extensible toolkit for sequence modeling."](https://aclanthology.org/N19-4009) Proceedings of NAACL-HLT 2019: Demonstrations, 2019.
[^10]:Sun, Lichao, Xia, Congying, Yin, Wenpeng, Liang, Tingting, Yu, Philip S. & He, Lifang. ["Mixup-transformer: Dynamic data augmentation for NLP tasks."](https://arxiv.org/abs/2010.02394) arXiv preprint arXiv:2010.02394, 2020.