## 📚 Project documents <a name="documents"></a>
This README gives a rough overview of the project. The full documentation and additional information can be found in the documents listed below.
- 📝 [Research Plan](documentation/organization/research_plan.pdf)
- 🧭 [Specification Presentation](documentation/organization/specification_presentation.pdf)
- 📖 [Project Report](LINK) *(to be added)*
- 🎤 [Final Presentation](LINK) *(to be added)*
***
## 🔎 Metonymy Resolution <a name="metonymy"></a>
A metonymy is the replacement of the actual expression by another one that is closely associated with it [^4].
Metonymies use a contiguity relation between two domains.
**Metonymy resolution** is about determining whether a potentially metonymic word is used metonymically in a particular context. In this project we focus on `metonymic` and `literal` readings for locations and organizations.
ℹ️ Sentences that allow for mixed readings, where both a literal and a metonymic sense are evoked, are considered `non-literal` in this project. This is true of the following sentence, in which the term *Nigeria* prompts both a metonymic and a literal reading.
- "They arrived in **Nigeria**, hitherto a leading critic of [...]" [ZITAT: SemEval-2007 Task 08: Metonymy Resolution at SemEval-2007]
- "They arrived in **Nigeria**, hitherto a leading critic of [...]" [^8]
➡️ Hence, we use the two classes `non-literal` and `literal` for our binary classification task.
***
***
## 💡 Methods <a name="methods"></a>
When selecting methods for our task, the main goal was to strike a trade-off between preserving labels and diversifying our dataset. Since the language models BERT [^3] and RoBERTa [^7] have not been found to profit from very basic augmentation strategies (e.g. case changing of single characters or embedding replacements [^1]), we chose more innovative and challenging methods.
To compare the influence of augmentations in different spaces, we select one method for the data space and two methods for the feature space.
### 📝 1. Backtranslation (Data Space)<a name="backtranslation"></a>
As a comparatively safe (i.e. label-preserving) data augmentation strategy, we selected *backtranslation* using the machine translation toolkit fairseq [^9]. Adapting the approach of Chen et al. [^2], we use the pre-trained single models:
- [`transformer.wmt19.en-de.single_model`](https://huggingface.co/facebook/wmt19-en-de)
- [`transformer.wmt19.de-en.single_model`](https://huggingface.co/facebook/wmt19-de-en)
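
A minimal sketch of how this backtranslation round trip could look, assuming the two models above are loaded through `torch.hub` with fairseq's hub interface (the Moses/fastBPE settings and the example sentence are illustrative):

```python
import torch

# Load the pre-trained WMT19 single models listed above via torch.hub
# (assumes the fairseq package and its hub dependencies are installed).
en2de = torch.hub.load("pytorch/fairseq", "transformer.wmt19.en-de.single_model",
                       tokenizer="moses", bpe="fastbpe")
de2en = torch.hub.load("pytorch/fairseq", "transformer.wmt19.de-en.single_model",
                       tokenizer="moses", bpe="fastbpe")

def backtranslate(sentence: str) -> str:
    """Translate EN -> DE -> EN to obtain a paraphrase of the input sentence."""
    german = en2de.translate(sentence)
    return de2en.translate(german)

# The round trip varies the surrounding wording while (ideally) keeping the
# literal or non-literal reading of the target word intact.
print(backtranslate("They arrived in Nigeria, hitherto a leading critic of the policy."))
```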
### 🍸 2. MixUp (Feature Space)<a name="mixup"></a>
Our method adopts the framework of the *MixUp* transformer proposed by Sun et al. [^10]. This approach interpolates the representations of two instances on the last hidden state of the transformer model (in our case, `BERT-base-uncased`).
To derive the interpolated hidden representation and the corresponding label, we apply the following formulas to the representations of two data samples:
$$\hat{x} = \lambda T(x_i) + (1 - \lambda)T(x_j)$$

$$\hat{y} = \lambda T(y_i) + (1 - \lambda)T(y_j)$$
Here, $T(x_i)$ and $T(x_j)$ represent the hidden representations of the two instances, and $T(y_i)$ and $T(y_j)$ represent their corresponding labels. $\lambda$ is a mixing coefficient that determines the degree of the interpolation.
We use a fixed $\lambda$ that is set for the entire training process. The derived instance $\hat{x}$, with the derived label $\hat{y}$ as its new true label, is then passed to the classifier to generate a prediction.
The *MixUp* process can be used dynamically during training at any epoch.
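
A minimal sketch of this interpolation step, assuming one-hot label vectors, a linear classifier head on BERT's `[CLS]` representation, and an illustrative fixed $\lambda = 0.7$ (the helper name `mixup_step` and the example sentences are ours, not part of the project code):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)  # literal vs. non-literal

def mixup_step(batch_i, batch_j, y_i, y_j, lam=0.7):
    """Interpolate the [CLS] representations of two batches and their one-hot
    labels with a fixed mixing coefficient, then classify the mixed representation."""
    h_i = encoder(**batch_i).last_hidden_state[:, 0]  # T(x_i)
    h_j = encoder(**batch_j).last_hidden_state[:, 0]  # T(x_j)
    x_hat = lam * h_i + (1 - lam) * h_j               # interpolated representation
    y_hat = lam * y_i + (1 - lam) * y_j               # interpolated (soft) label
    logits = classifier(x_hat)
    # Cross-entropy of the prediction against the soft label y_hat
    loss = -(y_hat * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
    return loss

# Usage: mix a literal and a non-literal example (illustrative label order).
b_i = tokenizer("He lives in Paris.", return_tensors="pt")
b_j = tokenizer("Paris refused to comment.", return_tensors="pt")
y_i = torch.tensor([[1.0, 0.0]])  # literal
y_j = torch.tensor([[0.0, 1.0]])  # non-literal
loss = mixup_step(b_i, b_j, y_i, y_j)
loss.backward()
```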
***
## 🗃️ Data <a name="data"></a>
The datasets used in this project are taken from Li et al. [^6]. We confine ourselves to the following three:
| 1. **SemEval: Locations** [^8] | 2. **SemEval: Companies & Organizations** [^8] | 3. **ReLocar: Locations** [^5] |
| ---------------- | -------------------------------- | -----------------------------------------|
| <img src="documentation/images/semeval_loc_metonym_ratio.png"> | <img src="documentation/images/semeval_org_metonym_ratio.png"> | <img src="documentation/images/relocar_metonym_ratio.png"> |
🖊️ **Data Point Example:**
## 📑 References <a name="references"></a>
[^1]: Bayer, Markus, Kaufhold, Marc-André & Reuter, Christian. ["A survey on data augmentation for text classification."](https://arxiv.org/abs/2107.03158) CoRR, 2021.
[^2]: Chen, Jiaao, Wu, Yuwei & Yang, Diyi. ["Semi-supervised models via data augmentation for classifying interactive affective responses."](https://arxiv.org/abs/2004.10972) 2020.
[^3]: Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton & Toutanova, Kristina. ["BERT: pre-training of deep bidirectional transformers for language understanding."](http://arxiv.org/abs/1810.04805) CoRR, 2018.
[^4]: Oxford English Dictionary. ["Metonymy."](https://www.oxfordbibliographies.com/view/document/obo-9780199772810/obo-9780199772810-0252.xml)
[^5]: Gritta, Milan, Pilehvar, Mohammad Taher, Limsopatham, Nut & Collier, Nigel. ["Vancouver welcomes you! Minimalist location metonymy resolution."](https://aclanthology.org/P17-1115) Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017.
[^6]: Li, Haonan, Vasardani, Maria, Tomko, Martin & Baldwin, Timothy. ["Target word masking for location metonymy resolution."](https://aclanthology.org/2020) Proceedings of the 28th International Conference on Computational Linguistics, December 2020.
[^7]: Liu, Yinhan, Ott, Myle, Goyal, Naman, Du, Jingfei, Joshi, Mandar, Chen, Danqi, Levy, Omer, Lewis, Mike, Zettlemoyer, Luke & Stoyanov, Veselin. ["RoBERTa: A robustly optimized BERT pretraining approach."](https://dblp.org/rec/journals/corr/abs-1907-11692.bib) CoRR, 2019.
[^8]: Markert, Katja & Nissim, Malvina. ["SemEval-2007 task 08: Metonymy resolution at SemEval-2007."](https://aclanthology.org/S07-1007) Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), 2007.
[^9]: Ott, Myle, Edunov, Sergey, Baevski, Alexei, Fan, Angela, Gross, Sam, Ng, Nathan, Grangier, David & Auli, Michael. ["fairseq: A fast, extensible toolkit for sequence modeling."](https://aclanthology.org/N19-4009) Proceedings of NAACL-HLT 2019: Demonstrations, 2019.
[^10]: Sun, Lichao, Xia, Congying, Yin, Wenpeng, Liang, Tingting, Yu, Philip S. & He, Lifang. ["Mixup-transformer: dynamic data augmentation for NLP tasks."](https://arxiv.org/abs/2010.02394) 2020.