Commit e16fa7bc authored by friebolin's avatar friebolin
Add Tmix

parent 16fd7ca2
......@@ -13,6 +13,7 @@ Members of the project:
4. 💡 [Methods](#methods)
1. 📝 [Backtranslation](#backtranslation)
2. 🍸 [MixUp](#mixup)
3. 🌐 [TMix](#tmix)
5. 🗃️ [Data](#data)
6. 🛠️ [Set Up](#set-up)
7. ⚙️ [Usage](#usage)
......@@ -80,7 +81,7 @@ When selecting methods for our task, the main goal was to find a tradeoff betwee
To be able to compare the influence of augmentations in different spaces, we select a method for data space and two methods for the feature space.
### 📝 1. Backtranslation (Data Space)<a name="backtranslation"></a>
As a comparatively safe (= label-preserving) data augmentation strategy, we selected *backtranslation* using the machine translation model Fairseq [^9]. Adapting the approach of Chen et al. [^2] we use the pre-trained single models:
As a comparatively safe (= label-preserving) data augmentation strategy, we selected *backtranslation* using the machine translation model Fairseq [^9]. Adapting the approach of Chen et al. [^2a] we use the pre-trained single models:
- [`transformer.wmt19.en-de.single_model`](https://huggingface.co/facebook/wmt19-en-de)
- [`transformer.wmt19.de-en.single_model`](https://huggingface.co/facebook/wmt19-de-en)
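The round trip through German can be sketched as follows. This is a minimal, illustrative sketch: `en2de` and `de2en` stand in for the two Fairseq translation models listed above, and the function name is our own, not part of the repository.

```python
def backtranslate(sentence, en2de, de2en):
    """Round-trip a sentence through a pivot language (here German).

    The result is a paraphrase of the input that ideally preserves
    the original label, which is what makes backtranslation a
    comparatively safe augmentation strategy.
    """
    german = en2de(sentence)   # EN -> DE with the first model
    return de2en(german)       # DE -> EN with the second model
```

In practice, `en2de` and `de2en` would be the `translate` methods of the loaded `wmt19` single models.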
......@@ -124,6 +125,10 @@ Here, $T(x_i)$ and $T(x_j)$ represent the hidden representations of the two inst
We used a fixed $\lambda$ which was set for the entire training process. The derived instances $\hat{x}$, with the derived labels $\hat{y}$ as new ground truth, are then fed into the classifier to generate predictions.
The *MixUp* process can be used dynamically during training at any epoch.
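The interpolation step described above can be sketched as follows. This is a minimal sketch assuming NumPy arrays for the hidden representations $T(x_i)$, $T(x_j)$ and one-hot label vectors; the function name and signature are illustrative, not the repository's actual API.

```python
import numpy as np

def mixup(h_i, h_j, y_i, y_j, lam=0.7):
    """Linearly interpolate two hidden representations and their labels.

    h_i, h_j: hidden representations T(x_i), T(x_j) of two instances
    y_i, y_j: one-hot label vectors of the two instances
    lam:      the fixed mixing coefficient used for the whole training run
    """
    h_hat = lam * h_i + (1 - lam) * h_j  # derived instance x_hat
    y_hat = lam * y_i + (1 - lam) * y_j  # derived (soft) label y_hat
    return h_hat, y_hat
```

With $\lambda = 0.7$, mixing a positive and a negative instance yields a soft label of $(0.7, 0.3)$, which the classifier is then trained against.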
### 🌐 3. TMix (Feature Space)<a name="tmix"></a>
We use the same fixed $\lambda$, but in contrast to *MixUp*, *TMix* is applied in all epochs [^2b]. It can be applied dynamically at any layer; we focus our experiments on interpolation at transformer layers 7 and 9, since these have been found to contain the syntactic and semantic information.
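The layer-level interpolation can be sketched as follows: both inputs are run separately through the encoder layers up to the chosen mix layer, their hidden states are interpolated there with the fixed $\lambda$, and only the mixed state is passed through the remaining layers. This is an illustrative sketch, not the repository's implementation; `layers` stands in for the stack of transformer-layer callables.

```python
def tmix_forward(layers, h_i, h_j, lam=0.7, mix_layer=7):
    """Forward two inputs through the encoder, mixing at `mix_layer`.

    layers:    sequence of layer callables (stand-ins for transformer layers)
    h_i, h_j:  input representations of the two instances
    lam:       fixed mixing coefficient
    mix_layer: index of the layer at which the hidden states are interpolated
    """
    h = None
    for m, layer in enumerate(layers):
        if m < mix_layer:
            # below the mix layer: process both instances separately
            h_i, h_j = layer(h_i), layer(h_j)
        else:
            if h is None:
                # at the mix layer: interpolate the two hidden states once
                h = lam * h_i + (1 - lam) * h_j
            # above the mix layer: continue with the mixed state only
            h = layer(h)
    return h
```

The mixed label is derived exactly as in *MixUp*; only the point in the network where the interpolation happens differs.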
***
## 🗃️ Data <a name="data"></a>
......@@ -211,7 +216,10 @@ For `<COMMAND>` you must enter one of the commands you find in the list below, w
[^1]: Bayer, Markus, Kaufhold, Marc-André & Reuter, Christian. ["A survey on data augmentation for text classification."](https://arxiv.org/abs/2107.03158) CoRR, 2021.
[^2]: Chen, Jiaao, Wu, Yuwei & Yang, Diyi. ["Semi-supervised models via data augmentation for classifying interactive affective responses."](https://arxiv.org/abs/2004.10972) 2020.
[^2a]: Chen, Jiaao, Wu, Yuwei & Yang, Diyi. ["Semi-Supervised Models via Data Augmentation for Classifying Interactive Affective Responses."](https://arxiv.org/abs/2004.10972) 2020.
[^2b]: Chen, Jiaao, Wu, Yuwei & Yang, Diyi. ["MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification."](https://aclanthology.org/2020.acl-main.194) 2020.
[^3]: Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton & Toutanova, Kristina. ["BERT: pre-training of deep bidirectional transformers for language understanding."](http://arxiv.org/abs/1810.04805) CoRR, 2018.
......