From e16fa7bc79da373635a44629e84ba876ae265f5f Mon Sep 17 00:00:00 2001
From: friebolin <friebolin@cl.uni-heidelberg.de>
Date: Fri, 24 Feb 2023 14:12:44 +0100
Subject: [PATCH] Add TMix

---
 README.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index f72afc0..4a40193 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,7 @@ Members of the project:
 4. 💡 [Methods](#methods)
    1. 📠 [Backtranslation](#backtranslation)
    2. 🍸 [MixUp](#mixup)
+   3. 🌐 [TMix](#tmix)
 5. 🗃️ [Data](#data)
 6. 🛠️ [Set Up](#set-up)
 7. ⚙️ [Usage](#usage)
@@ -80,7 +81,7 @@ When selecting methods for our task, the main goal was to find a tradeoff betwee
 To be able to compare the influence of augmentations in different spaces, we select one method for the data space and two methods for the feature space.
 
 ### 📠 1. Backtranslation (Data Space)<a name="backtranslation"></a>
-As a comparatively safe (= label preserving) data augmentation strategy, we selected *backtranslation* using the machine translation model Fairseq [^9]. Adapting the approach of Chen et al. [^2] we use the pre-trained single models :
+As a comparatively safe (= label-preserving) data augmentation strategy, we selected *backtranslation* using the machine translation toolkit Fairseq [^9]. Adapting the approach of Chen et al. [^2a], we use the pre-trained single models:
 - [`transformer.wmt19.en-de.single_model`](https://huggingface.co/facebook/wmt19-en-de)
 - [`transformer.wmt19.de-en.single_model`](https://huggingface.co/facebook/wmt19-de-en)
@@ -124,6 +125,10 @@ Here, $T(x_i)$ and $T(x_j)$ represent the hidden representations of the two inst
 We used a fixed $\lambda$, which was set for the entire training process.
 In the following, the derived instances $\hat{x}$ with the derived labels $\hat{y}$ as new true labels are fed into the classifier to generate a prediction.
 The *MixUp* process can be applied dynamically during training at any epoch.
+
+### 🌐 3. TMix (Feature Space)<a name="tmix"></a>
+We use the same fixed $\lambda$, but in contrast to *MixUp*, *TMix* is applied in all epochs [^2b]. It can be applied dynamically at any layer; we focus our experiments on transformer layers 7 and 9 for the interpolation, since these layers have been found to contain syntactic and semantic information.
+
 ***
 
 ## 🗃️ Data <a name="data"></a>
@@ -211,7 +216,10 @@ For `<COMMAND>` you must enter one of the commands you find in the list below, w
 
 [^1]: Bayer, Markus, Kaufhold, Marc-André & Reuter, Christian. ["A survey on data augmentation for text classification."](https://arxiv.org/abs/2107.03158) CoRR, 2021.
 
-[^2]: Chen, Jiaao, Wu, Yuwei & Yang, Diyi. ["Semi-supervised models via data augmentation for classifying interactive affective responses."](https://arxiv.org/abs/2004.10972) 2020.
+[^2a]: Chen, Jiaao, Wu, Yuwei & Yang, Diyi. ["Semi-Supervised Models via Data Augmentation for Classifying Interactive Affective Responses."](https://arxiv.org/abs/2004.10972) 2020.
+
+[^2b]: Chen, Jiaao, Wu, Yuwei & Yang, Diyi. ["MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification."](https://aclanthology.org/2020.acl-main.194) 2020.
+
 [^3]: Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton & Toutanova, Kristina. ["BERT: pre-training of deep bidirectional transformers for language understanding."](http://arxiv.org/abs/1810.04805) CoRR, 2018.
--
GitLab
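
For illustration, the backtranslation round trip described in the patch can be sketched with fairseq's `torch.hub` interface. This is a minimal sketch, not code from this repository; it assumes the `fastBPE` and `sacremoses` dependencies are installed, and `backtranslate` is an illustrative helper name:

```python
import torch

# Pre-trained WMT'19 single models named in the patch, loaded via torch.hub.
en2de = torch.hub.load("pytorch/fairseq", "transformer.wmt19.en-de.single_model",
                       tokenizer="moses", bpe="fastbpe")
de2en = torch.hub.load("pytorch/fairseq", "transformer.wmt19.de-en.single_model",
                       tokenizer="moses", bpe="fastbpe")

def backtranslate(sentence: str) -> str:
    """EN -> DE -> EN round trip; the result is a (usually) label-preserving paraphrase."""
    return de2en.translate(en2de.translate(sentence))

print(backtranslate("The movie was surprisingly good."))
```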
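The *MixUp* step itself is a convex combination of the two hidden representations $T(x_i)$, $T(x_j)$ and their one-hot labels under the fixed $\lambda$. A minimal sketch; the function name and the $\lambda$ value are illustrative, not taken from this repository:

```python
import torch

def mixup(h_i: torch.Tensor, h_j: torch.Tensor,
          y_i: torch.Tensor, y_j: torch.Tensor, lam: float = 0.4):
    """Interpolate hidden representations T(x_i), T(x_j) and one-hot labels."""
    h_hat = lam * h_i + (1 - lam) * h_j  # derived instance
    y_hat = lam * y_i + (1 - lam) * y_j  # derived soft label
    return h_hat, y_hat

# Example: two [CLS] vectors and binary one-hot labels.
h_i, h_j = torch.randn(768), torch.randn(768)
y_i, y_j = torch.tensor([1.0, 0.0]), torch.tensor([0.0, 1.0])
h_hat, y_hat = mixup(h_i, h_j, y_i, y_j)
```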
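*TMix* moves the same interpolation inside the encoder: both instances are encoded separately up to a chosen layer (7 or 9 in the experiments above), mixed there, and only the mixed hidden states pass through the remaining layers. A schematic sketch under the assumption of a generic 12-layer encoder; `nn.TransformerEncoderLayer` merely stands in for BERT's layers and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

# Toy 12-layer stack standing in for the BERT encoder.
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
    for _ in range(12)
)

def tmix_forward(h_i, h_j, lam=0.4, mix_layer=7):
    """Encode h_i and h_j separately up to mix_layer (assumed < 12),
    interpolate there, then pass only the mixed states onward."""
    mixed = None
    for k, layer in enumerate(layers):
        if k < mix_layer:
            h_i, h_j = layer(h_i), layer(h_j)
        else:
            if k == mix_layer:
                mixed = lam * h_i + (1 - lam) * h_j  # TMix interpolation
            mixed = layer(mixed)
    return mixed

x_i = torch.randn(2, 16, 768)  # [batch, seq_len, hidden]
x_j = torch.randn(2, 16, 768)
out = tmix_forward(x_i, x_j, lam=0.4, mix_layer=9)
```

The mixed label $\hat{y}$ is formed exactly as in the *MixUp* sketch above.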