From e16fa7bc79da373635a44629e84ba876ae265f5f Mon Sep 17 00:00:00 2001
From: friebolin <friebolin@cl.uni-heidelberg.de>
Date: Fri, 24 Feb 2023 14:12:44 +0100
Subject: [PATCH] Add TMix

---
 README.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index f72afc0..4a40193 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,7 @@ Members of the project:
 4. 💡 [Methods](#methods)
     1. 📝 [Backtranslation](#backtranslation)
     2. 🍸 [MixUp](#mixup)
+    3. 🌐 [TMix](#tmix)
 5. 🗃️ [Data](#data)
 6. 🛠️ [Set Up](#set-up)
 7. ⚙️ [Usage](#usage)
@@ -80,7 +81,7 @@ When selecting methods for our task, the main goal was to find a tradeoff betwee
 To be able to compare the influence of augmentations in different spaces, we select a method for data space and two methods for the feature space.
 
 ### 📝 1. Backtranslation (Data Space)<a name="backtranslation"></a>
-As a comparatively safe (= label preserving) data augmentation strategy, we selected *backtranslation* using the machine translation model Fairseq [^9]. Adapting the approach of Chen et al. [^2] we use the pre-trained single models :
+As a comparatively safe (= label-preserving) data augmentation strategy, we selected *backtranslation* using the machine translation toolkit Fairseq [^9]. Adapting the approach of Chen et al. [^2a], we use the following pre-trained single models:
 
     - [`transformer.wmt19.en-de.single_model`](https://huggingface.co/facebook/wmt19-en-de)
     - [`transformer.wmt19.de-en.single_model`](https://huggingface.co/facebook/wmt19-de-en)
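+
+To make this concrete, these models can be loaded through fairseq's `torch.hub` interface, as documented by fairseq; the snippet below is a minimal illustrative sketch (the exact loading code in this repository may differ):
+
+```python
+import torch
+
+# Load the two pre-trained single models listed above
+# (requires fairseq, fastBPE and sacremoses).
+en2de = torch.hub.load("pytorch/fairseq", "transformer.wmt19.en-de.single_model",
+                       tokenizer="moses", bpe="fastbpe")
+de2en = torch.hub.load("pytorch/fairseq", "transformer.wmt19.de-en.single_model",
+                       tokenizer="moses", bpe="fastbpe")
+
+def backtranslate(sentence: str) -> str:
+    """Translate EN -> DE -> EN to obtain a label-preserving paraphrase."""
+    return de2en.translate(en2de.translate(sentence))
+
+print(backtranslate("The movie was surprisingly good."))
+```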
@@ -124,6 +125,10 @@ Here, $T(x_i)$ and $T(x_j)$ represent the hidden representations of the two inst
 We used a fixed $\lambda$ which was set for the entire training process. In the following, the derived instances $\hat{x}$ with the derived label $\hat{y}$ as new true label are given into the classifier to generate a prediction.
 The *MixUp* process can be used dynamically during training at any epoch.
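+
+As a sketch of this interpolation (a minimal illustration; the function and variable names are ours, not the repository's):
+
+```python
+import torch
+
+def mixup(h_i: torch.Tensor, h_j: torch.Tensor,
+          y_i: torch.Tensor, y_j: torch.Tensor, lam: float = 0.5):
+    """Interpolate two hidden representations T(x_i), T(x_j) and their
+    (one-hot) labels with a fixed lambda, yielding x_hat and y_hat."""
+    x_hat = lam * h_i + (1.0 - lam) * h_j
+    y_hat = lam * y_i + (1.0 - lam) * y_j
+    return x_hat, y_hat
+```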
 
+
+### 🌐 3. TMix (Feature Space)<a name="tmix"></a>
+We use the same fixed $\lambda$ but, in contrast to *MixUp*, *TMix* is applied in all epochs [^2b]. It can be applied dynamically at any layer; we focus our experiments on transformer layers 7 and 9 for the interpolation, since these layers have been found to capture syntactic and semantic information.
+
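+In code, the interpolation at a chosen transformer layer looks roughly as follows (a minimal sketch assuming a Hugging Face `BertModel`; the names and the exact forward pass are ours, and the labels are additionally mixed as in *MixUp*):
+
+```python
+import torch
+from transformers import BertModel
+
+def tmix_hidden(model: BertModel, ids_i: torch.Tensor, ids_j: torch.Tensor,
+                lam: float = 0.5, mix_layer: int = 7) -> torch.Tensor:
+    """Encode two inputs separately up to `mix_layer`, interpolate their
+    hidden states there, and run the mix through the remaining layers."""
+    h_i = model.embeddings(ids_i)
+    h_j = model.embeddings(ids_j)
+    for k, layer in enumerate(model.encoder.layer):
+        if k < mix_layer:                 # encode both inputs separately
+            h_i = layer(h_i)[0]
+            h_j = layer(h_j)[0]
+        else:
+            if k == mix_layer:            # interpolate once, then continue
+                h_i = lam * h_i + (1.0 - lam) * h_j
+            h_i = layer(h_i)[0]
+    return h_i  # mixed representation passed on to the classifier head
+```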
 ***
 
 ## 🗃️ Data <a name="data"></a>
@@ -211,7 +216,10 @@ For `<COMMAND>` you must enter one of the commands you find in the list below, w
 
 [^1]: Bayer, Markus, Kaufhold, Marc-André & Reuter, Christian. ["A survey on data augmentation for text classification."](https://arxiv.org/abs/2107.03158) CoRR, 2021.
 
-[^2]: Chen, Jiaao, Wu, Yuwei & Yang, Diyi. ["Semi-supervised models via data augmentation for classifying interactive affective responses."](https://arxiv.org/abs/2004.10972) 2020.
+[^2a]: Chen, Jiaao, Wu, Yuwei & Yang, Diyi. ["Semi-Supervised Models via Data Augmentation for Classifying Interactive Affective Responses."](https://arxiv.org/abs/2004.10972) CoRR, 2020.
+
+[^2b]: Chen, Jiaao, Wu, Yuwei & Yang, Diyi. ["MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification."](https://aclanthology.org/2020.acl-main.194) ACL, 2020.
 
 [^3]: Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton & Toutanova, Kristina. ["BERT: pre-training of deep bidirectional transformers for language understanding."](http://arxiv.org/abs/1810.04805) CoRR, 2018. 
 
-- 
GitLab