Seminar project for the course *(Trans|Lin|Long|...)former: Self-Attention Mechanisms…*
[TOC]
## 1. (M|N)LP-Mixer :computer:
The work "[MLP-Mixer: An all-MLP Architecture for Vision](https://arxiv.org/pdf/2105.01601.pdf)" introduced an architecture called MLP-Mixer, that relies on multi-layer perceptrons (MLPs), applies it to image classification and offers an alternative to Transformer Networks. There are two different layer types in MLP-Mixer: one MLP-layer that acts on single image segments ("token mixing") and one MLP-layer that acts across those segments ("channel mixing"). For details see the following figure:
<img src="./mlp-mixer.png" width="500px">
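As a rough illustration, one Mixer block can be written in a few lines of PyTorch. This is a minimal sketch, not this project's implementation: the GELU non-linearity and the token/channel mixing order follow the paper, while all class names and dimensions are illustrative.

```
import torch.nn as nn

class Mlp(nn.Module):
    """Two-layer perceptron with GELU, used for both mixing steps."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)

class MixerBlock(nn.Module):
    """One Mixer layer: token mixing across patches, then channel mixing
    within each patch, both with LayerNorm and residual connections."""
    def __init__(self, num_tokens, channels, token_hidden, channel_hidden):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.token_mlp = Mlp(num_tokens, token_hidden)
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mlp = Mlp(channels, channel_hidden)

    def forward(self, x):                           # x: (batch, tokens, channels)
        y = self.norm1(x).transpose(1, 2)            # (batch, channels, tokens)
        x = x + self.token_mlp(y).transpose(1, 2)    # mix across tokens
        return x + self.channel_mlp(self.norm2(x))   # mix across channels
```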
The aims of this project are:
* reconfigure the MLP-Mixer architecture for NLP tasks (➜ NLP-Mixer; see the sketch after this list)
* create a comparable (convolutional) baseline
* conduct experiments on datasets that are also used by other efficient Transformer approaches
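For the first aim, a plausible reconfiguration (a sketch only, reusing `MixerBlock` from above; none of these names or sizes are taken from the project code) is to let token embeddings play the role of patch embeddings, so that sequence positions become the tokens and embedding dimensions the channels:

```
class NlpMixer(nn.Module):
    """Toy NLP-Mixer: embed token ids, apply Mixer blocks, mean-pool, classify."""
    def __init__(self, vocab_size, seq_len, channels, depth, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, channels)
        self.blocks = nn.Sequential(*[
            MixerBlock(num_tokens=seq_len, channels=channels,
                       token_hidden=256, channel_hidden=512)
            for _ in range(depth)])
        self.head = nn.Linear(channels, num_classes)

    def forward(self, ids):                # ids: (batch, seq_len) token ids
        x = self.blocks(self.embed(ids))   # (batch, seq_len, channels)
        return self.head(x.mean(dim=1))    # pool over the token dimension
```

Note that the token-mixing MLP has a fixed input width, so all sequences would have to be padded or truncated to `seq_len`.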
## 2. Instructions
### Installing the required modules
```
# assuming a standard pip setup (the exact install command is an assumption)
pip install -r requirements.txt
```

A dataset is loaded by name through the `Data` class; setting its `path` attribute points the loader to a local copy of the data:

```
dataloader = Data("arxiv")
dataloader.path = "/home/students/holzinger/tmp/arxiv"
```
## 3. Datasets :floppy_disk:
To compare against results from other publications ([Big Bird](https://proceedings.neurips.cc/paper/2020/hash/c8512d142a2d849725f31a9a7a361ab9-Abstract.html), [Linformer](https://arxiv.org/abs/2006.04768) and [Longformer](https://arxiv.org/abs/2004.05150)), the following text classification datasets are used:
| max.words | 2470 | 52 | 104698 |
## 4. Experiments :microscope:
For all experiments, the same parameters (epochs, learning rate, layer dimensionality, etc.) are used; they can be found in the [result files](/results/). Only the embedding type (default / pretrained) is changed.
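The two embedding variants could look as follows in PyTorch (a sketch only; the sizes are illustrative, and the random tensor stands in for whatever pretrained word vectors are actually used):

```
import torch
import torch.nn as nn

vocab_size, channels = 30_000, 256  # illustrative sizes

# default: randomly initialised embeddings, trained from scratch
embed_default = nn.Embedding(vocab_size, channels)

# pretrained: initialised from existing word vectors and kept trainable;
# here a random tensor stands in for vectors loaded from disk
pretrained_weights = torch.randn(vocab_size, channels)
embed_pretrained = nn.Embedding.from_pretrained(pretrained_weights, freeze=False)
```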
### 4.1 Baseline
Instead of MLP-Mixer blocks, convolution blocks are used; otherwise the architecture stays the same (see the sketch below).
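A hedged sketch of such a block (kernel size, padding and the residual layout are assumptions, not taken from the result files):

```
class ConvBlock(nn.Module):
    """Baseline block: a 1-D convolution over the token dimension replaces
    the two mixing MLPs; otherwise used exactly like MixerBlock."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x):                     # x: (batch, tokens, channels)
        y = self.norm(x).transpose(1, 2)      # Conv1d wants (batch, channels, tokens)
        return x + self.conv(y).transpose(1, 2)
```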
## 5. Results :bar_chart:
Accuracy (%) on the text classification datasets:
<table>
<tr>
<td>Baseline</td>
<td>84.92</td>
<td>82.45</td>
<td>90.94</td>
</tr>
<tr>
<td>NLP-Mixer</td>
<td>85.00</td>
<td>80.50</td>
<td>93.43</td>
</tr>
<tr>
<td colspan="4" style="text-align: center; vertical-align: middle;">with pretraining</td>
</tr>
<tr>
<td>Baseline</td>
<td>88.40</td>
<td>91.86</td>
<td>92.97</td>
</tr>
<tr>
<td>NLP-Mixer</td>
<td>88.27</td>
<td>92.89</td>
<td>95.00</td>
</tr>
</table>
\* *all 6 labels*
Speed comparison:
TBD
## 6. Conclusion :bulb: