Seminar project for the course *(Trans|Lin|Long|...)former: Self-Attention Mechanisms…*
[TOC]
## 1. (M|N)LP-Mixer :computer:
The work "[MLP-Mixer: An all-MLP Architecture for Vision](https://arxiv.org/pdf/2105.01601.pdf)" introduced an architecture called MLP-Mixer, that relies on multi-layer perceptrons (MLPs), applies it to image classification and offers an alternative to Transformer Networks. There are two different layer types in MLP-Mixer: one MLP-layer that acts on single image segments ("token mixing") and one MLP-layer that acts across those segments ("channel mixing"). For details see the following figure:
<img src="./mlp-mixer.png" width="500px">
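As a rough illustration, one Mixer block can be written in a few lines of PyTorch. This is a minimal sketch, not this project's implementation: the GELU non-linearity and the token/channel mixing order follow the paper, while all class names and dimensions are illustrative.

```
import torch.nn as nn

class Mlp(nn.Module):
    """Two-layer perceptron with GELU, used for both mixing steps."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)

class MixerBlock(nn.Module):
    """One Mixer layer: token mixing across patches, then channel mixing
    within each patch, both with LayerNorm and residual connections."""
    def __init__(self, num_tokens, channels, token_hidden, channel_hidden):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.token_mlp = Mlp(num_tokens, token_hidden)
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mlp = Mlp(channels, channel_hidden)

    def forward(self, x):                           # x: (batch, tokens, channels)
        y = self.norm1(x).transpose(1, 2)            # (batch, channels, tokens)
        x = x + self.token_mlp(y).transpose(1, 2)    # mix across tokens
        return x + self.channel_mlp(self.norm2(x))   # mix across channels
```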
The aims of this project are:
* reconfigure the MLP-Mixer architecture for NLP tasks (➜ NLP-Mixer; see the sketch after this list)
* create a comparable (convolutional) baseline
* conduct experiments on datasets that are also used by other efficient Transformer approaches
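For the first aim, a plausible reconfiguration (a sketch only, reusing `MixerBlock` from above; none of these names or sizes are taken from the project code) is to let token embeddings play the role of patch embeddings, so that sequence positions become the tokens and embedding dimensions the channels:

```
class NlpMixer(nn.Module):
    """Toy NLP-Mixer: embed token ids, apply Mixer blocks, mean-pool, classify."""
    def __init__(self, vocab_size, seq_len, channels, depth, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, channels)
        self.blocks = nn.Sequential(*[
            MixerBlock(num_tokens=seq_len, channels=channels,
                       token_hidden=256, channel_hidden=512)
            for _ in range(depth)])
        self.head = nn.Linear(channels, num_classes)

    def forward(self, ids):                # ids: (batch, seq_len) token ids
        x = self.blocks(self.embed(ids))   # (batch, seq_len, channels)
        return self.head(x.mean(dim=1))    # pool over the token dimension
```

Note that the token-mixing MLP has a fixed input width, so all sequences would have to be padded or truncated to `seq_len`.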
## 2. Instructions
### Installing the required modules
```
# assuming a standard pip setup (the exact install command is an assumption)
pip install -r requirements.txt
```

A dataset is loaded by name through the `Data` class; setting its `path` attribute points the loader to a local copy of the data:

```
dataloader = Data("arxiv")
dataloader.path = "/home/students/holzinger/tmp/arxiv"
```
## 3. Datasets :floppy_disk:
To compare against results from other publications ([Big Bird](https://proceedings.neurips.cc/paper/2020/hash/c8512d142a2d849725f31a9a7a361ab9-Abstract.html), [Linformer](https://arxiv.org/abs/2006.04768) and [Longformer](https://arxiv.org/abs/2004.05150)), the following text classification datasets are used:
| max.words | 2470 | 52 | 104698 |
## 4. Experiments :microscope:
For all experiments, the same parameters (epochs, learning rate, layer dimensionality, etc.) are used; they can be found in the [result files](/results/). Only the embedding type (default / pretrained) is changed.
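The two embedding variants could look as follows in PyTorch (a sketch only; the sizes are illustrative, and the random tensor stands in for whatever pretrained word vectors are actually used):

```
import torch
import torch.nn as nn

vocab_size, channels = 30_000, 256  # illustrative sizes

# default: randomly initialised embeddings, trained from scratch
embed_default = nn.Embedding(vocab_size, channels)

# pretrained: initialised from existing word vectors and kept trainable;
# here a random tensor stands in for vectors loaded from disk
pretrained_weights = torch.randn(vocab_size, channels)
embed_pretrained = nn.Embedding.from_pretrained(pretrained_weights, freeze=False)
```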
### 4.1 Baseline
Instead of MLP-Mixer blocks, convolution blocks are used; otherwise the architecture stays the same (see the sketch below).
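A hedged sketch of such a block (kernel size, padding and the residual layout are assumptions, not taken from the result files):

```
class ConvBlock(nn.Module):
    """Baseline block: a 1-D convolution over the token dimension replaces
    the two mixing MLPs; otherwise used exactly like MixerBlock."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x):                     # x: (batch, tokens, channels)
        y = self.norm(x).transpose(1, 2)      # Conv1d wants (batch, channels, tokens)
        return x + self.conv(y).transpose(1, 2)
```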
## 5. Results :bar_chart:
Accuracy (%) on the text classification datasets:
<table>
<tr>
<td>Baseline</td>
<td>84.92</td>
<td>82.45</td>
<td>90.94</td>
</tr>
<tr>
<td>NLP-Mixer</td>
<td>85.00</td>
<td>80.50</td>
<td>93.43</td>
</tr>
<tr>
<td colspan="4" style="text-align: center; vertical-align: middle;">with pretraining</td>
</tr>
<tr>
<td>Baseline</td>
<td>88.40</td>
<td>91.86</td>
<td>92.97</td>
</tr>
<tr>
<td>NLP-Mixer</td>
<td>88.27</td>
<td>92.89</td>
<td>95.00</td>
</tr>
</table>
\* *all 6 labels*
Speed comparison:
TBD
## 6. Conclusion :bulb: