@@ -5,10 +5,18 @@ Seminar project for the course [(Trans|Lin|Long|...)former: Self-Attention Mecha
[TOC]
## 1. (M|N)LP-Mixer :computer:
The paper "[MLP-Mixer: An all-MLP Architecture for Vision](https://arxiv.org/pdf/2105.01601.pdf)" introduces MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs), applies it to image classification, and presents it as an alternative to Transformer networks. MLP-Mixer has two types of MLP layers: one that acts across image patches, i.e. along the token axis ("token mixing"), and one that acts on each patch independently, i.e. along its channels ("channel mixing"). For details, see the following figure:
<img src="./mlp-mixer.png" width="500px"></img>
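
The two mixing steps can be illustrated with a minimal NumPy sketch of a single Mixer block. This is not the project's implementation: the names `mixer_block`, `mlp` and `init_mlp`, the hidden sizes, the tanh-based GELU approximation, and the layer normalization without learnable scale and shift are all assumptions made for illustration. The input is a matrix with one row per token (image patch or word) and one column per channel.

```python
import numpy as np


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def layer_norm(x, eps=1e-6):
    # normalize over the channel axis (no learnable scale/shift in this sketch)
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def mlp(x, w1, b1, w2, b2):
    # two dense layers with a GELU in between, applied along the last axis
    return gelu(x @ w1 + b1) @ w2 + b2


def init_mlp(rng, dim, hidden):
    # hypothetical initialization: small random weights, zero biases
    return (rng.normal(0.0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.02, (hidden, dim)), np.zeros(dim))


def mixer_block(x, token_params, channel_params):
    # x has shape (num_tokens, channels)
    # token mixing: transpose so the MLP acts along the token axis
    y = layer_norm(x).T                      # (channels, num_tokens)
    x = x + mlp(y, *token_params).T          # residual connection
    # channel mixing: the MLP acts on every token (row) independently
    x = x + mlp(layer_norm(x), *channel_params)
    return x


rng = np.random.default_rng(0)
num_tokens, channels = 8, 16
x = rng.normal(size=(num_tokens, channels))
token_params = init_mlp(rng, num_tokens, 32)
channel_params = init_mlp(rng, channels, 64)
out = mixer_block(x, token_params, channel_params)
print(out.shape)  # (8, 16)
```

Because the token-mixing weights have shape `(num_tokens, hidden)`, the sequence length is fixed by the architecture, so variable-length text would presumably need padding or truncation to a fixed number of tokens.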
The aims of this project are:
* reconfigure the MLP-Mixer architecture for NLP tasks (➜ NLP-Mixer)
* create a comparable (convolutional) baseline
* conduct experiments on datasets that are also used by other efficient transformer approaches
To compare against results from other publications ([Big Bird](https://proceedings.neurips.cc/paper/2020/hash/c8512d142a2d849725f31a9a7a361ab9-Abstract.html), [Linformer](https://arxiv.org/abs/2006.04768) and [Longformer](https://arxiv.org/abs/2004.05150)), the following text classification datasets are used:
...
...
@@ -43,14 +51,16 @@ To compare against results from other publications ([Big Bird](https://proceedin
| max.words | 2470 | 52 | 104698 |
## 3. Experiments :microscope:
## 4. Experiments :microscope:
### 3.1 Baseline
All experiments use the same hyperparameters (epochs, learning rate, layer dimensionality, etc.), which can be found in the [result files](/results/). Only the embedding type (default / pretrained) is varied.
Instead of MLP-Mixer blocks, convolution blocks are used - otherwise the architecture stays the same.
### 4.1 Baseline
Instead of MLP-Mixer blocks, convolution blocks are used -- otherwise the architecture stays the same.
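
As a rough illustration of what such a replacement block could look like, the following NumPy sketch applies a 1D convolution along the token axis with a residual connection. The README does not specify the block's internals, so the kernel width, the ReLU activation, the residual connection, and the names `conv1d_same` and `conv_block` are assumptions, not the project's actual baseline.

```python
import numpy as np


def conv1d_same(x, kernel, bias):
    # x: (num_tokens, channels_in)
    # kernel: (width, channels_in, channels_out), width assumed to be odd
    # Cross-correlation with zero padding so the output keeps num_tokens rows.
    width = kernel.shape[0]
    pad = width // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], kernel.shape[2]))
    for offset in range(width):
        # each kernel tap mixes the channels of a shifted copy of the sequence
        out += padded[offset:offset + x.shape[0]] @ kernel[offset]
    return out + bias


def conv_block(x, kernel, bias):
    # hypothetical baseline block: convolution, ReLU, residual connection
    return x + np.maximum(conv1d_same(x, kernel, bias), 0.0)


rng = np.random.default_rng(0)
num_tokens, channels, width = 8, 16, 3
x = rng.normal(size=(num_tokens, channels))
kernel = rng.normal(0.0, 0.02, (width, channels, channels))
bias = np.zeros(channels)
out = conv_block(x, kernel, bias)
print(out.shape)  # (8, 16)
```

Unlike token mixing, the convolution only combines information from neighbouring tokens within the kernel width, and its weights do not depend on the sequence length.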
## 4. Results :bar_chart:
## 5. Results :bar_chart:
Accuracy for the text classification datasets:
...
...
@@ -66,15 +76,15 @@ Accuracy for the text classification datasets: