Seminar project for the course [(Trans|Lin|Long|...)former: Self-Attention Mechanisms](https://www.cl.uni-heidelberg.de/courses/ws21/sam/)
## Table of Contents
[TOC]
## 1. (M|N)LP-Mixer :computer:
<img src="./mlp-mixer.png" alt="MLP-Mixer architecture" width="500px">
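The figure shows the MLP-Mixer architecture of Tolstikhin et al. (2021): a token-mixing MLP applied across the sequence, followed by a channel-mixing MLP applied per token, each with layer norm and a residual connection. For orientation, here is a minimal sketch of one Mixer block; PyTorch and all class/argument names are illustrative assumptions, not this repository's actual code:

```python
import torch.nn as nn

class MlpBlock(nn.Module):
    """Two-layer MLP with GELU, as used for both mixing steps."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)

class MixerBlock(nn.Module):
    """One Mixer block: token mixing across the sequence, then channel mixing."""
    def __init__(self, num_tokens, dim, token_hidden, channel_hidden):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = MlpBlock(num_tokens, token_hidden)
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = MlpBlock(dim, channel_hidden)

    def forward(self, x):                             # x: (batch, tokens, dim)
        y = self.norm1(x).transpose(1, 2)              # (batch, dim, tokens)
        x = x + self.token_mlp(y).transpose(1, 2)      # token mixing + skip
        x = x + self.channel_mlp(self.norm2(x))        # channel mixing + skip
        return x
```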
### Installing the required modules
```
pip install -r requirements.txt
```
## 2. Datasets :floppy_disk:
### Building arXiv dataset
1. Download [cs.AI.rar](https://github.com/LiqunW/Long-document-dataset/raw/master/cs.AI.rar) and [cs.NE.rar](https://github.com/LiqunW/Long-document-dataset/raw/master/cs.NE.rar)
2. Extract the files and create the following folder structure:
   ```
   arxiv/
       cs.AI/
           *.txt
       cs.NE/
           *.txt
   ```
3. Set `Data.path` in the [Dataloader module](Dataloader.py) to the path of your `arxiv` folder (e.g. */home/students/holzinger/tmp/arxiv*); a minimal loading sketch follows below
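The sketch below illustrates what loading this folder layout amounts to: each label folder becomes a class, and each `*.txt` file becomes one example. The function name and reading details are assumptions for illustration, not the actual contents of [Dataloader.py](Dataloader.py):

```python
from pathlib import Path

def load_arxiv(root):
    """Read arxiv/<label>/*.txt into a list of (text, label) pairs."""
    samples = []
    for label in ("cs.AI", "cs.NE"):
        for txt_file in sorted((Path(root) / label).glob("*.txt")):
            text = txt_file.read_text(encoding="utf-8", errors="ignore")
            samples.append((text, label))
    return samples

# e.g. samples = load_arxiv("/home/students/holzinger/tmp/arxiv")
```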
### Baseline datasets
To compare against results from other publications ([Big Bird](https://proceedings.neurips.cc/paper/2020/hash/c8512d142a2d849725f31a9a7a361ab9-Abstract.html), [Linformer](https://arxiv.org/abs/2006.04768) and [Longformer](https://arxiv.org/abs/2004.05150)), the following text classification datasets are used:
| Dataset | [IMDb](https://ai.stanford.edu/~amaas/data/sentiment/) | [SST-2](https://nlp.stanford.edu/sentiment/index.html) | [arXiv](https://github.com/LiqunW/Long-document-dataset) |
|---|---:|---:|---:|
| #classes | 2 | 2 | 2 |
| #examples | 25000 | 67349 | 6007 |
| avg. words | 233 | 9 | 8096 |
| max. words | 2470 | 52 | 104698 |
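The word counts above can be reproduced along these lines, assuming simple whitespace tokenization (an assumption; this README does not state how words were counted):

```python
def word_stats(texts):
    """Average and maximum whitespace-token count over a list of documents."""
    lengths = [len(t.split()) for t in texts]
    return {
        "examples": len(lengths),
        "avg_words": sum(lengths) / len(lengths),
        "max_words": max(lengths),
    }
```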
## 3. Experiments :microscope:
### 3.1 Baseline
Instead of MLP-Mixer blocks, convolution blocks are used; otherwise the architecture stays the same.
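A minimal sketch of what such a convolution block can look like, keeping the pre-norm residual layout of the Mixer block above; the kernel size and all names are illustrative assumptions, not the repository's exact configuration:

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Drop-in replacement for a Mixer block: a 1D convolution mixes
    information along the token axis instead of a token-mixing MLP."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                          # x: (batch, tokens, dim)
        y = self.norm(x).transpose(1, 2)           # Conv1d expects (batch, dim, tokens)
        return x + self.conv(y).transpose(1, 2)    # residual connection
```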
## 4. Results :bar_chart:
Accuracy (%) on the text classification datasets:
<table>
<tr>
<td></td>
<td>IMDb</td>
<td>SST-2</td>
<td>arXiv</td>
</tr>
<tr>
<td colspan="4" style="text-align: center; vertical-align: middle;">without pretraining</td>
</tr>
<tr>
<td>Baseline</td>
<td>85.2</td>
<td>77.0</td>
<td>90.9</td>
</tr>
<tr>
<td>NLP-Mixer</td>
<td>83.2</td>
<td></td>
<td>93.6</td>
</tr>
<tr>
<td colspan="4" style="text-align: center; vertical-align: middle;">with pretraining</td>
</tr>
<tr>
<td>Longformer</td>
<td>95.7</td>
<td>---</td>
<td>---</td>
</tr>
<tr>
<td>Linformer</td>
<td>94.2</td>
<td>93.4</td>
<td>---</td>
</tr>
<tr>
<td>Big Bird</td>
<td>95.2</td>
<td>94.6</td>
<td>92.3*</td>
</tr>
<tr>
<td>Baseline</td>
<td></td>
<td></td>
<td>---**</td>
</tr>
<tr>
<td>NLP-Mixer</td>
<td></td>
<td></td>
<td></td>
</tr>
</table>
\* *all 6 labels*

\*\* *Out of Memory*
Speed comparison:
TBD
## 5. Conclusion :bulb:
## Resources :notebook_with_decorative_cover:
:page_facing_up: Tolstikhin, I. O., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., ... & Dosovitskiy, A. (2021). **Mlp-mixer: An all-mlp architecture for vision**. *Advances in Neural Information Processing Systems*, 34. [[PDF]](https://arxiv.org/pdf/2105.01601.pdf)
:page_facing_up: Kiesel, J., Mestre, M., Shukla, R., Vincent, E., Adineh, P., Corney, D., Stein, B., & Potthast, M. (2019, June). **SemEval-2019 Task 4: Hyperpartisan news detection**. In *Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval 2019)* (pp. 829-839). Association for Computational Linguistics. [[PDF]](https://aclanthology.org/S19-2145.pdf)
:page_facing_up: Maas, A., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011, June). **Learning word vectors for sentiment analysis**. In *Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies* (pp. 142-150). [[PDF]](https://aclanthology.org/P11-1015.pdf)
:page_facing_up: Zaheer, M., Guruganesh, G., Dubey, K. A., Ainslie, J., Alberti, C., Ontanon, S., ... & Ahmed, A. (2020). **Big bird: Transformers for longer sequences**. *Advances in Neural Information Processing Systems, 33*, 17283-17297. [[PDF]](https://arxiv.org/pdf/2007.14062.pdf)
:page_facing_up: Wang, S., Li, B. Z., Khabsa, M., Fang, H., & Ma, H. (2020). **Linformer: Self-attention with linear complexity**. *arXiv preprint arXiv:2006.04768.* [[PDF]](https://arxiv.org/pdf/2006.04768.pdf)
:page_facing_up: Beltagy, I., Peters, M. E., & Cohan, A. (2020). **Longformer: The long-document transformer**. *arXiv preprint arXiv:2004.05150.* [[PDF]](https://arxiv.org/pdf/2004.05150.pdf)
## LICENSE
This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.