Seminar project for the course [(Trans|Lin|Long|...)former: Self-Attention Mechanisms](https://www.cl.uni-heidelberg.de/courses/ws21/sam/)
## Table of Contents
[TOC]
## 1. (M|N)LP-Mixer :computer:
<img src="./mlp-mixer.png" width="500px">
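As a rough illustration of the architecture (a minimal sketch, not this project's actual implementation), a single Mixer layer can be written in NumPy. Token mixing applies an MLP across the token dimension, channel mixing across the channel dimension, both with residual connections; ReLU stands in for the paper's GELU for brevity:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize over the channel (last) dimension."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp(x, w1, w2):
    """Two-layer MLP; the paper uses GELU, ReLU is used here for brevity."""
    return np.maximum(x @ w1, 0.0) @ w2

def mixer_layer(x, p):
    """One Mixer layer on a (tokens, channels) input.

    Token mixing runs the MLP over the token dimension (shared across
    channels, hence the transposes); channel mixing runs it over the
    channel dimension (shared across tokens). Both branches are residual.
    """
    y = x + mlp(layer_norm(x).T, p["Wt1"], p["Wt2"]).T  # token mixing
    return y + mlp(layer_norm(y), p["Wc1"], p["Wc2"])   # channel mixing

rng = np.random.default_rng(0)
S, C, Ds, Dc = 8, 16, 32, 64  # tokens, channels, token/channel hidden widths
x = rng.standard_normal((S, C))
p = {"Wt1": 0.1 * rng.standard_normal((S, Ds)),
     "Wt2": 0.1 * rng.standard_normal((Ds, S)),
     "Wc1": 0.1 * rng.standard_normal((C, Dc)),
     "Wc2": 0.1 * rng.standard_normal((Dc, C))}
out = mixer_layer(x, p)
print(out.shape)  # (8, 16) — same shape as the input
```

The layer is shape-preserving, so Mixer layers can be stacked and the token dimension pooled for classification.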
### Requirements
Install the required modules:
```
pip install -r requirements.txt
```
## 2. Datasets :file_folder:
### Building arXiv dataset
1. download [cs.AI.rar](https://github.com/LiqunW/Long-document-dataset/raw/master/cs.AI.rar) and [cs.NE.rar](https://github.com/LiqunW/Long-document-dataset/raw/master/cs.NE.rar)
2. extract files and create the following folder structure:
```
arxiv/
    cs.AI/
        *.txt
    cs.NE/
        *.txt
```
3. set `Data.path` in the [Dataloader module](Dataloader.py) to the path of your `arxiv` folder (e.g. */home/students/holzinger/tmp/arxiv*)
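Once the folder structure is in place, the layout maps directly to (file, label) pairs. The following sketch shows one way to read it (`collect_arxiv_samples` is an illustrative helper, not part of `Dataloader.py`):

```python
from pathlib import Path

def collect_arxiv_samples(root):
    """Walk the arxiv/ folder described above and return (path, label) pairs.

    Assumes one subfolder per class (cs.AI, cs.NE), each containing
    the extracted *.txt documents.
    """
    samples = []
    for label_dir in sorted(Path(root).iterdir()):
        if label_dir.is_dir():
            for txt in sorted(label_dir.glob("*.txt")):
                samples.append((txt, label_dir.name))
    return samples

# e.g. collect_arxiv_samples("/home/students/holzinger/tmp/arxiv")
```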
## 3. Experiments :microscope:
### 3.1 Baselines
To compare against the results reported in other publications ([Big Bird](https://proceedings.neurips.cc/paper/2020/hash/c8512d142a2d849725f31a9a7a361ab9-Abstract.html), [Linformer](https://arxiv.org/abs/2006.04768) and [Longformer](https://arxiv.org/abs/2004.05150)), the following text classification datasets are used:
## References
:page_facing_up: Tolstikhin, I. O., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., ... & Dosovitskiy, A. (2021). **MLP-Mixer: An all-MLP architecture for vision**. *Advances in Neural Information Processing Systems*, 34. [[PDF]](https://arxiv.org/pdf/2105.01601.pdf)
:page_facing_up: Johannes Kiesel, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, and Martin Potthast. **SemEval-2019 Task 4: Hyperpartisan News Detection**. In *13th International Workshop on Semantic Evaluation (SemEval 2019)*, pages 829-839, June 2019. Association for Computational Linguistics. [[PDF]](https://aclanthology.org/S19-2145.pdf)
:page_facing_up: Maas, A., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011, June). **Learning word vectors for sentiment analysis**. In *Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies* (pp. 142-150). [[PDF]](https://aclanthology.org/P11-1015.pdf)
:page_facing_up: Zaheer, M., Guruganesh, G., Dubey, K. A., Ainslie, J., Alberti, C., Ontanon, S., ... & Ahmed, A. (2020). **Big bird: Transformers for longer sequences**. *Advances in Neural Information Processing Systems, 33*, 17283-17297. [[PDF]](https://arxiv.org/pdf/2007.14062.pdf)
:page_facing_up: Wang, S., Li, B. Z., Khabsa, M., Fang, H., & Ma, H. (2020). **Linformer: Self-attention with linear complexity**. *arXiv preprint arXiv:2006.04768.* [[PDF]](https://arxiv.org/pdf/2006.04768.pdf)
:page_facing_up: Beltagy, I., Peters, M. E., & Cohan, A. (2020). **Longformer: The long-document transformer**. *arXiv preprint arXiv:2004.05150.* [[PDF]](https://arxiv.org/pdf/2004.05150.pdf)
## LICENSE
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.