Seminar project for the course [(Trans|Lin|Long|...)former: Self-Attention Mechanisms](https://www.cl.uni-heidelberg.de/courses/ws21/sam/)
## Table of Contents
[TOC]
## 1. (M|N)LP-Mixer :computer:
<img src="./mlp-mixer.png" alt="MLP-Mixer architecture" width="500px">
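The figure shows the MLP-Mixer architecture of Tolstikhin et al. (2021): a token-mixing MLP applied across the sequence, followed by a channel-mixing MLP applied per token, each with layer norm and a residual connection. For orientation, here is a minimal sketch of one Mixer block; PyTorch and all class/argument names are illustrative assumptions, not this repository's actual code:

```python
import torch.nn as nn

class MlpBlock(nn.Module):
    """Two-layer MLP with GELU, as used for both mixing steps."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)

class MixerBlock(nn.Module):
    """One Mixer block: token mixing across the sequence, then channel mixing."""
    def __init__(self, num_tokens, dim, token_hidden, channel_hidden):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = MlpBlock(num_tokens, token_hidden)
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = MlpBlock(dim, channel_hidden)

    def forward(self, x):                             # x: (batch, tokens, dim)
        y = self.norm1(x).transpose(1, 2)              # (batch, dim, tokens)
        x = x + self.token_mlp(y).transpose(1, 2)      # token mixing + skip
        x = x + self.channel_mlp(self.norm2(x))        # channel mixing + skip
        return x
```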
### Installing the required modules
```
pip install -r requirements.txt
```
## 2. Datasets :floppy_disk:
### Building arXiv dataset
1. Download [cs.AI.rar](https://github.com/LiqunW/Long-document-dataset/raw/master/cs.AI.rar) and [cs.NE.rar](https://github.com/LiqunW/Long-document-dataset/raw/master/cs.NE.rar)
2. Extract the files and create the following folder structure:
   ```
   arxiv/
       cs.AI/
           *.txt
       cs.NE/
           *.txt
   ```
3. Set `Data.path` in the [Dataloader module](Dataloader.py) to the path of your `arxiv` folder (e.g. */home/students/holzinger/tmp/arxiv*); a minimal loading sketch follows below
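The sketch below illustrates what loading this folder layout amounts to: each label folder becomes a class, and each `*.txt` file becomes one example. The function name and reading details are assumptions for illustration, not the actual contents of [Dataloader.py](Dataloader.py):

```python
from pathlib import Path

def load_arxiv(root):
    """Read arxiv/<label>/*.txt into a list of (text, label) pairs."""
    samples = []
    for label in ("cs.AI", "cs.NE"):
        for txt_file in sorted((Path(root) / label).glob("*.txt")):
            text = txt_file.read_text(encoding="utf-8", errors="ignore")
            samples.append((text, label))
    return samples

# e.g. samples = load_arxiv("/home/students/holzinger/tmp/arxiv")
```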
### Baseline datasets
To compare against results from other publications ([Big Bird](https://proceedings.neurips.cc/paper/2020/hash/c8512d142a2d849725f31a9a7a361ab9-Abstract.html), [Linformer](https://arxiv.org/abs/2006.04768) and [Longformer](https://arxiv.org/abs/2004.05150)), the following text classification datasets are used:
| Dataset | [IMDb](https://ai.stanford.edu/~amaas/data/sentiment/) | [SST-2](https://nlp.stanford.edu/sentiment/index.html) | [arXiv](https://github.com/LiqunW/Long-document-dataset) |
|---|---:|---:|---:|
| #classes | 2 | 2 | 2 |
| #examples | 25000 | 67349 | 6007 |
| avg. words | 233 | 9 | 8096 |
| max. words | 2470 | 52 | 104698 |
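The word counts above can be reproduced along these lines, assuming simple whitespace tokenization (an assumption; this README does not state how words were counted):

```python
def word_stats(texts):
    """Average and maximum whitespace-token count over a list of documents."""
    lengths = [len(t.split()) for t in texts]
    return {
        "examples": len(lengths),
        "avg_words": sum(lengths) / len(lengths),
        "max_words": max(lengths),
    }
```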
## 3. Experiments :microscope:
### 3.1 Baseline
Instead of MLP-Mixer blocks, convolution blocks are used; otherwise the architecture stays the same.
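A minimal sketch of what such a convolution block can look like, keeping the pre-norm residual layout of the Mixer block above; the kernel size and all names are illustrative assumptions, not the repository's exact configuration:

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Drop-in replacement for a Mixer block: a 1D convolution mixes
    information along the token axis instead of a token-mixing MLP."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                          # x: (batch, tokens, dim)
        y = self.norm(x).transpose(1, 2)           # Conv1d expects (batch, dim, tokens)
        return x + self.conv(y).transpose(1, 2)    # residual connection
```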
## 4. Results :bar_chart:
Accuracy (%) on the text classification datasets:
<table>
<tr>
<td></td>
<td>IMDb</td>
<td>SST-2</td>
<td>arXiv</td>
</tr>
<tr>
<td colspan="4" style="text-align: center; vertical-align: middle;">without pretraining</td>
</tr>
<tr>
<td>Baseline</td>
<td>85.2</td>
<td>77.0</td>
<td>90.9</td>
</tr>
<tr>
<td>NLP-Mixer</td>
<td>83.2</td>
<td></td>
<td>93.6</td>
</tr>
<tr>
<td colspan="4" style="text-align: center; vertical-align: middle;">with pretraining</td>
</tr>
<tr>
<td>Longformer</td>
<td>95.7</td>
<td>---</td>
<td>---</td>
</tr>
<tr>
<td>Linformer</td>
<td>94.2</td>
<td>93.4</td>
<td>---</td>
</tr>
<tr>
<td>Big Bird</td>
<td>95.2</td>
<td>94.6</td>
<td>92.3*</td>
</tr>
<tr>
<td>Baseline</td>
<td></td>
<td></td>
<td>---**</td>
</tr>
<tr>
<td>NLP-Mixer</td>
<td></td>
<td></td>
<td></td>
</tr>
</table>
\* *all 6 labels*

\*\* *Out of Memory*
Speed comparison:
TBD
## 5. Conclusion :bulb:
## Resources :notebook_with_decorative_cover:
:page_facing_up: Tolstikhin, I. O., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., ... & Dosovitskiy, A. (2021). **Mlp-mixer: An all-mlp architecture for vision**. *Advances in Neural Information Processing Systems*, 34. [[PDF]](https://arxiv.org/pdf/2105.01601.pdf)
:page_facing_up: Kiesel, J., Mestre, M., Shukla, R., Vincent, E., Adineh, P., Corney, D., Stein, B., & Potthast, M. (2019, June). **SemEval-2019 Task 4: Hyperpartisan news detection**. In *Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval 2019)* (pp. 829-839). Association for Computational Linguistics. [[PDF]](https://aclanthology.org/S19-2145.pdf)
:page_facing_up: Maas, A., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011, June). **Learning word vectors for sentiment analysis**. In *Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies* (pp. 142-150). [[PDF]](https://aclanthology.org/P11-1015.pdf)
:page_facing_up: Zaheer, M., Guruganesh, G., Dubey, K. A., Ainslie, J., Alberti, C., Ontanon, S., ... & Ahmed, A. (2020). **Big bird: Transformers for longer sequences**. *Advances in Neural Information Processing Systems, 33*, 17283-17297. [[PDF]](https://arxiv.org/pdf/2007.14062.pdf)
:page_facing_up: Wang, S., Li, B. Z., Khabsa, M., Fang, H., & Ma, H. (2020). **Linformer: Self-attention with linear complexity**. *arXiv preprint arXiv:2006.04768.* [[PDF]](https://arxiv.org/pdf/2006.04768.pdf)
:page_facing_up: Beltagy, I., Peters, M. E., & Cohan, A. (2020). **Longformer: The long-document transformer**. *arXiv preprint arXiv:2004.05150.* [[PDF]](https://arxiv.org/pdf/2004.05150.pdf)
## LICENSE
This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.