Update README.md

Commit a1c9984d authored 6 years ago by Victor Zimmermann
--- a/README.md
+++ b/README.md
 # Pseudo-Bilingual Hate Speech Identification
-Using cross-lingual embeddings trained on Twitter and a BiLSTM for the hate speech classification task of GermEval2018
+We present several cross-lingual approaches to hate speech detection based on
+(pseudo-)cross-lingual embeddings trained on Twitter data.
+This repository provides the embeddings, systems and tools used for the software
+project course in the summer of 2018.

 ## Getting Started

-These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
+These instructions will get you a copy of the project up and running on your
+local machine for development and testing purposes. See *Deployment* for notes
+on how to deploy the project on a live system.

 ### Prerequisites

+We strongly recommend Python 3.6 for deploying this system. Older versions of
+Python may require manual changes to the code before some of the modules in
+this repository will run.
+
 ```
-pip3 install -r requirements.txt
+cd kernseife
+
+python3.6 -m venv env
+source env/bin/activate
+
+pip3.6 install -r requirements.txt
 ```
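
The README only vouches for Python 3.6. As a hedged sketch that is not part of the repository, a script could check the interpreter version at start-up and warn when it differs; `check_python_version` and `RECOMMENDED` are hypothetical names chosen for illustration.

```python
import sys
import warnings

# Hypothetical constant: the version this README recommends.
RECOMMENDED = (3, 6)


def check_python_version(version_info=sys.version_info) -> bool:
    """Return True on the recommended version, otherwise warn and return False."""
    current = tuple(version_info[:2])  # (major, minor)
    if current != RECOMMENDED:
        warnings.warn(
            "Running on Python %d.%d; Python %d.%d is recommended for this repository."
            % (current + RECOMMENDED)
        )
        return False
    return True
```

Calling `check_python_version()` at the top of a script only warns rather than aborting, since other versions may still work after manual changes.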

 ## Training Embeddings
+We used train_skipgram.py by Michael Egger (see *Built With*) to train our
+embeddings. The tweet corpora are not provided.
+
+```
+python training.py corpus_dir/ test.model -s 100 -w 5 -m 10
+```
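
The README does not document the flags of train_skipgram.py. Reading `-s 100 -w 5 -m 10` as vector size, context window and minimum token count is an assumption, and the sketch below is not Michael Egger's code. It is only a minimal illustration of how a skip-gram model derives (center, context) training pairs from tokenized tweets after discarding rare tokens; `build_vocab` and `skipgram_pairs` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def build_vocab(corpus: Iterable[List[str]], min_count: int) -> Set[str]:
    """Keep only tokens seen at least `min_count` times (assumed meaning of -m)."""
    counts = Counter(tok for sentence in corpus for tok in sentence)
    return {tok for tok, c in counts.items() if c >= min_count}


def skipgram_pairs(sentence: List[str], window: int, vocab: Set[str]) -> List[Tuple[str, str]]:
    """Return (center, context) pairs within `window` tokens (assumed meaning of -w)."""
    tokens = [t for t in sentence if t in vocab]  # drop rare tokens before windowing
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    tweets = [["ich", "mag", "das", "nicht"], ["ich", "mag", "kaffee"]]
    vocab = build_vocab(tweets, min_count=1)
    print(skipgram_pairs(tweets[1], window=5, vocab=vocab))
```

With `window=5`, every token in a short tweet is paired with every other token. A real trainer would then feed these pairs to the embedding objective (for example, negative sampling), which this sketch leaves out.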

 ### Clustering

@@ -22,27 +42,51 @@ Give an example

 ### Training

-Explain what these tests test and why
+We use the BiLSTM provided by Nils Reimers (see *Built With*). To modify the
+experimental setup, we advise following the README in the kernseife/bilstm/
+subdirectory. We provide multiple Train_*.py files, each corresponding to one
+of our experiments. The corresponding datasets must be placed in the data/
+subdirectory and the embeddings in the bilstm/ directory.

 ```
-Give an example
+cd bilstm
+
+python3.6 Train_faruqui.py
+```
+
+## Demo
+
+Our demo lets a user enter any string and have our system classify it as
+either OFFENSE or OTHER. Any model provided in models/ can be used with the demo.
+
+```
+cd bilstm
+
+python3.6 Demo.py models/demo.h5
 ```

 ## Deployment

-Add additional notes about how to deploy this on a live system
+```
+cd bilstm
+
+python3.6 RunModel.py modelPath inputPath language > outputPath
+```
+
+We provide our models in the models/ subdirectory. The input may be any .txt
+file with each string to be classified on its own line. Available languages
+are *en* and *de*.
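
Because RunModel.py expects one string per line, a string that itself contains line breaks would be split into several inputs. The following is a hedged sketch, not part of the repository, of preparing such an input file; the helper name `write_runmodel_input` and the UTF-8 encoding are assumptions.

```python
from pathlib import Path
from typing import Iterable


def write_runmodel_input(strings: Iterable[str], path: str) -> int:
    """Write one string per line and return the number of lines written.

    Embedded line breaks, tabs and repeated spaces are collapsed to single
    spaces so that each string stays one item; empty strings are skipped.
    """
    lines = []
    for s in strings:
        flat = " ".join(s.split())  # collapse all whitespace runs
        if flat:
            lines.append(flat)
    text = ("\n".join(lines) + "\n") if lines else ""
    Path(path).write_text(text, encoding="utf-8")  # UTF-8 is an assumption
    return len(lines)
```

For example, `write_runmodel_input(["Das ist ein Test.", "Noch ein\nTweet"], "input.txt")` writes two lines, and `input.txt` can then be passed as `inputPath` to RunModel.py.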

 ## Built With

 * [BiLSTM-CNN-CRF Implementation for Sequence Tagging](https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf)
+* [train_skipgram.py](mailto:michael.egger@tsn.at) by Michael Egger

 ## Authors

 * **Anne-Kathrin Bugert** - [GitLab](https://gitlab.cl.uni-heidelberg.de/bugert)
 * **Maja Hoffmann** - [GitLab](https://gitlab.cl.uni-heidelberg.de/hoffmann)
-* **Victor Zimmermann** - [Web](https://www.cl.uni-heidelberg.de/~zimmermann)
-
-See also the list of [contributors](https://github.com/your/project/contributors) who participated in this project.
+* **Victor Zimmermann** - [Web](https://en.axt-im-haus.eu)

 ## License