Commit a1c9984d authored by Victor Zimmermann
Update README.md

parent c7baa277
# Pseudo-Bilingual Hate Speech Identification
We present several cross-lingual approaches to hate speech detection for the
hate speech classification task of GermEval2018, based on
(pseudo-)cross-lingual embeddings trained on Twitter data and a BiLSTM.
This repository provides the embeddings, systems and tools used for the software
project course in the summer of 2018.
## Getting Started
These instructions will get you a copy of the project up and running on your
local machine for development and testing purposes. See deployment for notes on
how to deploy the project on a live system.
### Prerequisites
Python 3.6 is highly recommended to deploy this system. Using older versions
of Python may require manual changes in the code to run certain modules provided
in this repository.
```
cd kernseife
python3.6 -m venv env
source env/bin/activate
pip3.6 install -r requirements.txt
```
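Since Python 3.6 is the recommended version, a quick check of the active interpreter can save debugging time later. The snippet below is a minimal sketch and not part of this repository; the function name `check_python` and the constant `RECOMMENDED` are made up for illustration.

```python
import sys

# Recommended minimum version, taken from the prerequisites above.
RECOMMENDED = (3, 6)


def check_python(version_info=sys.version_info, recommended=RECOMMENDED):
    """Return True if the given interpreter version is at least `recommended`.

    Hypothetical helper: only the major and minor version are compared.
    """
    return tuple(version_info[:2]) >= tuple(recommended)


if __name__ == "__main__":
    if check_python():
        print("Python version looks fine:", sys.version.split()[0])
    else:
        print("Python 3.6 or newer is recommended; found", sys.version.split()[0])
```

Running it inside the activated virtual environment shows which interpreter `env` was created with.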
## Training Embeddings
We used train_skipgram.py by Michael Egger (see *Built With*) to train our
embeddings. The tweet corpora are not provided.
```
python training.py corpus_dir/ test.model -s 100 -w 5 -m 10
```
### Clustering
### Training
We use the BiLSTM provided by Nils Reimers (see *Built With*). We advise
following the README in the kernseife/bilstm/ subdirectory to modify the
experimental setup. We provide multiple Train_*.py files, each corresponding to
one of our experiments. Appropriate datasets have to be provided in the data/
subdirectory. Embeddings have to be placed in the bilstm directory.
```
cd bilstm
python3.6 Train_faruqui.py
```
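Because the datasets and embeddings are not picked up automatically, it can help to verify that they are in place before starting a training run. The following is a rough sketch under stated assumptions: it assumes that data/ lives inside the bilstm directory, and both the helper `missing_inputs` and the embeddings file name passed to it are hypothetical.

```python
from pathlib import Path


def missing_inputs(bilstm_dir, embeddings_name):
    """Return a list of problems found before running a Train_*.py script.

    Hypothetical helper. Assumes data/ is a subdirectory of `bilstm_dir`
    and that the embeddings file sits directly in `bilstm_dir`.
    """
    root = Path(bilstm_dir)
    problems = []

    # At least one dataset entry must exist in data/.
    data_dir = root / "data"
    if not data_dir.is_dir() or not any(data_dir.iterdir()):
        problems.append("no datasets found in " + str(data_dir))

    # The embeddings file must be placed in the bilstm directory itself.
    embeddings = root / embeddings_name
    if not embeddings.is_file():
        problems.append("embeddings file not found: " + str(embeddings))

    return problems


if __name__ == "__main__":
    # "embeddings.vec" is a placeholder name, not a file shipped with the repository.
    for problem in missing_inputs("bilstm", "embeddings.vec"):
        print("WARNING:", problem)
```

An empty result means both preconditions named above are met; it says nothing about whether the files have the format the training script expects.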
## Demo
Our demo lets a user input any string and have our system classify it as
either OFFENSE or OTHER. Any model provided in models/ can be used in our demo.
```
cd bilstm
python3.6 Demo.py models/demo.h5
```
## Deployment
```
cd bilstm
python3.6 RunModel.py modelPath inputPath language > outputPath
```
We provide our models in the models/ subdirectory. Input may be any .txt file
with each string to be classified on its own line. Available languages are *en*
and *de*.
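
The input format described above (one string per line, language *en* or *de*) can be prepared programmatically. The sketch below is an illustration only: `write_input_file` and `build_command` are hypothetical helpers that are not part of this repository, and the command they build simply mirrors the RunModel.py call shown above.

```python
from pathlib import Path

# The two languages listed as available above.
SUPPORTED_LANGUAGES = ("en", "de")


def write_input_file(texts, path):
    """Write one string per line and return the number of lines written.

    Hypothetical helper: whitespace (including newlines) inside a string is
    collapsed to single spaces so every string stays on its own line, and
    empty strings are skipped.
    """
    lines = []
    for text in texts:
        flat = " ".join(text.split())
        if flat:
            lines.append(flat)
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


def build_command(model_path, input_path, language):
    """Return the argument list for the RunModel.py call (hypothetical helper)."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(
            "language must be one of {}, got {!r}".format(SUPPORTED_LANGUAGES, language)
        )
    return ["python3.6", "RunModel.py", str(model_path), str(input_path), language]
```

The resulting list can be passed to `subprocess.run` from within the bilstm directory, with `stdout` pointed at an open output file to reproduce the `> outputPath` redirection.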
## Built With
* [BiLSTM-CNN-CRF Implementation for Sequence Tagging](https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf)
* [train_skipgram.py](mailto:michael.egger@tsn.at) by Michael Egger
## Authors
* **Anne-Kathrin Bugert** - [GitLab](https://gitlab.cl.uni-heidelberg.de/bugert)
* **Maja Hoffmann** - [GitLab](https://gitlab.cl.uni-heidelberg.de/hoffmann)
* **Victor Zimmermann** - [Web](https://www.cl.uni-heidelberg.de/~zimmermann)
* **Victor Zimmermann** - [Web](https://en.axt-im-haus.eu)
## License