Skip to content
Snippets Groups Projects
README.md 1.82 KiB
Newer Older
Steffen Knapp's avatar
Steffen Knapp committed
# Sarcasm Detection In Amazon Reviews
Maximilian Blunck's avatar
Maximilian Blunck committed

Steffen Knapp's avatar
Steffen Knapp committed
## About
schoenwandt's avatar
schoenwandt committed
This app was developed and used for our software project (WS 17/18). 
The main purpose of the programme is to use a machine learning approach to detect irony in customer reviews. When running the main programme, several classifiers are trained and evaluated.
schoenwandt's avatar
schoenwandt committed
We use Elena Filatova's corpus containing ironic and non-ironic customer reviews from Amazon.com as our [data](https://github.com/ef2020/SarcasmAmazonReviewsCorpus/wiki).
schoenwandt's avatar
schoenwandt committed
## Setup 
We suggest running the `setup.sh` file. This creates a virtual python environment and installs all dependencies of the app.
Steffen Knapp's avatar
Steffen Knapp committed

schoenwandt's avatar
schoenwandt committed
	$ bash setup.sh

After running the setup, you will need to activate the virtualenv

	$ source sopro_env/bin/activate

Alternatively, you can manually install the following requirements.
Steffen Knapp's avatar
Steffen Knapp committed

## Requirements

schoenwandt's avatar
schoenwandt committed
The program requires NLTK, NumPy, SciPy, SciKit Learn, requests, textblob and matplotlib.
Steffen Knapp's avatar
Steffen Knapp committed
Please note that SciPy and NumPy need to be installed before SciKit Learn.

    $ pip install --upgrade pip
	$ pip install nltk
	$ pip install numpy
	$ pip install scipy
	$ pip install sklearn
	$ pip install requests
	$ pip install textblob
schoenwandt's avatar
schoenwandt committed
	$ python -mpip install -U matplotlib
Steffen Knapp's avatar
Steffen Knapp committed
	
schoenwandt's avatar
schoenwandt committed
## Run

To run the main programme, run `main.py`
schoenwandt's avatar
schoenwandt committed

	$ cd src/
	$ python3 main.py

With the default settings, several classifiers will be trained on 80% of the data and tested on the other 20%. Results will be then printed out and also saved to the `results/` directory. In this setting, a certain feature-combination is used which generated the best scores in various experiments.

Changes can be made in `config.py`. 
To generate cross-validation scores which can be compared to [Buschmeier et al.](http://acl2014.org/acl2014/W14-26/pdf/W14-2608.pdf), change the following variables:
Steffen Knapp's avatar
Steffen Knapp committed

schoenwandt's avatar
schoenwandt committed
	split_ratio = 1.0
	validate = True
Steffen Knapp's avatar
Steffen Knapp committed

schoenwandt's avatar
schoenwandt committed
See `config.py` itself for further options.