Commit 4702cf04 authored by blunck

Updated readme

parent 321a3e33
@@ -14,13 +14,7 @@ We suggest running the `setup.sh` file. This creates a virtual python environmen
$ bash setup.sh
After running the setup, you will need to activate the virtualenv
$ source sopro_env/bin/activate
Alternatively, you can manually install the following requirements.
## Requirements
Alternatively, you can manually install the following requirements:
The program requires NLTK, NumPy, SciPy, SciKit Learn, requests, textblob and matplotlib.
Please note that SciPy and NumPy need to be installed before SciKit Learn.
@@ -36,17 +30,65 @@ Please note that SciPy and NumPy need to be installed before SciKit Learn.
## Run
To run the main program run `main.py`
If not already activated, activate the virtualenv
$ source sopro_env/bin/activate
To run the main program, run `main.py`.
$ cd src/
$ python3 main.py
With the default settings, several classifiers are trained on 80% of the data and tested on the remaining 20%. Results are printed and also saved to the `results/` directory. This setting uses a feature combination that generated the best scores in various experiments.
With the default settings, several classifiers are trained on 80% of the data and tested on the remaining 20%. Results are printed and also saved to the `results/` directory. This setting uses a feature combination that generated the best scores in prior experiments.
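As a rough illustration of the 80/20 split described above, here is a minimal sketch. The function name `split_corpus` and the sample data are hypothetical and not taken from the project; the sketch assumes the samples are already shuffled (the corpus file is described as shuffled) and that the ratio gives the training share.

```python
def split_corpus(samples, split_ratio=0.8):
    """Split already-shuffled samples into (train, test) by position.

    Hypothetical helper: split_ratio is assumed to be the share of
    samples used for training.
    """
    cut = int(len(samples) * split_ratio)
    return samples[:cut], samples[cut:]


# Ten placeholder reviews standing in for the real corpus.
reviews = [f"review {i}" for i in range(10)]
train, test = split_corpus(reviews, split_ratio=0.8)
print(len(train), len(test))
```

With a ratio of 1.0, the same helper would put every sample in the first part and leave the second part empty.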
Changes can be made in `config.py`. Examples:
Changes can be made in `config.py`.
To generate cross-validation scores which can be compared to [Buschmeier et al.](http://acl2014.org/acl2014/W14-26/pdf/W14-2608.pdf), change the following variables:
To generate cross-validation scores which can be compared to [Buschmeier et al.](http://acl2014.org/acl2014/W14-26/pdf/W14-2608.pdf), change the following variables to:
split_ratio = 1.0
validate = True
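The README does not show how validation is implemented. Purely as an illustration of what cross-validation over the full data set involves, the sketch below partitions sample indices into folds. The function name `k_fold_indices` and the number of folds are assumptions, not project settings.

```python
def k_fold_indices(n_samples, n_folds=10):
    """Yield (train_indices, test_indices) once per fold.

    Hypothetical helper: folds are contiguous and differ in size by at
    most one sample.
    """
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Every sample is used for testing exactly once across the folds.
folds = list(k_fold_indices(25, n_folds=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```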
See `config.py` itself for further options.
\ No newline at end of file
To choose a different combination of features, modify the following variable:
feature_selection = ['f1', 'f4', 'f7']
If you'd like to run the program for all possible combinations of the selected features, change the following variable to:
use_all_variants = True
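To give a sense of how many runs this option implies, the following sketch enumerates every non-empty combination of a feature selection. The function `all_feature_variants` is hypothetical, and whether the program enumerates combinations in exactly this way is an assumption.

```python
from itertools import combinations


def all_feature_variants(feature_selection):
    """Return every non-empty combination of the selected features."""
    variants = []
    for size in range(1, len(feature_selection) + 1):
        for combo in combinations(feature_selection, size):
            variants.append(list(combo))
    return variants


variants = all_feature_variants(['f1', 'f4', 'f7'])
for variant in variants:
    print(variant)
```

For n selected features this yields 2^n - 1 variants, so the number of runs grows quickly as features are added.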
Feature-specific options, such as the n-gram range of the bag-of-n-grams feature, can also be adjusted. Changing the following variable as shown makes the feature extract unigrams and bigrams:
n_range_words = (1,2)
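As an illustration of what a range of `(1,2)` produces, here is a minimal sketch of word n-gram extraction. The function `word_ngrams` is hypothetical; it assumes the tuple is read as an inclusive (minimum n, maximum n) pair, which is how scikit-learn's `ngram_range` parameter works but is not confirmed for this project.

```python
def word_ngrams(tokens, n_range=(1, 2)):
    """Return all word n-grams for every n in the inclusive range."""
    low, high = n_range
    grams = []
    for n in range(low, high + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


grams = word_ngrams("what a great product".split(), (1, 2))
print(grams)
```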
See `config.py` itself for further options.
## App Structure
### Main Program
- main.py > entry point to the app, calls the run() function in machine_learning.py
### Feature Related Files
- feature.py > provides an abstract Feature class
|- ngram_feature.py > inherits from Feature, offers a method for extracting the F1 feature
|- surface_patterns.py > inherits from NGramFeature, offers a method for extracting the F3 feature
|- pos_feature.py > inherits from Feature, offers a method for extracting the F2 feature
|- sent_rating_feature.py > inherits from Feature, offers a method for extracting the F4 feature
|- punctuation_feature.py > inherits from Feature, offers a method for extracting the F5 feature
|- contrast_feature.py > inherits from Feature, offers a method for extracting the F6 feature
|- stars_feature.py > inherits from Feature, offers a method for extracting the F7 feature
- feature_extraction.py > provides functions for extracting and concatenating feature vectors
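The layout above (an abstract base class, concrete feature subclasses, and a function that concatenates their vectors) could look roughly like the sketch below. All class, method, and function names here are hypothetical stand-ins; the actual interfaces in `feature.py` and `feature_extraction.py` are not shown in this README.

```python
from abc import ABC, abstractmethod


class Feature(ABC):
    """Abstract base: each feature turns a text into a list of numbers."""

    @abstractmethod
    def extract(self, text):
        """Return the feature values for one text."""


class PunctuationCount(Feature):
    # Hypothetical example feature: counts of "!" and "?".
    def extract(self, text):
        return [text.count("!"), text.count("?")]


class TokenCount(Feature):
    # Hypothetical example feature: number of whitespace-separated tokens.
    def extract(self, text):
        return [len(text.split())]


def extract_and_concatenate(features, text):
    """Concatenate the vectors of all features into one flat vector."""
    vector = []
    for feature in features:
        vector.extend(feature.extract(text))
    return vector


vector = extract_and_concatenate(
    [PunctuationCount(), TokenCount()], "Great, just great!!"
)
print(vector)
```

Keeping each feature behind one shared method lets a new feature be added without changing the concatenation step.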
### Machine Learning
- machine_learning.py > includes the run function, which incorporates all ML-related steps (training, testing, ...)
### Other
- corpus.py > contains a function for loading the corpus; can also be run to convert the raw corpus
- utilities.py > collection of functions & helpers used throughout the app
- config.py > file for adjusting settings and options
### Directories
- src/ > holds all the source code above
- results/ > default location where test/validation results are saved
- corpus/ > contains complete corpus in a single csv-file (shuffled)