diff --git a/README.md b/README.md
index 0032ee7c477953be3680660782ad7b9b69f0010f..f64ca3d483813ed80958f626226e6ed67f86adc0 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ This repo contains the resources (data and scripts) for the workshop entitled "P
 
 ## Repo structure
 
-* _data/_ contains the untagged -- but annotated -- files of the [Old Bailey Corpus](http://fedora.clarin-d.uni-saarland.de/oldbailey/index.html)
+* _data/_ contains the untagged -- but annotated -- files of the [Old Bailey Corpus](http://fedora.clarin-d.uni-saarland.de/oldbailey/index.html) (OBC)
 * _output/_ is used as the default output directory for created knowledge graphs
 * _output/gold/_ contains some sample knowledge graphs for the years 1720, 1820, and 1913
 * _src/_ contains all scripts
@@ -58,4 +58,38 @@ Install with pip:
 
 **NOTE**: _spacy_ must not be version 2.1.5!
 
-##
\ No newline at end of file
+## How to create knowledge graphs
+
+Knowledge graphs are generated from the OBC by running _main.py_ from within _src/_.
+
+For example, to re-create the sample knowledge graph for the year 1913, run:
+
+    python main.py -year 1913 -output_path "../output/example_graph_1913.json"
+
+To list all available options, run:
+
+    python main.py -h
+
+This prints:
+
+    usage: main.py [-h] [-year YEAR [YEAR ...]] [-output_path OUTPUT_PATH]
+                   [-text_node_simplification_mode {None,classifier,spacy_direct_object}]
+                   [-verbose {0,1,2}]
+                   [-prune_text_nodes_min_freq PRUNE_TEXT_NODES_MIN_FREQ]
+
+    Building a graph from old bailey corpus.
+
+    optional arguments:
+      -h, --help            show this help message and exit
+      -year YEAR [YEAR ...]
+                            what year(s) the graph should cover, leave empty to
+                            cover all available years
+      -output_path OUTPUT_PATH
+                            where to save output graph
+      -text_node_simplification_mode {None,classifier,spacy_direct_object}
+                            simplify text nodes, e.g.: "murder of x , on the
+                            Monday of(...)" --> "murder"
+      -verbose {0,1,2}      logging level
+      -prune_text_nodes_min_freq PRUNE_TEXT_NODES_MIN_FREQ
+                            how often a text description node has to occur to be
+                            included in the graph
\ No newline at end of file