diff --git a/README.md b/README.md
index 0032ee7c477953be3680660782ad7b9b69f0010f..9bee1dc7325ab3f98445116c614ec2509e5e3c29 100644
--- a/README.md
+++ b/README.md
@@ -4,9 +4,9 @@ This repo contains the resources (data and scripts) for the workshop entitled "P
 
 ## Repo structure
 
-* _data/_ contains the untagged -- but annotated -- files of the [Old Bailey Corpus](http://fedora.clarin-d.uni-saarland.de/oldbailey/index.html)
+* _data/_ contains the untagged -- but annotated -- files of the [Old Bailey Corpus](http://fedora.clarin-d.uni-saarland.de/oldbailey/index.html) (OBC)
 * _output/_ is used as the default output directory for created knowledge graphs
-    * _output/gold/_ contains some sample knowledge graphs for the years 1720, 1820, and 1913
+    * _output/examples/_ contains some sample knowledge graphs for the years 1720, 1820, and 1913
 * _src/_ contains all scripts
 * _visualization/_ contains the visualization suite (see separate README)
 
@@ -58,4 +58,38 @@ Install with pip:
 
 **NOTE**: _spacy_ must not be version 2.1.5!
 
-##
\ No newline at end of file
+## How to create knowledge graphs
+
+To generate knowledge graphs from the OBC, run _main.py_ from the _src/_ directory.
+
+To re-create the sample knowledge graph for the year 1913, simply type:
+
+    python main.py -year 1913 -output_path "../output/example_graph_1913.json"
+
+To display all available options, type:
+
+    python main.py -h
+
+This gives you:
+
+    usage: main.py [-h] [-year YEAR [YEAR ...]] [-output_path OUTPUT_PATH]
+                   [-text_node_simplification_mode {None,classifier,spacy_direct_object}]
+                   [-verbose {0,1,2}]
+                   [-prune_text_nodes_min_freq PRUNE_TEXT_NODES_MIN_FREQ]
+
+    Building a graph from old bailey corpus.
+
+    optional arguments:
+      -h, --help            show this help message and exit
+      -year YEAR [YEAR ...]
+                            what year(s) the graph should cover, leave empty to
+                            cover all available years
+      -output_path OUTPUT_PATH
+                            where to save output graph
+      -text_node_simplification_mode {None,classifier,spacy_direct_object}
+                            simplify text nodes, e.g.: "murder of x , on the
+                            Monday of(...)" --> "murder"
+      -verbose {0,1,2}      logging level
+      -prune_text_nodes_min_freq PRUNE_TEXT_NODES_MIN_FREQ
+                            how often a text description node has to occur to be
+                            included in the graph
diff --git a/output/gold/example_graph_1720.json b/output/examples/example_graph_1720.json
similarity index 100%
rename from output/gold/example_graph_1720.json
rename to output/examples/example_graph_1720.json
diff --git a/output/gold/example_graph_1720_simplified_classifier.json b/output/examples/example_graph_1720_simplified_classifier.json
similarity index 100%
rename from output/gold/example_graph_1720_simplified_classifier.json
rename to output/examples/example_graph_1720_simplified_classifier.json
diff --git a/output/gold/example_graph_1720_simplified_spacy.json b/output/examples/example_graph_1720_simplified_spacy.json
similarity index 100%
rename from output/gold/example_graph_1720_simplified_spacy.json
rename to output/examples/example_graph_1720_simplified_spacy.json
diff --git a/output/gold/example_graph_1820.json b/output/examples/example_graph_1820.json
similarity index 100%
rename from output/gold/example_graph_1820.json
rename to output/examples/example_graph_1820.json
diff --git a/output/gold/example_graph_1913.json b/output/examples/example_graph_1913.json
similarity index 100%
rename from output/gold/example_graph_1913.json
rename to output/examples/example_graph_1913.json
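The README change documents `main.py` only through its command line. As a hedged sketch (not part of the repo), the Python helper below assembles invocations from the flags listed in the `main.py -h` output and could be used to re-create the files now under _output/examples/_. The names `build_command`, `EXAMPLES`, and `regenerate_examples` are hypothetical, and the mapping from the `_simplified_classifier` / `_simplified_spacy` file names to the `classifier` / `spacy_direct_object` modes is an assumption inferred from the file names only.

```python
import subprocess
import sys
from typing import List, Optional, Sequence

# Simplification modes listed as choices in the `main.py -h` output.
VALID_MODES = {"None", "classifier", "spacy_direct_object"}


def build_command(years: Sequence[int], output_path: str,
                  mode: Optional[str] = None,
                  min_freq: Optional[int] = None) -> List[str]:
    """Assemble a main.py invocation from the documented options."""
    cmd = [sys.executable, "main.py"]
    if years:  # omitting -year makes the graph cover all available years
        cmd += ["-year", *(str(y) for y in years)]
    cmd += ["-output_path", output_path]
    if mode is not None:
        if mode not in VALID_MODES:
            raise ValueError(f"unknown simplification mode: {mode}")
        cmd += ["-text_node_simplification_mode", mode]
    if min_freq is not None:
        cmd += ["-prune_text_nodes_min_freq", str(min_freq)]
    return cmd


# Hypothetical mapping from each example file to the mode that presumably
# produced it (an assumption based on the file names, not on the README).
EXAMPLES = [
    (1720, None, "example_graph_1720.json"),
    (1720, "classifier", "example_graph_1720_simplified_classifier.json"),
    (1720, "spacy_direct_object", "example_graph_1720_simplified_spacy.json"),
    (1820, None, "example_graph_1820.json"),
    (1913, None, "example_graph_1913.json"),
]


def regenerate_examples(out_dir: str = "../output/examples") -> None:
    """Run main.py once per example graph; assumes the cwd is src/."""
    for year, mode, name in EXAMPLES:
        cmd = build_command([year], f"{out_dir}/{name}", mode)
        subprocess.run(cmd, check=True)  # stop at the first failing run
```

Calling `regenerate_examples()` from within _src/_ starts one `main.py` process per file; building the argument list separately keeps the flag handling testable without running the pipeline.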