Commit 4dc91cf6 authored by Samuel Innes

Fix spelling mistakes in README

parent bc590932
@@ -589,7 +589,7 @@ Slovak: 0.01
Slovene: 0.01
Serbo-Croat: Cyrillic: 0.03
Silesian: 0.68
-Ukranian: 0.0
+Ukrainian: 0.0
Would you like to enter another text? (y/n)
```
@@ -598,7 +598,7 @@ The next highest probability was Polish, which is to be expected due to the simi
It is therefore clear that, while no precise accuracy can be given for real-world data, the model works almost perfectly, given the document is greater than a particular length (around >50 words). One possible way of calculating more reliable metrics on this point without using annotation could be to use several single-language corpora and use that data to calculate metrics.
## Future work
-This model was purely focussed on Slavic languages however could easily be extended to other language groups which have a Wikipedia. In order to do this, one would have to update the alphabet and it might be necessary to change the number of n-most common words in the BoW model. A more complex model could also use N-grams to capture morphological information in a more sophisticated way.
+This model was purely focussed on Slavic languages however could easily be extended to other language groups which have a Wikipedia. In order to do this, one would have to update the alphabet, and it might be necessary to change the number of n-most common words in the BoW model. A more complex model could also use N-grams to capture morphological information in a more sophisticated way.
<img src="images/birch-forest-crop.jpg" width=2000/>
# Technical details