The next highest probability was Polish, which is to be expected due to the similarity of the languages.
It is therefore clear that, while no precise accuracy figure can be given for real-world data, the model performs almost perfectly provided the document is longer than roughly 50 words. One way to calculate more reliable metrics on this point without manual annotation would be to evaluate the model on several single-language corpora, treating each corpus's language as the gold label.
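The annotation-free evaluation idea above can be sketched as follows. This is a minimal illustration, not the project's actual evaluation code: `predict` stands in for a hypothetical trained classifier mapping a document to a language code, and `corpora` maps each language code to a list of documents drawn from a single-language corpus.

```python
from collections import Counter


def evaluate_on_corpora(predict, corpora):
    """Estimate accuracy without manual annotation: every document in a
    single-language corpus inherits that corpus's language as its label.

    predict: hypothetical function mapping a document (str) to a language code.
    corpora: dict mapping a language code to a list of documents in it.
    Returns overall accuracy and a Counter of (gold, predicted) confusions.
    """
    correct = total = 0
    confusions = Counter()
    for lang, docs in corpora.items():
        for doc in docs:
            guess = predict(doc)
            correct += guess == lang
            total += 1
            if guess != lang:
                confusions[(lang, guess)] += 1
    return correct / total, confusions
```

Because every document's label comes for free from its corpus, this scales to as many languages and documents as one can download, at the cost of assuming each corpus really is monolingual.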
## Future work
This model was focussed purely on Slavic languages, but it could easily be extended to other language groups that have a Wikipedia. To do this, one would have to update the alphabet, and it might be necessary to change the number n of most common words used in the BoW model. A more complex model could also use N-grams to capture morphological information in a more sophisticated way.
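As a sketch of the N-gram idea, character n-grams can be counted directly from raw text; the function below is a hypothetical illustration (not part of the model described here), using space padding to mark word boundaries so that affixes, which carry much of the morphological signal in Slavic languages, show up as distinct grams.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (trigrams by default).

    The text is padded with a single space on each side so that
    word-initial and word-final grams are distinguishable; these
    boundary grams capture prefixes and inflectional endings.
    """
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
```

Feature vectors built from such counts could then feed the same classifier as the BoW features, or be concatenated with them.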