Commit b2187da8 authored by innes
Fix broken links on page index

parent 423d1534
@@ -26,8 +26,6 @@ This project takes as input a text in a particular Slavic language and returns w
- <a href="https://gitlab.cl.uni-heidelberg.de/innes/exp-ml-1/-/blob/master/project/README.md#technical-details">Technical details</a>
- <a href = "https://gitlab.cl.uni-heidelberg.de/innes/exp-ml-1/-/blob/master/project/README.md#system-requirements">System Requirements</a>
- <a href = "https://gitlab.cl.uni-heidelberg.de/innes/exp-ml-1/-/blob/master/project/README.md#installation">Installation</a>
- <a href = "https://gitlab.cl.uni-heidelberg.de/innes/exp-ml-1/-/blob/master/project/README.md#todos">TODOs</a>
- <a href = "https://gitlab.cl.uni-heidelberg.de/innes/exp-ml-1/-/blob/master/project/README.md#common-problems">Common Problems</a>
- <a href = "https://gitlab.cl.uni-heidelberg.de/innes/exp-ml-1/-/blob/master/project/README.md#author">Author</a>
- <a href="https://gitlab.cl.uni-heidelberg.de/innes/exp-ml-1/-/blob/master/project/README.md#references">References</a>
@@ -503,7 +501,7 @@ As expected, MLP, Random Forest and Decision Tree all performed better than the
The learning curve shows how the accuracy increases as the size of the training data increases. For this the sklearn function `learning_curve()` was used, which computes the cross-validated scores needed to plot such a curve. The training set sizes were the following: `[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000]`. The graph also shows a comparison with the two baselines mentioned above: Naive Bayes and the random class classifier. The results show a steady increase in accuracy up to around 1000 datapoints, from which point the increase slows, though it is still present.
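A minimal sketch of how such a curve can be produced (using synthetic stand-in data, since the repository's actual feature matrix and estimator set-up are not shown here):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real BoW feature matrix and language labels;
# 16 classes is an assumption about the number of languages. GaussianNB is
# used because these stand-in features are real-valued; the project itself
# may use a different NB variant on its BoW counts.
X, y = make_classification(n_samples=10_000, n_features=100,
                           n_informative=20, n_classes=16,
                           n_clusters_per_class=1, random_state=0)

sizes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100,
         200, 300, 400, 500, 600, 700, 800, 900, 1000,
         2000, 3000, 4000, 5000]

for name, clf in [("Naive Bayes", GaussianNB()),
                  ("random class", DummyClassifier(strategy="uniform"))]:
    # learning_curve() retrains the estimator on increasingly large subsets
    # and cross-validates each one, returning train and test scores per size.
    train_sizes, train_scores, test_scores = learning_curve(
        clf, X, y, train_sizes=sizes, cv=5, scoring="accuracy")
    plt.plot(train_sizes, test_scores.mean(axis=1), label=name)

plt.xlabel("training set size")
plt.ylabel("mean cross-validated accuracy")
plt.legend()
plt.show()
```

The `train_scores` returned alongside the test scores are also one way to read off the training-set accuracies discussed below.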
<img src="images/learning_curve_w_mlp_fade.png" width=800><!--IS THIS TOO CHAOTIC?!-->
<img src="images/learning_curve_w_mlp_fade.png" width=600>
1000 datapoints is roughly 63 datapoints per class (assuming a uniform distribution over the languages). This is evidently enough to learn the most important features (as we can see <a href="https://gitlab.cl.uni-heidelberg.de/innes/exp-ml-1/-/blob/master/project/README.md#feature-importance">above</a>: the alphabets). It is also interesting to note that Decision Tree and Random Forest both reach 100% accuracy on the training set, whereas Naive Bayes only reaches around 90%.
@@ -603,12 +601,16 @@ It is therefore clear that, while no precise accuracy can be given for real-worl
This model focused purely on Slavic languages, but it could easily be extended to other language groups that have a Wikipedia. To do this, one would have to update the alphabet, and it might be necessary to change the number of n most common words in the BoW model. A more complex model could also use n-grams to capture morphological information in a more sophisticated way.
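As a sketch of the n-gram idea, character n-grams can be extracted with scikit-learn's `CountVectorizer` (an illustration only, not necessarily how the BoW features in this project are built):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Character n-grams of 2-4 letters capture affixes and orthographic
# patterns that whole-word BoW features miss.
vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=(2, 4),
                             max_features=1000)
X = vectorizer.fit_transform(["Это пример предложения.",      # Russian
                              "To jest przykładowe zdanie."])  # Polish
print(vectorizer.get_feature_names_out()[:10])
```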
<img src="images/birch-forest-crop.jpg" width=2000/>
# Technical details
## System Requirements
Computer<br>
Python 3.9 (or similar)<br>
Scikit-learn<br>
## Installation
For details on how to download and use the model, see the <a href="https://gitlab.cl.uni-heidelberg.de/innes/exp-ml-1/-/blob/master/README.md">introductory README for the seminar</a>.
## Author
Samuel Innes: dd257@stud.uni-heidelberg.de