Commit e52e930f authored by innes

Update Evaluation part of README to include MLP

parent 8b38b6cc
@@ -468,7 +468,7 @@ After generating the dataset, Cross-Validation with 5 folds was carried out imme
<td>0.9738</td>
<td>0.9773</td>
<td>0.9712</td>
<td><i><b>0.9759</b></i></td>
<td><i>0.9759</i></td>
</tr>
<tr>
<td><b>Decision Tree:</b></td>
@@ -486,7 +486,7 @@ After generating the dataset, Cross-Validation with 5 folds was carried out imme
<td>0.9764</td>
<td>0.9721</td>
<td>0.9851</td>
<td><i>0.9778</i></td>
<td><i><b>0.9778</b></i></td>
</tr>
<tr>
<td><b>Naive Bayes (baseline):</b></td>
@@ -499,7 +499,7 @@ After generating the dataset, Cross-Validation with 5 folds was carried out imme
</tr>
</table>
As expected, MLP, Random Forest and Decision Tree all performed better than the Naive Bayes baseline, with an accuracy well over the initial goal of 90%. What is also clear, is that MLP and Random Forest have a better accuracy than Decision Tree, which was also expected. Also not unexpected was the fact that the MLP performed better than the other classifiers used. Although the corpus was small, Neural Networks are renowned for their ability to achieve high accuracy, since they can combine features in a more complex way than 'simpler' algorithms such as Decision Trees. The <a href="final_model.zip">final model</a> thus uses an optimised MLP classifier.
As expected, MLP, Random Forest and Decision Tree all performed better than the Naive Bayes baseline, with accuracies well over the initial goal of 90%. What is also clear is that MLP and Random Forest achieve better accuracy than Decision Tree, which was also expected. Also not unexpected was the result of the Neural Network: such networks are renowned for their ability to achieve high accuracy, since they can combine features in more complex ways than 'simpler' algorithms such as Decision Trees. Although the corpus was small, the MLP performed better than the other classifiers used, and the <a href="final_model.zip">final model</a> therefore uses an optimised MLP classifier.
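For illustration, the comparison above can be reproduced with a short sklearn script along the following lines. This is only a sketch: the project's actual feature extraction and the tuned hyperparameters of the final MLP are not shown, so the synthetic data and all hyperparameters below are placeholders rather than the project's settings.

```python
# Sketch of the 5-fold cross-validation comparison described above.
# X and y stand in for the project's feature matrix and language labels;
# a synthetic dataset is used here so the snippet runs on its own.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=40, n_informative=20,
                           n_classes=4, random_state=0)

classifiers = {
    "MLP": MLPClassifier(max_iter=500, random_state=0),        # placeholder hyperparameters
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes (baseline)": GaussianNB(),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)                  # 5 folds, as in the evaluation above
    print(f"{name}: mean accuracy {scores.mean():.4f}")
```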
The learning curve shows how the accuracy increases as the size of the training data increases. For this the sklearn function `learning_curve()` was used, which computes the scores needed to plot a learning curve. The training set sizes were the following: `[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000]`. The graph also shows a comparison with two baselines, Naive Bayes and the Random Class classifier, as mentioned above. The results show a steady increase in accuracy up to around 1000 training examples, after which the increase slows but is still present.
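A minimal sketch of how such a curve can be produced with `learning_curve()` is shown below; only the list of training set sizes mirrors the one quoted above, while the synthetic data and the MLP settings are placeholders.

```python
# Sketch of computing a learning curve with sklearn's learning_curve().
# Synthetic data stands in for the project's features and labels.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=7000, n_features=40, n_informative=20,
                           n_classes=4, random_state=0)

train_sizes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100,
               200, 300, 400, 500, 600, 700, 800, 900, 1000,
               2000, 3000, 4000, 5000]

sizes, train_scores, test_scores = learning_curve(
    MLPClassifier(max_iter=300, random_state=0), X, y,
    train_sizes=train_sizes, cv=5)                  # 5-fold CV at each training size

plt.plot(sizes, test_scores.mean(axis=1), label="cross-validation accuracy")
plt.xlabel("number of training examples")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```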
@@ -537,13 +537,34 @@ weighted avg 0.98 0.98 0.98 1910
As is clear from the F1-scores in the above table, the model is accurate and precise for all classes. It could of course have been the case that one class is never predicted while the overall accuracy still stays well over 90%; however, using the F-scores, which combine precision and recall (in this case F1, the harmonic mean $`F_1 = 2\cdot\frac{precision \cdot recall}{precision + recall}`$), we know that this is not the case here.
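As a quick sanity check of the formula, the per-class F1 reported by sklearn is exactly this harmonic mean of that class's precision and recall; the toy labels below are invented purely for illustration.

```python
# Per-class F1 is the harmonic mean of that class's precision and recall.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = ["pl", "pl", "pl", "ru", "ru", "cs"]   # invented gold labels
y_pred = ["pl", "pl", "ru", "ru", "ru", "cs"]   # invented predictions

p = precision_score(y_true, y_pred, labels=["pl"], average=None)[0]   # 2/2 = 1.0
r = recall_score(y_true, y_pred, labels=["pl"], average=None)[0]      # 2/3 ≈ 0.667
f1 = f1_score(y_true, y_pred, labels=["pl"], average=None)[0]         # 0.8
assert abs(f1 - 2 * p * r / (p + r)) < 1e-9                           # matches the formula
print(p, r, f1)
```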
However, the above metrics are generated by testing the model on the wikipedia corpus. This would logically lead to a higher accuracy than on "real-world" texts, since there are certain words which appear more often on wikipedia pages due to the nature of the topics (see the discussion on <a href= "https://gitlab.cl.uni-heidelberg.de/innes/exp-ml-1/-/blob/master/project/README.md#features">features</a>). Since testing the algorithm on real-world data requires annotation, which is beyond the realms of the resources of this project, it is impossible to provide a numerical value for the accuracy. However, using article 1 of the UN Declaration of Human Rights, all the languages for which a translation was available (12 out of 16) were tested.
However, the above metrics are generated by testing the model on the Wikipedia corpus. This would logically lead to a higher accuracy than on "real-world" texts, since certain words appear more often on Wikipedia pages due to the nature of the topics (see the discussion on <a href="https://gitlab.cl.uni-heidelberg.de/innes/exp-ml-1/-/blob/master/project/README.md#features">features</a>). Since testing the algorithm on real-world data requires annotation, which is beyond the resources of this project, it is impossible to provide a numerical value for the accuracy. However, using Article 1 of the UN Declaration of Human Rights, all the languages for which a translation was available (12 out of 16) were tested.
> All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
> Все люди рождаются свободными и равными в своем достоинстве и правах. Они наделены разумом и совестью и должны поступать в отношении друг друга в духе братства.
Article 1 is comparatively relatively short - the length of the English text is `30` words, in Russian `26` - so the model works less well, classifying 9 of the 12 languages correctly. Using the longer paragraph in the preamble right before - `73` in Russian - the model predicted all languages correctly. To check the remaining languages, paragraphs were taken from other sources, which was sometimes tricky as the remaining 4 languages as little to no internet presence. Below is an example of the first two lines of the Lord's Prayer, correctly classified as Silesian, and the output.
Article 1 is comparatively short - the English text is `30` words long, the Russian `26` - so the model works less well, classifying 9 of the 12 languages correctly. Using the somewhat longer Article 2 - `79` words in Russian - the accuracy increased. The Article 1 translations were retrieved from <a href="https://omniglot.com/udhr/index.htm">Omniglot</a>, and the Article 2 texts from the <a href="https://www.ohchr.org/EN/UDHR/Pages/SearchByLang.aspx">United Nations official translations</a>.
| Supported languages: | Article 1 (MLP) | Article 1 (RF) | Article 2 (MLP) | Article 2 (RF) |
| --- | --- | --- | --- | --- |
| Russian | :x: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Polish | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Ukranian | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Slovenian | :x: | :white_check_mark: | :white_check_mark: | :x: |
| Slovakian | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Bulgarian | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Belarussian | :x: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Czech | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Macedonian | :white_check_mark: | :x: | :x: | :white_check_mark: |
| Serbo-Croatian: Cyrillic | :white_check_mark: | :x: | :white_check_mark: | :x: |
| Serbo-Croatian: Latin | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |
| Silesian | :white_check_mark: | :white_check_mark: | N/A | N/A |
| Upper Sorbian | :white_check_mark: | :white_check_mark: | N/A | N/A |
| Lower Sorbian | :white_check_mark: | :white_check_mark: | N/A | N/A |
| Rusyn | :white_check_mark: | :white_check_mark: | N/A | N/A |
| Old Church Slavonic: Cyrillic | :white_check_mark: | :white_check_mark: | N/A | N/A |
To further test the languages for which Article 2 was not available, paragraphs were taken from other sources, which was sometimes tricky as these 5 languages have little to no internet presence. Below is an example using the first two lines of the Lord's Prayer, correctly classified as Silesian, together with the model's output.
```
Enter text to be classified: Ôjcze nŏsz, kery jeżeś we niebie, bydź poświyncōne miano Twoje.
@@ -575,24 +596,6 @@
Ukranian: 0.0
Would you like to enter another text? (y/n)
```
The next highest probability was Polish, which is to be expected given the similarity of the two languages (some regard Silesian as a dialect of Polish), and the other languages with any appreciable probability are mostly those written in the Latin script, which is also unsurprising.
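A probability listing like the one above can be generated with `predict_proba()`. The following is only a sketch: the model path is hypothetical, and it assumes the saved object is a full sklearn pipeline (vectoriser plus MLP) whose `predict_proba` accepts raw strings, which the project's actual interface may not match exactly.

```python
# Sketch of how the per-language probabilities shown above could be produced.
# The file name is hypothetical, and the saved object is assumed to be a full
# sklearn pipeline whose predict_proba accepts raw text.
import joblib

model = joblib.load("final_model/mlp_pipeline.joblib")   # hypothetical path

text = input("Enter text to be classified: ")
probabilities = model.predict_proba([text])[0]

# Print one probability per supported language, highest first.
for language, prob in sorted(zip(model.classes_, probabilities),
                             key=lambda pair: pair[1], reverse=True):
    print(f"{language}: {round(prob, 4)}")
```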
| Supported languages: | Article 1 | Preamble |
| --- | --- | --- |
| Russian | :white_check_mark: |
| Polish | :white_check_mark: |
| Ukranian | :white_check_mark: |
| Slovenian | :white_check_mark: |
| Slovakian | :white_check_mark: |
| Bulgarian | :white_check_mark: |
| Belarussian | :white_check_mark: |
| Czech | :white_check_mark: |
| Macedonian | :x: |
| Serbo-Croatian: Cyrillic | :x: |
| Serbo-Croatian: Latin | :white_check_mark: |
| Silesian | :white_check_mark: |
| Upper Sorbian | :white_check_mark: |
| Lower Sorbian | :white_check_mark: |
| Rusyn | :white_check_mark: |
| Old Church Slavonic: Cyrillic | :white_check_mark: |
It is therefore clear that, while no precise accuracy can be given for real-world data, the model works almost perfectly, provided the document is longer than a certain length (roughly 50 words). One possible way of calculating more reliable metrics here without manual annotation would be to evaluate the model on several single-language corpora, using the known language of each corpus as the gold label.
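A sketch of that idea follows: because each corpus is monolingual, its language serves as the gold label without any manual annotation. The corpus paths, and again the assumption that the saved model is a pipeline taking raw text, are hypothetical.

```python
# Sketch of evaluating the final model on several single-language corpora,
# using each corpus's known language as the gold label (paths are hypothetical).
import joblib

model = joblib.load("final_model/mlp_pipeline.joblib")    # hypothetical path

corpora = {
    "Polish": "corpora/polish_news.txt",    # one document per line, all in the named language
    "Czech": "corpora/czech_news.txt",
    "Russian": "corpora/russian_news.txt",
}

for language, path in corpora.items():
    with open(path, encoding="utf-8") as f:
        documents = [line.strip() for line in f if line.strip()]
    predictions = model.predict(documents)                 # assumes the pipeline accepts raw text
    accuracy = sum(pred == language for pred in predictions) / len(predictions)
    print(f"{language}: accuracy {accuracy:.3f}")
```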