From d2a356313f25475c1ed943fb8d1b11fa8c573950 Mon Sep 17 00:00:00 2001
From: igraf <igraf@cl.uni-heidelberg.de>
Date: Fri, 23 Feb 2024 21:50:35 +0100
Subject: [PATCH] Update random forest part

---
 project/README.md | 75 ++++++++++++++++++++---------------------------
 1 file changed, 32 insertions(+), 43 deletions(-)

diff --git a/project/README.md b/project/README.md
index de541e6..5f930d5 100644
--- a/project/README.md
+++ b/project/README.md
@@ -417,9 +417,7 @@ The best accuracy on the development set is **0.469** with the **HSV filters** (
 </figure>
 
 
-
-**Optimization:**
-- *"no filters"* = RGB values as features
+The following table shows an excerpt of the feature and size combinations we tested:
 
 | Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
 | ------- | -------- | -------- | --------------- | ---- |
@@ -428,66 +426,57 @@ The best accuracy on the development set is **0.469** with the **HSV filters** (
 | 50x50   | HSV + Sobel  | 0.469 | `{'max_depth': 40, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` |  => no improvement compared to only HSV
 | 50x50 |  Sobel only | 0.392 | `{'max_depth': 40, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` |  => a lot worse than HSV only
 | 50x50 | No filters + Sobel | 0.432 | `{'max_depth': 30, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | |
-| 125x125 | Canny only | 0.214 | `{'max_depth': 80, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | => poor results despite of many features |
+| 50x50 | Canny only | 0.203 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | => poor results |
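
The "Best Parameters" columns above come from a scikit-learn grid search. A minimal sketch of such a setup, using random placeholder data in place of the real feature-extraction pipeline (the grid values below mirror the best settings reported in the table, not the full grid actually searched):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder data standing in for flattened image features
# (e.g. 50x50 HSV images -> 50*50*3 = 7500 values per image).
rng = np.random.default_rng(0)
X = rng.random((60, 7500))
y = rng.integers(0, 3, size=60)

# Values mirror the best settings reported above (illustrative grid).
param_grid = {
    "max_depth": [30, 40, 70, None],
    "n_estimators": [100],
    "max_features": ["sqrt"],
    "min_samples_leaf": [2],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```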
 
+<br />
 
-Because we had strong results with HSV + Sobel, we also used this feature combination for another round of optimization with a different picture size (namely 75x75). The best accuracy we achieved was **0.469**, thus no improvement compared to 50x50 images.
+**Different image sizes:**
 
-==> Best results for 50x50 images with HSV + Sobel filters & HSV only 
-==> Therefore, we will use this feature combination for another round of optimization with a different picture size (namely 75x75)
+- 1.) Because we had strong results with HSV + Sobel, we also used this feature combination for another round of optimization with a different image size (namely 75x75).
+  - <details> <summary> The best accuracy we achieved was <b>0.469</b>, thus no improvement compared to 50x50 images. </summary> 
 
-| Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
-| ------- | -------- | -------- | --------------- | ---- |
-| 75x75   | HSV + Sobel (without normal pixel values) | 0.469 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | Time for optimization: 251 min |
+    | Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
+    | ------- | -------- | -------- | --------------- | ---- |
+    | 75x75   | HSV + Sobel | 0.469 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | => no improvement compared to 50x50 images |
+    </details>
+- 2.) As another experiment, we tested whether we could improve the no-filter results by using different image sizes.
+  - <details> <summary> The best accuracy we achieved was <b>0.417</b> with 75x75 images, though there are only minimal differences in performance between the different sizes. </summary> 
 
-==> no improvement compared to 50x50 images
+    | Resized | Features | Accuracy (Dev) | Tested Parameters |
+    | ------- | -------- | -------- | --------------- | 
+    | 30x30  | No filters (2700 features) | 0.412 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 
+    | 50x75 | No filters (11250 features) | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
+    | 75x75   | No filters (16875 features) | 0.417 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
+    | 100x100 | No filters (30000 features) | 0.413 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
+    | 125x125 | No filters (46875 features)  | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
+    </details>
 
-We see that for all settings, a higher number of estimators leads to better results. Therefore, we again tested HSV only with up to 150 estimators. 
+<br />
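With no filters, the feature counts in the table are simply the flattened RGB pixels, i.e. width × height × 3 values per image (an observation from the reported numbers, not from the project code):

```python
# Feature count for a flattened RGB image of a given size.
def n_features(width: int, height: int, channels: int = 3) -> int:
    return width * height * channels

for width, height in [(30, 30), (50, 75), (75, 75), (100, 100), (125, 125)]:
    print(f"{width}x{height}: {n_features(width, height)} features")
```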
 
-<details> <summary> Click to see parameter grid </summary> 
+**Higher number of estimators:**
 
-```
-param_grid = {"max_depth": list(range(10,81,10)) + [None], "n_estimators": [10,25,50,75,100,125,150], "max_features": ["sqrt"], "min_samples_leaf": [2], 'min_samples_split': [2]}
-```
-</details>
-This did indeed lead to better results: 100 estimators: 0.469, 150 estimators: 0.481
+Across all settings, we see that a higher number of estimators leads to better results. Therefore, we tested the HSV-only features again with up to 150 estimators.
+
+<details> <summary> This did indeed lead to better results: 150 estimators: accuracy of <b>0.481</b> ✨ </summary> 
 
 | Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
 | ------- | -------- | -------- | --------------- | ---- |
 | 50x50   | HSV only | 0.481| `{'max_depth': 40, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 150}` | Time for optimization: 251 min |
 
-![Grid](figures/random_forest/grid_search_results_rforest_50x50_hsv_max_depth_n_estimators.png)
-
 
+<figure>
+<img align="center" src="figures/random_forest/grid_search_results_rforest_50x50_hsv_max_depth_n_estimators.png" alt="Grid" width="80%" height="auto">
+</figure>
+</details>
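
The train/dev curves in the figure above can be reproduced from GridSearchCV's `cv_results_`. A minimal sketch on synthetic stand-in data (the real features and grid are assumptions here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the flattened image features.
X, y = make_classification(n_samples=150, n_features=30, n_informative=10,
                           n_classes=3, random_state=0)

grid = {"n_estimators": [10, 50, 100, 150], "max_features": ["sqrt"]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid,
                      cv=3, return_train_score=True)
search.fit(X, y)

# Mean train/dev accuracy for each value of n_estimators -- the data
# behind a per-parameter plot like the one above.
for params, train, dev in zip(search.cv_results_["params"],
                              search.cv_results_["mean_train_score"],
                              search.cv_results_["mean_test_score"]):
    print(params["n_estimators"], round(train, 3), round(dev, 3))
```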
 
-| Resized | Features | Accuracy (Dev) | Tested Parameters | Comments |
-| ------- | -------- | -------- | --------------- | ---- |
-| 30x30  | No filters (2700 Features) | 0.412 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 2.1 min | 0.005 min
-| 50x75 | No filters (11250 Features) | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 4.06 min | 0.006 min
-| 75x75   | No filters (16875 Features) | 0.417 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
-| 100x100 | No filters (30000 Features) | 0.413 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
-Results for RandomForestClassifier classifier on 100x100_standard images:
-| 50x50 | Canny 300 threshold | 0.203 |  `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
-| 125x125 | No filters | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 8.56 min | 0.011 min
-| 125x125 | Canny 300 threshold | 0.200 |  `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 0.97 min | 0.0007 min
+<br />
 
+**Confusion Matrices:**
+
+Confusion Matrix -  No filters  - best parameters        |  Confusion Matrix -  HSV  - best parameters
+:-------------------------:|:-------------------------:
+![Random Forest Grid Search](figures/random_forest/RandomForestClassifier_50x50_standard_confusion_matrix_max_depth_70_max_features_sqrt_min_samples_leaf_2_min_samples_split_2_n_estimators_100.png)  |  ![Random Forest Grid Search](figures/random_forest/RandomForestClassifier_50x50_hsv-only_confusion_matrix_max_depth_40_max_features_sqrt_min_samples_leaf_2_min_samples_split_2_n_estimators_100.png)
 
 - Observations:
     - Both classifiers make the same mistakes, e.g. confusing raspberries, redcurrants, and strawberries :strawberry: (see the bottom-right corner of the confusion matrices)
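
The berry confusion described above can be read directly off a confusion matrix. A toy sketch with illustrative labels (not the project's actual predictions):

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels only -- not the project's actual predictions.
y_true = ["raspberry", "redcurrant", "strawberry", "raspberry", "strawberry"]
y_pred = ["redcurrant", "redcurrant", "raspberry", "raspberry", "strawberry"]

labels = ["raspberry", "redcurrant", "strawberry"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)  # rows = true class, columns = predicted class
```

Off-diagonal entries count misclassifications, so a cluster of large off-diagonal values among similar classes (as in the bottom-right corner above) indicates systematic confusion.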
 
-- GridSearch:
-- the importance of the parameters can be seen in the following figures:
-    - using the grid like we did, many combinations of parameters are tested and the best combination is chosen
-    - if we also want to find out how the parameters influence the accuracy, we can visualize the results of the grid search as below; the code we used for this is slightly adapted from a [stackoverflow response](https://stackoverflow.com/questions/37161563/how-to-graph-grid-scores-from-gridsearchcv)
-        - :mag: the figure shows the accuracy when all parameters are fixed to their best value except for the one for which the accuracy is plotted (both for train and dev set)
-
-
-
-
-Confusion Matrix -  No filters  - best parameters        |  Confusion Matrix -  HSV features - best parameters
-:-------------------------:|:-------------------------:
-![Random Forest Grid Search](figures/random_forest/RandomForestClassifier_50x50_standard_confusion_matrix_max_depth_70_max_features_sqrt_min_samples_leaf_2_min_samples_split_2_n_estimators_100.png)  |  ![Random Forest Grid Search](figures/random_forest/RandomForestClassifier_50x50_hsv-only_confusion_matrix_max_depth_40_max_features_sqrt_min_samples_leaf_2_min_samples_split_2_n_estimators_100.png)
-
 
 
 ### CNN (Convolutional Neural Network)
-- 
GitLab