Update random forest part

Commit d2a35631 authored 1 year ago by igraf
--- a/project/README.md
+++ b/project/README.md
@@ -417,9 +417,7 @@ The best accuracy on the development set is **0.469** with the **HSV filters** (
 </figure>
+The following table shows an excerpt of the feature and size combinations used:
-**Optimization:**
- *"no filters"* = RGB values as features
 | Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
 | ------- | -------- | -------- | --------------- | ---- |
@@ -428,66 +426,57 @@ The best accuracy on the development set is **0.469** with the **HSV filters** (
 | 50x50   | HSV + Sobel  | 0.469 | `{'max_depth': 40, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` |  => no improvement compared to only HSV
 | 50x50 |  Sobel only | 0.392 | `{'max_depth': 40, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` |  => a lot worse than HSV only
 | 50x50 | No filters + Sobel | 0.432 | `{'max_depth': 30, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | |
-| 125x125 | Canny only | 0.214 | `{'max_depth': 80, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | => poor results despite of many features |
+| 50x50 | Canny only | 0.203 |  `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | => poor results |
+<br />
-Because we had strong results with HSV + Sobel, we also used this feature combination for another round of optimization with a different picture size (namely 75x75). The best accuracy we achieved was **0.469**, thus no improvement compared to 50x50 images.
+**Different image sizes:**
-==> Best results for 50x50 images with HSV + Sobel filters & HSV only 
+- 1.) Because we had strong results with HSV + Sobel, we ran another round of optimization on this feature combination with a different image size (75x75).
-==> Therefore, we will use this feature combination for another round of optimization with a different picture size (namely 75x75)
+  - <details> <summary> The best accuracy we achieved was <b>0.469</b>, thus no improvement compared to 50x50 images. </summary> 
-| Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
+    | Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
-| ------- | -------- | -------- | --------------- | ---- |
+    | ------- | -------- | -------- | --------------- | ---- |
-| 75x75   | HSV + Sobel (without normal pixel values) | 0.469 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | Time for optimization: 251 min |
+    | 75x75   | HSV + Sobel | 0.469 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | => no improvement compared to 50x50 images |
+    </details>
+- 2.) As another experiment, we tested whether different image sizes improve the results when no filters are used.
+  - <details> <summary> The best accuracy we achieved was <b>0.417</b> with 75x75 images, though there are only minimal differences in performance between the different sizes. </summary> 
-==> no improvement compared to 50x50 images
+    | Resized | Features | Accuracy (Dev) | Tested Parameters |
+    | ------- | -------- | -------- | --------------- | 
+    | 30x30  | No filters (2700 features) | 0.412 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 
+    | 50x75 | No filters (11250 features) | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
+    | 75x75   | No filters (16875 features) | 0.417 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
+    | 100x100 | No filters (30000 features) | 0.413 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
+    | 125x125 | No filters (46875 features)  | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
+    </details>
-We see that for all settings, a higher number of estimators leads to better results. Therefore, we again tested HSV only with up to 150 estimators. 
+<br />
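The feature counts given for the "no filters" rows follow from using every RGB channel value of every pixel as one feature, i.e. width × height × 3. The sketch below computes the count for each image size listed in the table; the helper name `rgb_feature_count` is hypothetical and not taken from the project code.

```python
def rgb_feature_count(width: int, height: int, channels: int = 3) -> int:
    """Number of features when each pixel channel value is one feature."""
    return width * height * channels


# Image sizes from the "no filters" experiment.
sizes = [(30, 30), (50, 75), (75, 75), (100, 100), (125, 125)]

# Map "WxH" to the number of raw RGB features.
feature_counts = {f"{w}x{h}": rgb_feature_count(w, h) for w, h in sizes}

for name, count in feature_counts.items():
    print(f"{name}: {count} features")
```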
-<details> <summary> Click to see parameter grid </summary> 
+**Higher number of estimators:**
-```
+We see that for all settings, a higher number of estimators leads to better results. Therefore, we again tested HSV only with up to 150 estimators. 
-param_grid = {"max_depth": list(range(10,81,10)) + [None], "n_estimators": [10,25,50,75,100,125,150], "max_features": ["sqrt"], "min_samples_leaf": [2], 'min_samples_split': [2]}
-```
+<details> <summary> This did indeed lead to better results: with 150 estimators, the accuracy rose from 0.469 to <b>0.481</b> ✨ </summary> 
-</details>
-This did indeed lead to better results: 100 estimators: 0.469, 150 estimators: 0.481
 | Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
 | ------- | -------- | -------- | --------------- | ---- |
 | 50x50   | HSV only | 0.481| `{'max_depth': 40, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 150}` | Time for optimization: 251 min |
-![Grid](figures/random_forest/grid_search_results_rforest_50x50_hsv_max_depth_n_estimators.png)
+<img align="center" src="figures/random_forest/grid_search_results_rforest_50x50_hsv_max_depth_n_estimators.png" alt="Grid" width="80%" height="auto">
+</figure>
+</details>
-| Resized | Features | Accuracy (Dev) | Tested Parameters | Comments |
+<br />
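For readers who want to reproduce the estimator experiment, here is a minimal sketch of a grid search over `max_depth` and `n_estimators` with scikit-learn's `GridSearchCV`. The full grid is the one this commit removes from the README. Everything else is an assumption made for illustration: the synthetic data from `make_classification` stands in for the flattened HSV image features, 3-fold cross-validation stands in for the project's train/dev split, and the reduced `demo_param_grid` only exists so the example runs in seconds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): the real features would be the
# flattened HSV pixel values of the 50x50 images.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_classes=3, random_state=0
)

# Parameter grid of the estimator experiment, as previously listed in the README.
full_param_grid = {
    "max_depth": list(range(10, 81, 10)) + [None],
    "n_estimators": [10, 25, 50, 75, 100, 125, 150],
    "max_features": ["sqrt"],
    "min_samples_leaf": [2],
    "min_samples_split": [2],
}

# Reduced grid (assumption) so that this sketch finishes quickly.
demo_param_grid = {**full_param_grid, "max_depth": [10, None], "n_estimators": [10, 50]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    demo_param_grid,
    scoring="accuracy",
    cv=3,  # assumption: cross-validation instead of a fixed dev set
    return_train_score=True,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best mean CV accuracy:", round(search.best_score_, 3))
```

Swapping `demo_param_grid` for `full_param_grid` runs the complete search, which takes considerably longer on real image data.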
-| ------- | -------- | -------- | --------------- | ---- |
-| 30x30  | No filters (2700 Features) | 0.412 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 2.1 min | 0.005 min
-| 50x75 | No filters (11250 Features) | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 4.06 min | 0.006 min
-| 75x75   | No filters (16875 Features) | 0.417 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
-| 100x100 | No filters (30000 Features) | 0.413 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
-Results for RandomForestClassifier classifier on 100x100_standard images:
-| 50x50 | Canny 300 threshold | 0.203 |  `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
-| 125x125 | No filters | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 8.56 min | 0.011 min
-| 125x125 | Canny 300 threshold | 0.200 |  `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 0.97 min | 0.0007 min
+**Confusion Matrices:**
+Confusion Matrix -  No filters  - best parameters        |  Confusion Matrix -  HSV  - best parameters
+:-------------------------:|:-------------------------:
+![Random Forest Grid Search](figures/random_forest/RandomForestClassifier_50x50_standard_confusion_matrix_max_depth_70_max_features_sqrt_min_samples_leaf_2_min_samples_split_2_n_estimators_100.png)  |  ![Random Forest Grid Search](figures/random_forest/RandomForestClassifier_50x50_hsv-only_confusion_matrix_max_depth_40_max_features_sqrt_min_samples_leaf_2_min_samples_split_2_n_estimators_100.png)
 - Observations:
     - Both classifiers make the same mistakes, e.g. confusing raspberries, redcurrants and strawberries :strawberry: (see the bottom right corner of the confusion matrices)
- GridSearch:
- the importance of the parameters can be seen in the following figures:
-    - using the grid like we did, many combinations of parameters are tested and the best combination is chosen
-    - if we also want to find out how the parameters influence the accuracy, we can visualize the results of the grid search as below; the code we used for this is slightly adapted from a [stackoverflow response](https://stackoverflow.com/questions/37161563/how-to-graph-grid-scores-from-gridsearchcv)
-        - :mag: the figure shows the accuracy when all parameters are fixed to their best value except for the one for which the accuracy is plotted (both for train and dev set)
-Confusion Matrix -  No filters  - best parameters        |  Confusion Matrix -  HSV features - best parameters
-:-------------------------:|:-------------------------:
-![Random Forest Grid Search](figures/random_forest/RandomForestClassifier_50x50_standard_confusion_matrix_max_depth_70_max_features_sqrt_min_samples_leaf_2_min_samples_split_2_n_estimators_100.png)  |  ![Random Forest Grid Search](figures/random_forest/RandomForestClassifier_50x50_hsv-only_confusion_matrix_max_depth_40_max_features_sqrt_min_samples_leaf_2_min_samples_split_2_n_estimators_100.png)
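The observation about confused classes can also be read off a confusion matrix programmatically. The following sketch uses scikit-learn's `confusion_matrix` and picks the largest off-diagonal entry. The labels and predictions are made-up toy values for illustration only and are not results from this project.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["raspberry", "redcurrant", "strawberry"]

# Toy ground truth and predictions (hypothetical, not project data).
y_true = ["raspberry"] * 4 + ["redcurrant"] * 4 + ["strawberry"] * 4
y_pred = [
    "raspberry", "raspberry", "strawberry", "strawberry",
    "redcurrant", "redcurrant", "redcurrant", "raspberry",
    "strawberry", "strawberry", "strawberry", "raspberry",
]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Zero the diagonal so that only misclassifications remain.
errors = cm.copy()
np.fill_diagonal(errors, 0)

# Index of the most frequent (true class, predicted class) confusion.
true_idx, pred_idx = np.unravel_index(np.argmax(errors), errors.shape)
print(f"Most frequent confusion: {labels[true_idx]} predicted as {labels[pred_idx]}")
```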
 ### CNN (Convolutional Neural Network)