From d2a356313f25475c1ed943fb8d1b11fa8c573950 Mon Sep 17 00:00:00 2001
From: igraf <igraf@cl.uni-heidelberg.de>
Date: Fri, 23 Feb 2024 21:50:35 +0100
Subject: [PATCH] Update random forest part
---
 project/README.md | 75 ++++++++++++++++++++---------------------------
 1 file changed, 32 insertions(+), 43 deletions(-)
diff --git a/project/README.md b/project/README.md
index de541e6..5f930d5 100644
--- a/project/README.md
+++ b/project/README.md
@@ -417,9 +417,7 @@ The best accuracy on the development set is **0.469** with the **HSV filters** (
 </figure>
-
-**Optimization:**
-- *"no filters"* = RGB values as features
+The following table shows an excerpt of the feature and size combinations used:
 | Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
 | ------- | -------- | -------- | --------------- | ---- |
@@ -428,66 +426,57 @@ The best accuracy on the development set is **0.469** with the **HSV filters** (
 | 50x50 | HSV + Sobel | 0.469 | `{'max_depth': 40, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | => no improvement compared to only HSV
 | 50x50 | Sobel only | 0.392 | `{'max_depth': 40, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | => a lot worse than HSV only
 | 50x50 | No filters + Sobel | 0.432 | `{'max_depth': 30, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | |
-| 125x125 | Canny only | 0.214 | `{'max_depth': 80, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | => poor results despite of many features |
+| 50x50 | Canny only | 0.203 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | => poor results
+<br />
-Because we had strong results with HSV + Sobel, we also used this feature combination for another round of optimization with a different picture size (namely 75x75).
The best accuracy we achieved was **0.469**, thus no improvement compared to 50x50 images.
+**Different image sizes:**
-==> Best results for 50x50 images with HSV + Sobel filters & HSV only
-==> Therefore, we will use this feature combination for another round of optimization with a different picture size (namely 75x75)
+- 1.) Because we had strong results with HSV + Sobel, we also used this feature combination for another round of optimization with a different picture size (namely 75x75).
+ - <details> <summary> The best accuracy we achieved was <b>0.469</b>, thus no improvement compared to 50x50 images. </summary>
-| Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
-| ------- | -------- | -------- | --------------- | ---- |
-| 75x75 | HSV + Sobel (without normal pixel values) | 0.469 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | Time for optimization: 251 min |
+ | Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
+ | ------- | -------- | -------- | --------------- | ---- |
+ | 75x75 | HSV + Sobel | 0.469 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | => no improvement compared to 50x50 images |
+ </details>
+- 2.) As another experiment, we tested whether we can improve the results with no filters using different image sizes.
+ - <details> <summary> The best accuracy we achieved was <b>0.417</b> with 75x75 images, though there are only minimal differences in performance between the different sizes.
</summary>
-==> no improvement compared to 50x50 images
+ | Resized | Features | Accuracy (Dev) | Tested Parameters |
+ | ------- | -------- | -------- | --------------- |
+ | 30x30 | No filters (2700 features) | 0.412 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
+ | 50x75 | No filters (11250 features) | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
+ | 75x75 | No filters (16875 features) | 0.417 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
+ | 100x100 | No filters (30000 features) | 0.413 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
+ | 125x125 | No filters (46875 features) | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
-We see that for all settings, a higher number of estimators leads to better results. Therefore, we again tested HSV only with up to 150 estimators.
+<br />
-<details> <summary> Click to see parameter grid </summary>
+**Higher number of estimators:**
-```
-param_grid = {"max_depth": list(range(10,81,10)) + [None], "n_estimators": [10,25,50,75,100,125,150], "max_features": ["sqrt"], "min_samples_leaf": [2], 'min_samples_split': [2]}
-```
-</details>
-This did indeed lead to better results: 100 estimators: 0.469, 150 estimators: 0.481
+We see that for all settings, a higher number of estimators leads to better results. Therefore, we again tested HSV only with up to 150 estimators.
+
+<details> <summary> This did indeed lead to better results: 150 estimators: accuracy of <b>0.481</b> ✨ </summary>
 | Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
 | ------- | -------- | -------- | --------------- | ---- |
 | 50x50 | HSV only | 0.481| `{'max_depth': 40, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 150}` | Time for optimization: 251 min |
-
-
+<img align="center" src="figures/random_forest/grid_search_results_rforest_50x50_hsv_max_depth_n_estimators.png" alt= "Grid" width="80%" height="auto">
+</figure>
+</details>
-| Resized | Features | Accuracy (Dev) | Tested Parameters | Comments |
-| ------- | -------- | -------- | --------------- | ---- |
-| 30x30 | No filters (2700 Features) | 0.412 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 2.1 min | 0.005 min
-| 50x75 | No filters (11250 Features) | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 4.06 min | 0.006 min
-| 75x75 | No filters (16875 Features) | 0.417 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
-| 100x100 | No filters (30000 Features) | 0.413 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
-Results for RandomForestClassifier classifier on 100x100_standard images:
-| 50x50 | Canny 300 threshold | 0.203 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
-| 125x125 | No filters | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 8.56 min | 0.011 min
-| 125x125 | Canny 300 threshold | 0.200 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 0.97 min | 0.0007
min
+<br />
+**Confusion Matrices:**
+Confusion Matrix - No filters - best parameters | Confusion Matrix - HSV - best parameters
+:-------------------------:|:-------------------------:
+ |
 - Observations:
 - Classifiers both make the same mistakes, e.g. confusing raspberries, redcurrants and strawberries :strawberry: (see bottom right corner of confusion matrix)
-- GridSearch:
-- the importance of the parameters can be seen in the following figures:
- - using the grid like we did, many combinations of parameters are tested and the best combination is chosen
- - if we also want to find out how the parameters influence the accuracy, we can visualize the results of the grid search as below; the code we used for this is slightly adapted from a [stackoverflow response](https://stackoverflow.com/questions/37161563/how-to-graph-grid-scores-from-gridsearchcv)
- - :mag: the figure shows the accuracy when all parameters are fixed to their best value except for the one for which the accuracy is plotted (both for train and dev set)
-
-
-
-
-Confusion Matrix - No filters - best parameters | Confusion Matrix - HSV features - best parameters
-:-------------------------:|:-------------------------:
- |
-
 ### CNN (Convolutional Neural Network)
-- 
GitLab
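
Note on the "No filters" feature counts quoted in this patch: the README describes "no filters" as using the RGB values as features, so the counts should equal height × width × 3. The following is a minimal sketch for checking them, assuming the images are flattened RGB arrays with three channels. The helper name `n_rgb_features` is hypothetical and not part of the project code.

```python
# Sketch: number of features for a flattened RGB image.
# Assumption: "no filters" means every pixel contributes 3 values (R, G, B).

def n_rgb_features(height, width, channels=3):
    """Return the length of the flattened pixel vector."""
    return height * width * channels

# Image sizes listed in the "No filters" table of the README.
sizes = [(30, 30), (50, 75), (75, 75), (100, 100), (125, 125)]

feature_counts = {f"{h}x{w}": n_rgb_features(h, w) for h, w in sizes}

for size, count in feature_counts.items():
    print(f"{size}: {count} features")
```

Under this assumption the formula reproduces 2700, 11250, 16875 and 30000 for the first four sizes, and gives 46875 for 125x125 images.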