Commit d2a35631 authored by igraf

Update random forest part

parent 334a442e
@@ -417,9 +417,7 @@ The best accuracy on the development set is **0.469** with the **HSV filters** (
</figure>
The following table shows an excerpt of the feature and size combinations used:
**Optimization:**
- *"no filters"* = RGB values as features
| Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
| ------- | -------- | -------- | --------------- | ---- |
@@ -428,66 +426,57 @@ The best accuracy on the development set is **0.469** with the **HSV filters** (
| 50x50 | HSV + Sobel | 0.469 | `{'max_depth': 40, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | => no improvement compared to only HSV |
| 50x50 | Sobel only | 0.392 | `{'max_depth': 40, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | => a lot worse than HSV only |
| 50x50 | No filters + Sobel | 0.432 | `{'max_depth': 30, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | |
| 125x125 | Canny only | 0.214 | `{'max_depth': 80, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | => poor results despite many features |
| 50x50 | Canny only | 0.203 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | => poor results |
Because we had strong results with HSV + Sobel, we also used this feature combination for another round of optimization with a different picture size (namely 75x75). The best accuracy we achieved was **0.469**, thus no improvement compared to 50x50 images.

==> Best results for 50x50 images with HSV + Sobel filters & HSV only
==> Therefore, we will use this feature combination for another round of optimization with a different picture size (namely 75x75)

<br />

**Different image sizes:**
- 1.) Because we had strong results with HSV + Sobel, we also used this feature combination for another round of optimization with a different picture size (namely 75x75).
- <details> <summary> The best accuracy we achieved was <b>0.469</b>, thus no improvement compared to 50x50 images. </summary>
| Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
| ------- | -------- | -------- | --------------- | ---- |
| 75x75 | HSV + Sobel (without normal pixel values) | 0.469 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | Time for optimization: 251 min |
| 75x75 | HSV + Sobel | 0.469 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 100}` | => no improvement compared to 50x50 images |
</details>
- 2.) As another experiment, we tested whether we can improve the results with no filters by using different image sizes.
- <details> <summary> The best accuracy we achieved was <b>0.417</b> with 75x75 images, though there are only minimal differences in performance between the different sizes. </summary>
==> no improvement compared to 50x50 images

| Resized | Features | Accuracy (Dev) | Tested Parameters |
| ------- | -------- | -------- | --------------- |
| 30x30 | No filters (2700 features) | 0.412 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
| 50x75 | No filters (11250 features) | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
| 75x75 | No filters (16875 features) | 0.417 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
| 100x100 | No filters (30000 features) | 0.413 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
| 125x125 | No filters (46875 features) | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
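
The feature counts listed for the "no filters" runs are consistent with flattening each resized RGB image into a single vector. A minimal sketch of that arithmetic, assuming three colour channels per pixel and no further preprocessing (`n_features` is a hypothetical helper for illustration, not part of the project code):

```python
def n_features(width: int, height: int, channels: int = 3) -> int:
    """Length of the flattened pixel vector for one image."""
    return width * height * channels


# Image sizes taken from the table above
sizes = [(30, 30), (50, 75), (75, 75), (100, 100), (125, 125)]
feature_counts = {f"{w}x{h}": n_features(w, h) for w, h in sizes}

for size, count in feature_counts.items():
    print(f"{size}: {count} features")
```

The number of features grows with the image area, while the dev accuracy in the table stays within a narrow band (0.411 to 0.417).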
<br />

**Higher number of estimators:**
We see that for all settings, a higher number of estimators leads to better results. Therefore, we again tested HSV only with up to 150 estimators.

<details> <summary> Click to see parameter grid </summary>

```
param_grid = {"max_depth": list(range(10,81,10)) + [None], "n_estimators": [10,25,50,75,100,125,150], "max_features": ["sqrt"], "min_samples_leaf": [2], 'min_samples_split': [2]}
```
</details>

This did indeed lead to better results: 100 estimators: 0.469, 150 estimators: 0.481

<details> <summary> This did indeed lead to better results: 150 estimators: accuracy of <b>0.481</b></summary>
| Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
| ------- | -------- | -------- | --------------- | ---- |
| 50x50 | HSV only | 0.481 | `{'max_depth': 40, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 150}` | Time for optimization: 251 min |
![Grid](figures/random_forest/grid_search_results_rforest_50x50_hsv_max_depth_n_estimators.png)
<img align="center" src="figures/random_forest/grid_search_results_rforest_50x50_hsv_max_depth_n_estimators.png" alt= "Grid" width="80%" height="auto">
</figure>
</details>
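
To get a feel for how much work the grid search with up to 150 estimators involves, the parameter grid from that run can be expanded into its individual candidates. This is only a sketch: `expand_grid` is a hypothetical helper written for illustration (scikit-learn's `ParameterGrid` provides similar functionality), and the grid values are copied from the `param_grid` shown in this section.

```python
from itertools import product

# Grid values as given in the report's param_grid
param_grid = {
    "max_depth": list(range(10, 81, 10)) + [None],
    "n_estimators": [10, 25, 50, 75, 100, 125, 150],
    "max_features": ["sqrt"],
    "min_samples_leaf": [2],
    "min_samples_split": [2],
}


def expand_grid(grid):
    """Return every parameter combination in the grid as a list of dicts."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


candidates = expand_grid(param_grid)
# 9 max_depth values * 7 n_estimators values, all other lists have one entry
print(len(candidates))
```

Each candidate is fitted once per cross-validation fold, so the total number of fits is the number of candidates multiplied by the number of folds (the fold count is not stated in this section).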
<br />

| Resized | Features | Accuracy (Dev) | Tested Parameters | Comments |
| ------- | -------- | -------- | --------------- | ---- |
| 30x30 | No filters (2700 Features) | 0.412 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 2.1 min | 0.005 min
| 50x75 | No filters (11250 Features) | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 4.06 min | 0.006 min
| 75x75 | No filters (16875 Features) | 0.417 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
| 100x100 | No filters (30000 Features) | 0.413 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
Results for RandomForestClassifier classifier on 100x100_standard images:
| 50x50 | Canny 300 threshold | 0.203 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` |
| 125x125 | No filters | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 8.56 min | 0.011 min
| 125x125 | Canny 300 threshold | 0.200 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 0.97 min | 0.0007 min
**Confusion Matrices:**
Confusion Matrix - No filters - best parameters | Confusion Matrix - HSV - best parameters
:-------------------------:|:-------------------------:
![Random Forest Grid Search](figures/random_forest/RandomForestClassifier_50x50_standard_confusion_matrix_max_depth_70_max_features_sqrt_min_samples_leaf_2_min_samples_split_2_n_estimators_100.png) | ![Random Forest Grid Search](figures/random_forest/RandomForestClassifier_50x50_hsv-only_confusion_matrix_max_depth_40_max_features_sqrt_min_samples_leaf_2_min_samples_split_2_n_estimators_100.png)
- Observations:
  - Both classifiers make the same mistakes, e.g. confusing raspberries, redcurrants and strawberries :strawberry: (see bottom right corner of the confusion matrices)
- GridSearch:
  - The importance of the parameters can be seen in the following figures.
  - Using the grid as we did, many parameter combinations are tested and the best combination is chosen.
  - If we also want to find out how the parameters influence the accuracy, we can visualize the results of the grid search as below; the code we used for this is slightly adapted from a [Stack Overflow answer](https://stackoverflow.com/questions/37161563/how-to-graph-grid-scores-from-gridsearchcv).
  - :mag: Each figure shows the accuracy (for both the train and dev set) when all parameters are fixed to their best value except for the one being plotted.
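
The "fix all parameters except one" idea described in the list above can be sketched in a few lines. This is not the plotting code referenced in the report; it is a simplified illustration that assumes the results are available in the layout of scikit-learn's `GridSearchCV.cv_results_` (keys `params` and `mean_test_score`). The helper name `scores_along_param` and the toy scores are made up for this example.

```python
def scores_along_param(cv_results, best_params, name):
    """Mean score per value of `name`, with all other parameters fixed to their best value."""
    curve = {}
    for params, score in zip(cv_results["params"], cv_results["mean_test_score"]):
        others_match = all(
            params[key] == value for key, value in best_params.items() if key != name
        )
        if others_match:
            curve[params[name]] = score
    return curve


# Toy results in the cv_results_ layout (made-up scores, not taken from the report)
toy_results = {
    "params": [
        {"max_depth": 10, "n_estimators": 50},
        {"max_depth": 10, "n_estimators": 100},
        {"max_depth": 40, "n_estimators": 50},
        {"max_depth": 40, "n_estimators": 100},
    ],
    "mean_test_score": [0.30, 0.35, 0.40, 0.45],
}
toy_best = {"max_depth": 40, "n_estimators": 100}

depth_curve = scores_along_param(toy_results, toy_best, "max_depth")
estimator_curve = scores_along_param(toy_results, toy_best, "n_estimators")
print(depth_curve, estimator_curve)
```

Each returned dictionary maps parameter values to scores and can be passed to any plotting library. The same function applied to `mean_train_score` (available when the search is run with `return_train_score=True`) would give the corresponding train curve.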
Confusion Matrix - No filters - best parameters | Confusion Matrix - HSV features - best parameters
:-------------------------:|:-------------------------:
![Random Forest Grid Search](figures/random_forest/RandomForestClassifier_50x50_standard_confusion_matrix_max_depth_70_max_features_sqrt_min_samples_leaf_2_min_samples_split_2_n_estimators_100.png) | ![Random Forest Grid Search](figures/random_forest/RandomForestClassifier_50x50_hsv-only_confusion_matrix_max_depth_40_max_features_sqrt_min_samples_leaf_2_min_samples_split_2_n_estimators_100.png)
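
The confusion matrices above are shown as images. To list the most frequent confusions numerically, the (true, predicted) label pairs can be counted directly. A minimal sketch, assuming the true and predicted labels are available as plain lists; the helper names and the toy labels below are hypothetical and are not the project's predictions:

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def most_confused(y_true, y_pred, top=3):
    """Misclassified (true, predicted) pairs, most frequent first."""
    counts = confusion_counts(y_true, y_pred)
    errors = Counter({pair: n for pair, n in counts.items() if pair[0] != pair[1]})
    return errors.most_common(top)


# Toy labels for illustration only
y_true = ["raspberry", "raspberry", "strawberry", "redcurrant", "strawberry", "raspberry"]
y_pred = ["strawberry", "strawberry", "strawberry", "raspberry", "raspberry", "raspberry"]

top_errors = most_confused(y_true, y_pred)
for (true_label, predicted_label), n in top_errors:
    print(f"{true_label} predicted as {predicted_label}: {n}x")
```

Running this on the dev set predictions of both classifiers would make it easy to check whether they share the same top confusions.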
### CNN (Convolutional Neural Network)