# Project: Fruit Image Classification :apple: :banana: :strawberry:

[[_TOC_]]
## Description & Motivation
This folder contains the work for our final project of the EML Proseminar. The project is about classifying different types of fruit based on their images (➡️ **multi-class image classification**).
➡️ Given any picture, we want to make the correct prediction: *Is it a banana, an apple, a strawberry, ...?* 🍌 🍎 🍓

The motivation for this project is to explore the capabilities of machine learning models for image classification tasks. The inspiration comes from the idea of automatic fruit recognition, which has **practical applications**, for instance in supermarkets. Currently, customers have to manually select the type of fruit they are purchasing at self-service scales. This process is time-consuming and inconvenient. Automating it not only enhances efficiency but also provides a better customer experience, particularly in busy shopping environments.
Moreover, the approach scales beyond fruits: it could be adapted to other grocery items like bread or vegetables, as well as to other industries requiring object recognition, such as material sorting in recycling plants or inventory management in warehouses.
## Related Work
For our project, we reviewed related work on (fruit) image classification to gain insights into best practices and build upon existing knowledge in the field. We found two papers that were particularly relevant:
- the bachelor's thesis that created the dataset we use, which is essential for understanding its structure and development, and
- a study on fruit recognition using deep learning, providing advanced insights into neural network applications in fruit classification.
**1. Creating a Dataset and Models Based on Convolutional Neural Networks to Improve Fruit Classification** (Minuț & Iftene, 2021)

The dataset we are utilizing for our fruit image classification project was originally created as part of a bachelor's thesis, detailed in the paper "Creating a Dataset and Models Based on Convolutional Neural Networks to Improve Fruit Classification". This work involved compiling a diverse and extensive collection of fruit images from various sources, which were then meticulously processed to make them suitable for machine learning applications. The paper not only discusses the dataset creation but also delves into the development of Convolutional Neural Network (CNN) models specifically tailored for the task of fruit classification. The efforts in dataset preparation, including image filtering and resizing, and the insights into CNN model training provided in this thesis, form the backbone of the dataset we are using in our project. Another valuable finding of this work is that using RGB, HSV and grayscale features in combination can improve the models' performance.

↪️ [Read more about the work here](https://ieeexplore.ieee.org/document/9700237).
**2. Fruit Recognition from Images Using Deep Learning** (Muresan & Oltean, 2018)

This paper explores fruit recognition using deep learning techniques. The authors developed a deep learning-based system for classifying fruits in images, addressing the challenges posed by variations in fruit appearance and environmental conditions. Their work on neural network model construction and optimization offers valuable insights for projects like ours, showcasing the effectiveness of deep learning in distinguishing between different fruit types.

↪️ [Read the full paper here](https://arxiv.org/pdf/1712.00580.pdf).
## Data

### Overview
The dataset we are using for our project is based on the Fruit-262 **fruit image dataset**, available on [Kaggle](https://www.kaggle.com/datasets/aelchimminut/fruits262). Here are some key details about this original dataset:
- 225,640 images
- 262 types of fruits
- ~ 860 images per class
- format: *jpg*
- images have different sizes
- contains subfolders for each class with the class name as folder name
### Train / Dev / Test Split
In our project, we have focused on **30** specific **classes** out of the 262 available fruit classes, constituting a total of **29,430** images. This decision was based on the relevance and diversity we aimed to achieve in our model's learning scope. We selected classes that are typically found in supermarkets in Germany (e.g. apples, bananas, oranges, mandarins, strawberries, etc.) but also included some exotic fruits (e.g. passion fruits, buddha's hand). We deliberately included fruits that are similar in appearance (e.g. mandarins and oranges) to challenge the model's ability to distinguish between them. A comprehensive list of the selected classes can be found [here](data/class_counts.md).

The original dataset lacks a predefined split into training, development (validation), and testing sets. To tailor our dataset for effective model training and evaluation, we implemented a custom script that methodically divides the dataset into specific proportions.
The script is configured to split the dataset into
- **70%** for training,
- **15%** for development (validation), and
- **15%** for testing.
This means, for each selected fruit class, typically consisting of around **1000** images, we allocate around
- **700** images for training,
- **150** images for development, and
- **150** images for testing.
This allocation ensures a significant amount of data is used for the model's learning while adequately reserving a representative number of images for validation and testing.
We opted not to use cross-validation for the train-dev split, considering the large size of our dataset. With approximately 1000 images in each of the 30 selected classes, our dataset is substantial enough to ensure effective training and validation. This approach negates the necessity for cross-validation, allowing for a streamlined and efficient training process.
The data partitioning script randomly segregates the images for each fruit class into the designated training, development, and testing sets. This **random allocation per class** is pivotal for maintaining the data integrity and representativeness in each subset, facilitating an unbiased evaluation of the model's performance.
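The sketch below illustrates what such a split script can look like, assuming the Kaggle layout with one subfolder per class; the paths, seed, and copy logic are illustrative assumptions, not necessarily what our actual script does:

```python
import os
import random
import shutil

# Hypothetical paths and seed -- adjust to the actual dataset location.
SRC, DST = "Fruit-262", "data"
SPLITS = [("train", 0.70), ("dev", 0.15), ("test", 0.15)]
random.seed(42)  # make the random allocation reproducible

for fruit in os.listdir(SRC):
    images = os.listdir(os.path.join(SRC, fruit))
    random.shuffle(images)  # random allocation per class
    start = 0
    for split, fraction in SPLITS:
        # give the last split whatever remains, to avoid rounding gaps
        end = len(images) if split == "test" else start + int(fraction * len(images))
        out_dir = os.path.join(DST, split, fruit)
        os.makedirs(out_dir, exist_ok=True)
        for name in images[start:end]:
            shutil.copy(os.path.join(SRC, fruit, name), out_dir)
        start = end
```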
↪️ **To prepare the dataset for using it with this project please refer to the Data Preparation section in the [data folder](data/README.md).**
### Insights
The dataset is much cleaner than other fruit datasets we found on Kaggle, which often contain images of cooked food or fruit juices. At the same time, the images show the fruits in many different conditions:
- different colors of fruits (e.g. green, red and yellow apples)
- varying stages of their life cycle
- different quantities of fruits, and
- fruits that are peeled, cut in half or in slices

This diversity in the dataset is beneficial for our project, as it allows our models to learn from a wide range of fruit images, making them more robust and adaptable to different real-world scenarios.

---

### Data Distribution Across Classes
As part of our dataset analysis, we have focused on understanding the distribution of images across the 30 selected fruit classes in our dataset.
To visually represent the **class-wise distribution**, we plotted a histogram:
*Balance of Dataset*: The histogram provides a quick visual check of whether the dataset is balanced. In our case, the dataset is mostly balanced. The apple class is slightly overrepresented, while the nectarine, orange and jostaberry classes may have too few data points. We will keep an eye on the performance of our models on these classes to see if the imbalance affects their ability to learn and predict them 🔍.

## Metrics
To evaluate classification models, the following metrics are commonly used:
- **Accuracy**: The ratio of correctly predicted observations to the total predictions.
  - ⏩ *How many of the predicted classes are correct?*
- **Precision**: The ratio of correctly predicted positive observations to the total predicted positives. High precision relates to a low false positive rate.
  - ⏩ e.g. *How many of the predicted bananas are actually bananas?*
- **Recall**: The ratio of correctly predicted positive observations to all observations in the actual class. It is also known as sensitivity or true positive rate.
  - ⏩ e.g. *How many of the actual bananas were predicted as bananas?*
- **F1-Score**: The harmonic mean of precision and recall. It takes both false positives and false negatives into account and is useful for uneven class distributions.
- **Macro Average**: Computes the metric independently for each class and then averages the per-class values.
  - treats all classes equally, regardless of their size
- **Micro Average**: Computes the metric globally by counting the total true positives, false negatives, and false positives, so it is influenced by class frequency.
  - treats all images equally, regardless of their class

We are setting our focus on the **accuracy** metric. Accuracy is a suitable choice for our multi-class classification problem, as it provides a clear and intuitive measure of the model's overall performance. It is particularly meaningful for our balanced dataset, giving a reliable measure of the model's success across the 30 fruit classes.
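For reference, all of these metrics can be computed with scikit-learn; here is a minimal sketch where the toy labels stand in for our dev-set ground truth and predictions:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy labels standing in for dev-set ground truth and model predictions.
y_true = ["banana", "apple", "banana", "orange", "apple"]
y_pred = ["banana", "apple", "orange", "orange", "apple"]

print("accuracy:", accuracy_score(y_true, y_pred))
for average in ("macro", "micro"):
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=average, zero_division=0
    )
    print(f"{average}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```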
## Baseline

### Overview
We have implemented two types of baseline models: Random and Majority. These are implemented both as custom models and using scikit-learn's `DummyClassifier`. Our dataset involves classifying one out of 30 classes, with a balanced dataset of about 26,500 data points.
It's noteworthy that the performance metrics for our baseline models are **consistent** across both the training and test sets. This uniformity suggests that our data splits are well-balanced and representative, reducing the likelihood of biased or skewed results due to data split anomalies.
If you want to reproduce the results, please run the script [`baselines.py`](baselines.py).
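The scikit-learn variants boil down to two `DummyClassifier` strategies; a minimal sketch (the toy data stands in for our image features and labels, and our custom implementations are not shown here):

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Toy stand-ins: dummy strategies ignore the features entirely.
X = [[0]] * 6
y = ["apple", "apple", "banana", "banana", "orange", "kiwi"]

for strategy in ("uniform", "most_frequent"):  # random / majority baseline
    baseline = DummyClassifier(strategy=strategy, random_state=0).fit(X, y)
    print(strategy, accuracy_score(y, baseline.predict(X)))
```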
### Results Table
The following table summarizes the performance of different baseline models on the **test** set:
### Random Baseline (Custom & Scikit-learn)
- Macro Average Precision, Recall, F1-Score around 0.033-0.034: These scores are consistent with what you'd expect from a random classifier in a balanced multi-class setting. With 30 classes, a random guess is correct about 1/30 of the time, or approximately 0.033. The consistency between our custom implementation and scikit-learn's version reinforces the correctness of our implementation.
- Micro Average Precision, Recall, F1-Score around 0.034-0.035: Micro averages aggregate the contributions of all classes to compute the average metric. In a balanced dataset, micro and macro averages tend to be similar, as seen here.

### Majority Baseline (Custom & Scikit-learn)
- Macro Average Precision, Recall, F1-Score around 0.001, 0.033, 0.003: The precision is particularly low because the majority classifier always predicts the same class. In a balanced dataset with 30 classes, it is correct only about 1/30 of the time, but macro precision penalizes it for the 29 classes it never predicts. Recall remains constant, as it is just the hit rate of the single majority class.
- Micro Average Precision, Recall, F1-Score around 0.041: The micro average is slightly higher because it reflects the overall success rate across all predictions, which is dominated by the success rate of the single predicted class.

### Additional Interpretations
- Performance Lower Than Random Baseline: If a machine learning model performs worse than the random baseline, it suggests that the model is not learning effectively from the data. This could be due to several factors such as poor feature selection, overfitting, or an issue with the training process.
- Performance Lower Than Majority Baseline: This scenario is more alarming, because the majority baseline is a very naive model. A model performing worse than a naive guess of the most frequent class may point to an incorrect model architecture, data preprocessing errors, or other significant issues in the training pipeline.

## Classifiers
### Naive Bayes
Naive Bayes classifiers are based on Bayes' theorem with the assumption of independence between every pair of features. They are particularly known for their simplicity and efficiency, especially in text classification tasks.
#### How Naive Bayes Works
Naive Bayes classifiers work by calculating the probability of each class and the conditional probability of each feature given each class. Despite its simplicity, Naive Bayes can yield surprisingly good results, especially in cases where the independence assumption holds.

#### Expected Results
While not as sophisticated as models like CNNs, Naive Bayes classifiers can still be quite effective, especially in scenarios with limited data or computational resources. However, their performance might be limited in our project due to the high interdependence of features in image data.
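A minimal sketch of such a classifier on flattened pixel features, assuming scikit-learn's `GaussianNB` and random stand-in data (the 50x50x3 = 7500 feature length matches the setup used in our experiments below):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Random stand-in data: 100 flattened 50x50 RGB images, 30 fruit classes.
rng = np.random.default_rng(0)
X_train = rng.random((100, 50 * 50 * 3))
y_train = rng.integers(0, 30, size=100)

model = GaussianNB()  # var_smoothing is the main tunable parameter
model.fit(X_train, y_train)
print(model.predict(X_train[:5]))
```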
### Decision Tree
Decision Trees are a type of model that makes decisions by asking a series of questions.

#### How Decision Trees Work
Based on the features of the input data, a decision tree model asks a series of yes/no questions, leading down a path in the tree that eventually results in a decision (in this case, a fruit class). These models are intuitive and easy to visualize but can become complex and prone to overfitting, especially with a large number of features.

#### Expected Results
For our image classification task, decision trees might struggle with the high dimensionality and complexity of the data. They can serve as a good baseline but are unlikely to outperform more advanced models like CNNs.
### Random Forest
Random Forest is an ensemble learning method, essentially a collection of decision trees, designed to improve the stability and accuracy of the algorithm.

#### How Random Forests Work
Random forests create a 'forest' of decision trees, each trained on a random subset of the data. The final output is determined by combining the predictions of all trees (majority vote or averaged probabilities). This approach helps in reducing overfitting, making the model more robust compared to a single decision tree.

#### Expected Results
Random forests are expected to perform better than individual decision trees in our project. They are less likely to overfit and can handle the variability in the image data more effectively. However, they might still fall short in performance compared to more specialized models like CNNs, particularly in high-dimensional tasks like image classification.
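To make the tree-versus-forest comparison concrete, here is a small sketch on scikit-learn's built-in digits dataset, standing in for our fruit features; exact scores will vary, but the forest typically beats the single tree:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small image dataset standing in for our fruit features.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("single tree:", tree.score(X_te, y_te))
print("random forest:", forest.score(X_te, y_te))  # the ensemble reduces variance
```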
### CNN (Convolutional Neural Network)
Convolutional Neural Networks (CNNs) are a class of deep neural networks that are highly effective for analyzing visual imagery. CNNs are particularly suited for image classification because they can automatically and adaptively learn spatial hierarchies of features from input images. This makes them capable of learning directly from raw images, eliminating the need for manual feature extraction.

#### How CNNs Work
CNNs are composed of multiple layers, including convolutional layers, pooling layers, and fully connected layers. The convolutional layers apply a series of filters to the input image, each detecting a specific feature. The pooling layers downsample the resulting feature maps, reducing their dimensionality. Finally, the fully connected layers process the high-level features and make the classification.

#### Expected Results
The CNN model is expected to outperform the baseline models and the basic classifiers (Naive Bayes, Decision Trees, and Random Forest) due to its higher capacity to learn complex features from images. CNNs are particularly effective for image classification tasks, and we anticipate that our CNN model will achieve the highest accuracy among all the models we've implemented.
## Experiments & Results
### Overview
We have conducted a series of experiments to evaluate the performance of different classifiers on our fruit image classification task. These experiments involve testing various feature combinations, hyperparameters, and picture sizes to identify the optimal model configuration. The results are presented below, providing insights into the performance of each classifier and the impact of different configurations on the model's accuracy.
To reproduce the results, please refer to the [README](src/README.md) in the src folder for instructions on how to run the experiments.

### Feature Engineering
We are experimenting with different feature combinations, which can be passed as an input parameter to our `classify_images.py` script:
| Feature / Filter | Length | Description |
| ---------------- | ------ | ----------- |
| `standard` | 7500 | no filters applied; only the raw pixel values (RGB) are used as features |
| `hsv` | 7500 | the images are converted to HSV color space and the HSV values are used as features |
| `sobel` | 7500 | the images are passed through a Sobel (edge-detection) filter and the filtered pixel values are used as features |
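As an illustration, such feature vectors could be computed with OpenCV roughly as follows; this is a sketch, not the actual `classify_images.py` implementation (the function name, resizing and Sobel parameters, and the example path are assumptions):

```python
import cv2

def extract_features(path, mode="standard", size=(50, 50)):
    """Sketch of per-image feature extraction; details are assumptions."""
    img = cv2.resize(cv2.imread(path), size)  # 50x50, 3 channels (BGR)
    if mode == "hsv":
        img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    elif mode == "sobel":
        # horizontal Sobel edge response, applied per channel
        img = cv2.Sobel(img, cv2.CV_8U, 1, 0, ksize=3)
    return img.reshape(-1)  # flattened vector of length 7500

features = extract_features("data/train/banana/0.jpg", mode="hsv")  # hypothetical path
```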
### Naive Bayes
**Poor results** for all experiments with a Naive Bayes classifier :thumbsdown:. The best results are achieved using the HSV + Sobel filters; even that accuracy (0.178) is low, though still better than our baseline.

| Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
| ------- | -------- | -------------- | --------------- | -------- |
| 50x50 | No filters | 0.113 | `{'var_smoothing': 5.5 * 10^-6}` | :arrow_right_hook: [GridSearch results](figures/naive_bayes/grid_search_results_50x50_standard_var_smoothing.png) and [confusion matrix](figures/naive_bayes/GaussianNB_50x50_standard_confusion_matrix_var_smoothing_5.455594781168525e-06.png) |

- accuracy on the training set is also never higher than 0.20 :arrow_right: the classifier is not overfitting but also not learning much
- for some classes, the diagonal of the confusion matrix is quite bright (e.g. apricots and passion fruits) :arrow_right: the classifier is quite good at predicting these classes
- but we also see that the classifier has a **strong bias** towards some classes (e.g. apricots, jostaberries, passion fruits and figs)
### Decision Tree


### Random Forest

**Feature Combinations:**
**50x50** images

Tested parameter grid:
```python
param_grid = {
    "max_depth": list(range(10, 81, 10)) + [None],
    "n_estimators": [10, 50, 100],
    "max_features": ["sqrt", "log2"],
    "min_samples_leaf": [2, 5, 10, 20],
    "min_samples_split": [2, 5, 10, 20],
}
```

**Optimization:**
- *"no filters"* = RGB values as features
| Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
| ------- | -------- | -------------- | --------------- | -------- |
| 125x125 | No filters | 0.411 | `{'max_depth': 70, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}` | 8.56 min; 0.011 min |
- Observations:
  - both classifiers make the same mistakes, e.g. confusing raspberries, redcurrants and strawberries :strawberry: (see bottom right corner of the confusion matrix)
  - the importance of the parameters can be seen in the figure below
- GridSearch:
  - using a grid like the one above, many combinations of parameters are tested and the best combination is chosen (see the sketch below)
  - if we also want to find out how the parameters influence the accuracy, we can visualize the results of the grid search as below; the code we used for this is slightly adapted from a [stackoverflow response](https://stackoverflow.com/questions/37161563/how-to-graph-grid-scores-from-gridsearchcv)
  - :mag: the figure shows the accuracy when all parameters are fixed to their best value except for the one for which the accuracy is plotted (both for train and dev set)


Confusion Matrix - No filters - best parameters | Confusion Matrix - HSV features - best parameters
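For completeness, here is a sketch of the grid search itself, on stand-in data and with a reduced grid for brevity; `cv_results_` holds the per-combination scores that feed parameter-influence plots like the one above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in data: 200 flattened 50x50 RGB images, 30 classes.
rng = np.random.default_rng(0)
X_train = rng.random((200, 7500))
y_train = rng.integers(0, 30, size=200)

param_grid = {  # reduced version of the grid above, for brevity
    "max_depth": [10, 40, None],
    "n_estimators": [10, 100],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    n_jobs=-1,                # parallelize over the parameter combinations
    return_train_score=True,  # needed to plot train vs. dev accuracy
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```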
### CNN (Convolutional Neural Network)

#### Model Architecture
We have implemented a Convolutional Neural Network (CNN) for our fruit image classification task. The model consists of three convolutional layers with pooling layers, a dropout layer, and two fully connected layers, the second of which is the output layer using the softmax function. The architecture is a basic one and could be further optimized by changing the number of layers, the number of filters, the kernel size, the activation function, the dropout rate, etc.
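A Keras-style sketch of such an architecture (three conv/pool blocks, a dropout layer, and two dense layers ending in softmax); the actual implementation may differ, and the filter counts, kernel sizes and the 50x50 input size are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(50, 50, 3)),          # assumed input size
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(128, activation="relu"),
    layers.Dense(30, activation="softmax"),   # one unit per fruit class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```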
#### Testing Results
The CNN model achieved an accuracy of **0.68** on the development set, a significant improvement over the baseline models and the basic classifiers. Its accuracy on the training set was higher, at **0.83**, indicating some, but not severe, overfitting; the model is learning effectively from the training data. Performance on the test set is expected to be similar to the development set, given the balanced nature of our dataset.
The learning curve of the CNN model shows that the model is learning effectively from the training data, with the training and validation loss decreasing and the accuracy increasing over time. We can also see that after about **25-30** epochs the model reaches its best performance and the training stagnates.
The confusion matrix of the CNN model shows high accuracy for most classes. Performance is consistent across most classes, with some problems in distinguishing between similar ones, such as mandarins, oranges and grapefruits. The mandarin class also has the lowest accuracy, as it is underrepresented in the dataset.
Looking at some of the misclassified images, we can see that the model is sometimes confused by the similar shape and color of the fruits. For example, as stated before, the model often confuses mandarins, oranges and grapefruits.
### Final Results
Having tested different feature combinations, hyperparameters and picture sizes on the development set, we selected the optimal configuration of each model for the final evaluation on the **test set**. The results are presented in the following diagram:



The CNN model achieved the highest accuracy (**0.68**), followed by the Random Forest model (**0.48**, obtained with the HSV feature combination on 50x50 images). The Decision Tree and Naive Bayes models achieved lower accuracies of **0.39** and **0.18**, respectively. The Random Forest result is a solid improvement over the baselines and the other basic classifiers, while the CNN result confirms the effectiveness of CNNs for image classification tasks.
The performance on the dev and test set is (as expected) nearly the same, which is a good sign that the models are not overfitting and again reflects the balanced nature of our dataset.

### Feature Importance
*What are the most important features for our classification models?*
To answer this question, we can use the `feature_importances_` attribute of the Decision Tree and Random Forest models. Because the Naive Bayes and CNN models do not have a direct feature importance attribute, we focus on the Decision Tree and Random Forest models for this analysis.
When using the RGB or HSV values as features, we have three features for each pixel. In order to visualize the feature importance, we **sum the feature importances for each pixel** and reshape the resulting array to the original image shape that was used for training the model. This way, we can visualize the feature importance of each pixel in the image.
As can be seen in the following plot, the **pixels in the middle** have higher values and are thus more important for the classification than the pixels near the edges. The same pattern appears for all decision tree and random forest models we have trained. This meets our expectations, as the middle of the image is **where the fruit is typically located**, while the edges often show only background.
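A sketch of this visualization on stand-in data, assuming 50x50 RGB training images and a fitted random forest:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in data and model; in practice, use the fitted project models.
rng = np.random.default_rng(0)
X = rng.random((200, 50 * 50 * 3))
y = rng.integers(0, 30, size=200)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Sum the three per-channel importances for each pixel, then reshape
# back to the 50x50 image grid used for training.
per_pixel = model.feature_importances_.reshape(50, 50, 3).sum(axis=2)
plt.imshow(per_pixel, cmap="hot")
plt.colorbar(label="summed feature importance")
plt.show()
```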
### Data Reduction
*How well do our models perform with a reduced dataset size for training?*
We have also tested training a random forest model and a CNN model on reduced dataset sizes. The diagram below shows the results: performance decreases with the dataset size, but we can still achieve good performance with 50% or more of the original training data.


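A sketch of how such a per-class reduction can be done; our actual procedure may differ (e.g. it could subsample at the file level during the split):

```python
import numpy as np

def reduce_per_class(X, y, fraction, seed=0):
    """Keep only `fraction` of the training samples of each class."""
    rng = np.random.default_rng(seed)
    keep = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        keep.extend(rng.choice(idx, size=int(fraction * len(idx)), replace=False))
    keep = np.asarray(keep)
    return X[keep], y[keep]

# e.g. train on half of the data: X_half, y_half = reduce_per_class(X_train, y_train, 0.5)
```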
## Challenges & Solutions
One significant challenge we faced in our fruit image classification project was the need for substantial computational resources. This was mainly due to the intensive nature of training deep learning models, especially when dealing with a dataset that, while not overly large, was complex enough to require advanced processing.
To effectively manage this challenge, we utilized the BWUniCluster and the CoLiCluster for our computational needs. These high-performance computing clusters provided us with the necessary power to train our models efficiently. By leveraging these resources, we were able to conduct extensive training and experimentation with our models, which would have been considerably slower or even impractical with standard computing setups.
This approach not only expedited our training process but also allowed us to explore and refine our models to a greater extent, leading to more robust and accurate classification results. It highlights the importance of having access to appropriate computational resources when handling sophisticated machine learning tasks, even when the dataset size is not exceedingly large.
Moreover, we recognize that the task itself is inherently challenging due to the nature of our (partly self-chosen) dataset. Many fruits look remarkably similar to each other, and the variability in their appearance, such as different stages of ripeness or peeled versus unpeeled fruit, adds another layer of complexity to the classification task. These factors make the project not just a test of our technical skills but also an exploration into the intricate world of image recognition and classification.

## Conclusion