diff --git a/project/README.md b/project/README.md
index 8bc967116207c033790d5b960a7d7982cd1fe5eb..fbf05affd9b42e99d963f55fb81aed95cb08e036 100644
--- a/project/README.md
+++ b/project/README.md
@@ -125,13 +125,34 @@ The following table summarizes the performance of different baseline models on t
 ### Naive Bayes
 
+Naive Bayes classifiers are based on Bayes' theorem with the 'naive' assumption that every pair of features is conditionally independent given the class. They are particularly known for their simplicity and efficiency, especially in text classification tasks.
+
+#### How Naive Bayes Works
+Naive Bayes classifiers estimate the prior probability of each class and the conditional probability of each feature given each class, and then assign a new sample to the class with the highest resulting posterior probability. Despite this simplicity, Naive Bayes can yield surprisingly good results, especially in cases where the independence assumption roughly holds.
+
+#### Expected Results
+While not as sophisticated as models like CNNs, Naive Bayes classifiers can still be quite effective, especially in scenarios with limited data or computational resources. In our project, however, their performance is likely to be limited because neighboring pixels in image data are highly correlated, which violates the independence assumption.
+
 ### Decision Tree
 
+Decision Trees are models that make predictions by asking a series of questions about the input features.
+
+#### How Decision Trees Work
+A decision tree asks a series of yes/no questions about the input features; each answer leads further down a path in the tree until a leaf is reached that assigns a label (in our case, a fruit class). These models are intuitive and easy to visualize, but they can become complex and prone to overfitting, especially with a large number of features.
+
+#### Expected Results
+For our image classification task, decision trees might struggle with the high dimensionality and complexity of the data. They can serve as a good baseline but are unlikely to outperform more advanced models like CNNs.
 
 ### Random Forest
 
+Random Forest is an ensemble learning method, essentially a collection of decision trees, designed to improve on the stability and accuracy of a single tree.
+
+#### How Random Forests Work
+Random forests build a 'forest' of decision trees, each trained on a random subset of the data. The final output is determined by aggregating the predictions of the individual trees (typically a majority vote for classification). This reduces overfitting and makes the model more robust than a single decision tree.
+
+#### Expected Results
+Random forests are expected to perform better than individual decision trees in our project. They are less likely to overfit and can handle the variability in the image data more effectively. However, they are still likely to fall short of more specialized models like CNNs in high-dimensional tasks such as image classification.
+
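+Below is a minimal sketch of how these three baselines could be trained and compared side by side. It assumes scikit-learn and uses random stand-in data in place of our extracted image features, so treat it as an illustration of the idea rather than the code we actually run (the real experiments are run as described in `src/README.md`):
+
+```python
+# Illustration only: random stand-in features instead of real fruit images.
+import numpy as np
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.metrics import accuracy_score
+from sklearn.model_selection import train_test_split
+from sklearn.naive_bayes import GaussianNB
+from sklearn.tree import DecisionTreeClassifier
+
+# 200 "images" flattened to 64-dimensional feature vectors, 3 fruit classes.
+rng = np.random.default_rng(0)
+X = rng.random((200, 64))
+y = rng.integers(0, 3, size=200)
+X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
+
+baselines = {
+    "Naive Bayes": GaussianNB(),                                # class priors + per-feature likelihoods
+    "Decision Tree": DecisionTreeClassifier(max_depth=20),      # a single tree of yes/no splits
+    "Random Forest": RandomForestClassifier(n_estimators=100),  # ensemble of randomized trees
+}
+
+for name, model in baselines.items():
+    model.fit(X_train, y_train)                                 # learn from the training split
+    accuracy = accuracy_score(y_test, model.predict(X_test))    # evaluate on the held-out split
+    print(f"{name}: {accuracy:.3f}")
+```
+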
 ### CNN (Convolutional Neural Network)
 
 Convolutional Neural Networks (CNNs) are a class of deep neural networks, highly effective for analyzing visual imagery. CNNs are particularly suited for image classification because they can automatically and adaptively learn spatial hierarchies of features from input images. This makes them capable of learning directly from raw images, eliminating the need for manual feature extraction.
@@ -150,7 +171,7 @@ The CNN model is expected to outperform the baseline models and the basic classi
 We have conducted a series of experiments to evaluate the performance of different classifiers on our fruit image classification task.
 These experiments involve testing various feature combinations, hyperparameters, and picture sizes to identify the optimal model configuration. The results of these experiments are presented below, providing insights into the performance of each classifier and the impact of different configurations on the model's accuracy.
 
-To reproduce the results, please refer to the README in the src folder for instructions on how to run the experiments.
+To reproduce the results, please refer to the [README](src/README.md) in the src folder for instructions on how to run the experiments.
 
 ### Feature Engineering
 - we are experimenting with different feature combinations, which can be used as an input parameter for our `classify_images.py` script
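+
+To make the idea of feature combinations concrete, here is a rough illustration of two simple descriptors being computed for one image and concatenated into a single feature vector. The helper names below are hypothetical and are not the actual interface of `classify_images.py`:
+
+```python
+# Hypothetical example of combining two feature types into one vector;
+# the real options accepted by classify_images.py may differ.
+import numpy as np
+
+def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
+    """Per-channel intensity histogram, normalized to sum to 1."""
+    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
+    hist = np.concatenate(hists).astype(float)
+    return hist / hist.sum()
+
+def mean_color(image: np.ndarray) -> np.ndarray:
+    """Average value of each color channel."""
+    return image.reshape(-1, 3).mean(axis=0)
+
+# Stand-in 100x100 RGB image; in the project this would be a loaded fruit photo.
+image = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)
+features = np.concatenate([color_histogram(image), mean_color(image)])
+print(features.shape)  # one combined vector per image: 3 * 8 histogram bins + 3 means = 27 values
+```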