From 6dc704053e1ba3b5086f273dce0813b37044ec7e Mon Sep 17 00:00:00 2001
From: igraf <igraf@cl.uni-heidelberg.de>
Date: Fri, 23 Feb 2024 19:29:15 +0100
Subject: [PATCH] Update classifiers part

---
 project/README.md | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/project/README.md b/project/README.md
index 14f9945..b1bec33 100644
--- a/project/README.md
+++ b/project/README.md
@@ -190,6 +190,8 @@ And here are the results for **all dataset splits**:
 
 Naive Bayes classifiers are based on Bayes' theorem with the assumption of independence between every pair of features. They are particularly known for their simplicity and efficiency, especially in text classification tasks.
 
+In our case, we use the **Gaussian Naive Bayes** classifier, which assumes that the likelihood of the features is Gaussian. Although our image data does not strictly satisfy this assumption, we still evaluate this classifier: our features are numerical, and rather than treating them as discrete we model them as continuous values drawn from a (smoothed) Gaussian distribution.
+
 #### How Naive Bayes Works
 Naive Bayes classifiers work by calculating the probability of each class and the conditional probability of each feature belonging to each class. Despite its simplicity, Naive Bayes can yield surprisingly good results, especially in cases where the independence assumption holds.
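+
+As a minimal sketch of how this looks in code (assuming scikit-learn's `GaussianNB`; the data below is a synthetic stand-in for our flattened image feature vectors, not the project's actual pipeline):
+
+```python
+import numpy as np
+from sklearn.model_selection import train_test_split
+from sklearn.naive_bayes import GaussianNB
+
+# Synthetic stand-in for flattened image feature vectors (3 hypothetical fruit classes).
+rng = np.random.default_rng(0)
+X = rng.normal(size=(200, 64))            # 200 samples, 64 continuous features
+y = rng.integers(0, 3, size=200)
+X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
+
+# GaussianNB applies Bayes' theorem under the naive independence assumption,
+# fitting one mean and one variance per feature per class; var_smoothing adds
+# the variance smoothing mentioned above.
+clf = GaussianNB(var_smoothing=1e-9)
+clf.fit(X_train, y_train)
+print(clf.score(X_test, y_test))          # mean accuracy on the held-out split
+```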
 
@@ -198,23 +200,28 @@ While not as sophisticated as models like CNNs, Naive Bayes classifiers can stil
 
 ### Decision Tree
 
-Decision Trees are a type of model that makes decisions based on asking a series of questions.
+A decision tree classifier is a flowchart-like structure that recursively splits the dataset into progressively smaller subsets based on the most discriminative features. It can also be viewed as a model that makes decisions by asking a series of questions.
 
 #### How Decision Trees Work
 Based on the features of the input data, a decision tree model asks a series of yes/no questions and leads down a path in the tree that eventually results in a decision (in this case, fruit classification). These models are intuitive and easy to visualize but can become complex and prone to overfitting, especially with a large number of features.
 
+A simplified example of a decision tree for our fruit classification task with a maximum depth of 2 is shown below. In each node, the tree compares a feature of the input image to a threshold and decides which branch to follow based on the result. The final decision is made at the leaf nodes.
+<img align="center" width="100%" height="auto" src="figures/decision_tree/plotted_tree_50x50_standard_max_depth_2_max_features_sqrt_min_samples_leaf_1_min_samples_split_2.png" title="Decision tree of maximum depth 2 for fruit classification">
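+
+A tree like the one above can be produced with scikit-learn (a minimal sketch on synthetic stand-in data, using the hyperparameters from the figure's file name):
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from sklearn.tree import DecisionTreeClassifier, plot_tree
+
+# Synthetic stand-in for flattened image feature vectors (3 hypothetical fruit classes).
+rng = np.random.default_rng(0)
+X = rng.normal(size=(200, 64))
+y = rng.integers(0, 3, size=200)
+
+# Hyperparameters mirror the plotted tree above.
+tree = DecisionTreeClassifier(max_depth=2, max_features="sqrt",
+                              min_samples_leaf=1, min_samples_split=2,
+                              random_state=0)
+tree.fit(X, y)
+
+plot_tree(tree, filled=True)  # each node shows the split threshold, sample counts, and majority class
+plt.show()
+```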
+
 #### Expected Results
 For our image classification task, decision trees might struggle with the high dimensionality and complexity of the data. They can serve as a good baseline but are unlikely to outperform more advanced models like CNNs.
 
 ### Random Forest
 
-Random Forest is an ensemble learning method, essentially a collection of decision trees, designed to improve the stability and accuracy of the algorithms.
+Random forest is an ensemble learning method, essentially a collection of decision trees, designed to improve the stability and accuracy of the predictions compared to a single decision tree.
 
 #### How Random Forests Work
-Random forests create a 'forest' of decision trees, each trained on a random subset of the data. The final output is determined by averaging the predictions of each tree. This approach helps in reducing overfitting, making the model more robust compared to a single decision tree.
+Random forests create a *forest* of decision trees, each trained on a random subset of the data (and, at each split, a random subset of the features). For classification, the final output is determined by majority vote over the trees' predictions (equivalently, by averaging their predicted class probabilities). This approach helps in reducing overfitting, making the model more robust compared to a single decision tree.
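+
+A minimal sketch with scikit-learn's `RandomForestClassifier` (again on synthetic stand-in data rather than our actual features):
+
+```python
+import numpy as np
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.model_selection import train_test_split
+
+# Synthetic stand-in for flattened image feature vectors (3 hypothetical fruit classes).
+rng = np.random.default_rng(0)
+X = rng.normal(size=(200, 64))
+y = rng.integers(0, 3, size=200)
+X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
+
+# 100 trees, each fit on a bootstrap sample of the training data; the forest
+# predicts by averaging the trees' predicted class probabilities.
+forest = RandomForestClassifier(n_estimators=100, random_state=0)
+forest.fit(X_train, y_train)
+print(forest.score(X_test, y_test))  # mean accuracy on the held-out split
+```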
 
 #### Expected Results
-Random forests are expected to perform better than individual decision trees in our project. They are less likely to overfit and can handle the variability in the image data more effectively. However, they might still fall short in performance compared to more specialized models like CNNs, particularly in high-dimensional tasks like image classification.
+Random forests are expected to perform better than individual decision trees in our project. They are less likely to overfit and can handle the variability in the image data more effectively. However, they might still fall short of more specialized models such as CNNs, particularly in high-dimensional tasks like our image classification problem.
 
 ### CNN (Convolutional Neural Network)
 
-- 
GitLab