Add link to data preparation instructions

Commit a20547b8 authored 1 year ago by F1nnH
--- a/project/README.md
+++ b/project/README.md
@@ -57,6 +57,7 @@ We opted not to use cross-validation for the train-dev split, considering the la

 The data partitioning script randomly assigns the images of each fruit class to the designated training, development, and test sets. This **random allocation per class** keeps each subset representative of the full dataset, which supports an unbiased evaluation of the model's performance.

+**To prepare the dataset for use with this project, please refer to the Data Preparation section in the [data folder](data/README.md).**

 ### Data Statistics

@@ -72,7 +73,6 @@ To visually represent the **class-wise distribution**, we plotted a histogram:

 *Balance of Dataset*: The histogram shows at a glance whether the dataset is balanced or unbalanced. In our case, the dataset is mostly balanced, but the nectarine, orange and jostaberry classes may have too few data points. We will monitor the performance of our models on these classes to see whether the imbalance affects their ability to learn and predict them :mag:.

-
 ## Metrics

 - **Accuracy**: The ratio of correctly predicted observations to the total number of predictions.
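
The README context in the first hunk describes a script that randomly allocates the images of each class to training, development, and test sets. The project's actual script is not shown in this commit, so the following is only a minimal sketch of that idea. The function name `split_per_class`, the `(path, label)` input format, the split ratios, and the seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_per_class(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Randomly split (path, label) pairs into train/dev/test within each class.

    Hypothetical helper: the ratios and seed are illustrative defaults,
    not values taken from the project.
    """
    rng = random.Random(seed)

    # Group image paths by their class label.
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    splits = {"train": [], "dev": [], "test": []}
    for label in sorted(by_class):
        paths = sorted(by_class[label])  # fixed order before shuffling, for reproducibility
        rng.shuffle(paths)
        n_train = int(len(paths) * ratios[0])
        n_dev = int(len(paths) * ratios[1])
        # Whatever remains after train and dev goes to test.
        splits["train"] += [(p, label) for p in paths[:n_train]]
        splits["dev"] += [(p, label) for p in paths[n_train:n_train + n_dev]]
        splits["test"] += [(p, label) for p in paths[n_train + n_dev:]]
    return splits
```

Because the shuffle and the cut points are applied separately for every label, each class contributes to the three subsets in roughly the same proportion, which is the property the README paragraph relies on.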