Add TOC and overview

Commit dab396b0 authored 1 year ago by igraf
--- a/project/data/data_preprocessing/README.md
+++ b/project/data/data_preprocessing/README.md
 # Data Preprocessing

+[[_TOC_]]
+
+## Overview
+
+This folder contains the following scripts:
+- `fruit_dataset_splitter.py` :arrow_right: to prepare your dataset
+- `fruit_dataset_analyze.py` :arrow_right: to find out more about your data
+
+## ✔ Setup
 First create a virtual environment and install the required packages:

 ```bash
@@ -8,9 +17,6 @@ source venv/bin/activate
 pip install -r requirements.txt
 ```

-### And then run the script:
-
-
 ## `fruit_dataset_splitter.py`

 The script `fruit_dataset_splitter.py` is designed to split an image dataset into training, development, and test subsets. It filters the dataset to include only specified fruit classes, and then randomly divides the images into the three subsets.
@@ -31,8 +37,6 @@ Each run of the script will:
 ---


-### Find out more about the data:
-
 ## `fruit_dataset_analyze.py`

 The script `fruit_dataset_analyze.py` analyzes a dataset of images, providing insights into the class distribution across training, development, and test subsets. It counts the number of images per class and visualizes this distribution in a histogram.
@@ -49,7 +53,7 @@ Each run of the script will:
 - Generate a DataFrame containing the counts of images per class across the entire dataset.
 - Save this DataFrame as a CSV file to `../class_counts.csv`.
 - Create a histogram showing the distribution of images across different classes in the dataset.
-- Save the histogram plot to `../../figures/class_distribution_histogram-2.png`.
+- Save the histogram plot to `../../figures/class_distribution_histogram.png`.

 The script will also print the DataFrame, the total number of images in the dataset, and the file path of the saved histogram.
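
For readers of this diff, the counting and CSV steps described for `fruit_dataset_analyze.py` could look roughly like the sketch below. This is not the script's actual code: the function names, the `<root>/<subset>/<class>/<image>` directory layout, and the list of image suffixes are all assumptions made for illustration. It also uses only the standard library rather than the DataFrame the README mentions.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumption: which file suffixes count as images is not stated in the README.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(dataset_root):
    """Count images per class across all subsets.

    Assumes a hypothetical layout of <root>/<subset>/<class>/<image>.
    """
    counts = Counter()
    for path in Path(dataset_root).glob("*/*/*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # The parent directory name is taken as the class label.
            counts[path.parent.name] += 1
    return counts


def save_counts_csv(counts, csv_path):
    """Write the per-class counts to a CSV file, sorted by class name."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["class", "count"])
        for class_name, n_images in sorted(counts.items()):
            writer.writerow([class_name, n_images])
```

Calling `count_images_per_class` on the dataset root and passing the result to `save_counts_csv` would produce a file comparable in purpose to `../class_counts.csv`; `sum(counts.values())` gives the total number of images.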