- `fruit_dataset_splitter.py` :arrow_right: to prepare your dataset
- `fruit_dataset_analyze.py` :arrow_right: to find out more about your data
## ✔ Setup
First, create a virtual environment and install the required packages:
```bash
...
...
@@ -8,9 +17,6 @@ source venv/bin/activate
pip install -r requirements.txt
```
### And then run the script:
## `fruit_dataset_splitter.py`
The script `fruit_dataset_splitter.py` splits an image dataset into training, development, and test subsets. It first filters the dataset to the specified fruit classes, then randomly divides the remaining images among the three subsets.
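
The filter-then-split idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in `fruit_dataset_splitter.py`: the function name `split_dataset`, the 80/10/10 ratios, the fixed seed, and the in-memory list of `(label, filename)` pairs are all assumptions made for the example.

```python
import random


def split_dataset(image_items, classes, ratios=(0.8, 0.1, 0.1), seed=0):
    """Keep only items whose label is in `classes`, then shuffle and split.

    `image_items` is a list of (label, filename) pairs (hypothetical format).
    `ratios` gives the train and dev fractions; test receives the remainder.
    """
    kept = [(label, name) for label, name in image_items if label in classes]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(kept)
    n_train = int(len(kept) * ratios[0])
    n_dev = int(len(kept) * ratios[1])
    return {
        "train": kept[:n_train],
        "dev": kept[n_train:n_train + n_dev],
        "test": kept[n_train + n_dev:],
    }


# Hypothetical input: 10 images for each of three classes.
images = [
    (label, f"{label}_{i}.jpg")
    for label in ("apple", "banana", "kiwi")
    for i in range(10)
]
# Only apples and bananas are kept; kiwis are filtered out before splitting.
splits = split_dataset(images, classes={"apple", "banana"})
print({name: len(items) for name, items in splits.items()})
```

Shuffling once and slicing keeps the three subsets disjoint. A real script would also need to copy or move the image files, which this sketch leaves out.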
...
...
@@ -31,8 +37,6 @@ Each run of the script will:
---
### Find out more about the data:
## `fruit_dataset_analyze.py`
The script `fruit_dataset_analyze.py` analyzes a dataset of images, providing insights into the class distribution across training, development, and test subsets. It counts the number of images per class and visualizes this distribution in a histogram.
...
...
@@ -49,7 +53,7 @@ Each run of the script will:
- Generate a DataFrame containing the counts of images per class across the entire dataset.
- Save this DataFrame as a CSV file to `../class_counts.csv`.
- Create a histogram showing the distribution of images across different classes in the dataset.
- Save the histogram plot to `../../figures/class_distribution_histogram-2.png`.
- Save the histogram plot to `../../figures/class_distribution_histogram.png`.
The script will also print the DataFrame, the total number of images in the dataset, and the file path of the saved histogram.
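
As a rough illustration of the counting step, the sketch below tallies images per class across the subsets and writes the result as CSV. It is not the script itself: it swaps the DataFrame for the standard library's `collections.Counter` and `csv` to stay dependency-free, skips the histogram, and the names `count_classes`, `write_counts_csv`, and the `class`/`count` column headers are assumptions.

```python
import csv
import io
from collections import Counter


def count_classes(labels_by_subset):
    """Sum per-class image counts over all subsets (train, dev, test)."""
    counts = Counter()
    for labels in labels_by_subset.values():
        counts.update(labels)
    return counts


def write_counts_csv(counts, handle):
    """Write one row per class, sorted by class name, to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["class", "count"])  # hypothetical column names
    for name, n in sorted(counts.items()):
        writer.writerow([name, n])


# Hypothetical labels; in practice these would come from the dataset folders.
counts = count_classes({
    "train": ["apple", "apple", "banana"],
    "dev": ["apple"],
    "test": ["banana"],
})
buffer = io.StringIO()  # stands in for a file opened for writing
write_counts_csv(counts, buffer)
print(buffer.getvalue())
print("Total images:", sum(counts.values()))
```

The same per-class counts could be passed to a plotting library to draw the histogram.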