- `fruit_dataset_splitter.py` :arrow_right: to prepare your dataset
- `fruit_dataset_analyze.py` :arrow_right: to find out more about your data

## ✔ Setup
First create a virtual environment and install the required packages:
```bash
...
source venv/bin/activate
pip install -r requirements.txt
```
### And then run the script:
## `fruit_dataset_splitter.py`
The script `fruit_dataset_splitter.py` is designed to split an image dataset into training, development, and test subsets. It filters the dataset to include only specified fruit classes, and then randomly divides the images into the three subsets.
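
The filter-then-split behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `split_dataset`, the one-folder-per-class layout, the 80/10/10 ratios, the fixed seed, and the set of image suffixes are all assumptions.

```python
import random
from pathlib import Path

# Assumed set of image file extensions (hypothetical, not taken from the script).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def split_dataset(root, classes, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split images of the selected classes into train/dev/test lists.

    Assumes a layout of root/<class_name>/<image files>.
    """
    images = []
    for cls in classes:
        class_dir = Path(root) / cls
        if not class_dir.is_dir():
            continue  # skip requested classes that are not in the dataset
        images.extend(
            p for p in class_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
        )

    # Sort first so the shuffle depends only on the seed, not on directory order.
    images.sort()
    random.Random(seed).shuffle(images)

    n_train = int(len(images) * ratios[0])
    n_dev = int(len(images) * ratios[1])
    return {
        "train": images[:n_train],
        "dev": images[n_train:n_train + n_dev],
        "test": images[n_train + n_dev:],  # whatever remains goes to test
    }
```

Calling `split_dataset("path/to/dataset", ["apple", "banana"])` would return three lists of `Path` objects. The fixed seed keeps the split reproducible between runs in this sketch; the real script may behave differently.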
...

Each run of the script will:

...
---
### Find out more about the data:
## `fruit_dataset_analyze.py`
The script `fruit_dataset_analyze.py` analyzes a dataset of images, providing insights into the class distribution across training, development, and test subsets. It counts the number of images per class and visualizes this distribution in a histogram.
...

Each run of the script will:

- Generate a DataFrame containing the counts of images per class across the entire dataset.
- Save this DataFrame as a CSV file to `../class_counts.csv`.
- Create a histogram showing the distribution of images across different classes in the dataset.
- Save the histogram plot to `../../figures/class_distribution_histogram.png`.

The script will also print the DataFrame, the total number of images in the dataset, and the file path of the saved histogram.
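
The counting and CSV steps listed above could look roughly like the sketch below. It is illustrative only: it uses the standard library (`collections.Counter` and `csv`) in place of the DataFrame the script builds, it leaves out the histogram, and the function names, the `train`/`dev`/`test` folder names, and the `root/<subset>/<class>/` layout are assumptions.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumed subset directory names (hypothetical, not taken from the script).
SUBSETS = ("train", "dev", "test")


def count_images_per_class(root):
    """Count files per class over all subsets.

    Assumes a layout of root/<subset>/<class_name>/<image files>.
    """
    counts = Counter()
    for subset in SUBSETS:
        subset_dir = Path(root) / subset
        if not subset_dir.is_dir():
            continue  # tolerate a missing subset
        for class_dir in subset_dir.iterdir():
            if class_dir.is_dir():
                counts[class_dir.name] += sum(
                    1 for p in class_dir.iterdir() if p.is_file()
                )
    return counts


def save_counts(counts, csv_path):
    """Write the per-class counts to a CSV file, one row per class."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["class", "count"])
        for name, n in sorted(counts.items()):
            writer.writerow([name, n])
```

With these helpers, `sum(counts.values())` gives the total number of images in the dataset, which corresponds to the total the script prints.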