Commit dab396b0 authored by igraf

Add TOC and overview

parent 48659887
# Data Preprocessing
[[_TOC_]]
## Overview
This folder contains the following scripts:
- `fruit_dataset_splitter.py` :arrow_right: to prepare your dataset
- `fruit_dataset_analyze.py` :arrow_right: to find out more about your data
## ✔ Setup
First create a virtual environment and install the required packages:
```bash
...@@ -8,9 +17,6 @@ source venv/bin/activate
pip install -r requirements.txt
```
### And then run the script:
## `fruit_dataset_splitter.py`
The script `fruit_dataset_splitter.py` splits an image dataset into training, development, and test subsets. It first filters the dataset to include only the specified fruit classes, then randomly divides the remaining images among the three subsets.
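The filter-then-split idea described above can be illustrated with a minimal sketch. It is not the code of `fruit_dataset_splitter.py`: the function name `split_dataset`, the `(path, label)` input format, the 80/10/10 ratios, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def split_dataset(images, classes, ratios=(0.8, 0.1, 0.1), seed=42):
    """Filter (path, label) pairs to `classes`, then split them randomly.

    Hypothetical helper: the ratios and seed are illustrative defaults,
    not values taken from the actual script.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    # Keep only the requested fruit classes.
    selected = [item for item in images if item[1] in classes]
    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(selected)
    n_train = int(len(selected) * ratios[0])
    n_dev = int(len(selected) * ratios[1])
    return {
        "train": selected[:n_train],
        "dev": selected[n_train:n_train + n_dev],
        # The test subset receives whatever remains after train and dev.
        "test": selected[n_train + n_dev:],
    }


# Example: 30 images in three classes, of which two classes are kept.
images = [(f"img_{i}.jpg", label)
          for i, label in enumerate(["apple", "banana", "kiwi"] * 10)]
subsets = split_dataset(images, classes={"apple", "banana"})
```

Shuffling the whole filtered list once and slicing it keeps the three subsets disjoint by construction. Note that this sketch does not stratify by class, so small classes may be unevenly represented across the subsets.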
...@@ -31,8 +37,6 @@ Each run of the script will:
---
### Find out more about the data:
## `fruit_dataset_analyze.py`
The script `fruit_dataset_analyze.py` analyzes an image dataset and reports the class distribution across the training, development, and test subsets. It counts the number of images per class and visualizes the distribution in a histogram.
...@@ -49,7 +53,7 @@ Each run of the script will:
- Generate a DataFrame containing the counts of images per class across the entire dataset.
- Save this DataFrame as a CSV file to `../class_counts.csv`.
- Create a histogram showing the distribution of images across different classes in the dataset.
- Save the histogram plot to `../../figures/class_distribution_histogram.png`.
The script also prints the DataFrame, the total number of images in the dataset, and the file path of the saved histogram.
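The counting and reporting steps listed above can be sketched roughly as follows. This is a dependency-free illustration, not the script itself: it swaps the DataFrame for a plain `Counter` and the plotted histogram for a text one, and it assumes a `<root>/<subset>/<class>/<image>` directory layout. The helper names and the set of image extensions are hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumed image extensions; the real script may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(dataset_root):
    """Count images per class, assuming <root>/<subset>/<class>/<image>."""
    counts = Counter()
    for path in Path(dataset_root).glob("*/*/*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # The parent directory name is taken as the class label.
            counts[path.parent.name] += 1
    return counts


def save_counts(counts, csv_path):
    """Write the per-class counts as a two-column CSV file."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["class", "count"])
        for name, count in sorted(counts.items()):
            writer.writerow([name, count])


def text_histogram(counts, width=40):
    """Return one bar per class, scaled so the largest class fills `width`."""
    largest = max(counts.values(), default=0)
    lines = []
    for name, count in sorted(counts.items()):
        bar = "#" * (count * width // largest) if largest else ""
        lines.append(f"{name:<12} {bar} {count}")
    return "\n".join(lines)
```

Calling `count_images_per_class` on the dataset root, then passing the result to `save_counts` and `text_histogram`, gives a quick check of class balance before running the full analysis.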