dispatcher
orchard.data_handler.dispatcher
Dataset Fetching Dispatcher and Loading Interface.
Central entry point for dataset retrieval. Routes each dataset to its
dedicated fetch module inside the fetchers/ sub-package and exposes the
loading functions that return DatasetData containers.
Adding a new domain only requires a new branch in ensure_dataset_npz
and a corresponding module in fetchers/.
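The routing described above can be pictured with a minimal sketch. Everything named here is a hypothetical stand-in: FakeMetadata, the "custom_" naming rule, and the two fetcher functions are illustrative assumptions and do not reflect the actual contents of fetchers/. Only the branch-per-domain structure follows the description.

```python
from dataclasses import dataclass
from pathlib import Path

# Records which fetcher ran, so the routing is observable in this sketch.
CALLS = []


@dataclass
class FakeMetadata:
    """Hypothetical stand-in for DatasetMetadata (only the fields used here)."""
    name: str
    path: Path


def fetch_generic_npz(metadata: FakeMetadata) -> Path:
    # Placeholder: a real fetcher would download and validate the file.
    CALLS.append("generic")
    return metadata.path


def fetch_converted(metadata: FakeMetadata) -> Path:
    # Placeholder: a real fetcher would download raw data and convert it to NPZ.
    CALLS.append("converted")
    return metadata.path


def dispatch(metadata: FakeMetadata) -> Path:
    """Route a dataset to a fetcher based on metadata.name."""
    if metadata.name.startswith("custom_"):  # hypothetical naming rule
        return fetch_converted(metadata)
    # Fallback branch: plain NPZ download.
    return fetch_generic_npz(metadata)
```

Under this layout, supporting another domain would mean one more branch in dispatch and one more fetcher function, which matches the extension path described above.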
DatasetData(path, name, is_rgb, num_classes, annotation_path=None)
dataclass
Metadata container for a loaded dataset.
Stores path and format info instead of raw arrays to save RAM.
Attributes:
| Name | Type | Description |
|---|---|---|
| path | Path | Path to images NPZ. |
| name | str | Dataset identifier. |
| is_rgb | bool | Whether images are RGB (3 channels). |
| num_classes | int | Number of categories. |
| annotation_path | Path \| None | Path to annotations NPZ (detection datasets only; None for classification). |
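As an illustration of the container's shape, the sketch below mirrors the documented signature in a stand-in dataclass. DatasetDataSketch and the file paths are hypothetical; whether the real DatasetData is frozen, slotted, or carries extra behavior is not stated here and is not assumed.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DatasetDataSketch:
    """Illustrative mirror of the documented DatasetData fields."""
    path: Path
    name: str
    is_rgb: bool
    num_classes: int
    annotation_path: Optional[Path] = None


# Classification dataset: no annotations file, so annotation_path stays None.
cls_data = DatasetDataSketch(
    Path("data/example.npz"), "example", is_rgb=False, num_classes=10
)

# Detection dataset: annotations live in a second NPZ file.
det_data = DatasetDataSketch(
    Path("data/det.npz"),
    "det",
    is_rgb=True,
    num_classes=3,
    annotation_path=Path("data/det_annotations.npz"),
)
```

Because only paths and scalar properties are stored, an instance stays small regardless of how large the NPZ files on disk are.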
ensure_dataset_npz(metadata, retries=5, delay=5.0)
Dispatcher that routes each dataset to its dedicated fetch pipeline.
Automatically detects dataset type from metadata.name and delegates
to the appropriate download/conversion module. Adding a new domain
(e.g. a new resolution or source) only requires a new branch here and
a corresponding fetch module.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metadata | DatasetMetadata | Metadata containing URL, MD5, name and target path. | required |
| retries | int | Max number of download attempts (NPZ fetcher only). | 5 |
| delay | float | Delay (seconds) between retries (NPZ fetcher only). | 5.0 |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Path to the successfully validated .npz file. |
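To make the roles of retries and delay concrete, here is a minimal sketch of a retry loop with MD5 validation. It is not the library's implementation: fetch_with_retries, md5_of, and the download callable are hypothetical names, and the exact validation and error handling of the real NPZ fetcher are not documented here.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable


def md5_of(path: Path, block_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in blocks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def fetch_with_retries(
    download: Callable[[Path], None],
    target: Path,
    expected_md5: str,
    retries: int = 5,
    delay: float = 5.0,
) -> Path:
    """Download to target until its MD5 matches, trying at most `retries` times."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            # Skip the download when a valid file is already on disk.
            if not target.exists() or md5_of(target) != expected_md5:
                download(target)
            if md5_of(target) == expected_md5:
                return target
            last_error = ValueError("MD5 mismatch")
            target.unlink()  # discard the corrupt file before retrying
        except OSError as exc:
            last_error = exc
        if attempt < retries:
            time.sleep(delay)  # wait before the next attempt
    raise RuntimeError(f"failed after {retries} attempts") from last_error
```

In this sketch a file that already matches the checksum is returned without any network access, and a corrupt download is deleted so the next attempt starts clean.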
load_dataset(metadata)
Ensure the dataset is present on disk and return its inspection results.
Downloads the dataset if missing, then inspects all samples to determine format properties (color mode, class count).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metadata | DatasetMetadata | Dataset metadata (URL, MD5, name, path). | required |
Returns:
| Type | Description |
|---|---|
| DatasetData | Inspection results with format info derived from the full dataset. |
load_dataset_health_check(metadata, chunk_size=100)
Quick health-check: inspects only the first chunk_size samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metadata | DatasetMetadata | Dataset metadata (URL, MD5, name, path). | required |
| chunk_size | int | Number of samples to inspect (default 100). | 100 |
Returns:
| Type | Description |
|---|---|
| DatasetData | DatasetData with format info derived from the chunk. |
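The difference between full inspection and a chunked health check can be sketched with NumPy. This is an illustration under stated assumptions, not the library's inspection code: inspect_samples is a hypothetical helper, the arrays are passed in directly rather than read from an NPZ (the key names inside the file are not documented here), and a channel-last image layout is assumed.

```python
from typing import Optional, Tuple

import numpy as np


def inspect_samples(
    images: np.ndarray,
    labels: np.ndarray,
    chunk_size: Optional[int] = None,
) -> Tuple[bool, int]:
    """Return (is_rgb, num_classes) from all samples or the first chunk_size."""
    if chunk_size is not None:
        # Health-check mode: look only at the leading samples.
        images = images[:chunk_size]
        labels = labels[:chunk_size]
    # Assumed layout: (N, H, W) is grayscale, (N, H, W, 3) is RGB.
    is_rgb = images.ndim == 4 and images.shape[-1] == 3
    # Count the distinct label values that were actually seen.
    num_classes = int(np.unique(labels).size)
    return is_rgb, num_classes
```

In this sketch, a chunk that happens to contain only some of the labels yields a smaller class count than a full pass, which is the usual trade-off of a fast partial check.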