fetcher
orchard.data_handler.fetcher
Dataset Fetching Dispatcher and Loading Interface.
Central entry point for dataset retrieval. Routes each dataset to its
dedicated fetch module inside the fetchers/ sub-package and exposes the
loading functions that return DatasetData containers.
Adding a new domain only requires a new branch in ensure_dataset_npz
and a corresponding module in fetchers/.
DatasetData(path, name, is_rgb, num_classes)
dataclass
Metadata container for a loaded dataset.
Stores path and format info instead of raw arrays to save RAM.
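As a rough illustration of the container described above, the following sketch mirrors the four documented fields. The field types and the example values are assumptions; the page only lists the field names.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DatasetData:
    """Sketch of the metadata container; field types are assumed, not confirmed."""

    path: Path        # location of the .npz file on disk (arrays stay on disk)
    name: str         # dataset identifier
    is_rgb: bool      # True for 3-channel images, False for grayscale
    num_classes: int  # number of target classes


# Hypothetical values, for illustration only.
data = DatasetData(
    path=Path("datasets/example.npz"),
    name="example",
    is_rgb=False,
    num_classes=10,
)
print(data.path.suffix, data.num_classes)
```

Because only the path and a few scalars are held, passing such an object around costs almost nothing, and the arrays can be opened later by whichever component needs them.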
ensure_dataset_npz(metadata, retries=5, delay=5.0)
Dispatcher that routes each dataset to its dedicated fetch pipeline.
Automatically detects dataset type from metadata.name and delegates
to the appropriate download/conversion module. Adding a new domain
(e.g. a new resolution or source) only requires a new branch here and
a corresponding fetch module.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metadata | DatasetMetadata | Metadata containing URL, MD5, name and target path. | required |
| retries | int | Max number of download attempts (NPZ fetcher only). | 5 |
| delay | float | Delay (seconds) between retries (NPZ fetcher only). | 5.0 |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | Path to the successfully validated .npz file. |
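The dispatch-plus-retry behaviour described above can be sketched as follows. This is not the module's actual code: `Meta`, `fetch_custom`, `fetch_npz`, the `download` callback and the dataset name `"custom_domain"` are all hypothetical stand-ins, used only to show one branch per domain with a generic NPZ fallback that honours `retries` and `delay`.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Meta:
    """Hypothetical stand-in for DatasetMetadata (only the fields used here)."""

    name: str
    path: Path


def fetch_custom(meta: Meta) -> Path:
    """Hypothetical domain-specific fetcher; a real one would download and convert."""
    return meta.path


def fetch_npz(meta: Meta, retries: int, delay: float,
              download: Callable[[Meta], Path]) -> Path:
    """Generic NPZ path: try `download` up to `retries` times, `delay` seconds apart."""
    for attempt in range(1, retries + 1):
        try:
            return download(meta)
        except OSError:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
    raise ValueError("retries must be at least 1")


def ensure_npz_sketch(meta: Meta, retries: int = 5, delay: float = 5.0,
                      download: Callable[[Meta], Path] = lambda m: m.path) -> Path:
    # One branch per domain, keyed on the dataset name.
    if meta.name == "custom_domain":  # hypothetical dataset name
        return fetch_custom(meta)
    # Everything else falls through to the generic NPZ fetcher,
    # the only branch that uses retries/delay.
    return fetch_npz(meta, retries, delay, download)


# Usage: a download that fails twice with a transient error, then succeeds.
attempts = []


def flaky_download(meta: Meta) -> Path:
    attempts.append(meta.name)
    if len(attempts) < 3:
        raise OSError("transient network error")
    return meta.path


result = ensure_npz_sketch(Meta("generic", Path("generic.npz")),
                           retries=5, delay=0.0, download=flaky_download)
print(result, len(attempts))
```

In this shape, supporting a new domain means adding one `if` branch and one fetch function, which matches the extension path the description lays out.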
Source code in orchard/data_handler/fetcher.py
load_dataset(metadata)
Ensures the dataset is present and returns its metadata container.
load_dataset_health_check(metadata, chunk_size=100)
Quick health-check: inspects only the first chunk_size samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metadata | DatasetMetadata | Dataset metadata (URL, MD5, name, path). | required |
| chunk_size | int | Number of samples to inspect (default 100). | 100 |
Returns:
| Type | Description |
|---|---|
| DatasetData | DatasetData with format info derived from the chunk. |
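To make the idea of deriving format info from a small chunk concrete, here is a minimal sketch. It rests on two assumptions that this page does not confirm: images are laid out as (N, H, W) for grayscale or (N, H, W, 3) for RGB, and the class count is taken as the number of distinct labels in the chunk. The helper name `summarize_chunk` is hypothetical.

```python
from typing import Sequence, Tuple


def summarize_chunk(image_shape: Tuple[int, ...], labels: Sequence[int],
                    chunk_size: int = 100) -> Tuple[bool, int]:
    """Derive (is_rgb, num_classes) from the first `chunk_size` labels only."""
    # Assumption: a trailing axis of length 3 on a 4-D array means RGB.
    is_rgb = len(image_shape) == 4 and image_shape[-1] == 3
    # Assumption: class count is the number of distinct labels seen in the chunk.
    chunk = labels[:chunk_size]
    num_classes = len(set(chunk))
    return is_rgb, num_classes


# Synthetic labels cycling through 7 classes.
labels = [i % 7 for i in range(1000)]

full_chunk = summarize_chunk((1000, 28, 28, 3), labels)
tiny_chunk = summarize_chunk((1000, 28, 28), labels, chunk_size=5)
print(full_chunk)  # (True, 7)
print(tiny_chunk)  # (False, 5)
```

The second call shows the trade-off of any chunk-based check under these assumptions: a chunk that is too small may not contain every class, so a count derived this way is a quick sanity signal rather than a guaranteed total.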