fetchers
orchard.data_handler.fetchers
Domain-Specific Dataset Fetchers.
Each module in this sub-package handles the download and conversion logic
for a single dataset domain (MedMNIST, Galaxy10, etc.), keeping the main
fetcher dispatcher clean as new sources are added.
Design note: fetcher modules intentionally duplicate some logic (e.g. stratified splitting) rather than sharing a base class. Each fetcher is a self-contained adapter to an external resource whose URL, format, or availability may change without notice. Isolation ensures that breaking changes in one source never cascade to others, and that any single fetcher can be removed cleanly.
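To make the duplicated logic concrete, here is a minimal sketch of the kind of stratified split a fetcher might carry on its own. The function name `stratified_split`, its parameters, and the rounding rule are assumptions for illustration; they are not taken from the orchard source.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), splitting each class separately.

    Hypothetical helper: each fetcher would hold its own copy of something
    like this instead of importing it from a shared base class.
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # local RNG keeps the split reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction from every class so proportions are preserved.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

Because the helper depends only on the standard library, a fetcher that embeds it can be deleted without touching any other module, which is the trade-off the design note describes.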
ensure_cifar_npz(metadata)
Ensures a CIFAR dataset is downloaded and converted to NPZ format.
Supports both CIFAR-10 and CIFAR-100 via metadata.name routing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `metadata` | `DatasetMetadata` | DatasetMetadata with name ('cifar10' or 'cifar100') and path | required |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to validated NPZ file |
Source code in orchard/data_handler/fetchers/cifar_converter.py
ensure_galaxy10_npz(metadata)
Ensures Galaxy10 is downloaded and converted to NPZ format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `metadata` | `DatasetMetadata` | DatasetMetadata with URL and path | required |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to validated NPZ file |
Source code in orchard/data_handler/fetchers/galaxy10_converter.py
ensure_medmnist_npz(metadata, retries=5, delay=5.0)
Downloads a MedMNIST NPZ file with retries and MD5 validation.
Implements a three-phase strategy:
- Return immediately if a valid local copy already exists.
- Delete any corrupted local copy.
- Stream-download with retry loop and atomic file replacement.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `metadata` | `DatasetMetadata` | Metadata containing URL, MD5, name and target path. | required |
| `retries` | `int` | Max number of download attempts. | `5` |
| `delay` | `float` | Base delay (seconds) between retries (quadratic backoff on 429). | `5.0` |
Returns:
| Name | Type | Description |
|---|---|---|
| `Path` | `Path` | Path to the successfully validated .npz file. |
Raises:
| Type | Description |
|---|---|
| `OrchardDatasetError` | If all download attempts fail. |
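The three-phase strategy described for `ensure_medmnist_npz` can be sketched with the standard library alone. This is an illustration under stated assumptions, not the orchard implementation: `ensure_file`, `md5_of`, `RateLimited`, `DownloadFailed`, and the injectable `fetch_chunks` and `sleep` parameters are all hypothetical names. The backoff formula `delay * attempt**2` is only one plausible reading of "quadratic backoff on 429".

```python
import hashlib
import os
import tempfile
import time
from pathlib import Path
from typing import Callable, Iterable, Optional


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""


class DownloadFailed(Exception):
    """Hypothetical local stand-in for OrchardDatasetError."""


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_file(
    target: Path,
    expected_md5: str,
    fetch_chunks: Callable[[], Iterable[bytes]],  # assumed: yields the body in chunks
    retries: int = 5,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Path:
    # Phase 1: return immediately if a valid local copy already exists.
    if target.exists():
        if md5_of(target) == expected_md5:
            return target
        # Phase 2: the local copy failed validation, so delete it.
        target.unlink()

    # Phase 3: stream to a temporary file, validate, then replace atomically.
    target.parent.mkdir(parents=True, exist_ok=True)
    last_error: Optional[Exception] = None
    for attempt in range(1, retries + 1):
        fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
        tmp = Path(tmp_name)
        try:
            with os.fdopen(fd, "wb") as out:
                for chunk in fetch_chunks():
                    out.write(chunk)
            if md5_of(tmp) != expected_md5:
                raise ValueError("MD5 mismatch after download")
            os.replace(tmp, target)  # atomic when both paths share a filesystem
            return target
        except Exception as exc:
            last_error = exc
            tmp.unlink(missing_ok=True)  # never leave a partial file behind
            if attempt < retries:
                # Assumed policy: quadratic wait when rate limited, flat wait otherwise.
                rate_limited = isinstance(exc, RateLimited)
                sleep(delay * attempt**2 if rate_limited else delay)
    raise DownloadFailed(f"all {retries} download attempts failed") from last_error
```

Writing to a temporary file in the target directory and swapping it in with `os.replace` means readers never see a half-written `.npz`. An interrupted run leaves at most a stray `.part` file, never a corrupted target.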