fetcher

`orchard.data_handler.fetcher` ¶

Dataset Fetching Dispatcher and Loading Interface.

Central entry point for dataset retrieval. Routes each dataset to its dedicated fetch module inside the `fetchers/` sub-package and exposes the loading functions that return `DatasetData` containers.

Adding a new domain only requires a new branch in `ensure_dataset_npz` and a corresponding module in `fetchers/`.

`DatasetData(path, name, is_rgb, num_classes)` `dataclass` ¶

Metadata container for a loaded dataset.

Stores path and format info instead of raw arrays to save RAM.
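A minimal sketch of what such a container could look like, for illustration only. The four field names come from the signature above; the field types, the comments, and the example path are assumptions, not the library's definition.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DatasetData:
    """Illustrative stand-in; field types are assumed, not taken from the library."""

    path: Path        # assumed: location of the validated .npz file on disk
    name: str         # assumed: dataset identifier, e.g. "cifar10"
    is_rgb: bool      # assumed: True for 3-channel images, False for grayscale
    num_classes: int  # assumed: number of distinct labels


# Only a path and a few scalars are held; the image arrays stay on disk.
data = DatasetData(
    path=Path("data/cifar10.npz"),  # hypothetical path
    name="cifar10",
    is_rgb=True,
    num_classes=10,
)
print(data.name, data.is_rgb, data.num_classes)
```

Keeping only the path means that consumers decide when, and how much of, the array data to read.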

`ensure_dataset_npz(metadata, retries=5, delay=5.0)` ¶

Dispatcher that routes each dataset to its dedicated fetch pipeline.

Automatically detects dataset type from `metadata.name` and delegates to the appropriate download/conversion module. Adding a new domain (e.g. a new resolution or source) only requires a new branch here and a corresponding fetch module.

Parameters:

Name	Type	Description	Default
`metadata`	`DatasetMetadata`	Metadata containing URL, MD5, name and target path.	required
`retries`	`int`	Max number of download attempts (NPZ fetcher only).	`5`
`delay`	`float`	Delay (seconds) between retries (NPZ fetcher only).	`5.0`

Returns:

Type	Description
`Path`	Path to the successfully validated .npz file.
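The `retries` and `delay` parameters, together with the MD5 in the metadata, suggest a retry loop with checksum validation. The sketch below shows one way such a loop could work; it is not the code of `ensure_medmnist_npz`. The names `fetch_with_retries`, `md5_of`, and the injected `download` callable are hypothetical.

```python
import hashlib
import tempfile
import time
from pathlib import Path
from typing import Callable, Optional


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB blocks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def fetch_with_retries(
    download: Callable[[Path], None],  # hypothetical: writes the file to the given path
    target: Path,
    expected_md5: str,
    retries: int = 5,
    delay: float = 5.0,
) -> Path:
    """Try up to `retries` times; accept the file only if its MD5 matches."""
    last_error: Optional[Exception] = None
    for attempt in range(1, retries + 1):
        try:
            download(target)
            if md5_of(target) == expected_md5:
                return target
            last_error = ValueError(f"MD5 mismatch on attempt {attempt}")
            target.unlink(missing_ok=True)  # discard the corrupt file
        except OSError as exc:  # covers network errors such as urllib's URLError
            last_error = exc
        if attempt < retries:
            time.sleep(delay)  # wait before the next attempt
    raise RuntimeError(f"failed after {retries} attempts") from last_error


# Demo: a fake downloader that fails twice, then writes a valid payload.
payload = b"fake npz bytes"
attempts = []


def flaky_download(path: Path) -> None:
    attempts.append(path)
    if len(attempts) < 3:
        raise OSError("simulated network failure")
    path.write_bytes(payload)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "dataset.npz"
    result = fetch_with_retries(
        flaky_download, target, hashlib.md5(payload).hexdigest(), retries=5, delay=0.0
    )
    succeeded = result.exists()

print(len(attempts), succeeded)  # prints: 3 True
```

Passing the downloader in as a callable keeps the retry and validation logic testable without network access.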

Source code in orchard/data_handler/fetcher.py

def ensure_dataset_npz(
    metadata: DatasetMetadata,
    retries: int = 5,
    delay: float = 5.0,
) -> Path:
    """
    Dispatcher that routes each dataset to its dedicated fetch pipeline.

    Automatically detects dataset type from ``metadata.name`` and delegates
    to the appropriate download/conversion module. Adding a new domain
    (e.g. a new resolution or source) only requires a new branch here and
    a corresponding fetch module.

    Args:
        metadata (DatasetMetadata): Metadata containing URL, MD5, name and target path.
        retries (int): Max number of download attempts (NPZ fetcher only).
        delay (float): Delay (seconds) between retries (NPZ fetcher only).

    Returns:
        Path: Path to the successfully validated .npz file.
    """
    # Galaxy10 requires HDF5 download and conversion to NPZ
    if metadata.name == "galaxy10":
        from .fetchers import ensure_galaxy10_npz

        return ensure_galaxy10_npz(metadata)

    # CIFAR-10/100 via torchvision download and NPZ conversion
    if metadata.name in ("cifar10", "cifar100"):
        from .fetchers import ensure_cifar_npz

        return ensure_cifar_npz(metadata)

    # Default: standard NPZ download with retries and MD5 check
    from .fetchers import ensure_medmnist_npz

    return ensure_medmnist_npz(metadata, retries=retries, delay=delay)
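To illustrate the extension point described in the docstring, the sketch below mirrors the dispatcher with stub fetchers and adds one extra branch. The dataset name `"newdomain"` and the function `ensure_newdomain_npz` are hypothetical. The stubs return fake paths that only record which pipeline was chosen, and `SimpleNamespace` stands in for `DatasetMetadata`.

```python
from pathlib import Path
from types import SimpleNamespace


# Stub fetchers standing in for the real modules in fetchers/.
# Each returns a fake path that records which pipeline handled the request.
def ensure_galaxy10_npz(metadata) -> Path:
    return Path("galaxy10_pipeline") / f"{metadata.name}.npz"


def ensure_cifar_npz(metadata) -> Path:
    return Path("cifar_pipeline") / f"{metadata.name}.npz"


def ensure_medmnist_npz(metadata, retries: int = 5, delay: float = 5.0) -> Path:
    return Path("medmnist_pipeline") / f"{metadata.name}.npz"


def ensure_newdomain_npz(metadata) -> Path:  # hypothetical new fetcher
    return Path("newdomain_pipeline") / f"{metadata.name}.npz"


def ensure_dataset_npz(metadata, retries: int = 5, delay: float = 5.0) -> Path:
    if metadata.name == "galaxy10":
        return ensure_galaxy10_npz(metadata)
    if metadata.name in ("cifar10", "cifar100"):
        return ensure_cifar_npz(metadata)
    # New branch: must come before the default so it is not shadowed.
    if metadata.name == "newdomain":
        return ensure_newdomain_npz(metadata)
    # Anything unrecognised falls through to the default NPZ fetcher.
    return ensure_medmnist_npz(metadata, retries=retries, delay=delay)


routed = {
    name: ensure_dataset_npz(SimpleNamespace(name=name)).parts[0]
    for name in ("galaxy10", "cifar100", "newdomain", "pathmnist")
}
print(routed)
```

Because the last branch is an unconditional default, a misspelled dataset name is not rejected here; it is handed to the standard NPZ fetcher like any other unrecognised name.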

`load_dataset(metadata)` ¶

Ensures the dataset is present and returns its metadata container.

Source code in orchard/data_handler/fetcher.py

def load_dataset(metadata: DatasetMetadata) -> DatasetData:
    """
    Ensures the dataset is present and returns its metadata container.
    """
    return _load_and_inspect(metadata)

`load_dataset_health_check(metadata, chunk_size=100)` ¶

Quick health-check: inspects only the first `chunk_size` samples.

Parameters:

Name	Type	Description	Default
`metadata`	`DatasetMetadata`	Dataset metadata (URL, MD5, name, path).	required
`chunk_size`	`int`	Number of samples to inspect (default 100).	`100`

Returns:

Type	Description
`DatasetData`	DatasetData with format info derived from the chunk.

Source code in orchard/data_handler/fetcher.py

def load_dataset_health_check(metadata: DatasetMetadata, chunk_size: int = 100) -> DatasetData:
    """
    Quick health-check: inspects only the first *chunk_size* samples.

    Args:
        metadata: Dataset metadata (URL, MD5, name, path).
        chunk_size: Number of samples to inspect (default 100).

    Returns:
        DatasetData with format info derived from the chunk.
    """
    return _load_and_inspect(metadata, chunk_size=chunk_size)
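The body of `_load_and_inspect` is not shown on this page, so the following is only a sketch of the idea behind a chunked health check: derive format information from the first `chunk_size` samples and leave the rest unread. The function `inspect_first_chunk`, the `(image, label)` pair layout, and the use of nested Python sequences in place of NumPy arrays are all assumptions made for illustration.

```python
from itertools import islice
from typing import Iterable, Sequence, Tuple


def inspect_first_chunk(
    samples: Iterable[Tuple[Sequence, int]],  # hypothetical (image, label) pairs
    chunk_size: int = 100,
) -> Tuple[bool, int]:
    """Return (is_rgb, distinct_labels_seen) using at most `chunk_size` samples."""
    is_rgb = False
    labels = set()
    for image, label in islice(samples, chunk_size):
        first_pixel = image[0][0]
        # An H x W x 3 image holds a 3-element sequence at each pixel position;
        # a grayscale H x W image holds a plain number.
        is_rgb = not isinstance(first_pixel, (int, float)) and len(first_pixel) == 3
        labels.add(label)
    return is_rgb, len(labels)


# Demo: a lazy stream of 1000 tiny RGB samples; count how many are actually read.
consumed = 0


def stream():
    global consumed
    for i in range(1000):
        consumed += 1
        yield [[(0, 0, 0)]], i % 10  # 1x1 RGB image, label in 0..9


result = inspect_first_chunk(stream(), chunk_size=100)
print(result, consumed)  # prints: (True, 10) 100
```

A small chunk can miss labels when samples are grouped by class, so a count derived this way is a lower bound rather than a reliable class count.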
