metadata

orchard.core.metadata

Dataset Metadata Package.

This package centralizes the specifications for all supported datasets. It serves as the single source of truth for Orchard, keeping data dimensions, labels, and normalization constants consistent across the entire pipeline.

DatasetMetadata

Bases: BaseModel

Immutable metadata container for a dataset entry.

Holds identity, source, image properties, and normalization constants for both classification and detection datasets. Detection datasets additionally specify an annotation_path for bounding-box labels.

Attributes:

name (str): Short identifier (e.g., 'pathmnist', 'galaxy10').
display_name (str): Human-readable name for reporting.
md5_checksum (str): MD5 hash for download integrity verification.
url (str): Source URL for dataset download.
path (Path): Local path to the .npz archive.
classes (list[str]): Class labels in index order.
in_channels (int): Number of image channels (1=grayscale, 3=RGB).
native_resolution (int | None): Native pixel resolution (e.g., 28, 224).
mean (tuple[float, ...]): Channel-wise normalization mean.
std (tuple[float, ...]): Channel-wise normalization standard deviation.
annotation_path (Path | None): Local path to annotation .npz (detection only).

normalization_info property

Formatted mean/std for reporting.

resolution_str property

Formatted resolution string (e.g., '28x28', '224x224').

num_classes property

Total number of target classes.
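
To illustrate how the derived properties relate to the stored fields, the sketch below uses a frozen standard-library dataclass as a stand-in for the Pydantic model. The class name `MetadataSketch`, the sample values, the exact string formats, and the handling of a `None` resolution are all assumptions made for illustration, not Orchard's actual implementation.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class MetadataSketch:
    """Hypothetical stand-in for DatasetMetadata (the real class is a Pydantic BaseModel)."""

    name: str
    display_name: str
    md5_checksum: str
    url: str
    path: Path
    classes: list[str]
    in_channels: int
    native_resolution: Optional[int]
    mean: tuple[float, ...]
    std: tuple[float, ...]
    annotation_path: Optional[Path] = None  # only set for detection datasets

    @property
    def num_classes(self) -> int:
        # The class count follows directly from the label list.
        return len(self.classes)

    @property
    def resolution_str(self) -> str:
        # Assumed behavior: square images; the None fallback is a guess.
        if self.native_resolution is None:
            return "variable"
        return f"{self.native_resolution}x{self.native_resolution}"

    @property
    def normalization_info(self) -> str:
        # Assumed format; the real property may render this differently.
        return f"mean={self.mean}, std={self.std}"


sample = MetadataSketch(
    name="toy",
    display_name="Toy Dataset",
    md5_checksum="0" * 32,  # placeholder, not a real checksum
    url="https://example.com/toy.npz",  # placeholder URL
    path=Path("data/toy.npz"),
    classes=["class_a", "class_b", "class_c"],  # placeholder labels
    in_channels=3,
    native_resolution=28,
    mean=(0.5, 0.5, 0.5),
    std=(0.5, 0.5, 0.5),
)

print(sample.num_classes)     # 3
print(sample.resolution_str)  # 28x28
```

Because the stand-in is frozen, assigning to a field such as `sample.name` raises an error, which mirrors the "immutable container" intent described above.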

ClassificationRegistryWrapper

Bases: DatasetRegistryWrapper

Registry wrapper for classification datasets (medical, space, benchmark).

DatasetRegistryWrapper

Bases: BaseModel

Base wrapper for dataset registries.

Provides resolution validation, deep-copied access, and the get_dataset lookup method. Subclasses define which domain registries are available.

Attributes:

resolution (int): Target dataset resolution.
registry (dict[str, DatasetMetadata]): Deep-copied metadata registry for the selected resolution.

get_dataset(name)

Retrieve a DatasetMetadata entry by name.

Parameters:

name (str, required): Dataset identifier.

Returns:

DatasetMetadata: Deep copy of the matching DatasetMetadata.

Raises:

KeyError: If dataset not found in registry.

Source code in orchard/core/metadata/wrapper.py
def get_dataset(self, name: str) -> DatasetMetadata:
    """
    Retrieve a DatasetMetadata entry by name.

    Args:
        name: Dataset identifier.

    Returns:
        Deep copy of the matching DatasetMetadata.

    Raises:
        KeyError: If dataset not found in registry.
    """
    if name not in self.registry:
        available = list(self.registry.keys())
        raise KeyError(f"Dataset '{name}' not found. Available: {available}")

    return copy.deepcopy(self.registry[name])
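
The practical effect of returning a deep copy is that callers can modify the result without corrupting the shared registry. The following is a minimal sketch of that behavior; `RegistrySketch` is a hypothetical stand-in that stores plain dicts rather than `DatasetMetadata` entries, and only the lookup logic is taken from the method above.

```python
import copy


class RegistrySketch:
    """Hypothetical stand-in for a registry wrapper; entries are plain dicts."""

    def __init__(self, registry: dict[str, dict]) -> None:
        self.registry = registry

    def get_dataset(self, name: str) -> dict:
        # Same lookup logic as the documented method.
        if name not in self.registry:
            available = list(self.registry.keys())
            raise KeyError(f"Dataset '{name}' not found. Available: {available}")
        return copy.deepcopy(self.registry[name])


wrapper = RegistrySketch({"toy": {"classes": ["a", "b"]}})

entry = wrapper.get_dataset("toy")
entry["classes"].append("c")  # mutates only the returned copy

print(wrapper.registry["toy"]["classes"])  # ['a', 'b']

try:
    wrapper.get_dataset("missing")
except KeyError as exc:
    # The message lists the available dataset names.
    print(exc)
```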

DetectionRegistryWrapper

Bases: DatasetRegistryWrapper

Registry wrapper for detection datasets.

get_registry(resolution, task_type='classification')

Factory function to obtain the correct registry wrapper for a task.

Parameters:

resolution (int, required): Target image resolution.
task_type (str, default 'classification'): "classification" or "detection".

Returns:

DatasetRegistryWrapper: Registry wrapper with datasets available for the given task and resolution.

Raises:

ValueError: If task_type is not "classification" or "detection".

Source code in orchard/core/metadata/wrapper.py
def get_registry(
    resolution: int,
    task_type: str = "classification",
) -> DatasetRegistryWrapper:
    """
    Factory function to obtain the correct registry wrapper for a task.

    Args:
        resolution: Target image resolution.
        task_type: ``"classification"`` or ``"detection"``.

    Returns:
        Registry wrapper with datasets available for the given task and resolution.

    Raises:
        ValueError: If ``task_type`` is not ``"classification"`` or ``"detection"``.
    """
    if task_type == "detection":
        return DetectionRegistryWrapper(resolution=resolution)
    if task_type != "classification":
        raise ValueError(
            f"Unknown task_type: {task_type!r}. Expected 'classification' or 'detection'."
        )
    return ClassificationRegistryWrapper(resolution=resolution)
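
The dispatch above can be exercised as in the sketch below. The three wrapper classes are hypothetical, simplified stand-ins (plain Python classes with no resolution validation and no registry contents), so only the branching on `task_type` reflects the documented function.

```python
class DatasetRegistryWrapper:
    """Hypothetical stand-in for the Pydantic base wrapper."""

    def __init__(self, resolution: int) -> None:
        self.resolution = resolution


class ClassificationRegistryWrapper(DatasetRegistryWrapper):
    """Stand-in for the classification wrapper."""


class DetectionRegistryWrapper(DatasetRegistryWrapper):
    """Stand-in for the detection wrapper."""


def get_registry(
    resolution: int,
    task_type: str = "classification",
) -> DatasetRegistryWrapper:
    # Same branching as the documented factory.
    if task_type == "detection":
        return DetectionRegistryWrapper(resolution=resolution)
    if task_type != "classification":
        raise ValueError(
            f"Unknown task_type: {task_type!r}. Expected 'classification' or 'detection'."
        )
    return ClassificationRegistryWrapper(resolution=resolution)


clf = get_registry(28)  # task_type defaults to "classification"
det = get_registry(224, task_type="detection")

print(type(clf).__name__)  # ClassificationRegistryWrapper
print(type(det).__name__)  # DetectionRegistryWrapper

try:
    get_registry(28, task_type="segmentation")
except ValueError as exc:
    print(exc)
```

Callers that only need `get_dataset` can rely on the shared `DatasetRegistryWrapper` return type and ignore which subclass they received.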