metadata

orchard.core.metadata

Dataset Metadata Package.

This package centralizes the specifications for all supported datasets. It serves as the single source of truth for Orchard, so that data dimensions, class labels, and normalization constants stay consistent across the entire pipeline.

DatasetMetadata

Bases: BaseModel

Immutable metadata container for a dataset entry.

Ensures dataset-specific constants are grouped and frozen throughout pipeline execution. Serves as the static definition that feeds into the dynamic DatasetConfig.

Attributes:

Name                Type               Description
name                str                Short identifier (e.g., 'pathmnist', 'galaxy10').
display_name        str                Human-readable name for reporting.
md5_checksum        str                MD5 hash for download integrity verification.
url                 str                Source URL for dataset download.
path                Path               Local path to the .npz archive.
classes             list[str]          Class labels in index order.
in_channels         int                Number of image channels (1=grayscale, 3=RGB).
native_resolution   int | None         Native pixel resolution (e.g., 28, 224).
mean                tuple[float, ...]  Channel-wise normalization mean.
std                 tuple[float, ...]  Channel-wise normalization standard deviation.
is_anatomical       bool               Whether images have fixed anatomical orientation.
is_texture_based    bool               Whether classification relies on texture patterns.
normalization_info  property           Formatted mean/std for reporting.
resolution_str      property           Formatted resolution string (e.g., '28x28', '224x224').
num_classes         property           Total number of target classes.
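As a rough illustration of how an immutable container with derived properties behaves, the sketch below uses a frozen standard-library dataclass as a stand-in. It is not the Orchard class: DatasetMetadata is a Pydantic BaseModel, and the class name DatasetMetadataSketch, the field values, the tuple type for classes, and the "unknown" fallback for a missing resolution are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class DatasetMetadataSketch:
    """Hypothetical stand-in for DatasetMetadata (stdlib dataclass, not Pydantic)."""

    name: str
    display_name: str
    classes: tuple[str, ...]  # a tuple here so the frozen instance stays immutable
    in_channels: int
    native_resolution: int | None
    mean: tuple[float, ...]
    std: tuple[float, ...]

    @property
    def num_classes(self) -> int:
        # Derived from the class labels, so it cannot drift out of sync.
        return len(self.classes)

    @property
    def resolution_str(self) -> str:
        # The fallback for a missing resolution is an assumption of this sketch.
        if self.native_resolution is None:
            return "unknown"
        return f"{self.native_resolution}x{self.native_resolution}"


# Placeholder values, not real dataset statistics.
meta = DatasetMetadataSketch(
    name="example",
    display_name="Example Dataset",
    classes=("class_a", "class_b"),
    in_channels=3,
    native_resolution=28,
    mean=(0.5, 0.5, 0.5),
    std=(0.5, 0.5, 0.5),
)

try:
    meta.name = "renamed"  # any attribute assignment is rejected on a frozen instance
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True
```

Deriving num_classes and resolution_str from stored fields, rather than storing them separately, keeps the entry internally consistent even if the class list or resolution is edited in the registry definition.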

DatasetRegistryWrapper

Bases: BaseModel

Pydantic wrapper for multi-domain dataset registries.

Merges domain-specific registries (medical, space) based on the selected resolution and provides validated, deep-copied access to dataset metadata entries.

Attributes:

Name        Type                        Description
resolution  int                         Target dataset resolution (28, 32, 64, 128, or 224).
registry    dict[str, DatasetMetadata]  Deep-copied metadata registry for the selected resolution.
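The merge-by-resolution behaviour described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the Orchard implementation: the registry constants MEDICAL_SKETCH and SPACE_SKETCH, the helper build_registry, the plain-dict entries, and the ValueError for unsupported resolutions are all hypothetical, and which datasets exist at which resolution is placeholder data.

```python
from __future__ import annotations

import copy

# Hypothetical per-domain registries: resolution -> dataset name -> metadata.
# Plain dicts stand in for DatasetMetadata entries; values are placeholders.
MEDICAL_SKETCH = {
    28: {"pathmnist": {"in_channels": 3}},
    224: {"pathmnist": {"in_channels": 3}},
}
SPACE_SKETCH = {
    224: {"galaxy10": {"in_channels": 3}},
}

SUPPORTED_RESOLUTIONS = (28, 32, 64, 128, 224)


def build_registry(resolution: int) -> dict[str, dict]:
    """Merge every domain's entries for one resolution into a single mapping."""
    if resolution not in SUPPORTED_RESOLUTIONS:
        # How Orchard reports this case is not documented here; ValueError is assumed.
        raise ValueError(f"Unsupported resolution: {resolution}")

    merged: dict[str, dict] = {}
    for domain in (MEDICAL_SKETCH, SPACE_SKETCH):
        # A domain with no datasets at this resolution contributes nothing.
        merged.update(domain.get(resolution, {}))

    # Deep copy so callers can never mutate the module-level definitions.
    return copy.deepcopy(merged)


registry_224 = build_registry(224)
registry_28 = build_registry(28)
```

With the placeholder data above, registry_224 holds both example entries while registry_28 holds only the medical one, and changes made to either result leave the source registries untouched.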

get_dataset(name)

Retrieves specific DatasetMetadata by name.

Parameters:

Name  Type  Description         Default
name  str   Dataset identifier  required

Returns:

Type             Description
DatasetMetadata  Deep copy of DatasetMetadata

Raises:

Type      Description
KeyError  If dataset not found in registry

Source code in orchard/core/metadata/wrapper.py
def get_dataset(self, name: str) -> DatasetMetadata:
    """
    Retrieves specific DatasetMetadata by name.

    Args:
        name: Dataset identifier

    Returns:
        Deep copy of DatasetMetadata

    Raises:
        KeyError: If dataset not found in registry
    """
    if name not in self.registry:
        available = list(self.registry.keys())
        raise KeyError(f"Dataset '{name}' not found. Available: {available}")

    return copy.deepcopy(self.registry[name])
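The two behaviours of get_dataset shown above, returning an isolated copy and raising a KeyError that lists the available names, can be exercised with a small stand-in. RegistrySketch below is a hypothetical class that reuses the same method body over plain dicts; it is not DatasetRegistryWrapper and skips the Pydantic validation.

```python
import copy


class RegistrySketch:
    """Hypothetical stand-in that holds a registry and mirrors get_dataset."""

    def __init__(self, registry: dict) -> None:
        self.registry = registry

    def get_dataset(self, name: str) -> dict:
        if name not in self.registry:
            available = list(self.registry.keys())
            raise KeyError(f"Dataset '{name}' not found. Available: {available}")
        return copy.deepcopy(self.registry[name])


# Placeholder entry; a real registry would hold DatasetMetadata objects.
wrapper = RegistrySketch({"pathmnist": {"classes": ["class_a", "class_b"]}})

# Mutating the returned copy does not touch the stored entry.
entry = wrapper.get_dataset("pathmnist")
entry["classes"].append("class_c")
stored_classes = wrapper.registry["pathmnist"]["classes"]

# An unknown name raises KeyError with the list of valid identifiers.
try:
    wrapper.get_dataset("unknown")
    message = None
except KeyError as exc:
    message = exc.args[0]
```

Because every lookup returns a fresh deep copy, two callers that request the same dataset receive independent objects, at the cost of one copy per call.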