dataset_config

`orchard.core.config.dataset_config` ¶

Dataset Registry Orchestration & Metadata Resolution.

Bridges static dataset metadata with runtime execution requirements. Normalizes datasets regardless of native format (grayscale or RGB) so that they meet the input specification of the model architecture. Supports multiple resolutions (28x28 through 224x224), each of which can be overridden from YAML, while the validated configuration itself stays frozen (immutable).

Key Responsibilities

Adaptive normalization: Adjusts mean/std based on channel logic
Feature promotion: Automates grayscale-to-RGB for ImageNet weights
Resource budgeting: Enforces sampling limits and class balancing
Multi-resolution support: Resolves metadata by selected resolution

`DatasetConfig` ¶

Bases: BaseModel

Validated manifest for dataset execution context.

Bridges static registry metadata with runtime preferences. Resolves channel promotion and sampling policies with multi-resolution support. Auto-syncs img_size with resolution when not explicitly set.

Attributes:

Name	Type	Description
`name`	`str`	Dataset identifier from registry (e.g., 'bloodmnist', 'organcmnist').
`metadata`	`DatasetMetadata | None`	DatasetMetadata object (excluded from serialization).
`data_root`	`ValidatedPath`	Root directory containing dataset files.
`use_weighted_sampler`	`bool`	Enable class-balanced sampling for imbalanced datasets.
`max_samples`	`PositiveInt | None`	Maximum samples to load (None=all).
`val_ratio`	`Probability`	Fraction of max_samples for val/test splits (default 0.10).
`img_size`	`ImageSize | None`	Target square resolution for model input (auto-synced).
`force_rgb`	`bool`	Convert grayscale to RGB for pretrained ImageNet weights.
`resolution`	`int`	Target resolution variant (28, 32, 64, 128, or 224).
`lazy_loading`	`bool`	Use memory-mapped loading instead of eagerly loading into RAM.

`dataset_name` `property` ¶

Get dataset identifier from metadata.

Returns:

Type	Description
`str`	Dataset name string (e.g., 'bloodmnist').

`num_classes` `property` ¶

Get number of target classes.

Returns:

Type	Description
`int`	Integer count of classification classes.

`in_channels` `property` ¶

Get native input channels from metadata.

Returns:

Type	Description
`int`	1 for grayscale datasets, 3 for RGB.

`effective_in_channels` `property` ¶

Get actual channels the model receives after promotion.

Returns:

Type	Description
`int`	3 if force_rgb enabled, otherwise native in_channels.

`mean` `property` ¶

Get channel-wise normalization mean.

Expands single-channel mean to 3 channels if force_rgb is enabled on a grayscale dataset.

Returns:

Type	Description
`tuple[float, ...]`	tuple of mean values per channel.

`std` `property` ¶

Get channel-wise normalization standard deviation.

Expands single-channel std to 3 channels if force_rgb is enabled on a grayscale dataset.

Returns:

Type	Description
`tuple[float, ...]`	tuple of std values per channel.
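The expansion described for `mean` and `std` can be illustrated with a minimal sketch. The helper name `promote_stats` and the sample value `0.5` are hypothetical and are not part of `orchard`; the sketch only assumes that the single grayscale statistic is repeated once per RGB channel.

```python
from __future__ import annotations


def promote_stats(
    stats: tuple[float, ...], in_channels: int, force_rgb: bool
) -> tuple[float, ...]:
    """Repeat a single-channel statistic three times when grayscale is promoted.

    Hypothetical helper: mirrors the documented behaviour, not the library code.
    """
    if force_rgb and in_channels == 1:
        # Grayscale promoted to RGB: every channel shares the same statistic.
        return (stats[0],) * 3
    # Native RGB, or grayscale without promotion: leave the statistics untouched.
    return stats


gray_mean = (0.5,)  # made-up value for illustration
promoted = promote_stats(gray_mean, in_channels=1, force_rgb=True)
untouched = promote_stats(gray_mean, in_channels=1, force_rgb=False)
```

Keeping the statistics the same length as the channel count matters because a per-channel normalization step would otherwise receive one mean for a three-channel tensor.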

`effective_is_anatomical` `property` ¶

Whether the dataset is anatomical (conservative default: True).

Returns:

Type	Description
`bool`	Metadata flag if available, else True (safest for augmentation).

`effective_is_texture_based` `property` ¶

Whether the dataset is texture-based (conservative default: True).

Returns:

Type	Description
`bool`	Metadata flag if available, else True (safest for augmentation).
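The conservative default shared by `effective_is_anatomical` and `effective_is_texture_based` might look like the following sketch. The helper `effective_flag` is hypothetical, and the metadata attribute names `is_anatomical` and `is_texture_based` are assumed from the property names rather than confirmed by the source.

```python
from types import SimpleNamespace
from typing import Any


def effective_flag(metadata: Any, attr: str) -> bool:
    """Return the metadata flag when metadata is present, otherwise True.

    Hypothetical helper illustrating the documented fallback.
    """
    if metadata is None:
        # No registry metadata: fall back to the conservative answer.
        return True
    return bool(getattr(metadata, attr))


# Stand-in for a DatasetMetadata object; attribute names are assumptions.
meta = SimpleNamespace(is_anatomical=False, is_texture_based=True)

with_metadata = effective_flag(meta, "is_anatomical")
without_metadata = effective_flag(None, "is_anatomical")
```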

`processing_mode` `property` ¶

Get description of channel processing mode.

Returns:

Type	Description
`str`	'NATIVE-RGB' for RGB datasets, 'RGB-PROMOTED' for grayscale with force_rgb, or 'NATIVE-GRAY' for grayscale without promotion.
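The three modes, together with the channel count documented for `effective_in_channels`, can be summarized in a short sketch. Both functions are standalone stand-ins written from the descriptions above, not the library's property implementations.

```python
def effective_in_channels(in_channels: int, force_rgb: bool) -> int:
    """3 if force_rgb is enabled, otherwise the native channel count."""
    return 3 if force_rgb else in_channels


def processing_mode(in_channels: int, force_rgb: bool) -> str:
    """Describe how channels are handled, following the documented labels."""
    if in_channels == 3:
        return "NATIVE-RGB"
    # Grayscale dataset: promoted only when force_rgb is set.
    return "RGB-PROMOTED" if force_rgb else "NATIVE-GRAY"


modes = {
    (channels, force): processing_mode(channels, force)
    for channels in (1, 3)
    for force in (True, False)
}
```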

`validate_resolution(v)` `classmethod` ¶

Enforce resolution against supported registry values.

Source code in orchard/core/config/dataset_config.py

@field_validator("resolution")
@classmethod
def validate_resolution(cls, v: int) -> int:
    """
    Enforce resolution against supported registry values.
    """
    if v not in SUPPORTED_RESOLUTIONS:
        raise OrchardConfigError(
            f"resolution={v} is not supported. Choose from {sorted(SUPPORTED_RESOLUTIONS)}."
        )
    return v
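
To see what a rejected value looks like, the check can be exercised outside Pydantic. In this sketch `SUPPORTED_RESOLUTIONS` is assumed to hold the five values listed for the `resolution` attribute, and `OrchardConfigError` is a local stand-in because the base class of the real exception is not shown here.

```python
# Assumed contents, taken from the `resolution` attribute description.
SUPPORTED_RESOLUTIONS = frozenset({28, 32, 64, 128, 224})


class OrchardConfigError(Exception):
    """Local stand-in for the library's exception type."""


def check_resolution(v: int) -> int:
    """Same membership test as validate_resolution, without the Pydantic decorators."""
    if v not in SUPPORTED_RESOLUTIONS:
        raise OrchardConfigError(
            f"resolution={v} is not supported. Choose from {sorted(SUPPORTED_RESOLUTIONS)}."
        )
    return v


accepted = check_resolution(64)

try:
    check_resolution(96)
except OrchardConfigError as exc:
    message = str(exc)
```

Sorting the set in the message keeps the list of valid choices in a stable, readable order.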

`validate_min_samples(v)` `classmethod` ¶

Enforce minimum sample count for meaningful train/val/test splits.

Source code in orchard/core/config/dataset_config.py

@field_validator("max_samples")
@classmethod
def validate_min_samples(cls, v: int | None) -> int | None:
    """
    Enforce minimum sample count for meaningful train/val/test splits.
    """
    if v is not None and v < 20:
        raise OrchardConfigError(
            f"max_samples={v} is too small for meaningful train/val/test splits. "
            f"Use max_samples >= 20 or None to load all samples."
        )
    return v
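
A quick calculation suggests why very small values of `max_samples` are rejected. The sketch assumes that validation and test each receive `val_ratio` of `max_samples`, rounded down, and that training gets the remainder; the actual split logic lives elsewhere and may differ.

```python
from __future__ import annotations


def split_sizes(max_samples: int, val_ratio: float) -> tuple[int, int, int]:
    """Return (train, val, test) counts under the assumed split rule."""
    holdout = int(max_samples * val_ratio)  # size of each of val and test
    train = max_samples - 2 * holdout
    return train, holdout, holdout


at_threshold = split_sizes(20, 0.10)
below_threshold = split_sizes(10, 0.10)
```

Under these assumptions, 20 samples at the default ratio leave only two samples each for validation and test, and 10 samples leave one each, which is too few to estimate any metric.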

`sync_img_size_with_resolution(values)` `classmethod` ¶

Auto-sync img_size with resolution if not explicitly set.

This runs BEFORE frozen instantiation, allowing us to modify values.

Logic:

1. If img_size is explicitly set in YAML/args → keep it
2. Otherwise, if metadata exists → use metadata.native_resolution
3. Otherwise → use resolution (28 when resolution is also missing)

Parameters:

Name	Type	Description	Default
`values`	`dict[str, Any]`	Raw input dict before Pydantic validation	required

Returns:

Type	Description
`dict[str, Any]`	Modified values dict with synced img_size

Source code in orchard/core/config/dataset_config.py

@model_validator(mode="before")
@classmethod
def sync_img_size_with_resolution(cls, values: dict[str, Any]) -> dict[str, Any]:
    """
    Auto-sync img_size with resolution if not explicitly set.

    This runs BEFORE frozen instantiation, allowing us to modify values.

    Logic:
    1. If img_size is explicitly set in YAML/args → keep it
    2. Otherwise, if metadata exists → use metadata.native_resolution
    3. Otherwise → use resolution (28 when resolution is also missing)

    Args:
        values (dict[str, Any]): Raw input dict before Pydantic validation

    Returns:
        dict[str, Any]: Modified values dict with synced img_size
    """
    img_size = values.get("img_size")
    resolution = values.get("resolution", 28)
    metadata = values.get("metadata")

    if img_size is None:
        if metadata is not None:
            # Use metadata's native resolution
            img_size = metadata.native_resolution
        else:
            # Use resolution parameter
            img_size = resolution

        values["img_size"] = img_size

    return values
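
The precedence can be checked with plain dictionaries. The function below is a standalone copy of the validator's logic without the Pydantic decorators, and `SimpleNamespace` stands in for a `DatasetMetadata` object; only its `native_resolution` attribute is assumed.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def sync_img_size(values: dict[str, Any]) -> dict[str, Any]:
    """Fill in img_size from metadata or resolution when it is not set."""
    if values.get("img_size") is None:
        metadata = values.get("metadata")
        if metadata is not None:
            # Metadata wins over the resolution parameter.
            values["img_size"] = metadata.native_resolution
        else:
            values["img_size"] = values.get("resolution", 28)
    return values


meta = SimpleNamespace(native_resolution=64)  # stand-in metadata object

explicit = sync_img_size({"img_size": 128, "resolution": 224, "metadata": meta})
from_metadata = sync_img_size({"resolution": 224, "metadata": meta})
from_resolution = sync_img_size({"resolution": 224})
from_default = sync_img_size({})
```

Note the second case: when metadata is supplied and `img_size` is not, the metadata's native resolution is used even if `resolution` names a different value.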
