dataset_config
orchard.core.config.dataset_config
Dataset Registry Orchestration & Metadata Resolution.
Bridges static dataset metadata with runtime execution requirements. Normalizes datasets of either native format (grayscale or RGB) to match the input specification of the model architecture. Supports multiple resolutions (28x28 through 224x224) and YAML overrides while keeping the config frozen (immutable).
Key Responsibilities
- Adaptive normalization: Adjusts mean/std to match the effective channel count
- Feature promotion: Automates grayscale-to-RGB for ImageNet weights
- Resource budgeting: Enforces sampling limits and class balancing
- Multi-resolution support: Resolves metadata by selected resolution
DatasetConfig
Bases: BaseModel
Validated manifest for dataset execution context.
Bridges static registry metadata with runtime preferences. Resolves channel promotion and sampling policies with multi-resolution support. Auto-syncs img_size with resolution when not explicitly set.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | Dataset identifier from registry (e.g., 'bloodmnist', 'organcmnist'). |
| `metadata` | `DatasetMetadata \| None` | DatasetMetadata object (excluded from serialization). |
| `data_root` | `ValidatedPath` | Root directory containing dataset files. |
| `use_weighted_sampler` | `bool` | Enable class-balanced sampling for imbalanced datasets. |
| `max_samples` | `PositiveInt \| None` | Maximum samples to load (None=all). |
| `val_ratio` | `Probability` | Fraction of max_samples for val/test splits (default 0.10). |
| `img_size` | `ImageSize \| None` | Target square resolution for model input (auto-synced). |
| `force_rgb` | `bool` | Convert grayscale to RGB for pretrained ImageNet weights. |
| `resolution` | `int` | Target resolution variant (28, 32, 64, 128, or 224). |
| `lazy_loading` | `bool` | Use memory-mapped loading instead of eagerly loading into RAM. |
dataset_name
property
Get dataset identifier from metadata.
Returns:
| Type | Description |
|---|---|
| `str` | Dataset name string (e.g., 'bloodmnist'). |
num_classes
property
Get number of target classes.
Returns:
| Type | Description |
|---|---|
| `int` | Integer count of classification classes. |
in_channels
property
Get native input channels from metadata.
Returns:
| Type | Description |
|---|---|
| `int` | 1 for grayscale datasets, 3 for RGB. |
effective_in_channels
property
Get actual channels the model receives after promotion.
Returns:
| Type | Description |
|---|---|
| `int` | 3 if force_rgb enabled, otherwise native in_channels. |
mean
property
Get channel-wise normalization mean.
Expands single-channel mean to 3 channels if force_rgb is enabled on a grayscale dataset.
Returns:
| Type | Description |
|---|---|
| `tuple[float, ...]` | Tuple of mean values per channel. |
std
property
Get channel-wise normalization standard deviation.
Expands single-channel std to 3 channels if force_rgb is enabled on a grayscale dataset.
Returns:
| Type | Description |
|---|---|
| `tuple[float, ...]` | Tuple of std values per channel. |
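The expansion described for `mean` and `std` can be sketched as a small standalone function. This is a minimal illustration, not the library's code: `expand_stats` is a hypothetical helper, and repeating the single grayscale value three times is an assumption about how the expansion is done.

```python
from typing import Sequence


def expand_stats(
    stats: Sequence[float], in_channels: int, force_rgb: bool
) -> tuple[float, ...]:
    """Return per-channel statistics for the channels the model will see.

    Hypothetical helper: a single-channel statistic is repeated three
    times only when a grayscale dataset is promoted to RGB.
    """
    if force_rgb and in_channels == 1:
        return tuple(stats) * 3
    return tuple(stats)


# Grayscale dataset promoted to RGB: the single value is repeated.
gray_mean = expand_stats((0.5,), in_channels=1, force_rgb=True)

# Native RGB dataset: statistics pass through unchanged.
rgb_mean = expand_stats((0.485, 0.456, 0.406), in_channels=3, force_rgb=True)
```

With this sketch, the length of the returned tuple always matches the channel count reported by `effective_in_channels`.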
effective_is_anatomical
property
Whether the dataset is anatomical (conservative default: True).
Returns:
| Type | Description |
|---|---|
| `bool` | Metadata flag if available, else True (safest for augmentation). |
effective_is_texture_based
property
Whether the dataset is texture-based (conservative default: True).
Returns:
| Type | Description |
|---|---|
| `bool` | Metadata flag if available, else True (safest for augmentation). |
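The conservative fallback shared by `effective_is_anatomical` and `effective_is_texture_based` might look like the sketch below. The helper `effective_flag` and the metadata attribute names `is_anatomical` and `is_texture_based` are assumptions inferred from the property names, not confirmed by this page.

```python
from types import SimpleNamespace
from typing import Any, Optional


def effective_flag(metadata: Optional[Any], flag_name: str) -> bool:
    """Read a boolean flag from metadata, defaulting to True when absent.

    Hypothetical helper: True is the conservative default because it
    keeps augmentation on the safe side when nothing is known.
    """
    if metadata is None:
        return True
    return bool(getattr(metadata, flag_name, True))


# Stand-in for a DatasetMetadata object (attribute names are assumed).
meta = SimpleNamespace(is_anatomical=False, is_texture_based=True)

with_metadata = effective_flag(meta, "is_anatomical")
without_metadata = effective_flag(None, "is_anatomical")
```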
processing_mode
property
Get description of channel processing mode.
Returns:
| Type | Description |
|---|---|
| `str` | 'NATIVE-RGB' for RGB datasets, 'RGB-PROMOTED' for grayscale with force_rgb, or 'NATIVE-GRAY' for grayscale without promotion. |
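The three return values map onto two inputs, the native channel count and `force_rgb`. A minimal sketch of that mapping follows, written as a free function rather than a property and assuming native channels are either 1 or 3:

```python
def processing_mode(in_channels: int, force_rgb: bool) -> str:
    """Describe how channels are handled (sketch of the documented mapping)."""
    if in_channels == 3:
        # RGB datasets are never promoted, whatever force_rgb says.
        return "NATIVE-RGB"
    return "RGB-PROMOTED" if force_rgb else "NATIVE-GRAY"


modes = {
    (channels, flag): processing_mode(channels, flag)
    for channels in (1, 3)
    for flag in (False, True)
}
```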
validate_resolution(v)
classmethod
Enforce resolution against supported registry values.
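A resolution check of this kind could be sketched as a plain function. The supported values are taken from the `resolution` attribute description; the constant name `SUPPORTED_RESOLUTIONS` and the error message are illustrative assumptions, and the real method is a Pydantic classmethod validator rather than a free function.

```python
# Values listed in the `resolution` attribute description.
SUPPORTED_RESOLUTIONS = (28, 32, 64, 128, 224)


def validate_resolution(v: int) -> int:
    """Reject resolutions that are not in the supported set (sketch)."""
    if v not in SUPPORTED_RESOLUTIONS:
        raise ValueError(
            f"resolution={v} is not supported; "
            f"choose one of {SUPPORTED_RESOLUTIONS}"
        )
    return v
```

Raising `ValueError` matches how Pydantic validators conventionally report invalid input.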
validate_min_samples(v)
classmethod
Enforce minimum sample count for meaningful train/val/test splits.
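The shape of such a check is sketched below. The page does not state the actual minimum, so `MIN_SAMPLES = 20` is a purely hypothetical threshold chosen for illustration, and `None` is passed through because `max_samples` may be `None` (load everything).

```python
from typing import Optional

# Hypothetical threshold: the real minimum is not documented on this page.
MIN_SAMPLES = 20


def validate_min_samples(v: Optional[int]) -> Optional[int]:
    """Reject sample caps too small to split into train/val/test (sketch)."""
    if v is not None and v < MIN_SAMPLES:
        raise ValueError(
            f"max_samples={v} is too small; need at least {MIN_SAMPLES}"
        )
    return v
```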
sync_img_size_with_resolution(values)
classmethod
Auto-sync img_size with resolution if not explicitly set.
This runs BEFORE frozen instantiation, allowing us to modify values.
Logic:
1. If img_size is explicitly set in YAML/args → keep it
2. If img_size is None/missing → use resolution
3. If metadata exists → use metadata.native_resolution
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `values` | `dict[str, Any]` | Raw input dict before Pydantic validation. | required |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Modified values dict with synced img_size. |
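Because the validator receives a raw dict, the sync rules can be sketched as a pure function over that dict. This is one reading, assuming the three rules are tried in the order listed; the actual precedence between `resolution` and `metadata.native_resolution` is not spelled out here. `sync_img_size` is a hypothetical name, and `SimpleNamespace` stands in for a real `DatasetMetadata` object.

```python
from types import SimpleNamespace
from typing import Any


def sync_img_size(values: dict[str, Any]) -> dict[str, Any]:
    """Fill in img_size before validation (sketch, assumed precedence)."""
    values = dict(values)  # work on a copy; leave the caller's dict alone

    # Rule 1: an explicit img_size always wins.
    if values.get("img_size") is not None:
        return values

    # Rule 2: otherwise mirror the requested resolution.
    if values.get("resolution") is not None:
        values["img_size"] = values["resolution"]
        return values

    # Rule 3: otherwise fall back to the metadata's native resolution.
    metadata = values.get("metadata")
    if metadata is not None:
        values["img_size"] = metadata.native_resolution
    return values


explicit = sync_img_size({"img_size": 64, "resolution": 224})
synced = sync_img_size({"resolution": 224})
from_meta = sync_img_size({"metadata": SimpleNamespace(native_resolution=28)})
```

Doing this on the raw dict is what allows the value to change at all: once the frozen model is constructed, its fields can no longer be reassigned.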