diagnostic
orchard.data_handler.diagnostic
Diagnostic Utilities for Health Checks and Smoke Tests.
This private submodule provides lightweight data utilities used exclusively for pipeline validation (health checks, smoke tests, CI). These are not part of the production training pipeline.
SyntheticDetectionData(image_path, annotation_path, num_classes, name)
Container for synthetic detection dataset paths and metadata.
Attributes:
| Name | Type | Description |
|---|---|---|
| `image_path` | `Path` | Path to images NPZ. |
| `annotation_path` | `Path` | Path to annotations NPZ. |
| `num_classes` | `int` | Number of object classes (excluding background). |
| `name` | `str` | Dataset identifier. |
Source code in orchard/data_handler/diagnostic/synthetic_detection.py
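The container described above can be pictured as a small immutable record. The following is only a minimal sketch: the class name `SyntheticDetectionDataSketch` is hypothetical, and whether the real class is a dataclass, a named tuple, or something else is an assumption not confirmed by this page.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SyntheticDetectionDataSketch:
    """Hypothetical stand-in that mirrors the documented attributes."""

    image_path: Path       # path to the images NPZ
    annotation_path: Path  # path to the annotations NPZ
    num_classes: int       # object classes, excluding background
    name: str              # dataset identifier


# Illustrative file names only; the real paths are chosen by the library.
example = SyntheticDetectionDataSketch(
    image_path=Path("images.npz"),
    annotation_path=Path("annotations.npz"),
    num_classes=4,
    name="synthetic_detection",
)
```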
create_synthetic_dataset(num_classes=8, samples=100, resolution=28, channels=3, name='syntheticmnist')
Create a synthetic NPZ-compatible dataset for testing.
This function generates random image data and labels, saves them to a temporary .npz file, and returns a DatasetData object that can be used with the existing data pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_classes` | `int` | Number of target categories (default: 8) | `8` |
| `samples` | `int` | Number of training samples (default: 100) | `100` |
| `resolution` | `int` | Image resolution (HxW) (default: 28) | `28` |
| `channels` | `int` | Number of color channels (default: 3 for RGB) | `3` |
| `name` | `str` | Dataset name for identification (default: "syntheticmnist") | `'syntheticmnist'` |
Returns:
| Name | Type | Description |
|---|---|---|
| `DatasetData` | `DatasetData` | A data object compatible with the existing pipeline |
Example

    >>> data = create_synthetic_dataset(num_classes=8, samples=100)
    >>> train_loader, val_loader, test_loader = get_dataloaders(
    ...     data, cfg.dataset, cfg.training, cfg.augmentation, cfg.num_workers
    ... )
Source code in orchard/data_handler/diagnostic/synthetic.py
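To make the "random images and labels in a temporary .npz file" step concrete, here is a minimal NumPy-only sketch. It is not the library's implementation: the function name `make_synthetic_npz`, the `seed` parameter, the array keys (`images`, `labels`), the channels-last layout, and the dtypes are all assumptions, and it returns a plain path rather than a `DatasetData` object.

```python
import tempfile
from pathlib import Path

import numpy as np


def make_synthetic_npz(num_classes=8, samples=100, resolution=28, channels=3, seed=0):
    """Write random images and labels to a temporary .npz file; return its path."""
    rng = np.random.default_rng(seed)
    # Random uint8 pixels, channels-last layout (N, H, W, C): an assumed layout.
    images = rng.integers(
        0, 256, size=(samples, resolution, resolution, channels), dtype=np.uint8
    )
    # One integer label per sample, drawn uniformly from [0, num_classes).
    labels = rng.integers(0, num_classes, size=(samples, 1), dtype=np.int64)

    # Reserve a temporary file name, then close the handle so NumPy can write to it.
    tmp = tempfile.NamedTemporaryFile(suffix=".npz", delete=False)
    tmp.close()
    path = Path(tmp.name)
    # Key names are assumptions for this sketch, not the pipeline's schema.
    np.savez(path, images=images, labels=labels)
    return path
```

Loading the file back with `np.load(path)` gives access to both arrays by key, which is enough to check shapes and label ranges in a smoke test.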
create_synthetic_grayscale_dataset(num_classes=8, samples=100, resolution=28)
Create a synthetic grayscale NPZ dataset for testing.
Convenience function for creating single-channel (grayscale) synthetic data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_classes` | `int` | Number of target categories (default: 8) | `8` |
| `samples` | `int` | Number of training samples (default: 100) | `100` |
| `resolution` | `int` | Image resolution (HxW) (default: 28) | `28` |
Returns:
| Name | Type | Description |
|---|---|---|
| `DatasetData` | `DatasetData` | A grayscale data object compatible with the pipeline |
Source code in orchard/data_handler/diagnostic/synthetic.py
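A convenience function of this kind is often a thin wrapper that fixes the channel count to 1. The sketch below illustrates that pattern with hypothetical helpers (`random_images`, `random_grayscale_images`); whether the real function delegates to `create_synthetic_dataset` in this way, and whether it keeps a trailing channel axis, are assumptions.

```python
import numpy as np


def random_images(samples=100, resolution=28, channels=3, seed=0):
    """Hypothetical generator: random uint8 images in (N, H, W, C) layout."""
    rng = np.random.default_rng(seed)
    return rng.integers(
        0, 256, size=(samples, resolution, resolution, channels), dtype=np.uint8
    )


def random_grayscale_images(samples=100, resolution=28, seed=0):
    """Same generator with the channel count pinned to 1 (single channel)."""
    return random_images(samples=samples, resolution=resolution, channels=1, seed=seed)
```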
create_synthetic_detection_dataset(num_classes=4, samples=50, resolution=64, channels=3, name='synthetic_detection')
Create a synthetic detection dataset for testing.
Generates random images with random bounding boxes and saves them as NPZ files (images + annotations separately).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_classes` | `int` | Number of object categories (default: 4). | `4` |
| `samples` | `int` | Number of training images (default: 50). | `50` |
| `resolution` | `int` | Image size in pixels (default: 64). | `64` |
| `channels` | `int` | Color channels (default: 3). | `3` |
| `name` | `str` | Dataset identifier (default: "synthetic_detection"). | `'synthetic_detection'` |
Returns:
| Type | Description |
|---|---|
| `SyntheticDetectionData` | SyntheticDetectionData with paths to generated NPZ files. |
Source code in orchard/data_handler/diagnostic/synthetic_detection.py
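The generation step described for this function (random images, random bounding boxes, two separate NPZ files) could look roughly like the sketch below. Everything beyond that outline is an assumption: the function name `make_synthetic_detection_npz`, the `max_boxes` and `seed` parameters, the `(x1, y1, x2, y2)` pixel box format, the zero-based label range, the array keys, and the file names.

```python
import tempfile
from pathlib import Path

import numpy as np


def make_synthetic_detection_npz(num_classes=4, samples=50, resolution=64,
                                 channels=3, max_boxes=3, seed=0):
    """Write random images and random boxes to two NPZ files; return both paths."""
    rng = np.random.default_rng(seed)
    images = rng.integers(
        0, 256, size=(samples, resolution, resolution, channels), dtype=np.uint8
    )

    boxes, labels, image_ids = [], [], []
    for image_id in range(samples):
        # Each image gets between 1 and max_boxes boxes.
        for _ in range(int(rng.integers(1, max_boxes + 1))):
            # Two distinct coordinates per axis, sorted so x1 < x2 and y1 < y2.
            x1, x2 = np.sort(rng.choice(resolution + 1, size=2, replace=False))
            y1, y2 = np.sort(rng.choice(resolution + 1, size=2, replace=False))
            boxes.append((x1, y1, x2, y2))
            # Assumed label range [0, num_classes); background is not encoded.
            labels.append(int(rng.integers(0, num_classes)))
            image_ids.append(image_id)

    out_dir = Path(tempfile.mkdtemp(prefix="synthetic_detection_"))
    image_path = out_dir / "images.npz"
    annotation_path = out_dir / "annotations.npz"
    np.savez(image_path, images=images)
    np.savez(
        annotation_path,
        boxes=np.asarray(boxes, dtype=np.float32),
        labels=np.asarray(labels, dtype=np.int64),
        image_ids=np.asarray(image_ids, dtype=np.int64),
    )
    return image_path, annotation_path
```

Storing annotations as flat arrays with an `image_ids` column sidesteps ragged arrays when images have different numbers of boxes; the real file layout may differ.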
create_temp_loader(dataset_path, batch_size=_DEFAULT_HEALTHCHECK_BATCH_SIZE)
Load an NPZ dataset lazily and return a DataLoader for health checks.
This avoids loading the entire dataset into RAM at once, which is critical for large datasets (e.g., 224x224 images).
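One way to avoid materializing a whole array is to stream batches directly from the `.npy` member inside the NPZ archive. The sketch below shows that idea with NumPy and the standard library only; it is not how `create_temp_loader` is implemented, it does not build a DataLoader, and the function name `iter_npz_batches` and the `key` argument are hypothetical.

```python
import math
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def iter_npz_batches(path, key, batch_size):
    """Yield consecutive batches of one NPZ member without loading it whole."""
    with zipfile.ZipFile(path) as archive, archive.open(f"{key}.npy") as member:
        # Parse the .npy header to learn shape and dtype before reading data.
        version = npy_format.read_magic(member)
        if version == (1, 0):
            shape, fortran_order, dtype = npy_format.read_array_header_1_0(member)
        elif version == (2, 0):
            shape, fortran_order, dtype = npy_format.read_array_header_2_0(member)
        else:
            raise ValueError(f"unsupported .npy version: {version}")
        if fortran_order or dtype.hasobject:
            raise ValueError("sketch only handles C-ordered, non-object arrays")

        sample_shape = shape[1:]
        bytes_per_sample = dtype.itemsize * math.prod(sample_shape)
        for start in range(0, shape[0], batch_size):
            count = min(batch_size, shape[0] - start)
            # Read only the bytes for this batch; earlier batches can be freed.
            raw = member.read(count * bytes_per_sample)
            yield np.frombuffer(raw, dtype=dtype).reshape((count, *sample_shape))
```

Only one batch is held in memory at a time, so peak usage scales with `batch_size` rather than with the dataset size. The yielded arrays are read-only views over the bytes read; call `.copy()` on a batch if it needs to be modified.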