# orchard.trainer

Trainer Package Facade.

This package exposes the central `ModelTrainer` class, the optimization factories, and the low-level execution engines, providing a unified interface for the training lifecycle.
## LoopOptions(grad_clip, total_epochs, mixup_epochs, use_tqdm, monitor_metric)

`dataclass`

Scalar configuration for a `TrainingLoop`.

Groups training hyper-parameters that do not depend on PyTorch objects, keeping the TrainingLoop constructor lean.

Attributes:

| Name | Type | Description |
|---|---|---|
| grad_clip | float \| None | Max norm for gradient clipping (0 or None disables). |
| total_epochs | int | Total number of epochs (for tqdm progress bar). |
| mixup_epochs | int | Epoch cutoff after which MixUp is disabled. |
| use_tqdm | bool | Whether to show tqdm progress bar. |
| monitor_metric | str | Metric key for ReduceLROnPlateau stepping (e.g. …). |
## TrainingLoop(model, train_loader, val_loader, optimizer, scheduler, criterion, device, scaler, mixup_fn, options)
Single-epoch execution kernel shared by ModelTrainer and TrialTrainingExecutor.
Encapsulates the per-epoch train/validate/schedule cycle. Callers own the outer epoch loop and policy decisions (checkpointing, early stopping, Optuna pruning). This class only executes one epoch at a time.
Attributes:

| Name | Type | Description |
|---|---|---|
| model | Module | Neural network to train. |
| train_loader | DataLoader | Training data provider. |
| val_loader | DataLoader | Validation data provider. |
| optimizer | Optimizer | Gradient descent optimizer. |
| scheduler | LRScheduler \| None | Learning rate scheduler (or None). |
| criterion | Module | Loss function. |
| device | device | Hardware target (CUDA/MPS/CPU). |
| scaler | GradScaler \| None | AMP GradScaler (or None). |
| mixup_fn | Callable \| None | MixUp partial function (or None). |
| options | LoopOptions | Scalar training options (see `LoopOptions`). |
Source code in orchard/trainer/_loop.py
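The division of labour described above (TrainingLoop executes one epoch; the caller owns checkpointing, early stopping, and pruning) can be sketched with a stand-in loop object. All names and the metric key here are illustrative, not the library's API:

```python
class StubLoop:
    """Stand-in for TrainingLoop: run_epoch returns
    (train_loss, val_metrics) for a single epoch."""
    def __init__(self, metrics):
        self._metrics = iter(metrics)

    def run_epoch(self, epoch):
        return 0.5 / epoch, next(self._metrics)


def fit(loop, total_epochs, patience, monitor="accuracy"):
    """Caller-owned policy layer: checkpoint on improvement,
    stop after `patience` stale epochs."""
    best, stale, best_epoch = float("-inf"), 0, 0
    for epoch in range(1, total_epochs + 1):
        _, val = loop.run_epoch(epoch)
        if val[monitor] > best:
            best, stale, best_epoch = val[monitor], 0, epoch  # "checkpoint"
        else:
            stale += 1
            if stale >= patience:
                break  # early stop
    return best, best_epoch


loop = StubLoop([{"accuracy": a} for a in (0.6, 0.7, 0.68, 0.69, 0.66)])
best, at = fit(loop, total_epochs=5, patience=2)
# best == 0.7, found at epoch 2; loop halts after two stale epochs
```

Keeping policy out of the kernel is what lets ModelTrainer and TrialTrainingExecutor (e.g. with Optuna pruning) share the same epoch code.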
### run_train_step(epoch)
Execute a single training epoch with MixUp cutoff.
Applies MixUp augmentation only when epoch <= mixup_epochs.
Does not run validation or step the scheduler.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| epoch | int | Current epoch number (1-indexed). | required |

Returns:

| Type | Description |
|---|---|
| float | Average training loss for the epoch. |
Source code in orchard/trainer/_loop.py
### run_epoch(epoch)
Execute a full train → validate → schedule cycle for one epoch.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| epoch | int | Current epoch number (1-indexed). | required |

Returns:

| Type | Description |
|---|---|
| tuple[float, Mapping[str, float]] | Tuple of (average training loss, validation metrics dict). |
Source code in orchard/trainer/_loop.py
## ModelTrainer(model, train_loader, val_loader, optimizer, scheduler, criterion, device, training, output_path=None, tracker=None)
Encapsulates the core training, validation, and scheduling logic.
Manages the complete training lifecycle including epoch iteration, metric tracking, automated checkpointing based on validation performance, and early stopping with patience-based criteria. Integrates modern training techniques (AMP, Mixup, gradient clipping) and ensures deterministic model restoration to best-performing weights.
The trainer follows a structured execution flow:
- Training Phase: Forward/backward passes with optional Mixup augmentation
- Validation Phase: Performance evaluation on held-out data
- Scheduling Phase: Learning rate updates (ReduceLROnPlateau or step-based)
- Checkpointing: Save model when monitor_metric improves
- Early Stopping: Halt training if no improvement for `patience` epochs
Attributes:

| Name | Type | Description |
|---|---|---|
| model | | Neural network architecture to train. |
| train_loader | | Training data provider. |
| val_loader | | Validation data provider. |
| optimizer | | Gradient descent optimizer. |
| scheduler | | Learning rate scheduler. |
| criterion | | Loss function (e.g., CrossEntropyLoss). |
| device | | Hardware target (CUDA/MPS/CPU). |
| training | | Training hyperparameters sub-config. |
| epochs | | Total number of training epochs. |
| patience | | Early stopping patience (epochs without improvement). |
| best_acc | | Best validation accuracy achieved. |
| best_metric | | Best value of the monitored metric. |
| epochs_no_improve | | Consecutive epochs without monitored metric improvement. |
| scaler | | AMP scaler (`GradScaler` or None). |
| mixup_fn | | MixUp augmentation function (partial of `mixup_data`). |
| best_path | | Filesystem path for best model checkpoint. |
| train_losses | list[float] | Training loss history per epoch. |
| val_metrics_history | list[Mapping[str, float]] | Validation metrics history per epoch. |
| monitor_metric | | Name of metric driving checkpointing. |
| _loop | | Shared epoch kernel handling train → validate → schedule. |
Example:

```python
from orchard.trainer import ModelTrainer

trainer = ModelTrainer(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    optimizer=optimizer,
    scheduler=scheduler,
    criterion=criterion,
    device=device,
    training=cfg.training,
    output_path=paths.checkpoints / "best_model.pth",
)
checkpoint_path, losses, metrics = trainer.train()
# Model automatically restored to best weights
```
Initializes the ModelTrainer with all required training components.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | Module | Neural network architecture to train. | required |
| train_loader | DataLoader[Any] | DataLoader for training dataset. | required |
| val_loader | DataLoader[Any] | DataLoader for validation dataset. | required |
| optimizer | Optimizer | Gradient descent optimizer (e.g., SGD, AdamW). | required |
| scheduler | LRScheduler | Learning rate scheduler for training dynamics. | required |
| criterion | Module | Loss function for optimisation (e.g., CrossEntropyLoss). | required |
| device | device | Compute device for training. | required |
| training | TrainingConfig | Training hyperparameters sub-config. | required |
| output_path | Path \| None | Path for best model checkpoint (default: …). | None |
| tracker | TrackerProtocol \| None | Optional experiment tracker for MLflow metric logging. | None |
Source code in orchard/trainer/trainer.py
### train()
Executes the main training loop with checkpointing and early stopping.
Performs iterative training across configured epochs, executing:
- Forward/backward passes with optional Mixup augmentation
- Validation metric computation (loss, accuracy, AUC)
- Learning rate scheduling (plateau-aware or step-based)
- Automated checkpointing on monitor_metric improvement
- Early stopping with patience-based criteria
Returns:

| Type | Description |
|---|---|
| tuple[Path, list[float], list[Mapping[str, float]]] | Tuple containing: |

- Path: Filesystem path to best model checkpoint
- list[float]: Training loss history per epoch
- list[dict]: Validation metrics history (loss, accuracy, AUC per epoch)
Notes:
- Model weights are automatically restored to best checkpoint after training
- Mixup augmentation is disabled after mixup_epochs
- Early stopping triggers if no monitor_metric improvement for `patience` epochs
Source code in orchard/trainer/trainer.py
### load_best_weights()
Load the best checkpoint from disk into the model (device-aware).
Raises:

| Type | Description |
|---|---|
| RuntimeError | If the state-dict is incompatible with the model. |
| OrchardExportError | If the checkpoint file does not exist. |
Source code in orchard/trainer/trainer.py
## create_amp_scaler(training, device='cuda')
Create AMP GradScaler if mixed precision is enabled.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| training | TrainingConfig | Training sub-config (reads …). | required |
| device | str | Target device string (…). | 'cuda' |

Returns:

| Type | Description |
|---|---|
| GradScaler \| None | GradScaler instance when AMP is enabled, None otherwise. |
Source code in orchard/trainer/_loop.py
## create_mixup_fn(training)
Create a seeded MixUp partial function if alpha > 0.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| training | TrainingConfig | Training sub-config (reads …). | required |

Returns:

| Type | Description |
|---|---|
| Callable[..., Any] \| None | Partial of `mixup_data`, or None when MixUp is disabled. |
Source code in orchard/trainer/_loop.py
## compute_auc(y_true, y_score)
Compute macro-averaged ROC-AUC with graceful fallback.
Handles binary (positive class probability) and multiclass (OvR)
cases. Returns NaN on failure so callers can distinguish
"computation impossible" from "genuinely zero AUC".
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| y_true | NDArray[Any] | Ground truth class indices, shape (N,). | required |
| y_score | NDArray[Any] | Probability distributions, shape (N, C). | required |

Returns:

| Type | Description |
|---|---|
| float | ROC-AUC score, or NaN on failure. |
Source code in orchard/trainer/engine.py
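The NaN-fallback contract can be illustrated for the binary case with the Mann-Whitney rank formulation of ROC-AUC. This is a self-contained sketch, not the library's implementation (which also handles the multiclass OvR macro-averaged case):

```python
import math


def binary_auc_or_nan(y_true, y_score):
    """Sketch of the contract: binary ROC-AUC as P(score_pos > score_neg),
    NaN when computation is impossible (e.g. only one class present),
    so callers can distinguish "impossible" from "genuinely zero"."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        return float("nan")  # computation impossible
    greater = sum(p > n for p in pos for n in neg)
    ties = sum(p == n for p in pos for n in neg)  # ties count as 1/2
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


auc = binary_auc_or_nan([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# 3 of 4 positive/negative pairs correctly ranked -> 0.75
```

A float('nan') return also propagates harmlessly through comparisons (NaN never compares greater), so a checkpointing policy keyed on AUC will simply never treat a failed computation as an improvement.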
## mixup_data(x, y, alpha=1.0, rng=None)
Applies MixUp augmentation by blending two random samples.
MixUp generates convex combinations of training pairs to improve generalization and calibration.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | Input data batch (images). | required |
| y | Tensor | Target labels batch. | required |
| alpha | float | Beta distribution parameter (0 disables MixUp). | 1.0 |
| rng | Generator \| None | NumPy random generator for reproducibility (seeded from config). | None |

Returns:

| Type | Description |
|---|---|
| tuple[Tensor, Tensor, Tensor, float] | 4-tuple of (mixed_x, y_a, y_b, lam). |
Source code in orchard/trainer/engine.py
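The blending step can be sketched on plain Python lists (the library operates on torch Tensors and a NumPy generator); the shape of the return value mirrors the documented 4-tuple:

```python
import random


def mixup_sketch(x, y, alpha=1.0, rng=None):
    """Illustrative MixUp: draw lam ~ Beta(alpha, alpha), blend each
    sample with a randomly permuted partner, return both label sets
    plus the mixing coefficient. Not the library implementation."""
    rng = rng or random.Random()
    if alpha <= 0:
        return x, y, y, 1.0  # MixUp disabled
    lam = rng.betavariate(alpha, alpha)
    perm = list(range(len(x)))
    rng.shuffle(perm)
    mixed_x = [lam * xi + (1 - lam) * x[j] for xi, j in zip(x, perm)]
    y_a, y_b = y, [y[j] for j in perm]
    return mixed_x, y_a, y_b, lam


mixed_x, y_a, y_b, lam = mixup_sketch(
    [1.0, 2.0, 3.0], [0, 1, 2], alpha=0.4, rng=random.Random(0)
)
```

Returning both label sets lets the caller compute the mixed loss as `lam * L(pred, y_a) + (1 - lam) * L(pred, y_b)` without the augmentation ever touching the loss function.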
## train_one_epoch(model, loader, criterion, optimizer, device, mixup_fn=None, scaler=None, grad_clip=0.0, epoch=0, total_epochs=1, use_tqdm=True)
Performs a single full pass over the training dataset.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | Module | Neural network architecture to train | required |
| loader | DataLoader[Any] | Training data provider | required |
| criterion | Module | Loss function | required |
| optimizer | Optimizer | Gradient descent optimizer | required |
| device | device | Hardware target (CUDA/MPS/CPU) | required |
| mixup_fn | Callable[..., Any] \| None | Function to apply MixUp data blending (optional) | None |
| scaler | GradScaler \| None | PyTorch GradScaler for mixed precision training (optional) | None |
| grad_clip | float \| None | Max norm for gradient clipping (0 disables) | 0.0 |
| epoch | int | Current epoch index for progress bar | 0 |
| total_epochs | int | Total number of epochs (for progress bar) | 1 |
| use_tqdm | bool | Show progress bar during training | True |

Returns:

| Type | Description |
|---|---|
| float | Average training loss for the epoch |
Source code in orchard/trainer/engine.py
## validate_epoch(model, val_loader, criterion, device)
Evaluates model performance on held-out validation set.
Computes validation loss, accuracy, and ROC-AUC score under no_grad context. AUC calculated using One-vs-Rest (OvR) strategy with macro-averaging for robust performance estimation on potentially imbalanced datasets.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | Module | Neural network model to evaluate | required |
| val_loader | DataLoader[Any] | Validation data provider | required |
| criterion | Module | Loss function (e.g., CrossEntropyLoss) | required |
| device | device | Hardware target (CUDA/MPS/CPU) | required |

Returns:

| Type | Description |
|---|---|
| Mapping[str, float] | Validation metrics dict with keys for loss, accuracy, and AUC. |
Source code in orchard/trainer/engine.py
## compute_class_weights(labels, num_classes, device)
Compute balanced class weights (sklearn formula: N / (n_classes * count_c)).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| labels | NDArray[Any] | Training set labels (1D array). | required |
| num_classes | int | Total number of classes. | required |
| device | device | Target device for the weight tensor. | required |

Returns:

| Type | Description |
|---|---|
| Tensor | 1D tensor of per-class weights, shape (num_classes,). |
Source code in orchard/trainer/setup.py
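The balanced formula named above, w_c = N / (n_classes * count_c), can be sketched without torch. The library returns a device-placed Tensor; this illustration returns a plain list:

```python
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Sketch of the sklearn 'balanced' formula:
    w_c = N / (n_classes * count_c), so rarer classes get larger
    weights and a uniform distribution yields all-ones."""
    counts = Counter(labels)
    n = len(labels)
    # A class absent from `labels` would divide by zero here; how the
    # real implementation guards that case is not shown in the docs.
    return [n / (num_classes * counts[c]) for c in range(num_classes)]


weights = balanced_class_weights([0, 0, 0, 1], num_classes=2)
# class 0: 4 / (2 * 3) = 0.667, class 1: 4 / (2 * 1) = 2.0
```

Passing these weights into the criterion makes each class contribute roughly equally to the expected loss, which is the usual remedy for the imbalance that the macro-averaged AUC metric is also guarding against.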
## get_criterion(training, class_weights=None)
Universal Vision Criterion Factory.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| training | TrainingConfig | Training sub-config with criterion parameters. | required |
| class_weights | Tensor \| None | Optional per-class weights for imbalanced datasets. | None |

Returns:

| Type | Description |
|---|---|
| Module | Loss module (CrossEntropyLoss or FocalLoss). |

Raises:

| Type | Description |
|---|---|
| OrchardConfigError | If … |
Source code in orchard/trainer/setup.py
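The difference between the two loss modules the factory can return is easiest to see on a single sample. This is a per-sample sketch of the standard focal loss formula; the library's FocalLoss operates on logit batches, and the parameter names here are assumptions:

```python
import math


def focal_loss_sketch(p_true, gamma=2.0, weight=1.0):
    """Illustrative focal loss for one sample, where p_true is the
    predicted probability of the correct class:
        FL = -weight * (1 - p_true)**gamma * log(p_true)
    With gamma = 0 this reduces to (weighted) cross-entropy."""
    return -weight * (1 - p_true) ** gamma * math.log(p_true)


ce = focal_loss_sketch(0.9, gamma=0.0)  # plain cross-entropy term
fl = focal_loss_sketch(0.9, gamma=2.0)  # confident sample, down-weighted
# fl < ce: easy examples contribute less, focusing training on hard ones
```

The `(1 - p_true)**gamma` factor is what distinguishes the two choices: cross-entropy treats every sample equally, while focal loss damps well-classified samples, which pairs naturally with the class-weight support above on imbalanced data.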
## get_optimizer(model, training)
Factory function to instantiate optimizer from config.
Dispatches on training.optimizer_type:
- sgd — SGD with momentum, suited for convolutional architectures.
- adamw — AdamW with decoupled weight decay, suited for transformers.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | Module | Network whose parameters will be optimised. | required |
| training | TrainingConfig | Training sub-config with optimizer hyper-parameters. | required |

Returns:

| Type | Description |
|---|---|
| Optimizer | Configured optimizer instance. |

Raises:

| Type | Description |
|---|---|
| OrchardConfigError | If … |
Source code in orchard/trainer/setup.py
## get_scheduler(optimizer, training)
Advanced Scheduler Factory.
Supports multiple LR decay strategies based on TrainingConfig:
- cosine — Smooth decay following a cosine curve.
- plateau — Reduces LR when `monitor_metric` stops improving (mode="max").
- step — Periodic reduction by a fixed factor.
- none — Maintains a constant learning rate.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| optimizer | Optimizer | Optimizer whose learning rate will be scheduled. | required |
| training | TrainingConfig | Training sub-config with scheduler hyper-parameters. | required |

Returns:

| Type | Description |
|---|---|
| CosineAnnealingLR \| ReduceLROnPlateau \| StepLR \| LambdaLR | Configured learning rate scheduler instance. |

Raises:

| Type | Description |
|---|---|
| OrchardConfigError | If … |