evaluator
orchard.evaluation.evaluator
Evaluation Engine Module.
Runs batch-level inference on a labelled test set and consolidates
predictions into global classification metrics (accuracy, macro F1,
macro AUC). Supports optional Test-Time Augmentation via the tta
sub-module, applying domain-aware transforms (anatomical, texture)
and averaging softmax outputs across the ensemble.
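The softmax-averaging step of TTA can be sketched as follows. This is a minimal NumPy sketch, not the module's actual API: `tta_average` and the transform callables are hypothetical names for illustration only.

```python
import numpy as np

def tta_average(model, batch, transforms):
    """Average softmax probabilities over a set of test-time transforms.

    `model` maps a batch array to logits; each element of `transforms`
    is a callable returning an augmented view of the batch.
    """
    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)  # stabilise exponentials
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    # One forward pass per transform, then a per-sample mean over the ensemble.
    probs = [softmax(model(t(batch))) for t in transforms]
    return np.mean(probs, axis=0)
```

Because the mean of valid probability distributions is itself a valid distribution, the averaged output can be fed directly into argmax prediction or AUC computation.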
Key Functions:

- `evaluate_model`: Full-dataset evaluation with optional TTA, returning predictions, labels, a metric dict, and macro F1.
Example:

```python
preds, labels, metrics, f1 = evaluate_model(
    model, test_loader, device, use_tta=True, aug_cfg=cfg
)
print(f"Test AUC: {metrics['auc']:.4f}")
```
```python
evaluate_model(model, test_loader, device, use_tta=False, is_anatomical=False, is_texture_based=False, aug_cfg=None, resolution=28)
```
Performs full-set evaluation and coordinates metric calculation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Module` | The trained neural network. | *required* |
| `test_loader` | `DataLoader[Any]` | DataLoader for the evaluation set. | *required* |
| `device` | `device` | Hardware target (CPU/CUDA/MPS). | *required* |
| `use_tta` | `bool` | Flag to enable Test-Time Augmentation. | `False` |
| `is_anatomical` | `bool` | Dataset-specific orientation constraint. | `False` |
| `is_texture_based` | `bool` | Dataset-specific texture preservation flag. | `False` |
| `aug_cfg` | `AugmentationConfig \| None` | Augmentation sub-configuration (required for TTA). | `None` |
| `resolution` | `int` | Dataset resolution for TTA intensity scaling. | `28` |
Returns:

| Type | Description |
|---|---|
| `tuple[NDArray[Any], NDArray[Any], dict[str, float], float]` | A 4-tuple of predictions, labels, the metric dict (accuracy, macro F1, macro AUC), and the macro F1 score. |
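To illustrate the metric consolidation step, macro F1 is the unweighted mean of per-class F1 scores. Below is a minimal NumPy sketch of that computation; `macro_f1` is a hypothetical helper for illustration, not this module's implementation.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores (illustrative sketch)."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))
```

Because each class contributes equally regardless of support, macro F1 is sensitive to performance on rare classes, which is why it accompanies plain accuracy in the returned metric dict.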