
evaluation_adapter

orchard.tasks.detection.evaluation_adapter

Detection Evaluation Pipeline Adapter.

Full evaluation pipeline for detection: inference, mAP computation, training loss curves, bounding-box visualization, and a structured report.

DetectionEvalPipelineAdapter

Orchestrates detection inference, mAP computation, and reporting.

run_evaluation(model, test_loader, train_losses, val_metrics_history, class_names, paths, training, dataset, augmentation, evaluation, arch_name, aug_info='N/A', tracker=None)

Run detection evaluation pipeline.

Computes mAP metrics on the test set, plots training loss curves, and optionally logs metrics to the experiment tracker.

Parameters:

    model (Module): Trained detection model (already on target device). Required.
    test_loader (DataLoader[Any]): DataLoader for the test set. Required.
    train_losses (list[float]): Training loss history per epoch. Required.
    val_metrics_history (list[Mapping[str, float]]): Validation metrics history per epoch. Required.
    class_names (list[str]): List of class label strings. Required.
    paths (RunPaths): RunPaths for artifact output. Required.
    training (TrainingConfig): Training sub-config. Required.
    dataset (DatasetConfig): Dataset sub-config. Required.
    augmentation (AugmentationConfig): Augmentation sub-config. Required.
    evaluation (EvaluationConfig): Evaluation sub-config. Required.
    arch_name (str): Architecture identifier. Required.
    aug_info (str): Augmentation description string. Default: 'N/A'.
    tracker (TrackerProtocol | None): Optional experiment tracker for final metrics. Default: None.

Returns:

    Mapping[str, float]: Mapping of detection metric names to float values.
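
A minimal usage sketch, assuming the adapter is constructed with no arguments and that the model, loaders, config sub-objects, and RunPaths were built earlier in the run; names such as cfg, run_paths, val_history, and the example string values are illustrative placeholders, not part of this module.

from orchard.tasks.detection.evaluation_adapter import DetectionEvalPipelineAdapter

adapter = DetectionEvalPipelineAdapter()  # assumption: no constructor arguments

test_metrics = adapter.run_evaluation(
    model=model,                      # trained detection model, already on device
    test_loader=test_loader,          # DataLoader over the test split
    train_losses=train_losses,        # per-epoch training loss history
    val_metrics_history=val_history,  # per-epoch dicts of validation metrics
    class_names=class_names,          # label strings aligned with the model's classes
    paths=run_paths,                  # RunPaths for figures, logs, and reports
    training=cfg.training,            # TrainingConfig sub-config
    dataset=cfg.dataset,              # DatasetConfig sub-config
    augmentation=cfg.augmentation,    # AugmentationConfig sub-config (unused by this adapter)
    evaluation=cfg.evaluation,        # EvaluationConfig sub-config
    arch_name="fasterrcnn_resnet50",  # architecture identifier (example value)
    aug_info="hflip+color-jitter",    # free-form augmentation description (example value)
    tracker=tracker,                  # optional experiment tracker, may be None
)

print(dict(test_metrics))  # e.g. {"map": ..., "map_50": ..., "map_75": ...}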

Source code in orchard/tasks/detection/evaluation_adapter.py
def run_evaluation(
    self,
    model: nn.Module,
    test_loader: DataLoader[Any],
    train_losses: list[float],
    val_metrics_history: list[Mapping[str, float]],
    class_names: list[str],
    paths: RunPaths,
    training: TrainingConfig,
    dataset: DatasetConfig,
    augmentation: AugmentationConfig,  # noqa: ARG002
    evaluation: EvaluationConfig,
    arch_name: str,
    aug_info: str = "N/A",  # noqa: ARG002
    tracker: TrackerProtocol | None = None,
) -> Mapping[str, float]:
    """
    Run detection evaluation pipeline.

    Computes mAP metrics on the test set, plots training loss curves,
    and optionally logs metrics to the experiment tracker.

    Args:
        model: Trained detection model (already on target device).
        test_loader: DataLoader for test set.
        train_losses: Training loss history per epoch.
        val_metrics_history: Validation metrics history per epoch.
        class_names: List of class label strings.
        paths: RunPaths for artifact output.
        training: Training sub-config.
        dataset: Dataset sub-config.
        augmentation: Augmentation sub-config.
        evaluation: Evaluation sub-config.
        arch_name: Architecture identifier.
        aug_info: Augmentation description string.
        tracker: Optional experiment tracker for final metrics.

    Returns:
        Mapping of detection metric names to float values.
    """
    device = next(model.parameters()).device

    # Inference + mAP computation
    model.eval()
    metric = MeanAveragePrecision(iou_type="bbox")

    with torch.no_grad():
        for images, targets in test_loader:
            images_on_device = [img.to(device) for img in images]
            predictions = model(images_on_device)
            metric.update(
                [to_cpu(p) for p in predictions],
                [to_cpu(t) for t in targets],
            )

    result = metric.compute()
    test_metrics = {
        METRIC_MAP: float(result["map"]),
        METRIC_MAP_50: float(result["map_50"]),
        METRIC_MAP_75: float(result["map_75"]),
    }

    # Log results
    logger.info(
        "%s%s %-18s: mAP=%.4f  mAP@50=%.4f  mAP@75=%.4f",
        LogStyle.INDENT,
        LogStyle.ARROW,
        "Test Metrics",
        test_metrics[METRIC_MAP],
        test_metrics[METRIC_MAP_50],
        test_metrics[METRIC_MAP_75],
    )

    # Bbox visualization grid
    if evaluation.save_predictions_grid:
        show_detections(
            model=model,
            loader=test_loader,
            device=device,
            classes=class_names,
            save_path=paths.figures / f"detection_samples_{arch_name}_{dataset.resolution}.png",
            ctx=PlotContext(
                arch_name=arch_name,
                resolution=dataset.resolution,
                fig_dpi=evaluation.fig_dpi,
                plot_style=evaluation.plot_style,
                cmap_confusion=evaluation.cmap_confusion,
                grid_cols=evaluation.grid_cols,
                n_samples=evaluation.n_samples,
                fig_size_predictions=evaluation.fig_size_predictions,
                mean=dataset.mean,
                std=dataset.std,
            ),
        )

    # Training curves — plot mAP instead of loss (METRIC_LOSS is a 0.0 sentinel)
    val_map = [m.get(METRIC_MAP, 0.0) for m in val_metrics_history]
    ctx = PlotContext(
        arch_name=arch_name,
        resolution=dataset.resolution,
        fig_dpi=evaluation.fig_dpi,
        plot_style=evaluation.plot_style,
        cmap_confusion=evaluation.cmap_confusion,
        grid_cols=evaluation.grid_cols,
        n_samples=evaluation.n_samples,
        fig_size_predictions=evaluation.fig_size_predictions,
    )
    plot_training_curves(
        train_losses=train_losses,
        val_metric_values=val_map,
        out_path=paths.figures / "training_curves.png",
        ctx=ctx,
        val_label="Validation mAP",
    )

    # Structured report (Excel/CSV/JSON) — args tested in test_reporting.py
    report = create_structured_report(
        val_metrics=val_metrics_history,
        test_metrics=test_metrics,
        train_losses=train_losses,
        best_path=paths.best_model_path,
        log_path=paths.logs / "session.log",
        arch_name=arch_name,
        dataset=dataset,
        training=training,
        task_type="detection",
    )
    report.save(paths.final_report_path, fmt=evaluation.report_format)

    # Tracker logging
    if tracker is not None:
        full_metrics = {METRIC_LOSS: 0.0, **test_metrics}  # sentinel for tracker schema
        tracker.log_test_metrics(full_metrics)

    return MappingProxyType(test_metrics)
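
For reference, torchmetrics' MeanAveragePrecision expects update() to receive predictions and targets as lists of per-image dicts of tensors, which is the contract the inference loop above relies on. The sketch below shows the expected keys and shapes for a single image with illustrative values; the to_cpu helper used in the loop is assumed to simply move each tensor in such a dict to the CPU.

import torch
from torchmetrics.detection import MeanAveragePrecision

metric = MeanAveragePrecision(iou_type="bbox")

# One predicted detection and one ground-truth box for a single image (xyxy format).
preds = [{
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0]]),  # shape (N, 4)
    "scores": torch.tensor([0.87]),                        # shape (N,)
    "labels": torch.tensor([1]),                           # shape (N,)
}]
targets = [{
    "boxes": torch.tensor([[12.0, 18.0, 105.0, 215.0]]),   # shape (M, 4)
    "labels": torch.tensor([1]),                            # shape (M,)
}]

metric.update(preds, targets)
result = metric.compute()  # dict containing "map", "map_50", "map_75", among others
print(float(result["map"]), float(result["map_50"]), float(result["map_75"]))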