pennfudan_fetcher

orchard.data_handler.fetchers.pennfudan_fetcher

PennFudan Pedestrian Detection Dataset Fetcher.

Downloads the PennFudan pedestrian dataset ZIP, extracts instance masks, converts masks to bounding boxes, resizes images, and produces two NPZ files (images + annotations) compatible with DetectionDataset.from_npz().
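
The mask-to-box step mentioned above can be illustrated with a minimal sketch. It assumes the usual PennFudan mask encoding (0 is background, each positive integer marks one pedestrian) and an inclusive `[x_min, y_min, x_max, y_max]` pixel convention. The helper name `mask_to_boxes` is hypothetical and is not taken from the fetcher. The actual implementation may differ, for example in how boxes are rescaled when images are resized.

```python
import numpy as np


def mask_to_boxes(mask: np.ndarray) -> np.ndarray:
    """Return one [x_min, y_min, x_max, y_max] box per instance ID in `mask`.

    Assumes 0 is background and each positive integer marks one instance.
    """
    instance_ids = np.unique(mask)
    instance_ids = instance_ids[instance_ids != 0]  # drop background

    boxes = []
    for instance_id in instance_ids:
        # Row (y) and column (x) indices of every pixel in this instance.
        ys, xs = np.nonzero(mask == instance_id)
        boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])

    # Keep a (0, 4) shape when the mask contains no instances.
    return np.asarray(boxes, dtype=np.float32).reshape(-1, 4)


# Toy 5x6 mask with two instances.
mask = np.zeros((5, 6), dtype=np.uint8)
mask[1:3, 1:4] = 1  # rows 1-2, columns 1-3
mask[3:5, 4:6] = 2  # rows 3-4, columns 4-5
boxes = mask_to_boxes(mask)
```

Each row of `boxes` corresponds to one instance ID in ascending order, so a parallel label array (a single "pedestrian" class for this dataset) can be built with the same length.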

ensure_pennfudan_npz(metadata)

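Ensure the PennFudan dataset is downloaded and converted to NPZ format. If both NPZ files (metadata.path and metadata.annotation_path) already exist, the download is skipped and the existing images path is returned. Raises OrchardDatasetError if metadata.annotation_path is None.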

Parameters:

Name | Type | Description | Default
metadata | DatasetMetadata | DatasetMetadata with URL, path, and annotation_path. | required

Returns:

Type | Description
Path | Path to the images NPZ file.

Source code in orchard/data_handler/fetchers/pennfudan_fetcher.py
def ensure_pennfudan_npz(metadata: DatasetMetadata) -> Path:
    """
    Ensure PennFudan dataset is downloaded and converted to NPZ format.

    Args:
        metadata: DatasetMetadata with URL, path, and annotation_path.

    Returns:
        Path to the images NPZ file.

    Raises:
        OrchardDatasetError: If metadata.annotation_path is None.
    """
    image_path = metadata.path
    annotation_path = metadata.annotation_path

    if annotation_path is None:
        raise OrchardDatasetError(
            "PennFudan metadata must have annotation_path set"  # pragma: no mutate
        )

    # Return cached if both NPZ files exist
    if image_path.exists() and annotation_path.exists():
        logger.info(
            "%s%s %-18s: PennFudan found at %s",
            LogStyle.INDENT,
            LogStyle.ARROW,
            "Dataset",
            image_path.name,
        )
        return image_path

    # Download and convert
    zf = _download_zip(metadata.url)
    images, boxes_list, labels_list = _parse_pennfudan_zip(zf)

    logger.info(
        "%s%s %-18s: %d images, %d total instances",
        LogStyle.INDENT,
        LogStyle.ARROW,
        "Parsed",
        len(images),
        sum(len(b) for b in boxes_list),
    )

    _save_detection_npz(images, boxes_list, labels_list, image_path, annotation_path)

    logger.info(
        "%s%s %-18s: %s + %s",
        LogStyle.INDENT,
        LogStyle.SUCCESS,
        "NPZ Created",
        image_path.name,
        annotation_path.name,
    )

    return image_path
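
The control flow of the function above (validate, return early on a cache hit, otherwise build both files) can be exercised in isolation. The following is a minimal sketch using only the standard library. `FakeMetadata`, `ensure_files`, and `fake_build` are hypothetical stand-ins, not part of orchard, and `ValueError` replaces `OrchardDatasetError` so the example runs without the package installed.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional


@dataclass
class FakeMetadata:
    """Hypothetical stand-in exposing only the fields the fetcher reads."""

    path: Path
    annotation_path: Optional[Path]


def ensure_files(metadata: FakeMetadata, build: Callable[[Path, Path], None]) -> Path:
    """Return metadata.path, calling `build` only if either file is missing."""
    if metadata.annotation_path is None:
        raise ValueError("annotation_path must be set")
    if metadata.path.exists() and metadata.annotation_path.exists():
        return metadata.path  # cache hit: nothing to rebuild
    build(metadata.path, metadata.annotation_path)
    return metadata.path


build_calls: List[str] = []


def fake_build(image_path: Path, annotation_path: Path) -> None:
    # Stand-in for download + parse + save: writes placeholder files only.
    build_calls.append(image_path.name)
    image_path.write_bytes(b"images")
    annotation_path.write_bytes(b"annotations")


with tempfile.TemporaryDirectory() as tmp:
    meta = FakeMetadata(Path(tmp) / "images.npz", Path(tmp) / "annotations.npz")
    first = ensure_files(meta, fake_build)   # both files missing: builds
    second = ensure_files(meta, fake_build)  # both files present: cache hit
    meta.annotation_path.unlink()            # remove only the annotations file
    third = ensure_files(meta, fake_build)   # one file missing: full rebuild
```

Because the cache check requires both files, deleting only one of them triggers a full rebuild rather than a partial one, which keeps the images and annotations files consistent with each other.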