data_io

orchard.core.io.data_io

Data Integrity & Dataset I/O Utilities.

Provides tools for verifying file integrity via checksums and validating the structure of NPZ dataset archives.

validate_npz_keys(data)

Validates that the loaded NPZ dataset contains all required dataset keys.

Parameters:

Name  Type     Description                  Default
data  NpzFile  The loaded NPZ file object.  required

Raises:

Type                 Description
OrchardDatasetError  If any required key (images/labels) is missing.

Source code in orchard/core/io/data_io.py
def validate_npz_keys(data: np.lib.npyio.NpzFile) -> None:
    """
    Validates that the loaded NPZ dataset contains all required dataset keys.

    Args:
        data (np.lib.npyio.NpzFile): The loaded NPZ file object.

    Raises:
        OrchardDatasetError: If any required key (images/labels) is missing.
    """
    # Set difference: required keys that are absent from the archive.
    missing = _REQUIRED_NPZ_KEYS - set(data.files)
    if missing:
        found = list(data.files)
        raise OrchardDatasetError(
            f"NPZ archive is corrupted or invalid. Missing keys: {missing} | Found keys: {found}"
        )
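
The check above can be tried without the orchard package. The following is a minimal sketch, not the library's own code: `REQUIRED_KEYS`, `DatasetError`, `check_npz_keys`, `make_archive`, and `is_valid` are hypothetical stand-ins, and the required key set is assumed to be `images` and `labels` based on the note in the Raises section. It writes two in-memory NPZ archives with NumPy and validates each one.

```python
import io

import numpy as np

# Assumption: the required keys are "images" and "labels", inferred from the
# "images/labels" note above. The real _REQUIRED_NPZ_KEYS may differ.
REQUIRED_KEYS = frozenset({"images", "labels"})


class DatasetError(Exception):
    """Hypothetical stand-in for OrchardDatasetError."""


def check_npz_keys(data) -> None:
    """Same set-difference check as validate_npz_keys, using the stand-ins."""
    missing = REQUIRED_KEYS - set(data.files)
    if missing:
        raise DatasetError(
            f"Missing keys: {sorted(missing)} | Found keys: {list(data.files)}"
        )


def make_archive(**arrays) -> io.BytesIO:
    """Write the arrays to an in-memory NPZ archive and rewind it for reading."""
    buffer = io.BytesIO()
    np.savez(buffer, **arrays)
    buffer.seek(0)
    return buffer


def is_valid(buffer: io.BytesIO) -> bool:
    """Load the archive and report whether every required key is present."""
    with np.load(buffer) as data:
        try:
            check_npz_keys(data)
        except DatasetError:
            return False
    return True


images = np.zeros((4, 28, 28), dtype=np.uint8)
labels = np.zeros((4,), dtype=np.int64)

complete = make_archive(images=images, labels=labels)
incomplete = make_archive(images=images)  # "labels" deliberately left out

print(is_valid(complete))
print(is_valid(incomplete))
```

Validating immediately after `np.load` keeps the failure close to its cause: a missing key surfaces as one descriptive error instead of a `KeyError` later in the pipeline.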

md5_checksum(path, chunk_size=_MD5_CHUNK_SIZE)

Calculates the MD5 checksum of a file using buffered reading.

Parameters:

Name        Type  Description                  Default
path        Path  Path to the file to verify.  required
chunk_size  int   Read buffer size in bytes.   _MD5_CHUNK_SIZE

Returns:

Name  Type  Description
str   str   The calculated hexadecimal MD5 hash.

Source code in orchard/core/io/data_io.py
def md5_checksum(path: Path, chunk_size: int = _MD5_CHUNK_SIZE) -> str:
    """
    Calculates the MD5 checksum of a file using buffered reading.

    Args:
        path (Path): Path to the file to verify.
        chunk_size (int): Read buffer size in bytes.

    Returns:
        str: The calculated hexadecimal MD5 hash.
    """
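    # MD5 serves as an integrity check here, so it is flagged as non-security use.
    hash_md5 = hashlib.md5(usedforsecurity=False)  # pragma: no mutate
    with path.open("rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):  # pragma: no mutate
            hash_md5.update(chunk)
    return hash_md5.hexdigest()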
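
A typical use of a buffered checksum is comparing a file against a published digest. The sketch below is self-contained and makes several assumptions: `md5_of` re-creates the function above locally, `CHUNK_SIZE = 8192` is a placeholder because the real `_MD5_CHUNK_SIZE` value is not shown, and `matches_checksum` is a hypothetical helper that is not part of the documented module. It also shows that the digest does not depend on the chunk size. The `usedforsecurity` argument requires Python 3.9 or later.

```python
import hashlib
import tempfile
from pathlib import Path

# Assumption: placeholder buffer size; the real _MD5_CHUNK_SIZE is not shown.
CHUNK_SIZE = 8192


def md5_of(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Local re-creation of md5_checksum: hash the file chunk by chunk."""
    digest = hashlib.md5(usedforsecurity=False)
    with path.open("rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_md5: str) -> bool:
    """Hypothetical helper: compare the file's digest with an expected one."""
    return md5_of(path) == expected_md5.strip().lower()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    payload = b"hello world" * 10_000
    sample.write_bytes(payload)

    # Reference digest computed in one shot over the in-memory payload.
    expected = hashlib.md5(payload, usedforsecurity=False).hexdigest()

    small_chunks = md5_of(sample, chunk_size=7)  # many tiny reads
    default_chunks = md5_of(sample)              # default buffer size

    ok = matches_checksum(sample, expected)
    tampered = matches_checksum(sample, "0" * 32)

print(small_chunks == default_chunks == expected)
print(ok, tampered)
```

Reading in chunks keeps memory use bounded by `chunk_size` regardless of file size, which matters for large dataset archives.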