reproducibility

`orchard.core.environment.reproducibility`

Reproducibility Environment.

Ensures deterministic behavior across Python, NumPy, and PyTorch by centralizing RNG seeding, DataLoader worker initialization, and strict algorithmic determinism enforcement.

Three reproducibility levels are supported:

Standard (strict=False): Seeds all PRNGs and, when CUDA is available, enables cuDNN deterministic mode and disables the cuDNN auto-tuner. Sufficient for most experiments: results are reproducible across runs on the same hardware up to minor floating-point variations, which can still arise from non-deterministic kernels (e.g. CUDA kernels that accumulate with atomicAdd).
Strict (strict=True): Enables torch.use_deterministic_algorithms(True) on all backends (CUDA, MPS, CPU) and configures CUBLAS_WORKSPACE_CONFIG when CUDA is available. Forces num_workers=0 via HardwareConfig to eliminate multiprocessing non-determinism. Incurs a 5-30% performance penalty on GPU workloads.
Strict warn-only (strict=True, warn_only=True): Same as strict, but non-deterministic operations emit warnings instead of raising errors. Useful for discovering which operations lack deterministic kernels without crashing the experiment.

Strict mode is controlled by HardwareConfig.use_deterministic_algorithms, resolved from the recipe YAML or direct Config construction.
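The three levels can be summarized as a small decision table. The sketch below is illustrative only: `ReproSettings` and `resolve_level` are hypothetical names, not part of Orchard, and the real resolution happens in HardwareConfig, whose fields are not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReproSettings:
    """Hypothetical stand-in for the resolved settings (not Orchard's HardwareConfig)."""

    level: str
    deterministic_algorithms: bool
    warn_only: bool
    force_zero_workers: bool


def resolve_level(strict: bool = False, warn_only: bool = False) -> ReproSettings:
    """Map the (strict, warn_only) flags onto the three documented levels."""
    if not strict:
        # warn_only is ignored when strict is False
        return ReproSettings("standard", False, False, False)
    if warn_only:
        # Same as strict, but non-deterministic ops warn instead of raising
        return ReproSettings("strict-warn-only", True, True, True)
    return ReproSettings("strict", True, False, True)


for flags in [(False, False), (True, False), (True, True), (False, True)]:
    print(flags, resolve_level(*flags))
```

Note that `(False, True)` resolves to the standard level, mirroring the rule that `warn_only` has no effect unless `strict` is set.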

`set_seed(seed, strict=False, warn_only=False)`

Seed all PRNGs and optionally enforce deterministic algorithms.

Seeds Python's random, NumPy, and PyTorch (CPU + CUDA + MPS). In strict mode, additionally forces deterministic kernels at the cost of reduced performance.

Note

PYTHONHASHSEED is set here for completeness, but CPython reads it only at interpreter startup — the runtime assignment has no effect on the running process. The project Dockerfile handles this correctly (ENV PYTHONHASHSEED=0). For bare-metal runs, prefix the command: PYTHONHASHSEED=42 orchard run <recipe>. Full bit-exact determinism additionally requires strict=True and num_workers=0 (both enforced automatically in Docker via DOCKER_REPRODUCIBILITY_MODE).
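The startup-only behavior of PYTHONHASHSEED can be checked with the standard library alone. The following is a minimal sketch, not part of Orchard: the helper `hash_in_child` is a hypothetical name, and it launches fresh interpreters so that the variable is present before startup, which is what the command prefix above achieves.

```python
import os
import subprocess
import sys


def hash_in_child(hash_seed: str, text: str = "orchard") -> int:
    """Return hash(text) as computed by a fresh interpreter started with PYTHONHASHSEED."""
    # Copy the current environment and set the variable before the child starts
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


first = hash_in_child("42")
second = hash_in_child("42")

# Both children were started with the same hash seed, so the string hash agrees
print(first == second)
```

Without the variable set at startup, CPython randomizes string hashes per process, so two separate runs would generally disagree.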

Parameters:

Name	Type	Description	Default
`seed`	`int`	The seed value to set across all PRNGs.	required
`strict`	`bool`	If True, enforces deterministic algorithms (5-30% perf penalty).	`False`
`warn_only`	`bool`	If True (and strict=True), uses warn-only mode for `torch.use_deterministic_algorithms` — logs warnings instead of raising errors for non-deterministic ops. Ignored when strict is False.	`False`
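What seeding buys can be illustrated with Python's `random` module alone. This is a minimal sketch under the assumption that the standard-library generator is a fair stand-in; `sample_after_seed` is a hypothetical helper, and `set_seed` applies the same idea to NumPy and PyTorch as well.

```python
import random


def sample_after_seed(seed: int, count: int = 3) -> list[float]:
    """Reseed the global generator, then draw `count` floats from it."""
    random.seed(seed)
    return [random.random() for _ in range(count)]


run_a = sample_after_seed(42)
run_b = sample_after_seed(42)
run_c = sample_after_seed(43)

# Same seed: the sequence repeats exactly
print(run_a == run_b)
# Different seed: a different sequence
print(run_a == run_c)
```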

Source code in orchard/core/environment/reproducibility.py

def set_seed(seed: int, strict: bool = False, warn_only: bool = False) -> None:  # pragma: no mutate
    """
    Seed all PRNGs and optionally enforce deterministic algorithms.

    Seeds Python's ``random``, NumPy, and PyTorch (CPU + CUDA + MPS).
    In strict mode, additionally forces deterministic kernels at the
    cost of reduced performance.

    Note:
        ``PYTHONHASHSEED`` is set here for completeness, but CPython reads it
        only at interpreter startup — the runtime assignment has no effect on
        the running process. The project Dockerfile handles this correctly
        (``ENV PYTHONHASHSEED=0``). For bare-metal runs, prefix the command:
        ``PYTHONHASHSEED=42 orchard run <recipe>``. Full bit-exact determinism
        additionally requires ``strict=True`` and ``num_workers=0`` (both
        enforced automatically in Docker via ``DOCKER_REPRODUCIBILITY_MODE``).

    Args:
        seed: The seed value to set across all PRNGs.
        strict: If True, enforces deterministic algorithms (5-30% perf penalty).
        warn_only: If True (and strict=True), uses warn-only mode for
            ``torch.use_deterministic_algorithms`` — logs warnings instead of
            raising errors for non-deterministic ops. Ignored when strict
            is False.
    """
    random.seed(seed)

    # Best-effort: effective only if set before interpreter startup (see Note)
    already_set = os.environ.get("PYTHONHASHSEED") == str(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    if strict and not already_set:
        _stacklevel = 2  # pragma: no mutate
        warnings.warn(
            f"PYTHONHASHSEED={seed} set at runtime, but CPython reads it only at "
            "interpreter startup. For bare-metal determinism: "
            f"PYTHONHASHSEED={seed} orchard run <recipe>",
            stacklevel=_stacklevel,
        )

    np.random.seed(seed)
    torch.manual_seed(seed)

    has_cuda = torch.cuda.is_available()
    has_mps = hasattr(torch.backends, "mps") and torch.backends.mps.is_available()

    if has_cuda:
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

        if strict:
            os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    if has_mps:
        torch.mps.manual_seed(seed)

    if strict:
        if has_mps:
            _stacklevel = 2  # pragma: no mutate
            warnings.warn(
                "MPS backend has partial determinism support in PyTorch. "
                "Some operations may not have deterministic implementations. "
                "Consider using CPU for fully deterministic experiments.",
                stacklevel=_stacklevel,
            )
        torch.use_deterministic_algorithms(True, warn_only=warn_only)

`worker_init_fn(worker_id)`

Initialize PRNGs for a DataLoader worker subprocess.

Each worker receives a unique but deterministic sub-seed derived from the parent seed, ensuring augmentation diversity while maintaining reproducibility across runs.

Called by DataLoader once in each worker subprocess when passed as its worker_init_fn argument and num_workers > 0. In strict reproducibility mode, num_workers is forced to 0 by HardwareConfig, so this function is never invoked.

Parameters:

Name	Type	Description	Default
`worker_id`	`int`	Subprocess ID provided by DataLoader (0-based).	required
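The sub-seed arithmetic can be tried in isolation. The sketch below assumes an arbitrary `base_seed` chosen near the 32-bit limit to show the wrap-around; `derive_worker_seed` is a hypothetical helper that repeats the formula from the source listing, whereas in a real worker the base value comes from `torch.utils.data.get_worker_info().seed`.

```python
def derive_worker_seed(base_seed: int, worker_id: int) -> int:
    """Combine the base seed and worker index, wrapped into the 32-bit range."""
    # The modulo keeps the result within 0..2**32 - 1, the range np.random.seed accepts
    return (base_seed + worker_id) % 2**32


base_seed = 2**32 - 2  # arbitrary example value close to the 32-bit limit
seeds = [derive_worker_seed(base_seed, worker_id) for worker_id in range(4)]

print(seeds)  # [4294967294, 4294967295, 0, 1]
```

Each worker gets a distinct value, and the same `base_seed` always yields the same list, which is what makes the augmentation streams both diverse and repeatable.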

Source code in orchard/core/environment/reproducibility.py

def worker_init_fn(worker_id: int) -> None:
    """
    Initialize PRNGs for a DataLoader worker subprocess.

    Each worker receives a unique but deterministic sub-seed derived from
    the parent seed, ensuring augmentation diversity while maintaining
    reproducibility across runs.

    Called by DataLoader once in each worker subprocess when passed as its
    ``worker_init_fn`` argument and ``num_workers > 0``.
    In strict reproducibility mode, ``num_workers`` is forced to 0 by
    HardwareConfig, so this function is never invoked.

    Args:
        worker_id: Subprocess ID provided by DataLoader (0-based).
    """
    worker_info = torch.utils.data.get_worker_info()
    if worker_info is None:
        return

    # Derive unique sub-seed: deterministic per (parent_seed, worker_id)
    base_seed = worker_info.seed
    seed = (base_seed + worker_id) % 2**32

    # Synchronize all major PRNGs for this worker
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
