Mutation Testing

Orchard ML uses mutmut v3 for mutation testing. Mutmut injects small code changes (mutants) and checks that the test suite fails for each one. A mutant that survives (no test fails) indicates a gap in test assertions.

Configuration

Mutation testing is configured in pyproject.toml:

[tool.mutmut]
paths_to_mutate = ["orchard/"]
tests_dir = ["tests/"]

Log and cosmetic mutations are suppressed automatically by the patched entry point scripts/mutmut_entry.py, so logging calls need no per-line # pragma: no mutate annotations. See Patched Entry Point below for details.

Running Mutation Tests

Full repository (slow — hours on first run):

# Generate mutants and run tests against each one
mutmut run

# View results summary
mutmut results

# Inspect a specific survived mutant
mutmut show <mutant_name>

Single module (recommended for iterative work):

mutmut v3 uses dotted-module glob patterns as positional arguments:

# Mutate only the search_spaces module
mutmut run "orchard.optimization.search_spaces*"

# Mutate only the loader module
mutmut run "orchard.data_handler.loader*"

# Mutate only the evaluation pipeline
mutmut run "orchard.evaluation.evaluation_pipeline*"

Multiple modules in one run:

mutmut run "orchard.optimization*" "orchard.trainer*"

Single class or function:

mutmut run "orchard.optimization.search_spaces.*SearchSpaceRegistry*"
mutmut run "*get_optimization_space*"
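To check which mutants a pattern would select before starting a long run, the pattern can be tried against candidate names with Python's fnmatch module. This is a minimal sketch, assuming the positional arguments are compared against full mutant names with shell-style wildcards. The mutant names and the select helper below are hypothetical; real names come from mutmut results and may be formatted differently.

```python
from fnmatch import fnmatchcase

# Hypothetical mutant names, for illustration only.
MUTANT_NAMES = [
    "orchard.optimization.search_spaces.get_optimization_space__1",
    "orchard.optimization.search_spaces.SearchSpaceRegistry.resolve__1",
    "orchard.data_handler.loader.load_dataset__1",
]


def select(patterns: list[str], names: list[str]) -> list[str]:
    """Return the names matched by at least one shell-style pattern."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]
```

Under that assumption, a trailing * is what lets a module pattern reach every mutant below it, and a leading * is what lets a function or class name match regardless of its module prefix.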

Mutation Registry

The mutation registry (mutmut-registry.yaml) tracks per-file mutation scores and auto-updates when you test a module. Use scripts/mutmut_run.py:

# Run mutmut on a single file and update the registry
python scripts/mutmut_run.py orchard/cli_app.py

# Run mutmut on an entire sub-package
python scripts/mutmut_run.py orchard/core/config/

# Multiple targets at once
python scripts/mutmut_run.py orchard/cli_app.py orchard/exceptions.py

# Show the registry report (no mutmut run, just read existing results)
python scripts/mutmut_run.py --report

# Show report for specific modules
python scripts/mutmut_run.py --report orchard/core/config/

# Batch: run each .py file one by one (cleans cache, updates registry after each)
python scripts/mutmut_run.py --batch orchard/trainer/

# Batch the whole project
python scripts/mutmut_run.py --batch orchard/

Output example:

Module                                                  Total  Kill  Surv   N/C   Score
---------------------------------------------------------------------------------------
orchard/cli_app.py                                         45    42     3     0   93.3%
orchard/exceptions.py                                       8     8     0     0  100.0%
---------------------------------------------------------------------------------------
TOTAL                                                      53    50     3     0   94.3%

The registry YAML is tracked in git so you can see score evolution across commits.
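The Score column in the example is consistent with killed mutants divided by total mutants, and the TOTAL row with the same ratio over the summed counts rather than an average of the per-module percentages. The sketch below reproduces those figures. It is not the code in scripts/mutmut_run.py; the function names are hypothetical, and how N/C mutants enter the score is not shown by the example (both rows have N/C = 0), so it is left out here.

```python
def mutation_score(killed: int, total: int) -> float:
    """Percentage of mutants killed, rounded to one decimal place."""
    if total == 0:
        return 0.0  # assumption: a module with no mutants reports 0.0
    return round(100.0 * killed / total, 1)


def total_score(rows: dict[str, tuple[int, int]]) -> float:
    """Score over summed (total, killed) counts, as in the TOTAL row."""
    total = sum(t for t, _ in rows.values())
    killed = sum(k for _, k in rows.values())
    return mutation_score(killed, total)


# (total, killed) pairs copied from the output example above.
ROWS = {"orchard/cli_app.py": (45, 42), "orchard/exceptions.py": (8, 8)}
```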

Registry guards (scripts/check_mutmut_registry.py):

# Fail if any module score dropped vs HEAD (pre-commit gate)
python scripts/check_mutmut_registry.py --ratchet

# Fail if any modified module has a stale registry entry (release gate)
python scripts/check_mutmut_registry.py --freshness

# Both
python scripts/check_mutmut_registry.py --ratchet --freshness
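The ratchet idea (a score may rise but never fall) can be sketched as a comparison of two score mappings. This is an illustration only: the real registry schema and the logic in scripts/check_mutmut_registry.py are not shown in this document, so the plain module-to-score dictionaries and the find_regressions name are assumptions. Modules present on only one side are ignored in this sketch.

```python
def find_regressions(
    baseline: dict[str, float], current: dict[str, float]
) -> list[str]:
    """Return modules whose score in `current` is lower than in `baseline`."""
    return sorted(
        module
        for module, old_score in baseline.items()
        if module in current and current[module] < old_score
    )
```

A gate built on this would exit non-zero whenever the returned list is non-empty.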

Cleaning Cache

mutmut v3 caches trampoline files and metadata in the mutants/ directory. It skips re-generation when the trampoline is newer than the source file. To force a fresh run, either delete both the trampoline and its metadata or make the source file newer than the trampoline:

# Clean cache for a specific module
rm mutants/orchard/optimization/search_spaces.py \
   mutants/orchard/optimization/search_spaces.py.meta

# Alternative: touch the source file to invalidate the cache
touch orchard/optimization/search_spaces.py

# Clean all cached results
rm -rf mutants/
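The freshness rule described above explains why both rm and touch invalidate the cache. The following sketch expresses that rule with file modification times. It is not mutmut's own code, and the function name is hypothetical.

```python
import os


def trampoline_is_fresh(source_path: str, trampoline_path: str) -> bool:
    """True when a cached trampoline exists and is newer than its source."""
    if not os.path.exists(trampoline_path):
        return False  # nothing cached, so mutants must be generated
    return os.path.getmtime(trampoline_path) > os.path.getmtime(source_path)
```

Deleting the trampoline takes the first branch; touching the source makes its modification time the newer of the two, so the comparison fails.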

Filtering Results

# Show all results (killed + survived)
mutmut results --all true

# Show only survived mutants (the ones to fix)
mutmut results

# Inspect a specific survived mutant
mutmut show <mutant_name>

Writing Mutation-Resilient Tests

Tests that only check key presence (assert "key" in space) will let many mutants survive. To kill mutants effectively:

Assert exact values passed to functions (bounds, lists, constants)
Assert exact return values, not just types
Test boundary conditions (e.g., resolution 223 vs 224)
Test both branches of conditionals (enabled/disabled, present/absent)
Verify side effects (function called vs not called)
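The difference between a key-presence check and an exact-value check can be seen with a hand-written mutant. Everything in this sketch is hypothetical: get_lr_bounds is not an Orchard ML function, and the mutant is written out by hand to stand in for what a mutation tool might generate.

```python
def get_lr_bounds() -> dict:
    """Hypothetical search-space helper used only for this illustration."""
    return {"learning_rate": (1e-5, 1e-1)}


def get_lr_bounds_mutant() -> dict:
    """The same helper after a mutation changed the upper bound."""
    return {"learning_rate": (1e-5, 1.1)}


def weak_check(space: dict) -> bool:
    # Key presence only: passes for the original and for the mutant.
    return "learning_rate" in space


def strong_check(space: dict) -> bool:
    # Exact value: passes for the original, fails for the mutant.
    return space["learning_rate"] == (1e-5, 1e-1)
```

The weak check lets the mutant survive because the key is still present; the strong check kills it because the bound no longer matches.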

Patched Entry Point

scripts/mutmut_entry.py monkey-patches mutmut's MutationVisitor to suppress cosmetic mutations without per-line annotations. It is invoked automatically by scripts/mutmut_run.py.

Two suppression levels:

Level	Methods	Effect
Full skip	`debug`, `info`, `add_format`	Entire `Call` node excluded — call, arguments, and strings
String-only skip	`warning`, `error`, `warn`, `getLogger`	Only string literals inside the call are excluded; the call itself and non-string args remain mutable

This eliminates the need for # pragma: no mutate on logging lines.
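The two suppression levels amount to a classification of call expressions by method name. The sketch below shows that classification using the standard library ast module. It does not reproduce scripts/mutmut_entry.py or hook into MutationVisitor; the suppression_level function and its return labels are hypothetical, and only the method names are taken from the table above.

```python
import ast

# Method names taken from the table above.
FULL_SKIP = {"debug", "info", "add_format"}
STRING_ONLY_SKIP = {"warning", "error", "warn", "getLogger"}


def suppression_level(call_source: str) -> str:
    """Classify a single expression as 'full', 'strings', or 'none'."""
    node = ast.parse(call_source, mode="eval").body
    if not isinstance(node, ast.Call):
        return "none"
    func = node.func
    # logger.info(...) is an Attribute; a bare info(...) is a Name.
    name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
    if name in FULL_SKIP:
        return "full"  # nothing inside the call would be mutated
    if name in STRING_ONLY_SKIP:
        return "strings"  # only string literals would be left alone
    return "none"
```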

Pragma Conventions

Annotation	Scope	Usage
`# pragma: no mutate`	Single line	Plot formatting constants, cosmetic-only literals
`# pragma: no cover`	Single line	Unreachable defensive code

Logging calls (info, debug, warning, error, warn) are handled automatically by the patched entry point, so do not annotate them manually.

Never apply # pragma: no mutate to:

Conditionals, computed values, or any real logic
Entire files (do_not_mutate is forbidden)

Known Issue: `set_start_method` Crash

mutmut 3.5 calls multiprocessing.set_start_method('fork') at module level in mutmut/__main__.py (line 1152). When the trampoline re-imports this module during stats collection, the call fails with:

RuntimeError: context has already been set

Fix: patch your local mutmut installation:

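# In your mutmut venv (adjust the path to your venv location and Python version)
# GNU sed syntax; BSD/macOS sed needs an empty backup suffix: sed -i '' "s/.../.../" <file>
sed -i "s/set_start_method('fork')/set_start_method('fork', force=True)/" \
    venv/lib/python3.12/site-packages/mutmut/__main__.py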

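This is safe: force=True only allows the start method to be set again after a context already exists. The bug is masked when stats are cached; it surfaces whenever mutmut needs to re-collect stats (new tests added, cache cleaned). Because the patch edits the installed package, reapply it after reinstalling or upgrading mutmut.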
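The failure and the effect of force=True can be reproduced in isolation with the standard library. This sketch is independent of mutmut; the function name is hypothetical, and it uses the platform's default start method instead of 'fork' so that it also runs where 'fork' is unavailable.

```python
import multiprocessing


def set_start_method_twice(force: bool) -> str:
    """Set the start method twice; return 'ok' or the RuntimeError message."""
    # The first entry of get_all_start_methods() is the platform default.
    method = multiprocessing.get_all_start_methods()[0]
    multiprocessing.set_start_method(method, force=True)  # first call fixes the context
    try:
        multiprocessing.set_start_method(method, force=force)  # the re-import case
    except RuntimeError as exc:
        return str(exc)
    return "ok"
```

Calling set_start_method_twice(False) returns the message "context has already been set", while set_start_method_twice(True) returns "ok".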

conftest Helper

When tests use patch.dict(os.environ, ..., clear=True), mutmut v3 trampolines break because MUTANT_UNDER_TEST is wiped. Use the mutmut_safe_env() helper from tests/conftest.py:

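import os
from unittest.mock import patch

from tests.conftest import mutmut_safe_env

def test_something():
    # mutmut_safe_env supplies the env dict so MUTANT_UNDER_TEST is not wiped by clear=True
    with patch.dict(os.environ, mutmut_safe_env(MY_VAR="1"), clear=True):
        ...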
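For readers who want to see why such a helper is needed, the sketch below shows one way it could work. It is not the implementation in tests/conftest.py, which this document does not show: safe_env_sketch and visible_inside_patch are hypothetical names, and the only assumption taken from the text above is that MUTANT_UNDER_TEST must remain visible inside the patched environment.

```python
import os
from unittest.mock import patch


def safe_env_sketch(**overrides: str) -> dict[str, str]:
    """Build an env dict that carries MUTANT_UNDER_TEST over, plus overrides."""
    env = {}
    if "MUTANT_UNDER_TEST" in os.environ:
        env["MUTANT_UNDER_TEST"] = os.environ["MUTANT_UNDER_TEST"]
    env.update(overrides)
    return env


def visible_inside_patch(env: dict[str, str]) -> bool:
    """Report whether MUTANT_UNDER_TEST survives patch.dict(..., clear=True)."""
    with patch.dict(os.environ, env, clear=True):
        return "MUTANT_UNDER_TEST" in os.environ
```

With MUTANT_UNDER_TEST set in the outer environment, passing a plain dict to patch.dict with clear=True hides the variable, while passing the dict built by the sketch keeps it visible.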