# Mutation Testing
Orchard ML uses mutmut v3 for mutation testing. Mutmut injects small code changes (mutants) and verifies that the test suite catches each one. Survived mutants indicate gaps in test assertions.
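As a minimal illustration (generic Python, not Orchard code), consider a mutant that relaxes a comparison operator; only a boundary assertion distinguishes it from the original:

```python
def is_positive(x):
    """Original code under test."""
    return x > 0

def is_positive_mutant(x):
    """The kind of change mutmut injects: '>' mutated to '>='."""
    return x >= 0

# A weak test passes for both versions, so the mutant survives:
assert is_positive(1) and is_positive_mutant(1)

# A boundary assertion at x == 0 kills the mutant:
assert is_positive(0) is False        # original: correct
assert is_positive_mutant(0) is True  # mutant: wrong answer, so this test fails against it
```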
## Configuration

Mutation testing is configured in `pyproject.toml`:
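The project's actual settings are not reproduced here; a representative `[tool.mutmut]` block (paths illustrative, based on mutmut v3's documented keys) might look like:

```toml
[tool.mutmut]
paths_to_mutate = [ "orchard/" ]
tests_dir = [ "tests/" ]
```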
Log and cosmetic mutations are suppressed automatically by the patched entry point `scripts/mutmut_entry.py`; no per-line `# pragma: no mutate` annotations are needed for logging calls. See Patched Entry Point below for details.
## Running Mutation Tests

Full repository (slow; expect hours on the first run):

```bash
# Generate mutants and run tests against each one
mutmut run

# View results summary
mutmut results

# Inspect a specific survived mutant
mutmut show <mutant_name>
```
Single module (recommended for iterative work). mutmut v3 takes dotted-module glob patterns as positional arguments:

```bash
# Mutate only the search_spaces module
mutmut run "orchard.optimization.search_spaces*"

# Mutate only the loader module
mutmut run "orchard.data_handler.loader*"

# Mutate only the evaluation pipeline
mutmut run "orchard.evaluation.evaluation_pipeline*"
```
Multiple modules in one run: pass several patterns as separate positional arguments, e.g. `mutmut run "orchard.optimization.search_spaces*" "orchard.data_handler.loader*"`.
Single class or function:

```bash
mutmut run "orchard.optimization.search_spaces.*SearchSpaceRegistry*"
mutmut run "*get_optimization_space*"
```
## Mutation Registry

The mutation registry (`mutmut-registry.yaml`) tracks per-file mutation scores and auto-updates when you test a module. Use `scripts/mutmut_run.py`:
```bash
# Run mutmut on a single file and update the registry
python scripts/mutmut_run.py orchard/cli_app.py

# Run mutmut on an entire sub-package
python scripts/mutmut_run.py orchard/core/config/

# Multiple targets at once
python scripts/mutmut_run.py orchard/cli_app.py orchard/exceptions.py

# Show the registry report (no mutmut run, just read existing results)
python scripts/mutmut_run.py --report

# Show report for specific modules
python scripts/mutmut_run.py --report orchard/core/config/

# Batch: run each .py file one by one (cleans cache, updates registry after each)
python scripts/mutmut_run.py --batch orchard/trainer/

# Batch the whole project
python scripts/mutmut_run.py --batch orchard/
```
Output example:

```text
Module                     Total   Kill   Surv   N/C    Score
-------------------------------------------------------------
orchard/cli_app.py            45     42      3     0    93.3%
orchard/exceptions.py          8      8      0     0   100.0%
-------------------------------------------------------------
TOTAL                         53     50      3     0    94.3%
```
The registry YAML is tracked in git so you can see score evolution across commits.
Registry guards (`scripts/check_mutmut_registry.py`):

```bash
# Fail if any module score dropped vs HEAD (pre-commit gate)
python scripts/check_mutmut_registry.py --ratchet

# Fail if any modified module has a stale registry entry (release gate)
python scripts/check_mutmut_registry.py --freshness

# Both
python scripts/check_mutmut_registry.py --ratchet --freshness
```
## Cleaning Cache

mutmut v3 caches trampoline files and metadata in the `mutants/` directory and skips re-generation when the trampoline is newer than the source file. To force a fresh run, delete both the trampoline and its metadata:

```bash
# Clean cache for a specific module
rm mutants/orchard/optimization/search_spaces.py \
   mutants/orchard/optimization/search_spaces.py.meta

# Alternative: touch the source file to invalidate the cache
touch orchard/optimization/search_spaces.py

# Clean all cached results
rm -rf mutants/
```
Filtering Results
# Show all results (killed + survived)
mutmut results --all true
# Show only survived mutants (the ones to fix)
mutmut results
# Inspect a specific survived mutant
mutmut show <mutant_name>
## Writing Mutation-Resilient Tests

Tests that only check key presence (`assert "key" in space`) will let many mutants survive. To kill mutants effectively:
- Assert exact values passed to functions (bounds, lists, constants)
- Assert exact return values, not just types
- Test boundary conditions (e.g., resolution 223 vs 224)
- Test both branches of conditionals (enabled/disabled, present/absent)
- Verify side effects (function called vs not called)
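A sketch of the weak-vs-strong distinction, using a hypothetical `build_space()` stand-in rather than an actual Orchard API:

```python
def build_space():
    # Hypothetical search-space builder standing in for Orchard code.
    return {"resolution": (64, 224), "augment": True}

space = build_space()

# Weak: survives mutants that tweak the bounds, e.g. (64, 225)
assert "resolution" in space

# Strong: asserts the exact values, killing bound-tweaking mutants
assert space["resolution"] == (64, 224)

# Exact-value checks on flags also pin down the enabled branch
assert space["augment"] is True
```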
Patched Entry Point
scripts/mutmut_entry.py monkey-patches mutmut's MutationVisitor to
suppress cosmetic mutations without per-line annotations. It is invoked
automatically by scripts/mutmut_run.py.
Two suppression levels:

| Level | Methods | Effect |
|---|---|---|
| Full skip | `debug`, `info`, `add_format` | Entire `Call` node excluded: call, arguments, and strings |
| String-only skip | `warning`, `error`, `warn`, `getLogger` | Only string literals inside the call are excluded; the call itself and non-string args remain mutable |

This eliminates the need for `# pragma: no mutate` on logging lines.
## Pragma Conventions

| Annotation | Scope | Usage |
|---|---|---|
| `# pragma: no mutate` | Single line | Plot formatting constants, cosmetic-only literals |
| `# pragma: no cover` | Single line | Unreachable defensive code |

Logging calls (`info`, `debug`, `warning`, `error`, `warn`) are handled automatically by the patched entry point; do not annotate them manually.
Never apply `# pragma: no mutate` to:

- Conditionals, computed values, or any real logic
- Entire files (the `do_not_mutate` option is forbidden)
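For example (illustrative constants and functions, not Orchard code):

```python
# Cosmetic-only literal: safe to exclude from mutation
FIGURE_DPI = 150  # pragma: no mutate

def scaled_width(columns):
    # Real logic: must stay mutable so gaps in test assertions are exposed
    return columns * 3.5
```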
## Known Issue: `set_start_method` Crash

mutmut 3.5 calls `multiprocessing.set_start_method('fork')` at module level in `mutmut/__main__.py` (line 1152). When the trampoline re-imports this module during stats collection, the call fails with `RuntimeError: context has already been set`.
Fix: patch your local mutmut installation:

```bash
# In your mutmut venv
sed -i "s/set_start_method('fork')/set_start_method('fork', force=True)/" \
    venv/lib/python3.12/site-packages/mutmut/__main__.py
```
This is safe: `force=True` simply allows resetting the already-set context. The bug is masked when stats are cached; it surfaces whenever mutmut needs to re-collect stats (new tests added, cache cleaned).
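The underlying behaviour can be reproduced with the standard library alone (using `"spawn"` here for portability; the mechanism is identical for `"fork"`):

```python
import multiprocessing as mp

mp.set_start_method("spawn", force=True)  # a start-method context is now set

try:
    mp.set_start_method("spawn")  # plain call on an already-set context
except RuntimeError as exc:
    print(exc)  # -> context has already been set

mp.set_start_method("spawn", force=True)  # force=True resets it without error
```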
## conftest Helper

When tests use `patch.dict(os.environ, ..., clear=True)`, mutmut v3 trampolines break because `MUTANT_UNDER_TEST` is wiped. Use the `mutmut_safe_env()` helper from `tests/conftest.py`:
```python
import os
from unittest.mock import patch

from tests.conftest import mutmut_safe_env

def test_something():
    with patch.dict(os.environ, mutmut_safe_env(MY_VAR="1"), clear=True):
        ...
```
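The project's helper is not reproduced here, but a plausible sketch (assumed behaviour, not the real `tests/conftest.py`) preserves mutmut's control variable while applying the requested overrides:

```python
import os

def mutmut_safe_env(**overrides):
    # Hypothetical implementation: keep mutmut's trampoline control
    # variable(s) that clear=True would otherwise wipe from os.environ.
    env = {k: v for k, v in os.environ.items() if k.startswith("MUTANT")}
    env.update({k: str(v) for k, v in overrides.items()})
    return env
```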