Mutation Testing
Orchard ML uses mutmut v3 for mutation testing. Mutmut injects small code changes (mutants) and verifies that the test suite catches each one. Survived mutants indicate gaps in test assertions.
Configuration
Mutation testing is configured in pyproject.toml:
Log and cosmetic mutations are suppressed automatically by the patched
entry point scripts/mutmut_entry.py — no per-line # pragma: no mutate
annotations are needed for logging calls. See Patched Entry Point
below for details.
Running Mutation Tests
[!WARNING] Prerequisites
- Always use .venv/bin/python, never the system Python.
- All tests must pass before running mutmut. A single test failure causes ALL mutants to be marked not_checked, and batch mode sees "incomplete results" and skips/restores backup.
Full repository (slow — hours on first run):
# Generate mutants and run tests against each one
.venv/bin/python scripts/mutmut_entry.py run
# View results summary
.venv/bin/python scripts/mutmut_entry.py results
# Inspect a specific survived mutant
.venv/bin/python scripts/mutmut_entry.py show <mutant_name>
Single module (recommended for iterative work):
mutmut v3 uses dotted-module glob patterns as positional arguments:
# Mutate only the search_spaces module
.venv/bin/python scripts/mutmut_entry.py run "orchard.optimization.search_spaces*"
# Mutate only the loader module
.venv/bin/python scripts/mutmut_entry.py run "orchard.data_handler.loader*"
# Mutate only the evaluation pipeline
.venv/bin/python scripts/mutmut_entry.py run "orchard.evaluation.evaluation_pipeline*"
Multiple modules in one run:
Single class or function:
.venv/bin/python scripts/mutmut_entry.py run "orchard.optimization.search_spaces.*SearchSpaceRegistry*"
.venv/bin/python scripts/mutmut_entry.py run "*get_optimization_space*"
[!NOTE] Always use scripts/mutmut_entry.py instead of bare mutmut: the patched entry point suppresses cosmetic mutations on logging calls automatically. scripts/mutmut_run.py invokes it internally.
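To preview what a pattern would select before starting a long run, the matching can be approximated with Python's fnmatch module. This is only a sketch: it assumes mutmut compares each pattern against the full dotted mutant name using shell-style globbing, and the mutant names below are invented for illustration rather than copied from real mutmut output.

```python
from fnmatch import fnmatchcase

# Hypothetical mutant names, for illustration only. Real names are listed by
# `scripts/mutmut_entry.py results`.
MUTANT_NAMES = [
    "orchard.optimization.search_spaces.get_optimization_space__mutmut_1",
    "orchard.optimization.search_spaces.SearchSpaceRegistry.resolve__mutmut_4",
    "orchard.data_handler.loader.load_dataset__mutmut_2",
]


def select_mutants(pattern: str, names: list[str]) -> list[str]:
    """Return the names that match a shell-style glob pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Module-level pattern: everything under search_spaces.
module_hits = select_mutants("orchard.optimization.search_spaces*", MUTANT_NAMES)
# Function-level pattern: leading and trailing wildcards match any module.
function_hits = select_mutants("*get_optimization_space*", MUTANT_NAMES)
```

The trailing * matters: without it a pattern has to equal the full mutant name, so it would match nothing in this list.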
Mutation Registry
The mutation registry (mutmut-registry.yaml) tracks per-file mutation scores
and auto-updates when you test a module. Use scripts/mutmut_run.py:
# Run mutmut on a single file and update the registry
.venv/bin/python scripts/mutmut_run.py orchard/cli_app.py
# Run mutmut on an entire sub-package
.venv/bin/python scripts/mutmut_run.py orchard/core/config/
# Multiple targets at once
.venv/bin/python scripts/mutmut_run.py orchard/cli_app.py orchard/exceptions.py
# Show the registry report (no mutmut run, just read existing results)
.venv/bin/python scripts/mutmut_run.py --report
# Show report for specific modules
.venv/bin/python scripts/mutmut_run.py --report orchard/core/config/
# Batch: run each .py file one by one (cleans cache, updates registry after each)
.venv/bin/python scripts/mutmut_run.py --batch orchard/trainer/
# Batch the whole project
.venv/bin/python scripts/mutmut_run.py --batch orchard/
Output example:
Module                                      Total    Kill    Surv     N/C     Score
---------------------------------------------------------------------------------------
orchard/architectures/factory.py               80      80       0       0    100.0%
orchard/cli_app.py                            507     477      30       0     94.1%
orchard/core/environment/hardware.py          133     129       4       0     97.0%
---------------------------------------------------------------------------------------
TOTAL                                         720     686      34       0     95.3%
The registry YAML is tracked in git so you can see score evolution across commits.
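The Score column in the sample report is consistent with the share of killed mutants among all generated mutants. The sketch below recomputes the scores from the counts shown above under that assumption; mutation_score is a hypothetical helper, not a function from scripts/mutmut_run.py.

```python
def mutation_score(killed: int, total: int) -> float:
    """Percentage of mutants killed, rounded to one decimal place."""
    if total == 0:
        return 0.0  # assumption: a module without mutants is reported as 0.0
    return round(100 * killed / total, 1)


# (killed, total) pairs taken from the sample report.
rows = {
    "orchard/architectures/factory.py": (80, 80),
    "orchard/cli_app.py": (477, 507),
    "orchard/core/environment/hardware.py": (129, 133),
}

scores = {module: mutation_score(k, t) for module, (k, t) in rows.items()}
total_score = mutation_score(
    sum(k for k, _ in rows.values()),
    sum(t for _, t in rows.values()),
)
```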
Registry guards (scripts/check_mutmut_registry.py):
# Fail if any module score dropped vs HEAD (pre-commit gate)
.venv/bin/python scripts/check_mutmut_registry.py --ratchet
# Fail if any modified module has a stale registry entry (release gate)
.venv/bin/python scripts/check_mutmut_registry.py --freshness
# Both
.venv/bin/python scripts/check_mutmut_registry.py --ratchet --freshness
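The ratchet idea can be pictured as a comparison between the registry committed at HEAD and the one in the working tree. The code below is a simplified illustration, not the logic of scripts/check_mutmut_registry.py: the score field name, the dict layout, and the sample values are all assumptions.

```python
def find_regressions(committed: dict, current: dict) -> list[str]:
    """List modules whose score is lower in `current` than in `committed`."""
    dropped = []
    for module, old_entry in committed.items():
        new_entry = current.get(module)
        if new_entry is None:
            continue  # removed or untested module: ignored in this sketch
        if new_entry["score"] < old_entry["score"]:
            dropped.append(module)
    return sorted(dropped)


# Made-up registry snapshots (hypothetical schema: path -> {"score": float}).
committed = {
    "orchard/cli_app.py": {"score": 94.1},
    "orchard/exceptions.py": {"score": 100.0},
}
current = {
    "orchard/cli_app.py": {"score": 92.0},
    "orchard/exceptions.py": {"score": 100.0},
}

regressions = find_regressions(committed, current)
```

A gate built this way would exit non-zero whenever the returned list is non-empty.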
Cleaning Cache
mutmut v3 caches trampoline files and metadata in the mutants/ directory.
Always clean the entire mutants/ directory before reruns — deleting
individual files is error-prone and can leave stale state:
# Remove all cached trampolines and metadata
rm -rf mutants/
[!WARNING] Uncommitted files and the registry
--batch mode uses _is_fresh, which compares the registry last_run timestamp against git log -1 --format=%aI. Uncommitted changes don't update git log, so old registry entries look "newer" and the file gets skipped silently.
Before running mutmut on uncommitted files, remove their registry entries and the cache:
rm -rf mutants/
.venv/bin/python -c "
import yaml
from pathlib import Path
reg_path = Path('mutmut-registry.yaml')
reg = yaml.safe_load(reg_path.read_text()) or {}
for k in ['orchard/path/to/changed_file.py']:
    reg.pop(k, None)
reg = dict(sorted(reg.items()))
reg_path.write_text(yaml.dump(reg, default_flow_style=False, sort_keys=False))
"
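The silent skip described in the warning above follows from judging freshness purely by timestamps. Here is a minimal sketch of such a comparison, assuming the registry stores last_run as an ISO 8601 string with a UTC offset. The function is illustrative and is not the actual _is_fresh implementation, and the timestamps are invented.

```python
from datetime import datetime


def is_fresh(last_run: str, last_commit: str) -> bool:
    """True when the registry entry is at least as new as the last commit."""
    return datetime.fromisoformat(last_run) >= datetime.fromisoformat(last_commit)


# Invented values. The second mimics the output of `git log -1 --format=%aI`.
registry_last_run = "2025-01-10T09:00:00+00:00"
last_commit_date = "2025-01-08T18:30:00+01:00"

# The file was edited after the run but never committed, so git still reports
# the older commit date and the stale entry is treated as fresh.
skipped = is_fresh(registry_last_run, last_commit_date)
```

Because the working-tree modification time never enters the comparison, the only ways to force a rerun are to commit the change or to drop the registry entry.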
Gotchas
[!CAUTION] Never use --batch on __init__.py files
_to_mutmut_glob strips .__init__ and appends *, so orchard/__init__.py becomes the glob orchard*, which matches the entire codebase. Use --report instead for __init__.py and pure-declaration files (constants, re-exports) with no mutable logic.
[!NOTE] Batch timeout
Batch mode has a 600-second (10 min) timeout per file. If exceeded, previous results are restored from the .meta.bak backup.
[!NOTE] CI does not run mutmut
Mutation testing is a local quality gate only. CI runs linting, type checking, and pytest, but not mutmut.
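The path-to-glob conversion behind the __init__.py caution can be reproduced in a few lines. This is a reconstruction from the described behavior only, not the real _to_mutmut_glob from scripts/mutmut_run.py, so details may differ.

```python
def to_mutmut_glob(path: str) -> str:
    """Convert a file or directory path into a dotted-module glob (sketch)."""
    dotted = path.rstrip("/").removesuffix(".py").replace("/", ".")
    # Package initializers collapse to the package name itself.
    dotted = dotted.removesuffix(".__init__")
    return dotted + "*"


loader_glob = to_mutmut_glob("orchard/data_handler/loader.py")
package_glob = to_mutmut_glob("orchard/__init__.py")
```

Here loader_glob is orchard.data_handler.loader*, which stays scoped to one module, while package_glob is orchard*, which matches every module in the project.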
Writing Mutation-Resilient Tests
Tests that only check key presence (assert "key" in space) will let many
mutants survive. To kill mutants effectively:
- Assert exact values passed to functions (bounds, lists, constants)
- Assert exact return values, not just types
- Test boundary conditions (e.g., resolution 223 vs 224)
- Test both branches of conditionals (enabled/disabled, present/absent)
- Verify side effects (function called vs not called)
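As an illustration of the boundary-condition point, consider a hypothetical resolution check. The function names and the 224 threshold are made up for this example and are not taken from the Orchard ML code. A mutant that turns >= into > survives a check that only probes 512, but is killed once 223 and 224 are both asserted.

```python
MIN_RESOLUTION = 224  # hypothetical threshold


def is_supported(resolution: int) -> bool:
    """Original code under test."""
    return resolution >= MIN_RESOLUTION


def is_supported_mutant(resolution: int) -> bool:
    """A typical mutant: `>=` replaced by `>`."""
    return resolution > MIN_RESOLUTION


def weak_check(fn) -> bool:
    """Only probes a value far from the boundary."""
    return fn(512) is True


def strong_check(fn) -> bool:
    """Asserts exact results on both sides of the boundary."""
    return fn(223) is False and fn(224) is True


mutant_survives_weak = weak_check(is_supported_mutant)
mutant_survives_strong = strong_check(is_supported_mutant)
```

Both checks pass on the original function, but only strong_check distinguishes it from the mutant.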
Patched Entry Point
scripts/mutmut_entry.py monkey-patches mutmut's MutationVisitor to
suppress cosmetic mutations without per-line annotations. It is invoked
automatically by scripts/mutmut_run.py.
Two suppression levels:
| Level | Methods | Effect |
|---|---|---|
| Full skip | debug, info, add_format | Entire Call node excluded: call, arguments, and strings |
| String-only skip | warning, error, warn, getLogger | Only string literals inside the call are excluded; the call itself and non-string args remain mutable |
This eliminates the need for # pragma: no mutate on logging lines.
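Conceptually, the two levels amount to a lookup on the name of the called method. The sketch below uses the standard-library ast module purely for illustration. The real entry point patches mutmut's MutationVisitor, whose internals are not shown here, and suppression_level is a hypothetical name.

```python
import ast
from typing import Optional

FULL_SKIP = {"debug", "info", "add_format"}
STRING_ONLY_SKIP = {"warning", "error", "warn", "getLogger"}


def suppression_level(source: str) -> Optional[str]:
    """Classify a single call expression as 'full', 'string-only', or None."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Call):
        return None
    func = node.func
    if isinstance(func, ast.Attribute):  # e.g. logger.info(...)
        name = func.attr
    elif isinstance(func, ast.Name):  # e.g. getLogger(...)
        name = func.id
    else:
        return None
    if name in FULL_SKIP:
        return "full"
    if name in STRING_ONLY_SKIP:
        return "string-only"
    return None


info_level = suppression_level('logger.info("loaded %s", path)')
warning_level = suppression_level('logger.warning("retry %d", attempt)')
other_level = suppression_level("compute(x)")
```

With "string-only", a mutation that drops the warning call or alters attempt is still generated. Only the message text is left alone.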
Pragma Conventions
| Annotation | Scope | Usage |
|---|---|---|
| # pragma: no mutate | Single line | Plot formatting constants, cosmetic-only literals |
| # pragma: no cover | Single line | Unreachable defensive code |
Logging calls (info, debug, warning, error, warn) are handled
automatically by the patched entry point — do not annotate them manually.
Never apply # pragma: no mutate to:
- Conditionals, computed values, or any real logic
- Entire files (do_not_mutate is forbidden)
Resolved Issue: `set_start_method` Crash
mutmut 3.5.0 calls multiprocessing.set_start_method('fork') at module level
in mutmut/__main__.py. When the module is re-executed (e.g. via
python -m mutmut run), the call fails with:
Status: fixed upstream in GH-466
(merged into main). The fix guards the call with get_start_method(allow_none=True)
and is included in mutmut > 3.5.0. If you are still on 3.5.0, either install
mutmut from git or apply the local patch:
sed -i "s/set_start_method('fork')/set_start_method('fork', force=True)/" \
.venv/lib/python3.*/site-packages/mutmut/__main__.py
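The guard described for the upstream fix boils down to checking whether a start method is already configured before setting one. This is a minimal sketch of that pattern. The helper name ensure_start_method is hypothetical and this is not the code merged upstream.

```python
import multiprocessing


def ensure_start_method(method: str) -> str:
    """Set the start method only if none is configured; return the active one."""
    # allow_none=True reports None instead of silently picking a default.
    if multiprocessing.get_start_method(allow_none=True) is None:
        multiprocessing.set_start_method(method)
    return multiprocessing.get_start_method()
```

Calling the helper repeatedly is safe, whereas a second bare set_start_method call without force=True raises RuntimeError. The force=True variant in the sed patch avoids the error by overriding whatever was set before.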
Resolved Issue: Name Mangling in Trampoline Generation
When a class name starts with an underscore (e.g. _CrossDomainValidator),
mutmut generates trampoline function names like
__CrossDomainValidator_validate_trampoline. Inside the class body, Python's
name mangling
rewrites __CrossDomainValidator_validate_trampoline to
_CrossDomainValidator__CrossDomainValidator_validate_trampoline, causing a
NameError at import time.
Status: fixed upstream in boxed/mutmut#499
(merged 2026-04-16, reported in #498).
The fix uses a _mutmut_ prefix instead of _{class_name}_, which is always
safe regardless of class name. No local patch needed once you install a release
that includes this fix.
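The mangling behavior is easy to reproduce outside mutmut. The snippet below uses a made-up class _Helper and two made-up module-level functions. It shows that a name starting with two underscores is rewritten inside the class, while a _mutmut_-prefixed name is left alone. The reference sits inside a method here, so the error appears at call time. A reference directly in the class body would fail at import time instead.

```python
def __Helper_run_trampoline():
    return "old-style name"


def _mutmut_Helper_run_trampoline():
    return "prefixed name"


class _Helper:
    def call_old(self):
        # Compiled as _Helper__Helper_run_trampoline, which does not exist.
        return __Helper_run_trampoline()

    def call_new(self):
        # A single leading underscore is never mangled.
        return _mutmut_Helper_run_trampoline()


helper = _Helper()
try:
    helper.call_old()
    mangled_error = None
except NameError as exc:
    mangled_error = str(exc)

safe_result = helper.call_new()
```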
If you are still on a build that predates the fix, apply the patch manually:
sed -i 's/prefix = f"_{class_name}_{method_name}"/prefix = f"_mutmut_{class_name}_{method_name}"/' \
.venv/lib/python3.*/site-packages/mutmut/mutation/trampoline_templates.py \
.venv/lib/python3.*/site-packages/mutmut/mutation/file_mutation.py
Pending Issue: env scrubbing wipes `MUTANT_UNDER_TEST`
When tests use patch.dict(os.environ, ..., clear=True), mutmut v3
trampolines break because MUTANT_UNDER_TEST is wiped from the environment.
Mutants reachable only through such tests are falsely reported as survived,
silently lowering the mutation score (measured impact on
orchard/core/environment/hardware.py: 19 false survivors, 85.7 % vs. the
true 97.0 %).
Status: reported upstream as boxed/mutmut#511 with two proposed fixes (sticky cache vs. import-time cache, both verified locally). Awaiting maintainer review.
Workaround: use the mutmut_safe_env() helper from tests/conftest.py,
which re-injects MUTANT_UNDER_TEST into the patched env:
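import os
from unittest.mock import patch
from tests.conftest import mutmut_safe_env

def test_something():
    # clear=True would wipe MUTANT_UNDER_TEST; the helper puts it back
    with patch.dict(os.environ, mutmut_safe_env(MY_VAR="1"), clear=True):
        ...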
Once the upstream fix lands, the helper can be removed and every
mutmut_safe_env(...) call replaced with a plain dict literal.
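For readers without access to tests/conftest.py, a helper with the described behavior could look roughly like the sketch below. It is an assumption about the shape of mutmut_safe_env based only on the description above, and the real helper may differ.

```python
import os
from unittest.mock import patch


def mutmut_safe_env(**overrides: str) -> dict[str, str]:
    """Build an env dict that keeps MUTANT_UNDER_TEST when it is set (sketch)."""
    env = dict(overrides)
    mutant = os.environ.get("MUTANT_UNDER_TEST")
    if mutant is not None:
        # Do not overwrite a value the test passed explicitly.
        env.setdefault("MUTANT_UNDER_TEST", mutant)
    return env


# Simulate a mutmut run, then scrub the environment the way a test would.
with patch.dict(os.environ, {"MUTANT_UNDER_TEST": "demo", "OTHER": "x"}, clear=True):
    with patch.dict(os.environ, mutmut_safe_env(MY_VAR="1"), clear=True):
        scrubbed_env = dict(os.environ)
```

Outside a mutmut run the variable is absent, so the helper returns only the overrides and the test behaves exactly as it would with a plain dict.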