Testing & Quality Assurance

Installation

Install dev dependencies (includes all quality and testing tools):

pip install -e ".[dev]"

Environment Verification

Smoke Test (1-epoch sanity check):

# Default: BloodMNIST 28×28
python -m tests.smoke_test

# Custom dataset
python -m tests.smoke_test --dataset pathmnist

Output: Validates the full pipeline in <30 seconds:

- Dataset loading and preprocessing
- Model instantiation and weight transfer
- Training loop execution
- Evaluation metrics computation
- Excel/PNG artifact generation

Health Check (dataset integrity):

python -m tests.health_check --dataset organcmnist --resolution 224

Output: Verifies:

- MD5 checksum matching
- NPZ key structure (train_images, train_labels, val_images, etc.)
- Sample count validation
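The first two checks can be sketched with the standard library alone. This is a minimal illustration, not the project's health_check implementation: the function names and the EXPECTED_KEYS set are assumptions, and sample count validation is left out because it requires reading the arrays themselves.

```python
import hashlib
import zipfile
from pathlib import Path

# Hypothetical key set: the text names train_images, train_labels, val_images, "etc."
EXPECTED_KEYS = {"train_images", "train_labels", "val_images"}


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    # usedforsecurity=False marks this as an integrity check, not cryptography.
    digest = hashlib.md5(usedforsecurity=False)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def npz_keys(path: Path) -> set[str]:
    """List array names in an NPZ archive (a zip whose members are '<key>.npy')."""
    with zipfile.ZipFile(path) as archive:
        return {
            name.removesuffix(".npy")
            for name in archive.namelist()
            if name.endswith(".npy")
        }


def check_dataset(path: Path, expected_md5: str) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if md5_of_file(path) != expected_md5:
        problems.append("MD5 mismatch")
    missing = EXPECTED_KEYS - npz_keys(path)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a single run report every failed check.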

Code Quality Checks

Orchard ML includes automated quality check scripts that run all code quality tools in sequence.

Quick Check (Recommended)

Fast quality checks for everyday development (~30-60 seconds):

# Run all standard quality checks
bash scripts/check_quality.sh

What it checks:

- Black: Code formatting compliance (PEP 8 style, 100 chars)
- Ruff: Linting + import sorting (replaces Flake8 and isort)
- Bandit: Security vulnerability scanning
- Radon: Cyclomatic complexity & maintainability index
- Pytest: Full test suite with coverage report

Extended Check (Thorough)

Comprehensive checks with type checking (~60-120 seconds):

# Run extended quality checks with MyPy
bash scripts/check_quality_full.sh

Additional checks:

- MyPy: Static type checking
- Radon: Extended metrics (raw metrics, detailed analysis)
- Pytest: HTML coverage report

Tool Descriptions

Formatting Tools

Black: Opinionated code formatter (line length: 100)
```
black orchard/ tests/  # Auto-fix
```

Linting & Import Sorting

Ruff: Fast linter and import sorter (replaces Flake8 + isort)
Checks: unused variables, imports, style violations, argument usage
Line length: 100 (E501 ignored, handled by Black)

Rules: E, F, W, I, ARG

ruff check orchard/ tests/       # Check
ruff check --fix orchard/ tests/  # Auto-fix
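As an illustration of what the F and ARG rule families look for, the sketch below shows a function that passes cleanly, with typical violations left as comments. The function names are hypothetical and are not taken from the Orchard ML code base.

```python
import math  # used below, so F401 (unused import) does not fire


def circle_area(radius: float) -> float:
    """Every argument and local is used, so ARG001 and F841 stay quiet."""
    return math.pi * radius**2


# Patterns the selected rules would typically report (left commented out):
#
#   import os                        # F401: imported but unused
#
#   def scaled_area(radius, unit):   # ARG001: argument `unit` is never used
#       factor = 2                   # F841: local assigned but never used
#       return math.pi * radius**2
```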

Security Tools

Bandit: Detects common security issues
Checks: hardcoded passwords, SQL injection, insecure temp files
Severity: Low, Medium, and High (-l)
```
bandit -r orchard/ -l -q
```
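A short sketch of code written to avoid two of the issue types listed above (insecure temp files and hardcoded secrets). The function names and the ORCHARD_API_TOKEN variable are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def write_report(text: str) -> str:
    """Write text to a securely created temporary file and return its path.

    tempfile.mkstemp creates the file atomically with owner-only permissions,
    avoiding the race condition that makes tempfile.mktemp unsafe.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


def get_api_token() -> str:
    """Read a secret from the environment rather than hardcoding it in source."""
    # ORCHARD_API_TOKEN is a hypothetical variable name.
    return os.environ.get("ORCHARD_API_TOKEN", "")
```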

Complexity Analysis

Radon: Code metrics analyzer
Cyclomatic Complexity (CC): Measures code complexity (max: B = 6-10)
Maintainability Index (MI): Measures maintainability on a 0-100 scale (min: B; Radon ranks 20-100 as A, 10-19 as B, 0-9 as C)

CC grades: A (best), B, C, D, E, F (worst). MI grades: A (best), B, C (worst).

radon cc orchard/ -n B --total-average  # Complexity
radon mi orchard/ -n B                   # Maintainability
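To make the letter grades concrete, the sketch below maps raw scores to ranks using Radon's documented thresholds (CC: A = 1-5, B = 6-10, C = 11-20, D = 21-30, E = 31-40, F = 41+; MI: A above 19, B from 10 to 19, C at 9 or below). The helper names are illustrative and are not Radon's own API.

```python
def cc_grade(complexity: int) -> str:
    """Map a cyclomatic complexity score to a letter rank (A best, F worst)."""
    if complexity < 0:
        raise ValueError("complexity must be non-negative")
    for upper, grade in ((5, "A"), (10, "B"), (20, "C"), (30, "D"), (40, "E")):
        if complexity <= upper:
            return grade
    return "F"


def mi_grade(score: float) -> str:
    """Map a maintainability index (0-100) to a letter rank (A best, C worst)."""
    if score > 19:
        return "A"
    if score > 9:
        return "B"
    return "C"
```

Under these thresholds, a function with complexity 8 is graded B, so it sits at the project's stated maximum.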

Type Checking

MyPy: Static type checker for Python

Verifies type hints and catches type errors statically, before the code runs

mypy orchard/ --ignore-missing-imports --no-strict-optional
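A small example of the kind of mistake static type checking is meant to catch. The scale function is hypothetical; the point is that the mistyped call in the comment runs without raising yet returns the wrong result, which a type checker would flag from the annotations alone.

```python
def scale(values: list[float], factor: float) -> list[float]:
    """Multiply every value by factor."""
    return [value * factor for value in values]


scaled = scale([1.0, 2.5], 2.0)  # consistent with the annotations

# The call below does not raise at runtime, because str * int is string
# repetition in Python, but it contradicts the list[float] annotation and
# a static type checker would report it:
#
#   scale(["1.0", "2.5"], 2)   # returns ["1.01.0", "2.52.5"]
```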

Individual Tool Usage

# Code formatting check
black --check --diff orchard/ tests/

# Linting + import sorting
ruff check orchard/ tests/

# Security scanning
bandit -r orchard/ -l -q

# Complexity analysis
radon cc orchard/ -n B --total-average
radon mi orchard/ -n B

# Type checking
mypy orchard/ --ignore-missing-imports

# Tests with coverage (fails if < 100%)
pytest --cov=orchard --cov-report=term-missing --cov-fail-under=100 -v tests/
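For environments without bash, the commands above can also be chained from Python. This is a rough sketch and not a port of scripts/check_quality.sh: run_checks and QUALITY_COMMANDS are hypothetical names, exit codes are taken at face value, and a tool that is not installed is counted as a failure.

```python
import subprocess
from typing import Sequence


def run_checks(commands: Sequence[Sequence[str]]) -> list[str]:
    """Run each command in order and return the ones that did not exit with 0."""
    failed = []
    for command in commands:
        try:
            returncode = subprocess.run(command, check=False).returncode
        except OSError:
            # Raised when the executable is missing or cannot be launched.
            returncode = 1
        if returncode != 0:
            failed.append(" ".join(command))
    return failed


# Commands copied from the list above; run from the repository root.
QUALITY_COMMANDS = [
    ["black", "--check", "--diff", "orchard/", "tests/"],
    ["ruff", "check", "orchard/", "tests/"],
    ["bandit", "-r", "orchard/", "-l", "-q"],
    ["radon", "cc", "orchard/", "-n", "B", "--total-average"],
    ["radon", "mi", "orchard/", "-n", "B"],
    ["mypy", "orchard/", "--ignore-missing-imports"],
]
```

Calling run_checks(QUALITY_COMMANDS) returns an empty list when every tool exits cleanly. All tools run even after an earlier failure, so one pass surfaces every problem.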

Mutation Testing

Mutation testing documentation has been moved to a dedicated guide. See Mutation Testing (MUTANTS.md) for configuration, running mutmut, the mutation registry, pragma conventions, and more.

Test Suite

Orchard ML includes a comprehensive test suite targeting 100% code coverage:

# Run full test suite
pytest tests/ -v

# Run with coverage report
pytest tests/ --cov=orchard --cov-report=html

# Run specific test categories
pytest tests/ -m unit          # Unit tests only
pytest tests/ -m integration   # Integration tests only

# Run parallel tests (faster)
pytest tests/ -n auto

Test Categories

Unit Tests: Config validation, metadata injection, type safety
Integration Tests: End-to-end pipeline validation, YAML hydration
Smoke Tests: 1-epoch sanity checks (~30 seconds)
Health Checks: Dataset integrity

Continuous Integration

GitHub Actions workflows run automatically on every push:

✅ Code Quality: Black, Ruff, and mypy (formatting, linting, and type checks)
✅ Multi-Python Testing: Unit tests across Python 3.10–3.14
✅ Smoke Test: 1-epoch end-to-end validation (~30s, CPU-only)
✅ Documentation: README.md presence verification
✅ Security Scanning: Bandit (code analysis) and pip-audit (dependency vulnerabilities)
✅ Code Coverage: Automated reporting to Codecov (99%+ coverage)
✅ SonarCloud: Continuous code quality inspection (reliability, security, maintainability)

SonarCloud Metrics:

SonarCloud badges are dynamic and updated automatically on every push to main.

Pipeline Status:

| Job | Description | Status |
|---|---|---|
| Code Quality | Black, Ruff, mypy | ✅ Required to pass |
| Pytest Suite | 5 Python versions | ✅ Required to pass |
| Smoke Test | 1-epoch E2E validation | ✅ Required to pass |
| Documentation | README verification | ✅ Required to pass |
| Security Scan | Bandit + pip-audit | ✅ Required to pass (Bandit hard-fail, pip-audit advisory) |
| Build Status | Aggregate summary | ✅ Fails if lint, pytest, smoke test, or security fails |
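The aggregation rule in the last two rows can be written out as a small sketch. The job keys and function names are hypothetical and do not come from the actual workflow file. The logic only mirrors the table: Bandit findings fail the security job, pip-audit findings are reported without failing it, and the aggregate fails if any blocking job fails.

```python
# Jobs whose failure fails the aggregate, per the Build Status row.
BLOCKING_JOBS = ("lint", "pytest", "smoke_test", "security")


def security_result(bandit_ok: bool, pip_audit_ok: bool) -> tuple[bool, list[str]]:
    """Bandit failures fail the job; pip-audit findings are only reported."""
    warnings = [] if pip_audit_ok else ["pip-audit reported vulnerable dependencies"]
    return bandit_ok, warnings


def build_status(results: dict[str, bool]) -> bool:
    """Return True when every blocking job passed; a missing job counts as failed."""
    return all(results.get(job, False) for job in BLOCKING_JOBS)
```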

Note: Health checks are not run in CI to avoid excessive dataset downloads. Run locally with python -m tests.health_check for dataset integrity validation.

Note: Python 3.14 (dev) is tested for core functionality only. ONNX export requires onnxruntime>=1.24.1, which provides Python 3.14 wheels.