Testing & Quality Assurance
Installation
Install dev dependencies (includes all quality and testing tools):
Environment Verification
Smoke Test (1-epoch sanity check):
# Default: BloodMNIST 28×28
python -m tests.smoke_test
# Custom dataset
python -m tests.smoke_test --dataset pathmnist
Output: Validates the full pipeline in under 30 seconds:
- Dataset loading and preprocessing
- Model instantiation and weight transfer
- Training loop execution
- Evaluation metrics computation
- Excel/PNG artifact generation
Health Check (dataset integrity), run with `python -m tests.health_check`:
Output: Verifies:
- MD5 checksum matching
- NPZ key structure (train_images, train_labels, val_images, etc.)
- Sample count validation
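The first two checks can be illustrated with a minimal sketch in Python. It assumes the dataset is a standard NumPy `.npz` archive and that the expected MD5 digest is known in advance. The helper names (`md5sum`, `npz_keys`, `check_npz`) and the `REQUIRED_KEYS` tuple are hypothetical, not the project's health-check API. Sample-count validation is omitted because reading array shapes needs NumPy.

```python
import hashlib
import zipfile
from pathlib import Path

# Hypothetical required-key set following the train_*/val_* naming listed above;
# the real health check may expect a different set.
REQUIRED_KEYS = (
    "train_images", "train_labels",
    "val_images", "val_labels",
    "test_images", "test_labels",
)


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives never sit fully in memory."""
    # usedforsecurity=False: MD5 serves as an integrity checksum here,
    # not as a security control.
    digest = hashlib.md5(usedforsecurity=False)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def npz_keys(path: Path) -> set[str]:
    """List array names: an .npz file is a zip archive of '<key>.npy' members."""
    with zipfile.ZipFile(path) as zf:
        return {name.removesuffix(".npy") for name in zf.namelist()}


def check_npz(path: Path, expected_md5: str, required_keys=REQUIRED_KEYS) -> list[str]:
    """Return human-readable problems; an empty list means the file passed."""
    problems = []
    actual = md5sum(path)
    if actual != expected_md5:
        problems.append(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    missing = set(required_keys) - npz_keys(path)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a single run report both a checksum mismatch and missing keys.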
Code Quality Checks
Orchard ML includes automated quality check scripts that run all code quality tools in sequence.
Quick Check (Recommended)
Fast quality checks for everyday development (~30-60 seconds):
What it checks:
- Black: Code formatting compliance (PEP 8-compatible style, 100-character line length)
- Ruff: Linting + import sorting (replaces Flake8 and isort)
- Bandit: Security vulnerability scanning
- Radon: Cyclomatic complexity & maintainability index
- Pytest: Full test suite with coverage report
Extended Check (Thorough)
Comprehensive checks with type checking (~60-120 seconds):
Additional checks:
- MyPy: Static type checking
- Radon: Extended metrics (raw metrics, detailed analysis)
- Pytest: HTML coverage report
Tool Descriptions
Formatting Tools
- Black: Opinionated code formatter (line length: 100)
Linting & Import Sorting
- Ruff: Fast linter and import sorter (replaces Flake8 + isort)
- Checks: unused variables and imports, style violations, unused function arguments
- Line length: 100 (E501 ignored, handled by Black)
- Rules: E and W (pycodestyle), F (Pyflakes), I (isort), ARG (flake8-unused-arguments)
Security Tools
- Bandit: Detects common security issues
- Checks: hardcoded passwords, SQL injection, insecure temp files
- Severity: reports Low, Medium, and High issues (`-l` sets the minimum reported severity to Low)
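To show what the hardcoded-password and temp-file checks look for, the sketch below lists two patterns Bandit reports (commented out so they never execute) next to safer equivalents. The function names and the `ORCHARD_SERVICE_PASSWORD` environment variable are hypothetical; the test IDs in the comments are Bandit's own.

```python
import os
import tempfile

# Patterns Bandit reports, kept as comments so they are never executed:
#   SERVICE_PASSWORD = "hunter2"   # B105: hardcoded password string
#   path = tempfile.mktemp()       # B306: insecure, race-prone temp file name


def get_service_password() -> str:
    """Read the secret from the environment instead of hardcoding it."""
    # ORCHARD_SERVICE_PASSWORD is a hypothetical variable name for illustration.
    password = os.environ.get("ORCHARD_SERVICE_PASSWORD")
    if password is None:
        raise RuntimeError("ORCHARD_SERVICE_PASSWORD is not set")
    return password


def write_scratch_file(data: bytes) -> str:
    """Create and open the temp file atomically with mkstemp, then write to it."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return path  # the caller is responsible for deleting the file
```

`mkstemp` creates the file and returns an open descriptor in one step, which closes the window between choosing a name and creating the file that makes `mktemp` unsafe.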
Complexity Analysis
- Radon: Code metrics analyzer
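- Cyclomatic Complexity (CC): Counts the independent paths through a function or method (maximum allowed: grade B, CC 6-10)
- Maintainability Index (MI): Scores maintainability on a 0-100 scale (minimum allowed: grade A, MI 20-100)
- Grades: CC runs from A (best) to F (worst); MI uses only A (20-100), B (10-19), and C (0-9)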
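The rank boundaries can be made concrete with a small sketch. `cc_grade`, `mi_grade`, and `meets_targets` are hypothetical helpers that reimplement radon's documented ranges (CC: A 1-5, B 6-10, C 11-20, D 21-30, E 31-40, F 41+; MI: A 20-100, B 10-19, C 0-9). They are not part of radon's API. `meets_targets` assumes the thresholds quoted above: CC at most 10 and MI at least 20.

```python
def cc_grade(complexity: int) -> str:
    """Map a cyclomatic complexity score to an A-F rank using radon's documented ranges."""
    # Inclusive upper bound of each rank: A 1-5, B 6-10, C 11-20, D 21-30, E 31-40.
    for upper, grade in ((5, "A"), (10, "B"), (20, "C"), (30, "D"), (40, "E")):
        if complexity <= upper:
            return grade
    return "F"  # 41 and above


def mi_grade(index: float) -> str:
    """Map a 0-100 maintainability index to an A-C rank."""
    if index >= 20:
        return "A"
    if index >= 10:
        return "B"
    return "C"


def meets_targets(complexity: int, index: float) -> bool:
    """Assumed targets: CC grade B or better (<= 10) and MI grade A (>= 20)."""
    return complexity <= 10 and index >= 20
```

For example, `cc_grade(12)` returns `"C"`, so a function with CC 12 would miss the B target even if its module's MI is high.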
Type Checking
- MyPy: Static type checker for Python
- Verifies type hints and catches type errors statically, before the code is run
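Here is a small, hypothetical example of the kind of error MyPy reports without executing anything. A lookup annotated to return `Optional[str]` must be narrowed before its result can be returned as `str`. The function names are illustrative only.

```python
from typing import Optional


def find_checkpoint(name: str, registry: dict[str, str]) -> Optional[str]:
    """Return the checkpoint path registered under `name`, or None."""
    return registry.get(name)


def require_checkpoint(name: str, registry: dict[str, str]) -> str:
    path = find_checkpoint(name, registry)
    # Returning `path` directly, without the None check below, is flagged by
    # MyPy as an incompatible return value (possibly None where str is
    # expected), even though it would run fine until a missing name appears.
    if path is None:
        raise KeyError(f"no checkpoint registered for {name!r}")
    return path  # MyPy has narrowed `path` to str here
```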
Individual Tool Usage
# Code formatting check
black --check --diff orchard/ tests/
# Linting + import sorting
ruff check orchard/ tests/
# Security scanning
bandit -r orchard/ -l -q
# Complexity analysis
radon cc orchard/ -n B --total-average
radon mi orchard/ -n B
# Type checking
mypy orchard/ --ignore-missing-imports
# Tests with coverage (fails if < 100%)
pytest --cov=orchard --cov-report=term-missing --cov-fail-under=100 -v tests/
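The quality-check scripts chain tools like these together. Here is a minimal sketch of such a runner, assuming each tool is on `PATH`. `CHECKS` and `run_checks` are illustrative names, and the project's own scripts may order, group, or report the steps differently.

```python
import shlex
import subprocess

# Illustrative list mirroring the individual commands above; not the project's
# own quality-check script, whose exact steps may differ.
CHECKS = [
    "black --check --diff orchard/ tests/",
    "ruff check orchard/ tests/",
    "bandit -r orchard/ -l -q",
    "radon cc orchard/ -n B --total-average",
    "radon mi orchard/ -n B",
    "mypy orchard/ --ignore-missing-imports",
    "pytest --cov=orchard --cov-report=term-missing --cov-fail-under=100 -v tests/",
]


def run_checks(commands: list[list[str]], stop_on_failure: bool = False) -> list[str]:
    """Run each command in order and return the ones that exited non-zero."""
    failed = []
    for argv in commands:
        label = shlex.join(argv)
        print(f"==> {label}", flush=True)
        try:
            returncode = subprocess.run(argv, check=False).returncode
        except FileNotFoundError:  # tool not installed or not on PATH
            returncode = 127
        if returncode != 0:
            failed.append(label)
            if stop_on_failure:
                break
    return failed
```

A script could call `failures = run_checks([shlex.split(cmd) for cmd in CHECKS])` and exit non-zero when `failures` is non-empty. By default the runner collects every failure, which gives a full report in one pass. `stop_on_failure=True` gives up that report in exchange for a faster exit.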
Mutation Testing
Mutation testing documentation has been moved to a dedicated guide. See Mutation Testing (MUTANTS.md) for configuration, running mutmut, the mutation registry, pragma conventions, and more.
Test Suite
Orchard ML includes a comprehensive test suite targeting 100% code coverage:
# Run full test suite
pytest tests/ -v
# Run with coverage report
pytest tests/ --cov=orchard --cov-report=html
# Run specific test categories
pytest tests/ -m unit # Unit tests only
pytest tests/ -m integration # Integration tests only
# Run parallel tests (faster; requires the pytest-xdist plugin)
pytest tests/ -n auto
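The `-m unit` and `-m integration` filters select tests by pytest marker. One common way to declare those markers is a `pytest_configure` hook in `conftest.py`. The sketch below assumes that approach, and its marker descriptions are illustrative rather than copied from the project, which may register its markers in `pyproject.toml` instead.

```python
# conftest.py (sketch): register the custom markers behind `-m unit` and
# `-m integration` so pytest does not warn about unknown marks (or fail
# under --strict-markers). Descriptions are illustrative assumptions.
MARKERS = {
    "unit": "fast, isolated tests (config validation, metadata injection, type safety)",
    "integration": "end-to-end pipeline and YAML hydration tests",
}


def pytest_configure(config):
    """pytest hook: called once at startup with the active Config object."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")


# A test module then opts in with a decorator, e.g.:
#
#     import pytest
#
#     @pytest.mark.unit
#     def test_default_config_is_valid(): ...
```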
Test Categories
- Unit Tests: Config validation, metadata injection, type safety
- Integration Tests: End-to-end pipeline validation, YAML hydration
- Smoke Tests: 1-epoch sanity checks (~30 seconds)
- Health Checks: Dataset integrity
Continuous Integration
GitHub Actions workflows run automatically on every push:
- ✅ Code Quality: Black formatting, Ruff linting, and mypy type checks
- ✅ Multi-Python Testing: Unit tests across Python 3.10–3.14
- ✅ Smoke Test: 1-epoch end-to-end validation (~30s, CPU-only)
- ✅ Documentation: README.md presence verification
- ✅ Security Scanning: Bandit (code analysis) and pip-audit (dependency vulnerabilities)
- ✅ Code Coverage: Automated reporting to Codecov (99%+ coverage)
- ✅ SonarCloud: Continuous code quality inspection (reliability, security, maintainability)
SonarCloud Metrics:
The SonarCloud badges are dynamic and are updated automatically on every push to main.
Pipeline Status:
| Job | Description | Status |
|---|---|---|
| Code Quality | Black, Ruff, mypy | ✅ Required to pass |
| Pytest Suite | 5 Python versions | ✅ Required to pass |
| Smoke Test | 1-epoch E2E validation | ✅ Required to pass |
| Documentation | README verification | ✅ Required to pass |
| Security Scan | Bandit + pip-audit | ✅ Required to pass (Bandit hard-fail, pip-audit advisory) |
| Build Status | Aggregate summary | ✅ Fails if lint, pytest, smoke test, or security fails |
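The Build Status row combines the other jobs' outcomes. Here is a minimal sketch of that rule. It uses hypothetical job names and follows the Security Scan row in treating Bandit as gating and pip-audit as advisory.

```python
# Hypothetical job names; the real workflow's job IDs may differ.
GATING = ("code_quality", "pytest", "smoke_test", "bandit")
ADVISORY = ("pip_audit",)


def build_status(results: dict[str, str]) -> tuple[str, list[str], list[str]]:
    """Aggregate job outcomes into (overall status, failed gating jobs, advisory warnings).

    `results` maps job name -> outcome such as "success" or "failure";
    a gating job with no recorded outcome counts as failed.
    """
    failed = [job for job in GATING if results.get(job) != "success"]
    warnings = [job for job in ADVISORY if results.get(job, "success") != "success"]
    return ("failure" if failed else "success"), failed, warnings
```

In a GitHub Actions workflow, an aggregate job like this usually lists the other jobs under `needs`, runs with `if: always()`, and inspects `needs.<job_id>.result`. This project's exact wiring may differ.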
Note: Health checks are not run in CI to avoid excessive dataset downloads. Run `python -m tests.health_check` locally to validate dataset integrity.
Note: Python 3.14 (dev) is tested for core functionality only. ONNX export requires `onnxruntime>=1.24.1`, which provides Python 3.14 wheels.