Testing & Quality Assurance

Installation

Install dev dependencies (includes all quality and testing tools):

pip install -e ".[dev]"

Environment Verification

Smoke Test (1-epoch sanity check):

# Default: BloodMNIST 28×28
python -m tests.smoke_test

# Custom dataset
python -m tests.smoke_test --dataset pathmnist

Output: Validates the full pipeline in <30 seconds:

  • Dataset loading and preprocessing
  • Model instantiation and weight transfer
  • Training loop execution
  • Evaluation metrics computation
  • Excel/PNG artifact generation

Health Check (dataset integrity):

python -m tests.health_check --dataset organcmnist --resolution 224

Output: Verifies the following:

  • MD5 checksum matching
  • NPZ key structure (train_images, train_labels, val_images, etc.)
  • Sample count validation
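
The first two checks can be sketched with the standard library alone. This is a minimal illustration, not the actual tests.health_check implementation: the function names md5_of_file and missing_npz_keys are hypothetical, and the sketch relies on the fact that an .npz archive is a zip file that stores each array as <key>.npy. Sample count validation would additionally require loading the arrays and is left out here.

```python
import hashlib
import zipfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def missing_npz_keys(path: Path, expected_keys: set[str]) -> set[str]:
    """Return the expected array names that are absent from an .npz archive."""
    with zipfile.ZipFile(path) as archive:
        # Each array in an .npz archive is stored as "<key>.npy".
        present = {name[:-4] for name in archive.namelist() if name.endswith(".npy")}
    return expected_keys - present
```

A dataset file would pass this sketch when md5_of_file returns the published checksum and missing_npz_keys returns an empty set for the keys the pipeline expects.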


Code Quality Checks

Orchard ML includes automated quality check scripts that run all code quality tools in sequence.

Quick Check (Recommended)

Fast quality checks for everyday development (~30-60 seconds):

# Run all standard quality checks
bash scripts/check_quality.sh

What it checks:

  • Black: Code formatting compliance (PEP 8 style, 100 chars)
  • Ruff: Linting + import sorting (replaces Flake8 and isort)
  • Bandit: Security vulnerability scanning
  • Radon: Cyclomatic complexity & maintainability index
  • Pytest: Full test suite with coverage report
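
The idea of running every tool in sequence and reporting which ones failed can be sketched in Python as follows. The contents of scripts/check_quality.sh are not shown on this page, so this is an assumption about its general shape rather than a port of it: run_checks and CHECKS are hypothetical names, and the commands are copied from the Individual Tool Usage section below.

```python
import subprocess
from collections.abc import Sequence

# Commands as listed under "Individual Tool Usage"; the real script may use other flags.
CHECKS: dict[str, Sequence[str]] = {
    "black": ["black", "--check", "--diff", "orchard/", "tests/"],
    "ruff": ["ruff", "check", "orchard/", "tests/"],
    "bandit": ["bandit", "-r", "orchard/", "-l", "-q"],
    "radon": ["radon", "cc", "orchard/", "-n", "B", "--total-average"],
    "pytest": ["pytest", "--cov=orchard", "--cov-report=term-missing",
               "--cov-fail-under=100", "-v", "tests/"],
}


def run_checks(checks: dict[str, Sequence[str]]) -> list[str]:
    """Run each command in order and return the names of the checks that failed."""
    failed = []
    for name, command in checks.items():
        print(f"==> {name}: {' '.join(command)}")
        try:
            result = subprocess.run(command, check=False)
        except FileNotFoundError:
            # The tool is not installed; count it as a failed check.
            failed.append(name)
            continue
        # Any non-zero exit code is treated as a failure.
        if result.returncode != 0:
            failed.append(name)
    return failed
```

Calling run_checks(CHECKS) from the repository root returns an empty list when every tool exits with status 0. Unlike a script that stops at the first error, this sketch keeps going so that all failures are reported in one pass.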

Extended Check (Thorough)

Comprehensive checks with type checking (~60-120 seconds):

# Run extended quality checks with MyPy
bash scripts/check_quality_full.sh

Additional checks:

  • MyPy: Static type checking
  • Radon: Extended metrics (raw metrics, detailed analysis)
  • Pytest: HTML coverage report

Tool Descriptions

Formatting Tools

  • Black: Opinionated code formatter (line length: 100)
    black orchard/ tests/  # Auto-fix
    

Linting & Import Sorting

  • Ruff: Fast linter and import sorter (replaces Flake8 + isort)
  • Checks: unused variables, imports, style violations, argument usage
  • Line length: 100 (E501 ignored, handled by Black)
  • Rules: E, F, W, I, ARG
    ruff check orchard/ tests/       # Check
    ruff check --fix orchard/ tests/  # Auto-fix
    

Security Tools

  • Bandit: Detects common security issues
  • Checks: hardcoded passwords, SQL injection, insecure temp files
  • Severity: Low, Medium, and High (-l)
    bandit -r orchard/ -l -q
    

Complexity Analysis

  • Radon: Code metrics analyzer
  • Cyclomatic Complexity (CC): Measures code complexity (max: B = 6-10)
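  • Maintainability Index (MI): Measures maintainability on a 0-100 scale, higher is better (min: B; Radon ranks MI as A = 20-100, B = 10-19, C = 0-9)
  • Grades: CC uses A (best), B, C, D, E, F (worst); MI uses only A, B, C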
    radon cc orchard/ -n B --total-average  # Complexity
    radon mi orchard/ -n B                   # Maintainability
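
As an illustration of how the CC grades relate to raw scores, the sketch below maps a cyclomatic complexity value to its letter rank using the thresholds published in the Radon documentation (A = 1-5, B = 6-10, C = 11-20, D = 21-30, E = 31-40, F = 41+). The function names cc_rank and within_limit are hypothetical and are not part of Radon's API.

```python
def cc_rank(complexity: int) -> str:
    """Map a cyclomatic complexity score to its Radon letter rank."""
    if complexity < 1:
        raise ValueError("cyclomatic complexity is at least 1")
    # Inclusive upper bound for each rank; anything above 40 is an F.
    for upper_bound, rank in ((5, "A"), (10, "B"), (20, "C"), (30, "D"), (40, "E")):
        if complexity <= upper_bound:
            return rank
    return "F"


def within_limit(complexity: int, max_rank: str = "B") -> bool:
    """Return True if the score's rank is no worse than max_rank."""
    # Ranks are single letters, so alphabetical order matches best-to-worst order.
    return cc_rank(complexity) <= max_rank
```

Under the "max: B" limit stated above, a function with complexity 10 is acceptable, while one with complexity 11 (rank C) is not.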
    

Type Checking

  • MyPy: Static type checker for Python
  • Verifies type hints and catches type errors before the code runs
    mypy orchard/ --ignore-missing-imports --no-strict-optional
    

Individual Tool Usage

# Code formatting check
black --check --diff orchard/ tests/

# Linting + import sorting
ruff check orchard/ tests/

# Security scanning
bandit -r orchard/ -l -q

# Complexity analysis
radon cc orchard/ -n B --total-average
radon mi orchard/ -n B

# Type checking
mypy orchard/ --ignore-missing-imports

# Tests with coverage (fails if < 100%)
pytest --cov=orchard --cov-report=term-missing --cov-fail-under=100 -v tests/

Mutation Testing

Mutation testing documentation has been moved to a dedicated guide. See Mutation Testing (MUTANTS.md) for configuration, running mutmut, the mutation registry, pragma conventions, and more.


Test Suite

Orchard ML includes a comprehensive test suite targeting 100% code coverage:

# Run full test suite
pytest tests/ -v

# Run with coverage report
pytest tests/ --cov=orchard --cov-report=html

# Run specific test categories
pytest tests/ -m unit          # Unit tests only
pytest tests/ -m integration   # Integration tests only

# Run parallel tests (faster)
pytest tests/ -n auto

Test Categories

  • Unit Tests: Config validation, metadata injection, type safety
  • Integration Tests: End-to-end pipeline validation, YAML hydration
  • Smoke Tests: 1-epoch sanity checks (~30 seconds)
  • Health Checks: Dataset integrity

Continuous Integration

GitHub Actions workflows run automatically on every push:

  • Code Quality: Black, Ruff, and mypy for formatting, linting, and type checks
  • Multi-Python Testing: Unit tests across Python 3.10–3.14
  • Smoke Test: 1-epoch end-to-end validation (~30s, CPU-only)
  • Documentation: README.md presence verification
  • Security Scanning: Bandit (code analysis) and pip-audit (dependency vulnerabilities)
  • Code Coverage: Automated reporting to Codecov (99%+ coverage)
  • SonarCloud: Continuous code quality inspection (reliability, security, maintainability)

SonarCloud Metrics:

Reliability, Security, Maintainability, Coverage, Bugs, Code Smells

All badges above are dynamic and updated automatically by SonarCloud on every push to main.

Pipeline Status:

| Job | Description | Status |
|---|---|---|
| Code Quality | Black, Ruff, mypy | ✅ Required to pass |
| Pytest Suite | 5 Python versions | ✅ Required to pass |
| Smoke Test | 1-epoch E2E validation | ✅ Required to pass |
| Documentation | README verification | ✅ Required to pass |
| Security Scan | Bandit + pip-audit | ✅ Required to pass (Bandit hard-fail, pip-audit advisory) |
| Build Status | Aggregate summary | ✅ Fails if lint, pytest, smoke test, or security fails |
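
The aggregate rule for the Build Status job can be expressed as a small predicate. This is a sketch of the logic as described in the table, not the workflow definition itself: the job keys in REQUIRED_JOBS and the function build_status are hypothetical names chosen for illustration.

```python
# Hypothetical job keys for the four jobs that the aggregate status depends on.
REQUIRED_JOBS = ("lint", "pytest", "smoke_test", "security")


def build_status(results: dict[str, bool]) -> bool:
    """Return True only if every required job reported success.

    A job with no recorded result is treated as a failure, so a skipped
    or cancelled job cannot produce a green build by accident.
    """
    return all(results.get(job, False) for job in REQUIRED_JOBS)
```

For example, build_status({"lint": True, "pytest": True, "smoke_test": True, "security": False}) evaluates to False, because one of the four jobs failed.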

View the latest build: CI/CD

Note: Health checks are not run in CI to avoid excessive dataset downloads. Run locally with python -m tests.health_check for dataset integrity validation.

Note: Python 3.14 (dev) is tested for core functionality only. ONNX export requires onnxruntime>=1.24.1, which provides Python 3.14 wheels.