Model Export Guide

Convert trained Orchard ML models to ONNX for production deployment.

How It Works

Add an export: section to any YAML recipe. The pipeline will train, evaluate, and export in a single run:

orchard run recipes/config_efficientnet_b0.yaml

After training completes, the export phase automatically:

1. Loads the best checkpoint from checkpoints/
2. Traces the model and converts to ONNX
3. Validates PyTorch vs exported output (optional)
4. Benchmarks inference latency (optional)

outputs/20260219_galaxy10_efficientnetb0_abc123/
  checkpoints/
    best_efficientnetb0.pth     # PyTorch checkpoint
  exports/
    model.onnx                  # Production-ready ONNX export

Configuration

All export behavior is controlled via the export: section of your YAML recipe.

Minimal (defaults are sensible):

export:
  format: onnx
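
To make the effect of the minimal recipe concrete, the sketch below overlays a recipe's export: keys onto the defaults documented in this guide. ExportConfig and resolve_export_config are hypothetical names used for illustration only; they are not Orchard ML's actual classes, and the real loader may validate fields differently.

```python
from dataclasses import dataclass, fields


# Hypothetical mirror of the export: section; defaults copied from this guide.
@dataclass(frozen=True)
class ExportConfig:
    format: str = "onnx"
    opset_version: int = 18
    dynamic_axes: bool = True
    do_constant_folding: bool = True
    quantize: bool = False
    quantization_type: str = "int8"
    quantization_backend: str = "qnnpack"
    validate_export: bool = True
    validation_samples: int = 10
    max_deviation: float = 1e-4
    benchmark: bool = False


def resolve_export_config(section: dict) -> ExportConfig:
    """Overlay the keys given in the recipe onto the documented defaults."""
    known = {f.name for f in fields(ExportConfig)}
    unknown = set(section) - known
    if unknown:
        # Catch typos such as "fromat" early instead of silently ignoring them.
        raise ValueError(f"unknown export keys: {sorted(unknown)}")
    return ExportConfig(**section)


# The minimal recipe only sets format; everything else falls back to defaults.
minimal = resolve_export_config({"format": "onnx"})
print(minimal.opset_version, minimal.validate_export, minimal.quantize)
```

With the minimal recipe, validation stays enabled and quantization and benchmarking stay off, matching the defaults listed in the full reference.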

Full reference:

export:
  # Format
  format: onnx                    # only ONNX supported

  # ONNX settings
  opset_version: 18               # 18 recommended, no conversion warnings
  dynamic_axes: true              # dynamic batch size for flexible inference
  do_constant_folding: true       # fold constants at export time

  # Quantization
  quantize: false                 # apply post-training quantization
  quantization_type: int8         # int8 | uint8 | int4 | uint4
  quantization_backend: qnnpack   # qnnpack (mobile/ARM) | fbgemm (x86)

  # Validation
  validate_export: true           # compare PyTorch vs exported output
  validation_samples: 10          # number of samples for validation
  max_deviation: 1.0e-04          # max allowed numerical deviation

  # Benchmark
  benchmark: false                # run ONNX inference latency benchmark

| Field | Default | Description |
|---|---|---|
| format | onnx | Export format (only ONNX supported) |
| opset_version | 18 | ONNX opset version (18 recommended) |
| dynamic_axes | true | Enable dynamic batch size for inference flexibility |
| do_constant_folding | true | Optimize constant operations during export |
| quantize | false | Apply post-training quantization |
| quantization_type | int8 | Weight type: int8, uint8 (server), int4, uint4 (edge) |
| quantization_backend | qnnpack | Quantization backend: qnnpack (mobile/ARM), fbgemm (x86) |
| validate_export | true | Run numerical validation after export |
| validation_samples | 10 | Number of random samples for validation |
| max_deviation | 1e-4 | Maximum allowed output deviation (PyTorch vs ONNX) |
| benchmark | false | Run inference latency benchmark after export |
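
The ONNX settings above have natural counterparts in PyTorch's torch.onnx.export. The sketch below shows one plausible mapping; it assumes the export goes through torch.onnx.export, which this guide does not confirm. The helper names and the tensor names "input" and "output" are hypothetical.

```python
def build_onnx_export_kwargs(opset_version=18, dynamic_axes=True,
                             do_constant_folding=True):
    """Translate recipe fields into keyword arguments for torch.onnx.export."""
    kwargs = {
        "opset_version": opset_version,
        "do_constant_folding": do_constant_folding,
        "input_names": ["input"],    # assumed tensor names
        "output_names": ["output"],
    }
    if dynamic_axes:
        # Mark dimension 0 (batch) as variable on both input and output.
        kwargs["dynamic_axes"] = {"input": {0: "batch"}, "output": {0: "batch"}}
    return kwargs


def export_to_onnx(model, example_input, path, **recipe_fields):
    """Trace `model` with `example_input` and write an ONNX file to `path`."""
    import torch  # imported lazily so the helper above works without PyTorch

    model.eval()
    torch.onnx.export(model, (example_input,), path,
                      **build_onnx_export_kwargs(**recipe_fields))


# With dynamic_axes disabled, the batch size of the example input is baked in.
print(build_onnx_export_kwargs(dynamic_axes=False))
```

Setting dynamic_axes: false drops the dynamic_axes entry entirely, so the exported graph would accept only the batch size used during tracing.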

Quantization

Orchard ML supports dynamic post-training quantization via ONNX Runtime with four weight types:

| Weight Type | Bits | Target | Notes |
|---|---|---|---|
| int8 | 8 | Server / general | Default, all layers quantized |
| uint8 | 8 | Server / general | Unsigned variant |
| int4 | 4 | Edge / mobile | Only FC layers quantized (Conv stays FP32) |
| uint4 | 4 | Edge / mobile | Unsigned variant |

Server deployment (INT8):

export:
  format: onnx
  quantize: true
  quantization_type: int8
  quantization_backend: fbgemm   # x86 servers

Edge deployment (INT4):

export:
  format: onnx
  quantize: true
  quantization_type: int4
  quantization_backend: qnnpack   # mobile / ARM

| Backend | Target Hardware | Quantization Style |
|---|---|---|
| qnnpack | Mobile / ARM | Per-tensor |
| fbgemm | x86 servers | Per-channel |
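
The difference between per-tensor and per-channel quantization is easiest to see with numbers. The following is a conceptual sketch of symmetric INT8 weight quantization in plain Python. It is not the code either backend runs, and the weight values are made up for illustration.

```python
def symmetric_scale(values, qmax=127):
    """Scale that maps the largest magnitude in `values` to `qmax`."""
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0


def quantize(values, scale, qmax=127):
    """Round to the nearest integer step and clamp to the signed range."""
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def quantize_per_tensor(weights):
    """One scale shared by every output channel (row)."""
    scale = symmetric_scale([v for row in weights for v in row])
    return [quantize(row, scale) for row in weights], [scale] * len(weights)


def quantize_per_channel(weights):
    """One scale per output channel (row)."""
    scales = [symmetric_scale(row) for row in weights]
    return [quantize(row, s) for row, s in zip(weights, scales)], scales


def channel_errors(weights, quantized, scales):
    """Largest absolute dequantization error for each channel."""
    return [max(abs(w - q * s) for w, q in zip(row, qrow))
            for row, qrow, s in zip(weights, quantized, scales)]


# Two channels with very different magnitudes (illustrative values).
weights = [[0.011, -0.02, 0.015], [1.1, -2.0, 1.5]]
print("per-tensor :", channel_errors(weights, *quantize_per_tensor(weights)))
print("per-channel:", channel_errors(weights, *quantize_per_channel(weights)))
```

With a single shared scale, the small-magnitude channel is rounded on a grid sized for the large one, so its error is far higher than with a per-channel scale. The large channel sees the same error either way.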

After export, the output directory will contain both models:

exports/
  model.onnx                  # Full-precision original
  model_quantized.onnx        # Quantized (INT8/INT4/...)

Note: 4-bit quantization (INT4/UINT4) only quantizes Gemm nodes (fully-connected layers). Conv layers remain at full precision because ONNX Runtime's 4-bit packing does not support convolution weights. This is the standard approach for edge-deployed vision models.

The validate_export check runs against the original (non-quantized) ONNX model. If benchmark: true is set, both models are benchmarked for latency comparison.
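
This guide does not specify how the latency benchmark is measured. As a rough sketch of what such a comparison involves, the helper below times any callable after a warm-up phase. The function name, the warm-up count, and the run count are assumptions for illustration.

```python
import statistics
import time


def benchmark_latency(run_once, warmup=5, runs=50):
    """Time `run_once()` and return latency statistics in milliseconds."""
    for _ in range(warmup):          # discard cold-start iterations
        run_once()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
        "runs": runs,
    }


# Stand-in workload; replace with a real inference call.
stats = benchmark_latency(lambda: sum(range(10_000)))
print(f"median {stats['median_ms']:.3f} ms over {stats['runs']} runs")
```

To compare the two exported files by hand, run_once could wrap an ONNX Runtime call such as session.run(None, {"input": batch}), with one onnxruntime.InferenceSession per model. The input name "input" is an assumption and depends on how the model was exported.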

Troubleshooting

Validation failed

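Validation fails when the numerical deviation between PyTorch and ONNX outputs exceeds max_deviation. Either relax the tolerance by raising max_deviation, or disable the check with validate_export: false.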
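
When choosing a tolerance, it helps to know the size of the deviation. The sketch below assumes the check compares the largest element-wise absolute difference against max_deviation; the exact metric Orchard ML uses is not stated here, and the output values are hypothetical.

```python
def max_abs_deviation(reference, exported):
    """Largest element-wise absolute difference between two output batches."""
    if len(reference) != len(exported):
        raise ValueError("output batches differ in length")
    return max(abs(r - e)
               for ref_row, exp_row in zip(reference, exported)
               for r, e in zip(ref_row, exp_row))


def check_export(reference, exported, max_deviation=1e-4):
    """Return (passed, deviation) for the given tolerance."""
    deviation = max_abs_deviation(reference, exported)
    return deviation <= max_deviation, deviation


# Hypothetical logits for two samples from PyTorch and from the ONNX model.
pytorch_out = [[0.12, 2.31, -0.45], [1.07, -0.98, 0.33]]
onnx_out = [[0.12002, 2.31001, -0.45], [1.07, -0.97995, 0.33]]

passed, deviation = check_export(pytorch_out, onnx_out)
print(passed, f"{deviation:.1e}")
```

Here the largest difference is about 5e-5, which passes the default tolerance of 1e-4 but would fail a stricter setting such as 1e-5.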

Missing onnxscript

pip install onnx onnxruntime onnxscript

Export warnings

opset_version: 18 produces clean output. Lower versions may emit harmless conversion warnings.

Next Steps