vit_tiny
orchard.architectures.vit_tiny
Vision Transformer Tiny (ViT-Tiny) for 224×224 Image Classification.
Implements the Vision Transformer architecture via the timm library, with support for multiple pretrained weight variants. Designed for efficient image classification with transfer-learning capabilities.
Key Features:
- Patch-Based Attention: Processes 16×16 patches with transformer encoders
- Multi-Weight Support: Compatible with ImageNet-1k/21k pretraining
- Adaptive Input: Dynamic first-layer modification for grayscale datasets
- Efficient Scale: Tiny variant balances performance and compute requirements
Pretrained Weight Options:
- 'vit_tiny_patch16_224.augreg_in21k_ft_in1k': ImageNet-21k → 1k fine-tuned
- 'vit_tiny_patch16_224.augreg_in21k': ImageNet-21k (requires custom head)
- 'vit_tiny_patch16_224': ImageNet-1k baseline
build_vit_tiny(num_classes, in_channels, *, pretrained, weight_variant=None)
Constructs a Vision Transformer Tiny model adapted for image classification datasets.
Workflow:
- Resolve pretrained weight variant from config (if enabled)
- Load model via timm with automatic head replacement
- Modify patch embedding layer for custom input channels
- Apply weight morphing for channel compression (if grayscale)
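The weight-morphing step in the workflow above can be sketched as follows: rebuild the patch-embedding convolution for the new channel count and, when moving from RGB to grayscale, average the pretrained kernels across the channel axis. The helper name `adapt_patch_embed` is illustrative, not part of the library.

```python
import torch
import torch.nn as nn

def adapt_patch_embed(conv: nn.Conv2d, in_channels: int) -> nn.Conv2d:
    """Rebuild a patch-embedding conv for a new input channel count,
    compressing pretrained RGB kernels by averaging for grayscale input."""
    if in_channels == conv.in_channels:
        return conv
    new_conv = nn.Conv2d(
        in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        if in_channels == 1:
            # Weight morphing: mean over the RGB axis preserves activation scale.
            new_conv.weight.copy_(conv.weight.mean(dim=1, keepdim=True))
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv

# ViT-Tiny's patch embedding is a 16x16-kernel, stride-16 conv with embed dim 192.
pe = nn.Conv2d(3, 192, kernel_size=16, stride=16)
gray_pe = adapt_patch_embed(pe, in_channels=1)
```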
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_classes` | `int` | Number of dataset classes for the classification head | *required* |
| `in_channels` | `int` | Input channels (1 = grayscale, 3 = RGB) | *required* |
| `pretrained` | `bool` | Whether to load pretrained weights | *required* |
| `weight_variant` | `str \| None` | Specific timm weight variant identifier | `None` |
Returns:
| Type | Description |
|---|---|
| `Module` | Adapted ViT-Tiny model (device placement handled by the factory). |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the weight variant is invalid or incompatible with the pretrained flag. |