infrastructure_config
orchard.core.config.infrastructure_config
Infrastructure & Resource Lifecycle Management.
Operational bridge between the declarative configuration and the physical execution environment. Manages the 'clean-start' and 'graceful-stop' sequences, releasing hardware resources after each run and using filesystem locks to prevent concurrent runs from colliding.
Key Tasks
- Process sanitization: Guards against ghost processes and multi-process collisions in local environments
- Environment locking: Filesystem mutex that serializes access to experiment outputs
- Resource deallocation: GPU/MPS cache flushing and temporary artifact cleanup
InfraManagerProtocol
Bases: Protocol
Protocol defining the infrastructure management interface.
Enables dependency injection and mocking in tests while ensuring consistent lifecycle management across implementations.
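The dependency-injection use case can be sketched with a recording test double. This is a minimal, self-contained illustration: the protocol is re-declared locally (and marked `runtime_checkable`, which the original may not be), and `FakeInfraManager` and `run_with_infra` are hypothetical names, not part of orchard.

```python
import logging
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class InfraManagerProtocol(Protocol):
    """Local re-declaration mirroring the documented interface (sketch only)."""

    def prepare_environment(self, cfg: Any, logger: logging.Logger) -> None: ...

    def release_resources(self, cfg: Any, logger: logging.Logger) -> None: ...


class FakeInfraManager:
    """Hypothetical test double: records calls instead of touching processes, locks, or GPUs."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def prepare_environment(self, cfg: Any, logger: logging.Logger) -> None:
        self.calls.append("prepare")
        logger.debug("fake prepare_environment called")

    def release_resources(self, cfg: Any, logger: logging.Logger) -> None:
        self.calls.append("release")
        logger.debug("fake release_resources called")


def run_with_infra(infra: InfraManagerProtocol, cfg: Any) -> None:
    """Hypothetical caller that depends only on the protocol, not a concrete class."""
    logger = logging.getLogger("test")
    infra.prepare_environment(cfg, logger)
    try:
        pass  # experiment body would go here
    finally:
        infra.release_resources(cfg, logger)


fake = FakeInfraManager()
run_with_infra(fake, cfg=None)
print(fake.calls)  # ['prepare', 'release']
```

Because `Protocol` uses structural typing, the fake needs no inheritance from the protocol; any object with matching methods can be injected.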
prepare_environment(cfg, logger)
Prepare execution environment before experiment run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | 'HardwareAwareConfig' | Configuration with hardware manifest access. | required |
| logger | Logger | Logger instance for status reporting. | required |
release_resources(cfg, logger)
Release resources allocated during environment preparation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | 'HardwareAwareConfig' | Configuration used during resource allocation. | required |
| logger | Logger | Logger instance for status reporting. | required |
HardwareAwareConfig
Bases: Protocol
Structural contract for configurations that expose a hardware manifest.
Decouples infrastructure management from concrete Config implementations, enabling type-safe access to hardware execution policies.
Attributes:
| Name | Type | Description |
|---|---|---|
| hardware | Any | HardwareConfig instance with device and lock settings. |
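Structural typing means any config object with a `hardware` attribute satisfies this contract. The sketch below re-declares the protocol locally; `HardwareSettings`, `ExperimentConfig`, `lock_path_for`, the `device` and `epochs` fields, and the default lock path are hypothetical. Only `allow_process_kill` and `lock_file_path` are attribute names taken from this page.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Protocol


class HardwareAwareConfig(Protocol):
    """Local re-declaration of the documented contract: only `hardware` is required."""

    hardware: Any


@dataclass
class HardwareSettings:
    # Hypothetical stand-in for HardwareConfig; `device` and the default
    # lock path are assumptions made for this sketch.
    device: str = "cpu"
    allow_process_kill: bool = False
    lock_file_path: Path = Path("/tmp/orchard_demo.lock")


@dataclass
class ExperimentConfig:
    # No inheritance from HardwareAwareConfig: having `hardware` is enough.
    hardware: HardwareSettings = field(default_factory=HardwareSettings)
    epochs: int = 10  # unrelated field the infrastructure layer never reads


def lock_path_for(cfg: HardwareAwareConfig) -> Path:
    """Infrastructure code reads only cfg.hardware, never concrete config fields."""
    return cfg.hardware.lock_file_path


cfg = ExperimentConfig()
print(lock_path_for(cfg))
```

This decoupling lets infrastructure code be type-checked against the narrow protocol while the full experiment config evolves independently.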
InfrastructureManager
Bases: BaseModel
Environment safeguarding and resource management executor.
Ensures clean execution environment before runs and proper resource release after, preventing concurrent experiment collisions and GPU memory leaks.
Lifecycle
- prepare_environment(): Terminate duplicate/zombie processes, acquire lock
- [Experiment runs]
- release_resources(): Release lock, flush caches
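One way to ensure the release step always follows preparation, even when a run crashes, is a context manager wrapping the two calls. This is a sketch under assumptions: `managed_infrastructure` and `RecordingInfra` are hypothetical helpers, not part of orchard, and the stub only records calls.

```python
import logging
from contextlib import contextmanager
from typing import Any, Iterator, Optional


@contextmanager
def managed_infrastructure(
    infra: Any, cfg: Any, logger: Optional[logging.Logger] = None
) -> Iterator[None]:
    """Hypothetical helper pairing prepare_environment() with release_resources().

    The finally clause runs release_resources() even if the body raises.
    """
    infra.prepare_environment(cfg, logger)
    try:
        yield
    finally:
        infra.release_resources(cfg, logger)


class RecordingInfra:
    """Stand-in for InfrastructureManager that only records lifecycle calls."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def prepare_environment(self, cfg: Any, logger: Optional[logging.Logger] = None) -> None:
        self.events.append("prepare")

    def release_resources(self, cfg: Any, logger: Optional[logging.Logger] = None) -> None:
        self.events.append("release")


infra = RecordingInfra()
try:
    with managed_infrastructure(infra, cfg=None):
        infra.events.append("train")
        raise RuntimeError("simulated training failure")
except RuntimeError:
    pass

print(infra.events)  # ['prepare', 'train', 'release']
```

A plain `try`/`finally` around the experiment works equally well; the context manager just keeps the pairing in one place.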
prepare_environment(cfg, logger=None)
Prepare execution environment for experiment run.
Performs pre-run cleanup and resource acquisition to ensure isolated, collision-free experiment execution.
Steps
- Terminate duplicate/zombie processes (only if hardware.allow_process_kill is True and the run is not inside a shared compute environment such as SLURM, PBS, or LSF)
- Acquire filesystem lock to prevent concurrent runs using the same project name
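The two steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not orchard's implementation. It assumes shared-cluster detection via the job-ID variables those schedulers export (`SLURM_JOB_ID`, `PBS_JOBID`, `LSB_JOBID`) and an exclusive-create lock file (`O_CREAT | O_EXCL`); the real code may use a different locking primitive. The process termination itself is omitted, and all function names and the `RuntimeError` are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional

# Job-ID variables exported by common batch schedulers inside a job.
# Using them for detection is an assumption of this sketch.
SCHEDULER_ENV_VARS = ("SLURM_JOB_ID", "PBS_JOBID", "LSB_JOBID")


def in_shared_compute(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if any known scheduler job variable is present."""
    env = os.environ if env is None else env
    return any(var in env for var in SCHEDULER_ENV_VARS)


def should_kill_processes(allow_process_kill: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Process cleanup runs only when enabled AND outside shared clusters."""
    return allow_process_kill and not in_shared_compute(env)


def acquire_lock(lock_path: Path) -> None:
    """Atomically create the lock file; fail if another run already holds it."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"Another run holds {lock_path}") from None
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))  # record owner PID for diagnostics


lock = Path(tempfile.gettempdir()) / "orchard_demo_sketch.lock"
lock.unlink(missing_ok=True)  # start clean for the demo
acquire_lock(lock)
try:
    acquire_lock(lock)  # a second acquisition must fail
except RuntimeError as exc:
    print(exc)
lock.unlink()

print(should_kill_processes(True, env={"SLURM_JOB_ID": "123"}))  # False
print(should_kill_processes(True, env={}))  # True
```

`O_EXCL` makes creation atomic, so two runs racing for the same project name cannot both succeed.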
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | HardwareAwareConfig | Configuration object with hardware.allow_process_kill and hardware.lock_file_path attributes. | required |
| logger | Logger \| None | Logger for status messages. If None, the 'Infrastructure' logger is used. | None |
release_resources(cfg, logger=None)
Release system and hardware resources gracefully after experiment.
Performs cleanup to ensure resources are properly freed and available for subsequent runs. Handles errors gracefully to avoid blocking experiment completion.
Steps
- Release filesystem lock at cfg.hardware.lock_file_path
- Flush GPU/MPS memory caches to prevent VRAM fragmentation
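The release sequence can be sketched as follows. Several assumptions apply: PyTorch is treated as an optional dependency, and `torch.mps.empty_cache()` requires PyTorch 2.x. The cache flush is best-effort, while a lock-release `OSError` still propagates, consistent with the Raises section. `release_resources_sketch` and `flush_accelerator_caches` are hypothetical names.

```python
import logging
import tempfile
from pathlib import Path
from typing import Optional


def flush_accelerator_caches(logger: logging.Logger) -> None:
    """Best-effort cache flush; a missing or failing backend never blocks shutdown."""
    try:
        import torch  # optional dependency in this sketch
    except ImportError:
        logger.debug("PyTorch not installed; skipping cache flush")
        return
    try:
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached CUDA allocator blocks
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            torch.mps.empty_cache()  # PyTorch 2.x API for Apple MPS
    except Exception as exc:  # cleanup must not raise
        logger.warning("Cache flush failed: %s", exc)


def release_resources_sketch(lock_path: Path, logger: Optional[logging.Logger] = None) -> None:
    """Hypothetical release sequence: drop the lock, then flush caches."""
    logger = logger or logging.getLogger("Infrastructure")
    try:
        # An OSError here (e.g. PermissionError) still propagates to the caller.
        lock_path.unlink(missing_ok=True)
        logger.info("Released lock %s", lock_path)
    finally:
        # Runs even if lock release failed, so caches are still flushed.
        flush_accelerator_caches(logger)


lock = Path(tempfile.gettempdir()) / "orchard_release_demo.lock"
lock.write_text("12345")
release_resources_sketch(lock)
print(lock.exists())  # False
release_resources_sketch(lock)  # idempotent: a missing lock is not an error
```

Flushing in the `finally` clause means a failed lock release still frees accelerator memory before the error reaches the caller.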
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | HardwareAwareConfig | Configuration object with hardware.lock_file_path attribute. | required |
| logger | Logger \| None | Logger for status messages. If None, the 'Infrastructure' logger is used. | None |
Raises:
| Type | Description |
|---|---|
| OSError | If lock file cannot be released (e.g. permission denied). |