Releases: aws/sagemaker-hyperpod-cli
v3.6.0
Features
- Add EFA support in manifest for training jobs (#345)
- Add end-to-end example documentation (#350)
- Add 4 new HyperPod GA regions (ca-central-1, ap-southeast-3, ap-southeast-4, eu-south-2) (#360)
Enhancements
- Update documentation for elastic training arguments (#343)
- Upgrade Inference Operator helm chart (#346)
- Update MIG config for GPU operator (#358)
- Release Health Monitoring Agent 1.0.1249.0_1.0.359.0 with enhanced NVIDIA timeout analysis and bug fixes (#361)
Bug Fixes
Elastic Training Support for ReInvent Keynote 3
- Added new command-line arguments to the HyperPodTrainingOperator to support elastic training capabilities:
  - --elastic-replica-increment-step, --max-node-count, --elastic-graceful-shutdown-timeout-in-seconds, --elastic-scaling-timeout-in-seconds, --elastic-scale-up-snooze-time-in-seconds, --elastic-replica-discrete-values
- Enables dynamic scaling of compute resources during training operations
Hyperpod CLI V2 with Nova recipe support
Parker CLI, Fractional GPU Feature
- Added hp-devspace command set for ML dev environments
- New commands for dev space management: create, list, get, update, delete, geturl
- Support for namespace-based auth and resource isolation
- Added auth and get-config commands to check permissions and view default settings
- Users can request partial GPU resources using MIG profiles instead of full GPUs
- Added --accelerator-partition-type, --accelerator-partition-count, and --accelerator-partition-limit
- New list-accelerator-partition-type command to view available GPU partitions for instance types
v3.3.1
Features
- Describe cluster command
  - Users can run hyp describe cluster to view details about HyperPod clusters
- Jinja template handling logic for inference and training
  - Users can modify the Jinja template during the init experience for inference and training to add parameters supported by the CRD, enabling further CLI customization
- Cluster creation template versioning
  - Users can choose the CloudFormation template version during the cluster creation experience
- KVCache and intelligent routing for HyperPod Inference
  - The supported InferenceEndpointConfig CRD version is updated to v1
  - KVCache and intelligent routing support is added in template version 1.1
init experience Launch
Features
- Init Experience
  - Init, Validate, and Create JumpStart endpoints, custom endpoints, and PyTorch training jobs with local configuration
- Cluster management
  - Bug fixes for cluster creation
Bug fixes
Bug Fixes
- Bug fixes in cluster creation
v3.2.0 - Cluster Management and Init Experience
Features
- Cluster management
  - Creation of cluster stack
  - Describing and listing a cluster stack
  - Updating a cluster
- Init Experience
  - Init, Validate, and Create with local configurations
v3.1.0: Release tg (#209)
v3.1.0 (2025-08-13)
Features
- Task Governance feature for training jobs.