feat: add dynamo-inference project with Dynamo v0.9.0 + cross-node EFA LIBFABRIC #71
Open
dmvevents wants to merge 19 commits into aws-samples:main from
Conversation
This contribution adds comprehensive GPU-to-GPU distributed inference capabilities using the NVIDIA NIXL (NVIDIA Inference Xfer Library) framework.

**Key Components:**

Container Images:
- dynamo-vllm: vLLM backend with NIXL support
- dynamo-trtllm: TensorRT-LLM backend with NIXL support
- Production-ready Dockerfiles with UCX, EFA, and GPUDirect RDMA

Infrastructure:
- EKS/HyperPod deployment manifests
- ETCD coordination service setup
- EFA networking configuration
- GPUDirect RDMA kernel module integration

Documentation:
- Complete benchmarking guide (nixlbench)
- Installation and setup instructions
- Performance validation results (0.307 GB/sec peak bandwidth)
- Troubleshooting guides

Testing:
- nixlbench validation suite
- Multi-node GPU communication tests
- UCX backend performance benchmarks

This enables high-performance distributed inference workloads with cross-node GPU-to-GPU communication on AWS infrastructure.
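As a quick sanity check for the EFA networking pieces listed above, `fi_info` from libfabric can confirm that the EFA provider is visible inside the container. A minimal sketch; the helper name is hypothetical, but `fi_info -p efa` is the standard libfabric provider query:

```shell
# Hypothetical helper: verify the libfabric EFA provider is visible.
# fi_info ships with libfabric; -p filters output by provider name.
check_efa() {
  if ! command -v fi_info >/dev/null 2>&1; then
    echo "libfabric (fi_info) not found in PATH"
    return 1
  fi
  # On a working EFA setup this prints one or more "provider: efa" blocks.
  fi_info -p efa
}
```

On a node without EFA hardware the call fails cleanly, which makes it a cheap first step in troubleshooting the multi-node tests.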
…ddress PR feedback

Changes made to address PR review feedback for aws-samples#49:

1. Directory rename (nixl-distributed-inference → dynamo-inference)
   - Renamed project directory to align with Dynamo initiative naming
   - All file references updated to reflect the new structure
   - Git history preserved via `git mv`

2. Security: redact AWS account numbers
   - Replaced all instances of the AWS account ID with `<AWS_ACCOUNT_ID>`
   - Affected files: YAML configurations, documentation, build scripts, test logs
   - Total replacements: 28 instances across 12 files
   - No credentials or secrets exposed in commit history (verified)

3. Remove production terminology claims
   - "production-ready" → "deployment-ready"
   - "Production Ready" → "Deployment Ready"
   - "ready for production" → "ready for deployment"
   - "production deployment" → "deployment"
   - "production workloads" → "deployment workloads"
   - Preserved legitimate technical terms (Docker tags, filenames)
   - Complies with aws-samples org policy (no production guarantees)

Files modified:
- All documentation (.md files) updated for terminology
- Kubernetes deployment configs (.yaml) updated for account redaction
- Build scripts (.sh) updated for account redaction
- Test logs (.log) updated for account redaction
- README.md updated with deployment-ready language

This update ensures the contribution meets aws-samples organization standards for public repositories, removing sensitive AWS account identifiers and production readiness claims.
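The account-ID redaction described above can be re-verified mechanically before pushing. A minimal sketch, assuming a POSIX shell with GNU grep; the function name is hypothetical, and the 12-digit pattern matches the shape of AWS account IDs:

```shell
# Hypothetical check: list any remaining 12-digit runs that could be
# AWS account IDs. \b word boundaries avoid matching longer digit strings
# such as timestamps embedded in filenames.
scan_for_account_ids() {
  grep -rEn '\b[0-9]{12}\b' "$1" || true
}
```

Running it against the repo root and getting empty output means no candidate IDs remain in tracked files.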
Replace 'production' terminology with appropriate alternatives to comply
with aws-samples organization policy:
- Rename Dockerfile.production → Dockerfile.base
- Update Docker tags: production → optimized
- Update container type references: production → base
- Update documentation: production → deployment (context-dependent)
Files changed:
- Dockerfile.base (renamed from Dockerfile.production)
- Build scripts: build.sh, build_trtllm.sh, build-all-{slim,runtime}.sh
- Validation: scripts/validate-build.sh
- Dockerfiles: Dockerfile.dynamo-{vllm,trtllm}
- Documentation: README.md, VERSION_ALIGNMENT_APPLIED.md, and 9 other docs
Legitimate contextual uses remain (e.g., "production deployment",
"production workload testing").
This commit adds complete TRT-LLM deployment capabilities to the Dynamo inference platform, including:
- Disaggregated TRT-LLM deployment configurations (prefill/decode workers)
- KV cache transceiver configurations for disaggregated mode
- Helper scripts and a benchmark suite for testing
- Comprehensive deployment documentation with troubleshooting
- A10 GPU support documentation for workshop deployments
- Benchmark results from H100 testing

Key features:
- Working TRT-LLM disaggregated architecture with 2x prefill + 2x decode workers
- Triton Unicode bug workaround applied automatically
- ConfigMap-based configuration management
- Tested and validated on H100 GPUs (381-470 tokens/sec)
- Full A10 GPU support with optimized configurations

Files added:
- deployments/trtllm/trtllm-disagg-qwen.yaml
- deployments/trtllm/trtllm-prefill-config.yaml
- deployments/trtllm/trtllm-decode-config.yaml
- scripts/trtllm-helpers.sh
- scripts/benchmark-trtllm.sh
- docs/TRTLLM_DEPLOYMENT_GUIDE.md
- docs/A10_DEPLOYMENT_GUIDE.md
- benchmark-results/trtllm_benchmark_20251118_191302.md
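Since the workers read their configuration from ConfigMaps, the manifests above are applied ConfigMaps-first. A hedged sketch of that ordering; the `KUBECTL` override is just a convenience for previewing the commands, not something the repo's helper scripts necessarily provide:

```shell
# Sketch: apply the TRT-LLM disaggregated manifests in dependency order.
# Set KUBECTL=echo to preview the commands without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

deploy_trtllm_disagg() {
  # ConfigMaps first so the workers find their configuration on startup.
  "$KUBECTL" apply -f deployments/trtllm/trtllm-prefill-config.yaml
  "$KUBECTL" apply -f deployments/trtllm/trtllm-decode-config.yaml
  # Then the disaggregated deployment (2x prefill + 2x decode workers).
  "$KUBECTL" apply -f deployments/trtllm/trtllm-disagg-qwen.yaml
}
```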
Replace friend/colleague references with neutral technical language:
- "friend configuration" → "reference configuration"
- "colleague" → "reference implementation"
- "successful friend configuration" → "validated configuration"

Files updated:
- NIXLBENCH_SETUP_GUIDE.md
- nixl-aligned/VERSION_COMPARISON.md
**Critical fix**: correct libfabric version references
- Changed all v1.21.0 references to v2.3.0 (matches Dockerfile and validated tests)
- Files: nixl-aligned/GETTING_STARTED.md, VERSION_COMPARISON.md, README.md

**Format standardization**:
- Removed all emojis and replaced them with text ([Completed], [No], [Warning])
- Standardized header capitalization (ALL CAPS → Standard Case)
- Files affected: 13 markdown files across the project

**Summary of changes**:
- libfabric version alignment: 10 files updated
- Emoji removal: ~150 replacements across 13 files
- Header standardization: docs/KUBECTL_QUICK_REF.md, NIXLBENCH_TESTING_GUIDE.md

All documentation now matches repository standards and validated test configurations.
- Add NON_INTERACTIVE environment variable support to all build scripts (build.sh, build_vllm.sh, build_trtllm.sh, build-all-slim.sh, build-all-runtime.sh)
- Remove informal language from documentation
- Replace AWS account numbers with the placeholder <AWS_ACCOUNT_ID>
- Remove status markers from documentation sections
- Standardize terminology (Production -> Debloated/Optimized)
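The NON_INTERACTIVE variable presumably gates interactive prompts in the build scripts so they can run unattended in CI. A minimal sketch of that pattern; the `confirm` helper is hypothetical, not taken from the actual scripts:

```shell
# Hypothetical prompt helper illustrating the NON_INTERACTIVE pattern:
# when NON_INTERACTIVE=1 (e.g. in CI), skip the prompt and proceed.
confirm() {
  if [ "${NON_INTERACTIVE:-0}" = "1" ]; then
    return 0
  fi
  printf '%s [y/N] ' "$1"
  read -r reply
  [ "$reply" = "y" ] || [ "$reply" = "Y" ]
}
```

With this shape, `NON_INTERACTIVE=1 ./build.sh` runs straight through while an interactive invocation still asks before expensive rebuilds.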
…ation

Update all component versions to align with the Dynamo v0.9.0 release.

Version updates:
- Dynamo: 0.4.0/0.6.1 -> v0.9.0
- NIXL: 0.6.0/0.7.1 -> 0.9.0
- TRT-LLM: 1.1.0rc5 -> 1.3.0rc3
- vLLM: 0.10.2/0.11.0 -> 0.14.1
- CUDA base: 25.01/cuda12.8 -> 25.12/cuda13.1
- PyTorch: 25.06-py3 -> 25.12-py3
- EFA installer: 1.43.1 -> 1.45.1
- NCCL: 2.23.4-1 -> 2.28.9-1
- AWS OFI NCCL: v1.12.0-aws -> v1.17.2
- GDRCopy: 2.4.1 -> 2.5.1
- UCX: v1.19.0 -> v1.20.0

New: cross-node disaggregated inference deployment YAML
- Validated on 2x P5.48xlarge with NIXL LIBFABRIC over EFA RDMA
- EFA handshake confirmed between nodes
- 128Gi memory requirement documented (64Gi causes OOM)

Files updated:
- Dockerfile.base, Dockerfile.dynamo-trtllm, Dockerfile.dynamo-vllm
- build_trtllm.sh, build_vllm.sh
- README.md (version tables, cross-node section)
- NGC build references
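The 128Gi requirement noted above maps to a resources stanza in the worker pod spec. An illustrative fragment, not copied from the actual YAML; the commit records that 64Gi caused OOM, so 128Gi is the documented floor:

```yaml
# Illustrative worker resources; 64Gi was observed to OOM, so use 128Gi.
resources:
  requests:
    memory: "128Gi"
  limits:
    memory: "128Gi"
```

Setting the request equal to the limit gives the pod Guaranteed QoS for memory, which avoids eviction pressure during long inference runs.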
Summary
Add the `dynamo-inference` project with validated disaggregated inference on AWS using NVIDIA Dynamo v0.9.0 with EFA RDMA networking.

What's included
Dockerfiles (EFA-optimized base + framework images)
- `Dockerfile.base` — NIXL 0.9.0 + EFA 1.45.1 + UCX 1.20.0 + NCCL 2.28.9 + CUDA 13.1
- `Dockerfile.dynamo-trtllm` — TensorRT-LLM 1.3.0rc3
- `Dockerfile.dynamo-vllm` — vLLM 0.14.1

Cross-node disaggregated inference
- `deployments/cross-node/trtllm-crossnode-libfabric.yaml` — validated on 2x P5.48xlarge

Build scripts and documentation
- `build.sh`, `build_trtllm.sh`, `build_vllm.sh` — automated builds

Component Versions
Validation
Tested on 2x P5.48xlarge (8x H100 80GB, 32 EFA per node):
Test plan