
feat: add dynamo-inference project with Dynamo v0.9.0 + cross-node EFA LIBFABRIC #71

Open
dmvevents wants to merge 19 commits into aws-samples:main from dmvevents:feat/dynamo-v0.9.0-efa-update

Conversation

@dmvevents
Contributor

Summary

Add the dynamo-inference project with production-validated disaggregated inference on AWS using NVIDIA Dynamo v0.9.0 with EFA RDMA networking.

What's included

Dockerfiles (EFA-optimized base + framework images)

  • Dockerfile.base — NIXL 0.9.0 + EFA 1.45.1 + UCX 1.20.0 + NCCL 2.28.9 + CUDA 13.1
  • Dockerfile.dynamo-trtllm — TensorRT-LLM 1.3.0rc3
  • Dockerfile.dynamo-vllm — vLLM 0.14.1

Cross-node disaggregated inference

  • deployments/cross-node/trtllm-crossnode-libfabric.yaml — Production-validated on 2x P5.48xlarge
  • NIXL LIBFABRIC backend over EFA RDMA with confirmed cross-node handshake
  • 128Gi pod memory requirement documented (needed to enumerate all 32 EFA devices; 64Gi causes OOM)
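For reference, a minimal sketch of the resource block such a cross-node worker pod needs (values taken from the description above; the actual manifest in deployments/cross-node/trtllm-crossnode-libfabric.yaml may place them differently):

```yaml
# Per-worker resource limits on a P5.48xlarge node (illustrative sketch)
resources:
  limits:
    nvidia.com/gpu: 8              # all 8 H100s on the node
    vpc.amazonaws.com/efa: 32      # all 32 EFA devices (requires the EFA device plugin)
    memory: 128Gi                  # 64Gi is insufficient during EFA device enumeration
```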

Build scripts and documentation

  • build.sh, build_trtllm.sh, build_vllm.sh — automated builds
  • Benchmarking guide, NGC build variants, deployment scripts
  • Complete README with version matrix and troubleshooting

Component Versions

Component      Version
Dynamo         v0.9.0
NIXL           0.9.0
TRT-LLM        1.3.0rc3
vLLM           0.14.1
CUDA           13.1
EFA installer  1.45.1
NCCL           2.28.9-1
UCX            1.20.0

Validation

Tested on 2x P5.48xlarge (8x H100 80GB, 32 EFA per node):

  • NIXL LIBFABRIC backend activation confirmed
  • Cross-node EFA handshake (8 bidirectional connections)
  • Inference: 147ms latency with Qwen3-0.6B
  • 10 concurrent requests: all successful
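The 10-request check can be reproduced with a simple client-side fan-out; the endpoint path and model name below are assumptions (Dynamo's frontend typically serves an OpenAI-compatible API on port 8000):

```
# Fire 10 concurrent chat-completion requests at the frontend (sketch only;
# adjust ENDPOINT and the model name to match your deployment)
ENDPOINT="${ENDPOINT:-http://localhost:8000/v1/chat/completions}"
seq 1 10 | xargs -P 10 -I{} curl -s -o /dev/null -w "request {} -> HTTP %{http_code}\n" \
  "$ENDPOINT" \
  -H 'Content-Type: application/json' \
  -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 8}'
```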

Test plan

  • Cross-node inference on P5.48xlarge with EFA
  • NIXL LIBFABRIC backend + EFA handshake verified
  • Full Docker build from scratch
  • Multi-GPU (TP8) production test

dmvevents and others added 19 commits November 14, 2025 02:30
This contribution adds comprehensive GPU-to-GPU distributed inference capabilities using the NVIDIA Inference Xfer Library (NIXL).

**Key Components:**

Container Images:
- dynamo-vllm: vLLM backend with NIXL support
- dynamo-trtllm: TensorRT-LLM backend with NIXL support
- Production-ready Dockerfiles with UCX, EFA, GPUDirect RDMA

Infrastructure:
- EKS/HyperPod deployment manifests
- ETCD coordination service setup
- EFA networking configuration
- GPUDirect RDMA kernel module integration

Documentation:
- Complete benchmarking guide (nixlbench)
- Installation and setup instructions
- Performance validation results (0.307 GB/sec peak bandwidth)
- Troubleshooting guides

Testing:
- nixlbench validation suite
- Multi-node GPU communication tests
- UCX backend performance benchmarks

This enables high-performance distributed inference workloads with cross-node GPU-to-GPU communication on AWS infrastructure.
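Pods wired for EFA with GPUDirect RDMA typically also carry libfabric environment settings; a hedged sketch (the repository's manifests may use a different combination):

```yaml
# Common libfabric/EFA environment for a worker container (illustrative)
env:
  - name: FI_PROVIDER
    value: "efa"                   # pin libfabric to the EFA provider
  - name: FI_EFA_USE_DEVICE_RDMA
    value: "1"                     # enable GPUDirect RDMA over EFA
  - name: NCCL_DEBUG
    value: "INFO"                  # surface NCCL transport selection in logs
```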
…ddress PR feedback

Changes made to address PR review feedback for aws-samples#49:

1. Directory Rename (nixl-distributed-inference → dynamo-inference)
   - Renamed project directory to align with Dynamo initiative naming
   - All file references updated to reflect new structure
   - Git history preserved via git mv command

2. Security: Redact AWS Account Numbers
   - Replaced all instances of the AWS account ID with <AWS_ACCOUNT_ID>
   - Affected files: YAML configurations, documentation, build scripts, test logs
   - Total replacements: 28 instances across 12 files
   - No credentials or secrets exposed in commit history (verified)

3. Remove Production Terminology Claims
   - Changed "production-ready" → "deployment-ready"
   - Changed "Production Ready" → "Deployment Ready"
   - Changed "ready for production" → "ready for deployment"
   - Changed "production deployment" → "deployment"
   - Changed "production workloads" → "deployment workloads"
   - Preserved legitimate technical terms (Docker tags, filenames)
   - Compliance with aws-samples org policy (no production guarantees)

Files Modified:
- All documentation (.md files) updated for terminology
- Kubernetes deployment configs (.yaml) updated for account redaction
- Build scripts (.sh) updated for account redaction
- Test logs (.log) updated for account redaction
- README.md updated with deployment-ready language

This update ensures the contribution meets aws-samples organization
standards for public repositories, removing sensitive AWS account
identifiers and production readiness claims.
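A redaction pass like the one described can be done with a single sed sweep; the exact command is an assumption (the commit only records the outcome), and the account ID below is a placeholder:

```shell
# Replace a 12-digit AWS account ID in an ECR image URI with a placeholder
# (simplistic pattern: a 12-digit run followed by ".dkr.ecr")
line='image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/dynamo-base:latest'
redacted=$(printf '%s\n' "$line" | sed 's/[0-9]\{12\}\.dkr\.ecr/<AWS_ACCOUNT_ID>.dkr.ecr/g')
echo "$redacted"
# -> image: <AWS_ACCOUNT_ID>.dkr.ecr.us-west-2.amazonaws.com/dynamo-base:latest
```

In a repository this would be run over the matching files, e.g. `grep -rl '\.dkr\.ecr' --include='*.yaml' . | xargs sed -i ...`.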
Replace 'production' terminology with appropriate alternatives to comply
with aws-samples organization policy:
- Rename Dockerfile.production → Dockerfile.base
- Update Docker tags: production → optimized
- Update container type references: production → base
- Update documentation: production → deployment (context-dependent)

Files changed:
- Dockerfile.base (renamed from Dockerfile.production)
- Build scripts: build.sh, build_trtllm.sh, build-all-{slim,runtime}.sh
- Validation: scripts/validate-build.sh
- Dockerfiles: Dockerfile.dynamo-{vllm,trtllm}
- Documentation: README.md, VERSION_ALIGNMENT_APPLIED.md, and 9 other docs

Legitimate contextual uses remain (e.g., "production deployment",
"production workload testing").
This commit adds complete TRT-LLM deployment capabilities to the Dynamo
inference platform, including:

- Disaggregated TRT-LLM deployment configurations (prefill/decode workers)
- KV cache transceiver configurations for disaggregated mode
- Helper scripts and benchmark suite for testing
- Comprehensive deployment documentation with troubleshooting
- A10 GPU support documentation for workshop deployments
- Benchmark results from H100 testing

Key features:
- Working TRT-LLM disaggregated architecture with 2x prefill + 2x decode workers
- Triton Unicode bug workaround applied automatically
- ConfigMap-based configuration management
- Tested and validated on H100 GPUs (381-470 tokens/sec)
- Full A10 GPU support with optimized configurations

Files added:
- deployments/trtllm/trtllm-disagg-qwen.yaml
- deployments/trtllm/trtllm-prefill-config.yaml
- deployments/trtllm/trtllm-decode-config.yaml
- scripts/trtllm-helpers.sh
- scripts/benchmark-trtllm.sh
- docs/TRTLLM_DEPLOYMENT_GUIDE.md
- docs/A10_DEPLOYMENT_GUIDE.md
- benchmark-results/trtllm_benchmark_20251118_191302.md
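The prefill/decode ConfigMaps above typically wrap a TRT-LLM engine config whose KV-cache transceiver section enables disaggregated transfer; a minimal sketch (field names follow the TRT-LLM LLM-API YAML schema; the values are placeholders, not the PR's actual settings):

```yaml
# Hypothetical prefill-worker config fragment for disaggregated serving
tensor_parallel_size: 1
disable_overlap_scheduler: true    # prefill side usually disables overlap scheduling
cache_transceiver_config:
  backend: DEFAULT                 # KV-cache transfer backend (UCX or NIXL also possible)
  max_tokens_in_buffer: 4096
```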
Replace friend/colleague references with neutral technical language:
- 'friend configuration' → 'reference configuration'
- 'colleague' → 'reference implementation'
- 'successful friend configuration' → 'validated configuration'

Files updated:
- NIXLBENCH_SETUP_GUIDE.md
- nixl-aligned/VERSION_COMPARISON.md
**Critical fix**: Correct libfabric version references
- Changed all v1.21.0 references to v2.3.0 (matches Dockerfile and validated tests)
- Files: nixl-aligned/GETTING_STARTED.md, VERSION_COMPARISON.md, README.md

**Format standardization**:
- Removed all emojis (✅, ❌, ⚠️) and replaced with text ([Completed], [No], [Warning])
- Standardized header capitalization (ALLCAPS → Standard Case)
- Files affected: 13 markdown files across project

**Summary of changes**:
- libfabric version alignment: 10 files updated
- Emoji removal: ~150 replacements across 13 files
- Header standardization: docs/KUBECTL_QUICK_REF.md, NIXLBENCH_TESTING_GUIDE.md

All documentation now matches repository standards and validated test configurations.
- Add NON_INTERACTIVE environment variable support to all build scripts
  (build.sh, build_vllm.sh, build_trtllm.sh, build-all-slim.sh, build-all-runtime.sh)
- Remove informal language from documentation
- Replace AWS account numbers with placeholder <AWS_ACCOUNT_ID>
- Remove status markers from documentation sections
- Standardize terminology (Production -> Debloated/Optimized)
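The NON_INTERACTIVE guard those scripts can use looks roughly like this (a sketch; the actual prompt text and logic in build.sh may differ):

```shell
#!/usr/bin/env bash
# Sketch of a NON_INTERACTIVE guard for a build script (hypothetical pattern,
# not the repository's exact code)
confirm() {
  if [ "${NON_INTERACTIVE:-0}" = "1" ]; then
    return 0                      # CI mode: skip the prompt entirely
  fi
  read -r -p "$1 [y/N] " reply
  [ "$reply" = "y" ]
}

if NON_INTERACTIVE=1 confirm "Build dynamo-base image?"; then
  echo "proceeding with build"
fi
```

Invoked as `NON_INTERACTIVE=1 ./build.sh`, the script then runs end to end without prompting, which is what CI pipelines need.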
…ation

Update all component versions to align with Dynamo v0.9.0 release:

Version updates:
- Dynamo: 0.4.0/0.6.1 -> v0.9.0
- NIXL: 0.6.0/0.7.1 -> 0.9.0
- TRT-LLM: 1.1.0rc5 -> 1.3.0rc3
- vLLM: 0.10.2/0.11.0 -> 0.14.1
- CUDA base: 25.01/cuda12.8 -> 25.12/cuda13.1
- PyTorch: 25.06-py3 -> 25.12-py3
- EFA installer: 1.43.1 -> 1.45.1
- NCCL: 2.23.4-1 -> 2.28.9-1
- AWS OFI NCCL: v1.12.0-aws -> v1.17.2
- GDRCopy: 2.4.1 -> 2.5.1
- UCX: v1.19.0 -> v1.20.0

New: cross-node disaggregated inference deployment YAML
- Validated on 2x P5.48xlarge with NIXL LIBFABRIC over EFA RDMA
- EFA handshake confirmed between nodes
- 128Gi memory requirement documented (64Gi causes OOM)

Files updated:
- Dockerfile.base, Dockerfile.dynamo-trtllm, Dockerfile.dynamo-vllm
- build_trtllm.sh, build_vllm.sh
- README.md (version tables, cross-node section)
- NGC build references