feat(nodes): GPU detection and K8s node hardware info #166
Merged
thestumonkey merged 3 commits into dev on Feb 27, 2026
- Worker: _collect_gpu_info() queries nvidia-smi/rocm-smi for model, VRAM, and CUDA/ROCm version. get_capabilities() includes gpu_count, gpu_devices, gpu_model, gpu_vram_mb. handle_info() now exposes capabilities for discovery.
- Backend: GPUDevice model + UNodeCapabilities extended with GPU fields (all optional, with defaults for backward compatibility).
- K8s model: KubernetesNode gains gpu_capacity_nvidia/gpu_capacity_amd from the nvidia.com/gpu and amd.com/gpu extended resources.
- Frontend: KubernetesNode TS interface + kubernetesApi.listNodes(). ClusterNodeList component (lazy-loaded, expandable, GPU badges). UNode cards show GPU model, VRAM, count, and CUDA/ROCm version. KubernetesClustersPage integrates ClusterNodeList per cluster card. All interactive elements have data-testid attributes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
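For readers who want a concrete picture of the worker-side collection, here is a minimal sketch of how nvidia-smi output could be turned into the gpu_count / gpu_devices / gpu_model / gpu_vram_mb fields. It is not the code from this PR. The GPUDevice field names, the helper names, and the choice to take gpu_model and gpu_vram_mb from the first device are all assumptions made for illustration. It uses a stdlib dataclass instead of Pydantic so it runs without extra dependencies, and it covers only the NVIDIA path.

```python
import shutil
import subprocess
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class GPUDevice:
    # Hypothetical field names; the PR's actual GPUDevice model may differ.
    index: int
    model: str
    vram_mb: int
    driver_version: Optional[str] = None


def parse_nvidia_smi_csv(output: str) -> List[GPUDevice]:
    """Parse lines of the form 'name, memory.total, driver_version'."""
    devices: List[GPUDevice] = []
    for line in output.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        parts = [p.strip() for p in line.rsplit(",", 2)]
        if len(parts) != 3:
            continue
        name, mem, driver = parts
        try:
            vram_mb = int(float(mem))
        except ValueError:
            # nvidia-smi can report "[N/A]" for fields a device does not support.
            continue
        devices.append(GPUDevice(len(devices), name, vram_mb, driver or None))
    return devices


def collect_nvidia_gpus() -> List[GPUDevice]:
    """Return detected NVIDIA GPUs, or an empty list if nvidia-smi is unusable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            [
                "nvidia-smi",
                "--query-gpu=name,memory.total,driver_version",
                "--format=csv,noheader,nounits",
            ],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return []
    return parse_nvidia_smi_csv(result.stdout)


def gpu_capabilities(devices: List[GPUDevice]) -> Dict[str, Any]:
    """Build the GPU portion of a capabilities payload (assumed shape)."""
    first = devices[0] if devices else None
    return {
        "gpu_count": len(devices),
        "gpu_devices": [asdict(d) for d in devices],
        "gpu_model": first.model if first else None,
        "gpu_vram_mb": first.vram_mb if first else None,
    }
```

Keeping the parser separate from the subprocess call means the parsing logic can be unit-tested on machines without a GPU, and a missing or failing nvidia-smi yields an empty device list.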
thestumonkey added a commit that referenced this pull request on Feb 27, 2026
Summary
- _collect_gpu_info() queries nvidia-smi and rocm-smi to collect GPU model, VRAM, CUDA toolkit version, and ROCm version per device. Falls back to the driver version if nvcc isn't available. Results are surfaced in both heartbeats and the discovery /info endpoint.
- GPUDevice Pydantic model + extended UNodeCapabilities with gpu_count, gpu_devices, gpu_model, gpu_vram_mb (all optional with safe defaults; fully backward-compatible with older workers).
- KubernetesNode now parses the nvidia.com/gpu and amd.com/gpu extended resources from node capacity into gpu_capacity_nvidia / gpu_capacity_amd.
- ClusterNodeList component (lazy-loaded, expandable) embedded in each K8s cluster card, showing node status, roles, CPU/mem, kubelet version, OS, and GPU badges. UNode cards on ClusterPage now display GPU model, VRAM, count, and CUDA/ROCm version when a GPU is present.

Test plan
- gpu_devices in MongoDB with model + VRAM
- GET /api/kubernetes/{id}/nodes on GPU cluster → gpu_capacity_nvidia populated
- grep -r "data-testid" ushadow/frontend/src/components/kubernetes/ClusterNodeList.tsx passes

🤖 Generated with Claude Code
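
The gpu_capacity_nvidia check in the test plan depends on reading extended resources out of a node's capacity map. A rough sketch of that parsing, assuming the capacity arrives as the string-valued mapping Kubernetes reports under status.capacity, might look like the following. The function names are hypothetical, and returning None for a missing or malformed resource is an assumption; the PR may default to 0 instead.

```python
from typing import Dict, Mapping, Optional

# Extended resource names as they appear in a node's status.capacity.
NVIDIA_GPU_RESOURCE = "nvidia.com/gpu"
AMD_GPU_RESOURCE = "amd.com/gpu"


def parse_extended_resource(capacity: Mapping[str, str], name: str) -> Optional[int]:
    """Return the integer quantity for an extended resource, or None if absent or invalid."""
    raw = capacity.get(name)
    if raw is None:
        return None
    try:
        # Extended resources are advertised as whole-number quantities.
        return int(raw)
    except (TypeError, ValueError):
        return None


def gpu_capacity_from_node(capacity: Mapping[str, str]) -> Dict[str, Optional[int]]:
    """Map node capacity onto the two GPU fields named in this PR."""
    return {
        "gpu_capacity_nvidia": parse_extended_resource(capacity, NVIDIA_GPU_RESOURCE),
        "gpu_capacity_amd": parse_extended_resource(capacity, AMD_GPU_RESOURCE),
    }
```

For a node reporting {"cpu": "16", "memory": "65806460Ki", "nvidia.com/gpu": "2"}, this yields gpu_capacity_nvidia = 2 and gpu_capacity_amd = None, so nodes without GPUs stay distinguishable from nodes advertising zero.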