Add DGX OS 7 Packer template for MAAS #6

Merged
dholt merged 13 commits into DeepOps:master from dholt:add-dgxos7-template
Feb 4, 2026

@dholt dholt commented Feb 4, 2026

Summary

  • Add new dgxos7/ directory with Packer template for building NVIDIA DGX OS 7 images deployable via MAAS
  • DGX OS 7 is based on Ubuntu 24.04 LTS with kernel 6.8, supporting DGX H100/H200/B200/B300/A100 platforms
  • Re-add shared scripts/tar-root for extracting root filesystem (used by dgxos5)

Key Features

  • Uses DGX installer's native force-ai mechanism for autoinstall config delivery
  • Automatically accepts NVIDIA EULA via nooemconfig kernel parameter
  • Installs full NVIDIA driver stack (570.x) with platform-specific packages
  • Properly resets cloud-init state for MAAS deployment
  • UEFI boot support (required for DGX OS 7)

Test Plan

  • Built dgxos7.tar.gz image (~2.9GB) on Proxmox build host
  • Uploaded image to MAAS
  • Deployed to test VM via MAAS
  • Verified: Ubuntu 24.04, kernel 6.8.0-87-generic, nvidia-driver-570-open installed
  • Verified: cloud-init completed successfully, hostname set by MAAS

dholt and others added 13 commits February 3, 2026 16:03

This adds support for building MAAS-deployable images of NVIDIA DGX OS 7,
which is based on Ubuntu 24.04 LTS with kernel 6.8.

Key changes:
- New dgxos7/ directory with packer template and autoinstall config
- Uses DGX installer's native force-ai parameter to deliver custom config
- Supports DGX H100, H200, B200, B300, and A100 platforms
- Requires UEFI boot (OVMF firmware)

Also includes:
- Fix scripts/tar-root to handle GPT partition layouts (root on p2 vs p1)
- Update README.md with template overview
- Update dgxos5/README.md to reference dgxos7 for newer hardware

Tested with DGX OS 7.3.1 ISO on QEMU/KVM with OVMF.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove cloud-init.disabled marker so cloud-init runs on deployment
- Disable cloud-init network config so MAAS can manage networking
- Add fallback netplan DHCP config for packer build phase

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add proper network section with DHCP for all en* interfaces
- Remove cloud-init network disable (let cloud-init work normally)
- Remove late-commands netplan hack (was virtio-only, broke Intel NICs)
- Keep rm of cloud-init.disabled marker so cloud-init runs on MAAS deploy

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The DGX installer's 99-installer.cfg has a deferred write_files that
recreates /etc/cloud/cloud-init.disabled after first boot, which breaks
MAAS deployment. Delete this config file in late-commands.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add provisioner to remove cloud-init.disabled and 99-installer.cfg
  after first boot but before tarball creation
- These files were causing cloud-init to be disabled on MAAS deploy
- Add sync command to ensure filesystem changes persist to disk
- Remove redundant late-command (now handled by provisioner)

Cloud-init now properly starts on MAAS-deployed systems.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…NSTALL_PKGS

The build was only installing base Ubuntu because:
1. force-platform was hardcoded to 'kvm' instead of using the platform variable
2. packer-ai.yaml was missing the CHANGE_INSTALL_PKGS placeholder

Changes:
- dgxos7.json: Use {{user `platform`}} (defaults to dgx_h100) instead of hardcoded kvm
- packer-ai.yaml: Add apt config to disable Ubuntu suites during install
- packer-ai.yaml: Add CHANGE_INSTALL_PKGS late-command for DGX package installation
- packer-ai.yaml: Mount ISO cdrom for package installation
- packer-ai.yaml: Restore apt sources.list.d after install
- packer-ai.yaml: Enable nvidia-persistenced service

The preseed.sh script substitutes CHANGE_INSTALL_PKGS with packages from
/ai/${platform}-pkgs based on the force-platform setting.
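
The substitution described above can be sketched roughly as follows. This is a minimal illustration, not the actual preseed.sh from the DGX installer: the function name `fill_install_pkgs`, the one-package-per-line file format, and the use of `sed` are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of the placeholder substitution; the real preseed.sh
# may work differently. Assumes the package file lists one package per line.

# fill_install_pkgs TEMPLATE PKGS_FILE
# Prints TEMPLATE with every CHANGE_INSTALL_PKGS token replaced by the
# space-separated package names read from PKGS_FILE.
fill_install_pkgs() {
    template="$1"
    pkgs_file="$2"

    # Drop blank lines, then join the remaining names with single spaces.
    pkgs=$(grep -v '^[[:space:]]*$' "$pkgs_file" | tr '\n' ' ' | sed 's/ *$//')

    # '|' is the sed delimiter; package names are assumed not to contain
    # '|', '&', or backslashes, which sed would treat specially.
    sed "s|CHANGE_INSTALL_PKGS|$pkgs|g" "$template"
}

# Example (paths follow the naming in this commit message):
#   fill_install_pkgs packer-ai.yaml "/ai/dgx_h100-pkgs"
```

Because the package names come from the ISO at install time, a template written this way would not need editing when the package set changes.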

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The boot_command was appending parameters to the wrong line in the
GRUB editor. With only one <down> after 'e', parameters landed on a
blank line instead of the linux command line.

Changes:
- Fix boot_command: use two <down> presses after 'e' to reach linux line
- Add nooemconfig parameter to skip OEM config packages
- Use hardcoded 10.0.2.2 for QEMU user-mode networking
- Add early-commands to replace nv-eula.sh with no-op
- Add late-commands for dpkg-divert EULA bypass
- Use hardcoded package list excluding nvidia-oem-config-eula
- Update .gitignore to exclude Claude Code files and build artifacts
- Update README with correct boot parameters

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Run cloud-init clean --logs to reset cloud-init state before creating
the tarball. Without this, cloud-init thinks it already ran during
the packer build and won't re-run on MAAS deployment.

Also remove 90-installer-network.cfg and /etc/fstab to let MAAS
control networking and storage configuration.
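
A cleanup step along these lines might look like the sketch below. It is not the provisioner from dgxos7.json: the function name `prepare_for_maas`, the optional ROOT argument, and the `/etc/cloud/cloud.cfg.d/` location of the two installer configs are assumptions. Running `cloud-init clean` before deleting the disable files follows the ordering that a later commit in this PR settles on.

```shell
#!/bin/sh
# Hypothetical cleanup sketch; the actual provisioner may differ.
# The cloud.cfg.d location of the installer configs is an assumption.

# prepare_for_maas [ROOT]
# ROOT defaults to "/"; passing another directory allows a dry run on a copy.
prepare_for_maas() {
    root="${1:-/}"
    root="${root%/}"   # "/" becomes "", so the paths below stay well-formed

    # Reset cloud-init state first so it runs again on MAAS deployment.
    # Only attempted on the live root and when the CLI is available.
    if [ -z "$root" ] && command -v cloud-init >/dev/null 2>&1; then
        cloud-init clean --logs
    fi

    # Then remove the marker and installer configs that would disable
    # cloud-init or pin networking, plus fstab so MAAS controls storage.
    rm -f "$root/etc/cloud/cloud-init.disabled" \
          "$root/etc/cloud/cloud.cfg.d/99-installer.cfg" \
          "$root/etc/cloud/cloud.cfg.d/90-installer-network.cfg" \
          "$root/etc/fstab"

    # Flush changes to disk before the image is captured.
    sync
}
```

Taking the root as a parameter keeps the function testable against a scratch directory without touching the build host.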

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add curtin-hooks and setup-bootloader scripts for proper UEFI boot
- Fix provisioner order: run cloud-init clean before removing disable files
- Add first-boot marker for dgx-release service
- Use fuse-tar-root instead of tar-root for image creation
- Update README with build dependencies and clarify tarball extraction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The tar-root script is needed by dgxos5 which uses the older
setup-nbd/tar-root approach. Auto-detects GPT vs MBR layouts.
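
The layout handling can be illustrated with a small sketch. This is not the code in scripts/tar-root: the function name `root_partition`, the `/dev/nbd0` device name, and the idea that partition 1 on a GPT disk is the EFI system partition are assumptions. Only the p2-versus-p1 choice comes from this PR.

```shell
#!/bin/sh
# Hypothetical sketch; the real scripts/tar-root may detect layouts differently.

# root_partition DEVICE PTTYPE
# Prints the partition assumed to hold the root filesystem:
#   gpt  -> partition 2 (partition 1 assumed to be the EFI system partition)
#   else -> partition 1 (MBR, reported as "dos" by util-linux tools)
root_partition() {
    dev="$1"
    pttype="$2"
    case "$pttype" in
        gpt) echo "${dev}p2" ;;
        *)   echo "${dev}p1" ;;
    esac
}

# On a build host the table type could be read with, for example:
#   pttype=$(lsblk -dno PTTYPE /dev/nbd0)
#   mount "$(root_partition /dev/nbd0 "$pttype")" /mnt
```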

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Replace hardcoded package list with CHANGE_INSTALL_PKGS placeholder
- preseed.sh substitutes this with platform-specific packages at install time
- Add fallback `|| apt-get install -f -y` to handle dependency issues
- Add EULA notice to README

This makes the template more maintainable as package updates come from
the ISO rather than requiring manual updates to the template.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add notes to README about:
- MAAS CLI requiring image file in current working directory
- How to replace an existing image in MAAS

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@dholt dholt self-assigned this Feb 4, 2026
@dholt dholt merged commit bf6e5c8 into DeepOps:master Feb 4, 2026
1 of 2 checks passed