From 5ed9d635ce0ef05efd3c3ccb61ed200077e259c7 Mon Sep 17 00:00:00 2001 From: Douglas Holt Date: Tue, 27 Jan 2026 11:59:55 -0700 Subject: [PATCH 01/13] Add DGX OS 7 (Ubuntu 24.04) packer template for MAAS This adds support for building MAAS-deployable images of NVIDIA DGX OS 7, which is based on Ubuntu 24.04 LTS with kernel 6.8. Key changes: - New dgxos7/ directory with packer template and autoinstall config - Uses DGX installer's native force-ai parameter to deliver custom config - Supports DGX H100, H200, B200, B300, and A100 platforms - Requires UEFI boot (OVMF firmware) Also includes: - Fix scripts/tar-root to handle GPT partition layouts (root on p2 vs p1) - Update README.md with template overview - Update dgxos5/README.md to reference dgxos7 for newer hardware Tested with DGX OS 7.3.1 ISO on QEMU/KVM with OVMF. Co-Authored-By: Claude Opus 4.5 --- README.md | 111 ++++++++++++------------- dgxos5/README.md | 6 ++ dgxos7/.gitignore | 1 + dgxos7/Makefile | 36 +++++++++ dgxos7/README.md | 162 +++++++++++++++++++++++++++++++++++++ dgxos7/dgxos7.json | 47 +++++++++++ dgxos7/http/packer-ai.yaml | 67 +++++++++++++++ 7 files changed, 376 insertions(+), 54 deletions(-) create mode 100644 dgxos7/.gitignore create mode 100644 dgxos7/Makefile create mode 100644 dgxos7/README.md create mode 100644 dgxos7/dgxos7.json create mode 100644 dgxos7/http/packer-ai.yaml diff --git a/README.md b/README.md index d59ec323..a625efff 100644 --- a/README.md +++ b/README.md @@ -1,11 +1,11 @@ # Building Custom Images for MAAS with Packer -This repository provides [Packer](https://developer.hashicorp.com/packer) templates, scripts, and configuration to build custom operating system images for [MAAS](https://maas.io). +This repository provides [Packer](https://developer.hashicorp.com/packer) templates, scripts, and configuration to build custom operating system images for [MAAS](https://maas.io). 
Use these templates if you need: -- **Custom Ubuntu images** with pre-installed packages, security hardening, or organization-specific tweaks. -- **Non-Ubuntu images** (RHEL, CentOS, SLES, Windows, ESXi, etc.) that MAAS does not provide out-of-the-box. -- A **repeatable, automated build process** for images you can upload into MAAS. +- **Custom Ubuntu images** with pre-installed packages, security hardening, or organization-specific tweaks. +- **Non-Ubuntu images** (RHEL, CentOS, SLES, Windows, ESXi, etc.) that MAAS does not provide out-of-the-box. +- A **repeatable, automated build process** for images you can upload into MAAS. > ⚠️ If you only need stock Ubuntu images, see the [How to manage images](https://canonical.com/maas/docs/how-to-manage-images) guide instead. @@ -13,10 +13,10 @@ Use these templates if you need: ## Why build custom images? -- **Consistency**: Standardize environments across your MAAS deployments. -- **Control**: Add, remove, or patch software before deployment. -- **Compliance**: Ensure security and audit requirements are met. -- **Coverage**: Deploy non-Ubuntu operating systems through MAAS. +- **Consistency**: Standardize environments across your MAAS deployments. +- **Control**: Add, remove, or patch software before deployment. +- **Compliance**: Ensure security and audit requirements are met. +- **Coverage**: Deploy non-Ubuntu operating systems through MAAS. MAAS relies on these images when commissioning, deploying, and testing machines. Custom images let you tailor exactly what gets deployed. @@ -27,12 +27,12 @@ MAAS relies on these images when commissioning, deploying, and testing machines. 
Before building an image, prepare a build environment: - An Ubuntu host or VM with: - - 4 CPU cores - - 8 GB RAM - - 25 GB free storage - - Hardware-assisted virtualization enabled -- [Packer installed](https://developer.hashicorp.com/packer/tutorials/docker-get-started/get-started-install-cli#installing-packer) -- (Optional) QEMU with GUI if you want to see builds interactively + - 4 CPU cores + - 8 GB RAM + - 25 GB free storage + - Hardware-assisted virtualization enabled +- [Packer installed](https://developer.hashicorp.com/packer/tutorials/docker-get-started/get-started-install-cli#installing-packer) +- (Optional) QEMU with GUI if you want to see builds interactively Verify Packer is installed: @@ -56,13 +56,13 @@ cd packer-maas ### 2. Select a template Each supported operating system has its own directory with: -- One or more **HCL2 templates** -- A `scripts/` directory with helper scripts -- An `http/` directory with auto-configuration files -- A `README.md` explaining OS-specific details -- A `Makefile` for convenience +- One or more **HCL2 templates** +- A `scripts/` directory with helper scripts +- An `http/` directory with auto-configuration files +- A `README.md` explaining OS-specific details +- A `Makefile` for convenience -See the [Existing templates](#existing-templates) table below for the full list. +See the [Existing templates](#existing-templates) table below for the full list. ### 3. Build the image @@ -80,8 +80,8 @@ PACKER_LOG=1 packer build ubuntu.pkr.hcl ``` If you want to view the VM build process: -- Remove `"headless": true` from the template -- Or connect via VNC using the IP/port shown during build +- Remove `"headless": true` from the template +- Or connect via VNC using the IP/port shown during build ### 4. 
Upload the image to MAAS @@ -91,7 +91,7 @@ After building, upload the image to your MAAS region controller: maas $PROFILE boot-resources create name='custom/ubuntu-24.04' title='Ubuntu 24.04 Custom' architecture='amd64/generic' filetype='tgz' content@=ubuntu-24.04-custom.tgz ``` -> ℹ️ Commands vary slightly by OS — see the template’s `README.md` for exact syntax. +> ℹ️ Commands vary slightly by OS — see the template's `README.md` for exact syntax. ### 5. Verify in MAAS @@ -107,7 +107,7 @@ Then deploy a test machine with the new image and confirm you can log in. ### Ubuntu example (quick start) -Here’s a complete example for building and uploading a custom Ubuntu image. +Here's a complete example for building and uploading a custom Ubuntu image. 1. **Install dependencies** @@ -167,12 +167,12 @@ ssh ubuntu@ Every OS template can be adjusted to include your own configuration. Common options include: -- Adding extra packages in the provisioner step -- Including custom cloud-init or preseed files -- Adjusting the Packer `boot_command` to change installation behavior -- Changing image names (`name`, `title`) to avoid cache conflicts +- Adding extra packages in the provisioner step +- Including custom cloud-init or preseed files +- Adjusting the Packer `boot_command` to change installation behavior +- Changing image names (`name`, `title`) to avoid cache conflicts -Refer to the `README.md` inside each OS directory for supported parameters. +Refer to the `README.md` inside each OS directory for supported parameters. --- @@ -187,6 +187,7 @@ Refer to the `README.md` inside each OS directory for supported parameters. 
| [AzureLinux 2.0](azurelinux/README.md) | Beta | x86_64 | >= 3.3 | | [CentOS 6](centos6/README.md) | EOL | x86_64 | >= 1.6 | | [CentOS 7](centos7/README.md) | EOL | x86_64 | >= 2.3 | +| [CentOS 7 UFM](centos7-ufm/README.md) | Beta | x86_64 | >= 2.3 | | [CentOS 8](centos8/README.md) | EOL | x86_64 | >= 2.7 | | [CentOS 8 Stream](centos8-stream/README.md) | Beta | x86_64 | >= 3.2 | | [CentOS 9 Stream](centos9-stream/README.md) | Beta | x86_64 | >= 3.2 | @@ -194,6 +195,8 @@ Refer to the `README.md` inside each OS directory for supported parameters. | [Debian 11](debian/README.md) | Beta | x86_64 / aarch64 | >= 3.2 | | [Debian 12](debian/README.md) | Beta | x86_64 / aarch64 | >= 3.2 | | [Debian 13](debian/README.md) | Beta | x86_64 / aarch64 | >= 3.2 | +| [DGX OS 5](dgxos5/README.md) | Beta | x86_64 | >= 3.0 | +| [DGX OS 7](dgxos7/README.md) | Beta | x86_64 | >= 3.0 | | [Fedora Server 41](fedora-server/README.md) | Beta | x86_64 / aarch64 | >= 3.2 | | [Fedora Server 42](fedora-server/README.md) | Beta | x86_64 / aarch64 | >= 3.2 | | [OL8](ol8/README.md) | Alpha | x86_64 | >= 3.5 | @@ -225,56 +228,56 @@ Refer to the `README.md` inside each OS directory for supported parameters. ## Maturity levels -- **Stable**: Tested and suitable for production. -- **Beta**: Works in most cases but needs broader validation. -- **Alpha**: Depends on unreleased MAAS or Curtin versions; not production-ready. -- **EOL**: Upstream OS is no longer supported — not recommended. +- **Stable**: Tested and suitable for production. +- **Beta**: Works in most cases but needs broader validation. +- **Alpha**: Depends on unreleased MAAS or Curtin versions; not production-ready. +- **EOL**: Upstream OS is no longer supported — not recommended. --- ## Debugging builds -- Use `PACKER_LOG=1` to enable verbose logging. -- Use `FOREGROUND=1` to keep processes in the foreground. +- Use `PACKER_LOG=1` to enable verbose logging. +- Use `FOREGROUND=1` to keep processes in the foreground. 
- To view the build VM: - - Remove `headless=true` in the template, or - - Connect via VNC using the IP/port printed during build. + - Remove `headless=true` in the template, or + - Connect via VNC using the IP/port printed during build. --- ## Best practices for uploading images -- Follow examples in each OS’s `README.md`. +- Follow examples in each OS's `README.md`. - The `name` parameter is formatted as `prefix/os-name`. The `os-name` can include dashes, dots and numbers but no space and special characters. -- Use **unique names** for images to avoid cache collisions. +- Use **unique names** for images to avoid cache collisions. - The `title` parameter is free text format as long as enclosed in quotes. -- Upload from a machine close to the MAAS region controller to reduce latency. -- Test images on a small scale before wide deployment. +- Upload from a machine close to the MAAS region controller to reduce latency. +- Test images on a small scale before wide deployment. --- ## Contributing new templates -We welcome contributions. +We welcome contributions. ### Project structure Each OS directory typically contains: -- One or more `.pkr.hcl` templates -- `scripts/` for provisioning -- `http/` for installer automation -- A `README.md` with OS-specific instructions -- A `Makefile` for build automation +- One or more `.pkr.hcl` templates +- `scripts/` for provisioning +- `http/` for installer automation +- A `README.md` with OS-specific instructions +- A `Makefile` for build automation ### Submit a new template -1. [Fork the repo](https://github.com/canonical/packer-maas/fork). -2. Create a branch. -3. Add a new directory or `.pkr.hcl` template. -4. Run `packer validate .` to check. -5. Commit and push. -6. Open a pull request. +1. [Fork the repo](https://github.com/canonical/packer-maas/fork). +2. Create a branch. +3. Add a new directory or `.pkr.hcl` template. +4. Run `packer validate .` to check. +5. Commit and push. +6. Open a pull request. 
--- ## Next steps -- [How to manage images in MAAS](https://canonical.com/maas/docs/how-to-manage-images) +- [How to manage images in MAAS](https://canonical.com/maas/docs/how-to-manage-images) diff --git a/dgxos5/README.md b/dgxos5/README.md index 94c200f3..8c67df6d 100644 --- a/dgxos5/README.md +++ b/dgxos5/README.md @@ -1,3 +1,9 @@ +# DGX OS 5 Packer Template for MAAS + +> **Note:** For newer DGX systems (H100, H200, B200, B300) and newer DGX A100 systems running DGX OS 7 (Ubuntu 24.04), use the [dgxos7](../dgxos7/) template instead. This template is for legacy DGX-1, DGX-2, and older DGX A100 systems running DGX OS 5 (Ubuntu 20.04). + +## Prerequisites + Install dependencies (Ubuntu 18.04 and 20.04) ```sh diff --git a/dgxos7/.gitignore b/dgxos7/.gitignore new file mode 100644 index 00000000..726fb458 --- /dev/null +++ b/dgxos7/.gitignore @@ -0,0 +1 @@ +claude.md diff --git a/dgxos7/Makefile b/dgxos7/Makefile new file mode 100644 index 00000000..7530f2c3 --- /dev/null +++ b/dgxos7/Makefile @@ -0,0 +1,36 @@ +PACKER ?= packer +ISO ?= ${DGXOS7_ISO_PATH} +CHECKSUM ?= ${DGXOS7_SHA256SUM} +PLATFORM ?= dgx_h100 + +.PHONY: all clean check-iso + +all: dgxos7.tar.gz + +dgxos7.tar.gz: check-iso clean + sudo PACKER_LOG=1 ${PACKER} build \ + -var "dgxos7_iso=${ISO}" \ + -var "dgxos7_sha256sum=${CHECKSUM}" \ + -var "platform=${PLATFORM}" \ + dgxos7.json + reset + +check-iso: + @if [ -z "${ISO}" ]; then \ + echo "Error: DGXOS7_ISO_PATH environment variable not set"; \ + echo "Please set DGXOS7_ISO_PATH to the path of your DGX OS 7 ISO"; \ + exit 1; \ + fi + @if [ ! 
-f "${ISO}" ]; then \ + echo "Error: ISO file not found at ${ISO}"; \ + exit 1; \ + fi + @if [ -z "${CHECKSUM}" ]; then \ + echo "Error: DGXOS7_SHA256SUM environment variable not set"; \ + echo "Please set DGXOS7_SHA256SUM to the SHA256 checksum of your ISO"; \ + echo "You can generate it with: sha256sum ${ISO}"; \ + exit 1; \ + fi + +clean: + sudo ${RM} -rf output-qemu dgxos7.tar.gz diff --git a/dgxos7/README.md b/dgxos7/README.md new file mode 100644 index 00000000..3fd720cb --- /dev/null +++ b/dgxos7/README.md @@ -0,0 +1,162 @@ +# DGX OS 7 Packer Template for MAAS + +## Overview + +This directory contains a Packer template for building NVIDIA DGX OS 7 images deployable via MAAS (Metal as a Service). DGX OS 7 is based on Ubuntu 24.04 LTS with kernel 6.8 and includes NVIDIA-optimized configurations for the latest DGX hardware platforms. + +## Supported Platforms + +| Platform | Hardware | DGX OS 7 Support | +|-------------|----------------|------------------| +| dgx_h100 | DGX H100 | ✓ | +| dgx_h200 | DGX H200 | ✓ | +| dgx_b200 | DGX B200 | ✓ | +| dgx_b300 | DGX B300 | ✓ | +| dgx_a100 | DGX A100 | ✓ | + +**Note:** For older DGX-1, DGX-2, or legacy DGX A100 systems running DGX OS 5, use the [dgxos5](../dgxos5/) template instead. + +## Prerequisites + +### Software Requirements + +- **Packer** 1.7.0 or later +- **QEMU/KVM** with UEFI support (OVMF) +- **Linux build environment** (Ubuntu 22.04+ recommended) +- **sudo access** (required for NBD device operations) +- **MAAS** 3.0 or later (for deployment) + +### DGX OS 7 ISO + +DGX OS 7 ISOs are available to customers with an NVIDIA Enterprise Support account: + +1. Log in to the [NVIDIA Enterprise Support Portal](https://nvid.nvidia.com/) +2. Navigate to: **Software Downloads** → **DGX OS** +3. Download the latest DGX OS 7.x ISO +4. 
Verify the ISO integrity with the provided SHA256 checksum + +## Building the Image + +### Using Make (Recommended) + +```bash +export DGXOS7_ISO_PATH=/path/to/DGXOS-7.x.x.iso +export DGXOS7_SHA256SUM=$(sha256sum $DGXOS7_ISO_PATH | cut -d' ' -f1) + +# Build (uses kvm platform for VM builds) +sudo make +``` + +### Direct Packer Command + +```bash +export DGXOS7_ISO_PATH=/path/to/DGXOS-7.x.x.iso +export DGXOS7_SHA256SUM=$(sha256sum $DGXOS7_ISO_PATH | cut -d' ' -f1) + +sudo PACKER_LOG=1 packer build dgxos7.json +``` + +### Build Output + +After a successful build (~60 minutes), you will have: +- **dgxos7.tar.gz** - Deployable MAAS image (~1.3GB) + +## Uploading to MAAS + +```bash +PROFILE=admin + +maas $PROFILE boot-resources create \ + name='custom/dgxos7' \ + title='NVIDIA DGX OS 7' \ + architecture='amd64/generic' \ + filetype='tgz' \ + base_image='ubuntu/noble' \ + content@=dgxos7.tar.gz +``` + +## Deployment + +### UEFI Boot Requirement + +DGX OS 7 **requires UEFI boot**. Ensure your MAAS machines are configured for UEFI in the machine's Configuration → Firmware settings. + +### Deploy via CLI + +```bash +maas $PROFILE machine deploy \ + osystem='custom' \ + distro_series='dgxos7' +``` + +## How It Works + +This template uses the DGX OS 7 installer's native autoinstall mechanism: + +1. **Boot**: GRUB is edited to add `force-ai=http://...` parameter pointing to Packer's HTTP server +2. **Install**: The DGX installer's `preseed.sh` downloads our custom autoinstall config (`http/packer-ai.yaml`) +3. **Configure**: Our config sets up partitions, creates an `ubuntu` user, enables SSH +4. 
**Complete**: After install, Packer connects via SSH, shuts down the VM, and creates the MAAS tarball + +### Key Boot Parameters + +- `force-ai=http://...` - Points to custom autoinstall config +- `force-platform=kvm` - Uses VM-optimized configuration +- `force-bootdisk=vda` - Specifies boot disk +- `no-mlnx-fw-update` - Skips Mellanox firmware updates (not needed in VM) +- `ip=dhcp` - Enables networking for HTTP config fetch + +## Customization + +### Modifying the Autoinstall Config + +Edit `http/packer-ai.yaml` to customize: +- Partition layout +- User accounts and passwords +- SSH configuration +- Late-command scripts + +### Adjusting VM Resources + +Edit `dgxos7.json`: +- **Disk size**: `disk_size` (default: 12G) +- **Memory**: `memory` (default: 4096 MB) +- **CPUs**: `-smp` in `qemuargs` (default: 8) + +## Troubleshooting + +### SSH Connection Timeout + +If Packer times out waiting for SSH: +1. Set `"headless": false` in dgxos7.json to enable VNC +2. Connect to VNC to observe the installation +3. Check that the autoinstall config was downloaded (look for preseed.sh output) + +### YAML Parse Errors + +If you see YAML errors in the installer: +1. Validate your config: `python3 -c "import yaml; yaml.safe_load(open('http/packer-ai.yaml'))"` +2. Ensure no CHANGE_* placeholders remain (we bypass preseed.sh's substitution) + +### Wrong Partition in Tarball + +The `scripts/tar-root` script must mount partition 2 (root) not partition 1 (EFI) for GPT layouts. This was fixed to auto-detect. + +## Technical Notes + +### DGX OS 7 vs DGX OS 5 + +| Feature | DGX OS 5 | DGX OS 7 | +|-------------------|-----------------|-----------------| +| Base OS | Ubuntu 20.04 | Ubuntu 24.04 | +| Installer | Curtin | Subiquity | +| Config delivery | force-curtin= | force-ai= | +| Kernel | 5.4.x | 6.8.x | + +### Why force-ai Instead of Standard Autoinstall + +DGX OS 7 includes an embedded autoinstall config that takes precedence over standard cloud-init datasources. 
The `force-ai` parameter tells the DGX installer's `preseed.sh` to download and use our custom config instead. + +## License + +This Packer template follows the same license as the parent packer-maas repository. diff --git a/dgxos7/dgxos7.json b/dgxos7/dgxos7.json new file mode 100644 index 00000000..461eb026 --- /dev/null +++ b/dgxos7/dgxos7.json @@ -0,0 +1,47 @@ +{ + "variables": + { + "platform": "dgx_h100", + "dgxos7_iso": "{{env `DGXOS7_ISO_PATH`}}", + "dgxos7_sha256sum": "{{env `DGXOS7_SHA256SUM`}}" + }, + "builders": [ + { + "type": "qemu", + "communicator": "ssh", + "ssh_username": "ubuntu", + "ssh_password": "ubuntu", + "ssh_timeout": "90m", + "iso_url": "file:{{user `dgxos7_iso`}}", + "iso_checksum": "sha256:{{user `dgxos7_sha256sum`}}", + "http_directory": "http", + "boot_command": [ + "e", + " force-ai=http://{{ .HTTPIP }}:{{ .HTTPPort }}/packer-ai.yaml force-platform=kvm force-bootdisk=vda no-mlnx-fw-update ip=dhcp", + "" + ], + "boot_wait": "3s", + "disk_size": "12G", + "headless": true, + "vnc_bind_address": "0.0.0.0", + "vnc_use_password": false, + "memory": 4096, + "qemuargs": [ + [ "-serial", "stdio" ], + [ "-bios", "/usr/share/ovmf/OVMF.fd" ], + [ "-smp", "8"] + ] + } + ], + "post-processors": [ + { + "type": "shell-local", + "inline_shebang": "/bin/bash -e", + "inline": [ + "source ../scripts/setup-nbd", + "OUTPUT='dgxos7.tar.gz'", + "source ../scripts/tar-root" + ] + } + ] +} diff --git a/dgxos7/http/packer-ai.yaml b/dgxos7/http/packer-ai.yaml new file mode 100644 index 00000000..ec8519d4 --- /dev/null +++ b/dgxos7/http/packer-ai.yaml @@ -0,0 +1,67 @@ +version: 1 + +locale: en_US.UTF-8 + +storage: + swap: + size: 0 + grub: + reorder_uefi: false + config: + - id: vda + type: disk + ptable: gpt + path: /dev/vda + name: osdisk + wipe: superblock-recursive + - id: vda-part1 + type: partition + device: vda + number: 1 + size: 512MB + flag: boot + grub_device: true + - id: vda-part2 + type: partition + device: vda + number: 2 + size: -1 + - id: 
vda-part1-fs1 + type: format + fstype: fat32 + label: efi + volume: vda-part1 + - id: vda-part2-fs1 + type: format + fstype: ext4 + label: root + volume: vda-part2 + - id: vda-mount2 + type: mount + path: / + device: vda-part2-fs1 + options: errors=remount-ro + - id: boot-mount1 + type: mount + path: /boot/efi + device: vda-part1-fs1 + +identity: + realname: Ubuntu User + username: ubuntu + hostname: dgxos + password: "$6$.wrtx7qik5WIZFX3$sycY/k94LlOlX/v/VVVVo6HiBecSx5GM/8j1xR91cLFx.hiEEmsPlFnssKF0a9tEAHIv2YN/QTz1Yq.z8G1ST." + +ssh: + install-server: true + allow-pw: true + +late-commands: + - sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication yes/' /target/etc/ssh/sshd_config + - sed -i 's/^#*PermitRootLogin.*/PermitRootLogin yes/' /target/etc/ssh/sshd_config + - curtin in-target -- systemctl enable ssh + - curtin in-target -- systemctl set-default multi-user.target + - echo 'ubuntu ALL=(ALL) NOPASSWD:ALL' > /target/etc/sudoers.d/ubuntu + - chmod 440 /target/etc/sudoers.d/ubuntu + - curtin in-target -- systemctl disable oem-config.target oem-config.service || true + - curtin in-target -- systemctl mask oem-config.target oem-config.service || true From 5cb4af159df329f98d635b58fa41d017c417b573 Mon Sep 17 00:00:00 2001 From: Douglas Holt Date: Fri, 30 Jan 2026 12:42:41 -0700 Subject: [PATCH 02/13] Enable cloud-init and configure network for MAAS deployment - Remove cloud-init.disabled marker so cloud-init runs on deployment - Disable cloud-init network config so MAAS can manage networking - Add fallback netplan DHCP config for packer build phase Co-Authored-By: Claude Opus 4.5 --- dgxos7/http/packer-ai.yaml | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/dgxos7/http/packer-ai.yaml b/dgxos7/http/packer-ai.yaml index ec8519d4..90ca706c 100644 --- a/dgxos7/http/packer-ai.yaml +++ b/dgxos7/http/packer-ai.yaml @@ -65,3 +65,19 @@ late-commands: - chmod 440 /target/etc/sudoers.d/ubuntu - curtin in-target -- systemctl disable oem-config.target 
oem-config.service || true - curtin in-target -- systemctl mask oem-config.target oem-config.service || true + - rm -f /target/etc/cloud/cloud-init.disabled + - | + cat > /target/etc/cloud/cloud.cfg.d/99-disable-network-config.cfg < /target/etc/netplan/00-dhcp.yaml < Date: Fri, 30 Jan 2026 13:36:09 -0700 Subject: [PATCH 03/13] Use autoinstall network section instead of late-commands hacks - Add proper network section with DHCP for all en* interfaces - Remove cloud-init network disable (let cloud-init work normally) - Remove late-commands netplan hack (was virtio-only, broke Intel NICs) - Keep rm of cloud-init.disabled marker so cloud-init runs on MAAS deploy Co-Authored-By: Claude Opus 4.5 --- dgxos7/http/packer-ai.yaml | 23 ++++++++--------------- 1 file changed, 8 insertions(+), 15 deletions(-) diff --git a/dgxos7/http/packer-ai.yaml b/dgxos7/http/packer-ai.yaml index 90ca706c..9c32f6c3 100644 --- a/dgxos7/http/packer-ai.yaml +++ b/dgxos7/http/packer-ai.yaml @@ -2,6 +2,14 @@ version: 1 locale: en_US.UTF-8 +network: + version: 2 + ethernets: + all-en: + match: + name: en* + dhcp4: true + storage: swap: size: 0 @@ -66,18 +74,3 @@ late-commands: - curtin in-target -- systemctl disable oem-config.target oem-config.service || true - curtin in-target -- systemctl mask oem-config.target oem-config.service || true - rm -f /target/etc/cloud/cloud-init.disabled - - | - cat > /target/etc/cloud/cloud.cfg.d/99-disable-network-config.cfg < /target/etc/netplan/00-dhcp.yaml < Date: Fri, 30 Jan 2026 14:18:17 -0700 Subject: [PATCH 04/13] Remove 99-installer.cfg that recreates cloud-init.disabled marker The DGX installer's 99-installer.cfg has a deferred write_files that recreates /etc/cloud/cloud-init.disabled after first boot, which breaks MAAS deployment. Delete this config file in late-commands. 
Co-Authored-By: Claude Opus 4.5 --- dgxos7/http/packer-ai.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/dgxos7/http/packer-ai.yaml b/dgxos7/http/packer-ai.yaml index 9c32f6c3..7b999e3f 100644 --- a/dgxos7/http/packer-ai.yaml +++ b/dgxos7/http/packer-ai.yaml @@ -74,3 +74,4 @@ late-commands: - curtin in-target -- systemctl disable oem-config.target oem-config.service || true - curtin in-target -- systemctl mask oem-config.target oem-config.service || true - rm -f /target/etc/cloud/cloud-init.disabled + - rm -f /target/etc/cloud/cloud.cfg.d/99-installer.cfg From 4424004e0cddb7e6e3ffca587d6ed45a4e2e4e83 Mon Sep 17 00:00:00 2001 From: Douglas Holt Date: Fri, 30 Jan 2026 16:42:43 -0700 Subject: [PATCH 05/13] Fix cloud-init for MAAS deployment - Add provisioner to remove cloud-init.disabled and 99-installer.cfg after first boot but before tarball creation - These files were causing cloud-init to be disabled on MAAS deploy - Add sync command to ensure filesystem changes persist to disk - Remove redundant late-command (now handled by provisioner) Cloud-init now properly starts on MAAS-deployed systems. 
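
One way to confirm the result is to check a built or deployed root filesystem for the files this patch removes. The following is a minimal sketch, not part of the template: the function name `check_cloud_init_markers` and its optional root-directory argument are illustrative assumptions, and the file list is taken from the provisioner added in this patch.

```sh
#!/bin/sh
# Report any leftover DGX installer files that keep cloud-init disabled.
# check_cloud_init_markers is a hypothetical helper, not part of packer-maas.
# Usage: check_cloud_init_markers [ROOT]   (ROOT defaults to /)
check_cloud_init_markers() {
    root="${1:-/}"
    status=0
    for rel in \
        etc/cloud/cloud-init.disabled \
        etc/cloud/cloud.cfg.d/99-installer.cfg \
        etc/cloud/clean.d/99-installer
    do
        # ${root%/} drops a trailing slash so "/" and "/mnt/img" both work.
        if [ -e "${root%/}/${rel}" ]; then
            echo "still present: /${rel}"
            status=1
        fi
    done
    return "$status"
}
```

Called with no argument on a deployed machine it inspects `/`; called with a mount point it inspects an unpacked image. A non-zero return means at least one of the files survived.
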
Co-Authored-By: Claude Opus 4.5 --- dgxos7/dgxos7.json | 18 ++++++++++++++++++ dgxos7/http/packer-ai.yaml | 1 - 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/dgxos7/dgxos7.json b/dgxos7/dgxos7.json index 461eb026..ede5a345 100644 --- a/dgxos7/dgxos7.json +++ b/dgxos7/dgxos7.json @@ -33,6 +33,24 @@ ] } ], + "provisioners": [ + { + "type": "shell", + "inline": [ + "echo 'Removing cloud-init.disabled...'", + "sudo rm -fv /etc/cloud/cloud-init.disabled || true", + "echo 'Removing 99-installer.cfg...'", + "sudo rm -fv /etc/cloud/cloud.cfg.d/99-installer.cfg || true", + "echo 'Removing 99-installer clean script...'", + "sudo rm -fv /etc/cloud/clean.d/99-installer || true", + "echo 'Listing remaining files...'", + "ls -la /etc/cloud/cloud.cfg.d/ || true", + "ls -la /etc/cloud/clean.d/ || true", + "echo 'Syncing filesystem...'", + "sync" + ] + } + ], "post-processors": [ { "type": "shell-local", diff --git a/dgxos7/http/packer-ai.yaml b/dgxos7/http/packer-ai.yaml index 7b999e3f..9c32f6c3 100644 --- a/dgxos7/http/packer-ai.yaml +++ b/dgxos7/http/packer-ai.yaml @@ -74,4 +74,3 @@ late-commands: - curtin in-target -- systemctl disable oem-config.target oem-config.service || true - curtin in-target -- systemctl mask oem-config.target oem-config.service || true - rm -f /target/etc/cloud/cloud-init.disabled - - rm -f /target/etc/cloud/cloud.cfg.d/99-installer.cfg From e890d2299baae8b33aa6921ce5b4c2d24c310e34 Mon Sep 17 00:00:00 2001 From: Douglas Holt Date: Fri, 30 Jan 2026 17:10:11 -0700 Subject: [PATCH 06/13] Fix DGX package installation - use platform variable and add CHANGE_INSTALL_PKGS The build was only installing base Ubuntu because: 1. force-platform was hardcoded to 'kvm' instead of using the platform variable 2. 
packer-ai.yaml was missing the CHANGE_INSTALL_PKGS placeholder Changes: - dgxos7.json: Use {{user `platform`}} (defaults to dgx_h100) instead of hardcoded kvm - packer-ai.yaml: Add apt config to disable Ubuntu suites during install - packer-ai.yaml: Add CHANGE_INSTALL_PKGS late-command for DGX package installation - packer-ai.yaml: Mount ISO cdrom for package installation - packer-ai.yaml: Restore apt sources.list.d after install - packer-ai.yaml: Enable nvidia-persistenced service The preseed.sh script substitutes CHANGE_INSTALL_PKGS with packages from /ai/${platform}-pkgs based on the force-platform setting. Co-Authored-By: Claude Opus 4.5 --- dgxos7/dgxos7.json | 2 +- dgxos7/http/packer-ai.yaml | 48 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 49 insertions(+), 1 deletion(-) diff --git a/dgxos7/dgxos7.json b/dgxos7/dgxos7.json index ede5a345..8af1527e 100644 --- a/dgxos7/dgxos7.json +++ b/dgxos7/dgxos7.json @@ -17,7 +17,7 @@ "http_directory": "http", "boot_command": [ "e", - " force-ai=http://{{ .HTTPIP }}:{{ .HTTPPort }}/packer-ai.yaml force-platform=kvm force-bootdisk=vda no-mlnx-fw-update ip=dhcp", + " force-ai=http://{{ .HTTPIP }}:{{ .HTTPPort }}/packer-ai.yaml force-platform={{user `platform`}} force-bootdisk=vda no-mlnx-fw-update ip=dhcp", "" ], "boot_wait": "3s", diff --git a/dgxos7/http/packer-ai.yaml b/dgxos7/http/packer-ai.yaml index 9c32f6c3..7f0fd66c 100644 --- a/dgxos7/http/packer-ai.yaml +++ b/dgxos7/http/packer-ai.yaml @@ -2,6 +2,15 @@ version: 1 locale: en_US.UTF-8 +apt: + preserve_sources_list: false + conf: | + Dpkg::Options { + "--force-confdef"; + "--force-confold"; + }; + disable_suites: [release, updates, backports, security, proposed] + network: version: 2 ethernets: @@ -65,12 +74,51 @@ ssh: allow-pw: true late-commands: + # Mount ISO for package installation + - mkdir -p /target/tmp/cdrom + - mount --bind /cdrom /target/tmp/cdrom || true + - | + cat << EOF > /target/etc/apt/sources.list + deb [check-date=no] file:///tmp/cdrom/ 
noble main restricted + EOF + + # Install DGX packages (CHANGE_INSTALL_PKGS substituted by preseed.sh) + - touch /target/tmp/ota_skip_write + - curtin in-target -- apt-get update -y + - curtin in-target -- /bin/bash -c "DEBIAN_FRONTEND=noninteractive RUN_FW_UPDATER=no apt-get install -y --no-install-recommends CHANGE_INSTALL_PKGS" + - curtin in-target -- apt-get purge -y unattended-upgrades || true + + # Restore apt sources for post-install + - echo "# Ubuntu sources have moved to /etc/apt/sources.list.d/ubuntu.sources" > /target/etc/apt/sources.list + - | + cat << EOF > /target/etc/apt/sources.list.d/ubuntu.sources + Types: deb + URIs: http://us.archive.ubuntu.com/ubuntu/ + Suites: noble noble-updates noble-backports + Components: main restricted universe multiverse + Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg + + Types: deb + URIs: http://security.ubuntu.com/ubuntu/ + Suites: noble-security + Components: main restricted universe multiverse + Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg + EOF + + # SSH configuration for packer - sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication yes/' /target/etc/ssh/sshd_config - sed -i 's/^#*PermitRootLogin.*/PermitRootLogin yes/' /target/etc/ssh/sshd_config - curtin in-target -- systemctl enable ssh - curtin in-target -- systemctl set-default multi-user.target - echo 'ubuntu ALL=(ALL) NOPASSWD:ALL' > /target/etc/sudoers.d/ubuntu - chmod 440 /target/etc/sudoers.d/ubuntu + + # Disable oem-config - curtin in-target -- systemctl disable oem-config.target oem-config.service || true - curtin in-target -- systemctl mask oem-config.target oem-config.service || true + + # Enable NVIDIA services + - curtin in-target -- systemctl enable nvidia-persistenced.service || true + + # Remove cloud-init.disabled (packer provisioner also does this as backup) - rm -f /target/etc/cloud/cloud-init.disabled From 8a56b88b4f6ac91ab0f1b3b678f0d12758cacf00 Mon Sep 17 00:00:00 2001 From: Douglas Holt Date: Tue, 3 Feb 
2026 09:33:28 -0700
Subject: [PATCH 07/13] Fix GRUB boot_command to correctly append kernel parameters

The boot_command was appending parameters to the wrong line in the GRUB
editor. With only one key press after 'e', parameters landed on a blank
line instead of the linux command line.

Changes:
- Fix boot_command: use two key presses after 'e' to reach linux line
- Add nooemconfig parameter to skip OEM config packages
- Use hardcoded 10.0.2.2 for QEMU user-mode networking
- Add early-commands to replace nv-eula.sh with no-op
- Add late-commands for dpkg-divert EULA bypass
- Use hardcoded package list excluding nvidia-oem-config-eula
- Update .gitignore to exclude Claude Code files and build artifacts
- Update README with correct boot parameters

Co-Authored-By: Claude Opus 4.5
---
 dgxos7/.gitignore          | 14 ++++++++++++++
 dgxos7/README.md           |  3 ++-
 dgxos7/dgxos7.json         |  2 +-
 dgxos7/http/packer-ai.yaml | 31 +++++++++++++++++++++++++++++--
 4 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/dgxos7/.gitignore b/dgxos7/.gitignore
index 726fb458..adea8e09 100644
--- a/dgxos7/.gitignore
+++ b/dgxos7/.gitignore
@@ -1 +1,15 @@
+# Claude Code
+.claude/
 claude.md
+CLAUDE.md
+tasks/
+
+# Build artifacts
+*.tar.gz
+output-qemu/
+
+# macOS
+.DS_Store
+
+# Test files
+*-test.json
diff --git a/dgxos7/README.md b/dgxos7/README.md
index 3fd720cb..ceb42ffc 100644
--- a/dgxos7/README.md
+++ b/dgxos7/README.md
@@ -101,8 +101,9 @@ This template uses the DGX OS 7 installer's native autoinstall mechanism:
 ### Key Boot Parameters
 
 - `force-ai=http://...` - Points to custom autoinstall config
-- `force-platform=kvm` - Uses VM-optimized configuration
+- `force-platform=dgx_h100` - Platform type (can be changed via `platform` variable)
 - `force-bootdisk=vda` - Specifies boot disk
+- `nooemconfig` - Disables OEM config packages that require interactive EULA acceptance
 - `no-mlnx-fw-update` - Skips Mellanox firmware updates (not needed in VM)
 - `ip=dhcp` - Enables networking for HTTP config fetch
 
diff 
--git a/dgxos7/dgxos7.json b/dgxos7/dgxos7.json index 8af1527e..76237c5a 100644 --- a/dgxos7/dgxos7.json +++ b/dgxos7/dgxos7.json @@ -17,7 +17,7 @@ "http_directory": "http", "boot_command": [ "e", - " force-ai=http://{{ .HTTPIP }}:{{ .HTTPPort }}/packer-ai.yaml force-platform={{user `platform`}} force-bootdisk=vda no-mlnx-fw-update ip=dhcp", + " force-ai=http://10.0.2.2:{{ .HTTPPort }}/packer-ai.yaml force-platform={{user `platform`}} force-bootdisk=vda nooemconfig no-mlnx-fw-update ip=dhcp", "" ], "boot_wait": "3s", diff --git a/dgxos7/http/packer-ai.yaml b/dgxos7/http/packer-ai.yaml index 7f0fd66c..514f4fe2 100644 --- a/dgxos7/http/packer-ai.yaml +++ b/dgxos7/http/packer-ai.yaml @@ -73,6 +73,19 @@ ssh: install-server: true allow-pw: true +early-commands: + # APPROACH: Replace nv-eula.sh with no-op in live environment before it runs + # The live system uses overlayfs, so we can modify "read-only" squashfs files + - | + if [ -f /usr/share/nvidia/nv-eula.sh ]; then + echo '#!/bin/sh' > /usr/share/nvidia/nv-eula.sh + echo 'exit 0' >> /usr/share/nvidia/nv-eula.sh + chmod +x /usr/share/nvidia/nv-eula.sh + echo "Replaced nv-eula.sh with no-op" + fi + # Backup: pre-seed debconf (in case replacement fails) + - echo 'nvidia nvidia/accepted-eula boolean true' | debconf-set-selections || true + late-commands: # Mount ISO for package installation - mkdir -p /target/tmp/cdrom @@ -82,10 +95,24 @@ late-commands: deb [check-date=no] file:///tmp/cdrom/ noble main restricted EOF - # Install DGX packages (CHANGE_INSTALL_PKGS substituted by preseed.sh) + # Pre-seed NVIDIA EULA in target before package installation + - curtin in-target -- /bin/bash -c "echo 'nvidia nvidia/accepted-eula boolean true' | debconf-set-selections" + + # Divert nv-eula.sh and replace with no-op BEFORE any package installation + - curtin in-target -- mkdir -p /usr/share/nvidia + - curtin in-target -- dpkg-divert --add --rename --divert /usr/share/nvidia/nv-eula.sh.real /usr/share/nvidia/nv-eula.sh || true + - 
| + cat << 'EOFSCRIPT' > /target/usr/share/nvidia/nv-eula.sh + #!/bin/sh + # No-op replacement - EULA auto-accepted for automated install + exit 0 + EOFSCRIPT + - chmod +x /target/usr/share/nvidia/nv-eula.sh + + # Install DGX packages - hardcoded list excluding nvidia-oem-config-* to avoid EULA prompt - touch /target/tmp/ota_skip_write - curtin in-target -- apt-get update -y - - curtin in-target -- /bin/bash -c "DEBIAN_FRONTEND=noninteractive RUN_FW_UPDATER=no apt-get install -y --no-install-recommends CHANGE_INSTALL_PKGS" + - curtin in-target -- /bin/bash -c "DEBIAN_FRONTEND=noninteractive RUN_FW_UPDATER=no apt-get install -y --no-install-recommends dgx-server-grub cuda-nvml-dev-12-8 nvidia-driver-570-open libnvidia-nscq-570 nvidia-fabricmanager-570 nvidia-persistenced nvidia-peermem-loader nvdebug nvfwupd nvsm nvidia-acs-disable datacenter-gpu-manager-4-cuda12 nvidia-mig-manager nvidia-system-core nvidia-system-utils nvidia-system-extra nvidia-grub-params nvidia-fs nvidia-fs-dkms nvidia-conf-cachefilesd nvidia-ipmisol nvidia-pci-bridge-power nvidia-modprobe dpkg-dev nvidia-fs-loader" - curtin in-target -- apt-get purge -y unattended-upgrades || true # Restore apt sources for post-install From c3307ef3a7ff9ff139b8ec86c3cf33fba95e906c Mon Sep 17 00:00:00 2001 From: Douglas Holt Date: Tue, 3 Feb 2026 09:34:53 -0700 Subject: [PATCH 08/13] Update README with correct platform and image size --- dgxos7/README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/dgxos7/README.md b/dgxos7/README.md index ceb42ffc..80067b10 100644 --- a/dgxos7/README.md +++ b/dgxos7/README.md @@ -43,7 +43,7 @@ DGX OS 7 ISOs are available to customers with an NVIDIA Enterprise Support accou export DGXOS7_ISO_PATH=/path/to/DGXOS-7.x.x.iso export DGXOS7_SHA256SUM=$(sha256sum $DGXOS7_ISO_PATH | cut -d' ' -f1) -# Build (uses kvm platform for VM builds) +# Build (defaults to dgx_h100 platform) sudo make ``` @@ -59,7 +59,7 @@ sudo PACKER_LOG=1 packer build dgxos7.json 
 ### Build Output
 
 After a successful build (~60 minutes), you will have:
-- **dgxos7.tar.gz** - Deployable MAAS image (~1.3GB)
+- **dgxos7.tar.gz** - Deployable MAAS image (~2.5GB)
 
 ## Uploading to MAAS
 
@@ -93,10 +93,10 @@ maas $PROFILE machine deploy \
 
 This template uses the DGX OS 7 installer's native autoinstall mechanism:
 
-1. **Boot**: GRUB is edited to add `force-ai=http://...` parameter pointing to Packer's HTTP server
-2. **Install**: The DGX installer's `preseed.sh` downloads our custom autoinstall config (`http/packer-ai.yaml`)
-3. **Configure**: Our config sets up partitions, creates an `ubuntu` user, enables SSH
-4. **Complete**: After install, Packer connects via SSH, shuts down the VM, and creates the MAAS tarball
+1. **Boot**: GRUB is edited to add kernel parameters (`force-ai`, `force-platform`, `nooemconfig`, etc.)
+2. **Download**: The DGX installer's `preseed.sh` downloads our custom autoinstall config (`http/packer-ai.yaml`)
+3. **Install**: Our config sets up partitions, installs NVIDIA packages (with EULA bypassed), creates an `ubuntu` user, enables SSH
+4. **Complete**: After install, Packer connects via SSH, removes cloud-init.disabled, shuts down the VM, and creates the MAAS tarball
 
 ### Key Boot Parameters
 

From 8644794765c5f8fba0b200a7bd40254556fa732d Mon Sep 17 00:00:00 2001
From: Douglas Holt
Date: Tue, 3 Feb 2026 13:03:17 -0700
Subject: [PATCH 09/13] Fix cloud-init state for MAAS deployment

Run cloud-init clean --logs to reset cloud-init state before creating
the tarball. Without this, cloud-init thinks it already ran during the
packer build and won't re-run on MAAS deployment.

Also remove 90-installer-network.cfg and /etc/fstab to let MAAS control
networking and storage configuration.

Co-Authored-By: Claude Opus 4.5
---
 dgxos7/dgxos7.json | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/dgxos7/dgxos7.json b/dgxos7/dgxos7.json
index 76237c5a..d26a0aa7 100644
--- a/dgxos7/dgxos7.json
+++ b/dgxos7/dgxos7.json
@@ -37,15 +37,18 @@
     {
       "type": "shell",
       "inline": [
-        "echo 'Removing cloud-init.disabled...'",
+        "echo 'Removing cloud-init disable files...'",
         "sudo rm -fv /etc/cloud/cloud-init.disabled || true",
-        "echo 'Removing 99-installer.cfg...'",
         "sudo rm -fv /etc/cloud/cloud.cfg.d/99-installer.cfg || true",
-        "echo 'Removing 99-installer clean script...'",
         "sudo rm -fv /etc/cloud/clean.d/99-installer || true",
-        "echo 'Listing remaining files...'",
+        "echo 'Removing installer-specific network config...'",
+        "sudo rm -fv /etc/cloud/cloud.cfg.d/90-installer-network.cfg || true",
+        "echo 'Resetting cloud-init state for MAAS deployment...'",
+        "sudo cloud-init clean --logs",
+        "echo 'Removing fstab to let MAAS configure storage...'",
+        "sudo rm -fv /etc/fstab || true",
+        "echo 'Listing remaining cloud-init config files...'",
         "ls -la /etc/cloud/cloud.cfg.d/ || true",
-        "ls -la /etc/cloud/clean.d/ || true",
         "echo 'Syncing filesystem...'",
         "sync"
       ]

From c558ce6d89c99cd19e4f8f463ca5c1e7a13c29fd Mon Sep 17 00:00:00 2001
From: Douglas Holt
Date: Tue, 3 Feb 2026 15:47:43 -0700
Subject: [PATCH 10/13] Add curtin hooks and fix cloud-init for MAAS deployment

- Add curtin-hooks and setup-bootloader scripts for proper UEFI boot
- Fix provisioner order: run cloud-init clean before removing disable files
- Add first-boot marker for dgx-release service
- Use fuse-tar-root instead of tar-root for image creation
- Update README with build dependencies and clarify tarball extraction

Co-Authored-By: Claude Opus 4.5
---
 dgxos7/README.md | 17 +++--
 dgxos7/dgxos7.json | 32 ++++++---
 dgxos7/scripts/curtin-hooks | 111 ++++++++++++++++++++++++++++++++
 dgxos7/scripts/setup-bootloader | 30 +++++++++
 4 files changed, 177 insertions(+), 13 deletions(-)
 create mode 100755 dgxos7/scripts/curtin-hooks
 create mode 100755 dgxos7/scripts/setup-bootloader

diff --git a/dgxos7/README.md b/dgxos7/README.md
index 80067b10..5de669cf 100644
--- a/dgxos7/README.md
+++ b/dgxos7/README.md
@@ -22,10 +22,19 @@ This directory contains a Packer template for building NVIDIA DGX OS 7 images de
 
 - **Packer** 1.7.0 or later
 - **QEMU/KVM** with UEFI support (OVMF)
-- **Linux build environment** (Ubuntu 22.04+ recommended)
-- **sudo access** (required for NBD device operations)
+- **Linux build environment** (Debian/Ubuntu recommended)
 - **MAAS** 3.0 or later (for deployment)
 
+### Build Dependencies
+
+Install the required packages for image creation:
+
+```bash
+# Debian/Ubuntu
+apt-get install -y qemu-system-x86 qemu-utils ovmf packer \
+    nbdkit libnbd-bin fuse2fs
+```
+
 ### DGX OS 7 ISO
 
 DGX OS 7 ISOs are available to customers with an NVIDIA Enterprise Support account:
@@ -96,7 +105,7 @@ This template uses the DGX OS 7 installer's native autoinstall mechanism:
 1. **Boot**: GRUB is edited to add kernel parameters (`force-ai`, `force-platform`, `nooemconfig`, etc.)
 2. **Download**: The DGX installer's `preseed.sh` downloads our custom autoinstall config (`http/packer-ai.yaml`)
 3. **Install**: Our config sets up partitions, installs NVIDIA packages (with EULA bypassed), creates an `ubuntu` user, enables SSH
-4. **Complete**: After install, Packer connects via SSH, removes cloud-init.disabled, shuts down the VM, and creates the MAAS tarball
+4. **Complete**: After install, Packer connects via SSH, installs curtin hooks, resets cloud-init state for MAAS, and creates the tarball
 
 ### Key Boot Parameters
 
@@ -141,7 +150,7 @@ If you see YAML errors in the installer:
 
 ### Wrong Partition in Tarball
 
-The `scripts/tar-root` script must mount partition 2 (root) not partition 1 (EFI) for GPT layouts. This was fixed to auto-detect.
+For GPT layouts, root is on partition 2 (partition 1 is EFI). The `ROOT_PARTITION=2` variable in the post-processor ensures the correct partition is extracted.
 
 ## Technical Notes
 
diff --git a/dgxos7/dgxos7.json b/dgxos7/dgxos7.json
index d26a0aa7..69d9936a 100644
--- a/dgxos7/dgxos7.json
+++ b/dgxos7/dgxos7.json
@@ -34,21 +34,34 @@
     }
   ],
   "provisioners": [
+    {
+      "type": "file",
+      "sources": ["scripts/curtin-hooks", "scripts/setup-bootloader"],
+      "destination": "/tmp/"
+    },
     {
       "type": "shell",
       "inline": [
-        "echo 'Removing cloud-init disable files...'",
+        "echo 'Installing curtin hooks for MAAS deployment...'",
+        "sudo mkdir -p /curtin",
+        "sudo mv /tmp/curtin-hooks /curtin/",
+        "sudo mv /tmp/setup-bootloader /curtin/",
+        "sudo chmod 750 /curtin/curtin-hooks /curtin/setup-bootloader",
+        "echo 'Resetting cloud-init state for MAAS deployment...'",
+        "sudo cloud-init clean --logs || true",
+        "echo 'Removing cloud-init disable files (after clean)...'",
         "sudo rm -fv /etc/cloud/cloud-init.disabled || true",
         "sudo rm -fv /etc/cloud/cloud.cfg.d/99-installer.cfg || true",
-        "sudo rm -fv /etc/cloud/clean.d/99-installer || true",
-        "echo 'Removing installer-specific network config...'",
         "sudo rm -fv /etc/cloud/cloud.cfg.d/90-installer-network.cfg || true",
-        "echo 'Resetting cloud-init state for MAAS deployment...'",
-        "sudo cloud-init clean --logs",
+        "sudo rm -fv /etc/cloud/clean.d/99-installer || true",
         "echo 'Removing fstab to let MAAS configure storage...'",
         "sudo rm -fv /etc/fstab || true",
-        "echo 'Listing remaining cloud-init config files...'",
-        "ls -la /etc/cloud/cloud.cfg.d/ || true",
+        "echo 'Verifying cloud-init is enabled...'",
+        "ls -la /etc/cloud/ | grep -v '^d' || true",
+        "echo 'Creating first-boot marker for dgx-release service...'",
+        "sudo touch /var/tmp/first-boot-dgx-release",
+        "echo 'Listing curtin hooks...'",
+        "ls -la /curtin/",
         "echo 'Syncing filesystem...'",
         "sync"
       ]
@@ -59,9 +72,10 @@
       "type": "shell-local",
       "inline_shebang": "/bin/bash -e",
       "inline": [
-        "source ../scripts/setup-nbd",
+        "ROOT_PARTITION=2",
         "OUTPUT='dgxos7.tar.gz'",
-        "source ../scripts/tar-root"
+        "source ../scripts/fuse-nbd",
+        "source ../scripts/fuse-tar-root"
       ]
     }
   ]
diff --git a/dgxos7/scripts/curtin-hooks b/dgxos7/scripts/curtin-hooks
new file mode 100755
index 00000000..037e5817
--- /dev/null
+++ b/dgxos7/scripts/curtin-hooks
@@ -0,0 +1,111 @@
+#!/usr/bin/env python3
+#
+# curtin-hooks - Curtin installation hooks for Ubuntu
+#
+# Based on canonical/packer-maas curtin-hooks script
+# Modified for DGX OS 7
+#
+# Copyright (C) 2021 Canonical
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as
+# published by the Free Software Foundation, either version 3 of the
+# License, or (at your option) any later version.
+
+import os
+import platform
+import shutil
+import sys
+
+from curtin import distro, util
+from curtin.commands import apt_config, curthooks
+from curtin.config import load_command_config
+from curtin.log import DEBUG, LOG, basicConfig
+from curtin.paths import target_path
+from curtin.util import ChrootableTarget, load_command_environment
+
+
+def run_hook_in_target(target, hook):
+    """Look for "hook" in "target" and run in a chroot"""
+    target_hook = target_path(target, "/curtin/" + hook)
+    if os.path.isfile(target_hook):
+        LOG.debug("running %s" % target_hook)
+        with ChrootableTarget(target=target) as in_chroot:
+            in_chroot.subp(["/curtin/" + hook])
+        return True
+    return False
+
+
+def curthook(cfg, target, state):
+    """Configure network and bootloader"""
+    LOG.info("Running curtin builtin curthooks")
+    state_etcd = os.path.split(state["fstab"])[0]
+    machine = platform.machine()
+
+    distro_info = distro.get_distroinfo(target=target)
+    if not distro_info:
+        raise RuntimeError("Failed to determine target distro")
+    osfamily = distro_info.family
+    LOG.info(
+        "Configuring target system for distro: %s osfamily: %s",
+        distro_info.variant,
+        osfamily,
+    )
+
+    sources = cfg.get("sources", {})
+    dd_image = len(util.get_dd_images(sources)) > 0
+
+    curthooks.disable_overlayroot(cfg, target)
+    curthooks.disable_update_initramfs(cfg, target, machine)
+    curthooks.install_missing_packages(cfg, target, osfamily=osfamily)
+
+    if not dd_image:
+        curthooks.configure_iscsi(cfg, state_etcd, target, osfamily=osfamily)
+        curthooks.configure_mdadm(cfg, state_etcd, target, osfamily=osfamily)
+        curthooks.copy_fstab(state.get("fstab"), target)
+        curthooks.add_swap(cfg, target, state.get("fstab"))
+
+    run_hook_in_target(target, "install-custom-packages")
+
+    if not dd_image:
+        curthooks.setup_kernel_img_conf(target)
+
+        crypttab_location = os.path.join(os.path.split(state["fstab"])[0], "crypttab")
+        if os.path.exists(crypttab_location):
+            curthooks.copy_crypttab(crypttab_location, target)
+
+        udev_rules_d = os.path.join(state["scratch"], "rules.d")
+        if os.path.isdir(udev_rules_d):
+            curthooks.copy_dname_rules(udev_rules_d, target)
+
+        apt_config.apply_debconf_selections(cfg, target)
+
+        curthooks.apply_networking(target, state)
+        curthooks.handle_pollinate_user_agent(cfg, target)
+
+    # re-enable update_initramfs
+    curthooks.enable_update_initramfs(cfg, target, machine)
+    curthooks.update_initramfs(target, all_kernels=True)
+
+    run_hook_in_target(target, "setup-bootloader")
+
+
+def cleanup():
+    """Remove curtin-hooks so its as if we were never here."""
+    curtin_dir = os.path.dirname(__file__)
+    shutil.rmtree(curtin_dir)
+
+
+def main():
+    state = load_command_environment()
+    config = load_command_config(None, state)
+    target = state["target"]
+
+    basicConfig(stream=sys.stderr, verbosity=DEBUG)
+
+    curthook(config, target, state)
+    cleanup()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/dgxos7/scripts/setup-bootloader b/dgxos7/scripts/setup-bootloader
new file mode 100755
index 00000000..ac224f1c
--- /dev/null
+++ b/dgxos7/scripts/setup-bootloader
@@ -0,0 +1,30 @@
+#!/bin/bash -ex
+#
+# setup-bootloader - Install GRUB bootloader for UEFI systems
+#
+# Based on canonical/packer-maas setup-bootloader script
+# Modified for DGX OS 7
+
+export DEBIAN_FRONTEND=noninteractive
+
+ARCH=$(dpkg --print-architecture)
+
+# Reconfigure GRUB and install to EFI partition
+dpkg-reconfigure grub-efi-${ARCH} || true
+update-grub
+
+if [ "${ARCH}" == "amd64" ]; then
+    GRUB_TARGET="x86_64-efi"
+else
+    GRUB_TARGET="arm64-efi"
+fi
+
+grub-install \
+    --target=${GRUB_TARGET} \
+    --efi-directory=/boot/efi \
+    --bootloader-id=ubuntu \
+    --recheck
+
+update-initramfs -uk all
+
+efibootmgr -v || true

From 2196baaac9ec71156a8a1d91c4de4a672697eb53 Mon Sep 17 00:00:00 2001
From: Douglas Holt
Date: Tue, 3 Feb 2026 16:06:19 -0700
Subject: [PATCH 11/13] Add tar-root script for dgxos5 compatibility

The tar-root script is needed by dgxos5 which uses the older
setup-nbd/tar-root approach. Auto-detects GPT vs MBR layouts.

Co-Authored-By: Claude Opus 4.5
---
 dgxos7/README.md | 2 +-
 scripts/tar-root | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 scripts/tar-root

diff --git a/dgxos7/README.md b/dgxos7/README.md
index 5de669cf..13f960c0 100644
--- a/dgxos7/README.md
+++ b/dgxos7/README.md
@@ -113,7 +113,7 @@ This template uses the DGX OS 7 installer's native autoinstall mechanism:
 - `force-platform=dgx_h100` - Platform type (can be changed via `platform` variable)
 - `force-bootdisk=vda` - Specifies boot disk
 - `nooemconfig` - Disables OEM config packages that require interactive EULA acceptance
-- `no-mlnx-fw-update` - Skips Mellanox firmware updates (not needed in VM)
+- `no-mlnx-fw-update` - Skips Mellanox firmware updates
 - `ip=dhcp` - Enables networking for HTTP config fetch
 
 ## Customization
diff --git a/scripts/tar-root b/scripts/tar-root
new file mode 100644
index 00000000..da0b891f
--- /dev/null
+++ b/scripts/tar-root
@@ -0,0 +1,56 @@
+#!/bin/bash -e
+#
+# tar-root - Create a tar.gz from a binded /dev/nbd device
+#
+# Author: Lee Trager
+#
+# Copyright (C) 2020-2021 Canonical
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as
+# published by the Free Software Foundation, either version 3 of the
+# License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program. If not, see .
+
+if [ $UID -ne 0 ]; then
+    echo "ERROR: Must be run as root!" >&2
+    exit 1
+fi
+
+TMP_DIR=$(mktemp -d /tmp/packer-maas-XXXX)
+
+echo 'Mounting root partition...'
+# Use p2 for GPT layouts (EFI on p1, root on p2), p1 for MBR layouts
+if [ -b "${nbd}p2" ]; then
+    mount "${nbd}p2" $TMP_DIR
+else
+    mount "${nbd}p1" $TMP_DIR
+fi
+
+if [ -d curtin ] || [ -d "$CURTIN_HOOKS" ]; then
+    echo 'Adding Curtin hooks...'
+    cp -r ${CURTIN_HOOKS:-curtin} $TMP_DIR
+fi
+
+echo "Creating MAAS image $OUTPUT..."
+tar -Sczpf $OUTPUT --acls --selinux --xattrs -C $TMP_DIR .
+
+if [ -n "$MANIFEST" ]; then
+    echo "Creating manifest..."
+    # RPM on CentOS/RHEL 7 needs /dev mounted so it can use /dev/urandom
+    mount -o bind /dev $TMP_DIR/dev
+    chroot $TMP_DIR rpm -qa | sort -u -o $MANIFEST
+    umount $TMP_DIR/dev
+fi
+
+echo 'Unmounting image...'
+umount $TMP_DIR
+qemu-nbd -d $nbd
+rmdir $TMP_DIR

From c74344589f0a88ce8ab59526676f5581380db88f Mon Sep 17 00:00:00 2001
From: Douglas Holt
Date: Wed, 4 Feb 2026 10:53:51 -0700
Subject: [PATCH 12/13] Use dynamic package list from preseed.sh instead of hardcoded packages

- Replace hardcoded package list with CHANGE_INSTALL_PKGS placeholder
- preseed.sh substitutes this with platform-specific packages at install time
- Add fallback `|| apt-get install -f -y` to handle dependency issues
- Add EULA notice to README

This makes the template more maintainable as package updates come from
the ISO rather than requiring manual updates to the template.

Co-Authored-By: Claude Opus 4.5
---
 dgxos7/README.md | 2 ++
 dgxos7/http/packer-ai.yaml | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/dgxos7/README.md b/dgxos7/README.md
index 13f960c0..a15786da 100644
--- a/dgxos7/README.md
+++ b/dgxos7/README.md
@@ -1,5 +1,7 @@
 # DGX OS 7 Packer Template for MAAS
 
+> **EULA Notice:** By building and deploying images with this template, you accept the NVIDIA DGX Software License Agreement. The build process automatically accepts the EULA on your behalf.
+
 ## Overview
 
 This directory contains a Packer template for building NVIDIA DGX OS 7 images deployable via MAAS (Metal as a Service). DGX OS 7 is based on Ubuntu 24.04 LTS with kernel 6.8 and includes NVIDIA-optimized configurations for the latest DGX hardware platforms.
diff --git a/dgxos7/http/packer-ai.yaml b/dgxos7/http/packer-ai.yaml
index 514f4fe2..aa7a60ca 100644
--- a/dgxos7/http/packer-ai.yaml
+++ b/dgxos7/http/packer-ai.yaml
@@ -109,10 +109,10 @@ late-commands:
     EOFSCRIPT
   - chmod +x /target/usr/share/nvidia/nv-eula.sh
 
-  # Install DGX packages - hardcoded list excluding nvidia-oem-config-* to avoid EULA prompt
+  # Install DGX packages (CHANGE_INSTALL_PKGS substituted by preseed.sh with platform-specific packages)
   - touch /target/tmp/ota_skip_write
   - curtin in-target -- apt-get update -y
-  - curtin in-target -- /bin/bash -c "DEBIAN_FRONTEND=noninteractive RUN_FW_UPDATER=no apt-get install -y --no-install-recommends dgx-server-grub cuda-nvml-dev-12-8 nvidia-driver-570-open libnvidia-nscq-570 nvidia-fabricmanager-570 nvidia-persistenced nvidia-peermem-loader nvdebug nvfwupd nvsm nvidia-acs-disable datacenter-gpu-manager-4-cuda12 nvidia-mig-manager nvidia-system-core nvidia-system-utils nvidia-system-extra nvidia-grub-params nvidia-fs nvidia-fs-dkms nvidia-conf-cachefilesd nvidia-ipmisol nvidia-pci-bridge-power nvidia-modprobe dpkg-dev nvidia-fs-loader"
+  - curtin in-target -- /bin/bash -c "DEBIAN_FRONTEND=noninteractive RUN_FW_UPDATER=no apt-get install -y --no-install-recommends CHANGE_INSTALL_PKGS || apt-get install -f -y"
   - curtin in-target -- apt-get purge -y unattended-upgrades || true
 
   # Restore apt sources for post-install

From ded261bf8a3f6efd914ce46f41b45fa7d02f467c Mon Sep 17 00:00:00 2001
From: Douglas Holt
Date: Wed, 4 Feb 2026 12:28:21 -0700
Subject: [PATCH 13/13] Document MAAS CLI file path requirements and image replacement

Add notes to README about:
- MAAS CLI requiring image file in current working directory
- How to replace an existing image in MAAS

Co-Authored-By: Claude Opus 4.5
---
 dgxos7/README.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/dgxos7/README.md b/dgxos7/README.md
index a15786da..50209a6b 100644
--- a/dgxos7/README.md
+++ b/dgxos7/README.md
@@ -74,7 +74,14 @@ After a successful build (~60 minutes), you will have:
 
 ## Uploading to MAAS
 
+**Note:** The MAAS CLI requires the image file to be in your current working directory. Copy the image to your home directory and run the command from there:
+
 ```bash
+# Copy image to MAAS server home directory
+scp dgxos7.tar.gz maas@:~/
+
+# SSH to MAAS server and upload
+ssh maas@
 PROFILE=admin
 
 maas $PROFILE boot-resources create \
@@ -86,6 +93,18 @@ maas $PROFILE boot-resources create \
     content@=dgxos7.tar.gz
 ```
 
+To replace an existing image, first delete the old one:
+
+```bash
+# Find the image ID
+maas $PROFILE boot-resources read | jq '.[] | select(.name | contains("dgxos7")) | {id, name}'
+
+# Delete the old image
+maas $PROFILE boot-resource delete
+
+# Upload the new image (as above)
+```
+
 ## Deployment
 
 ### UEFI Boot Requirement