fix(ci): add pre-push container vulnerability scanning to image builds by jay7-tech · Pull Request #3236 · kubeflow/trainer

jay7-tech · 2026-02-23T09:40:59Z

What this PR does / why we need it:
I was tracing through the build-and-push-images.yaml flow and noticed a path by which vulnerable images can reach the public registries. Let me know if I'm misunderstanding the pipeline:

Currently, the template-publish-image composite action uses Docker Buildx to build the multi-arch images and pushes them directly to GHCR/DockerHub. If the nvidia/cuda or python base images carry a critical CVE, the images built on top of them inherit it, and because there is no pre-push container scanning step, the composite action will push those vulnerable ML runtimes to the public registries unchecked.

This PR adds a lightweight verification layer to the composite action.
Before executing the multi-arch push, it:

Builds a local, single-platform AMD64 version of the target image and loads it into the local Docker daemon.
Runs a fast aquasecurity/trivy-action scan in image mode, looking specifically for CRITICAL or HIGH vulnerabilities.
Acts as a hard gate: if a CRITICAL or HIGH vulnerability is found, the action fails (exit code 1) and the multi-arch push never runs, protecting the public registries.
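For reference, a minimal sketch of what the two added steps could look like in the composite action. The docker/build-push-action@v6 version, the inputs.context and inputs.dockerfile names, and the local-scan tag are illustrative assumptions, not the exact contents of this diff; the Trivy pin follows the 0.34.0 version used by the existing trivy-scan.yaml workflow.

```yaml
# Hypothetical excerpt from the composite action's runs.steps.
# inputs.context, inputs.dockerfile and the local-scan tag are placeholders.
- name: Build single-platform image for scanning
  uses: docker/build-push-action@v6
  with:
    context: ${{ inputs.context }}
    file: ${{ inputs.dockerfile }}
    platforms: linux/amd64     # single platform so the image can be loaded locally
    load: true                 # load into the local Docker daemon instead of pushing
    push: false
    tags: local-scan:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,ignore-error=true

- name: Run Trivy Vulnerability Scan on Local Artifact
  uses: aquasecurity/trivy-action@0.34.0
  with:
    image-ref: local-scan:${{ github.sha }}
    format: table
    severity: CRITICAL,HIGH    # only these severities are reported
    exit-code: '1'             # any finding fails this step, so the later push step is skipped

# The existing multi-arch build-and-push step follows here and only runs
# if both steps above succeed.
```

Since the scan relies on a single-platform image loaded into the local daemon, only the AMD64 variant is checked; ARM64-specific layers are not covered by this gate.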

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes # (Proactive Infrastructure Patch)

@andreyvelich I pushed a second commit to this PR addressing your TODO in build-and-push-images.yaml.

I looked into the mlx-runtime ARM64 build constraints. Since mlx[cuda] wheels are not published for aarch64, I made the MLX Dockerfile architecture-aware: it reads the TARGETARCH build argument that Docker Buildx sets for each platform and installs mlx[cpu] for the ARM64 build while keeping the CUDA backend (mlx[cuda]) for AMD64.

This successfully unblocks the MLX runtime for multi-arch edge deployments.
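A rough sketch of the build side, assuming hypothetical input names and a Dockerfile that declares ARG TARGETARCH and branches on it as shown in the comment. Buildx builds the Dockerfile once per listed platform and sets TARGETARCH to amd64 or arm64 for each build; the exact Dockerfile lines in this PR may differ.

```yaml
# Hypothetical multi-arch build step for the MLX runtime image.
# Assumes earlier steps set up emulation or a native arm64 builder
# (e.g. docker/setup-qemu-action) and docker/setup-buildx-action.
#
# The MLX Dockerfile is assumed to contain something like:
#   ARG TARGETARCH
#   RUN if [ "$TARGETARCH" = "arm64" ]; then \
#         pip install "mlx[cpu]"; \
#       else \
#         pip install "mlx[cuda]"; \
#       fi
- name: Build and push MLX runtime image
  uses: docker/build-push-action@v6
  with:
    context: .
    file: ${{ inputs.dockerfile }}       # hypothetical input name
    platforms: linux/amd64,linux/arm64   # one build per platform, each with its own TARGETARCH
    push: true
    tags: ${{ inputs.tags }}             # hypothetical input name
```

Note that the ARM64 image built this way runs MLX on the CPU backend only; GPU acceleration remains limited to the AMD64 variant.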

Checklist:

Docs included if any changes are user-facing

Signed-off-by: jay7tech <jayadeepgowda24@gmail.com>

google-oss-prow · 2026-02-23T09:41:05Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign terrytangyuan for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.


Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

github-actions · 2026-02-23T09:41:09Z

🎉 Welcome to the Kubeflow Trainer! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards.
Our team will review your PR soon! cc @kubeflow/kubeflow-trainer-team

Join the community:

Slack: Join our #kubeflow-trainer Slack channel.
Meetings: Attend the Kubeflow AutoML and Training Working Group bi-weekly meetings.

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

Copilot

Pull request overview

This PR adds pre-push vulnerability scanning to the image build pipeline using Trivy. The change addresses a security gap where vulnerable base images (like nvidia/cuda or python) could be pushed to public registries (GHCR/DockerHub) without any vulnerability checks.

Changes:

Adds a single-platform (AMD64) build step before the multi-arch build that loads the image locally
Integrates Trivy vulnerability scanning to check for CRITICAL and HIGH severity vulnerabilities
Implements a hard gate that fails the workflow if vulnerabilities are detected, preventing vulnerable images from being pushed

Copilot · 2026-02-23T09:43:46Z

.github/workflows/template-publish-image/action.yaml

+        cache-to: type=gha,ignore-error=true
+
+    - name: Run Trivy Vulnerability Scan on Local Artifact
+      uses: aquasecurity/trivy-action@master


The trivy-action should be pinned to a specific version instead of using @master. The existing trivy-scan.yaml workflow uses @0.34.0, and all other GitHub Actions in this repository use semantic versioning. Using @master introduces unpredictability as upstream changes could break the workflow or change scanning behavior without warning.

Suggested change

-      uses: aquasecurity/trivy-action@master
+      uses: aquasecurity/trivy-action@0.34.0


ci: add pre-push container vulnerability scanning to image builds

6240f98

Signed-off-by: jay7tech <jayadeepgowda24@gmail.com>

Copilot AI review requested due to automatic review settings February 23, 2026 09:41

google-oss-prow bot requested a review from akshaychitneni February 23, 2026 09:41

google-oss-prow bot requested a review from kuizhiqing February 23, 2026 09:41

google-oss-prow bot added the size/S label Feb 23, 2026

Copilot started reviewing on behalf of jay7-tech February 23, 2026 09:41

jay7-tech changed the title ~~ci: add pre-push container vulnerability scanning to image builds~~ fix(ci): add pre-push container vulnerability scanning to image builds Feb 23, 2026

Copilot AI reviewed Feb 23, 2026


fix(ci): unblock MLX runtime ARM64 multi-arch builds

09b12a2

Resolve MLX/ARM64 constraints by dynamically cross-compiling with mlx[cpu] to bypass the missing mlx[cuda] aarch64 PIP wheels. Signed-off-by: jay7tech <jayadeepgowda24@gmail.com>

google-oss-prow bot added size/M and removed size/S labels Feb 25, 2026

jay7-tech wants to merge 2 commits into kubeflow:master from jay7-tech:feature/gsoc-container-vuln-scan
