fix(ci): add pre-push container vulnerability scanning to image builds#3236
jay7-tech wants to merge 2 commits into kubeflow:master
Signed-off-by: jay7tech <jayadeepgowda24@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files.
🎉 Welcome to the Kubeflow Trainer! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Feel free to ask questions in the comments if you need any help or clarification!
Pull request overview
This PR adds pre-push vulnerability scanning to the image build pipeline using Trivy. The change addresses a security gap where vulnerable base images (like nvidia/cuda or python) could be pushed to public registries (GHCR/DockerHub) without any vulnerability checks.
Changes:
- Adds a single-platform (AMD64) build step before the multi-arch build that loads the image locally
- Integrates Trivy vulnerability scanning to check for CRITICAL and HIGH severity vulnerabilities
- Implements a hard gate that fails the workflow if vulnerabilities are detected, preventing vulnerable images from being pushed
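The three changes listed above could be wired together roughly as follows. This is a minimal sketch, not the PR's actual diff: the step names, the `local/scan-target:pre-push` tag, the `ghcr.io/example/image:latest` tag, and the `docker/build-push-action@v6` version are illustrative assumptions, and it assumes Buildx has already been set up earlier in the job. The `trivy-action` version follows the one the repository's existing `trivy-scan.yaml` is said to use.

```yaml
# Sketch only: names, tags, and versions marked "hypothetical" are assumptions.
steps:
  # 1. Build a single-platform image and load it into the local Docker daemon
  #    so that it can be scanned before anything is pushed.
  - name: Build AMD64 image for scanning
    uses: docker/build-push-action@v6
    with:
      context: .
      platforms: linux/amd64
      load: true
      push: false
      tags: local/scan-target:pre-push   # hypothetical local-only tag
      cache-from: type=gha
      cache-to: type=gha,ignore-error=true

  # 2. Scan the loaded image. A non-zero exit code fails the job, which is
  #    what turns the scan into a hard gate.
  - name: Run Trivy vulnerability scan on local image
    uses: aquasecurity/trivy-action@0.34.0
    with:
      scan-type: image
      image-ref: local/scan-target:pre-push
      severity: CRITICAL,HIGH
      exit-code: '1'

  # 3. Only reached if the scan step succeeded.
  - name: Build and push multi-arch image
    uses: docker/build-push-action@v6
    with:
      context: .
      platforms: linux/amd64,linux/arm64
      push: true
      tags: ghcr.io/example/image:latest  # hypothetical registry tag
      cache-from: type=gha
```

Because steps in a job run sequentially and stop at the first failure by default, placing the scan between the local build and the push is enough to block the push; no explicit `if:` condition is needed in this sketch. Note that only the AMD64 variant is scanned, so ARM64-specific packages would not be covered.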
        cache-to: type=gha,ignore-error=true

    - name: Run Trivy Vulnerability Scan on Local Artifact
      uses: aquasecurity/trivy-action@master
The trivy-action should be pinned to a specific version instead of using @master. The existing trivy-scan.yaml workflow uses @0.34.0, and all other GitHub Actions in this repository use semantic versioning. Using @master introduces unpredictability as upstream changes could break the workflow or change scanning behavior without warning.
Suggested change:

    -   uses: aquasecurity/trivy-action@master
    +   uses: aquasecurity/trivy-action@0.34.0
Resolve MLX/ARM64 constraints by dynamically cross-compiling with mlx[cpu] to bypass the missing mlx[cuda] aarch64 pip wheels.

Signed-off-by: jay7tech <jayadeepgowda24@gmail.com>
What this PR does / why we need it:
I was tracing through the `build-and-push-images.yaml` flow and noticed a vulnerability leak path. Let me know if I'm misunderstanding the pipeline:

Currently, the `template-publish-image` composite action uses Docker Buildx to generate the multi-arch images and pushes them directly to GHCR/DockerHub. If `nvidia/cuda` or `python` base images inherit a critical CVE, the composite action will blindly push those vulnerable ML runtimes to the public registries because there is no pre-push container scanning step.

This PR injects a lightweight verification layer into the composite action. Before executing the multi-arch push, it:

- Runs an `aquasecurity/trivy-action` image-mode scan hunting specifically for `CRITICAL` or `HIGH` vulnerabilities.

Which issue(s) this PR fixes (optional, in `Fixes #<issue number>, #<issue number>, ...` format, will close the issue(s) when PR gets merged):

Fixes # (Proactive Infrastructure Patch)
@andreyvelich I pushed a second architectural patch to this PR regarding your TODO in `build-and-push-images.yaml`.

I profiled the `mlx-runtime` ARM64 build constraints. Since `mlx[cuda]` wheels are missing for `aarch64`, I injected a multi-arch deployment layer into the MLX Dockerfile. It now reads the `TARGETARCH` build argument from Docker Buildx, dynamically cross-compiling with `mlx[cpu]` for the ARM matrix while preserving the CUDA backend for AMD64.

This successfully unblocks the MLX runtime for multi-arch edge deployments.
Checklist: