feat: add support for preserving and labeling intermediate stage images by ezopezo · Pull Request #6556 · containers/buildah

ezopezo · 2025-12-01T14:16:45Z

This adds support for preserving and labeling intermediate stage images (including the final image) in multi-stage builds. --save-stages preserves only the final image of each intermediate stage, not an image for every instruction (as the --layers flag does). It also keeps the final image's layer count the same as in a regular build (one with no additional arguments). --stage-labels adds labels at the beginning of each stage that record the stage's base image and its alias from the Containerfile.

New flags:

--save-stages: save intermediate stage final images instead of removing them.
Without --layers, intermediate images are expected to accumulate with each build (no cache reuse).
With --layers, subsequent builds can reuse the layer cache.
--stage-labels: adds metadata labels to intermediate stage images and final image:
- io.buildah.stage.name: the stage name (alias or stage index if alias not specified)
- io.buildah.stage.base: the base image (the external image reference, i.e. the pullspec, or
  the parent stage's image ID if the base is another stage)
  Requires --save-stages.
  Cache reuse depends on whether the first or the second build was run with --stage-labels, and on whether an alias or pullspec changed in the Containerfile between builds.
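As a rough illustration of the two label values described above, here is a minimal sketch. The helper `stageLabels`, its parameters, and the image ID `sha256:1111` are hypothetical and exist only for this example; only the two label keys come from the PR description. It is not buildah's implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// Label keys as named in the PR description.
const (
	labelStageName = "io.buildah.stage.name"
	labelStageBase = "io.buildah.stage.base"
)

// stageLabels is a hypothetical helper, not buildah code.
// index is the stage's position in the Containerfile, alias is the name
// given after "AS" (empty if none), base is the FROM argument, and
// builtStages maps aliases of already-built stages to their image IDs.
func stageLabels(index int, alias, base string, builtStages map[string]string) map[string]string {
	// Stage name: the alias, or the stage index when no alias was given.
	name := alias
	if name == "" {
		name = strconv.Itoa(index)
	}
	// Stage base: the pullspec, or the parent stage's image ID when the
	// base refers to another stage.
	baseValue := base
	if id, ok := builtStages[base]; ok {
		baseValue = id
	}
	return map[string]string{labelStageName: name, labelStageBase: baseValue}
}

func main() {
	built := map[string]string{"builder": "sha256:1111"} // made-up image ID
	fmt.Println(stageLabels(0, "builder", "alpine", nil))
	fmt.Println(stageLabels(1, "", "builder", built))
}
```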

The implementation includes:

Validation that --stage-labels requires --save-stages
Detection when a stage uses another intermediate stage as base (and labeling the stage accordingly)
Stage labels are injected into the Containerfile's parse tree as a LABEL instruction at the beginning of every stage, including the final one
Various test coverage for saving, labeling, and caching scenarios
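The flag dependency in the first item can be sketched as a small check. The function name `validateStageFlags` and the error text are assumptions made for this example, not the code or message used in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// validateStageFlags is a hypothetical check mirroring the rule that
// --stage-labels may only be used together with --save-stages.
func validateStageFlags(saveStages, stageLabels bool) error {
	if stageLabels && !saveStages {
		// Illustrative wording; the real error message may differ.
		return errors.New("--stage-labels requires --save-stages")
	}
	return nil
}

func main() {
	fmt.Println(validateStageFlags(false, true)) // rejected combination
	fmt.Println(validateStageFlags(true, true))  // accepted combination
}
```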

What type of PR is this?

/kind feature

What this PR does / why we need it:

General use: This functionality is useful for identifying and debugging intermediate stage images in multi-stage builds.

Specific need: Identifying the content copied from intermediate stages in multi-stage builds into the final image is a hard requirement for supporting Contextual SBOM - an SBOM that understands the origin of each package.
While intermediate images can be extracted using the --layers option, this approach has several issues for our use case:

Intermediate stage images are unlabeled, making it difficult to determine which image corresponds to which build stage, especially when the Containerfile reuses the same pullspec across multiple stages (pullspecs do not have to be unique, but aliases must be).
All instructions from all intermediate stages appear in the cache (visible via buildah images --all), which introduces unnecessary noise for our purposes.
rootfs.diff_ids are not squashed in the final stage: the final-stage image ends up containing a diff ID for every instruction in the final stage (because --layers applies to the final stage as well). However, we need the final build image to resemble a regular build (without --layers), meaning:
- it should contain the diff IDs inherited from the base image, and
- exactly one diff ID representing the squashed final-stage instructions.
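The two conditions above can be expressed as a simple check over diff ID lists. This is a sketch only: `resemblesRegularBuild` is a hypothetical name, and the diff ID strings are placeholders rather than real digests.

```go
package main

import "fmt"

// resemblesRegularBuild reports whether finalDiffIDs consists of every
// entry of baseDiffIDs, in order, followed by exactly one additional
// diff ID (the squashed final-stage instructions).
func resemblesRegularBuild(baseDiffIDs, finalDiffIDs []string) bool {
	if len(finalDiffIDs) != len(baseDiffIDs)+1 {
		return false
	}
	for i, id := range baseDiffIDs {
		if finalDiffIDs[i] != id {
			return false
		}
	}
	return true
}

func main() {
	base := []string{"sha256:aaa"} // placeholder diff IDs
	// One extra diff ID on top of the base: the shape wanted here.
	fmt.Println(resemblesRegularBuild(base, []string{"sha256:aaa", "sha256:bbb"}))
	// One diff ID per final-stage instruction: the shape a --layers build gives.
	fmt.Println(resemblesRegularBuild(base, []string{"sha256:aaa", "sha256:bbb", "sha256:ccc"}))
}
```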

Related repositories:
konflux (uses mobster for SBOM generation),
mobster (implements contextual SBOM functionality requiring this change),
capo (wraps builder content identification functionality for mobster),
Contact person: emravec (RedHat) / @ezopezo (Github)

How to verify it

Run any multi-stage build that has an intermediate stage, using the new flags. The resulting intermediate images should be saved and correctly labeled. Example:
buildah build --save-stages --stage-labels -t test:0.1 .

Which issue(s) this PR fixes:

Fixes: #6257
Internal Jira: https://issues.redhat.com/browse/ISV-6122

Does this PR introduce a user-facing change?

Add `--save-stages` and `--stage-labels` flags for preserving and labeling intermediate stage images in multi-stage builds.

openshift-ci · 2025-12-01T14:16:51Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ezopezo
Once this PR has been reviewed and has the lgtm label, please assign luap99 for approval. For more information see the Code Review Process.

packit-as-a-service · 2025-12-01T14:18:30Z

Ephemeral COPR build failed. @containers/packit-build please check.

ezopezo · 2025-12-02T21:17:22Z

/retest

openshift-ci · 2025-12-02T21:17:38Z

@ezopezo: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

ezopezo · 2025-12-02T21:38:11Z

@nalind can you please take a look and add the ok-to-test label? It seems to me that the tests are most likely failing because of timeouts, so I would like to try re-running them (or please tell me what I just broke :) ).

nalind · 2025-12-02T22:05:26Z

/ok-to-test

ezopezo · 2025-12-03T09:22:06Z

/test

openshift-ci · 2025-12-03T09:22:22Z

@ezopezo: No presubmit jobs available for containers/buildah@main

ezopezo · 2025-12-04T09:21:56Z

@nalind @mtrmac @TomSweeneyRedHat can you please take a look at this? (or pick some appropriate reviewers?) Thanks in advance!

mtrmac · 2025-12-09T13:45:13Z

@flouthoc if you have time

TomSweeneyRedHat · 2026-01-06T15:30:54Z

@Luap99 @nalind PTAL

TomSweeneyRedHat · 2026-01-08T00:19:41Z

docs/buildah-build.1.md


 buildah build --layers -t imageName .

+buildah build --cache-stages --stage-labels -t imageName .


just to show the bool values in play, perhaps

Suggested change

buildah build --cache-stages --stage-labels -t imageName .

buildah build --cache-stages false --stage-labels false -t imageName .

I skipped that because it is a relatively uncommon way to use boolean flags: the false behavior is implicit whenever the flag is absent. Other flags such as --no-cache or --layers follow the same pattern in the docs (present when true, absent otherwise, with no --layers false example), so I wanted to stay consistent with them. If you feel strongly about it, I can certainly add the examples :)

There's a long-standing bit of confusion in the man pages where, even though the boolean argument values are optional, we don't suggest that they always have to be supplied in the --flag=value form, with an equal sign, to prevent them from being treated as unrelated arguments.

@nalind Does it make sense to you to add examples that use an equal sign and a boolean value, or should we leave it as it is? What do you think?

If we want an example where the flag is set to false, the equal sign is going to be necessary.
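The point about the equal sign can be reproduced with Go's standard library `flag` package. This is a stand-in chosen for the example, not the flag parser that buildah itself uses, and `parseLayers` is a hypothetical helper. It shows that a space-separated value after a boolean flag is left behind as a positional argument.

```go
package main

import (
	"flag"
	"fmt"
)

// parseLayers parses args with a single boolean flag named "layers" and
// returns the flag's value together with the leftover positional arguments.
func parseLayers(args []string) (bool, []string, error) {
	fs := flag.NewFlagSet("build", flag.ContinueOnError)
	layers := fs.Bool("layers", false, "boolean flag used for illustration")
	if err := fs.Parse(args); err != nil {
		return false, nil, err
	}
	return *layers, fs.Args(), nil
}

func main() {
	for _, args := range [][]string{
		{"--layers=false", "."},    // value attached with an equal sign
		{"--layers", "false", "."}, // value passed as a separate argument
	} {
		value, rest, err := parseLayers(args)
		fmt.Println(args, "=>", value, rest, err)
	}
}
```

In the second case the flag is set to true and the word false is treated as an unrelated argument, which is the confusion described above.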

TomSweeneyRedHat

First pass LGTM, just a couple of nitty things here and there.

@nalind or @Luap99 PTAL

Also, have you tried vendoring Buildah into Podman with these changes yet?

@ddarrah should we have a feature/epic for this in RHEL 9.8/10.2?

ezopezo · 2026-02-06T10:05:58Z

@nalind I am re-pushing changes, because /retest flag just does not work for me (maybe I need some privileges?) and checks are often sporadically failing.
Also I don't know how to proceed with podman vendoring test I left a comment in PR. Can you please help me?

nalind · 2026-02-10T20:30:30Z

@nalind I am re-pushing changes, because /retest flag just does not work for me (maybe I need some privileges?) and checks are often sporadically failing. Also I don't know how to proceed with podman vendoring test I left a comment in PR. Can you please help me?

The configuration for the bot which handled those things was updated to not drop most containers-org repositories by openshift/release#71397; anything that was in-progress before it was merged would still include a record of previous interactions with the bot.

nalind · 2026-02-10T21:04:15Z

--layers=true is the default used by podman build, so I'd be hesitant about making a new feature mutually exclusive with it. If we weren't worried about reusing cache images with previously-generated build UUIDs, I would generally expect the images we commit for each stage, including the final one, to use multiple layers when it's appropriate. Would forcing checkForLayers to be false at the start of stageExecutor.execute() work as expected for the use case? I could see an argument for "fresh build, every time" when the new flag is used, and that would still allow subsequent builds that don't use the new flag to use the results as cache hits. If we did that, would the cache-related labels be left alone or altered/cleared for the products of that build?

Yes, it works well: buildah build --layers --cache-stages --stage-labels stores all the layers of the Containerfile and labels them correctly, and a subsequent plain buildah build --layers reuses them by hitting the cache. Those layers remain intact and the labels are preserved. Are we okay with such behavior?

I had to think about this for a bit, and I'm not sure it's what we want. Much like earlier versions of build --label, adding the new labels when we commit, without having their presence show up in the history, prevents the cache evaluation logic from "seeing" them.

For the --label CLI flag, we eventually reworked the build so that it internally appended a LABEL instruction with its arguments to the set of build steps in executor.buildStage() before processing the steps for the stage, so that labels from the command line could be accounted for.

Would this use case be better served by doing something similar? The buildStage() method potentially appends a LABEL instruction to the end of the final stage, and potentially prepends an ENV instruction (for --env values) at the beginning of every stage, so we have options. Adding the stage label instruction at the start of the stage's instructions would cause cache misses and naturally provide the "fresh build, every time" behavior which I think we're aiming for. Adding it as an instruction at the end of the stage would allow every instruction before it to be a cache hit, if we want it the other way.
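To make the prepend-versus-append trade-off concrete, here is a toy model in which a cached image is reusable for as long as the leading build steps match its recorded history. This is a deliberate simplification for illustration and not buildah's cache evaluation logic; `leadingCacheHits` and the sample instructions are hypothetical.

```go
package main

import "fmt"

// leadingCacheHits counts how many steps at the start of steps match the
// history recorded for a previously built image. Matching stops at the
// first step that differs.
func leadingCacheHits(cachedHistory, steps []string) int {
	hits := 0
	for i, step := range steps {
		if i >= len(cachedHistory) || cachedHistory[i] != step {
			break
		}
		hits++
	}
	return hits
}

func main() {
	// History left behind by an earlier build that had no stage label.
	cached := []string{"RUN make", "RUN make install"}
	label := "LABEL io.buildah.stage.name=builder"

	// Label injected at the start of the stage: the first step differs.
	prepended := append([]string{label}, cached...)
	// Label injected at the end of the stage: earlier steps still match.
	appended := append(append([]string{}, cached...), label)

	fmt.Println("prepended:", leadingCacheHits(cached, prepended))
	fmt.Println("appended:", leadingCacheHits(cached, appended))
}
```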

ezopezo · 2026-02-11T15:35:17Z

prevents the cache evaluation logic from "seeing" them.

I may be misunderstanding this concern. Our implementation explicitly disables cache lookup when --cache-stages is used (stage_executor.go:1260-1262):

if s.executor.cacheStages {
    checkForLayers = false
}

This ensures every build with --cache-stages is a fresh build, and caches new intermediate images with a new build UUID. Cache evaluation logic (search) for stages doesn't run at all when --cache-stages is used.

Regarding cache reuse scenarios:
Scenario 1:

Build 1: --cache-stages (without --layers) → creates non-layered intermediate images
Build 2: --cache-stages (without --layers) → CANNOT reuse (explicitly disabled cache lookup)

Scenario 2:

Build 1: --cache-stages (without --layers) → creates non-layered intermediate images
Build 2: (without --cache-stages and --layers) → CANNOT reuse (insufficient history for cache hit, if I understand logic correctly), but that's fine for our use case

Scenario 3 (considering that podman has always --layers enabled):

Build 1: --cache-stages --layers → creates layered intermediate images
Build 2: --layers (without --cache-stages) → CAN reuse those layers (you're right - even when labeled, because of their absence in history - I'll expand this later*)

However, our actual use case (Konflux SBOM generation) will use --cache-stages without --layers, so the layered cache scenario isn't relevant for us.
The --layers compatibility was added based on your earlier suggestion that it's podman's default, but our production workflow won't use it.
The intermediate images from --cache-stages builds are intended for inspection (SBOM generation), not for cache reuse. We don't need subsequent builds to reuse them.
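The three scenarios can be condensed into one predicate. This only restates the outcomes listed in this comment for the implementation as it stood at that point; `buildOpts` and `canReuse` are hypothetical names and do not exist in buildah.

```go
package main

import "fmt"

// buildOpts records which of the two flags a build was run with.
type buildOpts struct {
	layers      bool // --layers
	cacheStages bool // --cache-stages
}

// canReuse summarizes the scenarios above: a second build can reuse images
// from the first only if the first produced layered images, the second
// looks for layers, and the second does not disable cache lookup by
// passing --cache-stages.
func canReuse(first, second buildOpts) bool {
	return first.layers && second.layers && !second.cacheStages
}

func main() {
	fmt.Println("scenario 1:", canReuse(buildOpts{cacheStages: true}, buildOpts{cacheStages: true}))
	fmt.Println("scenario 2:", canReuse(buildOpts{cacheStages: true}, buildOpts{}))
	fmt.Println("scenario 3:", canReuse(buildOpts{layers: true, cacheStages: true}, buildOpts{layers: true}))
}
```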

For the --label CLI flag...

*I see the architectural similarity, but there's a practical difference IMHO:

--label: user-defined, cache evaluation is active (user applies different labels across builds = cache miss)
--stage-labels: system metadata, cache evaluation is explicitly disabled (checkForLayers = false, set by the mandatory accompanying --cache-stages)

Given that the cache is disabled when --cache-stages is used, is there still a concern about labels (added by --stage-labels, which cannot be used without --cache-stages) not being in the build history?

Could you clarify what specific scenario you're concerned about where the current approach would fail?

ezopezo · 2026-02-12T09:30:34Z

@nalind I understand, thank you for the explanation! I am just re-pushing the commit with changes to trigger the pipeline. Is it common to get so many failures from timeouts? It is very hard for me to get all-green (I have witnessed it only a few times) :|

nalind · 2026-02-13T02:36:18Z

prevents the cache evaluation logic from "seeing" them.

I may be misunderstanding this concern. Our implementation explicitly disables cache lookup when --cache-stages is used (stage_executor.go:1260-1262):
if s.executor.cacheStages {
    checkForLayers = false
}
This ensures every build with --cache-stages is a fresh build, and caches new intermediate images with a new build UUID. Cache evaluation logic (search) for stages doesn't run at all when --cache-stages is used.

Yes, I was hoping to be able to remove this special-case check.

Regarding cache reuse scenarios: Scenario 1:

* Build 1: `--cache-stages` (without `--layers`) → creates non-layered intermediate images

* Build 2:  `--cache-stages` (without `--layers`) → CANNOT reuse (explicitly disabled cache lookup)

Scenario 2:

* Build 1: `--cache-stages` (without `--layers`) → creates non-layered intermediate images

* Build 2: (without --cache-stages and `--layers`) → CANNOT reuse (insufficient history for cache hit, if I understand logic correctly), but that's fine for our use case

Your understanding matches mine, yes.

Scenario 3 (considering that podman has always --layers enabled):
* Build 1: `--cache-stages --layers` → creates layered intermediate images

* Build 2: `--layers` (without `--cache-stages`) → CAN reuse those layers (you're right - even when labeled, because of their absence in history - I'll expand this later*)
However, our actual use case (Konflux SBOM generation) will use --cache-stages without --layers, so the layered cache scenario isn't relevant for us. The --layers compatibility was added based on your earlier suggestion that it's podman's default, but our production workflow won't use it. The intermediate images from --cache-stages builds are intended for inspection (SBOM generation), not for cache reuse. We don't need subsequent builds to reuse them.

For the --label CLI flag...

*I see the architectural similarity, but there's a practical difference IMHO:
* --label: user-defined, cache evaluation is active (user applies different labels across builds = cache miss)

* --stage-labels: system metadata, cache evaluation is explicitly disabled (checkForLayers = false, set by the mandatory accompanying --cache-stages)

There's really nothing enforcing that distinction, though, and I would advise against assuming otherwise when using the produced images.

Given that the cache is disabled when --cache-stages is used, is there still a concern about labels (added by --stage-labels, which cannot be used without --cache-stages) not being in the build history?

Could you clarify what specific scenario you're concerned about where the current approach would fail?

Throwing the stage labels into the history makes the new feature's interaction with caching much easier to reason about - based on where in the parsed tree the new labels are added, we can predict how the produced images will interact with the cache evaluation logic in future builds, and I value that.

nalind

Possible issues when --jobs is used with values greater than 1, and questions around labels in the final stage and in stages that inherit from other stages in multi-layer builds.

nalind

Some questions about the tests.

nalind · 2026-03-16T21:38:55Z

tests/bud/save-stages/Dockerfile.arg-build-stages-and-chained-build-stages

+FROM alpine
+ARG SECOND_STAGE=second-stage
+COPY --from=${SECOND_STAGE} /output.txt /app/output.txt
+RUN cat /app/output.txt


Does the test that uses this file check the contents of the file, to ensure that it has the intended content, i.e., that it was copied from the correct stage? Would the test succeed if line 11 set SECOND_STAGE to the value "first-stage"?

Well, that ARG is actually not relevant and even redundant (sorry for the confusion; I think I added it because I found that ARGs declared at the beginning of the stage did not seem to be defined in the final stage, which is weird, but I did not investigate why. Is it a bug?). I just removed it and used COPY --from=second-stage ... instead.
This is purely a sanity test to verify that ARG variables are correctly resolved to their corresponding values (stage aliases, image ID, and pullspec) and that these resolved values are properly reflected in the stage labels. The file contents and the actual stage instructions are not relevant to what we're testing. They are only present to create intermediate stages with some layer content, and this content is copied to the final image just so that those stages are referenced and therefore actually built. I understand that ARG resolution happens at parse time, before stage label generation, so this test might seem somewhat redundant, but it does verify the end-to-end flow. In the meantime, I removed the pointless ARG SECOND_STAGE=second-stage line in the final stage (and added ARG PULLSPEC=alpine to test pullspec resolution as well), but feel free to decide whether this test is even useful.

I don't think I understand the question you're asking about ARGs in stages in that first paragraph...?

Sorry, I worded that wrongly. I meant: ARGs declared at the top of the Containerfile are not resolved inside any stage (only in FROM instructions, as pullspecs or aliases). But I guess it makes sense that stages have their own ARG scope. Never mind, it is not important and I don't want to get tangled up in this; I just wanted to explain why I did this weird stuff before:

ARG SECOND_STAGE=second-stage
COPY --from=${SECOND_STAGE} ...

But it is not needed at all, as I explained.

Anyway, is the Containerfile and test okay as it is now?

ARGs defined before the first FROM instruction (i.e., in the header) are available in stages once they're declared, per https://docs.docker.com/build/building/variables/#scoping. The part where a stage inherits the ARGs defined in a stage that it's based on, however, was only recently fixed.
The test is fine, it's just confusing to see it appear to set the file's contents in a way that a test would if it was going to check on them, and then not do that.

This adds support for saving and labeling all of the intermediate stage images (including the final image) in multi-stage builds. --save-stages preserves only the final image from each intermediate stage, not every instruction layer (as the --layers flag does). This also keeps the final image's layer count the same as in regular builds (with no additional args). --stage-labels adds label metadata containing the base image and alias from the Containerfile at the beginning of the stage.

New flags:
- --save-stages: save intermediate stage final images instead of removing them. Without --layers, intermediate images are expected to accumulate on each build (no cache reuse). With --layers, layer cache reuse is enabled for subsequent builds.
- --stage-labels: add metadata labels to intermediate stage images and the final image:
  * io.buildah.stage.name: the stage name (alias or stage index if alias not specified)
  * io.buildah.stage.base: the base image (external image reference, i.e. the pullspec, or parent stage image ID if its base is another stage)
  Requires --save-stages. Cache reuse depends on whether the first or the second build was run with --stage-labels, and on whether an alias or pullspec changed in the Containerfile between builds.

The implementation includes:
- Validation that --stage-labels requires --save-stages
- Detection when a stage uses another intermediate stage as base (and labeling the stage accordingly)
- Stage labels are added to the Containerfile via parse tree injection as a LABEL at the beginning of every stage, including the final one
- Various test coverage for saving, labeling, and caching scenarios

This functionality is useful for debugging, exploring, and reusing intermediate stage images from multi-stage builds.

Signed-off-by: Erik Mravec <emravec@redhat.com>

nalind

LGTM
@containers/buildah-maintainers PTAL

openshift-ci bot added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 1, 2025

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 1, 2025

ezopezo force-pushed the emravec/preserve-intermediate-images branch from 59cd9ae to 6bd3187 Compare December 1, 2025 14:24

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 1, 2025

ezopezo force-pushed the emravec/preserve-intermediate-images branch 2 times, most recently from b7e81df to 3314051 Compare December 2, 2025 16:30

openshift-ci bot added the ok-to-test label Dec 2, 2025

ezopezo force-pushed the emravec/preserve-intermediate-images branch from 3314051 to 3955b20 Compare December 3, 2025 09:25