feat: add support for preserving and labeling intermediate stage images#6556
feat: add support for preserving and labeling intermediate stage images#6556ezopezo wants to merge 1 commit intocontainers:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ezopezo The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Ephemeral COPR build failed. @containers/packit-build please check. |
59cd9ae to
6bd3187
Compare
b7e81df to
3314051
Compare
|
/retest |
|
@ezopezo: Cannot trigger testing until a trusted user reviews the PR and leaves an DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@nalind can you please take a look and put ok-to-test label? It seems to me that tests are failing most likely with some timeouts and thus I would like to try to re-run them (or please tell me what I just broke :) ). |
|
/ok-to-test |
|
/test |
|
@ezopezo: No presubmit jobs available for containers/buildah@main DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
3314051 to
3955b20
Compare
|
@nalind @mtrmac @TomSweeneyRedHat can you please take a look on this? (or pick up some appropriate reviewers?) Thanks in advance! |
|
@flouthoc if you have time |
docs/buildah-build.1.md
Outdated
|
|
||
| buildah build --layers -t imageName . | ||
|
|
||
| buildah build --cache-stages --stage-labels -t imageName . |
There was a problem hiding this comment.
just to show the bool values in play, perhaps
| buildah build --cache-stages --stage-labels -t imageName . | |
| buildah build --cache-stages false --stage-labels false -t imageName . |
There was a problem hiding this comment.
I skipped that since this is a relatively uncommon use case for boolean flags, because the false behavior is implicit whenever the flag is absent. Other flags such as --no-cache or --layers follow the same pattern (present when true, absent when not, without presence of the --layers false) so I wanted to stay consistent with your docs. If you feel strongly about it, I certainly can add the examples :)
There was a problem hiding this comment.
There's a long-standing bit of confusion in the man pages where, even though the boolean argument values are optional, we don't suggest that they always have to be supplied in the --flag=value form, with an equal sign, to prevent them from being treated as unrelated arguments.
There was a problem hiding this comment.
@nalind Does it make a sense to you to add args with equal sign and boolean or leave it as it is? What do you think?
There was a problem hiding this comment.
If we want an example where the flag is set to false, the equal sign is going to be necessary.
6b2baab to
3272226
Compare
ea82074 to
3818667
Compare
The configuration for the bot which handled those things was updated to not drop most containers-org repositories by openshift/release#71397; anything that was in-progress before it was merged would still include a record of previous interactions with the bot. |
I had to think about this for a bit, and I'm not sure it's what we want. Much like earlier versions of For the Would this use case be better served by doing something similar? The |
I may be misunderstanding this concern. Our implementation explicitly disables cache lookup when --cache-stages is used (stage_executor.go:1260-1262): This ensures every build with --cache-stages is a fresh build, and caches new intermediate images with a new build UUID. Cache evaluation logic (search) for stages doesn't run at all when --cache-stages is used. Regarding cache reuse scenarios:
Scenario 2:
Scenario 3 (considering that podman has always
However, our actual use case (Konflux SBOM generation) will use --cache-stages without --layers, so the layered cache scenario isn't relevant for us.
*I see the architectural similarity, but there's a practical difference IMHO:
Given that cache is disabled when --cache-stages is used, is there still a concern about labels (added by --stage labels, that cannot be used without --cache-stages) not being in build history? Could you clarify what specific scenario you're concerned about where the current approach would fail? |
cf654f6 to
7031351
Compare
@nalind I understand, thank you for explanation! I am just repushing commit with chnages to trigger pipeline - is it common that I am getting so much failures from timeouts? It is very hard for me to get all-green (I wittnessed it few times) :| |
Yes, I was hoping to be able to remove this special-case check.
Your understand matches mine, yes.
There's really nothing enforcing that distinction, though, and I would advise against assuming otherwise when using the produced images.
Throwing the stage labels into the history makes the new feature's interaction with caching much easier to reason about - based on where in the parsed tree the new labels are added, we can predict how the produced images will interact with the cache evaluation logic in future builds, and I value that. |
89ead6e to
73b4f11
Compare
nalind
left a comment
There was a problem hiding this comment.
Possible issues when --jobs is used with values greater than 1, and questions around labels in the final stage and in stages that inherit from other stages in multi-layer builds.
6e0560e to
4d61cfa
Compare
nalind
left a comment
There was a problem hiding this comment.
Some questions about the tests.
| FROM alpine | ||
| ARG SECOND_STAGE=second-stage | ||
| COPY --from=${SECOND_STAGE} /output.txt /app/output.txt | ||
| RUN cat /app/output.txt |
There was a problem hiding this comment.
Does the test that uses this file check the contents of the file, to ensure that it has the intended content, i.e., that it was copied from the correct stage? Would the test succeed if line 11 set SECOND_STAGE to the value "first-stage"?
There was a problem hiding this comment.
Well that ARG is actually not relevant and even redundant (sorry for confusion I think I added it there because I found out that ARGS stated at the beginning of the stage are not seemingly defined in final stage - weird, but I did not researched why is that - is it a bug?). I just removed it and used COPY --from=second-stage ... instead.
This is purely a sanity test to verify that ARG variables are correctly resolved to their corresponding values (stage aliases, image ID, and pullspec) and that these resolved values are properly reflected in the stage labels. The file contents and the actual stage instructions are not relevant to what we're testing - they're only present to create intermediate stages with some layer content and this content is COPY-ed to final imaeg just to reference those stages to be actually built. I understand that ARG resolution happens at parse time, before stage label generation, so this test might seem somewhat redundant, but it does verify the end-to-end flow. In the meantime, I removed the pointless ARG SECOND_STAGE=second-stage line in the final stage (and added ARG PULLSPEC=alpine to test pullspec resolution as well), but feel free to decide if this test is even useful.
There was a problem hiding this comment.
I don't think I understand the question you're asking about ARGs in stages in that first paragraph...?
There was a problem hiding this comment.
Sorry I wrote it wrongly, I meant; ARGS stated at the top of the Containerfile are not resolved inside of any stage (only in FROM instructions as pullspecs or aliases). But it makes a sense I guess, that stages have their own ARG scope. Nevermind it is not important, I don't want to tangle into this I just wanted to explain why I did this weird stuff before:
ARG SECOND_STAGE=second-stage
COPY --from=${SECOND_STAGE} ...
But it is not needed at all as I explained
Anyway, is the Containerfile and test okay as it is now?
There was a problem hiding this comment.
ARGs defined before the first FROM instruction (i.e., in the header) are available in stages once they're declared, per https://docs.docker.com/build/building/variables/#scoping. The part where a stage inherits the ARGs defined in a stage that it's based on, however, was only recently fixed.
The test is fine, it's just confusing to see it appear to set the file's contents in a way that a test would if it was going to check on them, and then not do that.
This adds support for saving and labeling all of the intermediate stage images
(including final image) in the multi-stage builds.
--save-stages preserves only the final image from each intermediate stage,
not every instruction layer (as --layers flag). This also keeps the final image's
layer count same as in regular builds (with no additional args).
--stage-labels adds label metadata containing base image and alias
from Containerfile at the beginning of the stage.
New flags:
- --save-stages: save intermediate stage final images instead of removing them.
Build without added --layers, intermediate images are expected to accumulate on each build (no cache reuse).
Build with added --layers, enables layer cache reuse for subsequent builds.
- --stage-labels: add metadata labels to intermediate stage images and final image:
* io.buildah.stage.name: the stage name (alias or stage index if alias not specified)
* io.buildah.stage.base: the base image (external image reference - pullpsec or
parent stage image ID if its base is another stage)
Requires --save-stages.
Cache reuse depends of if first or second build has or has not –stage-labels
argument or if alias/pullspec changed between builds in Containerfile.
The implementation includes:
- Validation that --stage-labels requires --save-stages
- Detection when a stage uses another intermediate stage as base (and labeling the stage accordingly)
- Stage labels are added to Containerfile via parse tree injection as LABEL at the beginning of the stage including final
- Various test coverage for saving, labeling, and caching scenarios
This functionality is useful for debugging, exploring, and reusing intermediate stage images from multi-stage builds.
Signed-off-by: Erik Mravec <emravec@redhat.com>
4d61cfa to
c5768c7
Compare
nalind
left a comment
There was a problem hiding this comment.
LGTM
@containers/buildah-maintainers PTAL
This adds support for preserving and labeling intermediate stage images (including final image) in multi-stage builds.
--save-stagespreserves only the final image from each intermediate stage, not every instruction layer (as--layersflag). This also keeps the final image's layer count same as in regular builds (with no additional args).--stage-labelsadds label metadata containing base image and alias from Containerfile at the beginning of the stage.New flags:
--save-stages: save intermediate stage final images instead of removing them.Build without added
--layers, intermediate images are expected to accumulate on each build (no cache reuse).Build with added
--layers, enables layer cache reuse for subsequent builds.--stage-labels: adds metadata labels to intermediate stage images and final image:io.buildah.stage.name: the stage name (alias or stage index if alias not specified)io.buildah.stage.base: the base image (external image reference - pullpsec orparent stage image ID if its base is another stage)
Requires
--save-stages.Cache reuse depends of if first or second build has or has not
–stage-labelsargument or if alias/pullspec changed between builds in ContainerfileThe implementation includes:
--stage-labelsrequires--save-stagesWhat type of PR is this?
/kind feature
What this PR does / why we need it:
General use: This functionality is useful for identification and debugging intermediate stage images in multi-stage builds.
Specific need: Identifying the content copied from intermediate stages in multi-stage builds into the final image is a hard requirement for supporting Contextual SBOM - an SBOM that understands the origin of each package.
While intermediate images can be extracted using the
--layersoption, this approach has several issues for our use case:buildah images --all), which introduces unnecessary noise for our purposes.--layers), meaning:Related repositories:
konflux (uses mobster for SBOM generation),
mobster (implements contextual SBOM functionality requiring this change),
capo (wraps builder content identification functionality for mobster),
Contact person: emravec (RedHat) / @ezopezo (Github)
How to verify it
Run any multistage build with intermediate stage, with implemented arguments. Resulting intermediate images should be saved and correctly labeled. Example:
buildah build --save-stages --stage-labels -t test:0.1 .Which issue(s) this PR fixes:
Fixes: #6257
Internal Jira: https://issues.redhat.com/browse/ISV-6122
Does this PR introduce a user-facing change?