
storage, overlay: stop using symlinks for new layers#666

Open

giuseppe wants to merge 1 commit into containers:main from giuseppe:drop-overlay-symlink

Conversation

@giuseppe (Member)

The overlay driver maintained a l/ directory of symlinks solely to shorten mount option strings below the kernel's page-size limit.
This is unnecessary since mountOverlayFrom() already handles long mount options via fd-based mounting.

New layers now store relative diff paths (e.g. "layer-id/diff") directly in the lower file instead of short symlink references.

Old-format layers with l/ symlinks continue to work.
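To make the two on-disk formats concrete, here is a minimal sketch of how a layer's lower entries might be built in each scheme. This is illustrative only: the layer IDs and six-character short names are made up, and in the real driver each layer's short name is a random 26-character string stored in a separate "link" file.

```go
package main

import (
	"fmt"
	"strings"
)

// buildLower sketches the two "lower" file formats for a layer whose
// ancestors are given top-down. Names here are hypothetical.
func buildLower(parents []string, useSymlinks bool) string {
	entries := make([]string, 0, len(parents))
	for _, id := range parents {
		if useSymlinks {
			// Old format: a short symlink under l/ resolving to <id>/diff.
			entries = append(entries, "l/"+strings.ToUpper(id[:6]))
		} else {
			// New format: the relative diff path stored directly.
			entries = append(entries, id+"/diff")
		}
	}
	return strings.Join(entries, ":")
}

func main() {
	parents := []string{"9f86d081884c", "60303ae22b99"}
	fmt.Println("old:", buildLower(parents, true))
	fmt.Println("new:", buildLower(parents, false))
}
```

The new format trades slightly longer entries for one less level of indirection: the reading side no longer has to resolve a symlink to find the diff directory.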

@github-actions github-actions bot added the storage Related to "storage" package label Feb 18, 2026
@giuseppe giuseppe force-pushed the drop-overlay-symlink branch from 9827306 to dff75cb Compare February 18, 2026 11:07
@giuseppe (Member, Author)

@mtrmac I believe you are going to like this PR :)

@giuseppe giuseppe marked this pull request as ready for review February 18, 2026 12:34
@giuseppe giuseppe changed the title [WIP] storage, overlay: stop using symlinks for new layers storage, overlay: stop using symlinks for new layers Feb 18, 2026
@mtrmac (Contributor) commented Feb 18, 2026

At a glance, yes I do :)

@Luap99 (Member) left a comment


I wonder, do we have a test case that mounts a 500 layer overlay? If not should we have one to ensure mounting always works?

@giuseppe (Member, Author)

Mounting with the new code always gives shorter paths, since we use the fd number as the full path instead of a symlink, which is still an absolute path. I think this code was left there for historical reasons; there is no case where it is better.
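The length argument can be sketched numerically. The following comparison builds a `lowerdir=` option string for 500 layers both ways; the store path, the 26-character short-name length, and the fd numbers are illustrative assumptions, not taken from the actual mount code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// buildOptions contrasts the length of a lowerdir= string built from
// absolute symlink paths with one built from /proc/self/fd paths for n
// lower layers. All paths and fd numbers here are hypothetical.
func buildOptions(n int) (viaSymlinks, viaFds string) {
	home := "/var/lib/containers/storage/overlay"
	link := strings.Repeat("A", 26) // stand-in for a 26-char short name
	var s, f []string
	for i := 0; i < n; i++ {
		s = append(s, filepath.Join(home, "l", link))
		// With fd-based mounting each lower directory is opened first and
		// referenced by its descriptor, so the entry length no longer
		// depends on where the store lives on disk.
		f = append(f, fmt.Sprintf("/proc/self/fd/%d", 3+i))
	}
	return "lowerdir=" + strings.Join(s, ":"), "lowerdir=" + strings.Join(f, ":")
}

func main() {
	s, f := buildOptions(500)
	fmt.Println("symlink-based:", len(s), "bytes")
	fmt.Println("fd-based:", len(f), "bytes")
}
```

With 500 layers the absolute-path form runs to tens of kilobytes, far past a typical 4096-byte page, while the fd-based form stays much shorter and, more importantly, independent of the storage root's path length.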

@kolyshkin (Contributor) left a comment


Maybe it makes sense to add comments marking the remaining "legacy" code paths (dealing with l/ symlinks) as such, so they can be removed in the future. Otherwise it's hard for future readers to grok as it is -- we don't create any links, but we still handle them.

One other thing: I'm not sure we need the entire fork/exec dance to open the lower fds. This is a further optimization, but maybe we can just do it in the same process? I mean, the fork/exec was done because of chdir, but it looks like we no longer need it (yet it's left there, and that's another part of the "legacy" code, I guess). So, maybe we should only call mountOverlayFrom when working with legacy l/ symlinks, and mount via fds in the same process otherwise?

As for the tests, I think this is covered by the existing @test "allow storing images with more than 127 layers", which creates 300 layers (although I haven't checked whether it actually crosses the pagesize threshold).


```diff
 func (d *Driver) create(id, parent string, opts *graphdriver.CreateOpts, readOnly bool) (retErr error) {
-	dir, homedir, _ := d.dir2(id, readOnly)
+	dir, _, _ := d.dir2(id, readOnly)
```
@kolyshkin (Contributor)

BTW this removes the only user of the second value returned from dir2, so maybe it makes sense to drop it (perhaps in a separate subsequent commit).

@giuseppe (Member, Author)

thanks for spotting it, fixed now!

@kolyshkin (Contributor)

> I wonder, do we have a test case that mounts a 500 layer overlay? If not should we have one to ensure mounting always works?

We do have one for 300 layers, see storage/tests/layers.bats, although, as I said, I haven't checked that we actually cross the pagesize threshold with it.

@TomSweeneyRedHat (Member)

@giuseppe did you spin up a PR in Podman to see how this does there?
Other than @kolyshkin's comments, LGTM, and happy green test buttons.

@giuseppe (Member, Author)

> @giuseppe did you spin up a PR in Podman to see how this does there?
> Other than @kolyshkin's comments, LGTM, and happy green test buttons.

fixed, and opened a test PR here: containers/podman#28164

@TomSweeneyRedHat (Member)

I restarted the failing test here; it looked to be a timeout flake.
The Podman PR, however, is very red.

@giuseppe (Member, Author)

> I restarted the failing test here; it looked to be a timeout flake.
> The Podman PR, however, is very red.

That failure is expected because we are not rebuilding Buildah: Buildah fails to find the symlink since we are not creating it anymore.

@giuseppe (Member, Author) commented Mar 4, 2026

@Luap99 @mtrmac @kolyshkin PTAL

@Luap99 (Member) commented Mar 4, 2026

> that failure is expected because we are not rebuilding Buildah. Buildah fails to find the symlink since we are not creating it anymore.

Well, how do you expect us to get CI passing then? We should not merge this if it breaks CI tests.
Also, we cannot really control the update timings of podman/buildah/skopeo, so if mixing them breaks things completely we have a big problem for the distributions shipping these tools. We can only control Fedora/RHEL.

For example, this test is using skopeo:

```
[+0054s] not ok 39 [010] podman pull image with additional store in 1071ms
         # (in test file test/system/010-images.bats, line 395)
         #   `skopeo copy containers-storage:$IMAGE \' failed
         #
<+     > # # podman  image exists quay.io/libpod/testimage:20241011
         # Getting image source signatures
         # Copying blob sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
         # Copying blob sha256:b66a884aaf08f1c410c61682a7072d68a0d837ba8dc16ff13b9509bdceb32fd2
         # time="2026-02-27T11:30:24-06:00" level=fatal msg="reading blob sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef: creating file-getter: readlink /var/lib/containers/storage/overlay/b66a884aaf08f1c410c61682a7072d68a0d837ba8dc16ff13b9509bdceb32fd2/diff: invalid argument"
         # # [teardown]
         # rm: cannot remove '/tmp/CI_M4p5/podman_bats.ozboK6/imagestore/root/overlay': Device or resource busy
```

@giuseppe (Member, Author) commented Mar 4, 2026

OK, let's do it in two phases: I'll change this PR to not use the symlinks, but still create them.

@giuseppe giuseppe force-pushed the drop-overlay-symlink branch 3 times, most recently from a1bf17b to 171a0bc Compare March 4, 2026 21:08
giuseppe added a commit to giuseppe/libpod that referenced this pull request Mar 7, 2026
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe (Member, Author) commented Mar 9, 2026

Podman tests pass now

Write a new "lower-layers" file alongside the existing "lower" file.
It stores layer IDs directly instead of l/ symlink references; the
reading side appends "/diff" itself.  When present, lower-layers is
preferred over lower.  The old lower file is still written so that
older tools (buildah, skopeo) continue to work.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the drop-overlay-symlink branch from 171a0bc to b8f4b1f Compare March 17, 2026 13:54
@Luap99 (Member) left a comment
LGTM

@kolyshkin (Contributor) left a comment

Assuming the new functionality is covered by existing tests, LGTM


Labels

storage Related to "storage" package


5 participants