storage: fix some bugs in the additional layer store #695

Open

giuseppe wants to merge 3 commits into containers:main from giuseppe:image-layer-store-fixes

Conversation

@giuseppe (Member):

more details in each commit

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@github-actions github-actions bot added the storage Related to "storage" package label Mar 12, 2026
@mtrmac (Contributor) left a comment:

Thanks!

storage/store.go Outdated
adriver = a
return nil
}(); err != nil {
if err := s.startUsingGraphDriver(); err != nil {
@mtrmac (Contributor):

graphLock does not protect the graph driver state against concurrent layer creation / deletion / …; see how almost all code here releases it in getLayerStore and the like before doing ~anything.

That’s really the job of the layer store lock.

Yes, the shutdown logic is weird (we can ~legitimately have two concurrent layer store objects for the same store, and that happens in some code paths)

AFAICS Shutdown holds the layer store lock as well, for the whole duration, so holding the layer store lock after obtaining a driver object should be sufficient for safety. But, really, keeping to the normal layering model and indirecting the driver call through a layerStore would be much more idiomatic and much more clearly “not wrong, or wrong the same way as everything else”.
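The indirection suggested above can be sketched roughly like this (illustrative names only, not the real containers/storage API; the `lookup` field stands in for the graph-driver call):

```go
package main

import "sync"

// layerStore is a simplified stand-in for the real layer store; the
// real one guards much more state behind this lock.
type layerStore struct {
	mu     sync.RWMutex                              // the layer-store lock; Shutdown takes it for writing
	lookup func(digest, ref string) (string, error)  // stands in for the graph-driver call
}

// lookupAdditionalLayer holds the layer-store lock for the whole
// driver call, so a concurrent Shutdown (which also takes this lock)
// cannot tear the driver down mid-lookup.
func (s *layerStore) lookupAdditionalLayer(digest, ref string) (string, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.lookup(digest, ref)
}
```

The point of routing the call through the layer store, rather than calling the driver directly from store.go, is that the existing lock ordering (layer-store lock before driver state) keeps holding.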

@mtrmac (Contributor):

Do you want to do more here?

Right now it seems that the implementation of the lookup is safe enough to run without any locking (in any case, the outcome can be invalidated as soon as we unlock and this function returns). But architecturally it is unclean; generally the drivers trust the higher layers to enforce RW locking.

I don’t feel strongly either way, the other parts of this PR are valuable on their own.

@@ -0,0 +1,96 @@
//go:build linux
@mtrmac (Contributor):

Having any test coverage here is always good…

Given how this hard-codes specifics of the overlay format, moving the test to drivers/overlay might be cleaner. That would also allow us to use a test-only override for the FUSE checks, without affecting runtime behavior.

@giuseppe (Member, author):

I've dropped the controversial FUSE checks patch.

// filesystem. The GC notification protocol (notifyUseAdditionalLayer /
// notifyReleaseAdditionalLayer) only works when the additional layer store is a
// FUSE filesystem that intercepts the operations and returns ENOENT.
func isAdditionalLayerStoreFUSE(p string) bool {
@mtrmac (Contributor):

I’m rather unhappy about this run-time condition.

I think the ALS feature makes little sense without FUSE (it requires both diff and blob, storing everything twice otherwise!). And the FUSE interface is already limiting us (e.g. no version negotiation, very annoying to pass more than one parameter) so I think moving towards the direction of static filesystems is not where I’d like this to go.

[Well I’d like to nuke ALS from the codebase, but that’s not happening.]
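For context, the run-time condition being debated is typically a statfs magic-number test. A minimal sketch, assuming Linux and the stdlib syscall package (the helper quoted above returns just a bool; this version surfaces the error instead):

```go
package main

import "syscall"

// fuseSuperMagic is FUSE_SUPER_MAGIC from <linux/magic.h>.
const fuseSuperMagic = 0x65735546

// isFUSEMounted reports whether the filesystem backing p is a FUSE
// mount, by comparing the statfs f_type field against the FUSE magic.
func isFUSEMounted(p string) (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(p, &st); err != nil {
		return false, err
	}
	return st.Type == fuseSuperMagic, nil
}
```

A non-FUSE (e.g. ext4 or tmpfs) path fails this check, which is what triggers the warning discussed below for static additional layer stores.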

@giuseppe (Member, author):

I added this change because I thought it could be useful to have a static list of OCI layers that could be used without any additional change (i.e., without requiring a FUSE filesystem). Well, they are already usable; we just get a warning.

@mtrmac (Contributor):

I think that works until you try to push… Yes, that could be useful to some people, but it’s another gotcha for everyone to keep track of.

@giuseppe (Member, author):

Well, the assumption is that the list of layers can only grow and that layers are never deleted, since there is no refcounting; but that is also true for additional image stores.

In any case I don't have any immediate use case, and the cost of not doing it is just a silly warning, so I've just dropped it.

…rors

When LookupAdditionalLayer successfully looks up a layer from the
additional layer store but then fails on Info() or JSON decoding, the
drivers.AdditionalLayer is never released. This leaks the reference
counter that was incremented during lookup, preventing the additional
layer store from garbage collecting the layer. Call Release() on
error paths.
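The leak pattern and the fix described in this commit message read roughly like this (heavily simplified, illustrative types only; not the actual containers/storage code):

```go
package main

import "errors"

// liveRefs models the additional layer store's reference count.
var liveRefs int

type additionalLayer struct{ failInfo bool }

// Release drops the store-side reference taken at lookup time.
func (a *additionalLayer) Release() { liveRefs-- }

// Info stands in for the call that can fail after a successful lookup.
func (a *additionalLayer) Info() error {
	if a.failInfo {
		return errors.New("reading layer info failed")
	}
	return nil
}

// lookupAdditionalLayer models the fixed flow: the reference taken by
// the lookup is dropped on every subsequent error path.
func lookupAdditionalLayer(failInfo bool) (*additionalLayer, error) {
	liveRefs++ // the lookup increments the store-side refcount
	al := &additionalLayer{failInfo: failInfo}
	if err := al.Info(); err != nil {
		al.Release() // the fix: without this, the reference leaks
		return nil, err
	}
	return al, nil
}
```

Before the fix, the early `return nil, err` skipped `Release()`, so the count never returned to zero and the store could not garbage-collect the layer.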

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the image-layer-store-fixes branch from 0188ec8 to 6ade24d Compare March 12, 2026 21:10
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the image-layer-store-fixes branch from 6ade24d to adbda5a Compare March 12, 2026 21:26
@TomSweeneyRedHat (Member) left a comment:

LGTM
and happy green test buttons

