storage: fix some bugs in the additional layer store#695
storage: fix some bugs in the additional layer store#695giuseppe wants to merge 3 commits intocontainers:mainfrom
Conversation
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
storage/store.go
Outdated
| adriver = a | ||
| return nil | ||
| }(); err != nil { | ||
| if err := s.startUsingGraphDriver(); err != nil { |
There was a problem hiding this comment.
graphLock does not protect the graph driver state against concurrent layer creation / deletion / …— see how almost all code here releases it in getLayerStore and the like before doing ~anything.
That’s really the job of the layer store lock.
Yes, the shutdown logic is weird (we can ~legitimately have two concurrent layer store objects for the same store, and that happens in some code paths)
AFAICS Shutdown holds the layer store lock as well, for the whole duration, so holding the layer store lock after obtaining a driver object should be sufficient for safety. But, really, keeping to the normal layering model and indirecting the driver call through a layerStore would be much more idiomatic and much more clearly “not wrong, or wrong the same way as everything else”.
There was a problem hiding this comment.
Do you want to do more here?
Right now it seems that the implementation of the lookup is safe enough to run without any locking (anyway the outcome can be invalidated after we unlock and this function returns). But architecturally it’s unclean, generally the drivers trust the higher layers to enforce RW locking.
I don’t feel strongly either way, the other parts of this PR are valuable on their own.
| @@ -0,0 +1,96 @@ | |||
| //go:build linux | |||
There was a problem hiding this comment.
Having any test coverage here is always good…
Given how this hard-codes specifics of the overlay format, moving the test to drivers/overlay might be cleaner. That would also allow us to use a test-only override for the FUSE checks, without affecting runtime behavior.
There was a problem hiding this comment.
I've dropped the controversial FUSE checks patch
storage/drivers/overlay/overlay.go
Outdated
| // filesystem. The GC notification protocol (notifyUseAdditionalLayer / | ||
| // notifyReleaseAdditionalLayer) only works when the additional layer store is a | ||
| // FUSE filesystem that intercepts the operations and returns ENOENT. | ||
| func isAdditionalLayerStoreFUSE(p string) bool { |
There was a problem hiding this comment.
I’m rather unhappy about this run-time condition.
I think the ALS feature makes little sense without FUSE (it requires both diff and blob, storing everything twice otherwise!). And the FUSE interface is already limiting us (e.g. no version negotiation, very annoying to pass more than one parameter) so I think moving towards the direction of static filesystems is not where I’d like this to go.
[Well I’d like to nuke ALS from the codebase, but that’s not happening.]
There was a problem hiding this comment.
I added this change as I thought it could be useful to have a static list of OCI layers, that could be used without any additional change (or to require a FUSE filesystem). Well, they are already usable, we just get a warning
There was a problem hiding this comment.
I think that works until you try to push… Yes, that could be useful to some people, but it’s another gotcha for everyone to keep track of.
There was a problem hiding this comment.
well the assumption is that the list of layers can only grow and that layers are never deleted, since there is no refcounting, but that is true also for additional images stores.
In any case I don't have any immediate use case, and the cost of not doing it is just a silly warning, so I've just dropped it.
…rors When LookupAdditionalLayer successfully looks up a layer from the additional layer store but then fails on Info() or JSON decoding, the drivers.AdditionalLayer is never released. This leaks the reference counter that was incremented during lookup, preventing the additional layer store from garbage collecting the layer. Call Release() on error paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
0188ec8 to
6ade24d
Compare
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
6ade24d to
adbda5a
Compare
TomSweeneyRedHat
left a comment
There was a problem hiding this comment.
LGTM
and happy green test buttons
more details in each commit