Skip to content

Conversation

@jeremy
Copy link

@jeremy jeremy commented May 30, 2025

When using S3 or GCS cache with bucket lifecycle rules to automatically delete old objects, imageproxy can get "poisoned" with empty cache entries. This happens because the cache returns empty data ([]byte{}, true) instead of a cache miss (nil, false), causing httpcache to serve blank responses.

This PR fixes the issue by treating zero-length objects as cache misses in both S3 and GCS cache implementations.

  • Add empty object check in s3cache.Get() and gcscache.Get()
  • Refactor S3 cache to use s3iface.S3API interface for testability
  • Refactor GCS cache to use custom interfaces for testability (bit verbose; mirrors the S3 approach)
  • Add NewWithClient/NewWithBucket constructors for dependency injection
  • Add comprehensive test coverage for both cache implementations, necessary to confirm expected behavior when a cache object is deleted

Fixes #248

/cc @Ladybiss @mdkent

When S3/GCS lifecycle rules delete cached objects, the cache might
return empty data instead of indicating a cache miss. This causes
httpcache to serve blank responses with "unknown format" errors.

This change treats zero-length objects as cache misses, ensuring
imageproxy fetches fresh copies from the origin when cached objects
are deleted by lifecycle rules, and refactors S3 and GCS caches to
use interfaces so this behavior can be tested.

Fixes willnorris#248
@willnorris
Copy link
Owner

Is this unique to s3 and gcs, or is this a layer that should sit in front of whatever caching backend is in use?

@codecov
Copy link

codecov bot commented Jun 6, 2025

Codecov Report

Attention: Patch coverage is 70.96774% with 9 lines in your changes missing coverage. Please review.

Project coverage is 67.43%. Comparing base (ef50c1f) to head (1d8936e).
Report is 22 commits behind head on main.

Files with missing lines Patch % Lines
internal/gcscache/gcscache.go 50.00% 9 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (ef50c1f) and HEAD (1d8936e). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (ef50c1f) HEAD (1d8936e)
2 1
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #467       +/-   ##
===========================================
- Coverage   85.21%   67.43%   -17.79%     
===========================================
  Files           6       12        +6     
  Lines         717     1213      +496     
===========================================
+ Hits          611      818      +207     
- Misses         76      357      +281     
- Partials       30       38        +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jeremy
Copy link
Author

jeremy commented Jun 6, 2025

Is this unique to s3 and gcs, or is this a layer that should sit in front of whatever caching backend is in use?

Indeed, this is an odd band-aid over some underlying issue and it doesn't apply to just S3/GCS.

Turns out @flavorjones fixed the root cause in an internal patch that fixed a weird segfault:

Return a more-complete 304 from TransformingTransport.RoundTrip
to address the segfault we're seeing in production. The necessary
conditions for the issue are:

the 303 must return a location URI that changes every time, so that
the real location of the image is never cached. Asana is using a
X-Amz-Security-Token that accomplishes this.

The image responses must match by Etag, so that
imageProxy.should304() returns true causing
TranformingTransport.RoundTrip() to return a bare 304 response.

If those conditions are met, then the bare 304 Response returned will
be used and read by one of the callers.

So let's make it more like a real Response so when it's treated that
way, it doesn't explode.

After his patch, empty bodies stopped showing up in the cache. These weren't due to lifecycle rules at all—they were from imageproxy caching malformed 304s.

@flavorjones could you open a PR with your fix?

@jeremy jeremy closed this Jun 6, 2025
@jeremy jeremy deleted the cache-lifecycle-rules branch June 6, 2025 07:19
@willnorris
Copy link
Owner

After his patch, empty bodies stopped showing up in the cache. These weren't due to lifecycle rules at all—they were from imageproxy caching malformed 304s.

@flavorjones could you open a PR with your fix?

oh, fascinating. Yeah, I'd love to hear more about that issue and see the patch.

@flavorjones
Copy link
Contributor

@flavorjones could you open a PR with your fix?

Sure, will do.

@flavorjones
Copy link
Contributor

PR is #472

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cache consistency problem after partial cache clean-up with Apache httpd as upstream

3 participants