Add folder_matches filtering for Sentinel-2 ingestion optimization #353
Description
This PR adds folder-level filtering capability to the create-chunks task, specifically to optimize Sentinel-2 ingestion by reducing Azure Blob Storage API calls.
Problem
The Sentinel-2 create-chunks task was walking through ALL year folders (2015-2026) in the blob storage, resulting in approximately 11 million listBlob API calls even when we only wanted to process 2026 data. This is because the year folder sits at depth 4 in the Sentinel-2 structure (UTM/Grid1/Grid2/Year/Month/Day/.SAFE/), unlike Sentinel-1, where the year is at depth 1, so the walk has to descend through every UTM and grid folder before it can tell which year it is in.
Solution
Added two new options to the create-chunks task:
folder_matches: A regex pattern to filter which folders are descended into during the walk
folder_matches_at_depth: Apply the filter only at a specific depth (1-indexed from walk start)
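The pruning behaviour of the two options can be sketched as follows. This is a minimal illustration, not the code in this PR: `walk_folders`, `list_children`, and the in-memory `TREE` are hypothetical stand-ins for the real walker and the Azure listing call, and matching folder names with `re.search` is an assumption about how `folder_matches` is applied.

```python
import re
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical miniature of the Sentinel-2 layout: UTM/Grid1/Grid2/Year/Month
TREE: Dict[str, List[str]] = {
    "": ["32"],
    "32": ["U"],
    "32/U": ["PU"],
    "32/U/PU": ["2015", "2025", "2026"],
    "32/U/PU/2015": ["07"],
    "32/U/PU/2025": ["12"],
    "32/U/PU/2026": ["01"],
}

calls: List[str] = []  # records every simulated listing call


def list_children(path: str) -> List[str]:
    """Stand-in for one blob listing call (assumption, not the Azure SDK)."""
    calls.append(path)
    return TREE.get(path, [])


def walk_folders(
    list_children: Callable[[str], List[str]],
    root: str,
    folder_matches: Optional[str] = None,
    folder_matches_at_depth: Optional[int] = None,
) -> Iterator[str]:
    """Yield leaf folders under root, skipping folders that fail the filter."""
    pattern = re.compile(folder_matches) if folder_matches else None
    stack = [(root, 0)]
    while stack:
        path, depth = stack.pop()
        children = list_children(path)
        if not children:
            yield path  # nothing below this folder, so treat it as a leaf
            continue
        for name in children:
            child_depth = depth + 1  # 1-indexed from the walk start
            applies = pattern is not None and (
                folder_matches_at_depth is None
                or child_depth == folder_matches_at_depth
            )
            if applies and not pattern.search(name):
                continue  # pruned: this folder is never listed
            stack.append((f"{path}/{name}" if path else name, child_depth))


# Filter only at depth 4 (the year level in this layout).
filtered = sorted(walk_folders(list_children, "", r"^2026$", 4))
filtered_calls = len(calls)

calls.clear()
unfiltered = sorted(walk_folders(list_children, ""))
unfiltered_calls = len(calls)
```

In this sketch, a pattern such as `^2026$` without `folder_matches_at_depth` would be tested against the UTM folder at depth 1 and prune the whole tree, which is why restricting the filter to one depth matters for the Sentinel-2 layout.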
Fixes # (issue)
How Has This Been Tested?
Submitted workflow with folder filtering enabled
Verified create-chunks only walks into 2026 folders
Confirmed reduced API calls and successful item processing