Remove support for prefix matching on bucket names #499
Open

Sleepful wants to merge 3 commits into powersync-ja:main from
Conversation
… fix

Adds a test that verifies the compactor properly handles `bucket_name` comparisons when using U+FFFF as a sentinel upper bound. Under locale-aware collation (e.g. en_US.UTF-8), alphanumeric strings can sort after U+FFFF, causing the initial pagination query to return zero rows and skip compaction entirely. The test inserts two PUT ops for the same row, then compacts without specifying `compactBuckets` (triggering the `bucketUpper = U+FFFF` path). After compaction, the older PUT should be converted to MOVE, which only works correctly with `COLLATE "C"` forcing byte-order comparison.
The COLLATE regression test (Postgres Compact - COLLATE regression) verified that compaction works when `bucket_name` is compared against the U+FFFF sentinel under locale-aware collation. This test is redundant with the shared compacting tests in `register-compacting-tests.ts`:

- compacting (1) through (4) all exercise the `bucket == null` (compact-all) path
- They verify MOVE conversion with stronger assertions (`validateCompactedBucket`)
- The extra test was a minimal reproduction of the same code path

The sentinel and `COLLATE "C"` will be removed entirely as part of the compactor refactor, making this regression impossible to trigger. The shared tests provide full coverage of the behavior we care about: MOVE conversion works correctly when compacting all buckets.
…at a time

Replace complex bucket range filtering in `PostgresCompactor` with a cleaner two-method approach: `compactAllBuckets()` discovers buckets in batches and `compactSingleBucket()` handles individual bucket compaction. A sketch of this shape follows the change list below.

Changes:

- Split `compactInternal` into `compactAllBuckets` and `compactSingleBucket`
- Remove bucket range string matching (`bucketLower`/`bucketUpper`) logic
- Add CLI validation requiring full bucket names (e.g., `"global[]"`)
- Update documentation to clarify `compactBuckets` format requirements
- Add test coverage for explicit bucket name compaction
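A minimal sketch of that two-method shape, assuming a generic Postgres client: the method names come from this commit and the SQL mirrors the EXPLAIN section further down, but the `Db` interface, batch sizes, and the elided MOVE-rewrite step are placeholders, not the actual `PostgresCompactor` implementation.

```ts
// Hedged sketch of the two-method structure; `db.query` is a stand-in
// for whatever Postgres client the real compactor uses.
interface Db {
  query<T>(sql: string, params: unknown[]): Promise<T[]>;
}

const DISCOVERY_BATCH = 200; // buckets discovered per round-trip

async function compactAllBuckets(db: Db, groupId: number): Promise<void> {
  let lastBucket = '';
  while (true) {
    // Keyset pagination over distinct bucket names: exact names only,
    // no prefix ranges, no U+FFFF sentinel, no COLLATE override.
    const rows = await db.query<{ bucket_name: string }>(
      `SELECT DISTINCT bucket_name FROM bucket_data
        WHERE group_id = $1 AND bucket_name > $2
        ORDER BY bucket_name ASC LIMIT $3`,
      [groupId, lastBucket, DISCOVERY_BATCH]
    );
    if (rows.length === 0) break;
    for (const row of rows) {
      await compactSingleBucket(db, groupId, row.bucket_name);
    }
    lastBucket = rows[rows.length - 1].bucket_name;
  }
}

async function compactSingleBucket(db: Db, groupId: number, bucket: string): Promise<void> {
  // Walk the bucket's ops from newest to oldest in batches, keeping a
  // cursor on op_id. The real implementation rewrites superseded ops
  // (e.g. older PUTs become MOVEs); that logic is elided here.
  let upperOpIdLimit = BigInt('9223372036854775807'); // BIGINT max: no cursor yet
  while (true) {
    const ops = await db.query<{ op_id: string }>(
      `SELECT op, op_id, source_table, table_name, row_id, source_key, bucket_name
         FROM bucket_data
        WHERE group_id = $1 AND bucket_name = $2 AND op_id < $3
        ORDER BY op_id DESC LIMIT 10000`,
      [groupId, bucket, upperOpIdLimit]
    );
    if (ops.length === 0) break;
    // ... track seen rows and rewrite superseded ops in this batch ...
    upperOpIdLimit = BigInt(ops[ops.length - 1].op_id);
  }
}
```

The discovery loop is what replaces the old `bucketLower`/`bucketUpper` range matching: each round-trip asks for the next batch of bucket names strictly greater than the last one seen, so no sentinel upper bound is needed.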
Issue: #400
Summary
The refactor cleanly removes the U+FFFF sentinel, `COLLATE "C"`, and prefix matching from the Postgres compactor, replacing them with incremental bucket discovery and per-bucket exact-match compaction. `EXPLAIN` plans confirm optimal index usage, with no `COLLATE` and no `Filter` lines.

Tradeoffs
Incremental discovery vs. single-pass: The new approach adds one small index-only query per 200 buckets. For extreme deployments (millions of buckets), this adds ~5000 round-trips, but each is sub-millisecond. The plan's analysis is sound — the tradeoff is heavily in favor of correctness and simplicity.
Breaking change for CLI users: Users passing bucket definition names (`-b global`) will now get a validation error instead of silently compacting matching buckets. This is intentional and documented. No existing tests used prefix matching, confirming this was an unused feature path. A sketch of that check follows.
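As a rough illustration of the breaking change, here is a hedged sketch of the kind of check the CLI could now perform; the function name, regex, and error text are assumptions, not the actual validation code.

```ts
// Hedged sketch, not the actual CLI code: one plausible way to require
// full bucket names (e.g. "global[]") and reject bare definition names
// (e.g. "global") now that prefix matching is gone.
function assertFullBucketName(name: string): void {
  // Assumption: a full bucket name ends with a bracketed parameter list,
  // while a bucket definition name does not.
  if (!/\[.*\]$/.test(name)) {
    throw new Error(
      `'${name}' looks like a bucket definition name. Prefix matching is ` +
        `no longer supported; pass a full bucket name such as '${name}[]'.`
    );
  }
}

assertFullBucketName('global[]'); // ok
try {
  assertFullBucketName('global'); // e.g. `-b global` is now rejected
} catch (e) {
  console.warn((e as Error).message);
}
```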
Manual validation

CLI warning message
Run compact tests
Run EXPLAIN
These represent the queries used in `PostgresCompactor.ts` (the table below maps each one to its call site):
```sh
PGPASSWORD=postgres psql -h localhost -p 5432 -U postgres \
  -d powersync_storage_test -c "SET search_path TO powersync;" \
  -c "SELECT '=== DISCOVERY: incremental bucket enumeration ===' AS query;" \
  -c "EXPLAIN SELECT DISTINCT bucket_name FROM bucket_data WHERE group_id = 1 AND bucket_name > '' ORDER BY bucket_name ASC LIMIT 200;" \
  -c "SELECT '=== SINGLE BUCKET: no cursor ===' AS query;" \
  -c "EXPLAIN SELECT op, op_id, source_table, table_name, row_id, source_key, bucket_name FROM bucket_data WHERE group_id = 1 AND bucket_name = 'global[]' ORDER BY op_id DESC LIMIT 10000;" \
  -c "SELECT '=== SINGLE BUCKET: with cursor ===' AS query;" \
  -c "EXPLAIN SELECT op, op_id, source_table, table_name, row_id, source_key, bucket_name FROM bucket_data WHERE group_id = 1 AND bucket_name = 'global[]' AND op_id < 50000 ORDER BY op_id DESC LIMIT 10000;"
```

`just explain`

| query | code | notes |
| --- | --- | --- |
| DISCOVERY: incremental bucket enumeration | `PostgresCompactor.ts:90-98` | `compactAllBuckets()` discovery query: `SELECT DISTINCT bucket_name ... WHERE bucket_name > $lastBucket ORDER BY bucket_name ASC LIMIT 200` |
| SINGLE BUCKET: no cursor | `PostgresCompactor.ts:126-149` | `compactSingleBucket()` on the first iteration, when `upperOpIdLimit = BIGINT_MAX` (line 123); effectively no cursor constraint |
| SINGLE BUCKET: with cursor | `PostgresCompactor.ts:126-149` | `upperOpIdLimit = lastBatchItem.op_id` (line 158) makes the `AND op_id < $upperOpIdLimit` clause a real constraint |

Skipped validation
I did not run the MongoDB tests; it seems those files were not touched. Let me know if I should run them, or if CI takes care of it. Basically this command:

```sh
pnpm --filter @powersync/service-module-mongodb-storage test -- --run
```