
Remove support for prefix matching on bucket names#499

Open
Sleepful wants to merge 3 commits into powersync-ja:main from Sleepful:jv-2026-02-10-compaction

Conversation


@Sleepful Sleepful commented Feb 11, 2026

Issue: #400

Summary

The refactor cleanly removes the U+FFFF sentinel, COLLATE "C", and prefix matching from the Postgres compactor, replacing them with incremental bucket discovery and per-bucket exact-match compaction. EXPLAIN plans confirm optimal index usage with no COLLATE and no Filter lines.

Tradeoffs

  1. Incremental discovery vs. single-pass: The new approach adds one small index-only query per 200 buckets. For extreme deployments (millions of buckets), this adds ~5000 round-trips, but each is sub-millisecond. The plan's analysis is sound — the tradeoff is heavily in favor of correctness and simplicity.

  2. Breaking change for CLI users: Users passing bucket definition names (-b global) will now get a validation error instead of silently compacting matching buckets. This is intentional and documented. No existing tests used prefix matching, confirming this was an unused feature path.
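The incremental discovery described in tradeoff 1 amounts to keyset pagination over bucket names. The following is a minimal sketch of that loop, using an in-memory stand-in for the real SQL query (fetchBucketPage and discoverBuckets are illustrative names, not the actual PostgresCompactor API):

```typescript
// Sketch of incremental bucket discovery via keyset pagination.
// `fetchBucketPage` stands in for the real query:
//   SELECT DISTINCT bucket_name FROM bucket_data
//   WHERE group_id = $1 AND bucket_name > $2
//   ORDER BY bucket_name ASC LIMIT $3
const BATCH_LIMIT = 200;

function fetchBucketPage(allBuckets: string[], after: string, limit: number): string[] {
  // Simulates the exact-match, collation-free keyset query.
  return [...new Set(allBuckets)]
    .filter((name) => name > after)
    .sort()
    .slice(0, limit);
}

function* discoverBuckets(allBuckets: string[], limit: number = BATCH_LIMIT): Generator<string> {
  let lastBucket = ''; // '' sorts before every non-empty bucket name
  for (;;) {
    const page = fetchBucketPage(allBuckets, lastBucket, limit);
    if (page.length === 0) return; // no more buckets: discovery complete
    yield* page;
    lastBucket = page[page.length - 1]; // resume strictly after the last seen name
  }
}
```

Because each page resumes strictly after the last seen name, no U+FFFF sentinel upper bound is needed, and per-bucket compaction can run between pages.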

Manual validation

CLI warning message

node service/lib/entry.js compact -c ../powersync.yaml -b global
info: Successfully registered Module Core.
error: Invalid bucket names: global. Pass full bucket names (e.g., "global[]"), not bucket definition names (e.g., "global").
error: Recipe `ps` failed on line 7 with exit code 1
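The validation behind this error can be sketched as a check that each CLI argument carries a serialized parameter suffix (e.g. "[]"). The exact rule in the PR may differ; isFullBucketName and validateBucketNames here are illustrative, assumed names:

```typescript
// Illustrative check: a full bucket name carries a serialized parameter
// suffix, e.g. 'global[]' or 'by_user["u1"]'; a bare bucket definition
// name like 'global' does not. The PR's real validation may differ.
function isFullBucketName(name: string): boolean {
  return /^[^\[\]]+\[.*\]$/.test(name);
}

function validateBucketNames(names: string[]): void {
  const invalid = names.filter((n) => !isFullBucketName(n));
  if (invalid.length > 0) {
    throw new Error(
      `Invalid bucket names: ${invalid.join(', ')}. ` +
        `Pass full bucket names (e.g., "global[]"), not bucket definition names (e.g., "global").`
    );
  }
}
```

Failing fast here is what turns the previously silent prefix-match behavior into the explicit error shown above.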

Run compact tests

PG_STORAGE_TEST_URL="postgres://postgres:postgres@localhost:5432/powersync_storage_test" pnpm --filter @powersync/service-module-postgres-storage test -- --run

✓ test/src/connection-report-storage.test.ts (14 tests) 36ms

 Test Files  5 passed (5)
      Tests  74 passed (74)
   Start at  01:55:12
   Duration  29.56s (transform 747ms, setup 3.22s, import 3.49s, tests 22.46s, environment 0ms)

Run EXPLAIN

These are the EXPLAIN runs for the queries used in PostgresCompactor.ts:

    PGPASSWORD=postgres psql -h localhost -p 5432 -U postgres \
      -d powersync_storage_test -c "SET search_path TO powersync;" \
      -c "SELECT '=== DISCOVERY: incremental bucket enumeration ===' AS query;" \
      -c "EXPLAIN SELECT DISTINCT bucket_name FROM bucket_data WHERE group_id = 1 AND bucket_name > '' ORDER BY bucket_name ASC LIMIT 200;" \
      -c "SELECT '=== SINGLE BUCKET: no cursor ===' AS query;" \
      -c "EXPLAIN SELECT op, op_id, source_table, table_name, row_id, source_key, bucket_name FROM bucket_data WHERE group_id = 1 AND bucket_name = 'global[]' ORDER BY op_id DESC LIMIT 10000;" \
      -c "SELECT '=== SINGLE BUCKET: with cursor ===' AS query;" \
      -c "EXPLAIN SELECT op, op_id, source_table, table_name, row_id, source_key, bucket_name FROM bucket_data WHERE group_id = 1 AND bucket_name = 'global[]' AND op_id < 50000 ORDER BY op_id DESC LIMIT 10000;"
Mapping of EXPLAIN queries to source lines:

DISCOVERY: incremental bucket enumeration
    PostgresCompactor.ts:90-98, the compactAllBuckets() discovery query:
    SELECT DISTINCT bucket_name ... WHERE bucket_name > $lastBucket ORDER BY bucket_name ASC LIMIT 200

SINGLE BUCKET: no cursor
    PostgresCompactor.ts:126-149, compactSingleBucket() on the first iteration,
    when upperOpIdLimit = BIGINT_MAX (line 123), so there is effectively no cursor constraint

SINGLE BUCKET: with cursor
    PostgresCompactor.ts:126-149, the same query on subsequent iterations, after
    upperOpIdLimit = lastBatchItem.op_id (line 158) makes the AND op_id < $upperOpIdLimit clause a real constraint
                       query
---------------------------------------------------
 === DISCOVERY: incremental bucket enumeration ===
(1 row)

                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Limit  (cost=0.15..8.17 rows=1 width=32)
   ->  Result  (cost=0.15..8.17 rows=1 width=32)
         ->  Unique  (cost=0.15..8.17 rows=1 width=32)
               ->  Index Only Scan using unique_id on bucket_data  (cost=0.15..8.17 rows=1 width=32)
                     Index Cond: ((group_id = 1) AND (bucket_name > ''::text))
(5 rows)

              query
----------------------------------
 === SINGLE BUCKET: no cursor ===
(1 row)

                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Limit  (cost=0.15..8.17 rows=1 width=200)
   ->  Index Scan Backward using unique_id on bucket_data  (cost=0.15..8.17 rows=1 width=200)
         Index Cond: ((group_id = 1) AND (bucket_name = 'global[]'::text))
(3 rows)

               query
------------------------------------
 === SINGLE BUCKET: with cursor ===
(1 row)

                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=0.15..8.17 rows=1 width=200)
   ->  Index Scan Backward using unique_id on bucket_data  (cost=0.15..8.17 rows=1 width=200)
         Index Cond: ((group_id = 1) AND (bucket_name = 'global[]'::text) AND (op_id < 50000))
(3 rows)

Skipped validation

I did not run the MongoDB tests, since those files were not touched. Let me know if I should run them, or if CI takes care of it. The command would be: pnpm --filter @powersync/service-module-mongodb-storage test -- --run


CLAassistant commented Feb 11, 2026

CLA assistant check
All committers have signed the CLA.


changeset-bot bot commented Feb 11, 2026

⚠️ No Changeset found

Latest commit: a9cfb37

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@Sleepful Sleepful marked this pull request as draft February 11, 2026 08:16
… fix

Adds a test that verifies the compactor properly handles bucket_name
comparisons when using U+FFFF as a sentinel upper bound. Under
locale-aware collation (e.g. en_US.UTF-8), alphanumeric strings can
sort after U+FFFF, causing the initial pagination query to return zero
rows and skip compaction entirely.

The test inserts two PUT ops for the same row, then compacts without
specifying compactBuckets (triggering the bucketUpper = U+FFFF path).
After compaction, the older PUT should be converted to MOVE - which
only works correctly with COLLATE "C" forcing byte-order comparison.
The COLLATE regression test (Postgres Compact - COLLATE regression) verified
that compaction works when bucket_name is compared against the U+FFFF sentinel
under locale-aware collation. This test is redundant with the shared compacting
tests in register-compacting-tests.ts:

- compacting (1) through (4) all exercise the bucket==null (compact-all) path
- They verify MOVE conversion with stronger assertions (validateCompactedBucket)
- The extra test was a minimal reproduction of the same code path

The sentinel and COLLATE "C" will be removed entirely as part of the compactor
refactor, making this regression impossible to trigger. The shared tests provide
full coverage of the behavior we care about: 

MOVE conversion works correctly when compacting all buckets.
…at a time

Replace complex bucket range filtering in PostgresCompactor with a cleaner
two-method approach: compactAllBuckets() discovers buckets in batches and
compactSingleBucket() handles individual bucket compaction.

Changes:
- Split compactInternal into compactAllBuckets and compactSingleBucket
- Remove bucket range string matching (bucketLower/bucketUpper) logic
- Add CLI validation requiring full bucket names (e.g., "global[]")
- Update documentation to clarify compactBuckets format requirements
- Add test coverage for explicit bucket name compaction
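The per-bucket loop in compactSingleBucket() pages downward through op_ids using the cursor described in the EXPLAIN mapping above (upperOpIdLimit starts at BIGINT_MAX, then moves to the last batch item's op_id). A minimal sketch, with an in-memory stand-in for the real SQL and a placeholder processBatch callback (both illustrative, not the PR's actual signatures):

```typescript
// Sketch of the descending op_id cursor in compactSingleBucket(),
// matching the query shape: ... WHERE group_id = $g AND bucket_name = $name
//   AND op_id < $upperOpIdLimit ORDER BY op_id DESC LIMIT $batchSize
interface BucketOp {
  op_id: bigint;
  bucket_name: string;
}

const BIGINT_MAX = 2n ** 63n - 1n;

function compactSingleBucket(
  ops: BucketOp[],
  bucketName: string,
  batchSize: number,
  processBatch: (batch: BucketOp[]) => void
): void {
  let upperOpIdLimit = BIGINT_MAX; // first iteration: effectively no cursor
  for (;;) {
    // Stand-in for the exact-match SQL query above.
    const batch = ops
      .filter((op) => op.bucket_name === bucketName && op.op_id < upperOpIdLimit)
      .sort((a, b) => (a.op_id < b.op_id ? 1 : -1)) // ORDER BY op_id DESC
      .slice(0, batchSize);
    if (batch.length === 0) return; // bucket fully scanned
    processBatch(batch); // e.g. convert superseded PUTs to MOVEs
    upperOpIdLimit = batch[batch.length - 1].op_id; // next page strictly below
  }
}
```

Since the bucket name is always an exact match, both iterations use the same index condition, which is why the two EXPLAIN plans differ only in the op_id bound.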
@Sleepful Sleepful force-pushed the jv-2026-02-10-compaction branch from 9d82ddb to a9cfb37 Compare February 11, 2026 08:55
@Sleepful Sleepful marked this pull request as ready for review February 11, 2026 08:56

@rkistner rkistner left a comment


Thanks!

