Storage version#487

Merged
rkistner merged 34 commits into main from storage-version
Feb 18, 2026

@rkistner rkistner commented Feb 4, 2026

This introduces a "storage_version" config, for both MongoDB and Postgres storage.

The basic idea is to move away from migrations that need to be run upfront, to a storage version that is specific to a sync rules version. So when upgrading the service version, there is no need to run large migrations on existing data. Instead, when you deploy a new sync rules version, the new storage version is used for that.

Pros:

  1. We can make significant changes to large collections such as bucket_data, without running any expensive migrations upfront.
  2. There is no conflict between service versions before/after the migration. With upfront migrations, for example, if the migration runs before the new service version starts, the old version has to be compatible with the post-migration data; otherwise it causes issues.
  3. Storage version downgrades are possible, by first deploying sync rules with an older storage version (not implemented yet).

Cons:

  1. We need to support older storage versions for a significant period. We can eventually deprecate old storage versions in new major releases.

This initial implementation uses the storage version for two features:

  1. A guarantee that checksums are always stored as Long, which gives a small performance improvement for checksum calculations. See "Fix checksum calculations in large buckets with > 4m rows" (#282) for context on the original issue.
  2. Auto-enabling versioned bucket names, instead of requiring an opt-in in the sync rules file.

These are fairly minor storage changes to start with. However, the plan is to use this for incremental reprocessing (#468), which may introduce much larger storage changes.
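
As a rough illustration of the idea described above, the sketch below pins a storage version to each sync rules deployment and looks up the features enabled for it. The names (`STORAGE_VERSION_CONFIG`, `deploySyncRules`, `configFor`) and the assumption that version 1 is the legacy layout while version 2 enables both features are hypothetical, not taken from the PR's code.

```typescript
// Hypothetical sketch: names and version numbers are assumptions, not the PR's actual code.
interface StorageVersionConfig {
  /** Checksums are guaranteed to be stored as Long. */
  longChecksums: boolean;
  /** Versioned bucket names are enabled without an opt-in in the sync rules file. */
  versionedBuckets: boolean;
}

// Assumed mapping: 1 = legacy behaviour, 2 = both features enabled.
const STORAGE_VERSION_CONFIG: Record<number, StorageVersionConfig> = {
  1: { longChecksums: false, versionedBuckets: false },
  2: { longChecksums: true, versionedBuckets: true }
};

const CURRENT_STORAGE_VERSION = 2;

interface SyncRulesRecord {
  id: number;
  content: string;
  /** Fixed when the sync rules are deployed; existing data is never migrated in place. */
  storageVersion: number;
}

// New deployments pick up the current storage version.
function deploySyncRules(id: number, content: string): SyncRulesRecord {
  return { id, content, storageVersion: CURRENT_STORAGE_VERSION };
}

// Existing deployments keep whatever version they were created with.
function configFor(record: SyncRulesRecord): StorageVersionConfig {
  const config: StorageVersionConfig | undefined = STORAGE_VERSION_CONFIG[record.storageVersion];
  if (config === undefined) {
    throw new Error(`Unsupported storage version: ${record.storageVersion}`);
  }
  return config;
}
```

Under these assumptions, upgrading the service changes nothing for sync rules that were deployed earlier; only the next sync rules deployment moves to the newer storage version.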

Collections

There are no actual changes to collections here, but going forward we'd generally have "static" collections (unchanged across storage versions) and "versioned" collections, where we'd use a different collection for each storage version that affects it. This is not final - we can always introduce more changes based on storage_version - but it explains the expected direction.

Still TBD how we manage changes to static collections. For now we can keep on using migrations, but we may eventually replace that mechanism with something that can handle downgrades better.

Static collections:

  1. migrations
  2. instance
  3. sync_rules (the actual fields used may change based on storage version)
  4. locks
  5. op_id_sequence (some of this usage may change based on storage version)
  6. connection_report_events (TBD)

Versioned collections:

  1. bucket_data
  2. bucket_parameters
  3. bucket_state
  4. current_data
  5. checkpoint_events
  6. source_tables
  7. write_checkpoints
  8. custom_write_checkpoints
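
To make the static/versioned split concrete, here is a minimal sketch of how a collection name could be resolved from a storage version. Nothing like this exists in the PR (it makes no collection changes); the `_v<N>` suffix, the `LAYOUT_CHANGES` table and the imagined version 3 are assumptions for illustration only.

```typescript
// Hypothetical sketch: the "_v<N>" suffix and the layout table are illustrative assumptions.
const VERSIONED_COLLECTIONS = new Set<string>([
  'bucket_data',
  'bucket_parameters',
  'bucket_state',
  'current_data',
  'checkpoint_events',
  'source_tables',
  'write_checkpoints',
  'custom_write_checkpoints'
]);

// Storage versions at which a collection's layout is assumed to change.
// Collections not listed here keep their original name for every storage version.
const LAYOUT_CHANGES: Record<string, number[]> = {
  bucket_data: [3] // imagined future change, not part of this PR
};

function resolveCollectionName(name: string, storageVersion: number): string {
  if (!VERSIONED_COLLECTIONS.has(name)) {
    return name; // static collection: same name regardless of storage version
  }
  const changes = LAYOUT_CHANGES[name] ?? [];
  const applicable = changes.filter((v) => v <= storageVersion);
  if (applicable.length === 0) {
    return name; // versioned collection, but not affected by this storage version
  }
  return `${name}_v${Math.max(...applicable)}`;
}
```

With this table, `bucket_data` would only get a new name from the imagined version 3 onwards, while `locks` and any versioned collection without a layout change keep their names.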

Downgrades

A downgrade to a lower storage version would always need a sync rule reprocessing. In theory, this can happen before or after downgrading the service version (to one that doesn't support the latest storage version). So you generally have these options:

1. Don't upgrade storage version

Upgrade service version, but don't upgrade storage version. Can downgrade without issues.

The caveat is that storage version upgrade is currently automatic when you update the sync rules - can't opt out of that yet. We can add support for opting out in the future.

2. Downgrade storage before downgrading service.

  1. Upgrade service.
  2. Update sync rules, triggering storage version upgrade.
  3. Realize you want to downgrade.
  4. Update sync rules with a lower storage version. This is not supported yet, but we can support it in the future.
  5. Downgrade service, with no downtime.

3. Downgrade storage after downgrading service

  1. Upgrade service.
  2. Update sync rules, triggering storage version upgrade.
  3. Downgrade service.
  4. Storage version unsupported - replication and API processes fail, causes downtime.
  5. Re-process sync rules, resulting in the lower storage version. We can consider triggering this automatically, but have to think through the implications.
  6. Once reprocessing has completed, users can sync again.
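
The difference between options 2 and 3 comes down to a startup check along these lines. This is a sketch only; `checkStorageVersion` and `StartupResult` are hypothetical names, and the real service may surface the failure differently.

```typescript
// Hypothetical sketch: function and type names are illustrative only.
type StartupResult =
  | { status: 'ok' }
  | { status: 'unsupported'; message: string };

/**
 * Compare the storage version of the active sync rules against the
 * versions that the running service version knows how to read and write.
 */
function checkStorageVersion(
  activeStorageVersion: number,
  supportedVersions: readonly number[]
): StartupResult {
  if (supportedVersions.indexOf(activeStorageVersion) >= 0) {
    return { status: 'ok' };
  }
  return {
    status: 'unsupported',
    message:
      `Storage version ${activeStorageVersion} is not supported by this service version; ` +
      `sync rules must be reprocessed with a supported storage version`
  };
}

// Option 2: sync rules were redeployed with storage version 1 before the downgrade.
const option2 = checkStorageVersion(1, [1]);

// Option 3: the service was downgraded while the active sync rules still use version 2.
const option3 = checkStorageVersion(2, [1]);
```

Here `option2.status` is `'ok'`, so the downgraded service keeps running, while `option3.status` is `'unsupported'`, which corresponds to the downtime until reprocessing completes.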

4. Downgrading to lower migration version

Just for reference

Currently this blocks the migration process. Theoretically it's possible to down-migrate first using the newer service version, but I'm not sure that process actually works. Not supported at all on the cloud service at the moment - have to stop and start the instance, which re-creates the storage from scratch and causes downtime.

Depending on the actual migrations, there may be consistency issues in the process.

General comments

There is remaining work to make downgrading possible without downtime. We'd have to think about how exactly we expose this - for example through config options or an API. Either approach would likely require documenting storage version compatibility.

However, the current state with storage versions is no worse than what we have with the migration system.

Test changes

This now runs most tests with both storage version 1 and 2. This affects bucket names, so the tests have some refactoring to cater for that.
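
A minimal sketch of what running a test body against both storage versions could look like. The helper names and the `<id>#` bucket name prefix are placeholders, not the naming used by the actual test suite.

```typescript
// Hypothetical sketch: helper names and the "<id>#" prefix format are placeholders.
const TEST_STORAGE_VERSIONS = [1, 2] as const;
type TestStorageVersion = (typeof TEST_STORAGE_VERSIONS)[number];

/**
 * Expected bucket name for a test, assuming version 2 adds a
 * sync-rules-specific prefix and version 1 leaves names untouched.
 */
function expectedBucketName(
  version: TestStorageVersion,
  syncRulesId: number,
  bucket: string
): string {
  return version >= 2 ? `${syncRulesId}#${bucket}` : bucket;
}

/** Run the same test body once per storage version. */
function forEachStorageVersion(run: (version: TestStorageVersion) => void): void {
  for (const version of TEST_STORAGE_VERSIONS) {
    run(version);
  }
}

// Collect the names a test would assert against for each version.
const seenNames: string[] = [];
forEachStorageVersion((version) => {
  seenNames.push(expectedBucketName(version, 1, 'global[]'));
});
```

Routing every bucket name through one helper keeps the version-specific naming out of the individual assertions.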

changeset-bot bot commented Feb 4, 2026

🦋 Changeset detected

Latest commit: b69bf69

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 19 packages
| Name | Type |
| --- | --- |
| @powersync/service-module-postgres-storage | Minor |
| @powersync/service-module-mongodb-storage | Minor |
| @powersync/service-core-tests | Minor |
| @powersync/service-module-postgres | Minor |
| @powersync/service-module-mongodb | Minor |
| @powersync/service-core | Minor |
| @powersync/service-module-mssql | Minor |
| @powersync/service-module-mysql | Minor |
| @powersync/service-sync-rules | Minor |
| @powersync/service-errors | Patch |
| @powersync/service-schema | Minor |
| @powersync/service-image | Minor |
| @powersync/service-module-core | Patch |
| test-client | Patch |
| @powersync/service-jpgwire | Patch |
| @powersync/lib-services-framework | Patch |
| @powersync/lib-service-postgres | Patch |
| @powersync/service-rsocket-router | Patch |
| @powersync/lib-service-mongodb | Patch |

rkistner commented Feb 4, 2026

@simolus3 @stevensJourney It's not urgent to get this out, but I'd like to get your input on this approach for managing storage versions, in preparation for changes we'd need for incremental reprocessing.

@stevensJourney stevensJourney left a comment

Overall I think this approach makes sense and looks good to me.

@simolus3 simolus3 left a comment

My main question is whether we perhaps want to define "common" storage versions in a shared package to reduce duplication once we implement this for Postgres.

Some aspects of storage versions (like storing checksums as long values in this case) are specific to the bucket storage implementation, but others (versioned bucket ids) are essentially a "version of the sync service used in this deployment" field that could be shared.

Let's say we had something like this in service-core:

/**
 * Changes that can be enabled when deploying new sync configurations, but must be preserved for existing deployments.
 */
export interface CommonStorageConfig {
  /**
   * Whether versioned bucket names are automatically enabled.
   *
   * If this is false, bucket names may still be versioned depending on the sync config.
   */
  versionedBuckets: boolean;
}

export const COMMON_STORAGE_CONFIG_LEGACY: CommonStorageConfig = Object.freeze({versionedBuckets: false});
export const COMMON_STORAGE_CONFIG_V1: CommonStorageConfig = Object.freeze({versionedBuckets: true});

In module-mongodb-storage, each StorageConfig would then have a common: CommonStorageConfig field pointing to the corresponding constant defined in service-core.

This would allow us to make CommonStorageConfig a field on the PersistedSyncRulesContent interface (Postgres would unconditionally use COMMON_STORAGE_CONFIG_LEGACY for now). I'm mainly suggesting this in anticipation of #498, but since we will end up having something like the versionedBuckets option across all storage implementations anyway, it feels right to put that into a shared package.
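
A sketch of how that wiring might look, under stated assumptions: `MongoStorageConfig`, its `longChecksums` field, the version numbers and `PersistedSyncRulesContentSketch` are hypothetical stand-ins, and only `CommonStorageConfig` and its two constants come from the snippet above.

```typescript
// Restates the proposed interface and constants so the sketch is self-contained.
interface CommonStorageConfig {
  versionedBuckets: boolean;
}

const COMMON_STORAGE_CONFIG_LEGACY: CommonStorageConfig = Object.freeze({ versionedBuckets: false });
const COMMON_STORAGE_CONFIG_V1: CommonStorageConfig = Object.freeze({ versionedBuckets: true });

// Hypothetical MongoDB-specific config; `longChecksums` is an assumed field name.
interface MongoStorageConfig {
  common: CommonStorageConfig;
  longChecksums: boolean;
}

// Assumed version numbers: 1 = legacy, 2 = new behaviour.
const MONGO_STORAGE_CONFIGS: Record<number, MongoStorageConfig> = {
  1: { common: COMMON_STORAGE_CONFIG_LEGACY, longChecksums: false },
  2: { common: COMMON_STORAGE_CONFIG_V1, longChecksums: true }
};

// Simplified stand-in for the PersistedSyncRulesContent interface.
interface PersistedSyncRulesContentSketch {
  id: number;
  commonStorageConfig: CommonStorageConfig;
}

// MongoDB storage exposes the common part of its version-specific config.
function mongoSyncRules(id: number, storageVersion: number): PersistedSyncRulesContentSketch {
  const config: MongoStorageConfig | undefined = MONGO_STORAGE_CONFIGS[storageVersion];
  if (config === undefined) {
    throw new Error(`Unsupported storage version: ${storageVersion}`);
  }
  return { id, commonStorageConfig: config.common };
}

// Postgres storage would unconditionally use the legacy config for now.
function postgresSyncRules(id: number): PersistedSyncRulesContentSketch {
  return { id, commonStorageConfig: COMMON_STORAGE_CONFIG_LEGACY };
}
```

Code in service-core could then read `commonStorageConfig.versionedBuckets` without knowing which storage implementation produced it.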

@rkistner

> My main question is whether we perhaps want to define "common" storage versions in a shared package to reduce duplication once we implement this for Postgres.
>
> Some aspects of storage versions (like storing checksums as long values in this case) are specific to the bucket storage implementation, but others (versioned bucket ids) are essentially a "version of the sync service used in this deployment" field that could be shared.

I'm starting to think a common storage version sequence can help. We'll likely expose the storage version to the developer, since they need to be aware of that for certain upgrades or downgrades. And it would make documentation a lot simpler if we can refer to "storage version 5" instead of "storage version 5 for mongodb, 3 for postgres". The same could apply to the cloud dashboard, where the developer should not have to care whether it's a mongodb or postgres storage version that they're seeing/specifying.

I don't think it matters that much that certain storage version features are only applicable to one of the implementations. We can always bump the storage version for both, even if it only really affects one of them.

One practical implication is that if we make a change like "versioned bucket names" only for MongoDB here, we'd either have Postgres not support that storage version yet, or need to be more proactive in adding support for Postgres as well.

I'll see if I can update the PR to use common storage versions, and use it for Postgres as well.
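
One way to picture a common version sequence with per-implementation support is sketched below. The support table is hypothetical and only models the "Postgres does not support that storage version yet" case from the comment above; it does not describe what either implementation supports after this PR.

```typescript
// Hypothetical sketch: version numbers and the support table are assumptions.
type StorageBackend = 'mongodb' | 'postgres';

// One shared sequence of storage versions, so docs can refer to "storage version 2"
// without qualifying it per storage implementation.
const STORAGE_VERSIONS = [1, 2] as const;

// Each implementation declares which of the shared versions it supports.
const SUPPORTED_BY_BACKEND: Record<StorageBackend, readonly number[]> = {
  mongodb: [1, 2],
  postgres: [1] // models an implementation that has not caught up yet
};

/** Highest shared storage version that the given implementation supports. */
function latestSupportedVersion(backend: StorageBackend): number {
  const supported = STORAGE_VERSIONS.filter(
    (version) => SUPPORTED_BY_BACKEND[backend].indexOf(version) >= 0
  );
  if (supported.length === 0) {
    throw new Error(`No supported storage version for ${backend}`);
  }
  return Math.max(...supported);
}
```

The alternative mentioned above, being proactive about Postgres support, would amount to keeping both rows of the table identical.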

@rkistner rkistner changed the title from "[MongoDB Storage] Storage version" to "Storage version" Feb 16, 2026
@rkistner rkistner marked this pull request as ready for review February 17, 2026 10:46
@rkistner rkistner merged commit 8bd83e8 into main Feb 18, 2026
28 checks passed
@rkistner rkistner deleted the storage-version branch February 18, 2026 13:46