fix: LOG_BASED replication bookmark not advancing between syncs #745
Merged
edgarrmondragon merged 1 commit into MeltanoLabs:main on Mar 11, 2026
edgarrmondragon (Member) approved these changes on Mar 10, 2026 and left a comment:
Thanks @ksohail22!
Just to confirm: records are extracted in increasing order of _sdc_lsn, right?
Contributor
Author
@edgarrmondragon Yes, records are always delivered in increasing _sdc_lsn order.
Problem
All LOG_BASED streams in tap-postgres fail to advance their replication_key_value bookmark after a successful sync.
INCREMENTAL streams are unaffected; their bookmarks advance correctly.
Root Cause
PostgresLogBasedStream inherits is_sorted = False from the Singer SDK base class and does not override it.
When is_sorted is False, the SDK's increment_state() function writes bookmark updates to a temporary progress_markers buffer rather than directly to replication_key_value in the stream state. These progress markers are supposed to be promoted to the main state at the end of the sync, but this promotion does not succeed, so the bookmark remains frozen at its initial value.
This is incorrect for WAL-based replication. PostgreSQL's logical replication protocol delivers messages in strict LSN order, so the stream is inherently sorted.
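The difference between the two modes can be illustrated with a small model. This is a simplified sketch of the behaviour described above, not the Singer SDK source: the function name increment_state_sketch, the flat state dict, and the sample LSN values are assumptions made for illustration.

```python
from typing import Any


def increment_state_sketch(
    state: dict[str, Any],
    record: dict[str, Any],
    replication_key: str,
    is_sorted: bool,
) -> None:
    """Simplified model of the bookmark update (hypothetical, not SDK code)."""
    target = state
    if not is_sorted:
        # Unsorted stream: write into a temporary buffer, not the main state.
        target = state.setdefault("progress_markers", {})
    target["replication_key"] = replication_key
    target["replication_key_value"] = record[replication_key]


# Hypothetical records, delivered in increasing _sdc_lsn order.
records = [{"_sdc_lsn": 100}, {"_sdc_lsn": 250}]

unsorted_state = {"replication_key": "_sdc_lsn", "replication_key_value": 42}
sorted_state = {"replication_key": "_sdc_lsn", "replication_key_value": 42}

for rec in records:
    increment_state_sketch(unsorted_state, rec, "_sdc_lsn", is_sorted=False)
    increment_state_sketch(sorted_state, rec, "_sdc_lsn", is_sorted=True)
```

In this model the unsorted state keeps its original replication_key_value and only the progress_markers buffer moves forward, so the bookmark stays frozen unless a later promotion step succeeds. The sorted state is updated in place after each record.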
Fix
Set is_sorted = True on PostgresLogBasedStream.
With this change, increment_state() writes replication_key_value directly into the stream's main state dict after each record. No buffering, no promotion step, no risk of state mismatch.
Impact
replication_key_value will correctly advance after every sync.
send_feedback(flush_lsn=...) will report the new LSN to PostgreSQL, allowing it to discard consumed WAL segments and reclaim disk space.
How to Verify
replication_key_value in the state between the two runs: it should advance.
"Stream is assumed to be unsorted, progress is not resumable if interrupted" no longer appears.
pg_replication_slots.confirmed_flush_lsn on source databases: it should advance after each sync.
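The first check could be scripted roughly as follows. This is a minimal sketch that assumes the Singer state layout with a top-level bookmarks key and integer LSN values. The helper names, the stream name public-orders, and the sample values are hypothetical.

```python
from typing import Any, Optional


def bookmark_value(state: dict[str, Any], stream: str) -> Optional[Any]:
    """Return the stream's replication_key_value, or None if it is absent."""
    return state.get("bookmarks", {}).get(stream, {}).get("replication_key_value")


def bookmark_advanced(before: dict[str, Any], after: dict[str, Any], stream: str) -> bool:
    """True if the bookmark moved forward and no progress_markers buffer remains."""
    old = bookmark_value(before, stream)
    new = bookmark_value(after, stream)
    if new is None:
        return False
    leftover = "progress_markers" in after.get("bookmarks", {}).get(stream, {})
    return not leftover and (old is None or new > old)


# Hypothetical state payloads captured after two consecutive syncs.
run_1 = {"bookmarks": {"public-orders": {"replication_key": "_sdc_lsn",
                                         "replication_key_value": 1000}}}
run_2 = {"bookmarks": {"public-orders": {"replication_key": "_sdc_lsn",
                                         "replication_key_value": 1500}}}
frozen = {"bookmarks": {"public-orders": {"replication_key": "_sdc_lsn",
                                          "replication_key_value": 1000,
                                          "progress_markers": {"replication_key_value": 1500}}}}

advanced_ok = bookmark_advanced(run_1, run_2, "public-orders")
advanced_frozen = bookmark_advanced(run_1, frozen, "public-orders")
```

The frozen payload models the behaviour before the fix, where the new LSN sits only in progress_markers. The confirmed_flush_lsn check needs a query against the source database and is not covered by this sketch.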