fix: LOG_BASED replication bookmark not advancing between syncs#745

Merged
edgarrmondragon merged 1 commit into MeltanoLabs:main from ksohail22:fix/wal-log-flush
Mar 11, 2026

Conversation

@ksohail22
Contributor

Problem

All LOG_BASED streams in tap-postgres fail to advance their replication_key_value bookmark after a successful sync. This means:

  • Every sync re-reads the same WAL data from the starting LSN, regardless of how many records were processed.
  • The PostgreSQL replication slot is never flushed past the original LSN, causing unbounded WAL growth on source databases.
  • Sync durations grow over time as the accumulated WAL backlog increases.
  • Increased risk of disk exhaustion on production PostgreSQL instances.

INCREMENTAL streams are unaffected — their bookmarks advance correctly.

Root Cause

PostgresLogBasedStream inherits is_sorted = False from the Singer SDK base class and does not override it.

When is_sorted is False, the SDK's increment_state() function writes bookmark updates to a temporary progress_markers buffer rather than directly to replication_key_value in the stream state. These progress markers are supposed to be promoted into the main state at the end of the sync, but that promotion never takes effect for these streams, so the bookmark remains frozen at its initial value.

This is incorrect for WAL-based replication. PostgreSQL's logical replication protocol delivers messages in strict LSN order — the stream is inherently sorted.
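The branching described above can be illustrated with a small self-contained sketch. Note this is a simplified stand-in for the Singer SDK's increment_state(), not its actual code; the state-dict keys mirror the ones named in this PR.

```python
# Simplified sketch (not the real Singer SDK implementation) of how
# increment_state() branches on is_sorted, per the behavior described above.

def increment_state(state: dict, record: dict, replication_key: str, is_sorted: bool) -> None:
    value = record[replication_key]
    if is_sorted:
        # Sorted stream: write the bookmark directly into the main state.
        state["replication_key_value"] = value
    else:
        # Unsorted stream: buffer the candidate bookmark; it only becomes
        # the real bookmark if the end-of-sync promotion step takes effect.
        state.setdefault("progress_markers", {})["replication_key_value"] = value

state_unsorted: dict = {}
state_sorted: dict = {}
for lsn in ("0/16B3748", "0/16B3890"):
    record = {"_sdc_lsn": lsn}
    increment_state(state_unsorted, record, "_sdc_lsn", is_sorted=False)
    increment_state(state_sorted, record, "_sdc_lsn", is_sorted=True)

print(state_unsorted)  # bookmark sits in progress_markers, never promoted
print(state_sorted)    # bookmark written directly to replication_key_value
```

If the promotion step fails for the unsorted case, replication_key_value in state_unsorted never changes, which is exactly the frozen-bookmark symptom reported here.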

Fix

Set is_sorted = True on PostgresLogBasedStream:

class PostgresLogBasedStream(SQLStream):
    replication_key = "_sdc_lsn"  # bookmark on the WAL log sequence number
    is_sorted = True  # WAL messages are delivered in strict LSN order

With this change, increment_state() writes replication_key_value directly into the stream's main state dict after each record. No buffering, no promotion step, no risk of state mismatch.

Impact

  • Bookmark advancement: replication_key_value will correctly advance after every sync.
  • WAL flush: send_feedback(flush_lsn=...) will report the new LSN to PostgreSQL, allowing it to discard consumed WAL segments and reclaim disk space.
  • First run after deploy: Will process the backlog of WAL accumulated since the bookmark was last truly updated. May take longer than usual.
  • Subsequent runs: Will only process new WAL records since the last sync — fast and efficient.
  • Backward compatible: No state format changes. Existing bookmarks continue to work; they will simply start advancing from their current position.
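The WAL-flush hand-off in the second bullet can be sketched as follows. The _FakeCursor class below is a stand-in for psycopg2's ReplicationCursor so the example runs without a database; send_feedback(flush_lsn=...) mirrors the real method's keyword argument, and the LSN value is illustrative.

```python
# Illustration of reporting the final bookmark LSN back to PostgreSQL.
# _FakeCursor stands in for a psycopg2 ReplicationCursor (assumption:
# tap-postgres consumes WAL through psycopg2's logical replication API).

class _FakeCursor:
    """Minimal stand-in for a psycopg2 ReplicationCursor."""

    def __init__(self) -> None:
        self.flushed_lsn = 0

    def send_feedback(self, flush_lsn: int = 0) -> None:
        # Once flush_lsn is reported, the server may discard WAL up to it
        # and reclaim the disk space those segments occupied.
        self.flushed_lsn = max(self.flushed_lsn, flush_lsn)

def flush_bookmark(cursor, bookmark_lsn: int) -> None:
    """Report the sync's final bookmark LSN so the replication slot advances."""
    cursor.send_feedback(flush_lsn=bookmark_lsn)

cur = _FakeCursor()
flush_bookmark(cur, 0x16B3890)  # e.g. the LSN of the last processed record
print(hex(cur.flushed_lsn))
```

With the bookmark frozen (the pre-fix behavior), flush_bookmark would be called with the same stale LSN every sync and the slot would never advance.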

How to Verify

  1. Run any LOG_BASED stream twice after deploying this change.
  2. Compare the replication_key_value in the state between the two runs — it should advance.
  3. Confirm the log message "Stream is assumed to be unsorted, progress is not resumable if interrupted" no longer appears.
  4. Monitor pg_replication_slots.confirmed_flush_lsn on source databases — it should advance after each sync.
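Step 2 above can be scripted. A caveat worth encoding: PostgreSQL LSNs are "hi/lo" hexadecimal strings, so they must be compared numerically rather than lexically. The state shapes and LSN values below are illustrative, not taken from a real run.

```python
# Sketch of verifying bookmark advancement between two consecutive syncs.
# State dicts and LSN values are illustrative examples.

def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL "hi/lo" hex LSN string to a comparable integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

state_run1 = {"replication_key_value": "0/16B3748"}
state_run2 = {"replication_key_value": "0/16B3890"}

advanced = lsn_to_int(state_run2["replication_key_value"]) > lsn_to_int(
    state_run1["replication_key_value"]
)
print("bookmark advanced:", advanced)
```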

@edgarrmondragon edgarrmondragon changed the title Fix: LOG_BASED replication bookmark not advancing between syncs fix: LOG_BASED replication bookmark not advancing between syncs Mar 10, 2026
@edgarrmondragon edgarrmondragon self-assigned this Mar 10, 2026
@edgarrmondragon edgarrmondragon added the bug Something isn't working label Mar 10, 2026
Member

@edgarrmondragon edgarrmondragon left a comment


Thanks @ksohail22!

Just to confirm: records are extracted in increasing order of _sdc_lsn, right?

@ksohail22
Contributor Author

@edgarrmondragon Yes, records are always delivered in increasing _sdc_lsn order.

@edgarrmondragon edgarrmondragon added this pull request to the merge queue Mar 11, 2026
Merged via the queue into MeltanoLabs:main with commit f2d4ea4 Mar 11, 2026
11 checks passed