fix: Handle wal2json enum type quoting issue in LOG_BASED replication #736

Merged
edgarrmondragon merged 6 commits into MeltanoLabs:main from ksohail22:fix/handle-wal2json-enum-type-quotingissue-log-based-replication on Mar 3, 2026

Conversation

ksohail22 (Contributor) commented Feb 26, 2026

Summary

  • Problem: wal2json outputs PostgreSQL enum type names with unescaped double quotes (e.g., "type":""EnumName"") which produces invalid JSON. This causes json.JSONDecodeError in _parse_message_payload, resulting in the entire message being silently dropped (returning {}).
  • Fix: Before giving up on a malformed payload, attempt to repair the known wal2json enum quoting pattern (""EnumName"" → "EnumName") with a regex substitution, then retry JSON parsing. A warning is logged and {} is returned only if the retry also fails.
  • Adds _fix_wal2json_enum_quotes() helper method to PostgresLogBasedStream.
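The repair-then-retry flow described in the summary can be sketched roughly as follows. This is a minimal standalone illustration, not the code merged in the PR: the regex, the module-level function names, and the sample payload are assumptions, since the PR page does not show the actual pattern.

```python
import json
import logging
import re

logger = logging.getLogger(__name__)

# Assumed pattern (not taken from the PR): a "type" key whose value is
# wrapped in doubled double quotes, e.g. "type":""EnumName"".
_ENUM_QUOTES = re.compile(r'"type":\s*""([^"]+)""')


def fix_wal2json_enum_quotes(payload: str) -> str:
    """Collapse ""EnumName"" to "EnumName" in type fields."""
    return _ENUM_QUOTES.sub(r'"type":"\1"', payload)


def parse_message_payload(payload: str) -> dict:
    """Parse a wal2json payload, retrying once after repairing enum quotes."""
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        pass  # fall through to the repair attempt

    try:
        return json.loads(fix_wal2json_enum_quotes(payload))
    except json.JSONDecodeError:
        logger.warning(
            "A message payload of %s could not be converted to JSON",
            payload,
        )
        return {}


# Hypothetical payload with the broken enum quoting.
broken = '{"columns":[{"name":"status","type":""OrderStatus"","value":"open"}]}'
parsed = parse_message_payload(broken)
```

Because the substitution only runs after the first `json.loads` call fails, well-formed payloads never pass through the regex.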

Test plan

  • Verify LOG_BASED replication works correctly with tables containing enum columns
  • Verify payloads without enum types are unaffected (no regression)
  • Verify truly malformed payloads still log a warning and return {}
  • Existing test suite passes

kashif-se and others added 4 commits February 25, 2026 21:57
…iscover_streams

The `connection_parameters` attribute is set as a side effect of the
`connector` cached property. In `discover_streams`, accessing
`self.connection_parameters` (for LOG_BASED streams) without first
ensuring `self.connector` has been called could raise an AttributeError.

This change accesses the connector once at the top of the method, stores
it in a local variable, and reuses it — guaranteeing initialization
order and avoiding redundant cached property lookups in the loop.

Co-authored-by: Cursor <cursoragent@cursor.com>
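
The initialization-order problem this commit message describes can be illustrated with a small sketch. The `Tap` class, its attribute values, and the method names below are hypothetical stand-ins, not the tap's real implementation; they only show how an attribute set as a side effect of a cached property is missing until that property is first accessed.

```python
from functools import cached_property


class Tap:
    """Hypothetical stand-in for a tap whose connector sets extra state."""

    @cached_property
    def connector(self) -> object:
        # Side effect: connection_parameters only exists after this runs.
        self.connection_parameters = {"host": "localhost"}
        return object()

    def discover_streams_unsafe(self) -> dict:
        # Raises AttributeError if self.connector was never accessed.
        return self.connection_parameters

    def discover_streams(self) -> dict:
        # Access the connector first so the side effect has happened,
        # and keep it in a local variable for reuse.
        connector = self.connector  # noqa: F841 (reused in a real loop)
        return self.connection_parameters
```

Calling `Tap().discover_streams_unsafe()` on a fresh instance raises `AttributeError`, while `Tap().discover_streams()` returns the parameters.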
…connection-parameters-in-discover-streams

fix: Initialize connector before accessing connection_parameters in d…
edgarrmondragon self-assigned this Feb 26, 2026
edgarrmondragon added the bug (Something isn't working) label Feb 26, 2026
edgarrmondragon (Member) left a comment

Thanks!

Just two comments. Let me know what you think.

Comment on lines +401 to +407
    try:
        message_payload = json.loads(fixed_payload)
    except json.JSONDecodeError:
        self.logger.warning(
            "A message payload of %s could not be converted to JSON",
            message.payload,
        )

perf: this adds another try/except block to each iteration of the log message loop. Even if the performance hit is measurable, having the sync crash is worse.

I 100% need to add some sort of automatic benchmarks for general and log-based performance here.

edgarrmondragon (Member)

Thanks @ksohail22!

edgarrmondragon added this pull request to the merge queue Mar 3, 2026
Merged via the queue into MeltanoLabs:main with commit 00bb6fc Mar 3, 2026
19 of 20 checks passed
3 participants