Skip to content

Fix: Don't use SCD type 2 restatement logic in regular runs#4976

Merged
themisvaltinos merged 2 commits intomainfrom
themis/scd2_fix
Jul 16, 2025
Merged

Fix: Don't use SCD type 2 restatement logic in regular runs#4976
themisvaltinos merged 2 commits intomainfrom
themis/scd2_fix

Conversation

@themisvaltinos
Copy link
Contributor

This update introduces an is_restatement flag and logic in run_merged_intervals to distinguish between restatements (which should clean up historical data from the restatement start date onwards) and regular runs (which should only append new changes). currently SCD Type 2 models with offset cron greater than the interval start incorrectly used restatement cleanup logic during regular scheduled runs, which can be demonstrated from the added test_scd_type_2_regular_run_with_offset test.

restatements_by_snapshot_id = {
stage.all_snapshots[name].snapshot_id: interval
for name, interval in plan.restatements.items()
if name in stage.all_snapshots
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this if needed? Is there a reason to believe a name could be in restatements but not in snapshots?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no I was simply being cautious since I thought this stage.all_snapshots[name].snaphot_id would raise a key error in an edge case. but following the control flow when we're here it should be present

is_restatement = (
restatements is not None
and snapshot.snapshot_id in restatements
and start >= restatements[snapshot.snapshot_id][0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this start check is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought a partial restatement shouldn’t happen in an interval that the interval’s start is earlier than the start date of the restatement, but that’s not actually the case since the logic can handle it so i removed it

@themisvaltinos themisvaltinos merged commit 9b26320 into main Jul 16, 2025
27 checks passed
@themisvaltinos themisvaltinos deleted the themis/scd2_fix branch July 16, 2025 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants