Fix: When restating prod, clear intervals across all related snapshots, not just promoted ones by erindru · Pull Request #5274 · SQLMesh/sqlmesh

erindru · 2025-09-02T05:01:39Z

This builds on StateSync.get_snapshots_by_names introduced in #5273

The current implementation of prod restatements only prevents dev data from getting promoted to prod for snapshots that are currently "active" in a dev environment.

However, it's possible for snapshots to exist that aren't currently promoted in any dev environment (since they represent an older version of a model) but may get promoted if a user reverts to that older version.

If that revert happens and then the environment is subsequently deployed, the old data may still make its way to prod.

So this PR ensures that all versions of affected snapshots have their intervals cleared, regardless of whether they are promoted in an environment.
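To make the difference concrete, here is a minimal sketch that contrasts matching only promoted snapshots with matching every version by name. `ToySnapshot`, `promoted_only` and `all_versions` are hypothetical stand-ins invented for this illustration; they are not SQLMesh classes or functions, and the sketch ignores downstream dependencies and prod.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class ToySnapshot:
    """Hypothetical stand-in for a snapshot: a model name plus a version."""
    name: str
    version: str


def promoted_only(environments: Dict[str, List[ToySnapshot]], restated: Set[str]) -> Set[ToySnapshot]:
    # Only sees snapshots that are currently promoted in some environment.
    return {s for snaps in environments.values() for s in snaps if s.name in restated}


def all_versions(state: List[ToySnapshot], restated: Set[str]) -> Set[ToySnapshot]:
    # Matches by name, so every version recorded in state is picked up.
    return {s for s in state if s.name in restated}


# State holds two versions of model_a, but only v2 is promoted in dev.
state = [
    ToySnapshot("db.model_a", "v1"),
    ToySnapshot("db.model_a", "v2"),
    ToySnapshot("db.model_b", "v1"),
]
environments = {"dev": [ToySnapshot("db.model_a", "v2")]}
restated = {"db.model_a"}

# v1 is the version that would keep stale intervals if only promoted snapshots were cleared.
missed = all_versions(state, restated) - promoted_only(environments, restated)
```

In this toy setup, `missed` contains the unpromoted `v1` of `db.model_a`, which is the kind of snapshot a later revert could promote again.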

erindru · 2025-09-02T05:03:05Z

sqlmesh/core/plan/common.py

+        return list(sorted(self.environment_names))
+
+
+def identify_restatement_intervals_across_snapshot_versions(


This got moved to common because I plan to call it outside the evaluator in an upcoming PR that improves the console output as well as the --explain output around restatements

izeigerman · 2025-09-08T22:00:01Z

sqlmesh/core/plan/common.py

+    }
+
+    # identify the ones that we havent picked up yet, which are the ones that dont exist in any environment
+    if remaining_snapshot_ids := set(all_matching_non_prod_snapshots).difference(


I guess it's worth noting that there's still a slight risk of missing a relevant dependency. For example, if there's a snapshot A' that is not promoted anywhere which has a downstream dependency D which is a model that has been removed from all existing environments, we won't drop intervals for D like we're supposed to because its name won't show up here.

Yes, that's true. We only determine the dependency tree based on promoted snapshots.

I couldn't think of a sane way to push dependency resolution to the db layer, and reading all snapshots to figure this out is a non-starter, so you're right that this PR mitigates rather than fully eliminates the problem.
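The remaining gap can be illustrated with a small sketch. The graph shapes and the `downstream_names` helper are assumptions made for this example only; the point is that names are collected from promoted DAGs, so a child that exists only under an unpromoted parent never appears.

```python
from typing import Dict, Set


def downstream_names(dag: Dict[str, Set[str]], roots: Set[str]) -> Set[str]:
    """Return the roots plus every name reachable from them (dag maps parent -> children)."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(dag.get(name, ()))
    return seen


# Promoted view: in dev, A only feeds B. D has been removed from every environment.
promoted_dags = {"dev": {"A": {"B"}}}

# Hypothetical full picture: the unpromoted A' (same name, A) still had D downstream.
full_dag = {"A": {"B", "D"}}

names_to_clear: Set[str] = set()
for dag in promoted_dags.values():
    names_to_clear |= downstream_names(dag, {"A"})

# Names that a full dependency resolution would find but the promoted-only walk does not.
missed_names = downstream_names(full_dag, {"A"}) - names_to_clear
```

Here `missed_names` ends up holding only `D`, matching the scenario raised in the review.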

izeigerman · 2025-09-08T22:01:15Z

sqlmesh/core/plan/common.py

+        # required by StateSync.remove_intervals()
+        # but at this point we have minimized the list by excluding the ones that are already present in prod
+        # and also excluding the ones we have already matched earlier while traversing the environment DAGs
+        remaining_snapshots = state_reader.get_snapshots(snapshot_ids=remaining_snapshot_ids)


Why are we still fetching full snapshots for all remaining snapshots and not just the ones that are full_history_restatement_only?

The handling of full_history_restatement_only happens next on line 183.

This step is just for snapshots that are not promoted in any environment. They may or may not be full_history_restatement_only; at this point we are just identifying them and setting the intervals to clear based on the requested restatements.

Note that Environments only have SnapshotTableInfo available, not full Snapshot, so everything within this method is based on SnapshotTableInfo. The other reason for needing SnapshotTableInfo is that ultimately these get passed to StateSync.remove_intervals(), which requires SnapshotTableInfo at a minimum.

So the order currently goes:

1. Iterate through each environment, pick up any snapshots and downstream dependencies matching the requested restatements, and add the SnapshotId/SnapshotTableInfo to snapshot_intervals_to_clear.

2. Then, based on name, pick up any remaining snapshots that weren't promoted in any environment, and add the SnapshotId/SnapshotTableInfo to snapshot_intervals_to_clear. In order to get SnapshotTableInfo for them, we have to look up the full Snapshot record. Note that we extend loaded_snapshots with these so we can use them in the next step.

3. Next, for everything we have identified in snapshot_intervals_to_clear, we check whether any are full_history_restatement_only and, if they are, selectively widen the intervals for those ones. To selectively widen, we need full snapshots to pass to Snapshot.get_removal_interval, so we fetch full Snapshots here.
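The ordering described above can be sketched roughly as follows. This is a simplified illustration, not the code in sqlmesh/core/plan/common.py: `ToySnapshot`, its `history` field and `intervals_to_clear` are hypothetical, downstream dependency traversal is left out, and "widening" is reduced to swapping in the snapshot's whole history.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


@dataclass(frozen=True)
class ToySnapshot:
    """Hypothetical stand-in; `history` plays the role of the snapshot's full interval range."""
    name: str
    version: str
    full_history_restatement_only: bool = False
    history: Interval = (0, 100)


def intervals_to_clear(
    environments: Dict[str, List[ToySnapshot]],
    unpromoted: List[ToySnapshot],
    restatements: Dict[str, Interval],
) -> Dict[ToySnapshot, Interval]:
    result: Dict[ToySnapshot, Interval] = {}

    # Step 1: snapshots promoted in some environment that match a restated name.
    for snapshots in environments.values():
        for snapshot in snapshots:
            if snapshot.name in restatements:
                result[snapshot] = restatements[snapshot.name]

    # Step 2: remaining versions, matched by name, that are not promoted anywhere.
    for snapshot in unpromoted:
        if snapshot.name in restatements and snapshot not in result:
            result[snapshot] = restatements[snapshot.name]

    # Step 3: widen the interval for snapshots that can only be restated in full.
    for snapshot in list(result):
        if snapshot.full_history_restatement_only:
            result[snapshot] = snapshot.history

    return result


promoted = ToySnapshot("a", "v2")
old_version = ToySnapshot("a", "v1", full_history_restatement_only=True)
cleared = intervals_to_clear({"dev": [promoted]}, [old_version], {"a": (10, 20)})
```

With these inputs, the promoted version keeps the requested interval while the unpromoted, full-history-only version is widened to its whole history.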

Snapshot-fetching optimizations exist in the following places:

- remaining_snapshot_ids are just the ones we haven't already fetched. The ones we have already fetched are in snapshot_intervals_to_clear, so we do a set difference to limit to just the additional ones we need and put those in remaining_snapshot_ids.

- After fetching full snapshots for remaining_snapshot_ids to get their SnapshotTableInfo records, we add them to the list of loaded_snapshots so they don't need to be looked up again.

- When handling full_history_restatement_only, we only fetch full Snapshots for the ones that don't already exist in loaded_snapshots.

- And of course, if any of the affected snapshot IDs had ever been loaded at a different time, they will already exist in the disk cache.
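The set-difference and reuse pattern described above can be sketched like this. `CountingStateReader` is a hypothetical test double that only records which IDs were requested; it does not mirror the real state reader beyond having a `get_snapshots(snapshot_ids=...)` method, and the string IDs are placeholders.

```python
from typing import Dict, Iterable, List, Set


class CountingStateReader:
    """Hypothetical reader that records every ID it is asked to fetch."""

    def __init__(self, records: Dict[str, str]) -> None:
        self._records = records
        self.fetched: List[str] = []

    def get_snapshots(self, snapshot_ids: Iterable[str]) -> Dict[str, str]:
        ids = sorted(snapshot_ids)
        self.fetched.extend(ids)
        return {snapshot_id: self._records[snapshot_id] for snapshot_id in ids}


reader = CountingStateReader({"a@v1": "full a v1", "a@v2": "full a v2", "a@v3": "full a v3"})

already_matched: Set[str] = {"a@v2"}           # found while walking the environments
all_matching: Set[str] = {"a@v1", "a@v2", "a@v3"}  # found by name across all versions
loaded: Dict[str, str] = {}

# Only fetch what the environment walk did not already cover.
remaining = all_matching - already_matched
loaded.update(reader.get_snapshots(snapshot_ids=remaining))

# Later step: full records are needed for these, but some are already loaded.
needs_full: Set[str] = {"a@v1", "a@v2"}
to_fetch = needs_full - set(loaded)
loaded.update(reader.get_snapshots(snapshot_ids=to_fetch))
```

Each ID is requested from the reader at most once: `a@v1` and `a@v3` in the first call, and only `a@v2` in the second.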

> In order to get SnapshotTableInfo for them, we have to look up the full Snapshot record. Note that we extend loaded_snapshots with these so we can use them in the next step.

Didn't we introduce SnapshotIdAndVersion precisely to avoid fetching full snapshots here?

> The other reason for needing SnapshotTableInfo is that ultimately these get passed to StateSync.remove_intervals(), which requires SnapshotTableInfo at a minimum.

I'm pretty sure we can update that interface to accept SnapshotIdAndVersion, no?

OK, I've:

- Added kind_name to SnapshotIdAndVersion (so we can determine full_history_restatement_only without fetching full Snapshots)

- Created a SnapshotIdAndVersionLike type that encompasses SnapshotIdAndVersion plus its supersets

- Reduced the StateSync.remove_intervals API to only require SnapshotIdAndVersionLike rather than SnapshotInfoLike
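One way to picture the narrowed interface is a union type over the lightweight record and its supersets. The classes below are simplified, hypothetical stand-ins (prefixed `Toy`), and the `remove_intervals` function is an assumption written for this sketch rather than the actual StateSync method or its signature.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

Interval = Tuple[int, int]
Key = Tuple[str, str]


@dataclass(frozen=True)
class ToySnapshotIdAndVersion:
    """Lightweight record: enough to identify which intervals to remove."""
    name: str
    version: str
    kind_name: Optional[str] = None


@dataclass(frozen=True)
class ToySnapshotTableInfo:
    """Superset record: same identifying fields plus extra (hypothetical) metadata."""
    name: str
    version: str
    kind_name: Optional[str] = None
    physical_schema: str = "toy_schema"


ToySnapshotIdAndVersionLike = Union[ToySnapshotIdAndVersion, ToySnapshotTableInfo]


def remove_intervals(
    stored: Dict[Key, List[Interval]],
    targets: List[Tuple[ToySnapshotIdAndVersionLike, Interval]],
) -> None:
    """Drop every stored interval that overlaps the requested [start, end) range."""
    for snapshot, (start, end) in targets:
        key = (snapshot.name, snapshot.version)
        stored[key] = [(s, e) for (s, e) in stored.get(key, []) if e <= start or s >= end]


stored = {("a", "v1"): [(0, 10), (10, 20)], ("a", "v2"): [(0, 10)]}
remove_intervals(
    stored,
    [
        (ToySnapshotIdAndVersion("a", "v1"), (10, 20)),  # no full snapshot needed
        (ToySnapshotTableInfo("a", "v2"), (0, 10)),      # supersets still accepted
    ],
)
```

Because the function only reads `name` and `version`, callers that hold just the lightweight record no longer need to load anything heavier.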

erindru · 2025-09-09T22:56:13Z

sqlmesh/core/snapshot/definition.py


    name: str
    version: str
+    kind_name: t.Optional[ModelKindName] = None


Note: this is optional because it's defined as optional on SnapshotTableInfo and I wanted to keep treating SnapshotTableInfo as a superset

izeigerman · 2025-09-10T16:54:08Z

sqlmesh/core/plan/common.py

+        # So for now, these are not considered
+        s_id
+        for s_id, s in snapshot_intervals_to_clear.items()
+        if s.snapshot.kind_name and s.snapshot.kind_name.full_history_restatement_only


I'm pretty sure SnapshotIdAndVersion can implement ModelKindMixin and then you can continue doing just snapshot.full_history_restatement_only.

Good idea, that didn't occur to me. Done.
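The mixin suggestion can be sketched as follows. Everything here is a hypothetical miniature: `ToyKindName`, `ToyKindMixin` and the choice of which kind counts as full-history-only are assumptions for illustration and do not reproduce ModelKindMixin or ModelKindName.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ToyKindName(str, Enum):
    # Placeholder kinds; the real set of model kinds is not reproduced here.
    RANGE_BASED = "RANGE_BASED"
    WHOLE_TABLE = "WHOLE_TABLE"


class ToyKindMixin:
    """Derives kind-based flags from a single property the host class provides."""

    @property
    def model_kind_name(self) -> Optional[ToyKindName]:
        raise NotImplementedError

    @property
    def full_history_restatement_only(self) -> bool:
        # Assumption for this sketch: only WHOLE_TABLE requires a full restatement.
        return self.model_kind_name in {ToyKindName.WHOLE_TABLE}


@dataclass(frozen=True)
class ToySnapshotIdAndVersion(ToyKindMixin):
    name: str
    version: str
    kind_name: Optional[ToyKindName] = None

    @property
    def model_kind_name(self) -> Optional[ToyKindName]:
        return self.kind_name


whole = ToySnapshotIdAndVersion("a", "v1", ToyKindName.WHOLE_TABLE)
ranged = ToySnapshotIdAndVersion("a", "v2", ToyKindName.RANGE_BASED)
unknown = ToySnapshotIdAndVersion("a", "v3")
```

With the mixin in place, callers can write `snapshot.full_history_restatement_only` directly, and a missing `kind_name` simply evaluates to `False` instead of needing a separate `None` check.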

izeigerman

very nice 👍

erindru mentioned this pull request Sep 3, 2025

Feat: prevent other processes seeing missing intervals during restatement #5285

Merged

erindru force-pushed the erin/state-sync-snapshots-by-name branch 3 times, most recently from 6040d43 to 48d45aa Compare September 7, 2025 21:10

erindru force-pushed the erin/fix-restatement-clear-across-all-environments branch from 042e3a1 to f33490a Compare September 7, 2025 23:54

erindru force-pushed the erin/state-sync-snapshots-by-name branch from 48d45aa to f234322 Compare September 8, 2025 00:49

Base automatically changed from erin/state-sync-snapshots-by-name to main September 8, 2025 01:36

erindru force-pushed the erin/fix-restatement-clear-across-all-environments branch from f33490a to 81bac93 Compare September 8, 2025 01:46

erindru marked this pull request as ready for review September 8, 2025 01:46

izeigerman reviewed Sep 8, 2025

erindru force-pushed the erin/fix-restatement-clear-across-all-environments branch from 81bac93 to 1438970 Compare September 9, 2025 22:48

erindru added 2 commits September 9, 2025 23:26

Fix: When restating prod, clear intervals across all related snapshots, not just promoted ones

c8c4575

PR feedback

924e59e

erindru force-pushed the erin/fix-restatement-clear-across-all-environments branch from 1438970 to 924e59e Compare September 9, 2025 23:39

izeigerman reviewed Sep 10, 2025

Implement ModelKindMixin on SnapshotIdAndVersion

c7e3fa5

izeigerman approved these changes Sep 10, 2025

erindru merged commit 203b74e into main Sep 10, 2025
36 checks passed

erindru deleted the erin/fix-restatement-clear-across-all-environments branch September 10, 2025 21:08
