Feat(sqlmesh_dbt): Implement --full-refresh#5370
Conversation
sqlmesh/core/plan/stages.py
Outdated
| if not plan.restatements or plan.is_dev: | ||
| # The RestatementStage to clear intervals from state across all environments is not needed for plans against dev, only prod | ||
| return None | ||
| if plan.restatements and plan.clear_restated_intervals_across_model_versions: |
There was a problem hiding this comment.
I'd add an assert here to ensure plan.is_dev. I think it's critical enough that we want to make sure we never restate cross version in dev
There was a problem hiding this comment.
ensure
plan.is_dev
Do you mean ensure plan.is_prod since we want to emit RestatementStage on prod plans only?
I've updated the code to raise an error if plan.is_dev which indicates that the flag was not set properly upstream
sqlmesh/core/context.py
Outdated
|
|
||
| # When handling prod restatements, only clear intervals from other model versions if we are using full virtual environments | ||
| # If we are not, then there is no point, because none of the data in dev environments can be promoted by definition | ||
| clear_restated_intervals_across_model_versions = ( |
There was a problem hiding this comment.
versions seems overloaded here. I initially interpreted this as a snapshot version, when in reality this refers to model version aka "snapshots". Should we call this restate_all_snapshots?
There was a problem hiding this comment.
Well, restate is overloaded as well. The term refers to the re-processing of data, but actually that's implemented as a backfill.
restate in the context of a plan just refers to clearing intervals from state, which is very much non-obvious from the word restate which is why I had clear_restated_intervals to describe what it actually does.
It's also not all snapshots - just ones that arent promoted in prod.
Maybe clear_restated_intervals_from_dev_snapshots?
There was a problem hiding this comment.
It's also not all snapshots - just ones that arent promoted in prod.
The prod ones are cleared by definition. They just get immediately backfilled. The fact that we don't update state for those is an implementation detail and is not relevant here.
There was a problem hiding this comment.
I guess we think of "cleared" differently then. I tend to think of it in terms of actually physically deleting records from state.
But no problem, I can call this flag restate_all_snapshots
sqlmesh/core/plan/builder.py
Outdated
| ) | ||
|
|
||
| def _ensure_no_new_snapshots_with_restatements(self) -> None: | ||
| if self._always_include_local_changes: |
There was a problem hiding this comment.
Should we just remove this check and rely on context to provide a valid ContextDiff? This way we don't need to worry about passing always_include_local_changes just to perform the check and can instead rely on the correctly formed ContextDiff instance.
There was a problem hiding this comment.
Sure, I did it this way because of the test_new_snapshots_with_restatements test that specifically validates the ContextDiff had been set up as expected.
I've updated the PR to put the burden on the caller and removed the test
27a1204 to
ba111a2
Compare
This PR adds the following options to
PlanBuilder:always_include_local_changes- essentially overrides "force no diff" to False if restatements are present, so that theContextDiffalways includes local changes. It's also used to short-circuit the check inPlanBuilderto not throw aModel changes and restatements can't be a part of the same planerrorclear_restated_intervals_across_model_versions- moves the logic on whether or not to produceRestatementStage(which clears non-prod intervals from state) from the stage builder back to the plan builder where more config information, such as the VDE mode, is available.Note that these new flags are exposed on
context.plan_builderand notcontext.plan. The reason is so they can be kept as escape hatches for dbt compatibility and not be exposed in native/standard plan.Utilizing these new flags, this PR also implements the
--full-refreshoption on thesqlmesh_dbtcli