diff --git a/docs/concepts/models/overview.md b/docs/concepts/models/overview.md index cf57678607..d6356462b4 100644 --- a/docs/concepts/models/overview.md +++ b/docs/concepts/models/overview.md @@ -513,9 +513,12 @@ Some properties are only available in specific model kinds - see the [model conf Must be one of the following values: `allow`, `warn`, `error` (default), or `ignore`. -!!! warning "Ignore is Dangerous" +### on_additive_change +: What should happen when a change to a [forward-only model](../../guides/incremental_time.md#forward-only-models) or incremental model in a [forward-only plan](../plans.md#forward-only-plans) causes an additive modification to the table schema (e.g., adding new columns or modifying column data types in compatible ways). - `ignore` is dangerous since it can result in error or data loss. It likely should never be used but could be useful as an "escape-hatch" or a way to workaround unexpected behavior. + SQLMesh checks for additive changes at plan time based on the model definition and again at run time based on the model's underlying physical tables. + + Must be one of the following values: `allow` (default), `warn`, `error`, or `ignore`. ### disable_restatement : Set this to true to indicate that [data restatement](../plans.md#restatement-plans) is disabled for this model. diff --git a/docs/concepts/plans.md b/docs/concepts/plans.md index 91616a6e7e..defcd06c0d 100644 --- a/docs/concepts/plans.md +++ b/docs/concepts/plans.md @@ -355,9 +355,25 @@ Some model changes destroy existing data in a table. SQLMesh automatically detec Forward-only plans treat all of the plan's model changes as forward-only. In these plans, SQLMesh will check all modified incremental models for destructive schema changes, not just forward-only models.
-SQLMesh determines what to do for each model based on this setting hierarchy: the [model's `on_destructive_change` value](../guides/incremental_time.md#destructive-changes) (if present), the `on_destructive_change` [model defaults](../reference/model_configuration.md#model-defaults) value (if present), and the SQLMesh global default of `error`. +SQLMesh determines what to do for each model based on this setting hierarchy: -If you want to temporarily allow destructive changes to models that don't allow them, use the `plan` command's `--allow-destructive-model` selector to specify which models. Learn more about model selectors [here](../guides/model_selection.md). +- **For destructive changes**: the [model's `on_destructive_change` value](../guides/incremental_time.md#schema-changes) (if present), the `on_destructive_change` [model defaults](../reference/model_configuration.md#model-defaults) value (if present), and the SQLMesh global default of `error` +- **For additive changes**: the [model's `on_additive_change` value](../guides/incremental_time.md#schema-changes) (if present), the `on_additive_change` [model defaults](../reference/model_configuration.md#model-defaults) value (if present), and the SQLMesh global default of `allow` + +If you want to temporarily allow destructive changes to models that don't allow them, use the `plan` command's `--allow-destructive-model` selector to specify which models. +Similarly, if you want to temporarily allow additive changes to models configured with `on_additive_change=error`, use the `--allow-additive-model` selector. 
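The fallback hierarchy described above can be sketched as a simple resolution chain. This is a minimal illustration of the documented precedence, not SQLMesh's actual implementation; the function name is hypothetical:

```python
# Hypothetical sketch of the setting hierarchy: model value > model_defaults value > global default.
def resolve_setting(model_value, model_defaults_value, global_default):
    """Return the first explicitly set value, falling back to the global default."""
    for value in (model_value, model_defaults_value, global_default):
        if value is not None:
            return value


# Destructive changes fall back to the global default of "error";
# additive changes fall back to the global default of "allow".
print(resolve_setting(None, "warn", "error"))  # model unset, defaults say "warn"
print(resolve_setting(None, None, "allow"))    # nothing set, global default wins
```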
+ +For example, to allow destructive changes to all models in the `analytics` schema: +```bash +sqlmesh plan --forward-only --allow-destructive-model "analytics.*" +``` + +Or to allow destructive changes to multiple specific models: +```bash +sqlmesh plan --forward-only --allow-destructive-model "sales.revenue_model" --allow-destructive-model "marketing.campaign_model" +``` + +Learn more about model selectors [here](../guides/model_selection.md). ### Effective date Changes that are part of the forward-only plan can also be applied retroactively to the production environment by specifying the effective date: diff --git a/docs/guides/incremental_time.md b/docs/guides/incremental_time.md index 2f54516ec4..8663ae9926 100644 --- a/docs/guides/incremental_time.md +++ b/docs/guides/incremental_time.md @@ -159,24 +159,49 @@ WHERE Alternatively, all the changes contained in a *specific plan* can be classified as forward-only with a flag: `sqlmesh plan --forward-only`. A subsequent plan that did not include the forward-only flag would fully refresh the model's physical table. Learn more about forward-only plans [here](../concepts/plans.md#forward-only-plans). -### Destructive changes +### Schema changes -Some model changes destroy existing data in a table. Dropping a column from the model is the most direct cause, but changing a column's data type (such as casting a column from a `STRING` to `INTEGER`) can also require a drop. (Whether or not a specific change requires dropping a column may differ across SQL engines.) +When SQLMesh processes forward-only changes to incremental models, it compares the model's new schema with the existing physical table schema to detect potential data loss or compatibility issues. SQLMesh categorizes schema changes into two types: -Forward-only models are used to retain existing data. Before executing forward-only changes to incremental models, SQLMesh performs a check to determine if existing data will be destroyed. 
+#### Destructive changes -The check is performed at plan time based on the model definition. SQLMesh may not be able to resolve all of a model's column data types and complete the check, so the check is performed again at run time based on the physical tables underlying the model. +Some model changes destroy existing data in a table. Examples include: + +- **Dropping a column** from the model +- **Renaming a column** +- **Modifying a column data type** in a way that could cause data loss + +Whether a specific change is destructive may differ across SQL engines based on their schema evolution capabilities. + +#### Additive changes + +Additive changes are any changes to the table's columns that aren't categorized as destructive. A simple example is adding a column to a table; another is changing a column's data type to a compatible type (e.g., INT -> STRING). + +SQLMesh performs schema change detection at plan time based on the model definition. If SQLMesh cannot resolve all of a model's column data types at plan time, the check is performed again at run time based on the physical tables underlying the model. #### Changes to forward-only models -A model's `on_destructive_change` [configuration setting](../reference/model_configuration.md#incremental-models) determines what happens when SQLMesh detects a destructive change. +SQLMesh provides two configuration settings to control how schema changes are handled: + +- **`on_destructive_change`** - Controls behavior for destructive schema changes +- **`on_additive_change`** - Controls behavior for additive schema changes + +##### Configuration options + +Both properties support four values: -By default, SQLMesh will error so no data is lost. You can set `on_destructive_change` to `warn` or `allow` in the model's `MODEL` block to allow destructive changes. -`ignore` can be used to not perform the schema change and allow the table's definition to diverge from the model definition.
+- **`error`** (default for `on_destructive_change`): Stop execution and raise an error +- **`warn`**: Log a warning but proceed with the change +- **`allow`** (default for `on_additive_change`): Silently proceed with the change +- **`ignore`**: Skip the schema change check entirely for this change type !!! warning "Ignore is Dangerous" - `ignore` is dangerous since it can result in error or data loss. It likely should never be used but could be useful as an "escape-hatch" or a way to workaround unexpected behavior. +`ignore` is dangerous since it can result in errors or data loss. It likely should never be used but could be useful as an "escape hatch" or a way to work around unexpected behavior. + +##### Destructive change handling + +The `on_destructive_change` [configuration setting](../reference/model_configuration.md#incremental-models) determines what happens when SQLMesh detects a destructive change. By default, SQLMesh will error so no data is lost. This example configures a model to silently `allow` destructive changes: @@ -191,12 +216,93 @@ MODEL ( ); ``` -A default `on_destructive_change` value can be set for all incremental models that do not specify it themselves in the [model defaults configuration](../reference/model_configuration.md#model-defaults).
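The destructive/additive split described in this section can be illustrated with a toy classifier that diffs two column schemas. This is a hedged sketch only: real destructiveness rules are engine-specific, and the single `INT -> STRING` compatibility case hard-coded below is an assumption for illustration, not SQLMesh's actual type-compatibility logic:

```python
# Toy classifier for schema changes -- illustrative only; actual rules vary by SQL engine.
def classify_changes(old_columns: dict, new_columns: dict) -> dict:
    changes = {"destructive": [], "additive": []}
    for name, old_type in old_columns.items():
        if name not in new_columns:
            # Dropping a column destroys its data.
            changes["destructive"].append(f"drop column {name}")
        elif old_type != new_columns[name]:
            # Assume widening INT -> STRING is compatible (additive); treat others as destructive.
            if (old_type, new_columns[name]) == ("INT", "STRING"):
                changes["additive"].append(f"alter {name}: INT -> STRING")
            else:
                changes["destructive"].append(f"alter {name}: incompatible type change")
    for name in new_columns:
        if name not in old_columns:
            # Adding a column never destroys existing data.
            changes["additive"].append(f"add column {name}")
    return changes


result = classify_changes(
    {"id": "INT", "ds": "DATE"},
    {"id": "STRING", "ts": "TIMESTAMP"},
)
```

Here the `id` type widening and the new `ts` column are additive, while dropping `ds` is destructive.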
+ +This example configures a model to raise an error for additive changes (useful for strict schema control): + +``` sql linenums="1" +MODEL ( + name sqlmesh_example.new_model, + kind INCREMENTAL_BY_TIME_RANGE ( + time_column model_time_column, + forward_only true, + on_additive_change error + ), +); +``` + +##### Combining both settings + +You can configure both settings together to have fine-grained control over schema evolution: + +``` sql linenums="1" +MODEL ( + name sqlmesh_example.new_model, + kind INCREMENTAL_BY_TIME_RANGE ( + time_column model_time_column, + forward_only true, + on_destructive_change warn, -- Warn but allow destructive changes + on_additive_change allow -- Silently allow new columns + ), +); +``` + +##### Model defaults + +Default values for both `on_destructive_change` and `on_additive_change` can be set for all incremental models in the [model defaults configuration](../reference/model_configuration.md#model-defaults). + +##### Common use cases + +Here are some common patterns for configuring schema change handling: + +**Strict schema control** - Prevent any schema changes: +```sql linenums="1" +MODEL ( + name sqlmesh_example.strict_model, + kind INCREMENTAL_BY_TIME_RANGE ( + time_column event_date, + forward_only true, + on_destructive_change error, -- Block destructive changes + on_additive_change error -- Block even new columns + ), +); +``` + +**Permissive development model** - Allow all schema changes: +```sql linenums="1" +MODEL ( + name sqlmesh_example.dev_model, + kind INCREMENTAL_BY_TIME_RANGE ( + time_column event_date, + forward_only true, + on_destructive_change allow, -- Allow dropping columns + on_additive_change allow -- Allow new columns (`allow` is the default value for this setting, so it can be omitted here) + ), +); +``` + +**Production safety** - Allow safe changes, warn about risky ones: +```sql linenums="1" +MODEL ( + name sqlmesh_example.production_model, + kind INCREMENTAL_BY_TIME_RANGE ( + time_column event_date, 
+ forward_only true, + on_destructive_change warn, -- Warn about destructive changes + on_additive_change allow -- Allow new columns (`allow` is the default value for this setting, so it can be omitted here) + ), +); +``` #### Changes in forward-only plans -The SQLMesh `plan` [`--forward-only` option](../concepts/plans.md#forward-only-plans) treats all the plan's model changes as forward-only. When this option is specified, SQLMesh will check all modified incremental models for destructive schema changes, not just models configured with `forward_only true`. +The SQLMesh `plan` [`--forward-only` option](../concepts/plans.md#forward-only-plans) treats all the plan's model changes as forward-only. When this option is specified, SQLMesh will check all modified incremental models for both destructive and additive schema changes, not just models configured with `forward_only true`. + +SQLMesh determines what to do for each model based on this setting hierarchy: -SQLMesh determines what to do for each model based on this setting hierarchy: the model's `on_destructive_change` value (if present), the `on_destructive_change` [model defaults](../reference/model_configuration.md#model-defaults) value (if present), and the SQLMesh global default of `error`. +- **For destructive changes**: the model's `on_destructive_change` value (if present), the `on_destructive_change` [model defaults](../reference/model_configuration.md#model-defaults) value (if present), and the SQLMesh global default of `error` +- **For additive changes**: the model's `on_additive_change` value (if present), the `on_additive_change` [model defaults](../reference/model_configuration.md#model-defaults) value (if present), and the SQLMesh global default of `allow` -If you want to temporarily allow destructive changes to models that don't allow them, use the `plan` command's [`--allow-destructive-model` selector](../concepts/plans.md#destructive-changes) to specify which models. 
Learn more about model selectors [here](../guides/model_selection.md). +If you want to temporarily allow destructive changes to models that don't allow them, use the `plan` command's [`--allow-destructive-model` selector](../concepts/plans.md#destructive-changes) to specify which models. Similarly, if you want to temporarily allow additive changes to models configured with `on_additive_change=error`, use the [`--allow-additive-model` selector](../concepts/plans.md#destructive-changes). Learn more about model selectors [here](../guides/model_selection.md). diff --git a/docs/guides/model_selection.md b/docs/guides/model_selection.md index db098a1538..9cc0a4358a 100644 --- a/docs/guides/model_selection.md +++ b/docs/guides/model_selection.md @@ -2,7 +2,7 @@ This guide describes how to select specific models to include in a SQLMesh plan, which can be useful when modifying a subset of the models in a SQLMesh project. -Note: the selector syntax described below is also used for the SQLMesh `plan` [`--allow-destructive-model` selector](../concepts/plans.md#destructive-changes) and for the `table_diff` command to [diff a selection of models](./tablediff.md#diffing-multiple-models-across-environments). +Note: the selector syntax described below is also used for the SQLMesh `plan` [`--allow-destructive-model` and `--allow-additive-model` selectors](../concepts/plans.md#destructive-changes) and for the `table_diff` command to [diff a selection of models](./tablediff.md#diffing-multiple-models-across-environments). 
## Background diff --git a/docs/integrations/dbt.md b/docs/integrations/dbt.md index e58bde7776..c5e4bdd2d9 100644 --- a/docs/integrations/dbt.md +++ b/docs/integrations/dbt.md @@ -273,18 +273,18 @@ Similarly, the [allow_partials](../concepts/models/overview.md#allow_partials) p #### on_schema_change -SQLMesh automatically detects destructive schema changes to [forward-only incremental models](../guides/incremental_time.md#forward-only-models) and to all incremental models in [forward-only plans](../concepts/plans.md#destructive-changes). +SQLMesh automatically detects both destructive and additive schema changes to [forward-only incremental models](../guides/incremental_time.md#forward-only-models) and to all incremental models in [forward-only plans](../concepts/plans.md#destructive-changes). -A model's [`on_destructive_change` setting](../guides/incremental_time.md#destructive-changes) determines whether it errors (default), warns, or silently allows the changes. SQLMesh always allows non-destructive forward-only schema changes, such as adding or casting a column in place. +A model's [`on_destructive_change` and `on_additive_change` settings](../guides/incremental_time.md#schema-changes) determine whether it errors, warns, silently allows, or ignores the changes. SQLMesh provides fine-grained control over both destructive changes (like dropping columns) and additive changes (like adding new columns). 
-`on_schema_change` configuration values are mapped to these SQLMesh `on_destructive_change` values: +`on_schema_change` configuration values are mapped to these SQLMesh settings: -| `on_schema_change` | SQLMesh `on_destructive_change` | -| ------------------ | ------------------------------- | -| ignore | warn | -| append_new_columns | warn | -| sync_all_columns | allow | -| fail | error | +| `on_schema_change` | SQLMesh `on_destructive_change` | SQLMesh `on_additive_change` | +|--------------------|---------------------------------|------------------------------| +| ignore | ignore | ignore | +| fail | error | error | +| append_new_columns | ignore | allow | +| sync_all_columns | allow | allow | ## Snapshot support diff --git a/docs/reference/cli.md b/docs/reference/cli.md index b6877962ab..a9ce9366e1 100644 --- a/docs/reference/cli.md +++ b/docs/reference/cli.md @@ -367,6 +367,8 @@ Options: --forward-only Create a plan for forward-only changes. --allow-destructive-model TEXT Allow destructive forward-only changes to models whose names match the expression. + --allow-additive-model TEXT Allow additive forward-only changes to + models whose names match the expression. --effective-from TEXT The effective date from which to apply forward-only changes on production. 
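The mapping table above can be expressed as a simple lookup. This is a sketch mirroring the documented table, not dbt's or SQLMesh's actual code:

```python
# Mapping from dbt's on_schema_change values to the two SQLMesh settings, per the table above.
#  dbt value:            (on_destructive_change, on_additive_change)
ON_SCHEMA_CHANGE_MAPPING = {
    "ignore":             ("ignore", "ignore"),
    "fail":               ("error", "error"),
    "append_new_columns": ("ignore", "allow"),
    "sync_all_columns":   ("allow", "allow"),
}

# Example: dbt's append_new_columns ignores destructive changes but allows additive ones.
destructive, additive = ON_SCHEMA_CHANGE_MAPPING["append_new_columns"]
```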
--no-prompts Disable interactive prompts for the backfill diff --git a/docs/reference/model_configuration.md b/docs/reference/model_configuration.md index 6ea3dd68b6..a5a96ebbf9 100644 --- a/docs/reference/model_configuration.md +++ b/docs/reference/model_configuration.md @@ -186,6 +186,7 @@ The SQLMesh project-level `model_defaults` key supports the following options, d - virtual_properties - session_properties (on per key basis) - on_destructive_change (described [below](#incremental-models)) +- on_additive_change (described [below](#incremental-models)) - audits (described [here](../concepts/audits.md#generic-audits)) - optimize_query - allow_partials @@ -231,11 +232,12 @@ Python model kind `name` enum value: [ModelKindName.FULL](https://sqlmesh.readth Configuration options for all incremental models (in addition to [general model properties](#general-model-properties)). -| Option | Description | Type | Required | -|-------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----:|:--------:| -| `forward_only` | Whether the model's changes should always be classified as [forward-only](../concepts/plans.md#forward-only-change). (Default: `False`) | bool | N | -| `on_destructive_change` | What should happen when a change to a [forward-only model](../guides/incremental_time.md#forward-only-models) or incremental model in a [forward-only plan](../concepts/plans.md#forward-only-plans) causes a destructive modification to the model schema. Valid values: `allow`, `warn`, `error`. (Default: `error`) | str | N | -| `disable_restatement` | Whether [restatements](../concepts/plans.md#restatement-plans) should be disabled for the model. 
(Default: `False`) | bool | N | +| Option | Description | Type | Required | +|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----:|:--------:| +| `forward_only` | Whether the model's changes should always be classified as [forward-only](../concepts/plans.md#forward-only-change). (Default: `False`) | bool | N | +| `on_destructive_change` | What should happen when a change to a [forward-only model](../guides/incremental_time.md#forward-only-models) or incremental model in a [forward-only plan](../concepts/plans.md#forward-only-plans) causes a destructive modification to the model schema. Valid values: `allow`, `warn`, `error`, `ignore`. (Default: `error`) | str | N | +| `on_additive_change` | What should happen when a change to a [forward-only model](../guides/incremental_time.md#forward-only-models) or incremental model in a [forward-only plan](../concepts/plans.md#forward-only-plans) causes an additive modification to the model schema (like adding new columns). Valid values: `allow`, `warn`, `error`, `ignore`. (Default: `allow`) | str | N | +| `disable_restatement` | Whether [restatements](../concepts/plans.md#restatement-plans) should be disabled for the model. 
(Default: `False`) | bool | N | #### Incremental by time range diff --git a/sqlmesh/cli/main.py b/sqlmesh/cli/main.py index 5d9a12f110..961b78069e 100644 --- a/sqlmesh/cli/main.py +++ b/sqlmesh/cli/main.py @@ -450,6 +450,12 @@ def diff(ctx: click.Context, environment: t.Optional[str] = None) -> None: multiple=True, help="Allow destructive forward-only changes to models whose names match the expression.", ) +@click.option( + "--allow-additive-model", + type=str, + multiple=True, + help="Allow additive forward-only changes to models whose names match the expression.", +) @click.option( "--effective-from", type=str, @@ -548,6 +554,7 @@ def plan( restate_models = kwargs.pop("restate_model") or None select_models = kwargs.pop("select_model") or None allow_destructive_models = kwargs.pop("allow_destructive_model") or None + allow_additive_models = kwargs.pop("allow_additive_model") or None backfill_models = kwargs.pop("backfill_model") or None ignore_cron = kwargs.pop("ignore_cron") or None setattr(get_console(), "verbosity", Verbosity(verbose)) @@ -557,6 +564,7 @@ def plan( restate_models=restate_models, select_models=select_models, allow_destructive_models=allow_destructive_models, + allow_additive_models=allow_additive_models, backfill_models=backfill_models, ignore_cron=ignore_cron, **kwargs, diff --git a/sqlmesh/core/config/model.py b/sqlmesh/core/config/model.py index 3a6266928a..5406a5497b 100644 --- a/sqlmesh/core/config/model.py +++ b/sqlmesh/core/config/model.py @@ -10,6 +10,8 @@ OnDestructiveChange, model_kind_validator, on_destructive_change_validator, + on_additive_change_validator, + OnAdditiveChange, ) from sqlmesh.core.model.meta import FunctionCall from sqlmesh.core.node import IntervalUnit @@ -34,6 +36,7 @@ class ModelDefaultsConfig(BaseConfig): storage_format: The storage format used to store the physical table, only applicable in certain engines. (eg. 
'parquet', 'orc') on_destructive_change: What should happen when a forward-only model requires a destructive schema change. + on_additive_change: What should happen when a forward-only model requires an additive schema change. physical_properties: A key-value mapping of arbitrary properties that are applied to the model table / view in the physical layer. virtual_properties: A key-value mapping of arbitrary properties that are applied to the model view in the virtual layer. session_properties: A key-value mapping of properties specific to the target engine that are applied to the engine session. @@ -56,6 +59,7 @@ class ModelDefaultsConfig(BaseConfig): table_format: t.Optional[str] = None storage_format: t.Optional[str] = None on_destructive_change: t.Optional[OnDestructiveChange] = None + on_additive_change: t.Optional[OnAdditiveChange] = None physical_properties: t.Optional[t.Dict[str, t.Any]] = None virtual_properties: t.Optional[t.Dict[str, t.Any]] = None session_properties: t.Optional[t.Dict[str, t.Any]] = None @@ -71,6 +75,7 @@ class ModelDefaultsConfig(BaseConfig): _model_kind_validator = model_kind_validator _on_destructive_change_validator = on_destructive_change_validator + _on_additive_change_validator = on_additive_change_validator @field_validator("audits", mode="before") def _audits_validator(cls, v: t.Any) -> t.Any: diff --git a/sqlmesh/core/console.py b/sqlmesh/core/console.py index 43283ead90..c73efd9669 100644 --- a/sqlmesh/core/console.py +++ b/sqlmesh/core/console.py @@ -27,6 +27,7 @@ from rich.tree import Tree from sqlglot import exp +from sqlmesh.core.schema_diff import TableAlterOperation from sqlmesh.core.test.result import ModelTextTestResult from sqlmesh.core.environment import EnvironmentNamingInfo, EnvironmentSummary from sqlmesh.core.linter.rule import RuleViolation @@ -47,6 +48,7 @@ PythonModelEvalError, NodeAuditsErrors, format_destructive_change_msg, + format_additive_change_msg, ) from sqlmesh.utils.rich import strip_ansi_codes @@ 
-327,13 +329,22 @@ class PlanBuilderConsole(BaseConsole, abc.ABC): def log_destructive_change( self, snapshot_name: str, - dropped_column_names: t.List[str], - alter_expressions: t.List[exp.Alter], + alter_operations: t.List[TableAlterOperation], dialect: str, error: bool = True, ) -> None: """Display a destructive change error or warning to the user.""" + @abc.abstractmethod + def log_additive_change( + self, + snapshot_name: str, + alter_operations: t.List[TableAlterOperation], + dialect: str, + error: bool = True, + ) -> None: + """Display an additive change error or warning to the user.""" + class UnitTestConsole(abc.ABC): @abc.abstractmethod @@ -757,8 +768,16 @@ def log_failed_models(self, errors: t.List[NodeExecutionFailedError]) -> None: def log_destructive_change( self, snapshot_name: str, - dropped_column_names: t.List[str], - alter_expressions: t.List[exp.Alter], + alter_operations: t.List[TableAlterOperation], + dialect: str, + error: bool = True, + ) -> None: + pass + + def log_additive_change( + self, + snapshot_name: str, + alter_operations: t.List[TableAlterOperation], dialect: str, error: bool = True, ) -> None: @@ -2199,22 +2218,29 @@ def log_failed_models(self, errors: t.List[NodeExecutionFailedError]) -> None: def log_destructive_change( self, snapshot_name: str, - dropped_column_names: t.List[str], - alter_expressions: t.List[exp.Alter], + alter_operations: t.List[TableAlterOperation], dialect: str, error: bool = True, ) -> None: if error: - self._print( - format_destructive_change_msg( - snapshot_name, dropped_column_names, alter_expressions, dialect - ) + self._print(format_destructive_change_msg(snapshot_name, alter_operations, dialect)) + else: + self.log_warning( + format_destructive_change_msg(snapshot_name, alter_operations, dialect, error) ) + + def log_additive_change( + self, + snapshot_name: str, + alter_operations: t.List[TableAlterOperation], + dialect: str, + error: bool = True, + ) -> None: + if error: + 
self._print(format_additive_change_msg(snapshot_name, alter_operations, dialect)) else: self.log_warning( - format_destructive_change_msg( - snapshot_name, dropped_column_names, alter_expressions, dialect, error - ) + format_additive_change_msg(snapshot_name, alter_operations, dialect, error) ) def log_error(self, message: str) -> None: diff --git a/sqlmesh/core/context.py b/sqlmesh/core/context.py index 9022f3f069..1a5375183c 100644 --- a/sqlmesh/core/context.py +++ b/sqlmesh/core/context.py @@ -1279,6 +1279,7 @@ def plan( empty_backfill: t.Optional[bool] = None, forward_only: t.Optional[bool] = None, allow_destructive_models: t.Optional[t.Collection[str]] = None, + allow_additive_models: t.Optional[t.Collection[str]] = None, no_prompts: t.Optional[bool] = None, auto_apply: t.Optional[bool] = None, no_auto_categorization: t.Optional[bool] = None, @@ -1322,6 +1323,7 @@ def plan( empty_backfill: Like skip_backfill, but also records processed intervals. forward_only: Whether the purpose of the plan is to make forward only changes. allow_destructive_models: Models whose forward-only changes are allowed to be destructive. + allow_additive_models: Models whose forward-only changes are allowed to be additive. no_prompts: Whether to disable interactive prompts for the backfill time range. Please note that if this flag is set to true and there are uncategorized changes the plan creation will fail. Default: False. 
@@ -1360,6 +1362,7 @@ def plan( empty_backfill=empty_backfill, forward_only=forward_only, allow_destructive_models=allow_destructive_models, + allow_additive_models=allow_additive_models, no_auto_categorization=no_auto_categorization, effective_from=effective_from, include_unmodified=include_unmodified, @@ -1411,6 +1414,7 @@ def plan_builder( empty_backfill: t.Optional[bool] = None, forward_only: t.Optional[bool] = None, allow_destructive_models: t.Optional[t.Collection[str]] = None, + allow_additive_models: t.Optional[t.Collection[str]] = None, no_auto_categorization: t.Optional[bool] = None, effective_from: t.Optional[TimeLike] = None, include_unmodified: t.Optional[bool] = None, @@ -1480,6 +1484,9 @@ def plan_builder( "allow_destructive_models": list(allow_destructive_models) if allow_destructive_models is not None else None, + "allow_additive_models": list(allow_additive_models) + if allow_additive_models is not None + else None, "no_auto_categorization": no_auto_categorization, "effective_from": effective_from, "include_unmodified": include_unmodified, @@ -1533,6 +1540,11 @@ def plan_builder( else: expanded_destructive_models = None + if allow_additive_models: + expanded_additive_models = model_selector.expand_model_selections(allow_additive_models) + else: + expanded_additive_models = None + if backfill_models: backfill_models = model_selector.expand_model_selections(backfill_models) else: @@ -1642,6 +1654,7 @@ def plan_builder( is_dev=is_dev, forward_only=forward_only, allow_destructive_models=expanded_destructive_models, + allow_additive_models=expanded_additive_models, environment_ttl=environment_ttl, environment_suffix_target=self.config.environment_suffix_target, environment_catalog_mapping=self.environment_catalog_mapping, diff --git a/sqlmesh/core/engine_adapter/base.py b/sqlmesh/core/engine_adapter/base.py index 24ee99bba5..97342b28cc 100644 --- a/sqlmesh/core/engine_adapter/base.py +++ b/sqlmesh/core/engine_adapter/base.py @@ -39,7 +39,7 @@ 
 set_catalog,
 )
 from sqlmesh.core.model.kind import TimeColumn
-from sqlmesh.core.schema_diff import SchemaDiffer
+from sqlmesh.core.schema_diff import SchemaDiffer, TableAlterOperation
 from sqlmesh.utils import (
     CorrelationId,
     columns_to_types_all_known,
@@ -376,6 +376,13 @@ def _columns_to_types(
             target_columns_to_types = columns_to_types_from_df(t.cast(pd.DataFrame, query_or_df))
         if not source_columns and target_columns_to_types:
             source_columns = list(target_columns_to_types)
+        # Source columns should only contain columns that are defined in the target.
+        # Any extras are intended to be ignored, so they are excluded here.
+        source_columns = (
+            [x for x in source_columns if x in target_columns_to_types]
+            if source_columns and target_columns_to_types
+            else None
+        )
         return target_columns_to_types, source_columns

     def recycle(self) -> None:
@@ -1074,32 +1081,39 @@ def _drop_object(
         """
         self.execute(exp.Drop(this=exp.to_table(name), kind=kind, exists=exists, **drop_args))

-    def get_alter_expressions(
+    def get_alter_operations(
         self,
         current_table_name: TableName,
         target_table_name: TableName,
         *,
         ignore_destructive: bool = False,
-    ) -> t.List[exp.Alter]:
+        ignore_additive: bool = False,
+    ) -> t.List[TableAlterOperation]:
         """
         Determines the alter statements needed to change the current table into the structure of the target table.
         """
-        return self.SCHEMA_DIFFER.compare_columns(
-            current_table_name,
-            self.columns(current_table_name),
-            self.columns(target_table_name),
-            ignore_destructive=ignore_destructive,
+        return t.cast(
+            t.List[TableAlterOperation],
+            self.SCHEMA_DIFFER.compare_columns(
+                current_table_name,
+                self.columns(current_table_name),
+                self.columns(target_table_name),
+                ignore_destructive=ignore_destructive,
+                ignore_additive=ignore_additive,
+            ),
         )

     def alter_table(
         self,
-        alter_expressions: t.List[exp.Alter],
+        alter_expressions: t.Union[t.List[exp.Alter], t.List[TableAlterOperation]],
     ) -> None:
         """
         Performs the alter statements to change the current table into the structure of the target table.
         """
         with self.transaction():
-            for alter_expression in alter_expressions:
+            for alter_expression in [
+                x.expression if isinstance(x, TableAlterOperation) else x for x in alter_expressions
+            ]:
                 self.execute(alter_expression)

     def create_view(
diff --git a/sqlmesh/core/engine_adapter/bigquery.py b/sqlmesh/core/engine_adapter/bigquery.py
index 4fe50fdeef..cab637fb36 100644
--- a/sqlmesh/core/engine_adapter/bigquery.py
+++ b/sqlmesh/core/engine_adapter/bigquery.py
@@ -12,6 +12,7 @@
     InsertOverwriteWithMergeMixin,
     ClusteredByMixin,
     RowDiffMixin,
+    TableAlterClusterByOperation,
 )
 from sqlmesh.core.engine_adapter.shared import (
     CatalogSupport,
@@ -21,7 +22,7 @@
     set_catalog,
 )
 from sqlmesh.core.node import IntervalUnit
-from sqlmesh.core.schema_diff import SchemaDiffer
+from sqlmesh.core.schema_diff import SchemaDiffer, TableAlterOperation
 from sqlmesh.utils import optional_import, get_source_columns_to_types
 from sqlmesh.utils.date import to_datetime
 from sqlmesh.utils.errors import SQLMeshError
@@ -51,9 +52,6 @@
 NestedField = t.Tuple[str, str, t.List[str]]
 NestedFieldsDict = t.Dict[str, t.List[NestedField]]

-# used to tag AST nodes to be specially handled in alter_table()
-_CLUSTERING_META_KEY = "__sqlmesh_update_table_clustering"
-
 @set_catalog()
 class BigQueryEngineAdapter(InsertOverwriteWithMergeMixin, ClusteredByMixin, RowDiffMixin):
@@ -394,29 +392,31 @@ def create_mapping_schema(

     def alter_table(
         self,
-        alter_expressions: t.List[exp.Alter],
+        alter_expressions: t.Union[t.List[exp.Alter], t.List[TableAlterOperation]],
     ) -> None:
         """
         Performs the alter statements to change the current table into the structure of the target table,
         and uses the API to add columns to structs, where SQL is not supported.
         """
+        if not alter_expressions:
+            return
+
+        cluster_by_operations, alter_statements = [], []
+        for e in alter_expressions:
+            if isinstance(e, TableAlterClusterByOperation):
+                cluster_by_operations.append(e)
+            elif isinstance(e, TableAlterOperation):
+                alter_statements.append(e.expression)
+            else:
+                alter_statements.append(e)
-        nested_fields, non_nested_expressions = self._split_alter_expressions(alter_expressions)
+        for op in cluster_by_operations:
+            self._update_clustering_key(op)
-        if nested_fields:
-            self._update_table_schema_nested_fields(nested_fields, alter_expressions[0].this)
+        nested_fields, non_nested_expressions = self._split_alter_expressions(alter_statements)
-        # this is easier than trying to detect exp.Cluster nodes
-        # or exp.Command nodes that contain the string "DROP CLUSTERING KEY"
-        clustering_change_operations = [
-            e for e in non_nested_expressions if _CLUSTERING_META_KEY in e.meta
-        ]
-        for op in clustering_change_operations:
-            non_nested_expressions.remove(op)
-            table, cluster_by = op.meta[_CLUSTERING_META_KEY]
-            assert isinstance(table, str) or isinstance(table, exp.Table)
-
-            self._update_clustering_key(table, cluster_by)
+        if nested_fields:
+            self._update_table_schema_nested_fields(nested_fields, alter_statements[0].this)

         if non_nested_expressions:
             super().alter_table(non_nested_expressions)
@@ -1179,35 +1179,27 @@ def _get_data_objects(
             for row in df.itertuples()
         ]

-    def _change_clustering_key_expr(
-        self, table: exp.Table, cluster_by: t.List[exp.Expression]
-    ) -> exp.Alter:
-        expr = super()._change_clustering_key_expr(table=table, cluster_by=cluster_by)
-        expr.meta[_CLUSTERING_META_KEY] = (table, cluster_by)
-        return expr
+    def _update_clustering_key(self, operation: TableAlterClusterByOperation) -> None:
+        cluster_key_expressions = getattr(operation, "cluster_key_expressions", [])
+        bq_table = self._get_table(operation.target_table)

-    def _drop_clustering_key_expr(self, table: exp.Table) -> exp.Alter:
-        expr = super()._drop_clustering_key_expr(table=table)
-        expr.meta[_CLUSTERING_META_KEY] = (table, None)
-        return expr
-
-    def _update_clustering_key(
-        self, table_name: TableName, cluster_by: t.Optional[t.List[exp.Expression]]
-    ) -> None:
-        cluster_by = cluster_by or []
-        bq_table = self._get_table(table_name)
-
-        rendered_columns = [c.sql(dialect=self.dialect) for c in cluster_by]
+        rendered_columns = [c.sql(dialect=self.dialect) for c in cluster_key_expressions]
         bq_table.clustering_fields = (
             rendered_columns or None
         )  # causes a drop of the key if cluster_by is empty or None
         self._db_call(self.client.update_table, table=bq_table, fields=["clustering_fields"])

-        if cluster_by:
+        if cluster_key_expressions:
             # BigQuery only applies new clustering going forward, so this rewrites the columns to apply the new clustering to historical data
             # ref: https://cloud.google.com/bigquery/docs/creating-clustered-tables#modifying-cluster-spec
-            self.execute(exp.update(table_name, {c: c for c in cluster_by}, where=exp.true()))
+            self.execute(
+                exp.update(
+                    operation.target_table,
+                    {c: c for c in cluster_key_expressions},
+                    where=exp.true(),
+                )
+            )

     def _normalize_decimal_value(self, col: exp.Expression, precision: int) -> exp.Expression:
         return exp.func("FORMAT", exp.Literal.string(f"%.{precision}f"), col)
diff --git a/sqlmesh/core/engine_adapter/clickhouse.py b/sqlmesh/core/engine_adapter/clickhouse.py
index 5ac4e9b152..523f77766a 100644
--- a/sqlmesh/core/engine_adapter/clickhouse.py
+++ b/sqlmesh/core/engine_adapter/clickhouse.py
@@ -15,7 +15,7 @@
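Reviewer note: the BigQuery hunk above relies on the fact that BigQuery applies a new clustering spec only to newly written data, so after updating `clustering_fields` via the API the adapter issues a self-assignment `UPDATE ... WHERE TRUE` to rewrite historical rows. A minimal sketch of that statement construction, using plain string building instead of sqlglot (the table and column names are hypothetical):

```python
def clustering_rewrite_sql(table: str, cluster_columns: list[str]) -> str:
    # Self-assignment forces BigQuery to rewrite every row, which makes the
    # new clustering spec apply to historical data as well as new writes.
    assignments = ", ".join(f"{c} = {c}" for c in cluster_columns)
    return f"UPDATE {table} SET {assignments} WHERE TRUE"

print(clustering_rewrite_sql("proj.ds.events", ["user_id", "event_date"]))
# -> UPDATE proj.ds.events SET user_id = user_id, event_date = event_date WHERE TRUE
```

The real adapter builds the equivalent statement with `exp.update(..., where=exp.true())` so it renders correctly in the BigQuery dialect.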
 CommentCreationView,
 InsertOverwriteStrategy,
 )
-from sqlmesh.core.schema_diff import SchemaDiffer
+from sqlmesh.core.schema_diff import SchemaDiffer, TableAlterOperation
 from sqlmesh.utils import get_source_columns_to_types

 if t.TYPE_CHECKING:
@@ -596,13 +596,15 @@ def delete_from(self, table_name: TableName, where: t.Union[str, exp.Expression]

     def alter_table(
         self,
-        alter_expressions: t.List[exp.Alter],
+        alter_expressions: t.Union[t.List[exp.Alter], t.List[TableAlterOperation]],
     ) -> None:
         """
         Performs the alter statements to change the current table into the structure of the target table.
         """
         with self.transaction():
-            for alter_expression in alter_expressions:
+            for alter_expression in [
+                x.expression if isinstance(x, TableAlterOperation) else x for x in alter_expressions
+            ]:
                 if self.engine_run_mode.is_cluster:
                     alter_expression.set(
                         "cluster", exp.OnCluster(this=exp.to_identifier(self.cluster))
diff --git a/sqlmesh/core/engine_adapter/mixins.py b/sqlmesh/core/engine_adapter/mixins.py
index 12c9bfc603..e2d7915cf3 100644
--- a/sqlmesh/core/engine_adapter/mixins.py
+++ b/sqlmesh/core/engine_adapter/mixins.py
@@ -1,7 +1,9 @@
 from __future__ import annotations

+import abc
 import logging
 import typing as t
+from dataclasses import dataclass

 from sqlglot import exp, parse_one
 from sqlglot.helper import seq_get
@@ -10,6 +12,7 @@
 from sqlmesh.core.engine_adapter.shared import InsertOverwriteStrategy, SourceQuery
 from sqlmesh.core.node import IntervalUnit
 from sqlmesh.core.dialect import schema_
+from sqlmesh.core.schema_diff import TableAlterOperation
 from sqlmesh.utils.errors import SQLMeshError

 if t.TYPE_CHECKING:
@@ -340,35 +343,74 @@ def _build_create_table_exp(
         return statement

-class ClusteredByMixin(EngineAdapter):
-    def _build_clustered_by_exp(
-        self,
-        clustered_by: t.List[exp.Expression],
-        **kwargs: t.Any,
-    ) -> t.Optional[exp.Cluster]:
-        return exp.Cluster(expressions=[c.copy() for c in clustered_by])
+@dataclass(frozen=True)
+class TableAlterClusterByOperation(TableAlterOperation, abc.ABC):
+    pass
+

-    def _parse_clustering_key(self, clustering_key: t.Optional[str]) -> t.List[exp.Expression]:
-        if not clustering_key:
-            return []
+@dataclass(frozen=True)
+class TableAlterChangeClusterKeyOperation(TableAlterClusterByOperation):
+    clustering_key: str
+    dialect: str

+    @property
+    def is_additive(self) -> bool:
+        return False
+
+    @property
+    def is_destructive(self) -> bool:
+        return False
+
+    @property
+    def _alter_actions(self) -> t.List[exp.Expression]:
+        return [exp.Cluster(expressions=self.cluster_key_expressions)]
+
+    @property
+    def cluster_key_expressions(self) -> t.List[exp.Expression]:
         # Note: Assumes `clustering_key` as a string like:
         # - "(col_a)"
         # - "(col_a, col_b)"
         # - "func(col_a, transform(col_b))"
-        parsed_cluster_key = parse_one(clustering_key, dialect=self.dialect)
-
+        parsed_cluster_key = parse_one(self.clustering_key, dialect=self.dialect)
         return parsed_cluster_key.expressions or [parsed_cluster_key.this]

-    def get_alter_expressions(
+
+@dataclass(frozen=True)
+class TableAlterDropClusterKeyOperation(TableAlterClusterByOperation):
+    @property
+    def is_additive(self) -> bool:
+        return False
+
+    @property
+    def is_destructive(self) -> bool:
+        return False
+
+    @property
+    def _alter_actions(self) -> t.List[exp.Expression]:
+        return [exp.Command(this="DROP", expression="CLUSTERING KEY")]
+
+
+class ClusteredByMixin(EngineAdapter):
+    def _build_clustered_by_exp(
+        self,
+        clustered_by: t.List[exp.Expression],
+        **kwargs: t.Any,
+    ) -> t.Optional[exp.Cluster]:
+        return exp.Cluster(expressions=[c.copy() for c in clustered_by])
+
+    def get_alter_operations(
         self,
         current_table_name: TableName,
         target_table_name: TableName,
         *,
         ignore_destructive: bool = False,
-    ) -> t.List[exp.Alter]:
-        expressions = super().get_alter_expressions(
-            current_table_name, target_table_name, ignore_destructive=ignore_destructive
+        ignore_additive: bool = False,
+    ) -> t.List[TableAlterOperation]:
+        operations = super().get_alter_operations(
+            current_table_name,
+            target_table_name,
+            ignore_destructive=ignore_destructive,
+            ignore_additive=ignore_additive,
         )

         # check for a change in clustering
@@ -390,32 +432,17 @@ def get_alter_expressions(
         if target_table_info.clustering_key and (
             current_table_info.clustering_key != target_table_info.clustering_key
         ):
-            expressions.append(
-                self._change_clustering_key_expr(
-                    current_table,
-                    self._parse_clustering_key(target_table_info.clustering_key),
+            operations.append(
+                TableAlterChangeClusterKeyOperation(
+                    target_table=current_table,
+                    clustering_key=target_table_info.clustering_key,
+                    dialect=self.dialect,
                 )
             )
         elif current_table_info.is_clustered:
-            expressions.append(self._drop_clustering_key_expr(current_table))
-
-        return expressions
+            operations.append(TableAlterDropClusterKeyOperation(target_table=current_table))

-    def _change_clustering_key_expr(
-        self, table: exp.Table, cluster_by: t.List[exp.Expression]
-    ) -> exp.Alter:
-        return exp.Alter(
-            this=table,
-            kind="TABLE",
-            actions=[exp.Cluster(expressions=cluster_by)],
-        )
-
-    def _drop_clustering_key_expr(self, table: exp.Table) -> exp.Alter:
-        return exp.Alter(
-            this=table,
-            kind="TABLE",
-            actions=[exp.Command(this="DROP", expression="CLUSTERING KEY")],
-        )
+        return operations

     def logical_merge(
diff --git a/sqlmesh/core/model/kind.py b/sqlmesh/core/model/kind.py
index 470ce92c20..6fbbc3534b 100644
--- a/sqlmesh/core/model/kind.py
+++ b/sqlmesh/core/model/kind.py
@@ -208,6 +208,31 @@ def is_ignore(self) -> bool:
         return self == OnDestructiveChange.IGNORE

+class OnAdditiveChange(str, Enum):
+    """What should happen when a forward-only model change requires an additive schema change."""
+
+    ERROR = "ERROR"
+    WARN = "WARN"
+    ALLOW = "ALLOW"
+    IGNORE = "IGNORE"
+
+    @property
+    def is_error(self) -> bool:
+        return self == OnAdditiveChange.ERROR
+
+    @property
+    def is_warn(self) -> bool:
+        return self == OnAdditiveChange.WARN
+
+    @property
+    def is_allow(self) -> bool:
+        return self == OnAdditiveChange.ALLOW
+
+    @property
+    def is_ignore(self) -> bool:
+        return self == OnAdditiveChange.IGNORE
+
+
 def _on_destructive_change_validator(
     cls: t.Type, v: t.Union[OnDestructiveChange, str, exp.Identifier]
 ) -> t.Any:
@@ -218,6 +243,20 @@ def _on_destructive_change_validator(
     return v

+def _on_additive_change_validator(
+    cls: t.Type, v: t.Union[OnAdditiveChange, str, exp.Identifier]
+) -> t.Any:
+    if v and not isinstance(v, OnAdditiveChange):
+        return OnAdditiveChange(
+            v.this.upper() if isinstance(v, (exp.Identifier, exp.Literal)) else v.upper()
+        )
+    return v
+
+
+on_additive_change_validator = field_validator("on_additive_change", mode="before")(
+    _on_additive_change_validator
+)
+
 on_destructive_change_validator = field_validator("on_destructive_change", mode="before")(
     _on_destructive_change_validator
 )
@@ -336,15 +375,18 @@ def _kind_dialect_validator(cls: t.Type, v: t.Optional[str]) -> str:

 class _Incremental(_ModelKind):
     on_destructive_change: OnDestructiveChange = OnDestructiveChange.ERROR
+    on_additive_change: OnAdditiveChange = OnAdditiveChange.ALLOW
     auto_restatement_cron: t.Optional[SQLGlotCron] = None

     _on_destructive_change_validator = on_destructive_change_validator
+    _on_additive_change_validator = on_additive_change_validator

     @property
     def metadata_hash_values(self) -> t.List[t.Optional[str]]:
         return [
             *super().metadata_hash_values,
             str(self.on_destructive_change),
+            str(self.on_additive_change),
             self.auto_restatement_cron,
         ]
@@ -357,6 +399,7 @@ def to_expression(
             *_properties(
                 {
                     "on_destructive_change": self.on_destructive_change.value,
+                    "on_additive_change": self.on_additive_change.value,
                     "auto_restatement_cron": self.auto_restatement_cron,
                 }
             ),
@@ -1001,14 +1044,15 @@ def create_model_kind(v: t.Any, dialect: str, defaults: t.Dict[str, t.Any]) -> M
     if "dialect" in kind_type.all_fields() and props.get("dialect") is None:
         props["dialect"] = dialect

-    # only pass the on_destructive_change user default to models inheriting from _Incremental
+    # only pass the on_destructive_change or on_additive_change user default to models inheriting from _Incremental
     # that don't explicitly set it in the model definition
-    if (
-        issubclass(kind_type, _Incremental)
-        and props.get("on_destructive_change") is None
-        and defaults.get("on_destructive_change") is not None
-    ):
-        props["on_destructive_change"] = defaults.get("on_destructive_change")
+    if issubclass(kind_type, _Incremental):
+        for on_change_property in ("on_additive_change", "on_destructive_change"):
+            if (
+                props.get(on_change_property) is None
+                and defaults.get(on_change_property) is not None
+            ):
+                props[on_change_property] = defaults.get(on_change_property)

     if kind_type == CustomKind:
         # load the custom materialization class and check if it uses a custom kind type
diff --git a/sqlmesh/core/model/meta.py b/sqlmesh/core/model/meta.py
index 2f24349a72..088398c388 100644
--- a/sqlmesh/core/model/meta.py
+++ b/sqlmesh/core/model/meta.py
@@ -31,6 +31,7 @@
     ViewKind,
     _IncrementalBy,
     model_kind_validator,
+    OnAdditiveChange,
 )
 from sqlmesh.core.node import _Node, str_or_exp_to_str
 from sqlmesh.core.reference import Reference
@@ -517,6 +518,11 @@ def fqn(self) -> str:
     def on_destructive_change(self) -> OnDestructiveChange:
         return getattr(self.kind, "on_destructive_change", OnDestructiveChange.ALLOW)

+    @property
+    def on_additive_change(self) -> OnAdditiveChange:
+        """Return the model's additive change setting if it has one."""
+        return getattr(self.kind, "on_additive_change", OnAdditiveChange.ALLOW)
+
     @property
     def ignored_rules(self) -> t.Set[str]:
         return self.ignored_rules_ or set()
diff --git a/sqlmesh/core/plan/builder.py b/sqlmesh/core/plan/builder.py
index 9451f9eb53..a48812d16c 100644
--- a/sqlmesh/core/plan/builder.py
+++ b/sqlmesh/core/plan/builder.py
@@ -26,7 +26,8 @@
 from sqlmesh.core.schema_diff import (
     get_schema_differ,
     has_drop_alteration,
-    get_dropped_column_names,
+    has_additive_alteration,
+    TableAlterOperation,
 )
 from sqlmesh.core.snapshot import (
     DeployabilityIndex,
@@ -73,6 +74,7 @@ class PlanBuilder:
         is_dev: Whether this plan is for development purposes.
         forward_only: Whether the purpose of the plan is to make forward only changes.
         allow_destructive_models: A list of fully qualified model names whose forward-only changes are allowed to be destructive.
+        allow_additive_models: A list of fully qualified model names whose forward-only changes are allowed to be additive.
         environment_ttl: The period of time that a development environment should exist before being deleted.
         categorizer_config: Auto categorization settings.
         auto_categorization_enabled: Whether to apply auto categorization.
@@ -108,6 +110,7 @@ def __init__(
         is_dev: bool = False,
         forward_only: bool = False,
         allow_destructive_models: t.Optional[t.Iterable[str]] = None,
+        allow_additive_models: t.Optional[t.Iterable[str]] = None,
         environment_ttl: t.Optional[str] = None,
         environment_suffix_target: EnvironmentSuffixTarget = EnvironmentSuffixTarget.default,
         environment_catalog_mapping: t.Optional[t.Dict[re.Pattern, str]] = None,
@@ -136,6 +139,9 @@ def __init__(
         self._allow_destructive_models = set(
             allow_destructive_models if allow_destructive_models is not None else []
         )
+        self._allow_additive_models = set(
+            allow_additive_models if allow_additive_models is not None else []
+        )
         self._enable_preview = enable_preview
         self._end_bounded = end_bounded
         self._ensure_finalized_snapshots = ensure_finalized_snapshots
@@ -279,7 +285,7 @@ def build(self) -> Plan:
         dag = self._build_dag()
         directly_modified, indirectly_modified = self._build_directly_and_indirectly_modified(dag)

-        self._check_destructive_changes(directly_modified)
+        self._check_destructive_additive_changes(directly_modified)
         self._categorize_snapshots(dag, indirectly_modified)
         self._adjust_snapshot_intervals()
@@ -323,6 +329,7 @@ def build(self) -> Plan:
             forward_only=self._forward_only,
             explain=self._explain,
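Reviewer note: the `OnAdditiveChange` enum and validator added in `kind.py`, together with the fallback loop in `create_model_kind`, implement a three-level resolution order: the explicit model setting, then the `model_defaults` value, then the global default of `ALLOW`. A standalone sketch of that resolution (the helper name `resolve_on_additive_change` is hypothetical, not part of the diff):

```python
from enum import Enum
from typing import Optional


class OnAdditiveChange(str, Enum):
    # mirrors the enum added to sqlmesh/core/model/kind.py
    ERROR = "ERROR"
    WARN = "WARN"
    ALLOW = "ALLOW"
    IGNORE = "IGNORE"


def resolve_on_additive_change(
    model_value: Optional[str] = None, default_value: Optional[str] = None
) -> OnAdditiveChange:
    # First non-None setting wins: model definition, then model_defaults;
    # the validator's upper() normalization is reproduced here.
    for value in (model_value, default_value):
        if value is not None:
            return OnAdditiveChange(value.upper())
    # SQLMesh's global default for additive changes is ALLOW.
    return OnAdditiveChange.ALLOW
```

Usage: `resolve_on_additive_change("warn", "error")` returns `OnAdditiveChange.WARN` because the model-level setting takes precedence over the project default.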
             allow_destructive_models=t.cast(t.Set, self._allow_destructive_models),
+            allow_additive_models=t.cast(t.Set, self._allow_additive_models),
             include_unmodified=self._include_unmodified,
             environment_ttl=self._environment_ttl,
             environment_naming_info=self.environment_naming_info,
@@ -531,16 +538,20 @@ def _adjust_snapshot_intervals(self) -> None:
                 if new.is_forward_only:
                     new.dev_intervals = new.intervals.copy()

-    def _check_destructive_changes(self, directly_modified: t.Set[SnapshotId]) -> None:
+    def _check_destructive_additive_changes(self, directly_modified: t.Set[SnapshotId]) -> None:
         for s_id in sorted(directly_modified):
             if s_id.name not in self._context_diff.modified_snapshots:
                 continue

             snapshot = self._context_diff.snapshots[s_id]
+            needs_destructive_check = snapshot.needs_destructive_check(
+                self._allow_destructive_models
+            )
+            needs_additive_check = snapshot.needs_additive_check(self._allow_additive_models)

             # should we raise/warn if this snapshot has/inherits a destructive change?
-            should_raise_or_warn = (
-                self._is_forward_only_change(s_id) or self._forward_only
-            ) and snapshot.needs_destructive_check(self._allow_destructive_models)
+            should_raise_or_warn = (self._is_forward_only_change(s_id) or self._forward_only) and (
+                needs_destructive_check or needs_additive_check
+            )
             if not should_raise_or_warn or not snapshot.is_model:
                 continue
@@ -554,22 +565,24 @@ def _check_destructive_changes(self, directly_modified: t.Set[SnapshotId]) -> No
             if columns_to_types_all_known(old_columns_to_types) and columns_to_types_all_known(
                 new_columns_to_types
             ):
-                schema_diff = get_schema_differ(snapshot.model.dialect).compare_columns(
-                    new.name,
-                    old_columns_to_types,
-                    new_columns_to_types,
-                    ignore_destructive=new.model.on_destructive_change.is_ignore,
+                alter_operations = t.cast(
+                    t.List[TableAlterOperation],
+                    get_schema_differ(snapshot.model.dialect).compare_columns(
+                        new.name,
+                        old_columns_to_types,
+                        new_columns_to_types,
+                        ignore_destructive=new.model.on_destructive_change.is_ignore,
+                        ignore_additive=new.model.on_additive_change.is_ignore,
+                    ),
                 )
-                if has_drop_alteration(schema_diff):
-                    snapshot_name = snapshot.name
-                    dropped_column_names = get_dropped_column_names(schema_diff)
-                    model_dialect = snapshot.model.dialect
+                snapshot_name = snapshot.name
+                model_dialect = snapshot.model.dialect
+                if needs_destructive_check and has_drop_alteration(alter_operations):
                     self._console.log_destructive_change(
                         snapshot_name,
-                        dropped_column_names,
-                        schema_diff,
+                        alter_operations,
                         model_dialect,
                         error=not snapshot.model.on_destructive_change.is_warn,
                     )
@@ -578,6 +591,16 @@
                         "Plan requires a destructive change to a forward-only model."
                     )

+            if needs_additive_check and has_additive_alteration(alter_operations):
+                self._console.log_additive_change(
+                    snapshot_name,
+                    alter_operations,
+                    model_dialect,
+                    error=not snapshot.model.on_additive_change.is_warn,
+                )
+                if snapshot.model.on_additive_change.is_error:
+                    raise PlanError("Plan requires an additive change to a forward-only model.")
+
     def _categorize_snapshots(
         self, dag: DAG[SnapshotId], indirectly_modified: SnapshotMapping
     ) -> None:
diff --git a/sqlmesh/core/plan/definition.py b/sqlmesh/core/plan/definition.py
index 300ac62faf..2f3ddb5990 100644
--- a/sqlmesh/core/plan/definition.py
+++ b/sqlmesh/core/plan/definition.py
@@ -44,6 +44,7 @@ class Plan(PydanticModel, frozen=True):
     no_gaps: bool
     forward_only: bool
     allow_destructive_models: t.Set[str]
+    allow_additive_models: t.Set[str]
     include_unmodified: bool
     end_bounded: bool
     ensure_finalized_snapshots: bool
@@ -258,6 +259,7 @@ def to_evaluatable(self) -> EvaluatablePlan:
             restatements={s.name: i for s, i in self.restatements.items()},
             is_dev=self.is_dev,
             allow_destructive_models=self.allow_destructive_models,
+            allow_additive_models=self.allow_additive_models,
             forward_only=self.forward_only,
             end_bounded=self.end_bounded,
             ensure_finalized_snapshots=self.ensure_finalized_snapshots,
@@ -300,6 +302,7 @@ class EvaluatablePlan(PydanticModel):
     restatements: t.Dict[str, Interval]
     is_dev: bool
     allow_destructive_models: t.Set[str]
+    allow_additive_models: t.Set[str]
     forward_only: bool
     end_bounded: bool
     ensure_finalized_snapshots: bool
diff --git a/sqlmesh/core/plan/evaluator.py b/sqlmesh/core/plan/evaluator.py
index 46142b7eeb..298d18a042 100644
--- a/sqlmesh/core/plan/evaluator.py
+++ b/sqlmesh/core/plan/evaluator.py
@@ -179,6 +179,7 @@ def visit_physical_layer_update_stage(
             snapshots_to_create,
             stage.all_snapshots,
             allow_destructive_snapshots=plan.allow_destructive_models,
+            allow_additive_snapshots=plan.allow_additive_models,
             deployability_index=stage.deployability_index,
             on_start=lambda x: self.console.start_creation_progress(
                 x, plan.environment, self.default_catalog
@@ -254,6 +255,7 @@ def visit_backfill_stage(self, stage: stages.BackfillStage, plan: EvaluatablePla
             start=plan.start,
             end=plan.end,
             allow_destructive_snapshots=plan.allow_destructive_models,
+            allow_additive_snapshots=plan.allow_additive_models,
             selected_snapshot_ids=stage.selected_snapshot_ids,
         )
         if errors:
@@ -322,6 +324,7 @@ def visit_migrate_schemas_stage(
                 stage.snapshots,
                 stage.all_snapshots,
                 allow_destructive_snapshots=plan.allow_destructive_models,
+                allow_additive_snapshots=plan.allow_additive_models,
                 deployability_index=stage.deployability_index,
             )
         except NodeExecutionFailedError as ex:
diff --git a/sqlmesh/core/scheduler.py b/sqlmesh/core/scheduler.py
index e787e57a23..4c4a06f4cd 100644
--- a/sqlmesh/core/scheduler.py
+++ b/sqlmesh/core/scheduler.py
@@ -197,6 +197,7 @@ def evaluate(
         batch_index: int,
         environment_naming_info: t.Optional[EnvironmentNamingInfo] = None,
         allow_destructive_snapshots: t.Optional[t.Set[str]] = None,
+        allow_additive_snapshots: t.Optional[t.Set[str]] = None,
         target_table_exists: t.Optional[bool] = None,
         **kwargs: t.Any,
     ) -> t.List[AuditResult]:
@@ -208,6 +209,7 @@ def evaluate(
             end: The end datetime to render.
             execution_time: The date/time time reference to use for execution time. Defaults to now.
             allow_destructive_snapshots: Snapshots for which destructive schema changes are allowed.
+            allow_additive_snapshots: Snapshots for which additive schema changes are allowed.
             deployability_index: Determines snapshots that are deployable in the context of this evaluation.
             batch_index: If the snapshot is part of a batch of related snapshots; which index in the batch is it
             auto_restatement_enabled: Whether to enable auto restatements.
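Reviewer note: the new `_check_destructive_additive_changes` flow in the plan builder boils down to classifying the alter operations and then warning or raising per category according to the configured policy (operations a model `IGNORE`s never reach the check, because `compare_columns` filters them out via the `ignore_*` flags). A simplified, self-contained sketch of that decision table — all names here are hypothetical stand-ins, not the builder's API:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Op:
    # toy stand-in for a TableAlterOperation
    is_additive: bool
    is_destructive: bool


def change_verdicts(
    ops: List[Op], on_destructive: str = "ERROR", on_additive: str = "ALLOW"
) -> Tuple[Optional[str], Optional[str]]:
    """Return (destructive_verdict, additive_verdict): "error", "warn", or None."""

    def verdict(flagged: bool, policy: str) -> Optional[str]:
        # ALLOW passes silently; IGNORE'd ops would never be emitted at all.
        if not flagged or policy in ("ALLOW", "IGNORE"):
            return None
        return "error" if policy == "ERROR" else "warn"

    return (
        verdict(any(o.is_destructive for o in ops), on_destructive),
        verdict(any(o.is_additive for o in ops), on_additive),
    )


ops = [Op(is_additive=True, is_destructive=False)]
print(change_verdicts(ops, on_additive="ERROR"))  # -> (None, 'error')
```

The defaults mirror SQLMesh's: destructive changes error out unless allowed, additive changes are allowed unless a model opts into `warn`/`error`.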
@@ -230,6 +232,7 @@ def evaluate(
             execution_time=execution_time,
             snapshots=snapshots,
             allow_destructive_snapshots=allow_destructive_snapshots,
+            allow_additive_snapshots=allow_additive_snapshots,
             deployability_index=deployability_index,
             batch_index=batch_index,
             target_table_exists=target_table_exists,
@@ -412,6 +415,7 @@ def run_merged_intervals(
         start: t.Optional[TimeLike] = None,
         end: t.Optional[TimeLike] = None,
         allow_destructive_snapshots: t.Optional[t.Set[str]] = None,
+        allow_additive_snapshots: t.Optional[t.Set[str]] = None,
         selected_snapshot_ids: t.Optional[t.Set[SnapshotId]] = None,
         run_environment_statements: bool = False,
         audit_only: bool = False,
@@ -427,6 +431,7 @@ def run_merged_intervals(
             start: The start of the run.
             end: The end of the run.
             allow_destructive_snapshots: Snapshots for which destructive schema changes are allowed.
+            allow_additive_snapshots: Snapshots for which additive schema changes are allowed.
             selected_snapshot_ids: The snapshots to include in the run DAG. If None, all snapshots with missing intervals will be included.

         Returns:
@@ -517,6 +522,7 @@ def run_node(node: SchedulingUnit) -> None:
                     deployability_index=deployability_index,
                     batch_index=node.batch_index,
                     allow_destructive_snapshots=allow_destructive_snapshots,
+                    allow_additive_snapshots=allow_additive_snapshots,
                     target_table_exists=snapshot.snapshot_id not in snapshots_to_create,
                 )
@@ -538,6 +544,7 @@ def run_node(node: SchedulingUnit) -> None:
                     snapshots=self.snapshots_by_name,
                     deployability_index=deployability_index,
                     allow_destructive_snapshots=allow_destructive_snapshots or set(),
+                    allow_additive_snapshots=allow_additive_snapshots or set(),
                 )

             try:
diff --git a/sqlmesh/core/schema_diff.py b/sqlmesh/core/schema_diff.py
index 1bf2f76672..0bbc146c17 100644
--- a/sqlmesh/core/schema_diff.py
+++ b/sqlmesh/core/schema_diff.py
@@ -1,8 +1,9 @@
 from __future__ import annotations

+import abc
 import logging
 import typing as t
-from enum import Enum, auto
+from dataclasses import dataclass
 from collections import defaultdict
 from pydantic import Field
 from sqlglot import exp
@@ -10,6 +11,7 @@
 from sqlmesh.utils import columns_to_types_to_struct
 from sqlmesh.utils.pydantic import PydanticModel
+from sqlmesh.utils.errors import SQLMeshError

 if t.TYPE_CHECKING:
     from sqlmesh.core._typing import TableName
@@ -17,25 +19,140 @@

 logger = logging.getLogger(__name__)

-class TableAlterOperationType(Enum):
-    ADD = auto()
-    DROP = auto()
-    ALTER_TYPE = auto()
+@dataclass(frozen=True)
+class TableAlterOperation(abc.ABC):
+    target_table: exp.Table

     @property
-    def is_add(self) -> bool:
-        return self == TableAlterOperationType.ADD
+    @abc.abstractmethod
+    def is_destructive(self) -> bool:
+        pass

     @property
-    def is_drop(self) -> bool:
-        return self == TableAlterOperationType.DROP
+    @abc.abstractmethod
+    def is_additive(self) -> bool:
+        pass

     @property
-    def is_alter_type(self) -> bool:
-        return self == TableAlterOperationType.ALTER_TYPE
+    @abc.abstractmethod
+    def _alter_actions(self) -> t.List[exp.Expression]:
+        pass
+
+    @property
+    def expression(self) -> exp.Alter:
+        return exp.Alter(
+            this=self.target_table,
+            kind="TABLE",
+            actions=self._alter_actions,
+        )
+
+
+@dataclass(frozen=True)
+class TableAlterColumnOperation(TableAlterOperation, abc.ABC):
+    column_parts: t.List[TableAlterColumn]
+    expected_table_struct: exp.DataType
+    array_element_selector: str
+
+    @property
+    def column_identifiers(self) -> t.List[exp.Identifier]:
+        results = []
+        for column in self.column_parts:
+            results.append(column.identifier)
+            if (
+                column.is_array_of_struct
+                and len(self.column_parts) > 1
+                and self.array_element_selector
+            ):
+                results.append(exp.to_identifier(self.array_element_selector))
+        return results
+
+    @property
+    def column(self) -> t.Union[exp.Dot, exp.Identifier]:
+        columns = self.column_identifiers
+        if len(columns) == 1:
+            return columns[0]
+        return exp.Dot.build(columns)
+
+
+@dataclass(frozen=True)
+class TableAlterTypedColumnOperation(TableAlterColumnOperation, abc.ABC):
+    column_type: exp.DataType
+
+    @property
+    def column_def(self) -> exp.ColumnDef:
+        if not self.column_type:
+            raise SQLMeshError("Tried to access column type when it shouldn't be needed")
+        return exp.ColumnDef(
+            this=self.column,
+            kind=self.column_type,
+        )
+
+
+@dataclass(frozen=True)
+class TableAlterAddColumnOperation(TableAlterTypedColumnOperation):
+    position: t.Optional[TableAlterColumnPosition] = None
+    is_part_of_destructive_change: bool = False
+
+    @property
+    def is_additive(self) -> bool:
+        return not self.is_part_of_destructive_change
+
+    @property
+    def is_destructive(self) -> bool:
+        return self.is_part_of_destructive_change
+
+    @property
+    def _alter_actions(self) -> t.List[exp.Expression]:
+        column_def = exp.ColumnDef(
+            this=self.column,
+            kind=self.column_type,
+        )
+        if self.position:
+            column_def.set("position", self.position.column_position_node)
+        return [column_def]

-class TableAlterColumn(PydanticModel):
+@dataclass(frozen=True)
+class TableAlterDropColumnOperation(TableAlterColumnOperation):
+    cascade: bool = False
+
+    @property
+    def is_additive(self) -> bool:
+        return False
+
+    @property
+    def is_destructive(self) -> bool:
+        return True
+
+    @property
+    def _alter_actions(self) -> t.List[exp.Expression]:
+        return [exp.Drop(this=self.column, kind="COLUMN", cascade=self.cascade)]
+
+
+@dataclass(frozen=True)
+class TableAlterChangeColumnTypeOperation(TableAlterTypedColumnOperation):
+    current_type: exp.DataType
+
+    @property
+    def is_additive(self) -> bool:
+        return True
+
+    @property
+    def is_destructive(self) -> bool:
+        return False
+
+    @property
+    def _alter_actions(self) -> t.List[exp.Expression]:
+        return [
+            exp.AlterColumn(
+                this=self.column,
+                dtype=self.column_type,
+            )
+        ]
+
+
+@dataclass(frozen=True)
+class TableAlterColumn:
     name: str
     is_struct: bool
     is_array_of_struct: bool
@@ -115,7 +232,8 @@ def identifier(self) -> exp.Identifier:
         return exp.to_identifier(self.name, quoted=self.quoted)

-class TableAlterColumnPosition(PydanticModel):
+@dataclass(frozen=True)
+class TableAlterColumnPosition:
     is_first: bool
     is_last: bool
     after: t.Optional[exp.Identifier] = None
@@ -160,129 +278,6 @@ def column_position_node(self) -> t.Optional[exp.ColumnPosition]:
         return exp.ColumnPosition(this=column, position=position)

-class TableAlterOperation(PydanticModel):
-    op: TableAlterOperationType
-    columns: t.List[TableAlterColumn]
-    column_type: exp.DataType
-    expected_table_struct: exp.DataType
-    add_position: t.Optional[TableAlterColumnPosition] = None
-    current_type: t.Optional[exp.DataType] = None
-    cascade: bool = False
-
-    @classmethod
-    def add(
-        cls,
-        columns: t.Union[TableAlterColumn, t.List[TableAlterColumn]],
-        column_type: t.Union[str, exp.DataType],
-        expected_table_struct: t.Union[str, exp.DataType],
-        position: t.Optional[TableAlterColumnPosition] = None,
-    ) -> TableAlterOperation:
-        return cls(
-            op=TableAlterOperationType.ADD,
-            columns=ensure_list(columns),
-            column_type=exp.DataType.build(column_type),
-            add_position=position,
-            expected_table_struct=exp.DataType.build(expected_table_struct),
-        )
-
-    @classmethod
-    def drop(
-        cls,
-        columns: t.Union[TableAlterColumn, t.List[TableAlterColumn]],
-        expected_table_struct: t.Union[str, exp.DataType],
-        column_type: t.Optional[t.Union[str, exp.DataType]] = None,
-        cascade: bool = False,
-    ) -> TableAlterOperation:
-        column_type = exp.DataType.build(column_type) if column_type else exp.DataType.build("INT")
-        return cls(
-            op=TableAlterOperationType.DROP,
-            columns=ensure_list(columns),
-            column_type=column_type,
-            expected_table_struct=exp.DataType.build(expected_table_struct),
-            cascade=cascade,
-        )
-
-    @classmethod
-    def alter_type(
-        cls,
-        columns: t.Union[TableAlterColumn, t.List[TableAlterColumn]],
-        column_type: t.Union[str, exp.DataType],
-        current_type: t.Union[str, exp.DataType],
-        expected_table_struct: t.Union[str, exp.DataType],
-        position: t.Optional[TableAlterColumnPosition] = None,
-    ) -> TableAlterOperation:
-        return cls(
-            op=TableAlterOperationType.ALTER_TYPE,
-            columns=ensure_list(columns),
-            column_type=exp.DataType.build(column_type),
-            add_position=position,
-            current_type=exp.DataType.build(current_type),
-            expected_table_struct=exp.DataType.build(expected_table_struct),
-        )
-
-    @property
-    def is_add(self) -> bool:
-        return self.op.is_add
-
-    @property
-    def is_drop(self) -> bool:
-        return self.op.is_drop
-
-    @property
-    def is_alter_type(self) -> bool:
-        return self.op.is_alter_type
-
-    def column_identifiers(self, array_element_selector: str) -> t.List[exp.Identifier]:
-        results = []
-        for column in self.columns:
-            results.append(column.identifier)
-            if column.is_array_of_struct and len(self.columns) > 1 and array_element_selector:
-                results.append(exp.to_identifier(array_element_selector))
-        return results
-
-    def column(self, array_element_selector: str) -> t.Union[exp.Dot, exp.Identifier]:
-        columns = self.column_identifiers(array_element_selector)
-        if len(columns) == 1:
-            return columns[0]
-        return exp.Dot.build(columns)
-
-    def column_def(self, array_element_selector: str) -> exp.ColumnDef:
-        return exp.ColumnDef(
-            this=self.column(array_element_selector),
-            kind=self.column_type,
-        )
-
-    def expression(
-        self, table_name: t.Union[str, exp.Table], array_element_selector: str
-    ) -> exp.Alter:
-        if self.is_alter_type:
-            return exp.Alter(
-                this=exp.to_table(table_name),
-                kind="TABLE",
-                actions=[
-                    exp.AlterColumn(
-                        this=self.column(array_element_selector),
-                        dtype=self.column_type,
-                    )
-                ],
-            )
-        if self.is_add:
-            alter_table = exp.Alter(this=exp.to_table(table_name), kind="TABLE")
-            column = self.column_def(array_element_selector)
-            alter_table.set("actions", [column])
-            if self.add_position:
-                column.set("position", self.add_position.column_position_node)
-            return alter_table
-        if self.is_drop:
-            alter_table = exp.Alter(this=exp.to_table(table_name), kind="TABLE")
-            drop_column = exp.Drop(
-                this=self.column(array_element_selector), kind="COLUMN", cascade=self.cascade
-            )
-            alter_table.set("actions", [drop_column])
-            return alter_table
-        raise ValueError(f"Unknown operation {self.op}")
-
-
 class SchemaDiffer(PydanticModel):
     """
     Compares a source schema against a target schema and returns a list of alter statements to have the source
@@ -467,16 +462,21 @@ def _drop_operation(
         struct: exp.DataType,
         pos: int,
         root_struct: exp.DataType,
-    ) -> t.List[TableAlterOperation]:
+        table_name: TableName,
+    ) -> t.List[TableAlterColumnOperation]:
         columns = ensure_list(columns)
-        operations = []
+        operations: t.List[TableAlterColumnOperation] = []
         column_pos, column_kwarg = self._get_matching_kwarg(columns[-1].name, struct, pos)
         assert column_pos is not None
         assert column_kwarg
         struct.expressions.pop(column_pos)
         operations.append(
-            TableAlterOperation.drop(
-                columns, root_struct.copy(), column_kwarg.args["kind"], cascade=self.drop_cascade
+            TableAlterDropColumnOperation(
+                target_table=exp.to_table(table_name),
+                column_parts=columns,
+                expected_table_struct=root_struct.copy(),
+                cascade=self.drop_cascade,
+                array_element_selector=self.array_element_selector,
             )
         )
         return operations
@@ -496,14 +496,17 @@ def _resolve_drop_operation(
         current_struct: exp.DataType,
         new_struct: exp.DataType,
         root_struct: exp.DataType,
-    ) -> t.List[TableAlterOperation]:
+        table_name: TableName,
+    ) -> t.List[TableAlterColumnOperation]:
         operations = []
         for current_pos, current_kwarg in enumerate(current_struct.expressions.copy()):
             new_pos, _ = self._get_matching_kwarg(current_kwarg, new_struct, current_pos)
             columns = parent_columns + [TableAlterColumn.from_struct_kwarg(current_kwarg)]
             if new_pos is None:
                 operations.extend(
-                    self._drop_operation(columns, current_struct, current_pos, root_struct)
+                    self._drop_operation(
+                        columns, current_struct, current_pos, root_struct, table_name
+                    )
                 )
         return operations
@@ -514,7 +517,9 @@ def _add_operation(
         new_kwarg: exp.ColumnDef,
         current_struct: exp.DataType,
         root_struct: exp.DataType,
-    ) -> t.List[TableAlterOperation]:
+        table_name: TableName,
+        is_part_of_destructive_change: bool = False,
+    ) -> t.List[TableAlterColumnOperation]:
         if self.support_positional_add:
             col_pos = TableAlterColumnPosition.create(new_pos, current_struct.expressions)
             current_struct.expressions.insert(new_pos, new_kwarg)
@@ -522,11 +527,14 @@
         else:
             col_pos = None
             current_struct.expressions.append(new_kwarg)
         return [
-            TableAlterOperation.add(
-                columns,
-                new_kwarg.args["kind"],
-                root_struct.copy(),
-                col_pos,
+            TableAlterAddColumnOperation(
+                target_table=exp.to_table(table_name),
+                column_parts=columns,
+                column_type=new_kwarg.args["kind"],
+                expected_table_struct=root_struct.copy(),
+                position=col_pos,
+                is_part_of_destructive_change=is_part_of_destructive_change,
+                array_element_selector=self.array_element_selector,
             )
         ]

@@ -536,14 +544,17 @@ def _resolve_add_operations(
         current_struct: exp.DataType,
         new_struct: exp.DataType,
         root_struct: exp.DataType,
-    ) -> t.List[TableAlterOperation]:
+
table_name: TableName, + ) -> t.List[TableAlterColumnOperation]: operations = [] for new_pos, new_kwarg in enumerate(new_struct.expressions): possible_current_pos, _ = self._get_matching_kwarg(new_kwarg, current_struct, new_pos) if possible_current_pos is None: columns = parent_columns + [TableAlterColumn.from_struct_kwarg(new_kwarg)] operations.extend( - self._add_operation(columns, new_pos, new_kwarg, current_struct, root_struct) + self._add_operation( + columns, new_pos, new_kwarg, current_struct, root_struct, table_name + ) ) return operations @@ -556,9 +567,11 @@ def _alter_operation( current_type: t.Union[str, exp.DataType], root_struct: exp.DataType, new_kwarg: exp.ColumnDef, + table_name: TableName, *, ignore_destructive: bool = False, - ) -> t.List[TableAlterOperation]: + ignore_additive: bool = False, + ) -> t.List[TableAlterColumnOperation]: # We don't copy on purpose here because current_type may need to be mutated inside # _get_operations (struct.expressions.pop and struct.expressions.insert) current_type = exp.DataType.build(current_type, copy=False) @@ -572,7 +585,9 @@ def _alter_operation( current_type, new_type, root_struct, + table_name, ignore_destructive=ignore_destructive, + ignore_additive=ignore_additive, ) if new_type.this == current_type.this == exp.DataType.Type.ARRAY: @@ -590,31 +605,43 @@ def _alter_operation( current_array_type, new_array_type, root_struct, + table_name, ignore_destructive=ignore_destructive, + ignore_additive=ignore_additive, ) if self._is_coerceable_type(current_type, new_type): return [] if self._is_compatible_type(current_type, new_type): + if ignore_additive: + return [] struct.expressions.pop(pos) struct.expressions.insert(pos, new_kwarg) - col_pos = ( - TableAlterColumnPosition.create(pos, struct.expressions, replacing_col=True) - if self.support_positional_add - else None - ) return [ - TableAlterOperation.alter_type( - columns, - new_type, - current_type, - root_struct.copy(), - col_pos, + 
TableAlterChangeColumnTypeOperation( + target_table=exp.to_table(table_name), + column_parts=columns, + column_type=new_type, + current_type=current_type, + expected_table_struct=root_struct.copy(), + array_element_selector=self.array_element_selector, ) ] if ignore_destructive: return [] - return self._drop_operation(columns, root_struct, pos, root_struct) + self._add_operation( - columns, pos, new_kwarg, struct, root_struct + return self._drop_operation( + columns, + root_struct, + pos, + root_struct, + table_name, + ) + self._add_operation( + columns, + pos, + new_kwarg, + struct, + root_struct, + table_name, + is_part_of_destructive_change=True, ) def _resolve_alter_operations( @@ -623,9 +650,11 @@ def _resolve_alter_operations( current_struct: exp.DataType, new_struct: exp.DataType, root_struct: exp.DataType, + table_name: TableName, *, ignore_destructive: bool = False, - ) -> t.List[TableAlterOperation]: + ignore_additive: bool = False, + ) -> t.List[TableAlterColumnOperation]: operations = [] for current_pos, current_kwarg in enumerate(current_struct.expressions.copy()): _, new_kwarg = self._get_matching_kwarg(current_kwarg, new_struct, current_pos) @@ -647,7 +676,9 @@ def _resolve_alter_operations( current_type, root_struct, new_kwarg, + table_name, ignore_destructive=ignore_destructive, + ignore_additive=ignore_additive, ) ) return operations @@ -658,21 +689,26 @@ def _get_operations( current_struct: exp.DataType, new_struct: exp.DataType, root_struct: exp.DataType, + table_name: TableName, *, ignore_destructive: bool = False, - ) -> t.List[TableAlterOperation]: + ignore_additive: bool = False, + ) -> t.List[TableAlterColumnOperation]: root_struct = root_struct or current_struct parent_columns = parent_columns or [] operations = [] if not ignore_destructive: operations.extend( self._resolve_drop_operation( - parent_columns, current_struct, new_struct, root_struct + parent_columns, current_struct, new_struct, root_struct, table_name + ) + ) + if not 
ignore_additive: + operations.extend( + self._resolve_add_operations( + parent_columns, current_struct, new_struct, root_struct, table_name ) ) - operations.extend( - self._resolve_add_operations(parent_columns, current_struct, new_struct, root_struct) - ) operations.extend( self._resolve_alter_operations( parent_columns, @@ -680,6 +716,8 @@ def _get_operations( new_struct, root_struct, ignore_destructive=ignore_destructive, + ignore_additive=ignore_additive, + table_name=table_name, ) ) return operations @@ -688,11 +726,19 @@ def _from_structs( self, current_struct: exp.DataType, new_struct: exp.DataType, + table_name: TableName, *, ignore_destructive: bool = False, - ) -> t.List[TableAlterOperation]: + ignore_additive: bool = False, + ) -> t.List[TableAlterColumnOperation]: return self._get_operations( - [], current_struct, new_struct, current_struct, ignore_destructive=ignore_destructive + [], + current_struct, + new_struct, + current_struct, + table_name=table_name, + ignore_destructive=ignore_destructive, + ignore_additive=ignore_additive, ) def _compare_structs( @@ -702,11 +748,15 @@ def _compare_structs( new: exp.DataType, *, ignore_destructive: bool = False, - ) -> t.List[exp.Alter]: - return [ - op.expression(table_name, self.array_element_selector) - for op in self._from_structs(current, new, ignore_destructive=ignore_destructive) - ] + ignore_additive: bool = False, + ) -> t.List[TableAlterColumnOperation]: + return self._from_structs( + current, + new, + table_name=table_name, + ignore_destructive=ignore_destructive, + ignore_additive=ignore_additive, + ) def compare_columns( self, @@ -715,31 +765,45 @@ def compare_columns( new: t.Dict[str, exp.DataType], *, ignore_destructive: bool = False, - ) -> t.List[exp.Alter]: + ignore_additive: bool = False, + ) -> t.List[TableAlterColumnOperation]: return self._compare_structs( table_name, columns_to_types_to_struct(current), columns_to_types_to_struct(new), ignore_destructive=ignore_destructive, + 
ignore_additive=ignore_additive, ) -def has_drop_alteration(alter_expressions: t.List[exp.Alter]) -> bool: - return any( - isinstance(action, exp.Drop) - for actions in alter_expressions - for action in actions.args.get("actions", []) - ) +def has_drop_alteration(alter_operations: t.List[TableAlterOperation]) -> bool: + return any(op.is_destructive for op in alter_operations) + + +def has_additive_alteration(alter_operations: t.List[TableAlterOperation]) -> bool: + return any(op.is_additive for op in alter_operations) + + +def get_additive_changes( + alter_operations: t.List[TableAlterOperation], +) -> t.List[TableAlterOperation]: + return [x for x in alter_operations if x.is_additive] + + +def get_dropped_column_names(alter_expressions: t.List[TableAlterOperation]) -> t.List[str]: + return [ + op.column.alias_or_name + for op in alter_expressions + if isinstance(op, TableAlterDropColumnOperation) + ] -def get_dropped_column_names(alter_expressions: t.List[exp.Alter]) -> t.List[str]: - dropped_columns = [] - for actions in alter_expressions: - for action in actions.args.get("actions", []): - if isinstance(action, exp.Drop): - if action.kind == "COLUMN": - dropped_columns.append(action.alias_or_name) - return dropped_columns +def get_additive_column_names(alter_expressions: t.List[TableAlterOperation]) -> t.List[str]: + return [ + op.column.alias_or_name + for op in alter_expressions + if op.is_additive and isinstance(op, TableAlterColumnOperation) + ] def get_schema_differ(dialect: str) -> SchemaDiffer: diff --git a/sqlmesh/core/snapshot/definition.py b/sqlmesh/core/snapshot/definition.py index ec5a883f7f..bd2fd47bd0 100644 --- a/sqlmesh/core/snapshot/definition.py +++ b/sqlmesh/core/snapshot/definition.py @@ -1141,6 +1141,16 @@ def needs_destructive_check( and self.name not in allow_destructive_snapshots ) + def needs_additive_check( + self, + allow_additive_snapshots: t.Set[str], + ) -> bool: + return ( + self.is_model + and not 
self.model.on_additive_change.is_allow + and self.name not in allow_additive_snapshots + ) + def get_next_auto_restatement_interval(self, execution_time: TimeLike) -> t.Optional[Interval]: """Returns the next auto restatement interval for the snapshot. diff --git a/sqlmesh/core/snapshot/evaluator.py b/sqlmesh/core/snapshot/evaluator.py index 1531997c1b..e891c752b1 100644 --- a/sqlmesh/core/snapshot/evaluator.py +++ b/sqlmesh/core/snapshot/evaluator.py @@ -50,8 +50,13 @@ ViewKind, CustomKind, ) +from sqlmesh.core.model.kind import _Incremental from sqlmesh.utils import CompletionStatus -from sqlmesh.core.schema_diff import has_drop_alteration, get_dropped_column_names +from sqlmesh.core.schema_diff import ( + has_drop_alteration, + TableAlterOperation, + has_additive_alteration, +) from sqlmesh.core.snapshot import ( DeployabilityIndex, Intervals, @@ -72,6 +77,8 @@ DestructiveChangeError, SQLMeshError, format_destructive_change_msg, + format_additive_change_msg, + AdditiveChangeError, ) if sys.version_info >= (3, 12): @@ -138,6 +145,7 @@ def evaluate( execution_time: TimeLike, snapshots: t.Dict[str, Snapshot], allow_destructive_snapshots: t.Optional[t.Set[str]] = None, + allow_additive_snapshots: t.Optional[t.Set[str]] = None, deployability_index: t.Optional[DeployabilityIndex] = None, batch_index: int = 0, target_table_exists: t.Optional[bool] = None, @@ -152,6 +160,7 @@ def evaluate( execution_time: The date/time time reference to use for execution time. snapshots: All upstream snapshots (by name) to use for expansion and mapping of physical locations. allow_destructive_snapshots: Snapshots for which destructive schema changes are allowed. + allow_additive_snapshots: Snapshots for which additive schema changes are allowed. deployability_index: Determines snapshots that are deployable in the context of this evaluation. 
            batch_index: If the snapshot is part of a batch of related snapshots; which index in the batch is it
            target_table_exists: Whether the target table exists. If None, the table will be checked for existence.
@@ -167,6 +176,7 @@
             snapshot=snapshot,
             snapshots=snapshots,
             allow_destructive_snapshots=allow_destructive_snapshots or set(),
+            allow_additive_snapshots=allow_additive_snapshots or set(),
             deployability_index=deployability_index,
             batch_index=batch_index,
             target_table_exists=target_table_exists,
@@ -338,6 +348,7 @@ def create(
         on_start: t.Optional[t.Callable] = None,
         on_complete: t.Optional[t.Callable[[SnapshotInfoLike], None]] = None,
         allow_destructive_snapshots: t.Optional[t.Set[str]] = None,
+        allow_additive_snapshots: t.Optional[t.Set[str]] = None,
     ) -> CompletionStatus:
         """Creates a physical snapshot schema and table for the given collection of snapshots.

@@ -348,6 +359,7 @@
             on_start: A callback to initialize the snapshot creation progress bar.
             on_complete: A callback to call on each successfully created snapshot.
             allow_destructive_snapshots: Set of snapshots that are allowed to have destructive schema changes.
+            allow_additive_snapshots: Set of snapshots that are allowed to have additive schema changes.

        Returns:
            CompletionStatus: The status of the creation operation (success, failure, nothing to do).
@@ -366,6 +378,7 @@ def create(
             deployability_index=deployability_index,
             on_complete=on_complete,
             allow_destructive_snapshots=allow_destructive_snapshots or set(),
+            allow_additive_snapshots=allow_additive_snapshots or set(),
         )
         return CompletionStatus.SUCCESS
@@ -454,6 +467,7 @@ def _create_snapshots(
         deployability_index: DeployabilityIndex,
         on_complete: t.Optional[t.Callable[[SnapshotInfoLike], None]],
         allow_destructive_snapshots: t.Set[str],
+        allow_additive_snapshots: t.Set[str],
     ) -> None:
         """Internal method to create tables in parallel."""
         with self.concurrent_context():
@@ -464,6 +478,7 @@
                     snapshots=snapshots,
                     deployability_index=deployability_index,
                     allow_destructive_snapshots=allow_destructive_snapshots,
+                    allow_additive_snapshots=allow_additive_snapshots,
                     on_complete=on_complete,
                 ),
                 self.ddl_concurrent_tasks,
@@ -477,6 +492,7 @@ def migrate(
         target_snapshots: t.Iterable[Snapshot],
         snapshots: t.Dict[SnapshotId, Snapshot],
         allow_destructive_snapshots: t.Optional[t.Set[str]] = None,
+        allow_additive_snapshots: t.Optional[t.Set[str]] = None,
         deployability_index: t.Optional[DeployabilityIndex] = None,
     ) -> None:
         """Alters a physical snapshot table to match its snapshot's schema for the given collection of snapshots.

@@ -485,9 +501,11 @@
            target_snapshots: Target snapshots.
            snapshots: Mapping of snapshot ID to snapshot.
            allow_destructive_snapshots: Set of snapshots that are allowed to have destructive schema changes.
+            allow_additive_snapshots: Set of snapshots that are allowed to have additive schema changes.
            deployability_index: Determines snapshots that are deployable in the context of this evaluation.
        """
         allow_destructive_snapshots = allow_destructive_snapshots or set()
+        allow_additive_snapshots = allow_additive_snapshots or set()
         deployability_index = deployability_index or DeployabilityIndex.all_deployable()
         snapshots_by_name = {s.name: s for s in snapshots.values()}
         with self.concurrent_context():
@@ -497,6 +515,7 @@
                     s,
                     snapshots_by_name,
                     allow_destructive_snapshots,
+                    allow_additive_snapshots,
                     self.get_adapter(s.model_gateway),
                     deployability_index,
                 ),
@@ -669,6 +688,7 @@ def _evaluate_snapshot(
         snapshot: Snapshot,
         snapshots: t.Dict[str, Snapshot],
         allow_destructive_snapshots: t.Set[str],
+        allow_additive_snapshots: t.Set[str],
         deployability_index: t.Optional[DeployabilityIndex],
         batch_index: int,
         target_table_exists: t.Optional[bool],
@@ -683,6 +703,7 @@
            execution_time: The date/time time reference to use for execution time.
            snapshots: All upstream snapshots to use for expansion and mapping of physical locations.
            allow_destructive_snapshots: Snapshots for which destructive schema changes are allowed.
+            allow_additive_snapshots: Snapshots for which additive schema changes are allowed.
            deployability_index: Determines snapshots that are deployable in the context of this evaluation.
            batch_index: If the snapshot is part of a batch of related snapshots; which index in the batch is it
            target_table_exists: Whether the target table exists. If None, the table will be checked for existence.
@@ -750,6 +771,7 @@ def _evaluate_snapshot(
                 render_kwargs=create_render_kwargs,
                 rendered_physical_properties=rendered_physical_properties,
                 allow_destructive_snapshots=allow_destructive_snapshots,
+                allow_additive_snapshots=allow_additive_snapshots,
             )
             common_render_kwargs["runtime_stage"] = RuntimeStage.EVALUATING
         elif model.annotated or model.is_seed or model.kind.is_scd_type_2:
@@ -780,6 +802,7 @@
             snapshot=snapshot,
             snapshots=snapshots,
             render_kwargs=common_render_kwargs,
+            create_render_kwargs=create_render_kwargs,
             rendered_physical_properties=rendered_physical_properties,
             deployability_index=deployability_index,
             is_first_insert=is_first_insert,
@@ -796,6 +819,7 @@ def create_snapshot(
         snapshots: t.Dict[str, Snapshot],
         deployability_index: DeployabilityIndex,
         allow_destructive_snapshots: t.Set[str],
+        allow_additive_snapshots: t.Set[str],
         on_complete: t.Optional[t.Callable[[SnapshotInfoLike], None]] = None,
     ) -> None:
         """Creates a physical table for the given snapshot.

@@ -806,6 +830,7 @@
            deployability_index: Determines snapshots that are deployable in the context of this creation.
            on_complete: A callback to call on each successfully created database object.
            allow_destructive_snapshots: Snapshots for which destructive schema changes are allowed.
+            allow_additive_snapshots: Snapshots for which additive schema changes are allowed.
        """
         if not snapshot.is_model:
             return
@@ -836,6 +861,7 @@
                 render_kwargs=create_render_kwargs,
                 rendered_physical_properties=rendered_physical_properties,
                 allow_destructive_snapshots=allow_destructive_snapshots,
+                allow_additive_snapshots=allow_additive_snapshots,
             )
         else:
             is_table_deployable = deployability_index.is_deployable(snapshot)
@@ -860,6 +886,7 @@ def _render_and_insert_snapshot(
         snapshot: Snapshot,
         snapshots: t.Dict[str, Snapshot],
         render_kwargs: t.Dict[str, t.Any],
+        create_render_kwargs: t.Dict[str, t.Any],
         rendered_physical_properties: t.Dict[str, exp.Expression],
         deployability_index: DeployabilityIndex,
         is_first_insert: bool,
@@ -896,7 +923,7 @@ def apply(query_or_df: QueryOrDF, index: int = 0) -> None:
                     end=end,
                     execution_time=execution_time,
                     physical_properties=rendered_physical_properties,
-                    render_kwargs=render_kwargs,
+                    render_kwargs=create_render_kwargs,
                 )
             else:
                 logger.info(
@@ -918,7 +945,7 @@ def apply(query_or_df: QueryOrDF, index: int = 0) -> None:
                     end=end,
                     execution_time=execution_time,
                     physical_properties=rendered_physical_properties,
-                    render_kwargs=render_kwargs,
+                    render_kwargs=create_render_kwargs,
                 )

         # DataFrames, unlike SQL expressions, can provide partial results by yielding dataframes. As a result,
@@ -982,6 +1009,7 @@ def _clone_snapshot_in_dev(
         render_kwargs: t.Dict[str, t.Any],
         rendered_physical_properties: t.Dict[str, exp.Expression],
         allow_destructive_snapshots: t.Set[str],
+        allow_additive_snapshots: t.Set[str],
     ) -> None:
         adapter = self.get_adapter(snapshot.model.gateway)
@@ -1004,6 +1032,7 @@
                 render_kwargs=render_kwargs,
                 rendered_physical_properties=rendered_physical_properties,
                 allow_destructive_snapshots=allow_destructive_snapshots,
+                allow_additive_snapshots=allow_additive_snapshots,
             )
         except Exception:
             adapter.drop_table(target_table_name)
@@ -1014,6 +1043,7 @@ def _migrate_snapshot(
         snapshot: Snapshot,
         snapshots: t.Dict[str, Snapshot],
         allow_destructive_snapshots: t.Set[str],
+        allow_additive_snapshots: t.Set[str],
         adapter: EngineAdapter,
         deployability_index: DeployabilityIndex,
     ) -> None:
@@ -1051,6 +1081,7 @@
                 **render_kwargs
             ),
             allow_destructive_snapshots=allow_destructive_snapshots,
+            allow_additive_snapshots=allow_additive_snapshots,
             run_pre_post_statements=True,
         )
@@ -1063,6 +1094,7 @@ def _migrate_target_table(
         render_kwargs: t.Dict[str, t.Any],
         rendered_physical_properties: t.Dict[str, exp.Expression],
         allow_destructive_snapshots: t.Set[str],
+        allow_additive_snapshots: t.Set[str],
         run_pre_post_statements: bool = False,
     ) -> None:
         adapter = self.get_adapter(snapshot.model.gateway)
@@ -1092,7 +1124,9 @@
                 snapshot=snapshot,
                 snapshots=snapshots,
                 allow_destructive_snapshots=allow_destructive_snapshots,
+                allow_additive_snapshots=allow_additive_snapshots,
                 ignore_destructive=snapshot.model.on_destructive_change.is_ignore,
+                ignore_additive=snapshot.model.on_additive_change.is_ignore,
             )
         finally:
             if snapshot.is_materialized:
@@ -1501,6 +1535,7 @@ def migrate(
         snapshot: Snapshot,
         *,
         ignore_destructive: bool,
+        ignore_additive: bool,
         **kwargs: t.Any,
     ) -> None:
         """Migrates the target table schema so that it corresponds to the source table schema.
@@ -1511,6 +1546,8 @@ def migrate(
             snapshot: The target snapshot.
             ignore_destructive: If True, destructive changes are not created when migrating.
                 This is used for forward-only models that are being migrated to a new version.
+            ignore_additive: If True, additive changes are not created when migrating.
+                This is used for forward-only models that are being migrated to a new version.
         """

     @abc.abstractmethod
@@ -1587,6 +1624,7 @@ def migrate(
         snapshot: Snapshot,
         *,
         ignore_destructive: bool,
+        ignore_additive: bool,
         **kwarg: t.Any,
     ) -> None:
         pass
@@ -1713,16 +1751,23 @@ def migrate(
         snapshot: Snapshot,
         *,
         ignore_destructive: bool,
+        ignore_additive: bool,
         **kwargs: t.Any,
     ) -> None:
         logger.info(f"Altering table '{target_table_name}'")
-        alter_expressions = self.adapter.get_alter_expressions(
-            target_table_name, source_table_name, ignore_destructive=ignore_destructive
+        alter_operations = self.adapter.get_alter_operations(
+            target_table_name,
+            source_table_name,
+            ignore_destructive=ignore_destructive,
+            ignore_additive=ignore_additive,
         )
         _check_destructive_schema_change(
-            snapshot, alter_expressions, kwargs["allow_destructive_snapshots"]
+            snapshot, alter_operations, kwargs["allow_destructive_snapshots"]
+        )
+        _check_additive_schema_change(
+            snapshot, alter_operations, kwargs["allow_additive_snapshots"]
         )
-        self.adapter.alter_table(alter_expressions)
+        self.adapter.alter_table(alter_operations)

     def delete(self, name: str, **kwargs: t.Any) -> None:
         _check_table_db_is_physical_schema(name, kwargs["physical_schema"])
@@ -1783,11 +1828,13 @@ def _get_target_and_source_columns(
         else:
             target_column_to_types = (
                 model.columns_to_types  # type: ignore
-                if model.annotated and not model.on_destructive_change.is_ignore
+                if model.annotated
+                and not model.on_destructive_change.is_ignore
+                and not model.on_additive_change.is_ignore
                 else self.adapter.columns(table_name)
             )

         assert target_column_to_types is not None
-        if model.on_destructive_change.is_ignore:
+        if model.on_destructive_change.is_ignore or model.on_additive_change.is_ignore:
             # We need to identify the columns that are only in the source so we create an empty table with
             # the user query to determine that
             with self.adapter.temp_table(model.ctas_query(**render_kwargs)) as temp_table:
@@ -2305,6 +2352,7 @@ def migrate(
         snapshot: Snapshot,
         *,
         ignore_destructive: bool,
+        ignore_additive: bool,
         **kwargs: t.Any,
     ) -> None:
         logger.info("Migrating view '%s'", target_table_name)
@@ -2552,12 +2600,16 @@ def migrate(
         snapshot: Snapshot,
         *,
         ignore_destructive: bool,
+        ignore_additive: bool,
         **kwargs: t.Any,
     ) -> None:
-        potential_alter_expressions = self.adapter.get_alter_expressions(
-            target_table_name, source_table_name, ignore_destructive=ignore_destructive
+        potential_alter_operations = self.adapter.get_alter_operations(
+            target_table_name,
+            source_table_name,
+            ignore_destructive=ignore_destructive,
+            ignore_additive=ignore_additive,
         )
-        if len(potential_alter_expressions) > 0:
+        if len(potential_alter_operations) > 0:
             # this can happen if a user changes a managed model and deliberately overrides a plan to be forward only, eg `sqlmesh plan --forward-only`
             raise SQLMeshError(
                 f"The schema of the managed model '{target_table_name}' cannot be updated in a forward-only fashion."
@@ -2584,36 +2636,65 @@ def _intervals(snapshot: Snapshot, deployability_index: DeployabilityIndex) -> I
 def _check_destructive_schema_change(
     snapshot: Snapshot,
-    alter_expressions: t.List[exp.Alter],
+    alter_operations: t.List[TableAlterOperation],
     allow_destructive_snapshots: t.Set[str],
 ) -> None:
     if (
         snapshot.is_no_rebuild
         and snapshot.needs_destructive_check(allow_destructive_snapshots)
-        and has_drop_alteration(alter_expressions)
+        and has_drop_alteration(alter_operations)
     ):
         snapshot_name = snapshot.name
-        dropped_column_names = get_dropped_column_names(alter_expressions)
         model_dialect = snapshot.model.dialect
         if snapshot.model.on_destructive_change.is_warn:
             logger.warning(
                 format_destructive_change_msg(
                     snapshot_name,
-                    dropped_column_names,
-                    alter_expressions,
+                    alter_operations,
                     model_dialect,
                     error=False,
                 )
             )
             return
         raise DestructiveChangeError(
-            format_destructive_change_msg(
-                snapshot_name, dropped_column_names, alter_expressions, model_dialect
-            )
+            format_destructive_change_msg(snapshot_name, alter_operations, model_dialect)
         )


+def _check_additive_schema_change(
+    snapshot: Snapshot,
+    alter_operations: t.List[TableAlterOperation],
+    allow_additive_snapshots: t.Set[str],
+) -> None:
+    # Only check additive changes for incremental models that have the on_additive_change property
+    if not isinstance(snapshot.model.kind, _Incremental):
+        return
+
+    if snapshot.needs_additive_check(allow_additive_snapshots) and has_additive_alteration(
+        alter_operations
+    ):
+        # Note: IGNORE filtering is applied before this function is called,
+        # so if we reach here, additive changes are not being ignored
+        snapshot_name = snapshot.name
+        model_dialect = snapshot.model.dialect
+
+        if snapshot.model.on_additive_change.is_warn:
+            logger.warning(
+                format_additive_change_msg(
+                    snapshot_name,
+                    alter_operations,
+                    model_dialect,
+                    error=False,
+                )
+            )
+            return
+        if snapshot.model.on_additive_change.is_error:
+            raise AdditiveChangeError(
+                format_additive_change_msg(snapshot_name, alter_operations, model_dialect)
+            )
+
+
 def _check_table_db_is_physical_schema(table_name: str, physical_schema: str) -> None:
     table = exp.to_table(table_name)
     if table.db != physical_schema:
diff --git a/sqlmesh/dbt/model.py b/sqlmesh/dbt/model.py
index e35d8c16f4..c646392368 100644
--- a/sqlmesh/dbt/model.py
+++ b/sqlmesh/dbt/model.py
@@ -23,7 +23,7 @@
     ManagedKind,
     create_sql_model,
 )
-from sqlmesh.core.model.kind import SCDType2ByTimeKind, OnDestructiveChange
+from sqlmesh.core.model.kind import SCDType2ByTimeKind, OnDestructiveChange, OnAdditiveChange
 from sqlmesh.dbt.basemodel import BaseModelConfig, Materialization, SnapshotStrategy
 from sqlmesh.dbt.common import SqlStr, extract_jinja_config, sql_str_validator
 from sqlmesh.utils.errors import ConfigError
@@ -91,7 +91,7 @@ class ModelConfig(BaseModelConfig):
     unique_key: t.Optional[t.List[str]] = None
     partition_by: t.Optional[t.Union[t.List[str], t.Dict[str, t.Any]]] = None
     full_refresh: t.Optional[bool] = None
-    on_schema_change: t.Optional[str] = None
+    on_schema_change: str = "ignore"

     # Snapshot (SCD Type 2) Fields
     updated_at: t.Optional[str] = None
@@ -227,17 +227,31 @@ def model_kind(self, context: DbtContext) -> ModelKind:
         # args common to all sqlmesh incremental kinds, regardless of materialization
         incremental_kind_kwargs: t.Dict[str, t.Any] = {}
-        if self.on_schema_change:
-            on_schema_change = self.on_schema_change.lower()
-
-            on_destructive_change = OnDestructiveChange.WARN
-            if on_schema_change == "sync_all_columns":
-                on_destructive_change = OnDestructiveChange.ALLOW
-            elif on_schema_change in ("fail", "append_new_columns", "ignore"):
-                on_destructive_change = OnDestructiveChange.ERROR
-
-            incremental_kind_kwargs["on_destructive_change"] = on_destructive_change
+        on_schema_change = self.on_schema_change.lower()
+        if materialization == Materialization.SNAPSHOT:
+            # dbt snapshots default to `append_new_columns` behavior and can't be changed
+            on_schema_change = "append_new_columns"
+
+        if on_schema_change == "ignore":
+            on_destructive_change = OnDestructiveChange.IGNORE
+            on_additive_change = OnAdditiveChange.IGNORE
+        elif on_schema_change == "fail":
+            on_destructive_change = OnDestructiveChange.ERROR
+            on_additive_change = OnAdditiveChange.ERROR
+        elif on_schema_change == "append_new_columns":
+            on_destructive_change = OnDestructiveChange.IGNORE
+            on_additive_change = OnAdditiveChange.ALLOW
+        elif on_schema_change == "sync_all_columns":
+            on_destructive_change = OnDestructiveChange.ALLOW
+            on_additive_change = OnAdditiveChange.ALLOW
+        else:
+            raise ConfigError(
+                f"{self.canonical_name(context)}: Invalid on_schema_change value '{on_schema_change}'. "
+                "Valid values are 'ignore', 'fail', 'append_new_columns', 'sync_all_columns'."
+            )
+        incremental_kind_kwargs["on_destructive_change"] = on_destructive_change
+        incremental_kind_kwargs["on_additive_change"] = on_additive_change

         for field in ("forward_only", "auto_restatement_cron"):
             field_val = getattr(self, field, None)
             if field_val is None:
diff --git a/sqlmesh/migrations/v0091_on_additive_change.py b/sqlmesh/migrations/v0091_on_additive_change.py
new file mode 100644
index 0000000000..56059b982f
--- /dev/null
+++ b/sqlmesh/migrations/v0091_on_additive_change.py
@@ -0,0 +1,5 @@
+"""Add on_additive_change to incremental model metadata hash."""
+
+
+def migrate(state_sync, **kwargs):  # type: ignore
+    pass
diff --git a/sqlmesh/utils/errors.py b/sqlmesh/utils/errors.py
index 9974fdce0a..82ec311237 100644
--- a/sqlmesh/utils/errors.py
+++ b/sqlmesh/utils/errors.py
@@ -11,6 +11,7 @@
     from requests.models import Response

     from sqlmesh.core.model import Model
+    from sqlmesh.core.schema_diff import TableAlterOperation


 class ErrorLevel(AutoName):
@@ -129,6 +130,10 @@ class DestructiveChangeError(SQLMeshError):
     pass


+class AdditiveChangeError(SQLMeshError):
+    pass
+
+
 class NotificationTargetError(SQLMeshError):
     pass

@@ -210,27 +215,87 @@ def raise_for_status(response: Response) -> None:
         raise ApiServerError(response.text, response.status_code)


-def format_destructive_change_msg(
+def _format_schema_change_msg(
     snapshot_name: str,
-    dropped_column_names: t.List[str],
-    alter_expressions: t.List[exp.Alter],
+    is_destructive: bool,
+    alter_operations: t.List[TableAlterOperation],
     dialect: str,
     error: bool = True,
 ) -> str:
-    dropped_column_str = "', '".join(dropped_column_names)
-    dropped_column_msg = (
-        f" that drops column{'s' if dropped_column_names and len(dropped_column_names) > 1 else ''} '{dropped_column_str}'"
-        if dropped_column_str
+    """
+    Common function to format schema change messages.
+
+    Args:
+        snapshot_name: Name of the model/snapshot
+        is_destructive: True if the change is destructive, False if it is additive
+        alter_operations: List of table alter operations
+        dialect: SQL dialect for formatting
+        error: Whether this is an error or warning
+    """
+    from sqlmesh.core.schema_diff import get_dropped_column_names, get_additive_column_names
+
+    change_type = "destructive" if is_destructive else "additive"
+    setting_name = "on_destructive_change" if is_destructive else "on_additive_change"
+    action_verb = "drops" if is_destructive else "adds"
+    cli_flag = "--allow-destructive-model" if is_destructive else "--allow-additive-model"
+
+    column_names = (
+        get_dropped_column_names(alter_operations)
+        if is_destructive
+        else get_additive_column_names(alter_operations)
+    )
+    column_str = "', '".join(column_names)
+    column_msg = (
+        f" that {action_verb} column{'s' if column_names and len(column_names) > 1 else ''} '{column_str}'"
+        if column_str
         else ""
     )

+    # Format ALTER expressions
     alter_expr_msg = "\n\nSchema changes:\n " + "\n ".join(
-        [alter.sql(dialect) for alter in alter_expressions]
+        [alter.expression.sql(dialect) for alter in alter_operations]
     )

+    # Main warning message
     warning_msg = (
-        f"Plan requires a destructive change to forward-only model '{snapshot_name}'s schema"
+        f"Plan requires a {change_type} change to forward-only model '{snapshot_name}'s schema"
     )
-    err_msg = "\n\nTo allow the destructive change, set the model's `on_destructive_change` setting to `warn` or `allow` or include the model in the plan's `--allow-destructive-model` option.\n"
-    return f"\n{warning_msg}{dropped_column_msg}.{alter_expr_msg}{err_msg if error else ''}"
+    if error:
+        permissive_values = "`warn`, `allow`, or `ignore`"
+        cli_part = f" or include the model in the plan's `{cli_flag}` option"
+        err_msg = f"\n\nTo allow the {change_type} change, set the model's `{setting_name}` setting to {permissive_values}{cli_part}.\n"
+    else:
+        err_msg = ""
+
+    return f"\n{warning_msg}{column_msg}.{alter_expr_msg}{err_msg}"
+
+
+def format_destructive_change_msg(
+    snapshot_name: str,
+    alter_expressions: t.List[TableAlterOperation],
+    dialect: str,
+    error: bool = True,
+) -> str:
+    return _format_schema_change_msg(
+        snapshot_name=snapshot_name,
+        is_destructive=True,
+        alter_operations=alter_expressions,
+        dialect=dialect,
+        error=error,
+    )
+
+
+def format_additive_change_msg(
+    snapshot_name: str,
+    alter_operations: t.List[TableAlterOperation],
+    dialect: str,
+    error: bool = True,
+) -> str:
+    return _format_schema_change_msg(
+        snapshot_name=snapshot_name,
+        is_destructive=False,
+        alter_operations=alter_operations,
+        dialect=dialect,
+        error=error,
+    )
diff --git a/tests/conftest.py b/tests/conftest.py
index 85780d2db4..e5bbc4f425 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -32,7 +32,7 @@
 from sqlmesh.core import lineage
 from sqlmesh.core.macros import macro
 from sqlmesh.core.model import IncrementalByTimeRangeKind, SqlModel, model
-from sqlmesh.core.model.kind import OnDestructiveChange
+from sqlmesh.core.model.kind import OnDestructiveChange, OnAdditiveChange
 from sqlmesh.core.plan import BuiltInPlanEvaluator, Plan, stages as plan_stages
 from sqlmesh.core.snapshot import (
     DeployabilityIndex,
@@ -451,6 +451,52 @@ def _make_function(
     return _make_function


+@pytest.fixture
+def make_snapshot_on_additive_change(make_snapshot: t.Callable) -> t.Callable:
+    def _make_function(
+        name: str = "a",
+        old_query: str = "select '1' as one, '2' as two, '2022-01-01' ds",
+        new_query: str = "select '1' as one, '2' as two, '3' as three, '2022-01-01' ds",
+        on_additive_change: OnAdditiveChange = OnAdditiveChange.ERROR,
+    ) -> t.Tuple[Snapshot, Snapshot]:
+        snapshot_old = make_snapshot(
+            SqlModel(
+                name=name,
+                dialect="duckdb",
+                query=parse_one(old_query),
+                kind=IncrementalByTimeRangeKind(
+                    time_column="ds", forward_only=True, on_additive_change=on_additive_change
+                ),
+            )
+        )
+
+        snapshot = make_snapshot(
+            SqlModel(
+                name=name,
+                dialect="duckdb",
+                query=parse_one(new_query),
+                kind=IncrementalByTimeRangeKind(
+                    time_column="ds", forward_only=True, on_additive_change=on_additive_change
+                ),
+            )
+        )
+        snapshot.previous_versions = (
+            SnapshotDataVersion(
+                fingerprint=SnapshotFingerprint(
+                    data_hash="test_data_hash",
+                    metadata_hash="test_metadata_hash",
+                ),
+                version="test_version",
+                change_category=SnapshotChangeCategory.NON_BREAKING,
+                dev_table_suffix="dev",
+            ),
+        )
+
+        return snapshot_old, snapshot
+
+    return _make_function
+
+
 @pytest.fixture
 def random_name() -> t.Callable:
     return lambda: f"generated_{random_id()}"
diff --git a/tests/core/engine_adapter/integration/test_integration.py b/tests/core/engine_adapter/integration/test_integration.py
index d1a9c5afaa..e6e4cfdeb6 100644
--- a/tests/core/engine_adapter/integration/test_integration.py
+++ b/tests/core/engine_adapter/integration/test_integration.py
@@ -272,9 +272,9 @@ def test_ctas_source_columns(ctx_query_and_df: TestContext):

     input_data = pd.DataFrame(
         [
-            {"id": 1, "ds": "2022-01-01"},
-            {"id": 2, "ds": "2022-01-02"},
-            {"id": 3, "ds": "2022-01-03"},
+            {"id": 1, "ds": "2022-01-01", "ignored_source": "ignored_value"},
+            {"id": 2, "ds": "2022-01-02", "ignored_source": "ignored_value"},
+            {"id": 3, "ds": "2022-01-03", "ignored_source": "ignored_value"},
         ]
     )
ctx.engine_adapter.ctas( @@ -284,11 +284,12 @@ def test_ctas_source_columns(ctx_query_and_df: TestContext): column_descriptions={"id": "test id column description"}, table_format=ctx.default_table_format, target_columns_to_types=columns_to_types, - source_columns=["id", "ds"], + source_columns=["id", "ds", "ignored_source"], ) expected_data = input_data.copy() expected_data["ignored_column"] = pd.Series() + expected_data = expected_data.drop(columns=["ignored_source"]) results = ctx.get_metadata_results(schema=table.db) assert len(results.views) == 0 @@ -360,9 +361,9 @@ def test_create_view_source_columns(ctx_query_and_df: TestContext): input_data = pd.DataFrame( [ - {"id": 1, "ds": "2022-01-01"}, - {"id": 2, "ds": "2022-01-02"}, - {"id": 3, "ds": "2022-01-03"}, + {"id": 1, "ds": "2022-01-01", "ignored_source": "ignored_value"}, + {"id": 2, "ds": "2022-01-02", "ignored_source": "ignored_value"}, + {"id": 3, "ds": "2022-01-03", "ignored_source": "ignored_value"}, ] ) view = ctx.table("test_view") @@ -371,12 +372,13 @@ def test_create_view_source_columns(ctx_query_and_df: TestContext): ctx.input_data(input_data), table_description="test view description", column_descriptions={"id": "test id column description"}, - source_columns=["id", "ds"], + source_columns=["id", "ds", "ignored_source"], target_columns_to_types=columns_to_types, ) expected_data = input_data.copy() expected_data["ignored_column"] = pd.Series() + expected_data = expected_data.drop(columns=["ignored_source"]) results = ctx.get_metadata_results() assert len(results.tables) == 0 @@ -550,9 +552,9 @@ def test_replace_query_source_columns(ctx_query_and_df: TestContext): # Initial Load input_data = pd.DataFrame( [ - {"id": 1, "ds": "2022-01-01"}, - {"id": 2, "ds": "2022-01-02"}, - {"id": 3, "ds": "2022-01-03"}, + {"id": 1, "ds": "2022-01-01", "ignored_source": "ignored_value"}, + {"id": 2, "ds": "2022-01-02", "ignored_source": "ignored_value"}, + {"id": 3, "ds": "2022-01-03", "ignored_source": 
"ignored_value"}, ] ) ctx.engine_adapter.create_table(table, columns_to_types, table_format=ctx.default_table_format) @@ -560,11 +562,12 @@ def test_replace_query_source_columns(ctx_query_and_df: TestContext): table, ctx.input_data(input_data), table_format=ctx.default_table_format, - source_columns=["id", "ds"], + source_columns=["id", "ds", "ignored_source"], target_columns_to_types=columns_to_types, ) expected_data = input_data.copy() expected_data["ignored_column"] = pd.Series() + expected_data = expected_data.drop(columns=["ignored_source"]) results = ctx.get_metadata_results() assert len(results.views) == 0 @@ -709,19 +712,21 @@ def test_insert_append_source_columns(ctx_query_and_df: TestContext): # Initial Load input_data = pd.DataFrame( [ - {"id": 1, "ds": "2022-01-01"}, - {"id": 2, "ds": "2022-01-02"}, - {"id": 3, "ds": "2022-01-03"}, + {"id": 1, "ds": "2022-01-01", "ignored_source": "ignored_value"}, + {"id": 2, "ds": "2022-01-02", "ignored_source": "ignored_value"}, + {"id": 3, "ds": "2022-01-03", "ignored_source": "ignored_value"}, ] ) ctx.engine_adapter.insert_append( table, ctx.input_data(input_data), - source_columns=["id", "ds"], + source_columns=["id", "ds", "ignored_source"], target_columns_to_types=columns_to_types, ) expected_data = input_data.copy() expected_data["ignored_column"] = pd.Series() + expected_data = expected_data.drop(columns=["ignored_source"]) + results = ctx.get_metadata_results() assert len(results.views) == 0 assert len(results.materialized_views) == 0 @@ -733,19 +738,21 @@ def test_insert_append_source_columns(ctx_query_and_df: TestContext): if ctx.test_type == "df": append_data = pd.DataFrame( [ - {"id": 4, "ds": "2022-01-04"}, - {"id": 5, "ds": "2022-01-05"}, - {"id": 6, "ds": "2022-01-06"}, + {"id": 4, "ds": "2022-01-04", "ignored_source": "ignored_value"}, + {"id": 5, "ds": "2022-01-05", "ignored_source": "ignored_value"}, + {"id": 6, "ds": "2022-01-06", "ignored_source": "ignored_value"}, ] ) 
ctx.engine_adapter.insert_append( table, ctx.input_data(append_data), - source_columns=["id", "ds"], + source_columns=["id", "ds", "ignored_source"], target_columns_to_types=columns_to_types, ) append_expected_data = append_data.copy() append_expected_data["ignored_column"] = pd.Series() + append_expected_data = append_expected_data.drop(columns=["ignored_source"]) + results = ctx.get_metadata_results() assert len(results.views) == 0 assert len(results.materialized_views) == 0 @@ -871,9 +878,9 @@ def test_insert_overwrite_by_time_partition_source_columns(ctx_query_and_df: Tes ) input_data = pd.DataFrame( [ - {"id": 1, ctx.time_column: "2022-01-01"}, - {"id": 2, ctx.time_column: "2022-01-02"}, - {"id": 3, ctx.time_column: "2022-01-03"}, + {"id": 1, ctx.time_column: "2022-01-01", "ignored_source": "ignored_value"}, + {"id": 2, ctx.time_column: "2022-01-02", "ignored_source": "ignored_value"}, + {"id": 3, ctx.time_column: "2022-01-03", "ignored_source": "ignored_value"}, ] ) ctx.engine_adapter.insert_overwrite_by_time_partition( @@ -884,10 +891,11 @@ def test_insert_overwrite_by_time_partition_source_columns(ctx_query_and_df: Tes time_formatter=ctx.time_formatter, time_column=ctx.time_column, target_columns_to_types=columns_to_types, - source_columns=["id", "ds"], + source_columns=["id", "ds", "ignored_source"], ) expected_data = input_data.copy() + expected_data = expected_data.drop(columns=["ignored_source"]) expected_data.insert(len(expected_data.columns) - 1, "ignored_column", pd.Series()) results = ctx.get_metadata_results() @@ -905,9 +913,9 @@ def test_insert_overwrite_by_time_partition_source_columns(ctx_query_and_df: Tes if ctx.test_type == "df": overwrite_data = pd.DataFrame( [ - {"id": 10, ctx.time_column: "2022-01-03"}, - {"id": 4, ctx.time_column: "2022-01-04"}, - {"id": 5, ctx.time_column: "2022-01-05"}, + {"id": 10, ctx.time_column: "2022-01-03", "ignored_source": "ignored_value"}, + {"id": 4, ctx.time_column: "2022-01-04", "ignored_source": 
"ignored_value"}, + {"id": 5, ctx.time_column: "2022-01-05", "ignored_source": "ignored_value"}, ] ) ctx.engine_adapter.insert_overwrite_by_time_partition( @@ -918,7 +926,7 @@ def test_insert_overwrite_by_time_partition_source_columns(ctx_query_and_df: Tes time_formatter=ctx.time_formatter, time_column=ctx.time_column, target_columns_to_types=columns_to_types, - source_columns=["id", "ds"], + source_columns=["id", "ds", "ignored_source"], ) results = ctx.get_metadata_results() assert len(results.views) == 0 @@ -1025,9 +1033,9 @@ def test_merge_source_columns(ctx_query_and_df: TestContext): ctx.engine_adapter.create_table(table, columns_to_types, table_format=table_format) input_data = pd.DataFrame( [ - {"id": 1, "ds": "2022-01-01"}, - {"id": 2, "ds": "2022-01-02"}, - {"id": 3, "ds": "2022-01-03"}, + {"id": 1, "ds": "2022-01-01", "ignored_source": "ignored_value"}, + {"id": 2, "ds": "2022-01-02", "ignored_source": "ignored_value"}, + {"id": 3, "ds": "2022-01-03", "ignored_source": "ignored_value"}, ] ) ctx.engine_adapter.merge( @@ -1035,11 +1043,12 @@ def test_merge_source_columns(ctx_query_and_df: TestContext): ctx.input_data(input_data), unique_key=[exp.to_identifier("id")], target_columns_to_types=columns_to_types, - source_columns=["id", "ds"], + source_columns=["id", "ds", "ignored_source"], ) expected_data = input_data.copy() expected_data["ignored_column"] = pd.Series() + expected_data = expected_data.drop(columns=["ignored_source"]) results = ctx.get_metadata_results() assert len(results.views) == 0 @@ -1052,9 +1061,9 @@ def test_merge_source_columns(ctx_query_and_df: TestContext): if ctx.test_type == "df": merge_data = pd.DataFrame( [ - {"id": 2, "ds": "2022-01-10"}, - {"id": 4, "ds": "2022-01-04"}, - {"id": 5, "ds": "2022-01-05"}, + {"id": 2, "ds": "2022-01-10", "ignored_source": "ignored_value"}, + {"id": 4, "ds": "2022-01-04", "ignored_source": "ignored_value"}, + {"id": 5, "ds": "2022-01-05", "ignored_source": "ignored_value"}, ] ) 
ctx.engine_adapter.merge( @@ -1062,7 +1071,7 @@ def test_merge_source_columns(ctx_query_and_df: TestContext): ctx.input_data(merge_data), unique_key=[exp.to_identifier("id")], target_columns_to_types=columns_to_types, - source_columns=["id", "ds"], + source_columns=["id", "ds", "ignored_source"], ) results = ctx.get_metadata_results() @@ -1265,9 +1274,24 @@ def test_scd_type_2_by_time_source_columns(ctx_query_and_df: TestContext): ctx.engine_adapter.create_table(table, columns_to_types, table_format=ctx.default_table_format) input_data = pd.DataFrame( [ - {"id": 1, "name": "a", "updated_at": "2022-01-01 00:00:00"}, - {"id": 2, "name": "b", "updated_at": "2022-01-02 00:00:00"}, - {"id": 3, "name": "c", "updated_at": "2022-01-03 00:00:00"}, + { + "id": 1, + "name": "a", + "updated_at": "2022-01-01 00:00:00", + "ignored_source": "ignored_value", + }, + { + "id": 2, + "name": "b", + "updated_at": "2022-01-02 00:00:00", + "ignored_source": "ignored_value", + }, + { + "id": 3, + "name": "c", + "updated_at": "2022-01-03 00:00:00", + "ignored_source": "ignored_value", + }, ] ) ctx.engine_adapter.scd_type_2_by_time( @@ -1283,7 +1307,7 @@ def test_scd_type_2_by_time_source_columns(ctx_query_and_df: TestContext): truncate=True, start="2022-01-01 00:00:00", target_columns_to_types=columns_to_types, - source_columns=["id", "name", "updated_at"], + source_columns=["id", "name", "updated_at", "ignored_source"], ) results = ctx.get_metadata_results() assert len(results.views) == 0 @@ -1329,13 +1353,28 @@ def test_scd_type_2_by_time_source_columns(ctx_query_and_df: TestContext): current_data = pd.DataFrame( [ # Change `a` to `x` - {"id": 1, "name": "x", "updated_at": "2022-01-04 00:00:00"}, + { + "id": 1, + "name": "x", + "updated_at": "2022-01-04 00:00:00", + "ignored_source": "ignored_value", + }, # Delete - # {"id": 2, "name": "b", "updated_at": "2022-01-02 00:00:00"}, + # {"id": 2, "name": "b", "updated_at": "2022-01-02 00:00:00", "ignored_source": "ignored_value"}, # No change 
- {"id": 3, "name": "c", "updated_at": "2022-01-03 00:00:00"}, + { + "id": 3, + "name": "c", + "updated_at": "2022-01-03 00:00:00", + "ignored_source": "ignored_value", + }, # Add - {"id": 4, "name": "d", "updated_at": "2022-01-04 00:00:00"}, + { + "id": 4, + "name": "d", + "updated_at": "2022-01-04 00:00:00", + "ignored_source": "ignored_value", + }, ] ) ctx.engine_adapter.scd_type_2_by_time( @@ -1351,7 +1390,7 @@ def test_scd_type_2_by_time_source_columns(ctx_query_and_df: TestContext): truncate=False, start="2022-01-01 00:00:00", target_columns_to_types=columns_to_types, - source_columns=["id", "name", "updated_at"], + source_columns=["id", "name", "updated_at", "ignored_source"], ) results = ctx.get_metadata_results() assert len(results.views) == 0 @@ -1610,10 +1649,10 @@ def test_scd_type_2_by_column_source_columns(ctx_query_and_df: TestContext): ctx.engine_adapter.create_table(table, columns_to_types, table_format=ctx.default_table_format) input_data = pd.DataFrame( [ - {"id": 1, "name": "a", "status": "active"}, - {"id": 2, "name": "b", "status": "inactive"}, - {"id": 3, "name": "c", "status": "active"}, - {"id": 4, "name": "d", "status": "active"}, + {"id": 1, "name": "a", "status": "active", "ignored_source": "ignored_value"}, + {"id": 2, "name": "b", "status": "inactive", "ignored_source": "ignored_value"}, + {"id": 3, "name": "c", "status": "active", "ignored_source": "ignored_value"}, + {"id": 4, "name": "d", "status": "active", "ignored_source": "ignored_value"}, ] ) ctx.engine_adapter.scd_type_2_by_column( @@ -1628,7 +1667,7 @@ def test_scd_type_2_by_column_source_columns(ctx_query_and_df: TestContext): truncate=True, start="2023-01-01", target_columns_to_types=columns_to_types, - source_columns=["id", "name", "status"], + source_columns=["id", "name", "status", "ignored_source"], ) results = ctx.get_metadata_results() assert len(results.views) == 0 @@ -1682,15 +1721,15 @@ def test_scd_type_2_by_column_source_columns(ctx_query_and_df: TestContext): 
current_data = pd.DataFrame( [ # Change `a` to `x` - {"id": 1, "name": "x", "status": "active"}, + {"id": 1, "name": "x", "status": "active", "ignored_source": "ignored_value"}, # Delete - # {"id": 2, "name": "b", status: "inactive"}, + # {"id": 2, "name": "b", status: "inactive", "ignored_source": "ignored_value"}, # No change - {"id": 3, "name": "c", "status": "active"}, + {"id": 3, "name": "c", "status": "active", "ignored_source": "ignored_value"}, # Change status to inactive - {"id": 4, "name": "d", "status": "inactive"}, + {"id": 4, "name": "d", "status": "inactive", "ignored_source": "ignored_value"}, # Add - {"id": 5, "name": "e", "status": "inactive"}, + {"id": 5, "name": "e", "status": "inactive", "ignored_source": "ignored_value"}, ] ) ctx.engine_adapter.scd_type_2_by_column( @@ -1705,7 +1744,7 @@ def test_scd_type_2_by_column_source_columns(ctx_query_and_df: TestContext): truncate=False, start="2023-01-01", target_columns_to_types=columns_to_types, - source_columns=["id", "name", "status"], + source_columns=["id", "name", "status", "ignored_source"], ) results = ctx.get_metadata_results() assert len(results.views) == 0 diff --git a/tests/core/engine_adapter/integration/test_integration_bigquery.py b/tests/core/engine_adapter/integration/test_integration_bigquery.py index e1cfaded13..66a647dc80 100644 --- a/tests/core/engine_adapter/integration/test_integration_bigquery.py +++ b/tests/core/engine_adapter/integration/test_integration_bigquery.py @@ -7,7 +7,10 @@ from sqlmesh.cli.project_init import ProjectTemplate, init_example_project from sqlmesh.core.config import Config from sqlmesh.core.engine_adapter import BigQueryEngineAdapter -from sqlmesh.core.engine_adapter.bigquery import _CLUSTERING_META_KEY +from sqlmesh.core.engine_adapter.mixins import ( + TableAlterDropClusterKeyOperation, + TableAlterChangeClusterKeyOperation, +) from sqlmesh.core.engine_adapter.shared import DataObject import sqlmesh.core.dialect as d from sqlmesh.core.model import 
SqlModel, load_sql_based_model @@ -68,41 +71,43 @@ def test_get_alter_expressions_includes_clustering( assert clustered_differently_table_metadata.clustering_key == "(c1,c2)" assert normal_table_metadata.clustering_key is None - assert len(engine_adapter.get_alter_expressions(normal_table, normal_table)) == 0 - assert len(engine_adapter.get_alter_expressions(clustered_table, clustered_table)) == 0 + assert len(engine_adapter.get_alter_operations(normal_table, normal_table)) == 0 + assert len(engine_adapter.get_alter_operations(clustered_table, clustered_table)) == 0 # alter table drop clustered - clustered_to_normal = engine_adapter.get_alter_expressions(clustered_table, normal_table) + clustered_to_normal = engine_adapter.get_alter_operations(clustered_table, normal_table) assert len(clustered_to_normal) == 1 - assert clustered_to_normal[0].meta[_CLUSTERING_META_KEY] == (clustered_table, None) + assert isinstance(clustered_to_normal[0], TableAlterDropClusterKeyOperation) + assert clustered_to_normal[0].target_table == clustered_table + assert not hasattr(clustered_to_normal[0], "clustering_key") # alter table add clustered - normal_to_clustered = engine_adapter.get_alter_expressions(normal_table, clustered_table) + normal_to_clustered = engine_adapter.get_alter_operations(normal_table, clustered_table) assert len(normal_to_clustered) == 1 - assert normal_to_clustered[0].meta[_CLUSTERING_META_KEY] == ( - normal_table, - [exp.to_column("c1")], - ) + operation = normal_to_clustered[0] + assert isinstance(operation, TableAlterChangeClusterKeyOperation) + assert operation.target_table == normal_table + assert operation.clustering_key == "(c1)" # alter table change clustering (c1 -> (c1, c2)) - clustered_to_clustered_differently = engine_adapter.get_alter_expressions( + clustered_to_clustered_differently = engine_adapter.get_alter_operations( clustered_table, clustered_differently_table ) assert len(clustered_to_clustered_differently) == 1 - assert 
clustered_to_clustered_differently[0].meta[_CLUSTERING_META_KEY] == ( - clustered_table, - [exp.to_column("c1"), exp.to_column("c2")], - ) + operation = clustered_to_clustered_differently[0] + assert isinstance(operation, TableAlterChangeClusterKeyOperation) + assert operation.target_table == clustered_table + assert operation.clustering_key == "(c1,c2)" # alter table change clustering ((c1, c2) -> c1) - clustered_differently_to_clustered = engine_adapter.get_alter_expressions( + clustered_differently_to_clustered = engine_adapter.get_alter_operations( clustered_differently_table, clustered_table ) assert len(clustered_differently_to_clustered) == 1 - assert clustered_differently_to_clustered[0].meta[_CLUSTERING_META_KEY] == ( - clustered_differently_table, - [exp.to_column("c1")], - ) + operation = clustered_differently_to_clustered[0] + assert isinstance(operation, TableAlterChangeClusterKeyOperation) + assert operation.target_table == clustered_differently_table + assert operation.clustering_key == "(c1)" def test_mutating_clustered_by_forward_only( diff --git a/tests/core/engine_adapter/integration/test_integration_snowflake.py b/tests/core/engine_adapter/integration/test_integration_snowflake.py index 12e45f1f14..01cbe1c0aa 100644 --- a/tests/core/engine_adapter/integration/test_integration_snowflake.py +++ b/tests/core/engine_adapter/integration/test_integration_snowflake.py @@ -52,42 +52,42 @@ def test_get_alter_expressions_includes_clustering( ) engine_adapter.execute(f"CREATE TABLE {normal_table} (c1 int, c2 timestamp)") - assert len(engine_adapter.get_alter_expressions(normal_table, normal_table)) == 0 - assert len(engine_adapter.get_alter_expressions(clustered_table, clustered_table)) == 0 + assert len(engine_adapter.get_alter_operations(normal_table, normal_table)) == 0 + assert len(engine_adapter.get_alter_operations(clustered_table, clustered_table)) == 0 # alter table drop clustered - clustered_to_normal = 
engine_adapter.get_alter_expressions(clustered_table, normal_table) + clustered_to_normal = engine_adapter.get_alter_operations(clustered_table, normal_table) assert len(clustered_to_normal) == 1 assert ( - clustered_to_normal[0].sql(dialect=ctx.dialect) + clustered_to_normal[0].expression.sql(dialect=ctx.dialect) == f"ALTER TABLE {clustered_table} DROP CLUSTERING KEY" ) # alter table add clustered - normal_to_clustered = engine_adapter.get_alter_expressions(normal_table, clustered_table) + normal_to_clustered = engine_adapter.get_alter_operations(normal_table, clustered_table) assert len(normal_to_clustered) == 1 assert ( - normal_to_clustered[0].sql(dialect=ctx.dialect) + normal_to_clustered[0].expression.sql(dialect=ctx.dialect) == f"ALTER TABLE {normal_table} CLUSTER BY (c1)" ) # alter table change clustering - clustered_to_clustered_differently = engine_adapter.get_alter_expressions( + clustered_to_clustered_differently = engine_adapter.get_alter_operations( clustered_table, clustered_differently_table ) assert len(clustered_to_clustered_differently) == 1 assert ( - clustered_to_clustered_differently[0].sql(dialect=ctx.dialect) + clustered_to_clustered_differently[0].expression.sql(dialect=ctx.dialect) == f"ALTER TABLE {clustered_table} CLUSTER BY (c1, TO_DATE(c2))" ) # alter table change clustering - clustered_differently_to_clustered = engine_adapter.get_alter_expressions( + clustered_differently_to_clustered = engine_adapter.get_alter_operations( clustered_differently_table, clustered_table ) assert len(clustered_differently_to_clustered) == 1 assert ( - clustered_differently_to_clustered[0].sql(dialect=ctx.dialect) + clustered_differently_to_clustered[0].expression.sql(dialect=ctx.dialect) == f"ALTER TABLE {clustered_differently_table} CLUSTER BY (c1)" ) diff --git a/tests/core/engine_adapter/test_base.py b/tests/core/engine_adapter/test_base.py index 02029ca6f8..3b25091e10 100644 --- a/tests/core/engine_adapter/test_base.py +++ 
b/tests/core/engine_adapter/test_base.py @@ -21,6 +21,9 @@ from sqlmesh.utils.errors import SQLMeshError, UnsupportedCatalogOperationError from tests.core.engine_adapter import to_sql_calls +if t.TYPE_CHECKING: + pass + pytestmark = pytest.mark.engine @@ -81,10 +84,10 @@ def test_create_view_pandas_source_columns(make_mocked_engine_adapter: t.Callabl bigint_dtype = exp.DataType.build("BIGINT") adapter.create_view( "test_view", - pd.DataFrame({"a": [1, 2, 3]}), + pd.DataFrame({"a": [1, 2, 3], "ignored_source": [4, 5, 6]}), target_columns_to_types={"a": bigint_dtype, "b": bigint_dtype}, replace=False, - source_columns=["a"], + source_columns=["a", "ignored_source"], ) assert to_sql_calls(adapter) == [ @@ -96,16 +99,16 @@ def test_create_view_query_source_columns(make_mocked_engine_adapter: t.Callable adapter = make_mocked_engine_adapter(EngineAdapter) adapter.create_view( "test_view", - parse_one("SELECT a FROM tbl"), + parse_one("SELECT a, ignored_source FROM tbl"), target_columns_to_types={ "a": exp.DataType.build("BIGINT"), "b": exp.DataType.build("BIGINT"), }, replace=False, - source_columns=["a"], + source_columns=["a", "ignored_source"], ) assert to_sql_calls(adapter) == [ - 'CREATE VIEW "test_view" ("a", "b") AS SELECT "a", CAST(NULL AS BIGINT) AS "b" FROM (SELECT "a" FROM "tbl") AS "select_source_columns"', + 'CREATE VIEW "test_view" ("a", "b") AS SELECT "a", CAST(NULL AS BIGINT) AS "b" FROM (SELECT "a", "ignored_source" FROM "tbl") AS "select_source_columns"', ] @@ -318,7 +321,7 @@ def test_insert_overwrite_by_time_partition_supports_insert_overwrite_pandas_sou ): adapter = make_mocked_engine_adapter(EngineAdapter) adapter.INSERT_OVERWRITE_STRATEGY = InsertOverwriteStrategy.INSERT_OVERWRITE - df = pd.DataFrame({"a": [1, 2]}) + df = pd.DataFrame({"a": [1, 2], "ignored_source": [3, 4]}) adapter.insert_overwrite_by_time_partition( "test_table", df, @@ -330,7 +333,7 @@ def test_insert_overwrite_by_time_partition_supports_insert_overwrite_pandas_sou "a": 
exp.DataType.build("INT"), "ds": exp.DataType.build("STRING"), }, - source_columns=["a"], + source_columns=["a", "ignored_source"], ) assert to_sql_calls(adapter) == [ """INSERT OVERWRITE TABLE "test_table" ("a", "ds") SELECT "a", "ds" FROM (SELECT CAST("a" AS INT) AS "a", CAST(NULL AS TEXT) AS "ds" FROM (VALUES (1), (2)) AS "t"("a")) AS "_subquery" WHERE "ds" BETWEEN '2022-01-01' AND '2022-01-02'""" @@ -344,7 +347,7 @@ def test_insert_overwrite_by_time_partition_supports_insert_overwrite_query_sour adapter.INSERT_OVERWRITE_STRATEGY = InsertOverwriteStrategy.INSERT_OVERWRITE adapter.insert_overwrite_by_time_partition( "test_table", - parse_one("SELECT a FROM tbl"), + parse_one("SELECT a, ignored_source FROM tbl"), start="2022-01-01", end="2022-01-02", time_column="ds", @@ -353,10 +356,10 @@ def test_insert_overwrite_by_time_partition_supports_insert_overwrite_query_sour "a": exp.DataType.build("INT"), "ds": exp.DataType.build("STRING"), }, - source_columns=["a"], + source_columns=["a", "ignored_source"], ) assert to_sql_calls(adapter) == [ - """INSERT OVERWRITE TABLE "test_table" ("a", "ds") SELECT "a", "ds" FROM (SELECT "a", CAST(NULL AS TEXT) AS "ds" FROM (SELECT "a" FROM "tbl") AS "select_source_columns") AS "_subquery" WHERE "ds" BETWEEN '2022-01-01' AND '2022-01-02'""" + """INSERT OVERWRITE TABLE "test_table" ("a", "ds") SELECT "a", "ds" FROM (SELECT "a", CAST(NULL AS TEXT) AS "ds" FROM (SELECT "a", "ignored_source" FROM "tbl") AS "select_source_columns") AS "_subquery" WHERE "ds" BETWEEN '2022-01-01' AND '2022-01-02'""" ] @@ -410,7 +413,7 @@ def test_insert_overwrite_by_time_partition_replace_where_pandas_source_columns( ): adapter = make_mocked_engine_adapter(EngineAdapter) adapter.INSERT_OVERWRITE_STRATEGY = InsertOverwriteStrategy.REPLACE_WHERE - df = pd.DataFrame({"a": [1, 2]}) + df = pd.DataFrame({"a": [1, 2], "ignored_source": [3, 4]}) adapter.insert_overwrite_by_time_partition( "test_table", df, @@ -422,7 +425,7 @@ def 
test_insert_overwrite_by_time_partition_replace_where_pandas_source_columns( "a": exp.DataType.build("INT"), "ds": exp.DataType.build("STRING"), }, - source_columns=["a"], + source_columns=["a", "ignored_source"], ) assert to_sql_calls(adapter) == [ """INSERT INTO "test_table" REPLACE WHERE "ds" BETWEEN '2022-01-01' AND '2022-01-02' SELECT "a", "ds" FROM (SELECT CAST("a" AS INT) AS "a", CAST(NULL AS TEXT) AS "ds" FROM (VALUES (1), (2)) AS "t"("a")) AS "_subquery" WHERE "ds" BETWEEN '2022-01-01' AND '2022-01-02'""" @@ -436,7 +439,7 @@ def test_insert_overwrite_by_time_partition_replace_where_query_source_columns( adapter.INSERT_OVERWRITE_STRATEGY = InsertOverwriteStrategy.REPLACE_WHERE adapter.insert_overwrite_by_time_partition( "test_table", - parse_one("SELECT a FROM tbl"), + parse_one("SELECT a, ignored_source FROM tbl"), start="2022-01-01", end="2022-01-02", time_column="ds", @@ -445,10 +448,10 @@ def test_insert_overwrite_by_time_partition_replace_where_query_source_columns( "a": exp.DataType.build("INT"), "ds": exp.DataType.build("STRING"), }, - source_columns=["a"], + source_columns=["a", "ignored_source"], ) assert to_sql_calls(adapter) == [ - """INSERT INTO "test_table" REPLACE WHERE "ds" BETWEEN '2022-01-01' AND '2022-01-02' SELECT "a", "ds" FROM (SELECT "a", CAST(NULL AS TEXT) AS "ds" FROM (SELECT "a" FROM "tbl") AS "select_source_columns") AS "_subquery" WHERE "ds" BETWEEN '2022-01-01' AND '2022-01-02'""" + """INSERT INTO "test_table" REPLACE WHERE "ds" BETWEEN '2022-01-01' AND '2022-01-02' SELECT "a", "ds" FROM (SELECT "a", CAST(NULL AS TEXT) AS "ds" FROM (SELECT "a", "ignored_source" FROM "tbl") AS "select_source_columns") AS "_subquery" WHERE "ds" BETWEEN '2022-01-01' AND '2022-01-02'""" ] @@ -572,7 +575,7 @@ def test_insert_append_pandas_batches(make_mocked_engine_adapter: t.Callable): def test_insert_append_pandas_source_columns(make_mocked_engine_adapter: t.Callable): adapter = make_mocked_engine_adapter(EngineAdapter) - df = pd.DataFrame({"a": [1, 
2, 3]}) + df = pd.DataFrame({"a": [1, 2, 3], "ignored_source": [4, 5, 6]}) adapter.insert_append( "test_table", df, @@ -580,7 +583,7 @@ def test_insert_append_pandas_source_columns(make_mocked_engine_adapter: t.Calla "a": exp.DataType.build("INT"), "b": exp.DataType.build("INT"), }, - source_columns=["a"], + source_columns=["a", "ignored_source"], ) assert to_sql_calls(adapter) == [ 'INSERT INTO "test_table" ("a", "b") SELECT CAST("a" AS INT) AS "a", CAST(NULL AS INT) AS "b" FROM (VALUES (1), (2), (3)) AS "t"("a")', @@ -591,15 +594,15 @@ def test_insert_append_query_source_columns(make_mocked_engine_adapter: t.Callab adapter = make_mocked_engine_adapter(EngineAdapter) adapter.insert_append( "test_table", - parse_one("SELECT a FROM tbl"), + parse_one("SELECT a, ignored_source FROM tbl"), target_columns_to_types={ "a": exp.DataType.build("INT"), "b": exp.DataType.build("INT"), }, - source_columns=["a"], + source_columns=["a", "ignored_source"], ) assert to_sql_calls(adapter) == [ - 'INSERT INTO "test_table" ("a", "b") SELECT "a", CAST(NULL AS INT) AS "b" FROM (SELECT "a" FROM "tbl") AS "select_source_columns"', + 'INSERT INTO "test_table" ("a", "b") SELECT "a", CAST(NULL AS INT) AS "b" FROM (SELECT "a", "ignored_source" FROM "tbl") AS "select_source_columns"', ] @@ -1067,12 +1070,8 @@ def test_alter_table( adapter.SCHEMA_DIFFER = SchemaDiffer(**schema_differ_config) original_from_structs = adapter.SCHEMA_DIFFER._from_structs - def _from_structs( - current_struct: exp.DataType, new_struct: exp.DataType, *, ignore_destructive: bool = False - ) -> t.List[TableAlterOperation]: - operations = original_from_structs( - current_struct, new_struct, ignore_destructive=ignore_destructive - ) + def _from_structs(*args, **kwargs) -> t.List[TableAlterOperation]: + operations = original_from_structs(*args, **kwargs) if not operations: return operations assert ( @@ -1093,7 +1092,7 @@ def table_columns(table_name: str) -> t.Dict[str, exp.DataType]: adapter.columns = table_columns - 
adapter.alter_table(adapter.get_alter_expressions(current_table_name, target_table_name)) + adapter.alter_table(adapter.get_alter_operations(current_table_name, target_table_name)) adapter.cursor.begin.assert_called_once() adapter.cursor.commit.assert_called_once() @@ -1191,7 +1190,7 @@ def test_merge_upsert_pandas(make_mocked_engine_adapter: t.Callable): def test_merge_upsert_pandas_source_columns(make_mocked_engine_adapter: t.Callable): adapter = make_mocked_engine_adapter(EngineAdapter) - df = pd.DataFrame({"id": [1, 2, 3], "ts": [4, 5, 6]}) + df = pd.DataFrame({"id": [1, 2, 3], "ts": [4, 5, 6], "ignored_source": [7, 8, 9]}) adapter.merge( target_table="target", source_table=df, @@ -1201,7 +1200,7 @@ def test_merge_upsert_pandas_source_columns(make_mocked_engine_adapter: t.Callab "val": exp.DataType.build("int"), }, unique_key=[exp.to_identifier("id")], - source_columns=["id", "ts"], + source_columns=["id", "ignored_source", "ts"], ) adapter.cursor.execute.assert_called_once_with( 'MERGE INTO "target" AS "__MERGE_TARGET__" USING (SELECT CAST("id" AS INT) AS "id", CAST("ts" AS TIMESTAMP) AS "ts", CAST(NULL AS INT) AS "val" FROM (VALUES (1, 4), (2, 5), (3, 6)) AS "t"("id", "ts")) AS "__MERGE_SOURCE__" ON "__MERGE_TARGET__"."id" = "__MERGE_SOURCE__"."id" ' @@ -1214,17 +1213,17 @@ def test_merge_upsert_query_source_columns(make_mocked_engine_adapter: t.Callabl adapter = make_mocked_engine_adapter(EngineAdapter) adapter.merge( target_table="target", - source_table=parse_one("SELECT id, ts FROM source"), + source_table=parse_one("SELECT id, ts, ignored_source FROM source"), target_columns_to_types={ "id": exp.DataType.build("int"), "ts": exp.DataType.build("timestamp"), "val": exp.DataType.build("int"), }, unique_key=[exp.to_identifier("id")], - source_columns=["id", "ts"], + source_columns=["id", "ts", "ignored_source"], ) adapter.cursor.execute.assert_called_once_with( - 'MERGE INTO "target" AS "__MERGE_TARGET__" USING (SELECT "id", "ts", CAST(NULL AS INT) AS "val" 
FROM (SELECT "id", "ts" FROM "source") AS "select_source_columns") AS "__MERGE_SOURCE__" ON "__MERGE_TARGET__"."id" = "__MERGE_SOURCE__"."id" ' + 'MERGE INTO "target" AS "__MERGE_TARGET__" USING (SELECT "id", "ts", CAST(NULL AS INT) AS "val" FROM (SELECT "id", "ts", "ignored_source" FROM "source") AS "select_source_columns") AS "__MERGE_SOURCE__" ON "__MERGE_TARGET__"."id" = "__MERGE_SOURCE__"."id" ' 'WHEN MATCHED THEN UPDATE SET "__MERGE_TARGET__"."id" = "__MERGE_SOURCE__"."id", "__MERGE_TARGET__"."ts" = "__MERGE_SOURCE__"."ts", "__MERGE_TARGET__"."val" = "__MERGE_SOURCE__"."val" ' 'WHEN NOT MATCHED THEN INSERT ("id", "ts", "val") VALUES ("__MERGE_SOURCE__"."id", "__MERGE_SOURCE__"."ts", "__MERGE_SOURCE__"."val")' ) @@ -1639,6 +1638,7 @@ def test_scd_type_2_by_time_source_columns(make_mocked_engine_adapter: t.Callabl "2020-01-02 15:00:00", "2020-01-03 12:00:00", ], + "ignored_source": [4, 5, 6], } ) adapter.scd_type_2_by_time( @@ -1656,7 +1656,7 @@ def test_scd_type_2_by_time_source_columns(make_mocked_engine_adapter: t.Callabl "test_valid_from": exp.DataType.build("TIMESTAMP"), "test_valid_to": exp.DataType.build("TIMESTAMP"), }, - source_columns=["id", "name", "test_UPDATED_at"], + source_columns=["id", "name", "test_UPDATED_at", "ignored_source"], execution_time=datetime(2020, 1, 1, 0, 0, 0), start=datetime(2020, 1, 1, 0, 0, 0), is_restatement=True, @@ -3201,7 +3201,7 @@ def test_replace_query_pandas(make_mocked_engine_adapter: t.Callable): def test_replace_query_pandas_source_columns(make_mocked_engine_adapter: t.Callable): adapter = make_mocked_engine_adapter(EngineAdapter) - df = pd.DataFrame({"a": [1, 2, 3]}) + df = pd.DataFrame({"a": [1, 2, 3], "ignored_source": [4, 5, 6]}) adapter.replace_query( "test_table", df, @@ -3209,7 +3209,7 @@ def test_replace_query_pandas_source_columns(make_mocked_engine_adapter: t.Calla "a": exp.DataType.build("INT"), "b": exp.DataType.build("INT"), }, - source_columns=["a"], + source_columns=["a", "ignored_source"], ) assert 
to_sql_calls(adapter) == [ 'CREATE OR REPLACE TABLE "test_table" AS SELECT CAST("a" AS INT) AS "a", CAST("b" AS INT) AS "b" FROM (SELECT CAST("a" AS INT) AS "a", CAST(NULL AS INT) AS "b" FROM (VALUES (1), (2), (3)) AS "t"("a")) AS "_subquery"', @@ -3220,15 +3220,15 @@ def test_replace_query_query_source_columns(make_mocked_engine_adapter: t.Callab adapter = make_mocked_engine_adapter(EngineAdapter) adapter.replace_query( "test_table", - parse_one("SELECT a FROM tbl"), + parse_one("SELECT a, ignored_source FROM tbl"), target_columns_to_types={ "a": exp.DataType.build("INT"), "b": exp.DataType.build("INT"), }, - source_columns=["a"], + source_columns=["a", "ignored_source"], ) assert to_sql_calls(adapter) == [ - 'CREATE OR REPLACE TABLE "test_table" AS SELECT CAST("a" AS INT) AS "a", CAST("b" AS INT) AS "b" FROM (SELECT "a", CAST(NULL AS INT) AS "b" FROM (SELECT "a" FROM "tbl") AS "select_source_columns") AS "_subquery"', + 'CREATE OR REPLACE TABLE "test_table" AS SELECT CAST("a" AS INT) AS "a", CAST("b" AS INT) AS "b" FROM (SELECT "a", CAST(NULL AS INT) AS "b" FROM (SELECT "a", "ignored_source" FROM "tbl") AS "select_source_columns") AS "_subquery"', ] @@ -3382,7 +3382,7 @@ def test_ctas_pandas(make_mocked_engine_adapter: t.Callable): def test_ctas_pandas_source_columns(make_mocked_engine_adapter: t.Callable): adapter = make_mocked_engine_adapter(EngineAdapter) - df = pd.DataFrame({"a": [1, 2, 3]}) + df = pd.DataFrame({"a": [1, 2, 3], "ignored_source": [4, 5, 6]}) adapter.ctas( "test_table", df, @@ -3390,7 +3390,7 @@ def test_ctas_pandas_source_columns(make_mocked_engine_adapter: t.Callable): "a": exp.DataType.build("INT"), "b": exp.DataType.build("INT"), }, - source_columns=["a"], + source_columns=["a", "ignored_source"], ) assert to_sql_calls(adapter) == [ 'CREATE TABLE IF NOT EXISTS "test_table" AS SELECT CAST("a" AS INT) AS "a", CAST("b" AS INT) AS "b" FROM (SELECT CAST("a" AS INT) AS "a", CAST(NULL AS INT) AS "b" FROM (VALUES (1), (2), (3)) AS "t"("a")) AS 
"_subquery"', @@ -3401,15 +3401,15 @@ def test_ctas_query_source_columns(make_mocked_engine_adapter: t.Callable): adapter = make_mocked_engine_adapter(EngineAdapter) adapter.ctas( "test_table", - parse_one("SELECT a FROM tbl"), + parse_one("SELECT a, ignored_source FROM tbl"), target_columns_to_types={ "a": exp.DataType.build("INT"), "b": exp.DataType.build("INT"), }, - source_columns=["a"], + source_columns=["a", "ignored_source"], ) assert to_sql_calls(adapter) == [ - 'CREATE TABLE IF NOT EXISTS "test_table" AS SELECT CAST("a" AS INT) AS "a", CAST("b" AS INT) AS "b" FROM (SELECT "a", CAST(NULL AS INT) AS "b" FROM (SELECT "a" FROM "tbl") AS "select_source_columns") AS "_subquery"', + 'CREATE TABLE IF NOT EXISTS "test_table" AS SELECT CAST("a" AS INT) AS "a", CAST("b" AS INT) AS "b" FROM (SELECT "a", CAST(NULL AS INT) AS "b" FROM (SELECT "a", "ignored_source" FROM "tbl") AS "select_source_columns") AS "_subquery"', ] diff --git a/tests/core/engine_adapter/test_bigquery.py b/tests/core/engine_adapter/test_bigquery.py index f5a287defb..25aa4006b5 100644 --- a/tests/core/engine_adapter/test_bigquery.py +++ b/tests/core/engine_adapter/test_bigquery.py @@ -1053,7 +1053,7 @@ def test_get_alter_expressions_includes_catalog( ) get_data_objects_mock.return_value = [] - adapter.get_alter_expressions("catalog1.foo.bar", "catalog2.bar.bing") + adapter.get_alter_operations("catalog1.foo.bar", "catalog2.bar.bing") assert get_data_objects_mock.call_count == 2 diff --git a/tests/core/engine_adapter/test_clickhouse.py b/tests/core/engine_adapter/test_clickhouse.py index 3e92a8fe9b..b75609e759 100644 --- a/tests/core/engine_adapter/test_clickhouse.py +++ b/tests/core/engine_adapter/test_clickhouse.py @@ -172,7 +172,7 @@ def table_columns(table_name: str) -> t.Dict[str, exp.DataType]: adapter.columns = table_columns # type: ignore # ON CLUSTER not added because engine_run_mode.is_cluster=False - adapter.alter_table(adapter.get_alter_expressions(current_table_name, target_table_name)) 
+ adapter.alter_table(adapter.get_alter_operations(current_table_name, target_table_name)) mocker.patch.object( ClickhouseEngineAdapter, @@ -185,7 +185,7 @@ def table_columns(table_name: str) -> t.Dict[str, exp.DataType]: new_callable=mocker.PropertyMock(return_value=EngineRunMode.CLUSTER), ) - adapter.alter_table(adapter.get_alter_expressions(current_table_name, target_table_name)) + adapter.alter_table(adapter.get_alter_operations(current_table_name, target_table_name)) assert to_sql_calls(adapter) == [ 'ALTER TABLE "test_table" DROP COLUMN "c"', diff --git a/tests/core/engine_adapter/test_postgres.py b/tests/core/engine_adapter/test_postgres.py index 5d05dd653c..6134126a41 100644 --- a/tests/core/engine_adapter/test_postgres.py +++ b/tests/core/engine_adapter/test_postgres.py @@ -157,7 +157,7 @@ def table_columns(table_name: str) -> t.Dict[str, exp.DataType]: adapter.columns = table_columns - adapter.alter_table(adapter.get_alter_expressions(current_table_name, target_table_name)) + adapter.alter_table(adapter.get_alter_operations(current_table_name, target_table_name)) assert to_sql_calls(adapter) == [ 'ALTER TABLE "test_table" DROP COLUMN "test_column" CASCADE', ] diff --git a/tests/core/engine_adapter/test_redshift.py b/tests/core/engine_adapter/test_redshift.py index 17c3dd1866..c5e3dfff17 100644 --- a/tests/core/engine_adapter/test_redshift.py +++ b/tests/core/engine_adapter/test_redshift.py @@ -365,7 +365,7 @@ def table_columns(table_name: str) -> t.Dict[str, exp.DataType]: adapter.columns = table_columns - adapter.alter_table(adapter.get_alter_expressions(current_table_name, target_table_name)) + adapter.alter_table(adapter.get_alter_operations(current_table_name, target_table_name)) assert to_sql_calls(adapter) == [ 'ALTER TABLE "test_table" DROP COLUMN "test_column" CASCADE', ] @@ -388,7 +388,7 @@ def table_columns(table_name: str) -> t.Dict[str, exp.DataType]: adapter.columns = table_columns - 
adapter.alter_table(adapter.get_alter_expressions(current_table_name, target_table_name)) + adapter.alter_table(adapter.get_alter_operations(current_table_name, target_table_name)) assert to_sql_calls(adapter) == [ 'ALTER TABLE "test_table" ALTER COLUMN "test_column" TYPE VARCHAR(20)', ] @@ -411,7 +411,7 @@ def table_columns(table_name: str) -> t.Dict[str, exp.DataType]: adapter.columns = table_columns - adapter.alter_table(adapter.get_alter_expressions(current_table_name, target_table_name)) + adapter.alter_table(adapter.get_alter_operations(current_table_name, target_table_name)) assert to_sql_calls(adapter) == [ 'ALTER TABLE "test_table" DROP COLUMN "test_column" CASCADE', 'ALTER TABLE "test_table" ADD COLUMN "test_column" DECIMAL(25, 10)', diff --git a/tests/core/engine_adapter/test_spark.py b/tests/core/engine_adapter/test_spark.py index 55a925b995..2e4f6ae2a0 100644 --- a/tests/core/engine_adapter/test_spark.py +++ b/tests/core/engine_adapter/test_spark.py @@ -162,7 +162,7 @@ def table_columns(table_name: str) -> t.Dict[str, exp.DataType]: adapter.columns = table_columns - adapter.alter_table(adapter.get_alter_expressions(current_table_name, target_table_name)) + adapter.alter_table(adapter.get_alter_operations(current_table_name, target_table_name)) adapter.cursor.execute.assert_has_calls( [ diff --git a/tests/core/test_integration.py b/tests/core/test_integration.py index fc129424f4..15b978b87e 100644 --- a/tests/core/test_integration.py +++ b/tests/core/test_integration.py @@ -7880,6 +7880,215 @@ def test_incremental_by_time_model_ignore_destructive_change(tmp_path: Path): context.close() +def test_incremental_by_time_model_ignore_additive_change(tmp_path: Path): + models_dir = tmp_path / "models" + models_dir.mkdir() + data_dir = tmp_path / "data" + data_dir.mkdir() + data_filepath = data_dir / "test.duckdb" + + config = Config( + model_defaults=ModelDefaultsConfig(dialect="duckdb"), + 
default_connection=DuckDBConnectionConfig(database=str(data_filepath)), + ) + + # Initial model with 3 columns + initial_model = f""" + MODEL ( + name test_model, + kind INCREMENTAL_BY_TIME_RANGE ( + time_column ds, + forward_only true, + on_destructive_change allow, + on_additive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + 1 as id, + 'test_name' as name, + 'other' as other_column, + @start_ds as ds + FROM + source_table; + """ + + # Write initial model + (models_dir / "test_model.sql").write_text(initial_model) + + with time_machine.travel("2023-01-08 00:00:00 UTC"): + # Create context and apply initial model + context = Context(paths=[tmp_path], config=config) + context.engine_adapter.execute("CREATE TABLE source_table (source_id INT)") + context.engine_adapter.execute("INSERT INTO source_table VALUES (1)") + + # Apply initial plan and load data + context.plan("prod", auto_apply=True, no_prompts=True) + + # Verify initial data was loaded + initial_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(initial_df) == 1 + assert "source_id" in initial_df.columns + assert "id" in initial_df.columns + assert "name" in initial_df.columns + assert "ds" in initial_df.columns + + context.close() + + # remove `name` column and add new column to the source table + initial_model = """ + MODEL ( + name test_model, + kind INCREMENTAL_BY_TIME_RANGE ( + time_column ds, + forward_only true, + on_destructive_change allow, + on_additive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + 1 as id, + 'other' as other_column, + @start_ds as ds + FROM + source_table; + """ + (models_dir / "test_model.sql").write_text(initial_model) + + context = Context(paths=[tmp_path], config=config) + context.engine_adapter.execute("ALTER TABLE source_table ADD COLUMN new_column INT") + context.plan("prod", auto_apply=True, no_prompts=True) + + # Verify data loading continued to work + # The existing data should 
still be there and new data should be loaded + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + + assert len(updated_df) == 1 + assert "source_id" in initial_df.columns + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is removed since destructive is allowed + assert "name" not in updated_df.columns + # new_column is not added since additive is ignored + assert "new_column" not in updated_df.columns + + context.close() + + with time_machine.travel("2023-01-10 00:00:00 UTC"): + context = Context(paths=[tmp_path], config=config) + context.run() + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(updated_df) == 2 + assert "source_id" in initial_df.columns + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is still not in the table since the destructive change was applied + assert "name" not in updated_df.columns + # new_column is still not added since additive is ignored + assert "new_column" not in updated_df.columns + + with time_machine.travel("2023-01-11 00:00:00 UTC"): + updated_model = """ + MODEL ( + name test_model, + kind INCREMENTAL_BY_TIME_RANGE ( + time_column ds, + forward_only true, + on_destructive_change allow, + on_additive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + CAST(1 AS STRING) as id, + 'other' as other_column, + @start_ds as ds + FROM + source_table; + """ + (models_dir / "test_model.sql").write_text(updated_model) + + context = Context(paths=[tmp_path], config=config) + context.engine_adapter.SCHEMA_DIFFER.compatible_types = { + exp.DataType.build("INT"): {exp.DataType.build("STRING")} + } + context.plan("prod", auto_apply=True, no_prompts=True, run=True) + + # Verify data loading continued to work + # The existing data should still be there and new data should be loaded + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + + assert len(updated_df) == 3 + assert "source_id" in
initial_df.columns + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is still not in the table since destructive changes were allowed + assert "name" not in updated_df.columns + # new_column is still not added since additive is ignored + assert "new_column" not in updated_df.columns + # The additive change was ignored even though the type change was registered as compatible, so + # the result still contains ints instead of strings + assert updated_df["id"].tolist() == [1, 1, 1] + + with time_machine.travel("2023-01-12 00:00:00 UTC"): + updated_model = """ + MODEL ( + name test_model, + kind INCREMENTAL_BY_TIME_RANGE ( + time_column ds, + forward_only true, + on_destructive_change allow, + on_additive_change allow + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + CAST(1 AS STRING) as id, + 'other' as other_column, + @start_ds as ds + FROM + source_table; + """ + (models_dir / "test_model.sql").write_text(updated_model) + + context = Context(paths=[tmp_path], config=config) + # Register the type change as compatible so that an alter is attempted now that it is considered additive + context.engine_adapter.SCHEMA_DIFFER.compatible_types = { + exp.DataType.build("INT"): {exp.DataType.build("STRING")} + } + context.plan("prod", auto_apply=True, no_prompts=True, run=True) + + # Verify data loading continued to work + # The existing data should still be there and new data should be loaded + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + + assert len(updated_df) == 4 + assert "source_id" in initial_df.columns + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is still not in the table since destructive changes were allowed + assert "name" not in updated_df.columns + # new_column is now added since additive changes are now allowed + assert "new_column" in updated_df.columns + # The change is now reflected since an additive alter could be performed + assert updated_df["id"].dropna().tolist() == ["1", "1",
"1", "1"] + + context.close() + + def test_incremental_by_unique_key_model_ignore_destructive_change(tmp_path: Path): models_dir = tmp_path / "models" models_dir.mkdir() @@ -7993,7 +8202,7 @@ def test_incremental_by_unique_key_model_ignore_destructive_change(tmp_path: Pat context.close() -def test_incremental_unmanaged_model_ignore_destructive_change(tmp_path: Path): +def test_incremental_by_unique_key_model_ignore_additive_change(tmp_path: Path): models_dir = tmp_path / "models" models_dir.mkdir() data_dir = tmp_path / "data" @@ -8009,8 +8218,11 @@ def test_incremental_unmanaged_model_ignore_destructive_change(tmp_path: Path): initial_model = f""" MODEL ( name test_model, - kind INCREMENTAL_UNMANAGED( - on_destructive_change ignore + kind INCREMENTAL_BY_UNIQUE_KEY ( + unique_key id, + forward_only true, + on_destructive_change allow, + on_additive_change ignore ), start '2023-01-01', cron '@daily' @@ -8051,15 +8263,18 @@ def test_incremental_unmanaged_model_ignore_destructive_change(tmp_path: Path): initial_model = """ MODEL ( name test_model, - kind INCREMENTAL_UNMANAGED( - on_destructive_change ignore + kind INCREMENTAL_BY_UNIQUE_KEY ( + unique_key id, + forward_only true, + on_destructive_change allow, + on_additive_change ignore ), start '2023-01-01', cron '@daily' ); - SELECT - *, + SELECT + *, 2 as id, 3 as new_column, @start_ds as ds @@ -8075,6 +8290,567 @@ def test_incremental_unmanaged_model_ignore_destructive_change(tmp_path: Path): # The existing data should still be there and new data should be loaded updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(updated_df) == 1 + assert "source_id" in initial_df.columns + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is not in table since destructive was allowed + assert "name" not in updated_df.columns + # new_column is not added since it is additive and ignored + assert "new_column" not in updated_df.columns + + context.close() + + with 
time_machine.travel("2023-01-10 00:00:00 UTC"): + context = Context(paths=[tmp_path], config=config) + context.run() + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(updated_df) == 2 + assert "source_id" in initial_df.columns + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is still not in table since destructive was allowed + assert "name" not in updated_df.columns + # new_column is not added since it is additive and ignored + assert "new_column" not in updated_df.columns + + context.close() + + +def test_incremental_unmanaged_model_ignore_destructive_change(tmp_path: Path): + models_dir = tmp_path / "models" + models_dir.mkdir() + data_dir = tmp_path / "data" + data_dir.mkdir() + data_filepath = data_dir / "test.duckdb" + + config = Config( + model_defaults=ModelDefaultsConfig(dialect="duckdb"), + default_connection=DuckDBConnectionConfig(database=str(data_filepath)), + ) + + # Initial model with 3 columns + initial_model = f""" + MODEL ( + name test_model, + kind INCREMENTAL_UNMANAGED( + on_destructive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + 1 as id, + 'test_name' as name, + @start_ds as ds + FROM + source_table; + """ + + # Write initial model + (models_dir / "test_model.sql").write_text(initial_model) + + with time_machine.travel("2023-01-08 00:00:00 UTC"): + # Create context and apply initial model + context = Context(paths=[tmp_path], config=config) + context.engine_adapter.execute("CREATE TABLE source_table (source_id INT)") + context.engine_adapter.execute("INSERT INTO source_table VALUES (1)") + + # Apply initial plan and load data + context.plan("prod", auto_apply=True, no_prompts=True) + + # Verify initial data was loaded + initial_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(initial_df) == 1 + assert "source_id" in initial_df.columns + assert "id" in initial_df.columns + assert "name" in initial_df.columns + assert 
"ds" in initial_df.columns + + context.close() + + # remove `name` column and add new column + initial_model = """ + MODEL ( + name test_model, + kind INCREMENTAL_UNMANAGED( + on_destructive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + 2 as id, + 3 as new_column, + @start_ds as ds + FROM + source_table; + """ + (models_dir / "test_model.sql").write_text(initial_model) + + context = Context(paths=[tmp_path], config=config) + context.plan("prod", auto_apply=True, no_prompts=True) + + # Verify data loading continued to work + # The existing data should still be there and new data should be loaded + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + + assert len(updated_df) == 1 + assert "source_id" in initial_df.columns + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is still in table since destructive was ignored + assert "name" in updated_df.columns + # new_column is added since it is additive and allowed + assert "new_column" in updated_df.columns + + context.close() + + with time_machine.travel("2023-01-10 00:00:00 UTC"): + context = Context(paths=[tmp_path], config=config) + context.run() + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(updated_df) == 2 + assert "source_id" in initial_df.columns + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is still in table since destructive was ignored + assert "name" in updated_df.columns + # new_column is added since it is additive and allowed + assert "new_column" in updated_df.columns + + context.close() + + +def test_incremental_unmanaged_model_ignore_additive_change(tmp_path: Path): + models_dir = tmp_path / "models" + models_dir.mkdir() + data_dir = tmp_path / "data" + data_dir.mkdir() + data_filepath = data_dir / "test.duckdb" + + config = Config( + model_defaults=ModelDefaultsConfig(dialect="duckdb"), + 
default_connection=DuckDBConnectionConfig(database=str(data_filepath)), + ) + + # Initial model with 3 columns + initial_model = f""" + MODEL ( + name test_model, + kind INCREMENTAL_UNMANAGED( + on_destructive_change allow, + on_additive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + 1 as id, + 'test_name' as name, + @start_ds as ds + FROM + source_table; + """ + + # Write initial model + (models_dir / "test_model.sql").write_text(initial_model) + + with time_machine.travel("2023-01-08 00:00:00 UTC"): + # Create context and apply initial model + context = Context(paths=[tmp_path], config=config) + context.engine_adapter.execute("CREATE TABLE source_table (source_id INT)") + context.engine_adapter.execute("INSERT INTO source_table VALUES (1)") + + # Apply initial plan and load data + context.plan("prod", auto_apply=True, no_prompts=True) + + # Verify initial data was loaded + initial_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(initial_df) == 1 + assert "source_id" in initial_df.columns + assert "id" in initial_df.columns + assert "name" in initial_df.columns + assert "ds" in initial_df.columns + + context.close() + + # remove `name` column and add new column + initial_model = """ + MODEL ( + name test_model, + kind INCREMENTAL_UNMANAGED( + on_destructive_change allow, + on_additive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + 2 as id, + 3 as new_column, + @start_ds as ds + FROM + source_table; + """ + (models_dir / "test_model.sql").write_text(initial_model) + + context = Context(paths=[tmp_path], config=config) + context.plan("prod", auto_apply=True, no_prompts=True) + + # Verify data loading continued to work + # The existing data should still be there and new data should be loaded + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + + assert len(updated_df) == 1 + assert "source_id" in initial_df.columns + assert "id" in updated_df.columns + 
assert "ds" in updated_df.columns + # name is not in table since destructive was allowed + assert "name" not in updated_df.columns + # new_column is not added since it is additive and ignored + assert "new_column" not in updated_df.columns + + context.close() + + with time_machine.travel("2023-01-10 00:00:00 UTC"): + context = Context(paths=[tmp_path], config=config) + context.run() + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(updated_df) == 2 + assert "source_id" in initial_df.columns + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is still not in the table since destructive was allowed + assert "name" not in updated_df.columns + # new_column is not added since it is additive and ignored + assert "new_column" not in updated_df.columns + + context.close() + + +def test_scd_type_2_by_time_ignore_destructive_change(tmp_path: Path): + models_dir = tmp_path / "models" + models_dir.mkdir() + data_dir = tmp_path / "data" + data_dir.mkdir() + data_filepath = data_dir / "test.duckdb" + + config = Config( + model_defaults=ModelDefaultsConfig(dialect="duckdb"), + default_connection=DuckDBConnectionConfig(database=str(data_filepath)), + ) + + # Initial model with 3 columns + initial_model = f""" + MODEL ( + name test_model, + kind SCD_TYPE_2_BY_TIME ( + unique_key id, + updated_at_name ds, + on_destructive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + 1 as id, + 'test_name' as name, + @start_dt as ds + FROM + source_table; + """ + + # Write initial model + (models_dir / "test_model.sql").write_text(initial_model) + + with time_machine.travel("2023-01-08 00:00:00 UTC"): + # Create context and apply initial model + context = Context(paths=[tmp_path], config=config) + context.engine_adapter.execute("CREATE TABLE source_table (source_id INT)") + context.engine_adapter.execute("INSERT INTO source_table VALUES (1)") + + # Apply initial plan and load data + context.plan("prod",
auto_apply=True, no_prompts=True) + + # Verify initial data was loaded + initial_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(initial_df) == 1 + assert "source_id" in initial_df.columns + assert "id" in initial_df.columns + assert "name" in initial_df.columns + assert "ds" in initial_df.columns + + context.close() + + # remove `name` column and add new column + initial_model = """ + MODEL ( + name test_model, + kind SCD_TYPE_2_BY_TIME ( + unique_key id, + updated_at_name ds, + on_destructive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + 1 as id, + 3 as new_column, + @start_dt as ds + FROM + source_table; + """ + (models_dir / "test_model.sql").write_text(initial_model) + + context = Context(paths=[tmp_path], config=config) + context.plan("prod", auto_apply=True, no_prompts=True) + + # Verify data loading continued to work + # The existing data should still be there and new data should be loaded + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + + assert len(updated_df) == 1 + assert "source_id" in initial_df.columns + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is still in table since destructive was ignored + assert "name" in updated_df.columns + # new_column is added since it is additive and allowed + assert "new_column" in updated_df.columns + + context.close() + + with time_machine.travel("2023-01-10 00:00:00 UTC"): + context = Context(paths=[tmp_path], config=config) + context.run() + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(updated_df) == 2 + assert "source_id" in initial_df.columns + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is still in table since destructive was ignored + assert "name" in updated_df.columns + # new_column is added since it is additive and allowed + assert "new_column" in updated_df.columns + + context.close() + + +def 
test_scd_type_2_by_time_ignore_additive_change(tmp_path: Path): + models_dir = tmp_path / "models" + models_dir.mkdir() + data_dir = tmp_path / "data" + data_dir.mkdir() + data_filepath = data_dir / "test.duckdb" + + config = Config( + model_defaults=ModelDefaultsConfig(dialect="duckdb"), + default_connection=DuckDBConnectionConfig(database=str(data_filepath)), + ) + + # Initial model with 3 columns + initial_model = f""" + MODEL ( + name test_model, + kind SCD_TYPE_2_BY_TIME ( + unique_key id, + updated_at_name ds, + on_destructive_change allow, + on_additive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + 1 as id, + 'test_name' as name, + @start_dt as ds + FROM + source_table; + """ + + # Write initial model + (models_dir / "test_model.sql").write_text(initial_model) + + with time_machine.travel("2023-01-08 00:00:00 UTC"): + # Create context and apply initial model + context = Context(paths=[tmp_path], config=config) + context.engine_adapter.execute("CREATE TABLE source_table (source_id INT)") + context.engine_adapter.execute("INSERT INTO source_table VALUES (1)") + + # Apply initial plan and load data + context.plan("prod", auto_apply=True, no_prompts=True) + + # Verify initial data was loaded + initial_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(initial_df) == 1 + assert "source_id" in initial_df.columns + assert "id" in initial_df.columns + assert "name" in initial_df.columns + assert "ds" in initial_df.columns + + context.close() + + # remove `name` column and add new column + initial_model = """ + MODEL ( + name test_model, + kind SCD_TYPE_2_BY_TIME ( + unique_key id, + updated_at_name ds, + on_destructive_change allow, + on_additive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + 1 as id, + 3 as new_column, + @start_dt as ds + FROM + source_table; + """ + (models_dir / "test_model.sql").write_text(initial_model) + + context = Context(paths=[tmp_path], 
config=config) + context.plan("prod", auto_apply=True, no_prompts=True) + + # Verify data loading continued to work + # The existing data should still be there and new data should be loaded + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + + assert len(updated_df) == 1 + assert "source_id" in initial_df.columns + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is still not in the table since destructive was allowed + assert "name" not in updated_df.columns + # new_column is not added since it is additive and ignored + assert "new_column" not in updated_df.columns + + context.close() + + with time_machine.travel("2023-01-10 00:00:00 UTC"): + context = Context(paths=[tmp_path], config=config) + context.run() + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(updated_df) == 2 + assert "source_id" in initial_df.columns + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is still not in the table since destructive was allowed + assert "name" not in updated_df.columns + # new_column is not added since it is additive and ignored + assert "new_column" not in updated_df.columns + + context.close() + + +def test_scd_type_2_by_column_ignore_destructive_change(tmp_path: Path): + models_dir = tmp_path / "models" + models_dir.mkdir() + data_dir = tmp_path / "data" + data_dir.mkdir() + data_filepath = data_dir / "test.duckdb" + + config = Config( + model_defaults=ModelDefaultsConfig(dialect="duckdb"), + default_connection=DuckDBConnectionConfig(database=str(data_filepath)), + ) + + # Initial model with 3 columns + initial_model = f""" + MODEL ( + name test_model, + kind SCD_TYPE_2_BY_COLUMN ( + unique_key id, + columns [name], + on_destructive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + 1 as id, + 'test_name' as name, + @start_ds as ds + FROM + source_table; + """ + + # Write initial model + (models_dir /
"test_model.sql").write_text(initial_model) + + with time_machine.travel("2023-01-08 00:00:00 UTC"): + # Create context and apply initial model + context = Context(paths=[tmp_path], config=config) + context.engine_adapter.execute("CREATE TABLE source_table (source_id INT)") + context.engine_adapter.execute("INSERT INTO source_table VALUES (1)") + + # Apply initial plan and load data + context.plan("prod", auto_apply=True, no_prompts=True) + + # Verify initial data was loaded + initial_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(initial_df) == 1 + assert "source_id" in initial_df.columns + assert "id" in initial_df.columns + assert "name" in initial_df.columns + assert "ds" in initial_df.columns + + context.close() + + # remove `name` column and add new column + initial_model = """ + MODEL ( + name test_model, + kind SCD_TYPE_2_BY_COLUMN ( + unique_key id, + columns [new_column], + on_destructive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + *, + 1 as id, + 3 as new_column, + @start_ds as ds + FROM + source_table; + """ + (models_dir / "test_model.sql").write_text(initial_model) + + context = Context(paths=[tmp_path], config=config) + context.plan("prod", auto_apply=True, no_prompts=True) + + # Verify data loading continued to work + # The existing data should still be there and new data should be loaded + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(updated_df) == 1 assert "source_id" in initial_df.columns assert "id" in updated_df.columns @@ -8102,7 +8878,7 @@ def test_incremental_unmanaged_model_ignore_destructive_change(tmp_path: Path): context.close() -def test_scd_type_2_by_time_ignore_destructive_change(tmp_path: Path): +def test_scd_type_2_by_column_ignore_additive_change(tmp_path: Path): models_dir = tmp_path / "models" models_dir.mkdir() data_dir = tmp_path / "data" @@ -8116,25 +8892,27 @@ def test_scd_type_2_by_time_ignore_destructive_change(tmp_path: Path): # 
Initial model with 3 columns initial_model = f""" - MODEL ( - name test_model, - kind SCD_TYPE_2_BY_TIME ( - unique_key id, - updated_at_name ds, - on_destructive_change ignore - ), - start '2023-01-01', - cron '@daily' - ); + MODEL ( + name test_model, + kind SCD_TYPE_2_BY_COLUMN ( + unique_key id, + columns [stable], + on_destructive_change allow, + on_additive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); - SELECT - *, - 1 as id, - 'test_name' as name, - @start_dt as ds - FROM - source_table; - """ + SELECT + *, + 1 as id, + 'test_name' as name, + 'stable' as stable, + @start_ds as ds + FROM + source_table; + """ # Write initial model (models_dir / "test_model.sql").write_text(initial_model) @@ -8160,25 +8938,27 @@ def test_scd_type_2_by_time_ignore_destructive_change(tmp_path: Path): # remove `name` column and add new column initial_model = """ - MODEL ( - name test_model, - kind SCD_TYPE_2_BY_TIME ( - unique_key id, - updated_at_name ds, - on_destructive_change ignore - ), - start '2023-01-01', - cron '@daily' - ); + MODEL ( + name test_model, + kind SCD_TYPE_2_BY_COLUMN ( + unique_key id, + columns [stable], + on_destructive_change allow, + on_additive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); - SELECT - *, - 1 as id, - 3 as new_column, - @start_dt as ds - FROM - source_table; - """ + SELECT + *, + 1 as id, + 'stable2' as stable, + 3 as new_column, + @start_ds as ds + FROM + source_table; + """ (models_dir / "test_model.sql").write_text(initial_model) context = Context(paths=[tmp_path], config=config) @@ -8192,10 +8972,10 @@ def test_scd_type_2_by_time_ignore_destructive_change(tmp_path: Path): assert "source_id" in initial_df.columns assert "id" in updated_df.columns assert "ds" in updated_df.columns - # name is still in table since destructive was ignored - assert "name" in updated_df.columns - # new_column is added since it is additive and allowed - assert "new_column" in updated_df.columns + # name is not still in table 
since destructive was allowed + assert "name" not in updated_df.columns + # new_column is not added since it is additive and ignored + assert "new_column" not in updated_df.columns context.close() @@ -8207,15 +8987,15 @@ def test_scd_type_2_by_time_ignore_destructive_change(tmp_path: Path): assert "source_id" in initial_df.columns assert "id" in updated_df.columns assert "ds" in updated_df.columns - # name is still in table since destructive was ignored - assert "name" in updated_df.columns - # new_column is added since it is additive and allowed - assert "new_column" in updated_df.columns + # name is not still in table since destructive was allowed + assert "name" not in updated_df.columns + # new_column is not added since it is additive and ignored + assert "new_column" not in updated_df.columns context.close() -def test_scd_type_2_by_column_ignore_destructive_change(tmp_path: Path): +def test_incremental_partition_ignore_destructive_change(tmp_path: Path): models_dir = tmp_path / "models" models_dir.mkdir() data_dir = tmp_path / "data" @@ -8231,11 +9011,10 @@ def test_scd_type_2_by_column_ignore_destructive_change(tmp_path: Path): initial_model = f""" MODEL ( name test_model, - kind SCD_TYPE_2_BY_COLUMN ( - unique_key id, - columns [name], + kind INCREMENTAL_BY_PARTITION ( on_destructive_change ignore ), + partitioned_by [ds], start '2023-01-01', cron '@daily' ); @@ -8275,11 +9054,10 @@ def test_scd_type_2_by_column_ignore_destructive_change(tmp_path: Path): initial_model = """ MODEL ( name test_model, - kind SCD_TYPE_2_BY_COLUMN ( - unique_key id, - columns [new_column], + kind INCREMENTAL_BY_PARTITION ( on_destructive_change ignore ), + partitioned_by [ds], start '2023-01-01', cron '@daily' ); @@ -8328,7 +9106,7 @@ def test_scd_type_2_by_column_ignore_destructive_change(tmp_path: Path): context.close() -def test_incremental_partition_ignore_destructive_change(tmp_path: Path): +def test_incremental_partition_ignore_additive_change(tmp_path: Path): models_dir =
tmp_path / "models" models_dir.mkdir() data_dir = tmp_path / "data" @@ -8345,7 +9123,8 @@ def test_incremental_partition_ignore_destructive_change(tmp_path: Path): MODEL ( name test_model, kind INCREMENTAL_BY_PARTITION ( - on_destructive_change ignore + on_destructive_change allow, + on_additive_change ignore ), partitioned_by [ds], start '2023-01-01', @@ -8388,7 +9167,8 @@ def test_incremental_partition_ignore_destructive_change(tmp_path: Path): MODEL ( name test_model, kind INCREMENTAL_BY_PARTITION ( - on_destructive_change ignore + on_destructive_change allow, + on_additive_change ignore ), partitioned_by [ds], start '2023-01-01', @@ -8416,10 +9196,10 @@ def test_incremental_partition_ignore_destructive_change(tmp_path: Path): assert "source_id" in initial_df.columns assert "id" in updated_df.columns assert "ds" in updated_df.columns - # name is still in table since destructive was ignored - assert "name" in updated_df.columns - # new_column is added since it is additive and allowed - assert "new_column" in updated_df.columns + # name is not still in table since destructive was allowed + assert "name" not in updated_df.columns + # new_column is not added since it is additive and ignored + assert "new_column" not in updated_df.columns context.close() @@ -8431,10 +9211,10 @@ def test_incremental_partition_ignore_destructive_change(tmp_path: Path): assert "source_id" in initial_df.columns assert "id" in updated_df.columns assert "ds" in updated_df.columns - # name is still in table since destructive was ignored - assert "name" in updated_df.columns - # new_column is added since it is additive and allowed - assert "new_column" in updated_df.columns + # name is not still in table since destructive was allowed + assert "name" not in updated_df.columns + # new_column is not added since it is additive and ignored + assert "new_column" not in updated_df.columns context.close() @@ -8599,3 +9379,169 @@ def 
test_incremental_by_time_model_ignore_destructive_change_unit_test(tmp_path: assert test_result.testsRun == len(test_result.successes) context.close() + + +def test_incremental_by_time_model_ignore_additive_change_unit_test(tmp_path: Path): + models_dir = tmp_path / "models" + models_dir.mkdir() + data_dir = tmp_path / "data" + data_dir.mkdir() + data_filepath = data_dir / "test.duckdb" + test_dir = tmp_path / "tests" + test_dir.mkdir() + test_filepath = test_dir / "test_test_model.yaml" + + config = Config( + model_defaults=ModelDefaultsConfig(dialect="duckdb"), + default_connection=DuckDBConnectionConfig(database=str(data_filepath)), + ) + + # Initial model with 3 columns + initial_model = f""" + MODEL ( + name test_model, + kind INCREMENTAL_BY_TIME_RANGE ( + time_column ds, + forward_only true, + on_destructive_change allow, + on_additive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + id, + name, + ds + FROM + source_table; + """ + + # Write initial model + (models_dir / "test_model.sql").write_text(initial_model) + + initial_test = f""" + +test_test_model: + model: test_model + inputs: + source_table: + - id: 1 + name: 'test_name' + ds: '2025-01-01' + outputs: + query: + - id: 1 + name: 'test_name' + ds: '2025-01-01' +""" + + # Write initial test + test_filepath.write_text(initial_test) + + with time_machine.travel("2023-01-08 00:00:00 UTC"): + # Create context and apply initial model + context = Context(paths=[tmp_path], config=config) + context.engine_adapter.execute( + "CREATE TABLE source_table (id INT, name STRING, new_column INT, ds STRING)" + ) + context.engine_adapter.execute( + "INSERT INTO source_table VALUES (1, 'test_name', NULL, '2023-01-01')" + ) + + # Apply initial plan and load data + context.plan("prod", auto_apply=True, no_prompts=True, skip_tests=True) + test_result = context.test() + + # Verify initial data was loaded + initial_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(initial_df) 
== 1 + assert "id" in initial_df.columns + assert "name" in initial_df.columns + assert "ds" in initial_df.columns + assert len(test_result.successes) == 1 + assert test_result.testsRun == len(test_result.successes) + + context.close() + + # remove `name` column and add new column + initial_model = """ + MODEL ( + name test_model, + kind INCREMENTAL_BY_TIME_RANGE ( + time_column ds, + forward_only true, + on_destructive_change allow, + on_additive_change ignore + ), + start '2023-01-01', + cron '@daily' + ); + + SELECT + id, + new_column, + ds + FROM + source_table; + """ + (models_dir / "test_model.sql").write_text(initial_model) + + # `new_column` is in the output since unit tests are based on the model definition that currently + # exists and doesn't take into account the historical changes to the table. Therefore `new_column` is + # not actually in the table but it is represented in the test + updated_test = f""" + test_test_model: + model: test_model + inputs: + source_table: + - id: 1 + new_column: 3 + ds: '2025-01-01' + outputs: + query: + - id: 1 + new_column: 3 + ds: '2025-01-01' + """ + + # Write updated test + test_filepath.write_text(updated_test) + + context = Context(paths=[tmp_path], config=config) + context.plan("prod", auto_apply=True, no_prompts=True, skip_tests=True) + test_result = context.test() + + # Verify data loading continued to work + # The existing data should still be there and new data should be loaded + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(updated_df) == 1 + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is not in table since destructive was allowed + assert "name" not in updated_df.columns + # new_column is not added since it is additive and ignored + assert "new_column" not in updated_df.columns + assert len(test_result.successes) == 1 + assert test_result.testsRun == len(test_result.successes) + + context.close() + + with time_machine.travel("2023-01-10
00:00:00 UTC"): + context = Context(paths=[tmp_path], config=config) + context.engine_adapter.execute("INSERT INTO source_table VALUES (2, NULL, 3, '2023-01-09')") + context.run() + test_result = context.test() + updated_df = context.fetchdf('SELECT * FROM "default"."test_model"') + assert len(updated_df) == 2 + assert "id" in updated_df.columns + assert "ds" in updated_df.columns + # name is still not in table since destructive was allowed + assert "name" not in updated_df.columns + # new_column is not added since it is additive and ignored + assert "new_column" not in updated_df.columns + assert len(test_result.successes) == 1 + assert test_result.testsRun == len(test_result.successes) + + context.close() diff --git a/tests/core/test_model.py b/tests/core/test_model.py index df758713a6..81b74ab32a 100644 --- a/tests/core/test_model.py +++ b/tests/core/test_model.py @@ -1814,7 +1814,8 @@ def test_render_definition(): partition_by_time_column TRUE, forward_only FALSE, disable_restatement FALSE, - on_destructive_change 'ERROR' + on_destructive_change 'ERROR', + on_additive_change 'ALLOW' ), storage_format iceberg, partitioned_by `a`, @@ -5522,7 +5523,8 @@ def test_when_matched(): batch_concurrency 1, forward_only FALSE, disable_restatement FALSE, - on_destructive_change 'ERROR' + on_destructive_change 'ERROR', + on_additive_change 'ALLOW' ) ); @@ -5574,7 +5576,8 @@ def fingerprint_merge( batch_concurrency 1, forward_only FALSE, disable_restatement FALSE, - on_destructive_change 'ERROR' + on_destructive_change 'ERROR', + on_additive_change 'ALLOW' ) ); @@ -7466,6 +7469,46 @@ def test_forward_only_on_destructive_change_config() -> None: assert context_model.on_destructive_change.is_allow +def test_model_meta_on_additive_change_property() -> None: + """Test that ModelMeta has on_additive_change property that works like on_destructive_change.""" + from sqlmesh.core.model.kind import IncrementalByTimeRangeKind, OnAdditiveChange + from sqlmesh.core.model.meta import 
ModelMeta + + # Test incremental model with on_additive_change=ERROR + incremental_kind = IncrementalByTimeRangeKind( + time_column="c", + forward_only=True, + on_additive_change=OnAdditiveChange.ERROR, + ) + model_meta = ModelMeta(name="test_model", kind=incremental_kind) + assert model_meta.on_additive_change == OnAdditiveChange.ERROR + + # Test incremental model with on_additive_change=WARN + incremental_kind = IncrementalByTimeRangeKind( + time_column="c", + forward_only=True, + on_additive_change=OnAdditiveChange.WARN, + ) + model_meta = ModelMeta(name="test_model", kind=incremental_kind) + assert model_meta.on_additive_change == OnAdditiveChange.WARN + + # Test incremental model with default on_additive_change (should be ALLOW) + incremental_kind = IncrementalByTimeRangeKind( + time_column="c", + forward_only=True, + ) + model_meta = ModelMeta(name="test_model", kind=incremental_kind) + assert model_meta.on_additive_change == OnAdditiveChange.ALLOW + + incremental_kind = IncrementalByTimeRangeKind( + time_column="c", + forward_only=True, + on_additive_change=OnAdditiveChange.IGNORE, + ) + model_meta = ModelMeta(name="test_model", kind=incremental_kind) + assert model_meta.on_additive_change == OnAdditiveChange.IGNORE + + def test_incremental_by_partition(sushi_context, assert_exp_eq): expressions = d.parse( """ @@ -7784,7 +7827,8 @@ def test_model_kind_to_expression(): partition_by_time_column TRUE, forward_only FALSE, disable_restatement FALSE, -on_destructive_change 'ERROR' +on_destructive_change 'ERROR', +on_additive_change 'ALLOW' )""" ) @@ -7818,7 +7862,8 @@ def test_model_kind_to_expression(): lookback 3, forward_only TRUE, disable_restatement TRUE, -on_destructive_change 'WARN' +on_destructive_change 'WARN', +on_additive_change 'ALLOW' )""" ) @@ -7843,7 +7888,8 @@ def test_model_kind_to_expression(): batch_concurrency 1, forward_only FALSE, disable_restatement FALSE, -on_destructive_change 'ERROR' +on_destructive_change 'ERROR', +on_additive_change 
'ALLOW' )""" ) @@ -7870,7 +7916,8 @@ def test_model_kind_to_expression(): batch_concurrency 1, forward_only FALSE, disable_restatement FALSE, -on_destructive_change 'ERROR' +on_destructive_change 'ERROR', +on_additive_change 'ALLOW' )""" ) @@ -7898,7 +7945,8 @@ def test_model_kind_to_expression(): batch_concurrency 1, forward_only FALSE, disable_restatement FALSE, -on_destructive_change 'ERROR' +on_destructive_change 'ERROR', +on_additive_change 'ALLOW' )""" ) @@ -7920,7 +7968,8 @@ def test_model_kind_to_expression(): == """INCREMENTAL_BY_PARTITION ( forward_only TRUE, disable_restatement FALSE, -on_destructive_change 'ERROR' +on_destructive_change 'ERROR', +on_additive_change 'ALLOW' )""" ) @@ -7972,7 +8021,8 @@ def test_model_kind_to_expression(): time_data_type TIMESTAMP, forward_only TRUE, disable_restatement TRUE, -on_destructive_change 'ERROR' +on_destructive_change 'ERROR', +on_additive_change 'ALLOW' )""" ) @@ -8003,7 +8053,8 @@ def test_model_kind_to_expression(): time_data_type TIMESTAMP, forward_only TRUE, disable_restatement TRUE, -on_destructive_change 'ERROR' +on_destructive_change 'ERROR', +on_additive_change 'ALLOW' )""" ) @@ -8034,7 +8085,8 @@ def test_model_kind_to_expression(): time_data_type TIMESTAMP, forward_only TRUE, disable_restatement TRUE, -on_destructive_change 'ERROR' +on_destructive_change 'ERROR', +on_additive_change 'ALLOW' )""" ) @@ -8215,7 +8267,8 @@ def test_merge_filter(): batch_concurrency 1, forward_only FALSE, disable_restatement FALSE, - on_destructive_change 'ERROR' + on_destructive_change 'ERROR', + on_additive_change 'ALLOW' ) ); @@ -8942,6 +8995,7 @@ def test_auto_restatement(): forward_only FALSE, disable_restatement FALSE, on_destructive_change 'ERROR', + on_additive_change 'ALLOW', auto_restatement_cron '@daily' )""" ) @@ -8971,6 +9025,7 @@ def test_auto_restatement(): forward_only FALSE, disable_restatement FALSE, on_destructive_change 'ERROR', + on_additive_change 'ALLOW', auto_restatement_cron '@daily' )""" ) diff 
--git a/tests/core/test_plan.py b/tests/core/test_plan.py index 7254a924b1..c9c19376d9 100644 --- a/tests/core/test_plan.py +++ b/tests/core/test_plan.py @@ -4,6 +4,8 @@ from unittest.mock import patch import pytest + +from sqlmesh.core.console import TerminalConsole from sqlmesh.utils.metaprogramming import Executable from tests.core.test_table_diff import create_test_console import time_machine @@ -24,7 +26,7 @@ SqlModel, ModelKindName, ) -from sqlmesh.core.model.kind import OnDestructiveChange +from sqlmesh.core.model.kind import OnDestructiveChange, OnAdditiveChange from sqlmesh.core.model.seed import Seed from sqlmesh.core.plan import Plan, PlanBuilder, SnapshotIntervals from sqlmesh.core.snapshot import ( @@ -480,6 +482,75 @@ def test_forward_only_plan_allow_destructive_models( assert mock_logger.call_count == 0 +def test_forward_only_plan_allow_additive_models( + mocker, make_snapshot, make_snapshot_on_additive_change +): + # forward-only model, not forward-only plan + snapshot_a_old, snapshot_a = make_snapshot_on_additive_change() + + context_diff_a = ContextDiff( + environment="prod", + is_new_environment=False, + is_unfinalized_environment=False, + normalize_environment_name=True, + create_from="prod", + create_from_env_exists=True, + added=set(), + removed_snapshots={}, + modified_snapshots={snapshot_a.name: (snapshot_a, snapshot_a_old)}, + snapshots={snapshot_a.snapshot_id: snapshot_a, snapshot_a_old.snapshot_id: snapshot_a_old}, + new_snapshots={snapshot_a.snapshot_id: snapshot_a}, + previous_plan_id=None, + previously_promoted_snapshot_ids=set(), + previous_finalized_snapshots=None, + previous_gateway_managed_virtual_layer=False, + gateway_managed_virtual_layer=False, + environment_statements=[], + ) + + with pytest.raises(PlanError, match="Plan requires an additive change to a forward-only model"): + PlanBuilder(context_diff_a, forward_only=False).build() + + console = TerminalConsole() + log_warning_spy = mocker.spy(console, "log_warning") + assert 
PlanBuilder( + context_diff_a, forward_only=False, allow_additive_models=['"a"'], console=console + ).build() + assert log_warning_spy.call_count == 0 + + snapshot_a_old, snapshot_a = make_snapshot_on_additive_change( + on_additive_change=OnAdditiveChange.WARN + ) + + context_diff_a = ContextDiff( + environment="prod", + is_new_environment=False, + is_unfinalized_environment=False, + normalize_environment_name=True, + create_from="prod", + create_from_env_exists=True, + added=set(), + removed_snapshots={}, + modified_snapshots={snapshot_a.name: (snapshot_a, snapshot_a_old)}, + snapshots={snapshot_a.snapshot_id: snapshot_a, snapshot_a_old.snapshot_id: snapshot_a_old}, + new_snapshots={snapshot_a.snapshot_id: snapshot_a}, + previous_plan_id=None, + previously_promoted_snapshot_ids=set(), + previous_finalized_snapshots=None, + previous_gateway_managed_virtual_layer=False, + gateway_managed_virtual_layer=False, + environment_statements=[], + ) + + log_warning_spy.reset_mock() + assert PlanBuilder(context_diff_a, forward_only=False, console=console).build() + log_warning_spy.assert_called_once_with(""" +Plan requires additive change to forward-only model '"a"'s schema that adds column 'three'. 
+ +Schema changes: + ALTER TABLE "a" ADD COLUMN three TEXT""") + + def test_forward_only_model_on_destructive_change( make_snapshot, make_snapshot_on_destructive_change ): @@ -748,6 +819,7 @@ def test_missing_intervals_lookback(make_snapshot, mocker: MockerFixture): no_gaps=False, forward_only=False, allow_destructive_models=set(), + allow_additive_models=set(), include_unmodified=True, environment_naming_info=EnvironmentNamingInfo(), directly_modified={snapshot_a.snapshot_id}, @@ -3276,6 +3348,645 @@ def _build_plan() -> Plan: assert to_datetime(plan.execution_time) == to_datetime(output_execution_time) +def test_plan_builder_additive_change_error_blocks_plan(make_snapshot): + """Test that additive changes block plan when on_additive_change=ERROR.""" + # Create models with actual schema differences + # Use explicit column schemas in CTE so columns_to_types can be determined + old_model = SqlModel( + name="test_model", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, '2022-01-01'::DATE as ds + ) + select id, name, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_additive_change=OnAdditiveChange.ERROR, + ), + ) + + # New model with additional column (additive change) + new_model = SqlModel( + name="test_model", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, 'email@test.com'::VARCHAR as email, '2022-01-01'::DATE as ds + ) + select id, name, email, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_additive_change=OnAdditiveChange.ERROR, + ), + ) + + old_snapshot = make_snapshot(old_model) + new_snapshot = make_snapshot(new_model) + + # Set previous versions to simulate a modification + new_snapshot.previous_versions = ( + SnapshotDataVersion( + fingerprint=SnapshotFingerprint( + data_hash="old_data_hash", + metadata_hash="old_metadata_hash", 
+ ), + version="old_version", + change_category=SnapshotChangeCategory.FORWARD_ONLY, + dev_table_suffix="dev", + ), + ) + + context_diff = ContextDiff( + environment="prod", + is_new_environment=False, + is_unfinalized_environment=False, + normalize_environment_name=True, + create_from="prod", + create_from_env_exists=True, + added=set(), + removed_snapshots={}, + modified_snapshots={old_snapshot.name: (new_snapshot, old_snapshot)}, + snapshots={ + old_snapshot.snapshot_id: old_snapshot, + new_snapshot.snapshot_id: new_snapshot, + }, + new_snapshots={new_snapshot.snapshot_id: new_snapshot}, + previous_plan_id=None, + previously_promoted_snapshot_ids=set(), + previous_finalized_snapshots=None, + previous_gateway_managed_virtual_layer=False, + gateway_managed_virtual_layer=False, + environment_statements=[], + ) + + builder = PlanBuilder(context_diff, forward_only=True) + + # Should raise PlanError for additive changes when on_additive_change=ERROR + with pytest.raises(PlanError, match="additive change"): + builder.build() + + +def test_plan_builder_additive_change_warn_allows_plan(make_snapshot): + """Test that additive changes allow plan with warning when on_additive_change=WARN.""" + old_model = SqlModel( + name="test_model", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, '2022-01-01'::DATE as ds + ) + select id, name, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_additive_change=OnAdditiveChange.WARN, + ), + ) + + new_model = SqlModel( + name="test_model", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, 'email@test.com'::VARCHAR as email, '2022-01-01'::DATE as ds + ) + select id, name, email, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_additive_change=OnAdditiveChange.WARN, + ), + ) + + old_snapshot = make_snapshot(old_model) 
+ new_snapshot = make_snapshot(new_model) + + # Set previous versions to simulate a modification + new_snapshot.previous_versions = ( + SnapshotDataVersion( + fingerprint=SnapshotFingerprint( + data_hash="old_data_hash", + metadata_hash="old_metadata_hash", + ), + version="old_version", + change_category=SnapshotChangeCategory.FORWARD_ONLY, + dev_table_suffix="dev", + ), + ) + + context_diff = ContextDiff( + environment="prod", + is_new_environment=False, + is_unfinalized_environment=False, + normalize_environment_name=True, + create_from="prod", + create_from_env_exists=True, + added=set(), + removed_snapshots={}, + modified_snapshots={old_snapshot.name: (new_snapshot, old_snapshot)}, + snapshots={ + old_snapshot.snapshot_id: old_snapshot, + new_snapshot.snapshot_id: new_snapshot, + }, + new_snapshots={new_snapshot.snapshot_id: new_snapshot}, + previous_plan_id=None, + previously_promoted_snapshot_ids=set(), + previous_finalized_snapshots=None, + previous_gateway_managed_virtual_layer=False, + gateway_managed_virtual_layer=False, + environment_statements=[], + ) + + builder = PlanBuilder(context_diff, forward_only=True) + + # Should log warning but not fail + with patch.object(builder._console, "log_additive_change") as mock_log_additive: + plan = builder.build() + assert plan is not None + mock_log_additive.assert_called() # Should have logged an additive change + + +def test_plan_builder_additive_change_allow_permits_plan(make_snapshot): + """Test that additive changes are permitted when on_additive_change=ALLOW.""" + old_model = SqlModel( + name="test_model", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, '2022-01-01'::DATE as ds + ) + select id, name, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_additive_change=OnAdditiveChange.ALLOW, + ), + ) + + new_model = SqlModel( + name="test_model", + dialect="duckdb", + query=parse_one(""" + with 
source as ( + select 1::INT as id, 'test'::VARCHAR as name, 'email@test.com'::VARCHAR as email, '2022-01-01'::DATE as ds + ) + select id, name, email, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_additive_change=OnAdditiveChange.ALLOW, + ), + ) + + old_snapshot = make_snapshot(old_model) + new_snapshot = make_snapshot(new_model) + + # Set previous versions to simulate a modification + new_snapshot.previous_versions = ( + SnapshotDataVersion( + fingerprint=SnapshotFingerprint( + data_hash="old_data_hash", + metadata_hash="old_metadata_hash", + ), + version="old_version", + change_category=SnapshotChangeCategory.FORWARD_ONLY, + dev_table_suffix="dev", + ), + ) + + context_diff = ContextDiff( + environment="prod", + is_new_environment=False, + is_unfinalized_environment=False, + normalize_environment_name=True, + create_from="prod", + create_from_env_exists=True, + added=set(), + removed_snapshots={}, + modified_snapshots={old_snapshot.name: (new_snapshot, old_snapshot)}, + snapshots={ + old_snapshot.snapshot_id: old_snapshot, + new_snapshot.snapshot_id: new_snapshot, + }, + new_snapshots={new_snapshot.snapshot_id: new_snapshot}, + previous_plan_id=None, + previously_promoted_snapshot_ids=set(), + previous_finalized_snapshots=None, + previous_gateway_managed_virtual_layer=False, + gateway_managed_virtual_layer=False, + environment_statements=[], + ) + + builder = PlanBuilder(context_diff, forward_only=True) + + # Should build plan without issues + plan = builder.build() + assert plan is not None + + +def test_plan_builder_additive_change_ignore_skips_validation(make_snapshot): + """Test that additive changes are ignored when on_additive_change=IGNORE.""" + old_model = SqlModel( + name="test_model", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, '2022-01-01'::DATE as ds + ) + select id, name, ds from source + """), + kind=IncrementalByTimeRangeKind( 
+ time_column="ds", + forward_only=True, + on_additive_change=OnAdditiveChange.IGNORE, + ), + ) + + new_model = SqlModel( + name="test_model", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, 'email@test.com'::VARCHAR as email, '2022-01-01'::DATE as ds + ) + select id, name, email, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_additive_change=OnAdditiveChange.IGNORE, + ), + ) + + old_snapshot = make_snapshot(old_model) + new_snapshot = make_snapshot(new_model) + + # Set previous versions to simulate a modification + new_snapshot.previous_versions = ( + SnapshotDataVersion( + fingerprint=SnapshotFingerprint( + data_hash="old_data_hash", + metadata_hash="old_metadata_hash", + ), + version="old_version", + change_category=SnapshotChangeCategory.FORWARD_ONLY, + dev_table_suffix="dev", + ), + ) + + context_diff = ContextDiff( + environment="prod", + is_new_environment=False, + is_unfinalized_environment=False, + normalize_environment_name=True, + create_from="prod", + create_from_env_exists=True, + added=set(), + removed_snapshots={}, + modified_snapshots={old_snapshot.name: (new_snapshot, old_snapshot)}, + snapshots={ + old_snapshot.snapshot_id: old_snapshot, + new_snapshot.snapshot_id: new_snapshot, + }, + new_snapshots={new_snapshot.snapshot_id: new_snapshot}, + previous_plan_id=None, + previously_promoted_snapshot_ids=set(), + previous_finalized_snapshots=None, + previous_gateway_managed_virtual_layer=False, + gateway_managed_virtual_layer=False, + environment_statements=[], + ) + + builder = PlanBuilder(context_diff, forward_only=True) + + # Should build plan without any validation + with patch("sqlmesh.core.plan.builder.logger.warning") as mock_warning: + plan = builder.build() + assert plan is not None + mock_warning.assert_not_called() # Should not log any warnings + + +def test_plan_builder_mixed_destructive_and_additive_changes(make_snapshot): 
+ """Test scenarios with both destructive and additive changes.""" + # Test case: on_destructive_change=IGNORE, on_additive_change=ERROR + # Should ignore destructive changes but error on additive changes + old_model = SqlModel( + name="test_model", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, 'old_value'::VARCHAR as old_col, '2022-01-01'::DATE as ds + ) + select id, name, old_col, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.ERROR, + ), + ) + + new_model = SqlModel( + name="test_model", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, 'new_value'::VARCHAR as new_col, '2022-01-01'::DATE as ds + ) + select id, name, new_col, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.ERROR, + ), + ) + + old_snapshot = make_snapshot(old_model) + new_snapshot = make_snapshot(new_model) + + # Set previous versions to simulate a modification + new_snapshot.previous_versions = ( + SnapshotDataVersion( + fingerprint=SnapshotFingerprint( + data_hash="old_data_hash", + metadata_hash="old_metadata_hash", + ), + version="old_version", + change_category=SnapshotChangeCategory.FORWARD_ONLY, + dev_table_suffix="dev", + ), + ) + + context_diff = ContextDiff( + environment="prod", + is_new_environment=False, + is_unfinalized_environment=False, + normalize_environment_name=True, + create_from="prod", + create_from_env_exists=True, + added=set(), + removed_snapshots={}, + modified_snapshots={old_snapshot.name: (new_snapshot, old_snapshot)}, + snapshots={ + old_snapshot.snapshot_id: old_snapshot, + new_snapshot.snapshot_id: new_snapshot, + }, + new_snapshots={new_snapshot.snapshot_id: 
new_snapshot}, + previous_plan_id=None, + previously_promoted_snapshot_ids=set(), + previous_finalized_snapshots=None, + previous_gateway_managed_virtual_layer=False, + gateway_managed_virtual_layer=False, + environment_statements=[], + ) + + builder = PlanBuilder(context_diff, forward_only=True) + + # Should error on additive change (new_col), but ignore destructive change (old_col removal) + with pytest.raises(PlanError, match="additive change"): + builder.build() + + +def test_plan_builder_allow_additive_models_flag(make_snapshot): + """Test that --allow-additive-model flag overrides on_additive_change=ERROR.""" + old_model = SqlModel( + name="test_model", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, '2022-01-01'::DATE as ds + ) + select id, name, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_additive_change=OnAdditiveChange.ERROR, + ), + ) + + # New model with additional column (additive change) + new_model = SqlModel( + name="test_model", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, 'email@test.com'::VARCHAR as email, '2022-01-01'::DATE as ds + ) + select id, name, email, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_additive_change=OnAdditiveChange.ERROR, + ), + ) + + old_snapshot = make_snapshot(old_model) + new_snapshot = make_snapshot(new_model) + + # Set previous versions to simulate a modification + new_snapshot.previous_versions = ( + SnapshotDataVersion( + fingerprint=SnapshotFingerprint( + data_hash="old_data_hash", + metadata_hash="old_metadata_hash", + ), + version="old_version", + change_category=SnapshotChangeCategory.FORWARD_ONLY, + dev_table_suffix="dev", + ), + ) + + context_diff = ContextDiff( + environment="prod", + is_new_environment=False, + is_unfinalized_environment=False, + 
normalize_environment_name=True, + create_from="prod", + create_from_env_exists=True, + added=set(), + removed_snapshots={}, + modified_snapshots={new_snapshot.name: (new_snapshot, old_snapshot)}, + snapshots={new_snapshot.snapshot_id: new_snapshot}, + new_snapshots={new_snapshot.snapshot_id: new_snapshot}, + previous_plan_id=None, + previously_promoted_snapshot_ids=set(), + previous_finalized_snapshots=None, + previous_gateway_managed_virtual_layer=False, + gateway_managed_virtual_layer=False, + environment_statements=[], + ) + + # First, verify that without the flag, the plan fails with additive change error + builder = PlanBuilder(context_diff, forward_only=True) + with pytest.raises(PlanError, match="additive change"): + builder.build() + + # Now test that the --allow-additive-model flag allows the plan to succeed + builder_with_flag = PlanBuilder( + context_diff, + forward_only=True, + allow_additive_models={'"test_model"'}, + ) + + # Should succeed without raising an exception + plan = builder_with_flag.build() + assert plan is not None + + +def test_plan_builder_allow_additive_models_pattern_matching(make_snapshot): + """Test that --allow-additive-model flag supports pattern matching like destructive models.""" + # Create two models with additive changes + old_model_1 = SqlModel( + name="test.model_1", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, '2022-01-01'::DATE as ds + ) + select id, name, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_additive_change=OnAdditiveChange.ERROR, + ), + ) + + new_model_1 = SqlModel( + name="test.model_1", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, 'email@test.com'::VARCHAR as email, '2022-01-01'::DATE as ds + ) + select id, name, email, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + 
on_additive_change=OnAdditiveChange.ERROR, + ), + ) + + old_model_2 = SqlModel( + name="other.model_2", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, '2022-01-01'::DATE as ds + ) + select id, name, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_additive_change=OnAdditiveChange.ERROR, + ), + ) + + new_model_2 = SqlModel( + name="other.model_2", + dialect="duckdb", + query=parse_one(""" + with source as ( + select 1::INT as id, 'test'::VARCHAR as name, 'phone'::VARCHAR as phone, '2022-01-01'::DATE as ds + ) + select id, name, phone, ds from source + """), + kind=IncrementalByTimeRangeKind( + time_column="ds", + forward_only=True, + on_additive_change=OnAdditiveChange.ERROR, + ), + ) + + old_snapshot_1 = make_snapshot(old_model_1) + new_snapshot_1 = make_snapshot(new_model_1) + old_snapshot_2 = make_snapshot(old_model_2) + new_snapshot_2 = make_snapshot(new_model_2) + + # Set previous versions to simulate modifications + for new_snapshot in [new_snapshot_1, new_snapshot_2]: + new_snapshot.previous_versions = ( + SnapshotDataVersion( + fingerprint=SnapshotFingerprint( + data_hash="old_data_hash", + metadata_hash="old_metadata_hash", + ), + version="old_version", + change_category=SnapshotChangeCategory.FORWARD_ONLY, + dev_table_suffix="dev", + ), + ) + + context_diff = ContextDiff( + environment="prod", + is_new_environment=False, + is_unfinalized_environment=False, + normalize_environment_name=True, + create_from="prod", + create_from_env_exists=True, + added=set(), + removed_snapshots={}, + modified_snapshots={ + new_snapshot_1.name: (new_snapshot_1, old_snapshot_1), + new_snapshot_2.name: (new_snapshot_2, old_snapshot_2), + }, + snapshots={ + new_snapshot_1.snapshot_id: new_snapshot_1, + new_snapshot_2.snapshot_id: new_snapshot_2, + }, + new_snapshots={ + new_snapshot_1.snapshot_id: new_snapshot_1, + new_snapshot_2.snapshot_id: new_snapshot_2, + }, 
+ previous_plan_id=None, + previously_promoted_snapshot_ids=set(), + previous_finalized_snapshots=None, + previous_gateway_managed_virtual_layer=False, + gateway_managed_virtual_layer=False, + environment_statements=[], + ) + + # Test pattern matching: allow only models in "test" schema + # In real usage, patterns would be expanded by Context.expand_model_selections + # Here we simulate what the expansion would produce + builder_with_pattern = PlanBuilder( + context_diff, + forward_only=True, + allow_additive_models={'"test"."model_1"'}, # Only allow test.model_1, not other.model_2 + ) + + # Should still fail because other.model_2 is not allowed + with pytest.raises(PlanError, match="additive change"): + builder_with_pattern.build() + + # Test allowing both patterns + builder_with_both = PlanBuilder( + context_diff, + forward_only=True, + allow_additive_models={'"test"."model_1"', '"other"."model_2"'}, # Allow both models + ) + + # Should succeed + plan = builder_with_both.build() + assert plan is not None + + def test_environment_statements_change_allows_dev_environment_creation(make_snapshot): snapshot = make_snapshot( SqlModel( diff --git a/tests/core/test_plan_stages.py b/tests/core/test_plan_stages.py index 7b172caf6a..744c7d18bf 100644 --- a/tests/core/test_plan_stages.py +++ b/tests/core/test_plan_stages.py @@ -105,6 +105,7 @@ def test_build_plan_stages_basic( restatements={}, is_dev=False, allow_destructive_models=set(), + allow_additive_models=set(), forward_only=False, end_bounded=False, ensure_finalized_snapshots=False, @@ -213,6 +214,7 @@ def test_build_plan_stages_with_before_all_and_after_all( restatements={}, is_dev=False, allow_destructive_models=set(), + allow_additive_models=set(), forward_only=False, end_bounded=False, ensure_finalized_snapshots=False, @@ -322,6 +324,7 @@ def test_build_plan_stages_select_models( restatements={}, is_dev=False, allow_destructive_models=set(), + allow_additive_models=set(), forward_only=False, end_bounded=False, 
         ensure_finalized_snapshots=False,
@@ -422,6 +425,7 @@ def test_build_plan_stages_basic_no_backfill(
         restatements={},
         is_dev=False,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -541,6 +545,7 @@ def test_build_plan_stages_restatement(
         },
         is_dev=False,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -651,6 +656,7 @@ def test_build_plan_stages_forward_only(
         restatements={},
         is_dev=False,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -778,6 +784,7 @@ def test_build_plan_stages_forward_only_dev(
         restatements={},
         is_dev=True,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -902,6 +909,7 @@ def _get_snapshots(snapshot_ids: t.List[SnapshotId]) -> t.Dict[SnapshotId, Snaps
         restatements={},
         is_dev=True,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -1038,6 +1046,7 @@ def test_build_plan_stages_forward_only_ensure_finalized_snapshots(
         restatements={},
         is_dev=False,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=True,
@@ -1113,6 +1122,7 @@ def test_build_plan_stages_removed_model(
         restatements={},
         is_dev=False,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -1195,6 +1205,7 @@ def test_build_plan_stages_environment_suffix_target_changed(
         restatements={},
         is_dev=True,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -1293,6 +1304,7 @@ def test_build_plan_stages_indirect_non_breaking_view_migration(
         restatements={},
         is_dev=False,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -1378,6 +1390,7 @@ def test_build_plan_stages_virtual_environment_mode_filtering(
         restatements={},
         is_dev=True,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -1430,6 +1443,7 @@ def test_build_plan_stages_virtual_environment_mode_filtering(
         restatements={},
         is_dev=False,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -1492,6 +1506,7 @@ def test_build_plan_stages_virtual_environment_mode_filtering(
         restatements={},
         is_dev=False,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -1565,6 +1580,7 @@ def test_build_plan_stages_virtual_environment_mode_no_updates(
         restatements={},
         is_dev=False,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -1628,6 +1644,7 @@ def test_adjust_intervals_new_forward_only_dev_intervals(
         restatements={},
         is_dev=True,  # Dev environment
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -1695,6 +1712,7 @@ def test_adjust_intervals_restatement_removal(
         restatements=restatements,
         is_dev=False,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
         ensure_finalized_snapshots=False,
@@ -1787,6 +1805,7 @@ def test_adjust_intervals_should_force_rebuild(make_snapshot, mocker: MockerFixt
         restatements={},
         is_dev=False,
         allow_destructive_models=set(),
+        allow_additive_models=set(),
         forward_only=False,
         end_bounded=False,
ensure_finalized_snapshots=False, diff --git a/tests/core/test_schema_diff.py b/tests/core/test_schema_diff.py index fd14b0b9b3..916bead3e6 100644 --- a/tests/core/test_schema_diff.py +++ b/tests/core/test_schema_diff.py @@ -10,11 +10,14 @@ TableAlterColumnPosition, TableAlterOperation, get_schema_differ, + TableAlterAddColumnOperation, + TableAlterDropColumnOperation, + TableAlterChangeColumnTypeOperation, ) def test_schema_diff_calculate(): - alter_expressions = SchemaDiffer( + alter_operations = SchemaDiffer( **{ "support_positional_add": False, "support_nested_operations": False, @@ -24,7 +27,7 @@ def test_schema_diff_calculate(): }, } ).compare_columns( - "apply_to_table", + exp.to_table("apply_to_table"), { "id": exp.DataType.build("INT"), "name": exp.DataType.build("STRING"), @@ -39,7 +42,7 @@ def test_schema_diff_calculate(): }, ) - assert [x.sql() for x in alter_expressions] == [ + assert [x.expression.sql() for x in alter_operations] == [ """ALTER TABLE apply_to_table DROP COLUMN price""", """ALTER TABLE apply_to_table ADD COLUMN new_column DOUBLE""", """ALTER TABLE apply_to_table ALTER COLUMN name SET DATA TYPE INT""", @@ -55,7 +58,7 @@ def test_schema_diff_drop_cascade(): "drop_cascade": True, } ).compare_columns( - "apply_to_table", + exp.to_table("apply_to_table"), { "id": exp.DataType.build("INT"), "name": exp.DataType.build("STRING"), @@ -67,7 +70,7 @@ def test_schema_diff_drop_cascade(): }, ) - assert [x.sql() for x in alter_expressions] == [ + assert [x.expression.sql() for x in alter_expressions] == [ """ALTER TABLE apply_to_table DROP COLUMN price CASCADE""" ] @@ -83,7 +86,7 @@ def test_schema_diff_calculate_type_transitions(): }, } ).compare_columns( - "apply_to_table", + exp.to_table("apply_to_table"), { "id": exp.DataType.build("INT"), "ds": exp.DataType.build("STRING"), @@ -94,7 +97,7 @@ def test_schema_diff_calculate_type_transitions(): }, ) - assert [x.sql() for x in alter_expressions] == [ + assert [x.expression.sql() for x in 
alter_expressions] == [ """ALTER TABLE apply_to_table DROP COLUMN id""", """ALTER TABLE apply_to_table ADD COLUMN id BIGINT""", """ALTER TABLE apply_to_table ALTER COLUMN ds SET DATA TYPE INT""", @@ -114,10 +117,14 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT", "STRUCT", [ - TableAlterOperation.add( - TableAlterColumn.primitive("address"), - "STRING", - "STRUCT", + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("address")], + column_type=exp.DataType.build("STRING"), + expected_table_struct=exp.DataType.build( + "STRUCT" + ), + array_element_selector="", ) ], {}, @@ -127,10 +134,14 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT", "STRUCT
", [ - TableAlterOperation.add( - TableAlterColumn.primitive("address"), - "STRING", - expected_table_struct="STRUCT
", + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("address")], + column_type=exp.DataType.build("STRING"), + expected_table_struct=exp.DataType.build( + "STRUCT
" + ), + array_element_selector="", position=TableAlterColumnPosition.first(), ) ], @@ -141,10 +152,14 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT", "STRUCT", [ - TableAlterOperation.add( - TableAlterColumn.primitive("address"), - "STRING", - expected_table_struct="STRUCT", + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("address")], + column_type=exp.DataType.build("STRING"), + expected_table_struct=exp.DataType.build( + "STRUCT" + ), + array_element_selector="", position=TableAlterColumnPosition.middle(after="id"), ) ], @@ -155,22 +170,34 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT", "STRUCT
", [ - TableAlterOperation.add( - TableAlterColumn.primitive("address"), - "STRING", - expected_table_struct="STRUCT
", + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("address")], + column_type=exp.DataType.build("STRING"), + expected_table_struct=exp.DataType.build( + "STRUCT
" + ), + array_element_selector="", position=TableAlterColumnPosition.first(), ), - TableAlterOperation.add( - TableAlterColumn.primitive("address2"), - "STRING", - expected_table_struct="STRUCT
", + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("address2")], + column_type=exp.DataType.build("STRING"), + expected_table_struct=exp.DataType.build( + "STRUCT
" + ), + array_element_selector="", position=TableAlterColumnPosition.middle(after="id"), ), - TableAlterOperation.add( - TableAlterColumn.primitive("address3"), - "STRING", - expected_table_struct="STRUCT
", + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("address3")], + column_type=exp.DataType.build("STRING"), + expected_table_struct=exp.DataType.build( + "STRUCT
" + ), + array_element_selector="", position=TableAlterColumnPosition.last(after="age"), ), ], @@ -181,16 +208,24 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT", "STRUCT
", [ - TableAlterOperation.add( - TableAlterColumn.primitive("address"), - "STRING", - expected_table_struct="STRUCT
", + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("address")], + column_type=exp.DataType.build("STRING"), + expected_table_struct=exp.DataType.build( + "STRUCT
" + ), + array_element_selector="", position=TableAlterColumnPosition.first(), ), - TableAlterOperation.add( - TableAlterColumn.primitive("address2"), - "STRING", - expected_table_struct="STRUCT
", + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("address2")], + column_type=exp.DataType.build("STRING"), + expected_table_struct=exp.DataType.build( + "STRUCT
" + ), + array_element_selector="", position=TableAlterColumnPosition.middle(after="address"), ), ], @@ -204,10 +239,11 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT", "STRUCT", [ - TableAlterOperation.drop( - TableAlterColumn.primitive("id"), - "STRUCT", - "INT", + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("id")], + expected_table_struct=exp.DataType.build("STRUCT"), + array_element_selector="", ) ], {}, @@ -217,10 +253,11 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT", "STRUCT", [ - TableAlterOperation.drop( - TableAlterColumn.primitive("name"), - "STRUCT", - "STRING", + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("name")], + expected_table_struct=exp.DataType.build("STRUCT"), + array_element_selector="", ) ], {}, @@ -230,10 +267,11 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT", "STRUCT", [ - TableAlterOperation.drop( - TableAlterColumn.primitive("age"), - "STRUCT", - "INT", + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("age")], + expected_table_struct=exp.DataType.build("STRUCT"), + array_element_selector="", ) ], {}, @@ -243,20 +281,27 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT", "STRUCT", [ - TableAlterOperation.drop( - TableAlterColumn.primitive("id"), - "STRUCT", - "INT", - ), - TableAlterOperation.drop( - TableAlterColumn.primitive("middle"), - "STRUCT", - "STRING", - ), - TableAlterOperation.drop( - TableAlterColumn.primitive("age"), - "STRUCT", - "INT", + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("id")], + expected_table_struct=exp.DataType.build( + "STRUCT" + ), + array_element_selector="", + ), + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + 
column_parts=[TableAlterColumn.primitive("middle")], + expected_table_struct=exp.DataType.build( + "STRUCT" + ), + array_element_selector="", + ), + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("age")], + expected_table_struct=exp.DataType.build("STRUCT"), + array_element_selector="", ), ], {}, @@ -266,15 +311,21 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT
", "STRUCT", [ - TableAlterOperation.drop( - TableAlterColumn.primitive("address"), - "STRUCT", - "STRING", - ), - TableAlterOperation.drop( - TableAlterColumn.primitive("address2"), - "STRUCT", - "STRING", + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("address")], + expected_table_struct=exp.DataType.build( + "STRUCT" + ), + array_element_selector="", + ), + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("address2")], + expected_table_struct=exp.DataType.build( + "STRUCT" + ), + array_element_selector="", ), ], {}, @@ -296,11 +347,15 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT", "STRUCT", [ - TableAlterOperation.alter_type( - TableAlterColumn.primitive("id"), - "STRING", - current_type="INT", - expected_table_struct="STRUCT", + TableAlterChangeColumnTypeOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("id")], + column_type=exp.DataType.build("STRING"), + current_type=exp.DataType.build("INT"), + expected_table_struct=exp.DataType.build( + "STRUCT" + ), + array_element_selector="", ) ], dict( @@ -317,21 +372,30 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT", "STRUCT", [ - TableAlterOperation.drop( - TableAlterColumn.primitive("name"), - "STRUCT", - "STRING", - ), - TableAlterOperation.add( - TableAlterColumn.primitive("address"), - "STRING", - expected_table_struct="STRUCT", - ), - TableAlterOperation.alter_type( - TableAlterColumn.primitive("id"), - "STRING", - current_type="INT", - expected_table_struct="STRUCT", + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("name")], + expected_table_struct=exp.DataType.build("STRUCT"), + array_element_selector="", + ), + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + 
column_parts=[TableAlterColumn.primitive("address")], + column_type=exp.DataType.build("STRING"), + expected_table_struct=exp.DataType.build( + "STRUCT" + ), + array_element_selector="", + ), + TableAlterChangeColumnTypeOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[TableAlterColumn.primitive("id")], + column_type=exp.DataType.build("STRING"), + current_type=exp.DataType.build("INT"), + expected_table_struct=exp.DataType.build( + "STRUCT" + ), + array_element_selector="", ), ], dict( @@ -348,13 +412,17 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT>", "STRUCT>", [ - TableAlterOperation.add( - [ + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), TableAlterColumn.primitive("col_d"), ], - "INT", - expected_table_struct="STRUCT>", + column_type=exp.DataType.build("INT"), + expected_table_struct=exp.DataType.build( + "STRUCT>" + ), + array_element_selector="", position=TableAlterColumnPosition.first(), ), ], @@ -365,13 +433,17 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT>", "STRUCT>", [ - TableAlterOperation.add( - [ + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), TableAlterColumn.primitive("col_d"), ], - "INT", - expected_table_struct="STRUCT>", + column_type=exp.DataType.build("INT"), + expected_table_struct=exp.DataType.build( + "STRUCT>" + ), + array_element_selector="", position=TableAlterColumnPosition.last(after="col_c"), ), ], @@ -382,14 +454,18 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT>", "STRUCT>", [ - TableAlterOperation.add( - [ + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), TableAlterColumn.primitive("col_d"), ], - "INT", - expected_table_struct="STRUCT>", + column_type=exp.DataType.build("INT"), + expected_table_struct=exp.DataType.build( + 
"STRUCT>" + ), position=TableAlterColumnPosition.middle(after="col_a"), + array_element_selector="", ), ], dict(support_positional_add=True, support_nested_operations=True), @@ -399,23 +475,31 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT>", "STRUCT>", [ - TableAlterOperation.add( - [ + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), TableAlterColumn.primitive("col_d"), ], - "INT", - expected_table_struct="STRUCT>", + column_type=exp.DataType.build("INT"), + expected_table_struct=exp.DataType.build( + "STRUCT>" + ), position=TableAlterColumnPosition.first(), + array_element_selector="", ), - TableAlterOperation.add( - [ + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), TableAlterColumn.primitive("col_e"), ], - "INT", - expected_table_struct="STRUCT>", + column_type=exp.DataType.build("INT"), + expected_table_struct=exp.DataType.build( + "STRUCT>" + ), position=TableAlterColumnPosition.middle(after="col_d"), + array_element_selector="", ), ], dict(support_positional_add=True, support_nested_operations=True), @@ -425,20 +509,28 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT>", "STRUCT, txt TEXT>", [ - TableAlterOperation.add( - [ + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.primitive("txt"), ], - "TEXT", - expected_table_struct="STRUCT, txt TEXT>", - ), - TableAlterOperation.add( - [ + column_type=exp.DataType.build("TEXT"), + expected_table_struct=exp.DataType.build( + "STRUCT, txt TEXT>" + ), + array_element_selector="", + ), + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), TableAlterColumn.primitive("col_d"), ], - "INT", - expected_table_struct="STRUCT, txt TEXT>", + column_type=exp.DataType.build("INT"), + 
expected_table_struct=exp.DataType.build( + "STRUCT, txt TEXT>" + ), + array_element_selector="", ), ], dict(support_positional_add=False, support_nested_operations=True), @@ -448,13 +540,16 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT>", "STRUCT>", [ - TableAlterOperation.drop( - [ + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), TableAlterColumn.primitive("col_a"), ], - "STRUCT>", - "INT", + expected_table_struct=exp.DataType.build( + "STRUCT>" + ), + array_element_selector="", ), ], dict( @@ -468,13 +563,16 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT>", "STRUCT>", [ - TableAlterOperation.drop( - [ + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), TableAlterColumn.primitive("col_c"), ], - "STRUCT>", - "INT", + expected_table_struct=exp.DataType.build( + "STRUCT>" + ), + array_element_selector="", ), ], dict( @@ -488,13 +586,16 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT>", "STRUCT>", [ - TableAlterOperation.drop( - [ + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), TableAlterColumn.primitive("col_b"), ], - "STRUCT>", - "INT", + expected_table_struct=exp.DataType.build( + "STRUCT>" + ), + array_element_selector="", ), ], dict( @@ -508,19 +609,25 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT>", "STRUCT>", [ - TableAlterOperation.drop( - [ + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), ], - expected_table_struct="STRUCT", - column_type="STRUCT", + expected_table_struct=exp.DataType.build("STRUCT"), + array_element_selector="", ), - TableAlterOperation.add( - [ + TableAlterAddColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ 
TableAlterColumn.struct("info"), ], - expected_table_struct="STRUCT>", - column_type="STRUCT", + expected_table_struct=exp.DataType.build( + "STRUCT>" + ), + column_type=exp.DataType.build("STRUCT"), + array_element_selector="", + is_part_of_destructive_change=True, ), ], dict( @@ -533,21 +640,27 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT>", "STRUCT>", [ - TableAlterOperation.drop( - [ + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), TableAlterColumn.primitive("col_a"), ], - "STRUCT>", - "INT", - ), - TableAlterOperation.drop( - [ + expected_table_struct=exp.DataType.build( + "STRUCT>" + ), + array_element_selector="", + ), + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), TableAlterColumn.primitive("col_b"), ], - "STRUCT>", - "INT", + expected_table_struct=exp.DataType.build( + "STRUCT>" + ), + array_element_selector="", ), ], dict( @@ -561,15 +674,18 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT>", "STRUCT>", [ - TableAlterOperation.alter_type( - [ + TableAlterChangeColumnTypeOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), TableAlterColumn.primitive("col_c"), ], - "TEXT", - expected_table_struct="STRUCT>", - position=TableAlterColumnPosition.last(after="col_b"), + column_type=exp.DataType.build("TEXT"), + expected_table_struct=exp.DataType.build( + "STRUCT>" + ), current_type=exp.DataType.build("INT"), + array_element_selector="", ), ], dict( @@ -585,41 +701,55 @@ def test_schema_diff_calculate_type_transitions(): "STRUCT>", "STRUCT>", [ - TableAlterOperation.drop( - [ + TableAlterDropColumnOperation( + target_table=exp.to_table("apply_to_table"), + column_parts=[ TableAlterColumn.struct("info"), TableAlterColumn.primitive("col_a"), ], - "STRUCT>", - "INT", - ), - TableAlterOperation.add( - [ + 
                    expected_table_struct=exp.DataType.build(
+                        "STRUCT>"
+                    ),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.struct("info"),
                         TableAlterColumn.primitive("col_d"),
                     ],
-                    "INT",
-                    expected_table_struct="STRUCT>",
+                    column_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT>"
+                    ),
                     position=TableAlterColumnPosition.first(),
+                    array_element_selector="",
                 ),
-                TableAlterOperation.add(
-                    [
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.struct("info"),
                         TableAlterColumn.primitive("col_e"),
                     ],
-                    "INT",
-                    expected_table_struct="STRUCT>",
+                    column_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT>"
+                    ),
                     position=TableAlterColumnPosition.middle(after="col_b"),
+                    array_element_selector="",
                 ),
-                TableAlterOperation.alter_type(
-                    [
+                TableAlterChangeColumnTypeOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.struct("info"),
                         TableAlterColumn.primitive("col_c"),
                     ],
-                    "TEXT",
-                    expected_table_struct="STRUCT>",
-                    position=TableAlterColumnPosition.last(after="col_e"),
+                    column_type=exp.DataType.build("TEXT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT>"
+                    ),
                     current_type=exp.DataType.build("INT"),
+                    array_element_selector="",
                 ),
             ],
             dict(
@@ -636,19 +766,25 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT>",
             "STRUCT>",
             [
-                TableAlterOperation.drop(
-                    [
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.struct("info"),
                     ],
-                    expected_table_struct="STRUCT",
-                    column_type="STRUCT",
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
                 ),
-                TableAlterOperation.add(
-                    [
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.struct("info"),
                     ],
-                    expected_table_struct="STRUCT>",
-                    column_type="STRUCT",
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT>"
+                    ),
+                    column_type=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                    is_part_of_destructive_change=True,
                 ),
             ],
             dict(
@@ -664,41 +800,55 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT>>",
             "STRUCT, col_c INT>>",
             [
-                TableAlterOperation.drop(
-                    [
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.struct("info"),
                         TableAlterColumn.primitive("col_b"),
                     ],
-                    "STRUCT>>",
-                    "INT",
-                ),
-                TableAlterOperation.add(
-                    [
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT>>"
+                    ),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.struct("info"),
                         TableAlterColumn.primitive("col_c"),
                     ],
-                    "INT",
-                    expected_table_struct="STRUCT, col_c INT>>",
+                    column_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT, col_c INT>>"
+                    ),
                     position=TableAlterColumnPosition.last("nested_info"),
+                    array_element_selector="",
                 ),
-                TableAlterOperation.drop(
-                    [
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.struct("info"),
                         TableAlterColumn.struct("nested_info"),
                         TableAlterColumn.primitive("nest_col_b"),
                     ],
-                    "STRUCT, col_c INT>>",
-                    "INT",
-                ),
-                TableAlterOperation.add(
-                    [
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT, col_c INT>>"
+                    ),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.struct("info"),
                         TableAlterColumn.struct("nested_info"),
                         TableAlterColumn.primitive("nest_col_c"),
                     ],
-                    "INT",
-                    expected_table_struct="STRUCT, col_c INT>>",
+                    column_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT, col_c INT>>"
+                    ),
                     position=TableAlterColumnPosition.last("nest_col_a"),
+                    array_element_selector="",
                 ),
             ],
             dict(
@@ -715,14 +865,18 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT>>",
             "STRUCT>>",
             [
-                TableAlterOperation.add(
-                    [
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.array_of_struct("infos"),
                         TableAlterColumn.primitive("col_d"),
                     ],
-                    "INT",
-                    expected_table_struct="STRUCT>>",
+                    column_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT>>"
+                    ),
                     position=TableAlterColumnPosition.middle("col_b"),
+                    array_element_selector="",
                 ),
             ],
             dict(support_positional_add=True, support_nested_operations=True),
@@ -732,13 +886,16 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT>>",
             "STRUCT>>",
             [
-                TableAlterOperation.drop(
-                    [
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.array_of_struct("infos"),
                         TableAlterColumn.primitive("col_b"),
                     ],
-                    "STRUCT>>",
-                    "INT",
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT>>"
+                    ),
+                    array_element_selector="",
                 ),
             ],
             dict(
@@ -752,15 +909,18 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT>>",
             "STRUCT>>",
             [
-                TableAlterOperation.alter_type(
-                    [
+                TableAlterChangeColumnTypeOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.array_of_struct("infos"),
                         TableAlterColumn.primitive("col_c"),
                     ],
-                    "TEXT",
-                    expected_table_struct="STRUCT>>",
-                    position=TableAlterColumnPosition.last("col_b"),
-                    current_type="INT",
+                    column_type=exp.DataType.build("TEXT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT>>"
+                    ),
+                    current_type=exp.DataType.build("INT"),
+                    array_element_selector="",
                 ),
             ],
             dict(
@@ -776,20 +936,26 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT>>",
             "STRUCT>, col_e INT>",
             [
-                TableAlterOperation.add(
-                    [
-                        TableAlterColumn.primitive("col_e"),
-                    ],
-                    "INT",
-                    expected_table_struct="STRUCT>, col_e INT>",
-                ),
-                TableAlterOperation.add(
-                    [
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("col_e")],
+                    column_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT>, col_e INT>"
+                    ),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.array_of_struct("infos"),
                         TableAlterColumn.primitive("col_d"),
                     ],
-                    "INT",
-                    expected_table_struct="STRUCT>, col_e INT>",
+                    column_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT>, col_e INT>"
+                    ),
+                    array_element_selector="",
                 ),
             ],
             dict(support_positional_add=False, support_nested_operations=True),
@@ -799,13 +965,17 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT>>",
             "STRUCT>, values ARRAY>",
             [
-                TableAlterOperation.add(
-                    [
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.array_of_primitive("values"),
                     ],
-                    "ARRAY",
-                    expected_table_struct="STRUCT>, values ARRAY>",
+                    column_type=exp.DataType.build("ARRAY"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT>, values ARRAY>"
+                    ),
                     position=TableAlterColumnPosition.last("infos"),
+                    array_element_selector="",
                 ),
             ],
             dict(support_positional_add=True, support_nested_operations=True),
@@ -822,19 +992,19 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.drop(
-                    [
-                        TableAlterColumn.primitive("ids"),
-                    ],
-                    "STRUCT",
-                    "INT",
-                ),
-                TableAlterOperation.add(
-                    [
-                        TableAlterColumn.primitive("ids"),
-                    ],
-                    "ARRAY",
-                    expected_table_struct="STRUCT",
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("ids")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("ids")],
+                    column_type=exp.DataType.build("ARRAY"),
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                    is_part_of_destructive_change=True,
                 ),
             ],
             {},
@@ -844,19 +1014,23 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.drop(
-                    [
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.array_of_primitive("ids"),
                     ],
-                    "STRUCT",
-                    "ARRAY",
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
                 ),
-                TableAlterOperation.add(
-                    [
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
                         TableAlterColumn.array_of_primitive("ids"),
                     ],
-                    "INT",
-                    expected_table_struct="STRUCT",
+                    column_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                    is_part_of_destructive_change=True,
                 ),
             ],
             {},
@@ -876,11 +1050,15 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.alter_type(
-                    TableAlterColumn.primitive("address"),
-                    "VARCHAR(121)",
-                    current_type="VARCHAR(120)",
-                    expected_table_struct="STRUCT",
+                TableAlterChangeColumnTypeOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("VARCHAR(121)"),
+                    current_type=exp.DataType.build("VARCHAR(120)"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
                 )
             ],
             {},
@@ -890,11 +1068,15 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.alter_type(
-                    TableAlterColumn.primitive("address"),
-                    "VARCHAR(121)",
-                    current_type="VARCHAR(120)",
-                    expected_table_struct="STRUCT",
+                TableAlterChangeColumnTypeOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("VARCHAR(121)"),
+                    current_type=exp.DataType.build("VARCHAR(120)"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
                 )
             ],
             dict(
@@ -906,15 +1088,21 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.drop(
-                    TableAlterColumn.primitive("address"),
-                    "STRUCT",
-                    "VARCHAR(120)",
-                ),
-                TableAlterOperation.add(
-                    TableAlterColumn.primitive("address"),
-                    "VARCHAR(121)",
-                    expected_table_struct="STRUCT",
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("VARCHAR(121)"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                    is_part_of_destructive_change=True,
                 ),
             ],
             dict(
@@ -926,16 +1114,22 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.drop(
-                    TableAlterColumn.primitive("address"),
-                    "STRUCT",
-                    "VARCHAR(120)",
-                ),
-                TableAlterOperation.add(
-                    TableAlterColumn.primitive("address"),
-                    "VARCHAR(100)",
-                    expected_table_struct="STRUCT",
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("VARCHAR(100)"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
                     position=TableAlterColumnPosition.last("id"),
+                    array_element_selector="",
+                    is_part_of_destructive_change=True,
                 ),
             ],
             dict(
@@ -949,16 +1143,20 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.drop(
-                    TableAlterColumn.primitive("address"),
-                    "STRUCT",
-                    "VARCHAR(120)",
-                ),
-                TableAlterOperation.add(
-                    TableAlterColumn.primitive("address"),
-                    "VARCHAR",
-                    expected_table_struct="STRUCT",
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("VARCHAR"),
+                    expected_table_struct=exp.DataType.build("STRUCT"),
                     position=TableAlterColumnPosition.last("id"),
+                    array_element_selector="",
+                    is_part_of_destructive_change=True,
                 ),
             ],
             dict(
@@ -970,16 +1168,22 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.drop(
-                    TableAlterColumn.primitive("address"),
-                    "STRUCT",
-                    "VARCHAR",
-                ),
-                TableAlterOperation.add(
-                    TableAlterColumn.primitive("address"),
-                    "VARCHAR(120)",
-                    expected_table_struct="STRUCT",
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("VARCHAR(120)"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
                     position=TableAlterColumnPosition.last("id"),
+                    array_element_selector="",
+                    is_part_of_destructive_change=True,
                 ),
             ],
             dict(
@@ -991,11 +1195,13 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",  # default of 1 --> VARCHAR(1)
             "STRUCT",
             [
-                TableAlterOperation.alter_type(
-                    TableAlterColumn.primitive("address"),
-                    "VARCHAR(2)",
-                    current_type="VARCHAR",
-                    expected_table_struct="STRUCT",
+                TableAlterChangeColumnTypeOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("VARCHAR(2)"),
+                    current_type=exp.DataType.build("VARCHAR"),
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
                 )
             ],
             dict(
@@ -1009,16 +1215,20 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",  # default of 1 --> VARCHAR(1)
             [
-                TableAlterOperation.drop(
-                    TableAlterColumn.primitive("address"),
-                    "STRUCT",
-                    "VARCHAR(120)",
-                ),
-                TableAlterOperation.add(
-                    TableAlterColumn.primitive("address"),
-                    "VARCHAR",
-                    expected_table_struct="STRUCT",
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("VARCHAR"),
+                    expected_table_struct=exp.DataType.build("STRUCT"),
                     position=TableAlterColumnPosition.last("id"),
+                    array_element_selector="",
+                    is_part_of_destructive_change=True,
                 ),
             ],
             dict(
@@ -1033,11 +1243,15 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.alter_type(
-                    TableAlterColumn.primitive("address"),
-                    "VARCHAR(max)",
-                    current_type="VARCHAR(120)",
-                    expected_table_struct="STRUCT",
+                TableAlterChangeColumnTypeOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("VARCHAR(max)"),
+                    current_type=exp.DataType.build("VARCHAR(120)"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
                 )
             ],
             dict(
@@ -1051,11 +1265,15 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.alter_type(
-                    TableAlterColumn.primitive("address"),
-                    "VARCHAR(max)",
-                    current_type="VARCHAR(120)",
-                    expected_table_struct="STRUCT",
+                TableAlterChangeColumnTypeOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("VARCHAR(max)"),
+                    current_type=exp.DataType.build("VARCHAR(120)"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
                 )
             ],
             dict(
@@ -1069,16 +1287,22 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.drop(
-                    TableAlterColumn.primitive("address"),
-                    "STRUCT",
-                    "VARCHAR(max)",
-                ),
-                TableAlterOperation.add(
-                    TableAlterColumn.primitive("address"),
-                    "VARCHAR(120)",
-                    expected_table_struct="STRUCT",
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("VARCHAR(120)"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
                     position=TableAlterColumnPosition.last("id"),
+                    array_element_selector="",
+                    is_part_of_destructive_change=True,
                 ),
             ],
             dict(
@@ -1093,11 +1317,13 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.alter_type(
-                    TableAlterColumn.primitive("address"),
-                    "VARCHAR",
-                    current_type="VARCHAR(120)",
-                    expected_table_struct="STRUCT",
+                TableAlterChangeColumnTypeOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("VARCHAR"),
+                    current_type=exp.DataType.build("VARCHAR(120)"),
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
                 )
             ],
             dict(
@@ -1113,16 +1339,22 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.drop(
-                    TableAlterColumn.primitive("address"),
-                    "STRUCT",
-                    "VARCHAR",
-                ),
-                TableAlterOperation.add(
-                    TableAlterColumn.primitive("address"),
-                    "VARCHAR(120)",
-                    expected_table_struct="STRUCT",
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("VARCHAR(120)"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
                     position=TableAlterColumnPosition.last("id"),
+                    array_element_selector="",
+                    is_part_of_destructive_change=True,
                 ),
             ],
             dict(
@@ -1139,11 +1371,13 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.alter_type(
-                    TableAlterColumn.primitive("address"),
-                    "TEXT",
-                    current_type="VARCHAR(120)",
-                    expected_table_struct="STRUCT",
+                TableAlterChangeColumnTypeOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("TEXT"),
+                    current_type=exp.DataType.build("VARCHAR(120)"),
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
                 )
             ],
             dict(
@@ -1204,13 +1438,17 @@ def test_schema_diff_calculate_type_transitions():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.alter_type(
-                    TableAlterColumn.primitive("total"),
-                    "FLOAT",
-                    current_type="INT",
+                TableAlterChangeColumnTypeOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("total")],
+                    column_type=exp.DataType.build("FLOAT"),
+                    current_type=exp.DataType.build("INT"),
                     # Note that the resulting table struct will not match what we defined as the desired
                     # result since it could be coerced
-                    expected_table_struct="STRUCT",
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
                 )
             ],
             dict(
@@ -1232,7 +1470,9 @@ def test_struct_diff(
 ):
     resolver = SchemaDiffer(**config)
     operations = resolver._from_structs(
-        exp.DataType.build(current_struct), exp.DataType.build(new_struct)
+        exp.DataType.build(current_struct),
+        exp.DataType.build(new_struct),
+        "apply_to_table",
     )
     assert operations == expected_diff
@@ -1260,7 +1500,7 @@ def test_schema_diff_calculate_duckdb(duck_conn):
         },
     )
-    alter_expressions = engine_adapter.get_alter_expressions("apply_to_table", "schema_from_table")
+    alter_expressions = engine_adapter.get_alter_operations("apply_to_table", "schema_from_table")
     engine_adapter.alter_table(alter_expressions)
     assert engine_adapter.columns("apply_to_table") == {
         "id": exp.DataType.build("int"),
@@ -1271,46 +1511,57 @@ def test_schema_diff_calculate_duckdb(duck_conn):
 def test_schema_diff_alter_op_column():
-    nested = TableAlterOperation.add(
-        [
+    nested = TableAlterAddColumnOperation(
+        target_table=exp.to_table("apply_to_table"),
+        column_parts=[
             TableAlterColumn.array_of_struct("nested"),
             TableAlterColumn.primitive("col_a"),
         ],
-        "INT",
-        expected_table_struct="STRUCT>>",
+        column_type=exp.DataType.build("INT"),
+        expected_table_struct=exp.DataType.build("STRUCT>>"),
         position=TableAlterColumnPosition.last("id"),
+        array_element_selector="",
     )
-    assert nested.column("").sql() == "nested.col_a"
-    nested_complete_column = TableAlterOperation.add(
-        [
+    assert nested.column.sql() == "nested.col_a"
+    nested_complete_column = TableAlterAddColumnOperation(
+        target_table=exp.to_table("apply_to_table"),
+        column_parts=[
             TableAlterColumn.array_of_struct("nested_1", quoted=True),
             TableAlterColumn.struct("nested_2"),
             TableAlterColumn.array_of_struct("nested_3"),
             TableAlterColumn.primitive("col_a", quoted=True),
         ],
-        "INT",
-        expected_table_struct="""STRUCT>>>>>""",
+        column_type=exp.DataType.build("INT"),
+        expected_table_struct=exp.DataType.build(
+            """STRUCT>>>>>"""
+        ),
        position=TableAlterColumnPosition.last("id"),
+        array_element_selector="",
    )
-    assert nested_complete_column.column("").sql() == '"nested_1".nested_2.nested_3."col_a"'
-    nested_one_more_complete_column = TableAlterOperation.add(
-        [
+    assert nested_complete_column.column.sql() == '"nested_1".nested_2.nested_3."col_a"'
+    nested_one_more_complete_column = TableAlterAddColumnOperation(
+        target_table=exp.to_table("apply_to_table"),
+        column_parts=[
             TableAlterColumn.array_of_struct("nested_1", quoted=True),
             TableAlterColumn.struct("nested_2"),
             TableAlterColumn.array_of_struct("nested_3"),
             TableAlterColumn.struct("nested_4"),
             TableAlterColumn.primitive("col_a", quoted=True),
         ],
-        "INT",
-        expected_table_struct="""STRUCT>>>>>>""",
+        column_type=exp.DataType.build("INT"),
+        expected_table_struct=exp.DataType.build(
+            """STRUCT>>>>>>"""
+        ),
         position=TableAlterColumnPosition.last("id"),
+        array_element_selector="",
    )
     assert (
-        nested_one_more_complete_column.column("").sql()
+        nested_one_more_complete_column.column.sql()
         == '"nested_1".nested_2.nested_3.nested_4."col_a"'
     )
-    super_nested = TableAlterOperation.add(
-        [
+    super_nested = TableAlterAddColumnOperation(
+        target_table=exp.to_table("apply_to_table"),
+        column_parts=[
             TableAlterColumn.array_of_struct("nested_1", quoted=True),
             TableAlterColumn.struct("nested_2"),
             TableAlterColumn.array_of_struct("nested_3"),
@@ -1321,12 +1572,15 @@ def test_schema_diff_alter_op_column():
             TableAlterColumn.array_of_struct("nested_8"),
             TableAlterColumn.primitive("col_a", quoted=True),
         ],
-        "INT",
-        expected_table_struct="""STRUCT>>>>>>>>>>>""",
+        column_type=exp.DataType.build("INT"),
+        expected_table_struct=exp.DataType.build(
+            """STRUCT>>>>>>>>>>>"""
+        ),
         position=TableAlterColumnPosition.last("id"),
+        array_element_selector="element",
    )
     assert (
-        super_nested.column("element").sql()
+        super_nested.column.sql()
         == '"nested_1".element.nested_2.nested_3.element.nested_4.nested_5."nested_6".nested_7.nested_8.element."col_a"'
     )
@@ -1339,10 +1593,11 @@ def test_schema_diff_alter_op_column():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.drop(
-                    TableAlterColumn.primitive("name"),
-                    "STRUCT",
-                    "STRING",
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("name")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
                 )
             ],
             [],  # No operations when ignoring destructive
@@ -1353,15 +1608,21 @@ def test_schema_diff_alter_op_column():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.drop(
-                    TableAlterColumn.primitive("name"),
-                    "STRUCT",
-                    "STRING",
-                ),
-                TableAlterOperation.add(
-                    TableAlterColumn.primitive("name"),
-                    "BIGINT",
-                    "STRUCT",
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("name")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("name")],
+                    column_type=exp.DataType.build("BIGINT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                    is_part_of_destructive_change=True,
                 ),
             ],
             [],  # No operations when ignoring destructive
@@ -1372,18 +1633,26 @@ def test_schema_diff_alter_op_column():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.add(
-                    TableAlterColumn.primitive("new_col"),
-                    "STRING",
-                    "STRUCT",
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("new_col")],
+                    column_type=exp.DataType.build("STRING"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
                 ),
             ],
             [
                 # Same operation when ignoring destructive
-                TableAlterOperation.add(
-                    TableAlterColumn.primitive("new_col"),
-                    "STRING",
-                    "STRUCT",
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("new_col")],
+                    column_type=exp.DataType.build("STRING"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
                 ),
             ],
             {},
@@ -1393,35 +1662,52 @@ def test_schema_diff_alter_op_column():
             "STRUCT",
             "STRUCT",
             [
-                TableAlterOperation.drop(
-                    TableAlterColumn.primitive("name"),
-                    "STRUCT",
-                    "STRING",
-                ),
-                TableAlterOperation.add(
-                    TableAlterColumn.primitive("address"),
-                    "STRING",
-                    "STRUCT",
-                ),
-                TableAlterOperation.alter_type(
-                    TableAlterColumn.primitive("id"),
-                    "STRING",
-                    current_type="INT",
-                    expected_table_struct="STRUCT",
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("name")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("STRING"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                ),
+                TableAlterChangeColumnTypeOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("id")],
+                    column_type=exp.DataType.build("STRING"),
+                    current_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
                 ),
             ],
             [
                 # Only non-destructive operations remain
-                TableAlterOperation.add(
-                    TableAlterColumn.primitive("address"),
-                    "STRING",
-                    "STRUCT",
-                ),
-                TableAlterOperation.alter_type(
-                    TableAlterColumn.primitive("id"),
-                    "STRING",
-                    current_type="INT",
-                    expected_table_struct="STRUCT",
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("STRING"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                ),
+                TableAlterChangeColumnTypeOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("id")],
+                    column_type=exp.DataType.build("STRING"),
+                    current_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
                ),
             ],
             dict(
@@ -1443,13 +1729,19 @@ def test_ignore_destructive_operations(
     # Test with destructive operations allowed (default behavior)
     operations_with_destructive = resolver._from_structs(
-        exp.DataType.build(current_struct), exp.DataType.build(new_struct), ignore_destructive=False
+        exp.DataType.build(current_struct),
+        exp.DataType.build(new_struct),
+        "apply_to_table",
+        ignore_destructive=False,
    )
     assert operations_with_destructive == expected_diff_with_destructive

     # Test with destructive operations ignored
     operations_ignore_destructive = resolver._from_structs(
-        exp.DataType.build(current_struct), exp.DataType.build(new_struct), ignore_destructive=True
+        exp.DataType.build(current_struct),
+        exp.DataType.build(new_struct),
+        "apply_to_table",
+        ignore_destructive=True,
    )
     assert operations_ignore_destructive == expected_diff_ignore_destructive
@@ -1491,7 +1783,7 @@ def test_ignore_destructive_compare_columns():
     assert len(alter_expressions_ignore_destructive) == 2  # Only ADD + ALTER

     # Verify the operations are correct
-    operations_sql = [expr.sql() for expr in alter_expressions_ignore_destructive]
+    operations_sql = [expr.expression.sql() for expr in alter_expressions_ignore_destructive]
     add_column_found = any("ADD COLUMN new_col DOUBLE" in op for op in operations_sql)
     alter_column_found = any("ALTER COLUMN id SET DATA TYPE" in op for op in operations_sql)
     drop_column_found = any("DROP COLUMN to_drop" in op for op in operations_sql)
@@ -1513,15 +1805,21 @@ def test_ignore_destructive_nested_struct_without_support():
     # With destructive operations allowed - should do DROP+ADD of entire struct
     operations_with_destructive = schema_differ._from_structs(
-        exp.DataType.build(current_struct), exp.DataType.build(new_struct), ignore_destructive=False
+        exp.DataType.build(current_struct),
+        exp.DataType.build(new_struct),
+        "apply_to_table",
+        ignore_destructive=False,
    )
     assert len(operations_with_destructive) == 2  # DROP struct + ADD struct
-    assert operations_with_destructive[0].is_drop
-    assert operations_with_destructive[1].is_add
+    assert isinstance(operations_with_destructive[0], TableAlterDropColumnOperation)
+    assert isinstance(operations_with_destructive[1], TableAlterAddColumnOperation)

     # With destructive operations ignored - should do nothing
     operations_ignore_destructive = schema_differ._from_structs(
-        exp.DataType.build(current_struct), exp.DataType.build(new_struct), ignore_destructive=True
+        exp.DataType.build(current_struct),
+        exp.DataType.build(new_struct),
+        "apply_to_table",
+        ignore_destructive=True,
    )
     assert len(operations_ignore_destructive) == 0
@@ -1582,7 +1880,10 @@ def test_ignore_destructive_edge_cases():
     new_struct = "STRUCT<>"  # Remove all columns

     operations_ignore_destructive = schema_differ._from_structs(
-        exp.DataType.build(current_struct), exp.DataType.build(new_struct), ignore_destructive=True
+        exp.DataType.build(current_struct),
+        exp.DataType.build(new_struct),
+        "apply_to_table",
+        ignore_destructive=True,
    )
     assert len(operations_ignore_destructive) == 0
@@ -1590,10 +1891,16 @@ def test_ignore_destructive_edge_cases():
     same_struct = "STRUCT"

     operations_same_with_destructive = schema_differ._from_structs(
-        exp.DataType.build(same_struct), exp.DataType.build(same_struct), ignore_destructive=False
+        exp.DataType.build(same_struct),
+        exp.DataType.build(same_struct),
+        "apply_to_table",
+        ignore_destructive=False,
    )
     operations_same_ignore_destructive = schema_differ._from_structs(
-        exp.DataType.build(same_struct), exp.DataType.build(same_struct), ignore_destructive=True
+        exp.DataType.build(same_struct),
+        exp.DataType.build(same_struct),
+        "apply_to_table",
+        ignore_destructive=True,
    )
     assert len(operations_same_with_destructive) == 0
     assert len(operations_same_ignore_destructive) == 0
@@ -1603,11 +1910,346 @@ def test_ignore_destructive_edge_cases():
     new_struct = "STRUCT"

     operations_add_with_destructive = schema_differ._from_structs(
-        exp.DataType.build(current_struct), exp.DataType.build(new_struct), ignore_destructive=False
+        exp.DataType.build(current_struct),
+        exp.DataType.build(new_struct),
+        "apply_to_table",
+        ignore_destructive=False,
    )
     operations_add_ignore_destructive = schema_differ._from_structs(
-        exp.DataType.build(current_struct), exp.DataType.build(new_struct), ignore_destructive=True
+        exp.DataType.build(current_struct),
+        exp.DataType.build(new_struct),
+        "apply_to_table",
+        ignore_destructive=True,
    )
     assert len(operations_add_with_destructive) == 2  # ADD name, ADD age
     assert len(operations_add_ignore_destructive) == 2  # Same operations
     assert operations_add_with_destructive == operations_add_ignore_destructive
+
+
+@pytest.mark.parametrize(
+    "current_struct, new_struct, expected_diff_with_additive, expected_diff_ignore_additive, config",
+    [
+        # Simple ADD operation - should be ignored when ignore_additive=True
+        (
+            "STRUCT",
+            "STRUCT",
+            [
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("age")],
+                    column_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                )
+            ],
+            [],  # No operations when ignoring additive
+            {},
+        ),
+        # Multiple ADD operations - should all be ignored when ignore_additive=True
+        (
+            "STRUCT",
+            "STRUCT",
+            [
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("name")],
+                    column_type=exp.DataType.build("STRING"),
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("age")],
+                    column_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("STRING"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                ),
+            ],
+            [],  # No operations when ignoring additive
+            {},
+        ),
+        # Pure DROP operation - should work same way regardless of ignore_additive
+        (
+            "STRUCT",
+            "STRUCT",
+            [
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("age")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+            ],
+            [
+                # Same operation when ignoring additive
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("age")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+            ],
+            {},
+        ),
+        # Mix of additive and non-additive operations
+        (
+            "STRUCT",
+            "STRUCT",
+            [
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("name")],
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("address")],
+                    column_type=exp.DataType.build("STRING"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                ),
+                TableAlterChangeColumnTypeOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("id")],
+                    column_type=exp.DataType.build("STRING"),
+                    current_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                ),
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("something")],
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("something")],
+                    column_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                    is_part_of_destructive_change=True,
+                ),
+            ],
+            [
+                # Only non-additive operations remain (alter is considered additive since it was a compatible change)
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("name")],
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                ),
+                TableAlterDropColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("something")],
+                    expected_table_struct=exp.DataType.build("STRUCT"),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("something")],
+                    column_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT"
+                    ),
+                    array_element_selector="",
+                    is_part_of_destructive_change=True,
+                ),
+            ],
+            dict(
+                compatible_types={
+                    exp.DataType.build("INT"): {exp.DataType.build("STRING")},
+                }
+            ),
+        ),
+        # ADD operations with nested structs - should be ignored when ignore_additive=True
+        (
+            "STRUCT>",
+            "STRUCT, new_field STRING>",
+            [
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[TableAlterColumn.primitive("new_field")],
+                    column_type=exp.DataType.build("STRING"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT, new_field STRING>"
+                    ),
+                    array_element_selector="",
+                ),
+                TableAlterAddColumnOperation(
+                    target_table=exp.to_table("apply_to_table"),
+                    column_parts=[
+                        TableAlterColumn.struct("info"),
+                        TableAlterColumn.primitive("col_c"),
+                    ],
+                    column_type=exp.DataType.build("INT"),
+                    expected_table_struct=exp.DataType.build(
+                        "STRUCT, new_field STRING>"
+                    ),
+                    array_element_selector="",
+                ),
+            ],
+            [],  # No operations when ignoring additive
+            dict(support_nested_operations=True),
+        ),
+    ],
+)
+def test_ignore_additive_operations(
+    current_struct,
+    new_struct,
+    expected_diff_with_additive: t.List[TableAlterOperation],
+    expected_diff_ignore_additive: t.List[TableAlterOperation],
+    config: t.Dict[str, t.Any],
+):
+    resolver = SchemaDiffer(**config)
+
+    # Test with additive operations allowed (default behavior)
+    operations_with_additive = resolver._from_structs(
+        exp.DataType.build(current_struct),
+        exp.DataType.build(new_struct),
+        "apply_to_table",
+        ignore_additive=False,
+    )
+    assert operations_with_additive == expected_diff_with_additive
+
+    # Test with additive operations ignored
+    operations_ignore_additive = resolver._from_structs(
+        exp.DataType.build(current_struct),
+        exp.DataType.build(new_struct),
+        "apply_to_table",
+        ignore_additive=True,
+    )
+    assert operations_ignore_additive == expected_diff_ignore_additive
+
+
+def test_ignore_additive_edge_cases():
+    """Test edge cases for ignore_additive behavior."""
+    schema_differ = SchemaDiffer(support_positional_add=True)
+
+    # Test when all operations are additive - should result in empty list
+    current_struct = "STRUCT"
+    new_struct = "STRUCT"  # Add all columns
+
+    operations_ignore_additive = schema_differ._from_structs(
+        exp.DataType.build(current_struct),
+        exp.DataType.build(new_struct),
+        "apply_to_table",
+ ignore_additive=True, + ) + assert len(operations_ignore_additive) == 0 + + # Test when no operations are needed - should result in empty list regardless of ignore_additive + same_struct = "STRUCT" + + operations_same_with_additive = schema_differ._from_structs( + exp.DataType.build(same_struct), + exp.DataType.build(same_struct), + "apply_to_table", + ignore_additive=False, + ) + operations_same_ignore_additive = schema_differ._from_structs( + exp.DataType.build(same_struct), + exp.DataType.build(same_struct), + "apply_to_table", + ignore_additive=True, + ) + assert len(operations_same_with_additive) == 0 + assert len(operations_same_ignore_additive) == 0 + + # Test when only DROP operations are needed - should be same regardless of ignore_additive + current_struct = "STRUCT" + new_struct = "STRUCT" + + operations_drop_with_additive = schema_differ._from_structs( + exp.DataType.build(current_struct), + exp.DataType.build(new_struct), + "apply_to_table", + ignore_additive=False, + ) + operations_drop_ignore_additive = schema_differ._from_structs( + exp.DataType.build(current_struct), + exp.DataType.build(new_struct), + "apply_to_table", + ignore_additive=True, + ) + assert len(operations_drop_with_additive) == 2 # DROP name, DROP age + assert len(operations_drop_ignore_additive) == 2 # Same operations + assert operations_drop_with_additive == operations_drop_ignore_additive + + +def test_ignore_both_destructive_and_additive(): + """Test behavior when both ignore_destructive and ignore_additive are True.""" + schema_differ = SchemaDiffer( + support_positional_add=True, + compatible_types={ + exp.DataType.build("INT"): {exp.DataType.build("STRING")}, + }, + ) + + current_struct = "STRUCT" + new_struct = "STRUCT" # DROP name, ADD address, ALTER id + + operations_ignore_both = schema_differ._from_structs( + exp.DataType.build(current_struct), + exp.DataType.build(new_struct), + "apply_to_table", + ignore_destructive=True, + ignore_additive=True, + ) + assert 
len(operations_ignore_both) == 0 + + +def test_ignore_additive_array_operations(): + """Test ignore_additive with array of struct operations.""" + schema_differ = SchemaDiffer( + support_nested_operations=True, + support_positional_add=True, + ) + + current_struct = "STRUCT>>" + new_struct = "STRUCT>>" + + # With additive operations allowed - should add to array struct + operations_with_additive = schema_differ._from_structs( + exp.DataType.build(current_struct), + exp.DataType.build(new_struct), + "apply_to_table", + ignore_additive=False, + ) + assert len(operations_with_additive) == 1 # ADD to array struct + assert isinstance(operations_with_additive[0], TableAlterAddColumnOperation) + + # With additive operations ignored - should do nothing + operations_ignore_additive = schema_differ._from_structs( + exp.DataType.build(current_struct), + exp.DataType.build(new_struct), + "apply_to_table", + ignore_additive=True, + ) + assert len(operations_ignore_additive) == 0 diff --git a/tests/core/test_snapshot.py b/tests/core/test_snapshot.py index bce091595c..b6836b74ac 100644 --- a/tests/core/test_snapshot.py +++ b/tests/core/test_snapshot.py @@ -135,6 +135,7 @@ def test_json(snapshot: Snapshot): "batch_size": 30, "forward_only": False, "on_destructive_change": "ERROR", + "on_additive_change": "ALLOW", "partition_by_time_column": True, "disable_restatement": False, "dialect": "spark", @@ -913,7 +914,7 @@ def test_fingerprint(model: Model, parent_model: Model): original_fingerprint = SnapshotFingerprint( data_hash="3301649319", - metadata_hash="1125608408", + metadata_hash="3575333731", ) assert fingerprint == original_fingerprint @@ -1013,7 +1014,7 @@ def test_fingerprint_jinja_macros(model: Model): ) original_fingerprint = SnapshotFingerprint( data_hash="2908339239", - metadata_hash="1125608408", + metadata_hash="3575333731", ) fingerprint = fingerprint_from_node(model, nodes={}) diff --git a/tests/core/test_snapshot_evaluator.py b/tests/core/test_snapshot_evaluator.py 
index 60931b1602..077e65f470 100644 --- a/tests/core/test_snapshot_evaluator.py +++ b/tests/core/test_snapshot_evaluator.py @@ -41,7 +41,7 @@ ExternalModel, model, ) -from sqlmesh.core.model.kind import OnDestructiveChange, ExternalKind +from sqlmesh.core.model.kind import OnDestructiveChange, ExternalKind, OnAdditiveChange from sqlmesh.core.node import IntervalUnit from sqlmesh.core.snapshot import ( DeployabilityIndex, @@ -57,7 +57,12 @@ from sqlmesh.core.snapshot.evaluator import CustomMaterialization, SnapshotCreationFailedError from sqlmesh.utils.concurrency import NodeExecutionFailedError from sqlmesh.utils.date import to_timestamp -from sqlmesh.utils.errors import ConfigError, SQLMeshError, DestructiveChangeError +from sqlmesh.utils.errors import ( + ConfigError, + SQLMeshError, + DestructiveChangeError, + AdditiveChangeError, +) from sqlmesh.utils.metaprogramming import Executable from sqlmesh.utils.pydantic import list_of_fields_validator @@ -1535,7 +1540,7 @@ def python_func(**kwargs): def test_create_clone_in_dev(mocker: MockerFixture, adapter_mock, make_snapshot): adapter_mock.SUPPORTS_CLONING = True - adapter_mock.get_alter_expressions.return_value = [] + adapter_mock.get_alter_operations.return_value = [] evaluator = SnapshotEvaluator(adapter_mock) model = load_sql_based_model( @@ -1587,10 +1592,11 @@ def test_create_clone_in_dev(mocker: MockerFixture, adapter_mock, make_snapshot) rendered_physical_properties={}, ) - adapter_mock.get_alter_expressions.assert_called_once_with( + adapter_mock.get_alter_operations.assert_called_once_with( f"sqlmesh__test_schema.test_schema__test_model__{snapshot.version}__dev", f"sqlmesh__test_schema.test_schema__test_model__{snapshot.version}__dev_schema_tmp", ignore_destructive=False, + ignore_additive=False, ) adapter_mock.alter_table.assert_called_once_with([]) @@ -1602,7 +1608,7 @@ def test_create_clone_in_dev(mocker: MockerFixture, adapter_mock, make_snapshot) def test_drop_clone_in_dev_when_migration_fails(mocker: 
MockerFixture, adapter_mock, make_snapshot): adapter_mock.SUPPORTS_CLONING = True - adapter_mock.get_alter_expressions.return_value = [] + adapter_mock.get_alter_operations.return_value = [] evaluator = SnapshotEvaluator(adapter_mock) adapter_mock.alter_table.side_effect = Exception("Migration failed") @@ -1644,10 +1650,11 @@ def test_drop_clone_in_dev_when_migration_fails(mocker: MockerFixture, adapter_m rendered_physical_properties={}, ) - adapter_mock.get_alter_expressions.assert_called_once_with( + adapter_mock.get_alter_operations.assert_called_once_with( f"sqlmesh__test_schema.test_schema__test_model__{snapshot.version}__dev", f"sqlmesh__test_schema.test_schema__test_model__{snapshot.version}__dev_schema_tmp", ignore_destructive=False, + ignore_additive=False, ) adapter_mock.alter_table.assert_called_once_with([]) @@ -1667,7 +1674,7 @@ def test_create_clone_in_dev_self_referencing( mocker: MockerFixture, adapter_mock, make_snapshot, use_this_model: bool ): adapter_mock.SUPPORTS_CLONING = True - adapter_mock.get_alter_expressions.return_value = [] + adapter_mock.get_alter_operations.return_value = [] evaluator = SnapshotEvaluator(adapter_mock) from_table = "test_schema.test_model" if not use_this_model else "@this_model" @@ -1773,7 +1780,7 @@ def columns(table_name): assert isinstance(destructive_change_err, DestructiveChangeError) assert ( str(destructive_change_err) - == "\nPlan requires a destructive change to forward-only model '\"test_schema\".\"test_model\"'s schema that drops column 'b'.\n\nSchema changes:\n ALTER TABLE sqlmesh__test_schema.test_schema__test_model__1 DROP COLUMN b\n ALTER TABLE sqlmesh__test_schema.test_schema__test_model__1 ADD COLUMN a INT\n\nTo allow the destructive change, set the model's `on_destructive_change` setting to `warn` or `allow` or include the model in the plan's `--allow-destructive-model` option.\n" + == "\nPlan requires destructive change to forward-only model '\"test_schema\".\"test_model\"'s schema that drops column 
'b'.\n\nSchema changes:\n ALTER TABLE sqlmesh__test_schema.test_schema__test_model__1 DROP COLUMN b\n ALTER TABLE sqlmesh__test_schema.test_schema__test_model__1 ADD COLUMN a INT\n\nTo allow the destructive change, set the model's `on_destructive_change` setting to `warn`, `allow`, or `ignore` or include the model in the plan's `--allow-destructive-model` option.\n" ) # WARN @@ -1794,7 +1801,7 @@ def columns(table_name): evaluator.migrate([snapshot], {}, deployability_index=DeployabilityIndex.none_deployable()) assert ( mock_logger.call_args[0][0] - == "\nPlan requires a destructive change to forward-only model '\"test_schema\".\"test_model\"'s schema that drops column 'b'.\n\nSchema changes:\n ALTER TABLE sqlmesh__test_schema.test_schema__test_model__1 DROP COLUMN b\n ALTER TABLE sqlmesh__test_schema.test_schema__test_model__1 ADD COLUMN a INT" + == "\nPlan requires destructive change to forward-only model '\"test_schema\".\"test_model\"'s schema that drops column 'b'.\n\nSchema changes:\n ALTER TABLE sqlmesh__test_schema.test_schema__test_model__1 DROP COLUMN b\n ALTER TABLE sqlmesh__test_schema.test_schema__test_model__1 ADD COLUMN a INT" ) # allow destructive @@ -1808,6 +1815,79 @@ def columns(table_name): assert mock_logger.call_count == 0 +def test_on_additive_change_runtime_check( + mocker: MockerFixture, + make_snapshot, + make_mocked_engine_adapter, +): + adapter = make_mocked_engine_adapter(EngineAdapter) + + current_table = "sqlmesh__test_schema.test_schema__test_model__1" + + def columns(table_name): + if table_name == current_table: + return { + "c": exp.DataType.build("int"), + "a": exp.DataType.build("int"), + } + return { + "c": exp.DataType.build("int"), + "a": exp.DataType.build("int"), + "b": exp.DataType.build("int"), + } + + adapter.columns = columns # type: ignore + mocker.patch.object( + adapter, + "get_data_object", + return_value=DataObject(schema="test_schema", name="test_model", type=DataObjectType.TABLE), + ) + + evaluator = 
SnapshotEvaluator(adapter) + + # ERROR (note: unlike destructive changes, the global default for additive changes is `allow`) + model = SqlModel( + name="test_schema.test_model", + kind=IncrementalByTimeRangeKind(time_column="a", on_additive_change=OnAdditiveChange.ERROR), + query=parse_one("SELECT c, a, b FROM tbl WHERE ds BETWEEN @start_ds and @end_ds"), + ) + snapshot = make_snapshot(model, version="1") + snapshot.change_category = SnapshotChangeCategory.BREAKING + snapshot.forward_only = True + snapshot.previous_versions = snapshot.all_versions + + with pytest.raises(NodeExecutionFailedError) as ex: + evaluator.migrate([snapshot], {}, deployability_index=DeployabilityIndex.none_deployable()) + + additive_change_error = ex.value.__cause__ + assert isinstance(additive_change_error, AdditiveChangeError) + assert ( + str(additive_change_error) + == "\nPlan requires additive change to forward-only model '\"test_schema\".\"test_model\"'s schema that adds column 'b'.\n\nSchema changes:\n ALTER TABLE sqlmesh__test_schema.test_schema__test_model__1 ADD COLUMN b INT\n\nTo allow the additive change, set the model's `on_additive_change` setting to `warn`, `allow`, or `ignore` or include the model in the plan's `--allow-additive-model` option.\n" + ) + + # WARN + model = SqlModel( + name="test_schema.test_model", + kind=IncrementalByTimeRangeKind( + time_column="a", on_additive_change=OnAdditiveChange.WARN + ), + query=parse_one("SELECT c, a FROM tbl WHERE ds BETWEEN @start_ds and @end_ds"), + ) + snapshot = make_snapshot(model, version="1") + snapshot.change_category = SnapshotChangeCategory.BREAKING + snapshot.forward_only = True + snapshot.previous_versions = snapshot.all_versions + + logger = logging.getLogger("sqlmesh.core.snapshot.evaluator") + with patch.object(logger, "warning") as mock_logger: + evaluator.migrate([snapshot], {}, deployability_index=DeployabilityIndex.none_deployable()) + assert ( + mock_logger.call_args[0][0] + == "\nPlan requires additive change to forward-only model '\"test_schema\".\"test_model\"'s schema that
adds column 'b'.\n\nSchema changes:\n ALTER TABLE sqlmesh__test_schema.test_schema__test_model__1 ADD COLUMN b INT" + ) + + def test_forward_only_snapshot_for_added_model(mocker: MockerFixture, adapter_mock, make_snapshot): adapter_mock.SUPPORTS_CLONING = False evaluator = SnapshotEvaluator(adapter_mock) @@ -3698,10 +3778,11 @@ def test_migrate_snapshot(snapshot: Snapshot, mocker: MockerFixture, adapter_moc ] ) - adapter_mock.get_alter_expressions.assert_called_once_with( + adapter_mock.get_alter_operations.assert_called_once_with( snapshot.table_name(), f"{new_snapshot.table_name()}_schema_tmp", ignore_destructive=False, + ignore_additive=False, ) @@ -3730,7 +3811,7 @@ def test_migrate_managed(adapter_mock, make_snapshot, mocker: MockerFixture): adapter_mock.drop_data_object_on_type_mismatch.return_value = False # no schema changes - no-op - adapter_mock.get_alter_expressions.return_value = [] + adapter_mock.get_alter_operations.return_value = [] evaluator.migrate( target_snapshots=[snapshot], snapshots={}, @@ -3743,7 +3824,7 @@ def test_migrate_managed(adapter_mock, make_snapshot, mocker: MockerFixture): adapter_mock.reset_mock() # schema changes - exception thrown - adapter_mock.get_alter_expressions.return_value = [exp.Alter()] + adapter_mock.get_alter_operations.return_value = [exp.Alter()] with pytest.raises(NodeExecutionFailedError) as ex: evaluator.migrate( @@ -3968,10 +4049,11 @@ def columns(table_name): ) # The second mock adapter has to be called only for the gateway-specific model - adapter_mock.get_alter_expressions.assert_called_once_with( + adapter_mock.get_alter_operations.assert_called_once_with( snapshot_2.table_name(True), f"{snapshot_2.table_name(True)}_schema_tmp", ignore_destructive=False, + ignore_additive=False, ) diff --git a/tests/dbt/test_config.py b/tests/dbt/test_config.py index 44b6cd7911..72994fe33c 100644 --- a/tests/dbt/test_config.py +++ b/tests/dbt/test_config.py @@ -9,7 +9,7 @@ from sqlmesh.core.config import Config, 
ModelDefaultsConfig from sqlmesh.core.dialect import jinja_query from sqlmesh.core.model import SqlModel -from sqlmesh.core.model.kind import OnDestructiveChange +from sqlmesh.core.model.kind import OnDestructiveChange, OnAdditiveChange from sqlmesh.dbt.common import Dependencies from sqlmesh.dbt.context import DbtContext from sqlmesh.dbt.loader import sqlmesh_config @@ -133,6 +133,7 @@ def test_model_to_sqlmesh_fields(): assert kind.batch_size == 5 assert kind.lookback == 3 assert kind.on_destructive_change == OnDestructiveChange.ALLOW + assert kind.on_additive_change == OnAdditiveChange.ALLOW assert ( kind.merge_filter.sql(dialect=model.dialect) == """55 > "__MERGE_SOURCE__"."b" AND "__MERGE_TARGET__"."session_start" > CURRENT_DATE + INTERVAL '7' DAY""" @@ -162,11 +163,14 @@ def test_model_to_sqlmesh_fields(): start="Jan 1 2023", batch_size=5, batch_concurrency=2, + on_schema_change="ignore", ) model = model_config.to_sqlmesh(context) assert isinstance(model.kind, IncrementalByTimeRangeKind) assert model.kind.batch_concurrency == 2 assert model.kind.time_column.column.name == "ds" + assert model.kind.on_destructive_change == OnDestructiveChange.IGNORE + assert model.kind.on_additive_change == OnAdditiveChange.IGNORE def test_test_to_sqlmesh_fields(): @@ -1032,3 +1036,40 @@ def test_depends_on(assert_exp_eq, sushi_test_project): # Make sure the query wasn't rendered assert not sqlmesh_model._query_renderer._cache + + +@pytest.mark.parametrize( + "on_schema_change, expected_additive, expected_destructive", + [ + ("ignore", OnAdditiveChange.IGNORE, OnDestructiveChange.IGNORE), + ("fail", OnAdditiveChange.ERROR, OnDestructiveChange.ERROR), + ("append_new_columns", OnAdditiveChange.ALLOW, OnDestructiveChange.IGNORE), + ("sync_all_columns", OnAdditiveChange.ALLOW, OnDestructiveChange.ALLOW), + ], +) +def test_on_schema_change_properties( + on_schema_change: str, + expected_additive: OnAdditiveChange, + expected_destructive: OnDestructiveChange, +): + model_config = 
ModelConfig( + name="name", + package_name="package", + alias="model", + schema="custom", + database="database", + materialized=Materialization.INCREMENTAL, + sql="SELECT * FROM foo.table", + time_column="ds", + start="Jan 1 2023", + batch_size=5, + batch_concurrency=2, + on_schema_change=on_schema_change, + ) + context = DbtContext() + context.project_name = "Foo" + context.target = DuckDbConfig(name="target", schema="foo") + model = model_config.to_sqlmesh(context) + + assert model.on_additive_change == expected_additive + assert model.on_destructive_change == expected_destructive diff --git a/tests/dbt/test_transformation.py b/tests/dbt/test_transformation.py index cefedd6814..fdb8345398 100644 --- a/tests/dbt/test_transformation.py +++ b/tests/dbt/test_transformation.py @@ -29,7 +29,12 @@ SqlModel, ViewKind, ) -from sqlmesh.core.model.kind import SCDType2ByColumnKind, SCDType2ByTimeKind +from sqlmesh.core.model.kind import ( + SCDType2ByColumnKind, + SCDType2ByTimeKind, + OnDestructiveChange, + OnAdditiveChange, +) from sqlmesh.core.state_sync.db.snapshot import _snapshot_to_json from sqlmesh.dbt.builtin import _relation_info_to_relation from sqlmesh.dbt.column import ( @@ -113,6 +118,8 @@ def test_model_kind(): updated_at_as_valid_from=True, updated_at_name="updated_at", dialect="duckdb", + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.ALLOW, ) assert ModelConfig( materialized=Materialization.SNAPSHOT, @@ -126,6 +133,8 @@ def test_model_kind(): columns=["foo"], execution_time_as_valid_from=True, dialect="duckdb", + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.ALLOW, ) assert ModelConfig( materialized=Materialization.SNAPSHOT, @@ -140,23 +149,40 @@ def test_model_kind(): columns=["foo"], execution_time_as_valid_from=True, dialect="bigquery", + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.ALLOW, ) assert 
ModelConfig(materialized=Materialization.INCREMENTAL, time_column="foo").model_kind( context - ) == IncrementalByTimeRangeKind(time_column="foo", dialect="duckdb", forward_only=True) + ) == IncrementalByTimeRangeKind( + time_column="foo", + dialect="duckdb", + forward_only=True, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, + ) assert ModelConfig( materialized=Materialization.INCREMENTAL, time_column="foo", incremental_strategy="delete+insert", forward_only=False, - ).model_kind(context) == IncrementalByTimeRangeKind(time_column="foo", dialect="duckdb") + ).model_kind(context) == IncrementalByTimeRangeKind( + time_column="foo", + dialect="duckdb", + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, + ) assert ModelConfig( materialized=Materialization.INCREMENTAL, time_column="foo", incremental_strategy="insert_overwrite", ).model_kind(context) == IncrementalByTimeRangeKind( - time_column="foo", dialect="duckdb", forward_only=True + time_column="foo", + dialect="duckdb", + forward_only=True, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( materialized=Materialization.INCREMENTAL, @@ -164,13 +190,22 @@ def test_model_kind(): unique_key=["bar"], dialect="bigquery", ).model_kind(context) == IncrementalByTimeRangeKind( - time_column="foo", dialect="bigquery", forward_only=True + time_column="foo", + dialect="bigquery", + forward_only=True, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( materialized=Materialization.INCREMENTAL, unique_key=["bar"], incremental_strategy="merge" ).model_kind(context) == IncrementalByUniqueKeyKind( - unique_key=["bar"], dialect="duckdb", forward_only=True, disable_restatement=False + unique_key=["bar"], + dialect="duckdb", + forward_only=True, + disable_restatement=False, + 
on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) dbt_incremental_predicate = "DBT_INTERNAL_DEST.session_start > dateadd(day, -7, current_date)" @@ -189,30 +224,52 @@ def test_model_kind(): forward_only=True, disable_restatement=False, merge_filter=expected_sqlmesh_predicate, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig(materialized=Materialization.INCREMENTAL, unique_key=["bar"]).model_kind( context ) == IncrementalByUniqueKeyKind( - unique_key=["bar"], dialect="duckdb", forward_only=True, disable_restatement=False + unique_key=["bar"], + dialect="duckdb", + forward_only=True, + disable_restatement=False, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( materialized=Materialization.INCREMENTAL, unique_key=["bar"], full_refresh=False ).model_kind(context) == IncrementalByUniqueKeyKind( - unique_key=["bar"], dialect="duckdb", forward_only=True, disable_restatement=True + unique_key=["bar"], + dialect="duckdb", + forward_only=True, + disable_restatement=True, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( materialized=Materialization.INCREMENTAL, unique_key=["bar"], full_refresh=True ).model_kind(context) == IncrementalByUniqueKeyKind( - unique_key=["bar"], dialect="duckdb", forward_only=True, disable_restatement=False + unique_key=["bar"], + dialect="duckdb", + forward_only=True, + disable_restatement=False, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( materialized=Materialization.INCREMENTAL, unique_key=["bar"], disable_restatement=True ).model_kind(context) == IncrementalByUniqueKeyKind( - unique_key=["bar"], dialect="duckdb", forward_only=True, disable_restatement=True + unique_key=["bar"], + dialect="duckdb", + 
forward_only=True, + disable_restatement=True, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( @@ -221,7 +278,12 @@ def test_model_kind(): disable_restatement=True, full_refresh=True, ).model_kind(context) == IncrementalByUniqueKeyKind( - unique_key=["bar"], dialect="duckdb", forward_only=True, disable_restatement=True + unique_key=["bar"], + dialect="duckdb", + forward_only=True, + disable_restatement=True, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( @@ -236,6 +298,8 @@ def test_model_kind(): forward_only=True, disable_restatement=True, auto_restatement_cron="0 0 * * *", + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) # Test incompatibile incremental strategies @@ -245,13 +309,23 @@ def test_model_kind(): unique_key=["bar"], incremental_strategy=incremental_strategy, ).model_kind(context) == IncrementalByUniqueKeyKind( - unique_key=["bar"], dialect="duckdb", forward_only=True, disable_restatement=False + unique_key=["bar"], + dialect="duckdb", + forward_only=True, + disable_restatement=False, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( materialized=Materialization.INCREMENTAL, time_column="foo", incremental_strategy="merge" ).model_kind(context) == IncrementalByTimeRangeKind( - time_column="foo", dialect="duckdb", forward_only=True, disable_restatement=False + time_column="foo", + dialect="duckdb", + forward_only=True, + disable_restatement=False, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( @@ -260,7 +334,12 @@ def test_model_kind(): incremental_strategy="merge", full_refresh=True, ).model_kind(context) == IncrementalByTimeRangeKind( - time_column="foo", dialect="duckdb", forward_only=True, disable_restatement=False + 
time_column="foo", + dialect="duckdb", + forward_only=True, + disable_restatement=False, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( @@ -269,7 +348,12 @@ def test_model_kind(): incremental_strategy="merge", full_refresh=False, ).model_kind(context) == IncrementalByTimeRangeKind( - time_column="foo", dialect="duckdb", forward_only=True, disable_restatement=False + time_column="foo", + dialect="duckdb", + forward_only=True, + disable_restatement=False, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( @@ -278,7 +362,12 @@ def test_model_kind(): incremental_strategy="append", disable_restatement=True, ).model_kind(context) == IncrementalByTimeRangeKind( - time_column="foo", dialect="duckdb", forward_only=True, disable_restatement=True + time_column="foo", + dialect="duckdb", + forward_only=True, + disable_restatement=True, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( @@ -287,7 +376,12 @@ def test_model_kind(): incremental_strategy="insert_overwrite", partition_by={"field": "bar"}, forward_only=False, - ).model_kind(context) == IncrementalByTimeRangeKind(time_column="foo", dialect="duckdb") + ).model_kind(context) == IncrementalByTimeRangeKind( + time_column="foo", + dialect="duckdb", + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, + ) assert ModelConfig( materialized=Materialization.INCREMENTAL, @@ -303,6 +397,8 @@ def test_model_kind(): forward_only=False, auto_restatement_cron="0 0 * * *", auto_restatement_intervals=3, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( @@ -310,33 +406,56 @@ def test_model_kind(): incremental_strategy="insert_overwrite", partition_by={"field": "bar"}, ).model_kind(context) == 
IncrementalUnmanagedKind( - insert_overwrite=True, disable_restatement=False + insert_overwrite=True, + disable_restatement=False, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig(materialized=Materialization.INCREMENTAL).model_kind( context - ) == IncrementalUnmanagedKind(insert_overwrite=True, disable_restatement=False) + ) == IncrementalUnmanagedKind( + insert_overwrite=True, + disable_restatement=False, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, + ) assert ModelConfig(materialized=Materialization.INCREMENTAL, forward_only=False).model_kind( context ) == IncrementalUnmanagedKind( - insert_overwrite=True, disable_restatement=False, forward_only=False + insert_overwrite=True, + disable_restatement=False, + forward_only=False, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( materialized=Materialization.INCREMENTAL, incremental_strategy="append" - ).model_kind(context) == IncrementalUnmanagedKind(disable_restatement=False) + ).model_kind(context) == IncrementalUnmanagedKind( + disable_restatement=False, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, + ) assert ModelConfig( materialized=Materialization.INCREMENTAL, incremental_strategy="append", full_refresh=None - ).model_kind(context) == IncrementalUnmanagedKind(disable_restatement=False) + ).model_kind(context) == IncrementalUnmanagedKind( + disable_restatement=False, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, + ) assert ModelConfig( materialized=Materialization.INCREMENTAL, incremental_strategy="insert_overwrite", partition_by={"field": "bar", "data_type": "int64"}, ).model_kind(context) == IncrementalUnmanagedKind( - insert_overwrite=True, disable_restatement=False + insert_overwrite=True, + 
disable_restatement=False, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( @@ -345,7 +464,10 @@ def test_model_kind(): partition_by={"field": "bar", "data_type": "int64"}, full_refresh=False, ).model_kind(context) == IncrementalUnmanagedKind( - insert_overwrite=True, disable_restatement=True + insert_overwrite=True, + disable_restatement=True, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( @@ -355,7 +477,10 @@ def test_model_kind(): disable_restatement=True, full_refresh=True, ).model_kind(context) == IncrementalUnmanagedKind( - insert_overwrite=True, disable_restatement=True + insert_overwrite=True, + disable_restatement=True, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( @@ -364,7 +489,10 @@ def test_model_kind(): partition_by={"field": "bar", "data_type": "int64"}, disable_restatement=True, ).model_kind(context) == IncrementalUnmanagedKind( - insert_overwrite=True, disable_restatement=True + insert_overwrite=True, + disable_restatement=True, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ModelConfig( @@ -372,7 +500,11 @@ def test_model_kind(): incremental_strategy="insert_overwrite", auto_restatement_cron="0 0 * * *", ).model_kind(context) == IncrementalUnmanagedKind( - insert_overwrite=True, auto_restatement_cron="0 0 * * *", disable_restatement=False + insert_overwrite=True, + auto_restatement_cron="0 0 * * *", + disable_restatement=False, + on_destructive_change=OnDestructiveChange.IGNORE, + on_additive_change=OnAdditiveChange.IGNORE, ) assert ( @@ -401,6 +533,7 @@ def test_model_kind_snapshot_bigquery(): updated_at_name="updated_at", time_data_type=exp.DataType.build("TIMESTAMPTZ"), dialect="bigquery", + on_destructive_change=OnDestructiveChange.IGNORE, ) # time_data_type is bigquery 
version even though model dialect is DuckDB @@ -419,6 +552,7 @@ def test_model_kind_snapshot_bigquery(): updated_at_name="updated_at", time_data_type=exp.DataType.build("TIMESTAMPTZ"), # bigquery version dialect="duckdb", + on_destructive_change=OnDestructiveChange.IGNORE, ) diff --git a/tests/integrations/github/cicd/test_integration.py b/tests/integrations/github/cicd/test_integration.py index d69311fb3d..e974ea6fc2 100644 --- a/tests/integrations/github/cicd/test_integration.py +++ b/tests/integrations/github/cicd/test_integration.py @@ -313,7 +313,7 @@ def test_merge_pr_has_non_breaking_change( +++ - @@ -16,7 +16,8 @@ + @@ -17,7 +17,8 @@ SELECT CAST(o.waiter_id AS INT) AS waiter_id, @@ -524,7 +524,7 @@ def test_merge_pr_has_non_breaking_change_diff_start( +++ - @@ -16,7 +16,8 @@ + @@ -17,7 +17,8 @@ SELECT CAST(o.waiter_id AS INT) AS waiter_id, @@ -1047,7 +1047,7 @@ def test_no_merge_since_no_deploy_signal( +++ - @@ -16,7 +16,8 @@ + @@ -17,7 +17,8 @@ SELECT CAST(o.waiter_id AS INT) AS waiter_id, @@ -1247,7 +1247,7 @@ def test_no_merge_since_no_deploy_signal_no_approvers_defined( +++ - @@ -16,7 +16,8 @@ + @@ -17,7 +17,8 @@ SELECT CAST(o.waiter_id AS INT) AS waiter_id, @@ -1429,7 +1429,7 @@ def test_deploy_comment_pre_categorized( +++ - @@ -16,7 +16,8 @@ + @@ -17,7 +17,8 @@ SELECT CAST(o.waiter_id AS INT) AS waiter_id, @@ -2346,7 +2346,7 @@ def test_has_required_approval_but_not_base_branch( +++ - @@ -16,7 +16,8 @@ + @@ -17,7 +17,8 @@ SELECT CAST(o.waiter_id AS INT) AS waiter_id,