4 changes: 4 additions & 0 deletions docs/concepts/models/python_models.md
@@ -102,6 +102,8 @@ For example, pre/post-statements might modify settings or create indexes. Howeve

You can set the `pre_statements` and `post_statements` arguments to a list of SQL strings, SQLGlot expressions, or macro calls to define the model's pre/post-statements.

**Project-level defaults:** You can also define pre/post-statements at the project level using `model_defaults` in your configuration. These will be applied to all models in your project and merged with any model-specific statements. Default statements are executed first, followed by model-specific statements. Learn more about this in the [model configuration reference](../../reference/model_configuration.md#model-defaults).

``` python linenums="1" hl_lines="8-12"
@model(
"db.test_model",
@@ -182,6 +184,8 @@ These can be used, for example, to grant privileges on views of the virtual laye

Similar to pre/post-statements you can set the `on_virtual_update` argument in the `@model` decorator to a list of SQL strings, SQLGlot expressions, or macro calls.

**Project-level defaults:** You can also define on-virtual-update statements at the project level using `model_defaults` in your configuration. These will be applied to all models in your project (including Python models) and merged with any model-specific statements. Default statements are executed first, followed by model-specific statements. Learn more about this in the [model configuration reference](../../reference/model_configuration.md#model-defaults).

``` python linenums="1" hl_lines="8"
@model(
"db.test_model",
2 changes: 2 additions & 0 deletions docs/concepts/models/seed_models.md
@@ -203,6 +203,8 @@ ALTER SESSION SET TIMEZONE = 'PST';

Seed models also support on-virtual-update statements, which are executed after the completion of the [Virtual Update](#virtual-update).

**Project-level defaults:** You can also define on-virtual-update statements at the project level using `model_defaults` in your configuration. These will be applied to all models in your project (including seed models) and merged with any model-specific statements. Default statements are executed first, followed by model-specific statements. Learn more about this in the [model configuration reference](../../reference/model_configuration.md#model-defaults).

These must be enclosed within an `ON_VIRTUAL_UPDATE_BEGIN;` ...; `ON_VIRTUAL_UPDATE_END;` block:

```sql linenums="1" hl_lines="8-13"
4 changes: 4 additions & 0 deletions docs/concepts/models/sql_models.md
@@ -67,6 +67,8 @@ For example, pre/post-statements might modify settings or create a table index.

Pre/post-statements are just standard SQL commands located before/after the model query. They must end with a semi-colon, and the model query must end with a semi-colon if a post-statement is present. The [example above](#example) contains both pre- and post-statements.

**Project-level defaults:** You can also define pre/post-statements at the project level using `model_defaults` in your configuration. These will be applied to all models in your project and merged with any model-specific statements. Default statements are executed first, followed by model-specific statements. Learn more about this in the [model configuration reference](../../reference/model_configuration.md#model-defaults).
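As a rough illustration of that ordering, the sketch below models the merge in plain Python. It assumes statements are plain strings and skips the SQL parsing SQLMesh performs; `merge_statements` is a hypothetical helper, not part of the SQLMesh API.

```python
from typing import List, Optional


def merge_statements(
    defaults: Optional[List[str]], model_specific: Optional[List[str]]
) -> List[str]:
    """Return project-level defaults followed by the model's own statements.

    Hypothetical helper mirroring the documented order: defaults first,
    then model-specific statements. SQL parsing is intentionally omitted.
    """
    return list(defaults or []) + list(model_specific or [])


# Project-level defaults, as they might appear under `model_defaults`.
default_pre = ["SET memory_limit = '10GB'"]
# A model's own pre-statement.
model_pre = ["CREATE TEMP TABLE tmp AS SELECT 1 AS id"]

merged = merge_statements(default_pre, model_pre)
print(merged)
# → ["SET memory_limit = '10GB'", 'CREATE TEMP TABLE tmp AS SELECT 1 AS id']
```

A model that defines no statements of its own simply receives the defaults, e.g. `merge_statements(default_pre, None)`.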

!!! warning

Pre/post-statements are evaluated twice: when a model's table is created and when its query logic is evaluated. Executing statements more than once can have unintended side-effects, so you can [conditionally execute](../macros/sqlmesh_macros.md#prepost-statements) them based on SQLMesh's [runtime stage](../macros/macro_variables.md#runtime-variables).
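The sketch below simulates why an unguarded statement can run twice. It treats table creation and query evaluation as two stages named `creating` and `evaluating` (assumed here to mirror `@runtime_stage` values) and shows that a stage-guarded statement, analogous to `@IF(@runtime_stage = 'evaluating', ...)`, runs only once. `Statement`, `run_stage`, and the `audit_log` table are hypothetical; this is not how SQLMesh evaluates macros internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Statement:
    """A hypothetical post-statement with an optional runtime-stage guard."""

    sql: str
    only_stage: Optional[str] = None  # None means "run at every stage"


def run_stage(stage: str, statements: List[Statement]) -> List[str]:
    """Return the SQL that would execute at `stage` (simplified simulation)."""
    return [s.sql for s in statements if s.only_stage in (None, stage)]


post_statements = [
    # Unguarded: executes during both table creation and query evaluation.
    Statement("INSERT INTO audit_log VALUES ('touched')"),
    # Guarded, like @IF(@runtime_stage = 'evaluating', ANALYZE @this_model).
    Statement("ANALYZE @this_model", only_stage="evaluating"),
]

executed: List[str] = []
for stage in ("creating", "evaluating"):
    executed.extend(run_stage(stage, post_statements))

print(executed.count("INSERT INTO audit_log VALUES ('touched')"))  # 2
print(executed.count("ANALYZE @this_model"))  # 1
```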
@@ -97,6 +99,8 @@ The optional on-virtual-update statements allow you to execute SQL commands afte

These can be used, for example, to grant privileges on views of the virtual layer.

**Project-level defaults:** You can also define on-virtual-update statements at the project level using `model_defaults` in your configuration. These will be applied to all models in your project and merged with any model-specific statements. Default statements are executed first, followed by model-specific statements. Learn more about this in the [model configuration reference](../../reference/model_configuration.md#model-defaults).

These SQL statements must be enclosed within an `ON_VIRTUAL_UPDATE_BEGIN;` ...; `ON_VIRTUAL_UPDATE_END;` block like this:

```sql linenums="1" hl_lines="10-15"
39 changes: 39 additions & 0 deletions docs/reference/model_configuration.md
@@ -136,6 +136,42 @@ You can also use the `@model_kind_name` variable to fine-tune control over `phys
)
```

You can also define `pre_statements`, `post_statements`, and `on_virtual_update` statements at the project level, and they will be applied to all models. These default statements are merged with any model-specific statements, with the default statements executing before the model-specific ones.

=== "YAML"

    ```yaml linenums="1"
    model_defaults:
      dialect: duckdb
      pre_statements:
        - "SET timeout = 300000"
      post_statements:
        - "@IF(@runtime_stage = 'evaluating', ANALYZE @this_model)"
      on_virtual_update:
        - "GRANT SELECT ON @this_model TO ROLE analyst_role"
    ```

=== "Python"

    ```python linenums="1"
    from sqlmesh.core.config import Config, ModelDefaultsConfig

    config = Config(
        model_defaults=ModelDefaultsConfig(
            dialect="duckdb",
            pre_statements=[
                "SET timeout = 300000",
            ],
            post_statements=[
                "@IF(@runtime_stage = 'evaluating', ANALYZE @this_model)",
            ],
            on_virtual_update=[
                "GRANT SELECT ON @this_model TO ROLE analyst_role",
            ],
        ),
    )
    ```
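Because project-level defaults become part of each model's definition, editing them changes every model they apply to; the tests in this change expect an edited default pre-statement to make the next plan treat the model as directly modified. The sketch below illustrates the idea with a hypothetical `fingerprint` function built on `hashlib.sha256`. It is a simplified stand-in, not SQLMesh's actual snapshot fingerprinting.

```python
import hashlib
from typing import List


def fingerprint(query: str, pre_statements: List[str], post_statements: List[str]) -> str:
    """Hash a model's query together with its merged statements (simplified)."""
    # The "--post--" marker keeps pre- and post-statements from being confused.
    payload = "\n".join([query, *pre_statements, "--post--", *post_statements])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


query = "SELECT 1 AS id, 'test' AS status"
post = ["ANALYZE @this_model"]

before = fingerprint(query, ["SET memory_limit = '10GB'"], post)
after = fingerprint(query, ["SET memory_limit = '5GB'"], post)

# The query is unchanged, but the edited default pre-statement changes the hash,
# so a change-detection step comparing fingerprints would flag the model.
print(before != after)  # True
```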


The SQLMesh project-level `model_defaults` key supports the following options, described in the [general model properties](#general-model-properties) table above:

@@ -155,6 +191,9 @@ The SQLMesh project-level `model_defaults` key supports the following options, d
- allow_partials
- enabled
- interval_unit
- pre_statements (described [here](../concepts/models/sql_models.md#pre--and-post-statements))
- post_statements (described [here](../concepts/models/sql_models.md#pre--and-post-statements))
- on_virtual_update (described [here](../concepts/models/sql_models.md#on-virtual-update-statements))
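In YAML, each of these three options should be a list of SQL strings; the tests in this change expect a non-string entry such as `313` to fail with a `TypeError`. The sketch below shows that kind of check in isolation. `validate_statements` is a hypothetical helper and its messages are illustrative only; SQLMesh's config field also accepts SQLGlot expressions when configured from Python, which this sketch does not model.

```python
from typing import Any, List


def validate_statements(name: str, value: Any) -> List[str]:
    """Hypothetical check that a statements option is a list of SQL strings."""
    if value is None:
        return []
    if not isinstance(value, list):
        raise TypeError(f"{name} must be a list, got {type(value).__name__}")
    for item in value:
        if not isinstance(item, str):
            raise TypeError(f"{name}: expected str instance, {type(item).__name__} found")
    return value


print(validate_statements("pre_statements", ["SET threads = 4"]))  # ['SET threads = 4']

try:
    validate_statements("pre_statements", [313])  # YAML `- 313` loads as an int
except TypeError as err:
    print(err)  # pre_statements: expected str instance, int found
```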


### Model Naming
7 changes: 7 additions & 0 deletions sqlmesh/core/config/model.py
@@ -2,6 +2,7 @@

import typing as t

from sqlglot import exp
from sqlmesh.core.dialect import parse_one, extract_func_call
from sqlmesh.core.config.base import BaseConfig
from sqlmesh.core.model.kind import (
@@ -41,6 +42,9 @@ class ModelDefaultsConfig(BaseConfig):
        allow_partials: Whether the models can process partial (incomplete) data intervals.
        enabled: Whether the models are enabled.
        interval_unit: The temporal granularity of the models' data intervals. By default computed from cron.
        pre_statements: The list of SQL statements that get executed before a model runs.
        post_statements: The list of SQL statements that get executed after a model runs.
        on_virtual_update: The list of SQL statements to be executed after the virtual update.

    """

@@ -61,6 +65,9 @@ class ModelDefaultsConfig(BaseConfig):
    interval_unit: t.Optional[t.Union[str, IntervalUnit]] = None
    enabled: t.Optional[t.Union[str, bool]] = None
    formatting: t.Optional[t.Union[str, bool]] = None
    pre_statements: t.Optional[t.List[t.Union[str, exp.Expression]]] = None
    post_statements: t.Optional[t.List[t.Union[str, exp.Expression]]] = None
    on_virtual_update: t.Optional[t.List[t.Union[str, exp.Expression]]] = None

    _model_kind_validator = model_kind_validator
    _on_destructive_change_validator = on_destructive_change_validator
18 changes: 18 additions & 0 deletions sqlmesh/core/model/definition.py
@@ -2472,6 +2472,24 @@ def _create_model(

    statements: t.List[t.Union[exp.Expression, t.Tuple[exp.Expression, bool]]] = []

    # Merge default pre_statements with model-specific pre_statements
    if "pre_statements" in defaults:
        kwargs["pre_statements"] = [
            exp.maybe_parse(stmt, dialect=dialect) for stmt in defaults["pre_statements"]
        ] + kwargs.get("pre_statements", [])

    # Merge default post_statements with model-specific post_statements
    if "post_statements" in defaults:
        kwargs["post_statements"] = [
            exp.maybe_parse(stmt, dialect=dialect) for stmt in defaults["post_statements"]
        ] + kwargs.get("post_statements", [])

    # Merge default on_virtual_update with model-specific on_virtual_update
    if "on_virtual_update" in defaults:
        kwargs["on_virtual_update"] = [
            exp.maybe_parse(stmt, dialect=dialect) for stmt in defaults["on_virtual_update"]
        ] + kwargs.get("on_virtual_update", [])

    if "pre_statements" in kwargs:
        statements.extend(kwargs["pre_statements"])
    if "query" in kwargs:
59 changes: 59 additions & 0 deletions tests/core/test_config.py
@@ -676,6 +676,65 @@ def test_load_model_defaults_audits(tmp_path):
    assert config.model_defaults.audits[1][1]["threshold"].this == "1000"


def test_load_model_defaults_statements(tmp_path):
    config_path = tmp_path / "config_model_defaults_statements.yaml"
    with open(config_path, "w", encoding="utf-8") as fd:
        fd.write(
            """
model_defaults:
    dialect: duckdb
    pre_statements:
        - SET memory_limit = '10GB'
        - CREATE TEMP TABLE temp_data AS SELECT 1 as id
    post_statements:
        - DROP TABLE IF EXISTS temp_data
        - ANALYZE @this_model
        - SET memory_limit = '5GB'
    on_virtual_update:
        - UPDATE stats_table SET last_update = CURRENT_TIMESTAMP
"""
        )

    config = load_config_from_paths(
        Config,
        project_paths=[config_path],
    )

    assert config.model_defaults.pre_statements is not None
    assert len(config.model_defaults.pre_statements) == 2
    assert isinstance(exp.maybe_parse(config.model_defaults.pre_statements[0]), exp.Set)
    assert isinstance(exp.maybe_parse(config.model_defaults.pre_statements[1]), exp.Create)

    assert config.model_defaults.post_statements is not None
    assert len(config.model_defaults.post_statements) == 3
    assert isinstance(exp.maybe_parse(config.model_defaults.post_statements[0]), exp.Drop)
    assert isinstance(exp.maybe_parse(config.model_defaults.post_statements[1]), exp.Analyze)
    assert isinstance(exp.maybe_parse(config.model_defaults.post_statements[2]), exp.Set)

    assert config.model_defaults.on_virtual_update is not None
    assert len(config.model_defaults.on_virtual_update) == 1
    assert isinstance(exp.maybe_parse(config.model_defaults.on_virtual_update[0]), exp.Update)


def test_load_model_defaults_validation_statements(tmp_path):
    config_path = tmp_path / "config_model_defaults_statements_wrong.yaml"
    with open(config_path, "w", encoding="utf-8") as fd:
        fd.write(
            """
model_defaults:
    dialect: duckdb
    pre_statements:
        - 313
"""
        )

    # A non-string statement must be rejected when the config is loaded
    with pytest.raises(TypeError, match=r"expected str instance, int found"):
        load_config_from_paths(
            Config,
            project_paths=[config_path],
        )


def test_scheduler_config(tmp_path_factory):
    config_path = tmp_path_factory.mktemp("yaml_config") / "config.yaml"
    with open(config_path, "w", encoding="utf-8") as fd:
136 changes: 136 additions & 0 deletions tests/core/test_context.py
@@ -2731,3 +2731,139 @@ def _get_missing_intervals(name: str) -> t.List[t.Tuple[datetime, datetime]]:
    assert context.engine_adapter.fetchall(
        "select min(start_dt), max(end_dt) from sqlmesh_example__pr_env.unrelated_monthly_model"
    ) == [(to_datetime("2020-01-01 00:00:00"), to_datetime("2020-01-31 23:59:59.999999"))]


def test_defaults_pre_post_statements(tmp_path: Path):
    config_path = tmp_path / "config.yaml"
    models_path = tmp_path / "models"
    models_path.mkdir()

    # Create config with default statements
    config_path.write_text(
        """
model_defaults:
    dialect: duckdb
    pre_statements:
        - SET memory_limit = '10GB'
        - SET threads = @var1
    post_statements:
        - ANALYZE @this_model
variables:
    var1: 4
"""
    )

    # Create a model
    model_path = models_path / "test_model.sql"
    model_path.write_text(
        """
MODEL (
    name test_model,
    kind FULL
);

SELECT 1 as id, 'test' as status;
"""
    )

    ctx = Context(paths=[tmp_path])

    # Initial plan and apply
    initial_plan = ctx.plan(auto_apply=True, no_prompts=True)
    assert len(initial_plan.new_snapshots) == 1

    snapshot = list(initial_plan.new_snapshots)[0]
    model = snapshot.model

    # Verify statements are in the model and the Python environment has been populated
    assert len(model.pre_statements) == 2
    assert len(model.post_statements) == 1
    assert model.python_env[c.SQLMESH_VARS].payload == "{'var1': 4}"

    # Verify the statements contain the expected SQL
    assert model.pre_statements[0].sql() == "SET memory_limit = '10GB'"
    assert model.render_pre_statements()[0].sql() == "SET \"memory_limit\" = '10GB'"
    assert model.pre_statements[1].sql() == "SET threads = @var1"
    assert model.render_pre_statements()[1].sql() == 'SET "threads" = 4'

    # Update config to change pre_statement
    config_path.write_text(
        """
model_defaults:
    dialect: duckdb
    pre_statements:
        - SET memory_limit = '5GB' # Changed value
    post_statements:
        - ANALYZE @this_model
"""
    )

    # Reload context and create new plan
    ctx = Context(paths=[tmp_path])
    updated_plan = ctx.plan(no_prompts=True)

    # Should detect a change due to different pre_statements
    assert len(updated_plan.directly_modified) == 1

    # Apply the plan
    ctx.apply(updated_plan)

    # Reload the models to get the updated version
    ctx.load()
    new_model = ctx.models['"test_model"']

    # Verify updated statements
    assert len(new_model.pre_statements) == 1
    assert new_model.pre_statements[0].sql() == "SET memory_limit = '5GB'"
    assert new_model.render_pre_statements()[0].sql() == "SET \"memory_limit\" = '5GB'"

    # Verify the change was detected by the plan
    assert len(updated_plan.directly_modified) == 1


def test_model_defaults_statements_with_on_virtual_update(tmp_path: Path):
    config_path = tmp_path / "config.yaml"
    models_path = tmp_path / "models"
    models_path.mkdir()

    # Create config with on_virtual_update
    config_path.write_text(
        """
model_defaults:
    dialect: duckdb
    on_virtual_update:
        - SELECT 'Model-default virtual update' AS message
"""
    )

    # Create a model with its own on_virtual_update as well
    model_path = models_path / "test_model.sql"
    model_path.write_text(
        """
MODEL (
    name test_model,
    kind FULL
);

SELECT 1 as id, 'test' as name;

ON_VIRTUAL_UPDATE_BEGIN;
SELECT 'Model-specific update' AS message;
ON_VIRTUAL_UPDATE_END;
"""
    )

    ctx = Context(paths=[tmp_path])

    # Plan and apply
    plan = ctx.plan(auto_apply=True, no_prompts=True)

    snapshot = list(plan.new_snapshots)[0]
    model = snapshot.model

    # Verify both default and model-specific on_virtual_update statements
    assert len(model.on_virtual_update) == 2

    # Default statements should come first
    assert model.on_virtual_update[0].sql() == "SELECT 'Model-default virtual update' AS message"
    assert model.on_virtual_update[1].sql() == "SELECT 'Model-specific update' AS message"