Commit 705a670

Merge remote-tracking branch 'upstream/main' into feat/add-fabric-engine
2 parents e693baf + 3ee4e9f commit 705a670

67 files changed

+1604
-240
lines changed

.pre-commit-config.yaml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,3 +23,8 @@ repos:
2323
files: *files
2424
require_serial: true
2525
exclude: ^(tests/fixtures)
26+
- id: valid migrations
27+
name: valid migrations
28+
entry: tooling/validating_migration_numbers.sh
29+
language: system
30+
pass_filenames: false

docs/concepts/models/python_models.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -102,6 +102,8 @@ For example, pre/post-statements might modify settings or create indexes. Howeve
102102

103103
You can set the `pre_statements` and `post_statements` arguments to a list of SQL strings, SQLGlot expressions, or macro calls to define the model's pre/post-statements.
104104

105+
**Project-level defaults:** You can also define pre/post-statements at the project level using `model_defaults` in your configuration. These will be applied to all models in your project and merged with any model-specific statements. Default statements are executed first, followed by model-specific statements. Learn more about this in the [model configuration reference](../../reference/model_configuration.md#model-defaults).
106+
105107
``` python linenums="1" hl_lines="8-12"
106108
@model(
107109
"db.test_model",
@@ -182,6 +184,8 @@ These can be used, for example, to grant privileges on views of the virtual laye
182184

183185
Similar to pre/post-statements you can set the `on_virtual_update` argument in the `@model` decorator to a list of SQL strings, SQLGlot expressions, or macro calls.
184186

187+
**Project-level defaults:** You can also define on-virtual-update statements at the project level using `model_defaults` in your configuration. These will be applied to all models in your project (including Python models) and merged with any model-specific statements. Default statements are executed first, followed by model-specific statements. Learn more about this in the [model configuration reference](../../reference/model_configuration.md#model-defaults).
188+
185189
``` python linenums="1" hl_lines="8"
186190
@model(
187191
"db.test_model",

docs/concepts/models/seed_models.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -203,6 +203,8 @@ ALTER SESSION SET TIMEZONE = 'PST';
203203

204204
Seed models also support on-virtual-update statements, which are executed after the completion of the [Virtual Update](#virtual-update).
205205

206+
**Project-level defaults:** You can also define on-virtual-update statements at the project level using `model_defaults` in your configuration. These will be applied to all models in your project (including seed models) and merged with any model-specific statements. Default statements are executed first, followed by model-specific statements. Learn more about this in the [model configuration reference](../../reference/model_configuration.md#model-defaults).
207+
206208
These must be enclosed within an `ON_VIRTUAL_UPDATE_BEGIN;` ...; `ON_VIRTUAL_UPDATE_END;` block:
207209

208210
```sql linenums="1" hl_lines="8-13"

docs/concepts/models/sql_models.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -67,6 +67,8 @@ For example, pre/post-statements might modify settings or create a table index.
6767

6868
Pre/post-statements are just standard SQL commands located before/after the model query. They must end with a semi-colon, and the model query must end with a semi-colon if a post-statement is present. The [example above](#example) contains both pre- and post-statements.
6969

70+
**Project-level defaults:** You can also define pre/post-statements at the project level using `model_defaults` in your configuration. These will be applied to all models in your project and merged with any model-specific statements. Default statements are executed first, followed by model-specific statements. Learn more about this in the [model configuration reference](../../reference/model_configuration.md#model-defaults).
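The merge order described above can be sketched in plain Python. This is an illustration only: `merge_statements` is a hypothetical helper, not part of SQLMesh's API, and the statement strings are invented for the example.

```python
from typing import List


def merge_statements(default_statements: List[str], model_statements: List[str]) -> List[str]:
    """Return statements in execution order.

    Assumption for illustration: project-level defaults run first,
    followed by the model-specific statements, as the text describes.
    """
    return [*default_statements, *model_statements]


# Hypothetical project-level default and model-specific pre-statements.
project_defaults = ["SET timezone = 'UTC'"]
model_specific = ["SET statement_timeout = 60000"]

execution_order = merge_statements(project_defaults, model_specific)
for position, statement in enumerate(execution_order, start=1):
    print(position, statement)
```

The same ordering would apply to post-statements and on-virtual-update statements, assuming they are merged the same way.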
71+
7072
!!! warning
7173

7274
Pre/post-statements are evaluated twice: when a model's table is created and when its query logic is evaluated. Executing statements more than once can have unintended side-effects, so you can [conditionally execute](../macros/sqlmesh_macros.md#prepost-statements) them based on SQLMesh's [runtime stage](../macros/macro_variables.md#runtime-variables).
@@ -97,6 +99,8 @@ The optional on-virtual-update statements allow you to execute SQL commands afte
9799

98100
These can be used, for example, to grant privileges on views of the virtual layer.
99101

102+
**Project-level defaults:** You can also define on-virtual-update statements at the project level using `model_defaults` in your configuration. These will be applied to all models in your project and merged with any model-specific statements. Default statements are executed first, followed by model-specific statements. Learn more about this in the [model configuration reference](../../reference/model_configuration.md#model-defaults).
103+
100104
These SQL statements must be enclosed within an `ON_VIRTUAL_UPDATE_BEGIN;` ...; `ON_VIRTUAL_UPDATE_END;` block like this:
101105

102106
```sql linenums="1" hl_lines="10-15"

docs/guides/configuration.md

Lines changed: 104 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -320,10 +320,14 @@ The cache directory is automatically created if it doesn't exist. You can clear
320320

321321
SQLMesh creates schemas, physical tables, and views in the data warehouse/engine. Learn more about why and how SQLMesh creates schemas in the ["Why does SQLMesh create schemas?" FAQ](../faq/faq.md#schema-question).
322322

323-
The default SQLMesh behavior described in the FAQ is appropriate for most deployments, but you can override where SQLMesh creates physical tables and views with the `physical_schema_mapping`, `environment_suffix_target`, and `environment_catalog_mapping` configuration options. These options are in the [environments](../reference/configuration.md#environments) section of the configuration reference page.
323+
The default SQLMesh behavior described in the FAQ is appropriate for most deployments, but you can override *where* SQLMesh creates physical tables and views with the `physical_schema_mapping`, `environment_suffix_target`, and `environment_catalog_mapping` configuration options.
324+
325+
You can also override *what* the physical tables are called by using the `physical_table_naming_convention` option.
326+
327+
These options are in the [environments](../reference/configuration.md#environments) section of the configuration reference page.
324328

325329
#### Physical table schemas
326-
By default, SQLMesh creates physical tables for a model with a naming convention of `sqlmesh__[model schema]`.
330+
By default, SQLMesh creates physical schemas for a model with a naming convention of `sqlmesh__[model schema]`.
327331

328332
This can be overridden on a per-schema basis using the `physical_schema_mapping` option, which removes the `sqlmesh__` prefix and uses the [regex pattern](https://docs.python.org/3/library/re.html#regular-expression-syntax) you provide to map the schemas defined in your model to their corresponding physical schemas.
329333

@@ -436,6 +440,104 @@ Given the example of a model called `my_schema.users` with a default catalog of
436440
- Using `environment_suffix_target: catalog` only works on engines that support querying across different catalogs. If your engine does not support cross-catalog queries then you will need to use `environment_suffix_target: schema` or `environment_suffix_target: table` instead.
437441
- Automatic catalog creation is not supported on all engines even if they support cross-catalog queries. For engines where it is not supported, the catalogs must be managed externally from SQLMesh and exist prior to invoking SQLMesh.
438442

443+
#### Physical table naming convention
444+
445+
Out of the box, SQLMesh has the following defaults set:
446+
447+
- `environment_suffix_target: schema`
448+
- `physical_table_naming_convention: schema_and_table`
449+
- no `physical_schema_mapping` overrides, so a `sqlmesh__<model schema>` physical schema will be created for each model schema
450+
451+
This means that given a catalog of `warehouse` and a model named `finance_mart.transaction_events_over_threshold`, SQLMesh will create physical tables using the following convention:
452+
453+
```
454+
# <catalog>.sqlmesh__<schema>.<schema>__<table>__<fingerprint>
455+
456+
warehouse.sqlmesh__finance_mart.finance_mart__transaction_events_over_threshold__<fingerprint>
457+
```
458+
459+
This deliberately contains some redundancy with the *model* schema as it's repeated at the physical layer in both the physical schema name as well as the physical table name.
460+
461+
This default exists to make physical table names portable between different configurations. For example, if you define a `physical_schema_mapping` that maps all models to the same physical schema, no naming conflicts arise because the model schema is also included in the table name.
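A minimal sketch of how the default `schema_and_table` name is assembled, assuming the layout shown above. The function `default_physical_name` and the fingerprint value are hypothetical; SQLMesh's real implementation is not shown here.

```python
def default_physical_name(catalog: str, schema: str, table: str, fingerprint: str) -> str:
    """Build <catalog>.sqlmesh__<schema>.<schema>__<table>__<fingerprint>.

    Illustration only: assumes no `physical_schema_mapping` overrides.
    """
    physical_schema = f"sqlmesh__{schema}"
    physical_table = f"{schema}__{table}__{fingerprint}"
    return f"{catalog}.{physical_schema}.{physical_table}"


# "1234567890" is a made-up stand-in for the fingerprint.
name = default_physical_name(
    catalog="warehouse",
    schema="finance_mart",
    table="transaction_events_over_threshold",
    fingerprint="1234567890",
)
print(name)
```

Note that the model schema appears twice in the result, which is the redundancy the text refers to.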
462+
463+
##### Table only
464+
465+
Some engines have object name length limitations which cause them to [silently truncate](https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS) table and view names that exceed this limit. This behaviour breaks SQLMesh, so we raise a runtime error if we detect the engine would silently truncate the name of the table we are trying to create.
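As a rough illustration of this kind of check, the sketch below assumes PostgreSQL's default limit of 63 bytes per identifier. `check_identifier_length` is a hypothetical helper and does not reflect how SQLMesh actually performs the check.

```python
POSTGRES_MAX_IDENTIFIER_BYTES = 63  # PostgreSQL default (NAMEDATALEN - 1)


def check_identifier_length(name: str, max_bytes: int = POSTGRES_MAX_IDENTIFIER_BYTES) -> None:
    """Raise instead of letting the engine silently truncate the name."""
    actual = len(name.encode("utf-8"))
    if actual > max_bytes:
        raise ValueError(
            f"Identifier '{name}' is {actual} bytes long, "
            f"which exceeds the limit of {max_bytes} bytes"
        )


# A made-up physical table name in the default `schema_and_table` layout.
candidate = "finance_mart__transaction_events_over_threshold__1234567890"
check_identifier_length(candidate)  # within the limit, so nothing is raised
```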
466+
467+
Having redundancy in the physical table names does reduce the number of characters that can be utilised in model names. To increase the number of characters available to model names, you can use `physical_table_naming_convention` like so:
468+
469+
=== "YAML"
470+
471+
```yaml linenums="1"
472+
physical_table_naming_convention: table_only
473+
```
474+
475+
=== "Python"
476+
477+
```python linenums="1"
478+
from sqlmesh.core.config import Config, ModelDefaultsConfig, TableNamingConvention
479+
480+
config = Config(
481+
model_defaults=ModelDefaultsConfig(dialect=<dialect>),
482+
physical_table_naming_convention=TableNamingConvention.TABLE_ONLY,
483+
)
484+
```
485+
486+
This will cause SQLMesh to omit the model schema from the table name and generate physical names that look like (using the above example):
487+
```
488+
# <catalog>.sqlmesh__<schema>.<table>__<fingerprint>
489+
490+
warehouse.sqlmesh__finance_mart.transaction_events_over_threshold__<fingerprint>
491+
```
492+
493+
Notice that the model schema name is no longer part of the physical table name. This allows for slightly longer model names on engines with low identifier length limits, which may be useful for your project.
494+
495+
In this configuration, it is your responsibility to ensure that any schema overrides in `physical_schema_mapping` result in each model schema getting mapped to a unique physical schema.
496+
497+
For example, the following configuration will cause **data corruption**:
498+
499+
```yaml
500+
physical_table_naming_convention: table_only
501+
physical_schema_mapping:
502+
'.*': sqlmesh
503+
```
504+
505+
This is because every model schema is mapped to the same physical schema but the model schema name is omitted from the physical table name.
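The collision can be sketched as follows. `table_only_name` is a hypothetical helper, the model names are invented, and the fingerprint is left as a placeholder, so this only shows that the schema no longer distinguishes the two names.

```python
def table_only_name(physical_schema: str, model_name: str) -> str:
    """Build <physical schema>.<table>__<fingerprint> for a `schema.table` model name.

    Illustration only: the model schema is dropped, as with `table_only`.
    """
    _model_schema, table = model_name.split(".", 1)
    return f"{physical_schema}.{table}__<fingerprint>"


# Both model schemas are mapped to the same physical schema, `sqlmesh`.
first = table_only_name("sqlmesh", "finance_mart.events")
second = table_only_name("sqlmesh", "sales_mart.events")

print(first)
print(second)
print(first == second)
```

With the default `schema_and_table` convention, the two names would still differ because each would carry its own model schema as a prefix.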
506+
507+
##### MD5 hash
508+
509+
If you *still* need more characters, you can set `physical_table_naming_convention: hash_md5` like so:
510+
511+
=== "YAML"
512+
513+
```yaml linenums="1"
514+
physical_table_naming_convention: hash_md5
515+
```
516+
517+
=== "Python"
518+
519+
```python linenums="1"
520+
from sqlmesh.core.config import Config, ModelDefaultsConfig, TableNamingConvention
521+
522+
config = Config(
523+
model_defaults=ModelDefaultsConfig(dialect=<dialect>),
524+
physical_table_naming_convention=TableNamingConvention.HASH_MD5,
525+
)
526+
```
527+
528+
This will cause SQLMesh to generate physical names that are always 45-50 characters in length and look something like:
529+
530+
```
531+
# sqlmesh_md5__<hash of what we would have generated using 'schema_and_table'>
532+
533+
sqlmesh_md5__d3b07384d113edec49eaa6238ad5ff00
534+
535+
# or, for a dev preview
536+
sqlmesh_md5__d3b07384d113edec49eaa6238ad5ff00__dev
537+
```
538+
539+
The downside is that it becomes much more difficult to determine which table corresponds to which model just by looking at the database with a SQL client. However, the table names have a predictable length, so there are no longer any surprises with identifiers exceeding the maximum length at the physical layer.
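The predictable length follows from the fixed size of an MD5 hex digest (32 characters). The sketch below uses Python's standard `hashlib`. The helper `md5_physical_name` is hypothetical, and the exact string SQLMesh hashes is an assumption made for illustration.

```python
import hashlib


def md5_physical_name(schema_and_table_name: str, dev: bool = False) -> str:
    """Build sqlmesh_md5__<md5 hex digest>, with a `__dev` suffix for dev previews.

    Illustration only: the input here stands in for the name that the
    `schema_and_table` convention would have produced.
    """
    digest = hashlib.md5(schema_and_table_name.encode("utf-8")).hexdigest()
    name = f"sqlmesh_md5__{digest}"
    return f"{name}__dev" if dev else name


# Made-up input name with a stand-in fingerprint.
long_name = "finance_mart__transaction_events_over_threshold__1234567890"

print(len(md5_physical_name(long_name)))
print(len(md5_physical_name(long_name, dev=True)))
```

The 13-character prefix plus the 32-character digest gives 45 characters, and the `__dev` suffix adds 5 more, which matches the 45-50 range stated above.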
540+
439541
#### Environment view catalogs
440542

441543
By default, SQLMesh creates an environment view in the same [catalog](../concepts/glossary.md#catalog) as the physical table the view points to. The physical table's catalog is determined by either the catalog specified in the model name or the default catalog defined in the connection.
