Chore: Split migration script implementations into DDL and DML by izeigerman · Pull Request #5307 · SQLMesh/sqlmesh

izeigerman · 2025-09-05T20:34:40Z

The split allows us to skip unnecessary DML operations during the initial state setup, which significantly speeds up initialization when using engines like BigQuery.

In my test, the init time dropped from ~1m30s to ~25s.

Copilot

Pull Request Overview

This PR splits migration script implementations into two functions: migrate_ddl for schema changes and migrate_dml for data manipulation. This separation allows the system to skip unnecessary data operations during initial setup when tables are empty, significantly improving initialization performance (from ~1m30s to ~25s in BigQuery).

Key Changes

- Split the single `migrate` function into `migrate_ddl` and `migrate_dml` functions across all migration files
- Updated the migration execution logic to run DML operations only when state tables already exist
- Fixed field ordering consistency in environment queries by sorting field names
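The conditional execution described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `sqlmesh/core/state_sync/db/migrator.py`: the `Migration` dataclass, `make_migration`, `run_migrations`, and the `state_tables_exist` flag are hypothetical names, and the real migration functions take different arguments. Running each script's DDL step before its DML step is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Migration:
    """Hypothetical stand-in for one migration script, split into two steps."""

    name: str
    migrate_ddl: Callable[[List[str]], None]  # schema changes (CREATE / ALTER)
    migrate_dml: Callable[[List[str]], None]  # data rewrites (UPDATE / INSERT)


def make_migration(name: str) -> Migration:
    # Each step only records that it ran, so the execution order is observable.
    return Migration(
        name=name,
        migrate_ddl=lambda log: log.append(f"{name}:ddl"),
        migrate_dml=lambda log: log.append(f"{name}:dml"),
    )


def run_migrations(
    migrations: List[Migration], state_tables_exist: bool
) -> List[str]:
    """Always run the DDL step; run the DML step only if state already exists."""
    log: List[str] = []
    for migration in migrations:
        migration.migrate_ddl(log)
        # On a fresh install the state tables are empty, so there are no rows
        # to rewrite and the DML step can be skipped entirely.
        if state_tables_exist:
            migration.migrate_dml(log)
    return log


scripts = [make_migration("v0001"), make_migration("v0002")]
fresh = run_migrations(scripts, state_tables_exist=False)
existing = run_migrations(scripts, state_tables_exist=True)
```

On a fresh install, `fresh` contains only the two DDL entries, while `existing` interleaves DDL and DML for each script. Skipping the DML calls is where the time savings reported in the description would come from on engines where each statement is slow to issue.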

Reviewed Changes

Copilot reviewed 98 out of 98 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `sqlmesh/core/state_sync/db/migrator.py` | Modified migration execution logic to separate DDL and DML operations |
| `sqlmesh/core/state_sync/db/environment.py` | Fixed field ordering consistency by sorting Environment field names |
| `sqlmesh/core/context.py` | Added status message for project initialization |
| 95 migration files | Split `migrate` function into `migrate_ddl` and `migrate_dml` functions |

eakmanrq · 2025-09-05T21:00:44Z

sqlmesh/core/state_sync/db/environment.py

    def _environment_summmary_from_row(self, row: t.Tuple[str, ...]) -> EnvironmentSummary:
        return EnvironmentSummary(
-            **{field: row[i] for i, field in enumerate(EnvironmentSummary.all_fields())}
+            **{field: row[i] for i, field in enumerate(sorted(EnvironmentSummary.all_fields()))}


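Why is sorting needed?

izeigerman: Because `all_fields()` returns a set, so its iteration order is not guaranteed to match the order of the values in the row.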
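A minimal sketch of why a set needs a fixed order before it is used to index into a row. The field names, `all_fields`, `select_row`, and `row_to_kwargs` below are hypothetical stand-ins, not the actual `EnvironmentSummary` definition or the SQLMesh query code. The point is that both the side that builds the column list and the side that decodes the row must agree on one deterministic order, and `sorted(...)` provides that.

```python
from typing import Any, Dict, List, Set, Tuple


def all_fields() -> Set[str]:
    # Hypothetical field names; a set carries no ordering guarantee.
    return {"name", "start_at", "end_at", "plan_id"}


def select_row(record: Dict[str, Any], columns: List[str]) -> Tuple[Any, ...]:
    # Stand-in for a SELECT: values come back in the order the columns were listed.
    return tuple(record[column] for column in columns)


def row_to_kwargs(row: Tuple[Any, ...]) -> Dict[str, Any]:
    # Decode by position, using the same sorted order as the query side.
    return {field: row[i] for i, field in enumerate(sorted(all_fields()))}


record = {"name": "prod", "start_at": "2025-01-01", "end_at": None, "plan_id": "abc"}
row = select_row(record, sorted(all_fields()))
decoded = row_to_kwargs(row)
```

With both sides sorted, `decoded` round-trips back to `record`. If either side iterated the raw set instead, a value could land under the wrong field name whenever the two iteration orders differed.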

Chore: Split migration script implementations into DDL and DML

732601e

izeigerman requested review from a team and Copilot September 5, 2025 20:34

Copilot AI reviewed Sep 5, 2025

fix tests

e6cad2b

eakmanrq reviewed Sep 5, 2025

eakmanrq approved these changes Sep 5, 2025

migrate_ddl -> migrate_schemas, migrate_dml -> migrate_rows

b10a32a

izeigerman merged commit c40d791 into main Sep 5, 2025
36 checks passed

izeigerman deleted the chore-migrate-split-ddl-dml branch September 5, 2025 23:01

izeigerman merged 3 commits into main from chore-migrate-split-ddl-dml
