Chore: Split migration script implementations into DDL and DML #5307

Merged: izeigerman merged 3 commits into main from chore-migrate-split-ddl-dml on Sep 5, 2025

@izeigerman
Collaborator

The split allows us to skip unnecessary DML operations during the initial state setup, which significantly speeds up initialization when using engines like BigQuery.

In my test, the init time dropped from ~1m30s to ~25s.

@izeigerman requested review from a team and Copilot on September 5, 2025, 20:34
Contributor

Copilot AI left a comment

Pull Request Overview

This PR splits migration script implementations into two functions: migrate_ddl for schema changes and migrate_dml for data manipulation. This separation allows the system to skip unnecessary data operations during initial setup when tables are empty, significantly improving initialization performance (from ~1m30s to ~25s in BigQuery).

Key Changes

  • Split single migrate function into migrate_ddl and migrate_dml functions across all migration files
  • Updated migration execution logic to conditionally run DML operations only when state tables already exist
  • Fixed field ordering consistency in environment queries by sorting field names
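The split described above can be sketched roughly as follows. This is a minimal illustration, not the SQLMesh implementation: the `Migration` tuple, the `run_migrations` and `state_tables_exist` helpers, the `_snapshots` table, and the use of a `sqlite3` connection are all assumptions made for the example. The real `migrate_ddl` and `migrate_dml` signatures are not shown in this conversation. The point is that the check for existing state tables happens before any DDL runs, so a fresh state database only pays for schema changes.

```python
import sqlite3
from typing import Callable, List, NamedTuple


class Migration(NamedTuple):
    """Hypothetical stand-in for one migration script."""

    name: str
    migrate_ddl: Callable[[sqlite3.Connection], None]  # schema changes only
    migrate_dml: Callable[[sqlite3.Connection], None]  # data rewrites only


def state_tables_exist(conn: sqlite3.Connection) -> bool:
    # Assumed check: does the (hypothetical) state table already exist?
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = '_snapshots'"
    ).fetchone()
    return row[0] > 0


def run_migrations(conn: sqlite3.Connection, migrations: List[Migration]) -> List[str]:
    # Decide once, before any DDL creates the tables.
    skip_dml = not state_tables_exist(conn)
    executed = []
    for migration in migrations:
        migration.migrate_ddl(conn)
        executed.append(f"ddl:{migration.name}")
        if not skip_dml:
            # Only worth running when there is existing data to rewrite.
            migration.migrate_dml(conn)
            executed.append(f"dml:{migration.name}")
    return executed


MIGRATIONS = [
    Migration(
        name="create_snapshots",
        migrate_ddl=lambda c: c.execute("CREATE TABLE IF NOT EXISTS _snapshots (name TEXT)"),
        migrate_dml=lambda c: None,
    ),
    Migration(
        name="add_kind",
        migrate_ddl=lambda c: c.execute("ALTER TABLE _snapshots ADD COLUMN kind TEXT"),
        migrate_dml=lambda c: c.execute(
            "UPDATE _snapshots SET kind = 'unknown' WHERE kind IS NULL"
        ),
    ),
]
```

On an empty database this runs only the two DDL steps. On a database that already contains `_snapshots`, both the DDL and DML steps run for every migration.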

Reviewed Changes

Copilot reviewed 98 out of 98 changed files in this pull request and generated no comments.

File | Description
sqlmesh/core/state_sync/db/migrator.py | Modified migration execution logic to separate DDL and DML operations
sqlmesh/core/state_sync/db/environment.py | Fixed field ordering consistency by sorting Environment field names
sqlmesh/core/context.py | Added status message for project initialization
95 migration files | Split migrate function into migrate_ddl and migrate_dml functions

 def _environment_summmary_from_row(self, row: t.Tuple[str, ...]) -> EnvironmentSummary:
     return EnvironmentSummary(
-        **{field: row[i] for i, field in enumerate(EnvironmentSummary.all_fields())}
+        **{field: row[i] for i, field in enumerate(sorted(EnvironmentSummary.all_fields()))}
Collaborator

Why is sorting needed?

Collaborator Author

Because all_fields() returns a set
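To illustrate the point: Python does not guarantee the iteration order of a set, and for strings it can differ between processes because of hash randomization. Sorting gives both the query and the row mapping the same deterministic column order. The sketch below is hypothetical: the field names and the `select_columns` and `summary_from_row` helpers are made up for the example, and it assumes the query selects its columns in the same sorted order.

```python
from typing import Dict, List, Set, Tuple


def all_fields() -> Set[str]:
    # Hypothetical field names; a set has no guaranteed iteration order.
    return {"name", "start_at", "end_at", "plan_id"}


def select_columns() -> List[str]:
    # Column order used when building the query (assumed to be sorted too).
    return sorted(all_fields())


def summary_from_row(row: Tuple[str, ...]) -> Dict[str, str]:
    # Sorting again here yields the same order as select_columns(),
    # so position i in the row always maps to the right field.
    return {field: row[i] for i, field in enumerate(sorted(all_fields()))}


record = {"name": "prod", "start_at": "2025-01-01", "end_at": "2025-01-02", "plan_id": "p1"}
row = tuple(record[column] for column in select_columns())
```

With this setup, `summary_from_row(row)` rebuilds `record` regardless of how the set happens to iterate.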

@izeigerman merged commit c40d791 into main on Sep 5, 2025
36 checks passed
@izeigerman deleted the chore-migrate-split-ddl-dml branch on September 5, 2025, 23:01