Chore: Cache results of get_data_objects by izeigerman · Pull Request #5467 · SQLMesh/sqlmesh

izeigerman · 2025-09-30T22:17:26Z

This reduces the number of INFORMATION_SCHEMA queries from 28 down to 6 on the fresh sushi project. I anticipate equally dramatic improvements in other scenarios.

Copilot

Pull Request Overview

This PR implements caching for the get_data_objects method to reduce INFORMATION_SCHEMA queries from 28 to 6 in the fresh sushi project. The optimization adds an in-memory cache that stores data object lookups and intelligently invalidates entries when DDL operations modify the database schema.

- Adds `_data_object_cache` dictionary to track data object existence/non-existence
- Implements cache invalidation on create/drop operations for tables, views, and other data objects
- Refactors snapshot evaluator methods to distinguish between physical and virtual data objects
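
To make the summary above concrete, here is a minimal sketch of a cache that records both existence and non-existence and drops entries on create/drop. It is an illustration only, not the code from this PR: `CachingAdapter`, `_catalog`, `_query_information_schema`, and `metadata_queries` are hypothetical names, and the "database" is an in-memory dict.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str]  # (schema, object name), both lowercased


class CachingAdapter:
    """Hypothetical stand-in for an engine adapter, not SQLMesh's actual class."""

    def __init__(self) -> None:
        self._catalog: Dict[Key, str] = {}  # fake database: key -> object type
        # A value of None means "looked up and known to be missing".
        self._data_object_cache: Dict[Key, Optional[str]] = {}
        self.metadata_queries = 0  # counts simulated INFORMATION_SCHEMA queries

    def _query_information_schema(self, schema: str, names: List[str]) -> Dict[Key, str]:
        self.metadata_queries += 1
        keys = [(schema, name) for name in names]
        return {key: self._catalog[key] for key in keys if key in self._catalog}

    def get_data_objects(self, schema: str, names: Iterable[str]) -> Dict[str, str]:
        schema = schema.lower()
        wanted = [name.lower() for name in names]
        uncached = [n for n in wanted if (schema, n) not in self._data_object_cache]
        if uncached:
            found = self._query_information_schema(schema, uncached)
            for name in uncached:
                # Record misses too, so repeated lookups do not hit the database.
                self._data_object_cache[(schema, name)] = found.get((schema, name))
        result: Dict[str, str] = {}
        for name in wanted:
            object_type = self._data_object_cache[(schema, name)]
            if object_type is not None:
                result[name] = object_type
        return result

    def create_table(self, schema: str, name: str) -> None:
        key = (schema.lower(), name.lower())
        self._catalog[key] = "table"
        # A cached "missing" entry would now be stale, so drop it.
        self._data_object_cache.pop(key, None)

    def drop_table(self, schema: str, name: str) -> None:
        key = (schema.lower(), name.lower())
        self._catalog.pop(key, None)
        self._data_object_cache.pop(key, None)
```

In this sketch a second lookup of the same name issues no metadata query, while a lookup after `create_table` or `drop_table` goes back to the database once.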

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `sqlmesh/core/engine_adapter/base.py` | Adds caching logic to `get_data_objects`, cache invalidation on DDL operations, and helper functions |
| `sqlmesh/core/snapshot/evaluator.py` | Refactors data object retrieval methods and adds virtual data object pre-fetching for promotion |
| `tests/core/engine_adapter/test_base.py` | Adds comprehensive test coverage for caching functionality and invalidation scenarios |
| `tests/core/test_snapshot_evaluator.py` | Updates test mocks to accommodate new caching behavior |
| `tests/core/engine_adapter/test_snowflake.py` | Fixes test parameter format from identifier objects to string literals |
| `tests/core/engine_adapter/test_athena.py` | Adds cache clearing call to maintain test isolation |

sqlmesh/core/engine_adapter/base.py

Copilot · 2025-09-30T22:18:11Z

tests/core/engine_adapter/test_snowflake.py

+    adapter.drop_managed_table(table_name="foo.bar", exists=False)
+    adapter.drop_managed_table(table_name="foo.bar", exists=True)


The change from exp.parse_identifier("foo") to "foo.bar" string literal appears to be fixing a test issue, but this changes the test from testing a single identifier to a qualified name. Verify this change aligns with the intended test behavior and that the corresponding SQL output changes are expected.

the API was never meant to take identifiers in its argument

sqlmesh/core/engine_adapter/base.py

eakmanrq · 2025-09-30T22:41:32Z

@izeigerman how did you determine where to add the clears? For example I think replace query needs it since it checks for table existence. Unless I'm not reading the implementation correctly, you record the lack of existence in cache and therefore the cache would need to be updated after replace query runs if the table did not exist.

Edit: I'm actually not sure why it doesn't use def table_exists but the point still stands generally.

izeigerman · 2025-10-01T00:13:09Z

> how did you determine where to add the clears? For example I think replace query needs it since it checks for table existence. Unless I'm not reading the implementation correctly, you record the lack of existence in cache and therefore the cache would need to be updated after replace query runs if the table did not exist.

Wdym? We clear the cache every time we create a table, including during replace query.

> Edit: I'm actually not sure why it doesn't use def table_exists but the point still stands generally.

yeah, using the cache for table_exists is a good point.

eakmanrq · 2025-10-01T14:56:08Z

> Wdym? We clear the cache every time we create a table, including during replace query.

Yes I see now. The internal create call clears the cache which is why it is handled.

Did you verify that no engine overrides a create or drop call that also needs to update the cache?

Also I noticed that public engine adapter methods that accept arbitrary SQL (fetch*, execute) are assumed to not contain create/drop calls. Should we make them pessimistic and clear the cache? Could we potentially clear the cache if they contain create or drop expressions?
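
One way to picture the pessimistic option raised here is the sketch below. It is hypothetical and not what the PR does: `may_change_schema` and `Adapter` are made-up names, and the keyword regex is a crude stand-in for inspecting the parsed expression. It deliberately errs on the side of clearing, so a false positive only costs an extra metadata query.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumption: matching the bare keywords anywhere in the statement is good
# enough for a pessimistic check. Real code would more likely look at the
# parsed SQL expression instead of raw text.
_CREATE_OR_DROP = re.compile(r"\b(CREATE|DROP)\b", re.IGNORECASE)


def may_change_schema(sql: str) -> bool:
    """Return True if the SQL text might create or drop a data object."""
    return _CREATE_OR_DROP.search(sql) is not None


class Adapter:
    """Hypothetical adapter holding a data object cache."""

    def __init__(self) -> None:
        self._data_object_cache: Dict[Tuple[str, str], Optional[str]] = {}
        self.executed: List[str] = []

    def execute(self, sql: str) -> None:
        self.executed.append(sql)  # stand-in for sending SQL to the engine
        if may_change_schema(sql):
            # Arbitrary SQL may have changed the schema, so forget everything.
            self._data_object_cache.clear()
```

A plain `SELECT` leaves the cache untouched, while any statement mentioning `CREATE` or `DROP` empties it.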

izeigerman · 2025-10-02T00:00:04Z

> Did you verify that no engine overrides a create or drop call that also needs to update the cache?

I did. Do you happen to see places I missed?

> Also I noticed that public engine adapter methods that accept arbitrary SQL (fetch*, execute) are assumed to not contain create/drop calls. Should we make them pessimistic and clear the cache? Could we potentially clear the cache if they contain create or drop expressions?

Let me think about this

eakmanrq · 2025-10-02T00:31:43Z

Unless I am missing something, I think these need to be reviewed:

georgesittas

Interesting PR, nice work.

sqlmesh/core/engine_adapter/base.py

izeigerman requested review from a team and Copilot September 30, 2025 22:17

Copilot AI reviewed Sep 30, 2025

Chore: Cache results of get_data_objects

ea82df0

izeigerman force-pushed the chore-cache-data-objects branch from fca603b to ea82df0 Compare September 30, 2025 22:21

eakmanrq reviewed Sep 30, 2025

address comments

c7f5771

izeigerman force-pushed the chore-cache-data-objects branch from b91a30a to c7f5771 Compare October 1, 2025 00:38

georgesittas reviewed Oct 2, 2025

izeigerman force-pushed the chore-cache-data-objects branch from d0adc72 to c0d9022 Compare October 2, 2025 17:44

eakmanrq approved these changes Oct 2, 2025

only cache data objects mantained by sqlmesh

032cd97

izeigerman force-pushed the chore-cache-data-objects branch from c0d9022 to 032cd97 Compare October 2, 2025 18:29

izeigerman merged commit 42fbc64 into main Oct 2, 2025
45 of 46 checks passed

izeigerman deleted the chore-cache-data-objects branch October 2, 2025 19:26

izeigerman merged 3 commits into main from chore-cache-data-objects

4 participants
