Chore: Cache results of get_data_objects#5467
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR implements caching for the get_data_objects method to reduce INFORMATION_SCHEMA queries from 28 to 6 in the fresh sushi project. The optimization adds an in-memory cache that stores data object lookups and intelligently invalidates entries when DDL operations modify the database schema.
- Adds
_data_object_cachedictionary to track data object existence/non-existence - Implements cache invalidation on create/drop operations for tables, views, and other data objects
- Refactors snapshot evaluator methods to distinguish between physical and virtual data objects
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
sqlmesh/core/engine_adapter/base.py |
Adds caching logic to get_data_objects, cache invalidation on DDL operations, and helper functions |
sqlmesh/core/snapshot/evaluator.py |
Refactors data object retrieval methods and adds virtual data object pre-fetching for promotion |
tests/core/engine_adapter/test_base.py |
Adds comprehensive test coverage for caching functionality and invalidation scenarios |
tests/core/test_snapshot_evaluator.py |
Updates test mocks to accommodate new caching behavior |
tests/core/engine_adapter/test_snowflake.py |
Fixes test parameter format from identifier objects to string literals |
tests/core/engine_adapter/test_athena.py |
Adds cache clearing call to maintain test isolation |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| adapter.drop_managed_table(table_name="foo.bar", exists=False) | ||
| adapter.drop_managed_table(table_name="foo.bar", exists=True) |
There was a problem hiding this comment.
The change from exp.parse_identifier("foo") to "foo.bar" string literal appears to be fixing a test issue, but this changes the test from testing a single identifier to a qualified name. Verify this change aligns with the intended test behavior and that the corresponding SQL output changes are expected.
There was a problem hiding this comment.
the API was never meant to take identifiers in its argument
fca603b to
ea82df0
Compare
|
@izeigerman how did you determine where to add the clears? For example I think replace query needs it since it checks for table existence. Unless I'm not reading the implementation correctly, you record the lack of existence in cache and therefore the cache would need to be updated after replace query runs if the table did not exist. Edit: I'm actually not sure why it doesn't use |
Wdym? We clear the cache every time we create a table, including during replace query.
yeah, using the cache for |
b91a30a to
c7f5771
Compare
Yes I see now. The internal create call clears the cache which is why it is handled. Did you verify that no engine overrides a create or drop call that also needs to update the cache? Also I noticed that public engine adapter methods that accept arbitrary SQL ( |
I did. Do you happen to see places I missed?
Let me think about this |
georgesittas
left a comment
There was a problem hiding this comment.
Interesting PR, nice work.
d0adc72 to
c0d9022
Compare
c0d9022 to
032cd97
Compare
This reduces the number of INFORMATION_SCHEMA queries from 28 down to 6 on the fresh
sushiproject. I anticipate equally dramatic improvements in other scenarios.