Fix: Include unexpired downstream views when cleaning up expired tables#5098
Fix: Include unexpired downstream views when cleaning up expired tables#5098
Conversation
| snapshots_by_version = defaultdict(set) | ||
| snapshots_by_dev_version = defaultdict(set) | ||
| for s in snapshots: | ||
| for s in sv_snapshots: |
There was a problem hiding this comment.
These variable renames were just to placate mypy, because snapshot is used above to refer to objects of type Snapshot which does not match the objects of type SharedVersionSnapshot used here
d06a9ef to
65b4c2e
Compare
eeba4dd to
854f3ec
Compare
| if not ignore_ttl: | ||
| expired_query = expired_query.where( | ||
| (exp.column("updated_ts") + exp.column("ttl_ms")) <= current_ts | ||
| ((exp.column("updated_ts") + exp.column("ttl_ms")) <= current_ts) |
There was a problem hiding this comment.
I’m wondering if it might make sense to break this into two steps, first querying only with the initial condition
(exp.column("updated_ts") + exp.column("ttl_ms")) <= current_ts
then only if that query returns results, we proceed with a second query for the views. the downside is that it adds an extra query in cases where there are expired tables.but I’m thinking about the scenario where there are no expired tables, where currently we still end up querying snapshots via full_candidates = self.get_snapshots(candidates.keys()) for all views and going through the full logic, even though there won’t be any results in the end. that said im not sure since both have tradeoffs
There was a problem hiding this comment.
Yeah I understand the concern. I've broken it into the following steps (which only apply if ignore_ttl is not set):
- Query a count of expired snapshots
- If that count is >0, add in the OR clause to the query to fetch all view snapshots as well
- If that count is 0, use it to short-circuit and return
| adapter.create_table(long_table_name, {"col": exp.DataType.build("int")}) | ||
|
|
||
|
|
||
| def test_janitor_drops_downstream_unexpired_hard_dependencies( |
There was a problem hiding this comment.
could you also add a test for a transitive dependency? For example:
Table A is expired ← View B (not expired) ← View C (not expired)
I believe this case was handled with the reversed dag before and that View C got picked up, but I’m not entirely certain without the dag construction now, so it’d be good to have a test to confirm
There was a problem hiding this comment.
Yeah you're right, the drops weren't cascading through transitive dependencies. I've re implemented this to use a DAG again which simplified the implementation because it can be traversed in topological order and for any given node we can:
- check if its expired (gets put directly in the expired list)
- if it's a view, check if any of its parents are in the expired list. if they are, it should be expired too
I also updated the tests to test transitive dependencies too
78cc597 to
532d1c5
Compare
themisvaltinos
left a comment
There was a problem hiding this comment.
looks good to me, thanks for addressing comments and adding the extra tests
| if snapshot.expiration_ts <= current_ts: | ||
| # All expired snapshots should be included regardless | ||
| expired_candidates[snapshot.snapshot_id] = snapshot.name_version | ||
| elif snapshot.model_kind_name == ModelKindName.VIEW: |
There was a problem hiding this comment.
Can this be filtered by a database rather than the application layer?
There was a problem hiding this comment.
How do you mean? Line 242 adjusts the query to the database to include the views:
if expired_record_count > 0:
expired_query = t.cast(
exp.Select, expired_query.or_(exp.column("kind_name").eq(ModelKindName.VIEW))
)
The DAG can only be built in the application layer, right? Since we have no way of knowing at the database level what views are affected, don't we have to select all views and filter them at the application layer?
We can't filter to expired views in the DB query because that's what caused this problem to begin with.
|
After internal discussion, closing in favour of #5133 |
Fixes #5083 #4767
Prior to this, it was possible for the janitor to drop a snapshot table while another snapshot view still depended on it.
The reason is not related to DAG order - it turns out the janitor always drops in DAG order.
Rather, the reason was that it's possible to get into a situation where a snapshot table has expired (based on
updated_ts + ttl_msbeing earlier than the current time) but a downstream view that depends on it had not expired, because it was created later.A naive implementation might simply just call
DROP... CASCADEto automatically drop downstream objects but the problem with this is that while the physical objects are gone there will still be references in SQLMesh state.So this PR updates the code that detects expired objects to also consider unexpired views downstream of expired tables.
Open question: Is it ever valid to drop an expired upstream dependency and not cascade that to all downstream dependencies? Should we be dropping all downstream unexpired tables too?