Skip to content

Fix: Use drop cascade in janitor#5133

Merged
erindru merged 10 commits intomainfrom
erin/janitor-drop-cascade
Aug 20, 2025
Merged

Fix: Use drop cascade in janitor#5133
erindru merged 10 commits intomainfrom
erin/janitor-drop-cascade

Conversation

@erindru
Copy link
Collaborator

@erindru erindru commented Aug 12, 2025

Second attempt at #5098

Fixes #5083 #4767 #4353

The solution in #5098 unfortunately needed to read in all the view snapshot records in order to work out the dependency graph of what should be dropped, because the way our state database is currently structured means that this operation can't easily be pushed down to the database level.

Fetching a large amount of snapshots from state sync can cause memory problems in other state sync implementations that buffer instead of stream so this approach was considered a non-starter.

In addition, it turns out that having dangling pointers in state is not as bad as I originally thought it was due to "create if not exists"-style logic elsewhere.

So this PR does the following:

  • Implement DROP CASCADE in the SnapshotEvaluator cleanup method. This leaves dangling pointers in state but that is considered acceptable
  • Check that it works by producing a scenario on Postgres that was previously failing

return None

def drop_table(self, table_name: TableName, exists: bool = True) -> None:
def drop_table(self, table_name: TableName, exists: bool = True, **kwargs: t.Any) -> None:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since drop_view() already had **kwargs I figured it was ok to add it to drop_table() as well

def delete(self, name: str, **kwargs: t.Any) -> None:
_check_table_db_is_physical_schema(name, kwargs["physical_schema"])
self.adapter.drop_table(name)
self.adapter.drop_table(name, cascade=kwargs.pop("cascade", False))
Copy link
Collaborator Author

@erindru erindru Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deliberately just pop off the cascade argument because delete() is called with a bunch of other arguments that arent relevant for drop_table() but end up making their way to the exp.Drop AST node in the EngineAdapter if they aren't filtered out here

@erindru erindru marked this pull request as ready for review August 12, 2025 02:14
@erindru erindru force-pushed the erin/janitor-drop-cascade branch 4 times, most recently from 0718546 to 23340ce Compare August 17, 2025 23:02
# we need to set cascade=true or we will get a 'cant drop because other objects depend on it'-style
# error on engines that enforce referential integrity, such as Postgres
# this situation can happen when a snapshot expires but downstream view snapshots that reference it have not yet expired
cascade=True,
Copy link
Contributor

@themisvaltinos themisvaltinos Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we control this with a flag which is set if the engine supports cascade or not (maybe from the schema differ)? unless im doing something wrong I tried with a BigQuery project for example to run the janitor which stops when it tries to drop a table with the error Syntax error: Expected end of input but got keyword CASCADE at

relevant docs: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#drop_table_statement it seems cascade is supported for schema but not table

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we control this with a flag which is set if the engine supports cascade or not (maybe from the schema differ)?

Imo, this should happen further downstream, e.g., in the adapter itself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nice catch, the downside of not running this test across all engines.

I naively thought SQLGlot would not generate the CASCADE output if it's unsupported (even if cascade=true on the AST node) but I guess the fact it doesn't is why its not considered a validator.

I'll improve the coverage and make sure this works on all engines

@erindru erindru force-pushed the erin/janitor-drop-cascade branch from a90feca to 095fbdc Compare August 19, 2025 04:51
branches:
only:
- main
#filters:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: revert prior to merge

EnvironmentSuffixTarget.CATALOG,
],
)
def test_janitor(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test inits a project, creates a dev env, invalidates it, runs the janitor to clean it up and checks it was cleaned up

  • Across every engine we support
  • For every EnvironmentSuffixTarget we support

Copy link
Contributor

@themisvaltinos themisvaltinos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, nice the supported cascade list in the adapters is a much better solution to cover every operation than the flag i was suggesting for only this use case and thanks for adding the integration test!

Copy link
Contributor

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick pass– looks reasonable.

Comment on lines +1072 to +1080
if cascade and kind.upper() in self.SUPPORTED_DROP_CASCADE_OBJECT_KINDS:
drop_args["cascade"] = cascade
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're expecting CASCADE semantics, should this warn if cascade is unsupported?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't really "expect" the CASCADE semantics per se. We just want to delete our stuff without error. We're forced to do CASCADE because otherwise the engine won't let us delete.

Copy link
Contributor

@georgesittas georgesittas Aug 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair; I was thinking that a warning could simply surface the fact that some objects were left orphan because we couldn't cascade. Although, if we don't know which objects depend on the removed thing, then its value is probably questionable anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I did consider warning but then I noticed we silently do nothing elsewhere so decided to keep with that strategy

@erindru erindru force-pushed the erin/janitor-drop-cascade branch from 00b6249 to de3ad7f Compare August 20, 2025 20:27
@erindru erindru force-pushed the erin/janitor-drop-cascade branch from 3dc4b23 to 3fd21fc Compare August 20, 2025 21:28
@erindru erindru merged commit f73cdfe into main Aug 20, 2025
27 of 30 checks passed
@erindru erindru deleted the erin/janitor-drop-cascade branch August 20, 2025 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected behaviour with FULL model kinds (DuckDB+Postgres)

4 participants