Conversation
| return None | ||
|
|
||
| def drop_table(self, table_name: TableName, exists: bool = True) -> None: | ||
| def drop_table(self, table_name: TableName, exists: bool = True, **kwargs: t.Any) -> None: |
There was a problem hiding this comment.
Since drop_view() already had **kwargs I figured it was ok to add it to drop_table() as well
| def delete(self, name: str, **kwargs: t.Any) -> None: | ||
| _check_table_db_is_physical_schema(name, kwargs["physical_schema"]) | ||
| self.adapter.drop_table(name) | ||
| self.adapter.drop_table(name, cascade=kwargs.pop("cascade", False)) |
There was a problem hiding this comment.
I deliberately just pop off the cascade argument because delete() is called with a bunch of other arguments that arent relevant for drop_table() but end up making their way to the exp.Drop AST node in the EngineAdapter if they aren't filtered out here
0718546 to
23340ce
Compare
| # we need to set cascade=true or we will get a 'cant drop because other objects depend on it'-style | ||
| # error on engines that enforce referential integrity, such as Postgres | ||
| # this situation can happen when a snapshot expires but downstream view snapshots that reference it have not yet expired | ||
| cascade=True, |
There was a problem hiding this comment.
should we control this with a flag which is set if the engine supports cascade or not (maybe from the schema differ)? unless im doing something wrong I tried with a BigQuery project for example to run the janitor which stops when it tries to drop a table with the error Syntax error: Expected end of input but got keyword CASCADE at
relevant docs: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#drop_table_statement it seems cascade is supported for schema but not table
There was a problem hiding this comment.
should we control this with a flag which is set if the engine supports cascade or not (maybe from the schema differ)?
Imo, this should happen further downstream, e.g., in the adapter itself.
There was a problem hiding this comment.
Oh, nice catch, the downside of not running this test across all engines.
I naively thought SQLGlot would not generate the CASCADE output if it's unsupported (even if cascade=true on the AST node) but I guess the fact it doesn't is why its not considered a validator.
I'll improve the coverage and make sure this works on all engines
a90feca to
095fbdc
Compare
.circleci/continue_config.yml
Outdated
| branches: | ||
| only: | ||
| - main | ||
| #filters: |
There was a problem hiding this comment.
TODO: revert prior to merge
| EnvironmentSuffixTarget.CATALOG, | ||
| ], | ||
| ) | ||
| def test_janitor( |
There was a problem hiding this comment.
This test inits a project, creates a dev env, invalidates it, runs the janitor to clean it up and checks it was cleaned up
- Across every engine we support
- For every EnvironmentSuffixTarget we support
themisvaltinos
left a comment
There was a problem hiding this comment.
lgtm, nice the supported cascade list in the adapters is a much better solution to cover every operation than the flag i was suggesting for only this use case and thanks for adding the integration test!
georgesittas
left a comment
There was a problem hiding this comment.
Did a quick pass– looks reasonable.
| if cascade and kind.upper() in self.SUPPORTED_DROP_CASCADE_OBJECT_KINDS: | ||
| drop_args["cascade"] = cascade |
There was a problem hiding this comment.
Since we're expecting CASCADE semantics, should this warn if cascade is unsupported?
There was a problem hiding this comment.
I think we don't really "expect" the CASCADE semantics per se. We just want to delete our stuff without error. We're forced to do CASCADE because otherwise the engine won't let us delete.
There was a problem hiding this comment.
That's fair; I was thinking that a warning could simply surface the fact that some objects were left orphan because we couldn't cascade. Although, if we don't know which objects depend on the removed thing, then its value is probably questionable anyway.
There was a problem hiding this comment.
Yeah I did consider warning but then I noticed we silently do nothing elsewhere so decided to keep with that strategy
00b6249 to
de3ad7f
Compare
This reverts commit 23340ce.
3dc4b23 to
3fd21fc
Compare
Second attempt at #5098
Fixes #5083 #4767 #4353
The solution in #5098 unfortunately needed to read in all the view snapshot records in order to work out the dependency graph of what should be dropped, because the way our state database is currently structured means that this operation can't easily be pushed down to the database level.
Fetching a large amount of snapshots from state sync can cause memory problems in other state sync implementations that buffer instead of stream so this approach was considered a non-starter.
In addition, it turns out that having dangling pointers in state is not as bad as I originally thought it was due to "create if not exists"-style logic elsewhere.
So this PR does the following:
DROP CASCADEin the SnapshotEvaluator cleanup method. This leaves dangling pointers in state but that is considered acceptable