feat!: linting rule no unregistered external models#5009
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR introduces a new linting rule "nounregisteredexternalmodels" that validates all external model dependencies are properly registered in the external_models.yaml file. The rule helps ensure data lineage integrity by catching references to unregistered external models.
- Implements NoUnregisteredExternalModels rule class that checks model dependencies against registered external models
- Adds comprehensive test coverage that validates the rule detects unregistered external model references
- Updates the sushi example configuration to include the new linting rule
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sqlmesh/core/linter/rules/builtin.py | Implements the NoUnregisteredExternalModels rule class with dependency validation logic |
| tests/core/linter/test_builtin.py | Adds test case that removes external_models.yaml and verifies the rule detects violations |
| examples/sushi/config.py | Adds the new rule to the default linter configuration |
b3b6e23 to
4f37225
Compare
georgesittas
left a comment
There was a problem hiding this comment.
LGTM.
FYI, this is a breaking change, because builtin rules are automatically applied when someone's activated ALL rules.
I'd discussed with @izeigerman about having an "include" set to make even builtin rules opt-in w.r.t. ALL, so that we can be more selective of what's used by default. That way, we also reduce the friction for accepting community-contributed rules, such as this one: #4534.
4f37225 to
6603eaa
Compare
| enables the linter, and verifies that the linter raises a violation for a model | ||
| that depends on unregistered external models. | ||
| """ | ||
| sushi_paths = copy_to_temp_path("examples/sushi") |
There was a problem hiding this comment.
Do we even need the sushi project here? I think we could get the same violation but simply creating an empty project with a model such as SELECT * FROM external_model.
This way we don't have to:
- Delete the
external_models.yamlfile, it doesn't exist in the first place - We don't have to copy the whole sushi dir
- We don't have to string replace the config, we can create one from scratch
There was a problem hiding this comment.
I am curious to understand the objection to this test.
I don't particularly understand the rationale of the things you mention we avoid. While they are slightly inconvenient, the test is 50 lines, but because it's testing a fully fledged project, it gives me a lot of confidence. The only argument I can get is slow tests, but if we're optimising for that, we should be using other tools.
Another way, I kind of read this is maybe about a philosophical difference in testing: The test you describe is the most isolated, unit-y test you can build, it's as minimal a test as possible to prove the thing works whereas mine feels like I start with the integration tests and test the function way more thoroughly because I exercise through a lot more scenarios. A minimal unit test wouldn't reveal, for example, any false positive, but running through sushi, which exercises a lot of variations, just might.
Someone once described this testing to me as "happy path" testing. You build a robust happy e2e test (e.g. our sushi project) and then you exercise paths around it by altering the inputs.
There was a problem hiding this comment.
It's not an outright objection but it also doesn't feel like the right tool for this. My reasoning being:
- There's a lot more noise compared to a test that'd simply use a dummy model and a
Configobject. This makes understanding & updating the test harder and some parts of it may be considered "hardcoded" such as the YAML overwrite - There's obviously some performance overhead, and although we are not optimising for that per se, it shouldn't be neglected
- Imo you don't "win" anything in terms of integration testing by using
sushiinstead of a simpler project for this case
It makes sense to use the pre-existing projects such as sushi or multi when it comes to testing the entirety of the linter end to end, but otherwise this feels like it uses more real estate than is required for such a test case
fdfc1bd to
748d9f9
Compare
748d9f9 to
c0eb91c
Compare
|
The rule looks good, but I agree with @georgesittas. Can we introduce a list of rules that are enabled by default to avoid making this a breaking change. |
|
Getting this in to include with v0.203.0, can revisit the test if needed. |
No description provided.