Feat(dbt): Add support for selected resources context variable#5177
Feat(dbt): Add support for selected resources context variable#5177themisvaltinos merged 5 commits intomainfrom
Conversation
515dcd1 to
e181158
Compare
|
address comments let me know if this looks better @erindru |
sqlmesh/dbt/adapter.py
Outdated
|
|
||
| def _dbt_model_id(self, sqlmesh_model_name: str) -> str: | ||
| parts = [part.strip('"') for part in sqlmesh_model_name.split(".")] | ||
| return f"model.{parts[0]}.{parts[-1]}" |
There was a problem hiding this comment.
I had to check the dbt docs to understand why the hardcoded model. prefix is necessary, might be worth leaving a comment.
Also, sqlmesh_model_name is coming through as a quoted fqn, right? And this is on the RuntimeAdapter which has the snapshots available on self.snapshots?
Rather than string munging, is it possible to look up the matching snapshot and grab the original model name, something like:
f"model.{self.snapshots[sqlmesh_model_name].node.name}" ?
There was a problem hiding this comment.
yes didn't like the string munging either don't know if there was a cleaner way to do this. we don't keep snapshots in the RuntimeAdapter just use it for the table mapping but we could save it or use it during init similar to table mapping and keep a mapping of snapshot names to dbt ones. but the issue is this {self.snapshots[sqlmesh_model_name].node.name} wouldn't work in the way that it would return main.orders for example rather than jaffle_shop.orders for '"jaffle_shop"."main"."orders"'. I feel you encountered a similar issue with the selectors. the reason I want to have the dbt way and not sqlmesh btw is they might be used in macros as a key to the graph which would contain the dbt way in the nodes dictionary so model - package - model name rather than schema
There was a problem hiding this comment.
we don't keep snapshots in the RuntimeAdapter just use it for the table mapping
Ah, gotcha, this trips me up all the time. Yes you're right, it appears the snapshots currently limited to whatever is referenced by the model query, not all the snapshots available in the environment.
the reason I want to have the dbt way and not sqlmesh btw is they might be used in macros as a key to the graph
yes, absolutely, this needs to return the dbt model name and not the sqlmesh model name. The thing is, I don't know if we can accurately determine the dbt model name based on munging the SQLMesh fqn.
Although I guess all the dbt models do live in a global namespace and schemas are an implementation detail, so maybe this isn't actually the problem I think it is. In the absence of a clear better solution, let's run with this and see how far it gets us 👍
There was a problem hiding this comment.
The thing is, I don't know if we can accurately determine the dbt model name based on munging the SQLMesh fqn. Although I guess all the dbt models do live in a global namespace and schemas are an implementation detail, so maybe this isn't actually the problem I think it is.
actually it is the problem you think it is as I run more examples this solution falls short for example in Postgres where the package might not be the db name.. I'm looking for a better way to map the sqlmesh model names to the dbt ones
There was a problem hiding this comment.
@erindru so I tried different ways but all got quite complicated and without having perfect robustness. so settled in getting it for the snapshot itself (which does simplify things a bunch as all these helper methods are removed) as you originally suggested:
Rather than string munging, is it possible to look up the matching snapshot and grab the original model name
but since we don't have the dbt name in there, had to add it in the node. I remember you also needed this dbt model name somewhere for a mapping, unless I'm mistaken, so if you can take a final look to see how this looks
There was a problem hiding this comment.
Nice work, I didnt realise we already had a property BaseModelConfig.node_name to produce this value.
Yes, adding it to the main _Node abstraction is probably the way to go here because it populates it early on and makes it accessible from everywhere.
I think this will provide a nice way to leverage this value in the selector engine too
|
Yep this looks much better, just a couple of small things. Also I guess the /* {{ selected_resources }} */
select 1;And it returns the following on /*
[
'test.jaffle_shop.unique_stg_orders_order_id.e3b841c71a',
'model.jaffle_shop.stg_orders',
'test.jaffle_shop.unique_orders_order_id.fed79b3a6e',
'test.jaffle_shop.unique_stg_customers_customer_id.c7614daada',
'test.jaffle_shop.not_null_orders_credit_card_amount.d3ca593b59',
'seed.jaffle_shop.raw_payments',
'test.jaffle_shop.not_null_orders_order_id.cf6c17daed',
'test.jaffle_shop.unique_stg_payments_payment_id.3744510712',
'test.jaffle_shop.not_null_stg_payments_payment_id.c19cc50075',
'test.jaffle_shop.not_null_stg_customers_customer_id.e2cfb1f9aa',
'seed.jaffle_shop.raw_customers',
'model.jaffle_shop.stg_payments',
'test.jaffle_shop.not_null_orders_coupon_amount.ab90c90625',
'test.jaffle_shop.not_null_orders_amount.106140f9fd',
'test.jaffle_shop.accepted_values_orders_status__placed__shipped__completed__return_pending__returned.be6b5b5ec3',
'test.jaffle_shop.accepted_values_stg_orders_status__placed__shipped__completed__return_pending__returned.080fb20aad',
'test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64',
'model.jaffle_shop.customers',
'model.jaffle_shop.orders',
'test.jaffle_shop.not_null_orders_gift_card_amount.413a0d2d7a',
'test.jaffle_shop.accepted_values_stg_payments_payment_method__credit_card__coupon__bank_transfer__gift_card.3c3820f278',
'test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1',
'test.jaffle_shop.not_null_orders_bank_transfer_amount.7743500c49',
'test.jaffle_shop.relationships_orders_customer_id__customer_id__ref_customers_.c6ec7f58f2',
'seed.jaffle_shop.raw_orders',
'model.jaffle_shop.erin',
'test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d',
'test.jaffle_shop.not_null_orders_customer_id.c5f02694af',
'model.jaffle_shop.stg_customers'
] */ |
yes good point so mainly the goal of this pr was to match the So dbt if im not mistaken never runs tests as part of the run but as part of dbt test. Also for dbt seed as well, while for sqlmesh seed models run as part of the sqlmesh plan and run commands and not sqlmesh_dbt run as well. So since they are models they would appear in the selected resources during the run command but as models rather than seeds: This I feel provides information to the user that these have been selected to be run. and then dbt run: So it is a bit tough to map this in exactly the same 1-1 way with dbt. I guess I could map these instead of model as seed to keep the same keys though that would be a bit inconsistent with sqlmesh. Now regarding dbt tests, these are our audits which do run as part of the sqlmesh plan, my thinking was for now not include them to match more the dbt output when they use selectors and see the models. But in hindsight and to your point I wonder if it makes sense to include and have in the list of models and tests (mapped back from our audits) that have been run. Does that feel more intuitive to you? |
9f62e95 to
2ea1315
Compare
erindru
left a comment
There was a problem hiding this comment.
So dbt if im not mistaken never runs tests as part of the run but as part of dbt test
Also for dbt seed as well
Oh, sorry, of course 🤦 . It didnt occur to me that dbt list would have everything but dbt run, dbt test and dbt seed would just have their own relevant subsets, but it's very obvious when you mention it.
Ok, this LGTM then :)
2ea1315 to
e525615
Compare
sqlmesh/core/node.py
Outdated
| interval_unit_: t.Optional[IntervalUnit] = Field(alias="interval_unit", default=None) | ||
| tags: t.List[str] = [] | ||
| stamp: t.Optional[str] = None | ||
| node_name: t.Optional[str] = None # dbt node name |
There was a problem hiding this comment.
Should we just call this dbt_node_name instead? So it's clear to all internal code that this relates to dbt, is only populated in dbt mode and should not be confused with Snapshot.node.name.
Or even just dbt_name next to name since this is already on _Node
There was a problem hiding this comment.
good point this will be much clearer revised it to dbt_name
There was a problem hiding this comment.
LGTM, feel free to merge when ready
e525615 to
7089f72
Compare
7089f72 to
d1b2940
Compare
This aims to add support for selected resources in dbt projects. When accessed in jinja this would return they keys which correspond to the dbt nodes objects (example format:
model.project_name.model_name), but that have been selected for the current plan or run using the sqlmesh selectors.