Fix!: Avoid using rendered query when computing the data hash#5256
Fix!: Avoid using rendered query when computing the data hash#5256izeigerman merged 8 commits intomainfrom
Conversation
sqlmesh/core/model/definition.py
Outdated
| def _data_hash_values(self) -> t.List[str]: | ||
| return [ | ||
| *self._data_hash_values_no_query, | ||
| gen(self.query, comments=False), |
There was a problem hiding this comment.
can you get rid of this one to and just hash the raw sql?
There was a problem hiding this comment.
If we hash raw sql, doesnt that make it sensitive to whitespace changes?
There was a problem hiding this comment.
yes, that's why we now also check is_metadata_only_change when data hashes don't match
0ad4b2b to
e4f3f1e
Compare
|
|
||
| @property | ||
| def query(self) -> t.Union[exp.Query, d.JinjaQuery]: | ||
| return t.cast(t.Union[exp.Query, d.JinjaQuery], self.query_.parse(self.dialect)) |
There was a problem hiding this comment.
is this called multiple times? should we cache this somehow?
There was a problem hiding this comment.
yes, it's cached inside ParsableQuery
| _parsed: t.Optional[exp.Expression] = None | ||
| _parsed_dialect: t.Optional[str] = None | ||
|
|
||
| def parse(self, dialect: str) -> exp.Expression: |
There was a problem hiding this comment.
does the dialect ever change for a model outside of a test?
There was a problem hiding this comment.
It shouldn't, why? I'd rather implement this correctly in case circumstances change.
| cls, parsed_expression: exp.Expression, dialect: str, use_meta_sql: bool = False | ||
| ) -> ParsableSql: | ||
| sql = ( | ||
| parsed_expression.meta.get("sql") or parsed_expression.sql(dialect=dialect) |
There was a problem hiding this comment.
what are the situation where we wouldn't want to use meta sql?
There was a problem hiding this comment.
when I'm using a custom loader and do create_sql_model directly with a query. I don't think we can trust the correctness of the meta sql in that case.
0602afa to
26226d9
Compare
This update eliminates the need for a rendered query when computing the model’s data hash. This means that the categorizer can now decide that the snapshot represents a metadata change even though the data hash has changed.
Doing this has the following benefits: