Skip to content

Fix!: Avoid using rendered query when computing the data hash#5256

Merged
izeigerman merged 8 commits intomainfrom
chore-use-raw-query-in-fingerprint
Sep 2, 2025
Merged

Fix!: Avoid using rendered query when computing the data hash#5256
izeigerman merged 8 commits intomainfrom
chore-use-raw-query-in-fingerprint

Conversation

@izeigerman
Copy link
Collaborator

@izeigerman izeigerman commented Aug 28, 2025

This update eliminates the need for a rendered query when computing the model’s data hash. This means that the categorizer can now decide that the snapshot represents a metadata change even though the data hash has changed.

Doing this has the following benefits:

  • Faster fingerprint calculation
  • Eliminates the dependency on SQLGlot optimizer when calculating fingerprints which brings us one step closer to eliminating snapshot record migration as a result of the product upgrade

@izeigerman izeigerman requested a review from a team August 28, 2025 22:14
def _data_hash_values(self) -> t.List[str]:
return [
*self._data_hash_values_no_query,
gen(self.query, comments=False),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you get rid of this one to and just hash the raw sql?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we hash raw sql, doesnt that make it sensitive to whitespace changes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's why we now also check is_metadata_only_change when data hashes don't match

@izeigerman izeigerman force-pushed the chore-use-raw-query-in-fingerprint branch 3 times, most recently from 0ad4b2b to e4f3f1e Compare August 29, 2025 21:11

@property
def query(self) -> t.Union[exp.Query, d.JinjaQuery]:
return t.cast(t.Union[exp.Query, d.JinjaQuery], self.query_.parse(self.dialect))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this called multiple times? should we cache this somehow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see, you use a class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's cached inside ParsableQuery

_parsed: t.Optional[exp.Expression] = None
_parsed_dialect: t.Optional[str] = None

def parse(self, dialect: str) -> exp.Expression:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the dialect ever change for a model outside of a test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't, why? I'd rather implement this correctly in case circumstances change.

cls, parsed_expression: exp.Expression, dialect: str, use_meta_sql: bool = False
) -> ParsableSql:
sql = (
parsed_expression.meta.get("sql") or parsed_expression.sql(dialect=dialect)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the situation where we wouldn't want to use meta sql?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when I'm using a custom loader and do create_sql_model directly with a query. I don't think we can trust the correctness of the meta sql in that case.

@izeigerman izeigerman force-pushed the chore-use-raw-query-in-fingerprint branch from 0602afa to 26226d9 Compare September 2, 2025 16:18
@izeigerman izeigerman merged commit 75f825e into main Sep 2, 2025
45 of 46 checks passed
@izeigerman izeigerman deleted the chore-use-raw-query-in-fingerprint branch September 2, 2025 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants