Skip to content

Fix!: don't consume connected tokens when parsing quoted identifiers#5357

Merged
georgesittas merged 2 commits intomainfrom
jo/fix_identifier_parsing
Sep 12, 2025
Merged

Fix!: don't consume connected tokens when parsing quoted identifiers#5357
georgesittas merged 2 commits intomainfrom
jo/fix_identifier_parsing

Conversation

@georgesittas
Copy link
Contributor

@georgesittas georgesittas commented Sep 11, 2025

This PR addresses two bugs:

  1. In SQLMesh's override of _parse_id_var, once we parse an identifier we check whether the next token in the stream is "connected" to the identifier, i.e., there's no whitespace between the two tokens. This is done in order to support our custom macro syntax (example). Unfortunately, the condition was too lax, resulting in consuming both tokens in cases like "foo"bar, when we needed to consume only one.
  2. There is a subtle bug, where _schema_tmp can be appended to a quoted identifier like "a"."b"."c", resulting in "a"."b"."c"_schema_tmp. Due to the parser being lax, we were somehow able to parse this.. we've succeeded by failing in this case :)

@georgesittas georgesittas requested review from a team, izeigerman and treysp September 11, 2025 19:14
@georgesittas georgesittas force-pushed the jo/fix_identifier_parsing branch from 77ad2e0 to 84238cf Compare September 11, 2025 19:15
@georgesittas georgesittas changed the title Fix: don't consume connected tokens when parsing quoted identifiers Fix!: don't consume connected tokens when parsing quoted identifiers Sep 11, 2025
>>> sql = "SELECT date_day, @PIVOT(status, ['cancelled', 'completed']) FROM rides GROUP BY 1"
>>> MacroEvaluator().transform(parse_one(sql)).sql()
'SELECT date_day, SUM(CASE WHEN status = \\'cancelled\\' THEN 1 ELSE 0 END) AS "\\'cancelled\\'", SUM(CASE WHEN status = \\'completed\\' THEN 1 ELSE 0 END) AS "\\'completed\\'" FROM rides GROUP BY 1'
'SELECT date_day, SUM(CASE WHEN status = \\'cancelled\\' THEN 1 ELSE 0 END) AS "cancelled", SUM(CASE WHEN status = \\'completed\\' THEN 1 ELSE 0 END) AS "completed" FROM rides GROUP BY 1'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what led to this decision, i.e. including the SQL-representation of the value as-is in the alias.

"test_valid_to",
TRUE AS "_exists"
FROM ""__temp_target_efgh""
FROM "__temp_target_efgh"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was incorrect, the quotes here are off.

tmp_path,
pathlib.Path(audits_dir, "audit_1.sql"),
"AUDIT(name assert_positive_id, dialect 'duckdb'); SELECT * FROM @this_model WHERE \"CaseSensitive\"_item_id < 0;",
"AUDIT(name assert_positive_id, dialect 'duckdb'); SELECT * FROM @this_model WHERE \"CaseSensitive_item_id\" < 0;",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was also incorrect, it's like trying to do SELECT * FROM t WHERE "foo"_bar < 0.

Copy link
Collaborator

@erindru erindru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this PR highlights quite some jankiness!

@georgesittas georgesittas merged commit 0e9c8a0 into main Sep 12, 2025
36 checks passed
@georgesittas georgesittas deleted the jo/fix_identifier_parsing branch September 12, 2025 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants