Skip to content

planner: fix ONLY_FULL_GROUP_BY check for NATURAL JOIN columns#66868

Open
fixdb wants to merge 1 commit intopingcap:masterfrom
fixdb:issue65387
Open

planner: fix ONLY_FULL_GROUP_BY check for NATURAL JOIN columns#66868
fixdb wants to merge 1 commit intopingcap:masterfrom
fixdb:issue65387

Conversation

@fixdb
Copy link
Contributor

@fixdb fixdb commented Mar 10, 2026

What problem does this PR solve?

Issue Number: close #65387

Problem Summary:

What changed and how does it work?

When using NATURAL JOIN or JOIN ... USING, the output schema is coalesced and may not contain all original column references. The ONLY_FULL_GROUP_BY validation was only checking OutputNames() which doesn't include the coalesced-away columns from FullSchema.

This caused queries like:
SELECT t0.c0 FROM t0 NATURAL RIGHT JOIN t1 GROUP BY NULL

to bypass the ONLY_FULL_GROUP_BY check, allowing non-deterministic results when firstrow() picked different values from parallel hash aggregation workers.

The fix checks FullNames in addition to OutputNames() when the plan is a LogicalJoin with a FullSchema, ensuring columns from NATURAL JOINs are properly validated.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

planner: fix wrong result for ONLY_FULL_GROUP_BY check for NATURAL JOIN columns.

Summary by CodeRabbit

  • Tests

    • Added comprehensive test coverage for NATURAL JOIN queries with ONLY_FULL_GROUP_BY mode, including validation of error handling and aggregation behavior.
  • Bug Fixes

    • Improved column name resolution for NATURAL JOIN and USING clause scenarios to properly handle complex join schemas.

When using NATURAL JOIN or JOIN ... USING, the output schema is
coalesced and may not contain all original column references. The
ONLY_FULL_GROUP_BY validation was only checking OutputNames() which
doesn't include the coalesced-away columns from FullSchema.

This caused queries like:
  SELECT t0.c0 FROM t0 NATURAL RIGHT JOIN t1 GROUP BY NULL

to bypass the ONLY_FULL_GROUP_BY check, allowing non-deterministic
results when firstrow() picked different values from parallel hash
aggregation workers.

The fix checks FullNames in addition to OutputNames() when the plan
is a LogicalJoin with a FullSchema, ensuring columns from NATURAL
JOINs are properly validated.
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-triage-completed release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/needs-tests-checked labels Mar 10, 2026
@pantheon-ai
Copy link

pantheon-ai bot commented Mar 10, 2026

Failed to Create Project

We encountered an error while creating your project. Please try again or contact support if the issue persists.

@ti-chi-bot ti-chi-bot bot added sig/planner SIG: Planner size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 10, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Mar 10, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign qw4990 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tiprow
Copy link

tiprow bot commented Mar 10, 2026

Hi @fixdb. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hawkingrei
Copy link
Member

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Mar 10, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

Walkthrough

This change fixes undeterministic query results for NATURAL JOIN operations under ONLY_FULL_GROUP_BY constraints by adding fallback column name resolution logic and corresponding test coverage for edge cases involving coalesced join columns.

Changes

Cohort / File(s) Summary
Test Coverage for NATURAL JOIN Edge Cases
pkg/planner/core/issuetest/planner_issue_test.go
Added test blocks verifying ONLY_FULL_GROUP_BY behavior with NATURAL JOIN, including error cases for coalesced column selection, aggregated query success paths, and normal join operations.
Column Resolution Fallback for NATURAL/USING Joins
pkg/planner/core/logical_plan_builder.go
Implemented fallback resolution in colNameResolver.Leave to check join.FullNames when columns are not found in current output names for coalesced NATURAL JOIN / USING join scenarios.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • PR #66273: Implements redundant-column remapping/resolution for USING/NATURAL joins with overlapping column-resolution behavior modifications for join-based name handling.

Suggested labels

lgtm, AI-Testing

Suggested reviewers

  • winoros
  • AilinKid
  • guo-shaoge

Poem

🐰 A natural join that danced with flair,
With columns coalesced here and there,
Now resolution finds its place,
No more undeterministic grace—
Tests and logic hand in hand,
Make queries stable, safe, and grand! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: fixing ONLY_FULL_GROUP_BY validation for NATURAL JOIN columns.
Description check ✅ Passed The description includes issue reference, clear problem summary, explanation of the fix, test coverage checkbox marked, and release note.
Linked Issues check ✅ Passed Code changes [logical_plan_builder.go] implement the fix by checking FullNames for NATURAL JOIN columns, and test changes [planner_issue_test.go] verify ONLY_FULL_GROUP_BY behavior with NATURAL JOIN, addressing issue #65387.
Out of Scope Changes check ✅ Passed All changes are focused on fixing ONLY_FULL_GROUP_BY validation for NATURAL JOIN columns; no unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Command failed


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot
Copy link

ti-chi-bot bot commented Mar 10, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-tests-checked label, please finished the tests then check the finished items in description.

For example:

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

📖 For more info, you can check the "Contribute Code" section in the development guide.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/planner/core/issuetest/planner_issue_test.go`:
- Around line 149-171: Add a sibling regression that mirrors the NATURAL JOIN
cases but uses JOIN ... USING(c0): inside the same test block (the one that
calls prepareSharedTestKit and creates t0/t1), add queries that assert the USING
path behaves the same — e.g. use tk.MustContainErrMsg for "select c0 from t0
right join t1 using(c0) group by null" and for "select t0.c0 from t0 right join
t1 using(c0) group by null" with the same ONLY_FULL_GROUP_BY error messages,
confirm aggregation works with "select max(t0.c0) from t0 right join t1
using(c0) group by null" and confirm the normal non-GROUP BY query "select
t0.c0, t1.c0 from t0 right join t1 using(c0) order by t1.c0" returns the
expected rows; place these next to the existing NATURAL JOIN assertions so the
coalesced-column path for USING is covered.

In `@pkg/planner/core/logical_plan_builder.go`:
- Around line 3500-3506: addGbyOrSingleValueColName() currently only checks
p.OutputNames() to mark grouped/single-value columns, so when the parent is a
logicalop.LogicalJoin with coalesced OutputNames it misses columns present in
join.FullNames; update addGbyOrSingleValueColName() to mirror the fallback used
elsewhere: if c.p is a *logicalop.LogicalJoin and join.FullNames != nil, call
expression.FindFieldName(join.FullNames, v.Name) (same way as in the referenced
block) and, when found, record the corresponding join.FullNames[index] into the
grouped-name set (the same map where OutputNames results are stored) so GROUP BY
columns that only appear in FullNames are recognized as grouped.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 694f01ee-34b9-4147-9073-a0473cc3a630

📥 Commits

Reviewing files that changed from the base of the PR and between 88db262 and 4d660a2.

📒 Files selected for processing (2)
  • pkg/planner/core/issuetest/planner_issue_test.go
  • pkg/planner/core/logical_plan_builder.go

Comment on lines +149 to +171
// Test ONLY_FULL_GROUP_BY with NATURAL JOIN.
// For NATURAL JOIN, the columns are coalesced in the output schema,
// but we should still detect non-aggregated columns from the FullSchema.
// only-full-group-by-natural-join
{
tk := prepareSharedTestKit(t)
tk.MustExec("create table t0(c0 int)")
tk.MustExec("create table t1(c0 int)")
tk.MustExec("insert into t0 values (1), (2)")
tk.MustExec("insert into t1 values (1), (3)")
tk.MustExec("set @@sql_mode = default")
tk.MustQuery("select @@sql_mode REGEXP 'ONLY_FULL_GROUP_BY'").Check(testkit.Rows("1"))
// Select the coalesced column should error
tk.MustContainErrMsg("select c0 from t0 natural right join t1 group by null",
"[planner:1055]Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'test.t1.c0' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by")
// Select the explicit t0.c0 column should also error (was a bug before fix)
tk.MustContainErrMsg("select t0.c0 from t0 natural right join t1 group by null",
"[planner:1055]Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'test.t0.c0' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by")
// Using aggregation should work
tk.MustQuery("select max(t0.c0) from t0 natural right join t1 group by null").Check(testkit.Rows("1"))
// Normal query without GROUP BY should work
tk.MustQuery("select t0.c0, t1.c0 from t0 natural right join t1 order by t1.c0").Check(testkit.Rows("1 1", "<nil> 3"))
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a JOIN ... USING regression alongside this NATURAL JOIN case.

The planner fix and PR scope cover both NATURAL JOIN and JOIN ... USING, but this block only protects the NATURAL JOIN syntax. A small sibling case like ... JOIN t1 USING(c0) GROUP BY NULL would keep the shared coalesced-column path covered too.

As per coding guidelines, "**/*_test.go: MUST add a regression test and verify it fails before fix and passes after fix for bug fixes".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/issuetest/planner_issue_test.go` around lines 149 - 171, Add
a sibling regression that mirrors the NATURAL JOIN cases but uses JOIN ...
USING(c0): inside the same test block (the one that calls prepareSharedTestKit
and creates t0/t1), add queries that assert the USING path behaves the same —
e.g. use tk.MustContainErrMsg for "select c0 from t0 right join t1 using(c0)
group by null" and for "select t0.c0 from t0 right join t1 using(c0) group by
null" with the same ONLY_FULL_GROUP_BY error messages, confirm aggregation works
with "select max(t0.c0) from t0 right join t1 using(c0) group by null" and
confirm the normal non-GROUP BY query "select t0.c0, t1.c0 from t0 right join t1
using(c0) order by t1.c0" returns the expected rows; place these next to the
existing NATURAL JOIN assertions so the coalesced-column path for USING is
covered.

Comment on lines +3500 to +3506
} else if join, isJoin := c.p.(*logicalop.LogicalJoin); isJoin && join.FullSchema != nil {
// For NATURAL JOIN or JOIN ... USING, the output schema is coalesced and
// may not contain all original columns. We need to check FullNames as well.
idx, err = expression.FindFieldName(join.FullNames, v.Name)
if err == nil && idx >= 0 {
c.names[join.FullNames[idx]] = struct{}{}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Mirror this FullNames fallback when recording grouped columns too.

This only fixes the “referenced column” side of the old ONLY_FULL_GROUP_BY checker. addGbyOrSingleValueColName() still resolves GROUP BY / single-value columns through p.OutputNames() only, so valid queries like SELECT t0.c0 FROM t0 NATURAL RIGHT JOIN t1 GROUP BY t0.c0 will now be rejected: t0.c0 is found here, but never recognized as grouped.

Suggested follow-up
func addGbyOrSingleValueColName(p base.LogicalPlan, colName *ast.ColumnName, gbyOrSingleValueColNames map[*types.FieldName]struct{}) {
	idx, err := expression.FindFieldName(p.OutputNames(), colName)
-	if err != nil || idx < 0 {
-		return
-	}
-	gbyOrSingleValueColNames[p.OutputNames()[idx]] = struct{}{}
+	if err == nil && idx >= 0 {
+		gbyOrSingleValueColNames[p.OutputNames()[idx]] = struct{}{}
+		return
+	}
+	if join, ok := p.(*logicalop.LogicalJoin); ok && join.FullSchema != nil {
+		idx, err = expression.FindFieldName(join.FullNames, colName)
+		if err == nil && idx >= 0 {
+			gbyOrSingleValueColNames[join.FullNames[idx]] = struct{}{}
+		}
+	}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/logical_plan_builder.go` around lines 3500 - 3506,
addGbyOrSingleValueColName() currently only checks p.OutputNames() to mark
grouped/single-value columns, so when the parent is a logicalop.LogicalJoin with
coalesced OutputNames it misses columns present in join.FullNames; update
addGbyOrSingleValueColName() to mirror the fallback used elsewhere: if c.p is a
*logicalop.LogicalJoin and join.FullNames != nil, call
expression.FindFieldName(join.FullNames, v.Name) (same way as in the referenced
block) and, when found, record the corresponding join.FullNames[index] into the
grouped-name set (the same map where OutputNames results are stored) so GROUP BY
columns that only appear in FullNames are recognized as grouped.

@tiprow
Copy link

tiprow bot commented Mar 10, 2026

@fixdb: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
fast_test_tiprow 4d660a2 link true /test fast_test_tiprow

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ti-chi-bot
Copy link

ti-chi-bot bot commented Mar 10, 2026

@fixdb: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
idc-jenkins-ci-tidb/check_dev_2 4d660a2 link true /test check-dev2
idc-jenkins-ci-tidb/mysql-test 4d660a2 link true /test mysql-test
idc-jenkins-ci-tidb/check_dev 4d660a2 link true /test check-dev
pull-integration-e2e-test 4d660a2 link true /test pull-integration-e2e-test
idc-jenkins-ci-tidb/unit-test 4d660a2 link true /test unit-test
pull-unit-test-next-gen 4d660a2 link true /test pull-unit-test-next-gen

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/needs-tests-checked ok-to-test Indicates a PR is ready to be tested. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/planner SIG: Planner size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Undeterministic result of deterministic query

2 participants