Skip to content

planner: fix correlated subquery resolution failure in window-containing expressions#66814

Closed
qw4990 wants to merge 2 commits intopingcap:masterfrom
qw4990:fix-57594
Closed

planner: fix correlated subquery resolution failure in window-containing expressions#66814
qw4990 wants to merge 2 commits intopingcap:masterfrom
qw4990:fix-57594

Conversation

@qw4990
Copy link
Contributor

@qw4990 qw4990 commented Mar 9, 2026

What problem does this PR solve?

Issue Number: close #57594

Problem Summary: planner: fix correlated subquery resolution failure in window-containing expressions

What changed and how does it work?

Root cause

buildProjection is invoked twice when window functions are present:

  1. First pass (considerWindow=false): fields containing window functions are entirely skipped with a zero placeholder — correlated subqueries inside them are not processed.
  2. buildWindowFunctions then builds a LogicalWindow on top, whose schema only retains the placeholder columns. The original table columns are lost.
  3. Second pass (considerWindow=true): the full expression (including the correlated subquery) is now processed, but the plan's schema no longer contains the original table columns, so correlated column resolution fails.

Solution

In the first buildProjection pass, when window-containing fields are deferred, also carry through the child plan's columns into the projection output. This preserves them through to LogicalWindow so they remain visible for correlated subquery resolution in the second pass. Column pruning removes any that are ultimately unused.

What changed

  • pkg/planner/core/logical_plan_builder.go: After the field loop in buildProjection, when !considerWindow and there are window fields, append the child plan's columns that aren't already in the projection schema.
  • pkg/planner/core/casetest/correlated/correlated_test.go: Added TestCorrelatedExistsWithWindowFunction regression test covering empty table, data correctness, and the original reproduction case.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Bug Fixes

    • Correlated subqueries inside window functions now correctly resolve outer-query columns.
  • Tests

    • Added a comprehensive test validating correlated EXISTS behavior with window functions across multiple scenarios.
  • Chores

    • Increased test sharding to run correlated tests with higher parallelism.

@ti-chi-bot ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label Mar 9, 2026
@pantheon-ai
Copy link

pantheon-ai bot commented Mar 9, 2026

Review Complete

Findings: 1 issues
Posted: 1
Duplicates/Skipped: 0

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot bot added sig/planner SIG: Planner size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 9, 2026
@tiprow
Copy link

tiprow bot commented Mar 9, 2026

Hi @qw4990. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai
Copy link

coderabbitai bot commented Mar 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ba141015-9f2b-4b81-9787-9468f91a181d

📥 Commits

Reviewing files that changed from the base of the PR and between fc6fbe9 and 7dbbda6.

📒 Files selected for processing (1)
  • pkg/planner/core/testdata/plan_suite_unexported_out.json

📝 Walkthrough

Walkthrough

Preserves child plan columns during a two-pass projection build so correlated subqueries inside window-function-containing expressions can resolve outer references; adds a test covering correlated EXISTS with window functions and increases test sharding.

Changes

Cohort / File(s) Summary
Test configuration
pkg/planner/core/casetest/correlated/BUILD.bazel
Increased shard_count in the go_test rule from 2 to 3.
New correlated-window test
pkg/planner/core/casetest/correlated/correlated_test.go
Added TestCorrelatedExistsWithWindowFunction covering correlated EXISTS combined with window functions across several scenarios (empty table, NULLs, PARTITION BY, text-column edge case, aliased outer column).
Planner: projection & windows
pkg/planner/core/logical_plan_builder.go
Introduce hasWindowField handling: defer window fields on first pass using placeholders, then append missing child columns to projection so a second pass can resolve correlated references inside window-containing expressions while preserving alias/name visibility.
Test expectations updated
pkg/planner/core/testdata/plan_suite_unexported_out.json
Adjusted expected plan string literals to reflect updated column indices and window/aggregate projection positions after the projection carry-through change.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

release-note

Suggested reviewers

  • terry1purcell
  • guo-shaoge

Poem

🐰 I hopped through plans where window-flowers grow,
Carried columns softly so correlations flow.
EXISTS found its partner, partitions sang in tune,
Tests now hop merrier beneath the moon. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'planner: fix correlated subquery resolution failure in window-containing expressions' clearly and specifically summarizes the main change—fixing correlated subquery resolution in window function contexts.
Description check ✅ Passed The PR description includes all required sections: issue number (close #57594), detailed problem summary with root cause and solution explanation, What changed section, and completed checklist with unit test marked.
Linked Issues check ✅ Passed The code changes directly address issue #57594 by fixing correlated subquery resolution in window-containing expressions, including a regression test covering the original reproduction case.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing the identified issue: logical_plan_builder.go fix, regression test, test file updates, and BUILD.bazel shard count adjustment for test parallelism.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/planner/core/casetest/correlated/correlated_test.go (1)

74-107: Reuse the existing cascades test harness here.

This regression is in shared planner construction code, so wrapping it in testkit.RunTestUnderCascades(...) like TestCorrelatedSubquery would keep both planner modes covered by the same fixture.

♻️ Suggested shape
 func TestCorrelatedExistsWithWindowFunction(t *testing.T) {
-	store := testkit.CreateMockStore(t)
-	tk := testkit.NewTestKit(t, store)
-	tk.MustExec("use test")
-	tk.MustExec("create table t1 (c1 int)")
-	...
+	testkit.RunTestUnderCascades(t, func(t *testing.T, tk *testkit.TestKit, _, _ string) {
+		tk.MustExec("use test")
+		tk.MustExec("create table t1 (c1 int)")
+		...
+	})
 }

As per coding guidelines "Prefer extending existing test suites and fixtures over creating new scaffolding".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/casetest/correlated/correlated_test.go` around lines 74 -
107, TestCorrelatedExistsWithWindowFunction should be executed under both
planners using the cascades test harness: wrap the existing test logic inside
testkit.RunTestUnderCascades(...) so the same body runs under both planner modes
(similar to how TestCorrelatedSubquery is exercised); locate the test function
TestCorrelatedExistsWithWindowFunction and replace its direct use of
testkit.NewTestKit/tk.MustExec/tk.MustQuery calls with a closure passed to
testkit.RunTestUnderCascades that receives a store and runs the same assertions,
ensuring the test fixture and statements (including table creation, inserts, and
queries) are executed inside that closure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/planner/core/casetest/correlated/correlated_test.go`:
- Around line 74-107: TestCorrelatedExistsWithWindowFunction should be executed
under both planners using the cascades test harness: wrap the existing test
logic inside testkit.RunTestUnderCascades(...) so the same body runs under both
planner modes (similar to how TestCorrelatedSubquery is exercised); locate the
test function TestCorrelatedExistsWithWindowFunction and replace its direct use
of testkit.NewTestKit/tk.MustExec/tk.MustQuery calls with a closure passed to
testkit.RunTestUnderCascades that receives a store and runs the same assertions,
ensuring the test fixture and statements (including table creation, inserts, and
queries) are executed inside that closure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9051ac2c-fa63-4710-b198-6d25679ef583

📥 Commits

Reviewing files that changed from the base of the PR and between 960e7c5 and 76e77e0.

📒 Files selected for processing (3)
  • pkg/planner/core/casetest/correlated/BUILD.bazel
  • pkg/planner/core/casetest/correlated/correlated_test.go
  • pkg/planner/core/logical_plan_builder.go

@codecov
Copy link

codecov bot commented Mar 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.5084%. Comparing base (960e7c5) to head (76e77e0).
⚠️ Report is 12 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #66814        +/-   ##
================================================
- Coverage   77.6702%   77.5084%   -0.1619%     
================================================
  Files          2009       1930        -79     
  Lines        549988     538415     -11573     
================================================
- Hits         427177     417317      -9860     
+ Misses       121102     121096         -6     
+ Partials       1709          2      -1707     
Flag Coverage Δ
integration 41.1556% <100.0000%> (-7.0322%) ⬇️
unit 76.6550% <100.0000%> (+0.4043%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 56.7974% <ø> (ø)
parser ∅ <ø> (∅)
br 48.8990% <ø> (-12.0267%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

// processes those fields, so correlated subqueries inside them can still resolve
// references to the outer query's columns. Column pruning removes any that end
// up unused.
if !considerWindow && hasWindowField {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Aliased outer columns can still fail correlated resolution in the deferred window pass

Why: The carry-through logic deduplicates by UniqueID via schema.Contains(col), so if a child column is already projected under an alias (e.g. a AS x), the original FieldName for a is not preserved in the projection output. Correlated subquery resolution is purely name-based via expression.FindFieldName(outerName, v), which matches only FieldName.ColName — not OrigColName. This means a query like:

SELECT a AS x, row_number() OVER () + (SELECT a FROM dual) FROM t;

can fail at planning with [planner:1054] Unknown column 'a' in 'field list', even though this PR is intended to fix exactly this class of correlated subquery + window function errors.

Evidence:

  • if !schema.Contains(col) dedup at pkg/planner/core/logical_plan_builder.go:1386 uses UniqueID-based containment (pkg/expression/schema.go:193), blocking the alias-original-name carry-through.
  • Name-based lookup at pkg/planner/core/expression_rewriter.go:2619: idx, err = expression.FindFieldName(outerName, v) only matches FieldName.ColName, which is set to the alias by pkg/planner/core/logical_plan_builder.go:1000.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed.

Copy link

@pantheon-ai pantheon-ai bot Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this! Once the fix is pushed, feel free to re-request review and I'll verify the aliased-column resolution path is covered.

ℹ️ Learn more details on Pantheon AI.

@qw4990 qw4990 added the AI-Correction Bugfix by AI label Mar 9, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/planner/core/logical_plan_builder.go`:
- Around line 1409-1412: The cloned column created via
col.Clone().(*expression.Column) has its UniqueID reassigned via
b.ctx.GetSessionVars().AllocPlanColumnID() but its cached hash is not
invalidated; call the column's CleanHashCode() after resetting UniqueID to clear
the old hash cache so downstream hash-based operations and deduplication work
correctly (follow the same pattern as in getResultCTESchema). Locate the
clone/UniqueID reassignment in the block that appends to proj.Exprs and
schema.Append and add the CleanHashCode() call on the cloned *expression.Column
immediately after setting UniqueID.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f2a26ff7-2121-47f6-bf5d-5727bfd35a35

📥 Commits

Reviewing files that changed from the base of the PR and between 76e77e0 and fc6fbe9.

📒 Files selected for processing (2)
  • pkg/planner/core/casetest/correlated/correlated_test.go
  • pkg/planner/core/logical_plan_builder.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/planner/core/casetest/correlated/correlated_test.go

Comment on lines +1409 to +1412
cloned := col.Clone().(*expression.Column)
cloned.UniqueID = b.ctx.GetSessionVars().AllocPlanColumnID()
proj.Exprs = append(proj.Exprs, col)
schema.Append(cloned)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== expression.Column.Clone implementation ==\n'
rg -n -C6 'func .*Column.*Clone\(' pkg/expression

printf '\n== Existing hash-cache invalidation sites ==\n'
rg -n -C3 'CleanHashCode\(' pkg/planner/core/logical_plan_builder.go pkg/expression

printf '\n== UniqueID reassignment after cloning in this file ==\n'
rg -n -C3 'Clone\(\)|UniqueID = .*AllocPlanColumnID\(' pkg/planner/core/logical_plan_builder.go

Repository: pingcap/tidb

Length of output: 11304


Add CleanHashCode() after reassigning UniqueID to maintain hash consistency.

Line 1410 changes UniqueID on a cloned column without invalidating its hash cache. The Clone() method preserves the cached hashcode, and your file already establishes this pattern in getResultCTESchema (line 7413) where the comment states: "Since you have reallocated unique id here, the old-cloned-cached hash code is not valid anymore." Inconsistency here can cause the clone to hash identically to the original, defeating deduplication or leading to key collisions in hash-based lookups.

Suggested fix
 			cloned := col.Clone().(*expression.Column)
 			cloned.UniqueID = b.ctx.GetSessionVars().AllocPlanColumnID()
+			cloned.CleanHashCode()
 			proj.Exprs = append(proj.Exprs, col)
 			schema.Append(cloned)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cloned := col.Clone().(*expression.Column)
cloned.UniqueID = b.ctx.GetSessionVars().AllocPlanColumnID()
proj.Exprs = append(proj.Exprs, col)
schema.Append(cloned)
cloned := col.Clone().(*expression.Column)
cloned.UniqueID = b.ctx.GetSessionVars().AllocPlanColumnID()
cloned.CleanHashCode()
proj.Exprs = append(proj.Exprs, col)
schema.Append(cloned)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/logical_plan_builder.go` around lines 1409 - 1412, The
cloned column created via col.Clone().(*expression.Column) has its UniqueID
reassigned via b.ctx.GetSessionVars().AllocPlanColumnID() but its cached hash is
not invalidated; call the column's CleanHashCode() after resetting UniqueID to
clear the old hash cache so downstream hash-based operations and deduplication
work correctly (follow the same pattern as in getResultCTESchema). Locate the
clone/UniqueID reassignment in the block that appends to proj.Exprs and
schema.Append and add the CleanHashCode() call on the cloned *expression.Column
immediately after setting UniqueID.

@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 9, 2026
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 9, 2026
@qw4990
Copy link
Contributor Author

qw4990 commented Mar 10, 2026

/test unit-test

@tiprow
Copy link

tiprow bot commented Mar 10, 2026

@qw4990: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/test unit-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 10, 2026
@AilinKid
Copy link
Contributor

AilinKid commented Mar 10, 2026

looks good to me, while #66869 is superset of this resolution which cloud fully close close #56532

@ti-chi-bot
Copy link

ti-chi-bot bot commented Mar 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: guo-shaoge, hawkingrei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 10, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Mar 10, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-03-10 10:00:28.050508949 +0000 UTC m=+344259.562566600: ☑️ agreed by hawkingrei.
  • 2026-03-10 12:06:42.193552534 +0000 UTC m=+351833.705610195: ☑️ agreed by guo-shaoge.

@qw4990
Copy link
Contributor Author

qw4990 commented Mar 10, 2026

Duplicated with #66869

@qw4990 qw4990 closed this Mar 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI-Correction Bugfix by AI approved lgtm release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected syntax error: Unknown column 'ref_0.c1' in 'where clause'

4 participants