planner: enable cd-c algorithm by default for join reorder | tidb-test=pr/2676#66349

Open
guo-shaoge wants to merge 39 commits into pingcap:master from guo-shaoge:post_cdc_impl

Conversation

@guo-shaoge
Collaborator

@guo-shaoge guo-shaoge commented Feb 24, 2026

What problem does this PR solve?

Issue Number: close #66088

Problem Summary:

  1. Enable the new join reorder implementation (CD-C) by default.
  2. Clean up all the affected test cases.

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Bug Fixes

    • Improved query execution plan optimization for complex joins, particularly with outer join reordering and index range predicate handling.
    • Fixed join predicate ordering in query plans to ensure correct optimization and execution.
  • Tests

    • Enhanced test coverage for query optimization scenarios and join reordering strategies.
    • Updated test assertions for improved reliability of plan validation tests.

Signed-off-by: guo-shaoge <shaoge1994@163.com>
@pantheon-ai

pantheon-ai bot commented Feb 24, 2026

I've accepted your request and will start reviewing the pull request. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.


@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. sig/planner SIG: Planner labels Feb 24, 2026
@ti-chi-bot

ti-chi-bot bot commented Feb 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign 0xpoe for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@guo-shaoge guo-shaoge changed the title planner: enable cd-c algorithm by default for join reorder [WIP] planner: enable cd-c algorithm by default for join reorder Feb 24, 2026
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 24, 2026
@guo-shaoge
Collaborator Author

reminder: add inTest check if there is a remaining edge


@pantheon-ai pantheon-ai bot left a comment


P2: join reorder fallback behavior

}
})
if p.SCtx().GetSessionVars().EnableOuterJoinReorder {
p, err := joinorder.Optimize(p)

P2: joinorder.Optimize can hit a fallback where it logs WARN and returns the original join tree when it can’t consume all predicate-carrying edges (pkg/planner/core/joinorder/join_order.go:458). With CD-C becoming default-on here, this could be triggered by complex join graphs and silently skip reordering (correctness preserved, but potential perf regressions + noisy WARNs). Consider downgrading the log / adding a metric, or providing a more deterministic fallback path.
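The "add a metric" alternative to per-query WARN logging suggested above can be sketched in isolation. The counter and helper names below are hypothetical, not TiDB's actual metrics code:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// joinReorderFallbacks is a hypothetical counter standing in for a real
// TiDB metric; it lets operators watch how often CD-C falls back without
// emitting a WARN for every affected query.
var joinReorderFallbacks atomic.Int64

// optimizeWithFallback sketches the fallback shape described in the comment:
// when not all predicate-carrying edges can be consumed, reordering is
// skipped (correctness preserved) and the fallback is counted.
func optimizeWithFallback(canConsumeAllEdges bool) string {
	if !canConsumeAllEdges {
		joinReorderFallbacks.Add(1)
		return "original join tree"
	}
	return "reordered join tree"
}

func main() {
	optimizeWithFallback(false)
	optimizeWithFallback(true)
	fmt.Println(joinReorderFallbacks.Load()) // 1
}
```

A single gauge or counter like this surfaces a fallback rate on dashboards without the log noise the comment warns about.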


@pantheon-ai pantheon-ai bot left a comment


Requesting changes due to a P1 cost bug and two operability/behavior changes surfaced by making CD-C join reorder the default. See inline comments for details.

}
})
if p.SCtx().GetSessionVars().EnableOuterJoinReorder {
p, err := joinorder.Optimize(p)

P1 (see pkg/planner/core/joinorder/join_order.go:507): With default tidb_opt_cartesian_join_order_threshold=0 (CartesianJoinOrderThreshold), CD-C’s 2nd pass forces allowNoEQ=true and then multiplies cumCost by cartesianFactor (0), making cartesian/no-EQ joins cost 0. This can drive severe plan regressions/OOM once this path is default; please ensure the penalty never reduces cost (e.g., clamp factor >=1 / additive penalty) and/or don’t run the 2nd pass when cartesian is disabled.
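The clamp suggested here can be sketched in a self-contained form; the function and parameter names are illustrative, not TiDB's actual cost code:

```go
package main

import "fmt"

// costWithCartesianPenalty applies a cartesian penalty factor to a cumulative
// cost while guaranteeing the penalty never *reduces* the cost: a configured
// factor below 1 (such as the default threshold of 0) is clamped to 1.
func costWithCartesianPenalty(cumCost, cartesianFactor float64) float64 {
	if cartesianFactor < 1 {
		cartesianFactor = 1 // clamp: a zero factor must not zero out the cost
	}
	return cumCost * cartesianFactor
}

func main() {
	fmt.Println(costWithCartesianPenalty(100, 0)) // 100, not 0
	fmt.Println(costWithCartesianPenalty(100, 4)) // 400
}
```

With the clamp in place, a disabled cartesian threshold leaves no-EQ join costs unchanged instead of making them free.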

})
if p.SCtx().GetSessionVars().EnableOuterJoinReorder {
p, err := joinorder.Optimize(p)
return p, false, err

P2 (see pkg/planner/core/joinorder/join_order.go:273): tidb_opt_join_reorder_threshold is effectively ignored (threshold check commented out; useGreedy := true), making DP unreachable. This is a compatibility/operability regression for users relying on DP join reorder with threshold > 0.
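For reference, the conventional threshold semantics this comment wants preserved can be sketched as follows (names are illustrative, not the actual planner code):

```go
package main

import "fmt"

// chooseReorderAlgorithm sketches how tidb_opt_join_reorder_threshold is
// conventionally honored: join groups no larger than the threshold get
// exhaustive DP, larger groups fall back to the greedy heuristic.
func chooseReorderAlgorithm(joinGroupSize, threshold int) string {
	if joinGroupSize <= threshold {
		return "dp" // optimal ordering, but cost grows exponentially with group size
	}
	return "greedy" // heuristic ordering that scales to large join groups
}

func main() {
	fmt.Println(chooseReorderAlgorithm(5, 10))  // dp
	fmt.Println(chooseReorderAlgorithm(20, 10)) // greedy
}
```

Hard-coding the greedy branch, as the comment notes, makes the DP case above unreachable regardless of the user's threshold setting.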

failpoint.Return(p2, false, err)
}
})
if p.SCtx().GetSessionVars().EnableOuterJoinReorder {

P2: EnableOuterJoinReorder now selects which join reorder engine runs (joinorder.Optimize vs optimizeRecursive), impacting pure-inner-join queries too. That changes the var’s operational meaning beyond “whether outer joins participate”; consider a separate session var/flag for engine selection and keep this var scoped to outer-join behavior.

@ti-chi-bot

ti-chi-bot bot commented Feb 24, 2026

@pantheon-ai[bot]: adding LGTM is restricted to approvers and reviewers in OWNERS files.

Details

In response to this:

Requesting changes due to a P1 cost bug and two operability/behavior changes surfaced by making CD-C join reorder the default. See inline comments for details.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

failpoint.Return(p2, false, err)
}
})
if p.SCtx().GetSessionVars().EnableOuterJoinReorder {

P2: Routing the default join-reorder through joinorder.Optimize (via EnableOuterJoinReorder) bypasses the legacy extractJoinGroup path that checks tidb_opt_join_reorder_through_sel (TiDBOptJoinReorderThroughSel, around L53). This makes that knob ineffective in default configs. Should the new joinorder path honor it (or document/rename it as legacy-only)?

@guo-shaoge guo-shaoge changed the title [WIP] planner: enable cd-c algorithm by default for join reorder [WIP] planner: enable cd-c algorithm by default for join reorder | pr=tidb-test/2676 Mar 2, 2026
@guo-shaoge guo-shaoge changed the title [WIP] planner: enable cd-c algorithm by default for join reorder | pr=tidb-test/2676 planner: enable cd-c algorithm by default for join reorder | pr=tidb-test/2676 Mar 2, 2026
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 2, 2026
@guo-shaoge
Collaborator Author

/retest

Signed-off-by: guo-shaoge <shaoge1994@163.com>
@coderabbitai

coderabbitai bot commented Mar 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


This PR introduces schema-aligned join condition handling and replaces failpoint-based control flow with session variable gating for outer join reordering. Core planner logic is enhanced with condition alignment functions, and test expectations across multiple suites are updated to reflect the resulting query plan changes.

Changes

Cohort / File(s) Summary
Join Reorder Core Logic
pkg/planner/core/rule_join_reorder.go, pkg/planner/core/joinorder/conflict_detector.go
Replaced failpoint-based conditional with session variable gating for outer join reordering; introduced alignment functions (alignScalarFuncsNotNullWithSchema, alignExprsNotNullWithSchema, alignExprNotNullWithSchema) to align EQ and non-EQ conditions with joined schema; modified makeNonInnerJoin to use aligned conditions for predicate distribution.
Build Configuration
pkg/planner/core/joinorder/BUILD.bazel, pkg/planner/core/casetest/enforcempp/BUILD.bazel
Added dependencies: mysql parser package to joinorder library; testify/assert to enforcempp test suite.
Test Infrastructure
pkg/planner/core/casetest/enforcempp/enforce_mpp_test.go
Replaced require.Eventually with assert.Eventually capturing actual plan for each iteration; added conditional failure with diagnostic output including cascades value, SQL statement, expected plan, and actual plan.
Planner Testdata Updates
pkg/planner/core/casetest/binaryplan/testdata/*, pkg/planner/core/casetest/cbotest/testdata/*, pkg/planner/core/casetest/hint/testdata/*, pkg/planner/core/casetest/mpp/testdata/*, pkg/planner/core/casetest/rule/testdata/*, pkg/planner/core/casetest/tpcds/testdata/*, pkg/planner/core/casetest/testdata/*
Updated JSON plan expectations across multiple test suites; changes include node identifier shifts, predicate reordering (e.g., gt/lt precedence changes), removal of redundant warning messages, and structural plan reorganization (projection placement, exchange node reordering, join tree reshaping).
Integration Test Results
tests/integrationtest/r/planner/core/*, tests/integrationtest/r/select.result, tests/integrationtest/r/tpch.result, tests/integrationtest/r/join_reorder2.result, tests/integrationtest/r/plan_cache.result
Updated explain plan outputs reflecting new join ordering strategy: HashJoin topology changes, table scan reordering, join predicate restructuring, predicate pushdown adjustments, and Build/Probe side reassignments across complex multi-join scenarios.
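The require.Eventually → assert.Eventually change summarized above follows a capture-the-last-observation pattern, sketched here without testify (all names are illustrative, not the test suite's actual code):

```go
package main

import "fmt"

// eventuallyCapture retries a condition like assert.Eventually, but also
// captures the last observed value so a failure can print a diagnostic
// (actual vs expected plan) instead of a bare timeout.
func eventuallyCapture(poll func() (actual string, ok bool), attempts int) (string, bool) {
	var last string
	for i := 0; i < attempts; i++ {
		actual, ok := poll()
		if ok {
			return actual, true
		}
		last = actual // keep the latest observation for the failure message
	}
	return last, false
}

func main() {
	calls := 0
	plan, ok := eventuallyCapture(func() (string, bool) {
		calls++
		if calls >= 3 {
			return "stable-plan", true // plan stabilizes on the third poll
		}
		return "flaky-plan", false
	}, 5)
	fmt.Println(plan, ok) // stable-plan true
}
```

On failure, the captured `last` value is what the updated test prints alongside the cascades setting, SQL, and expected plan.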

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

ok-to-test

Suggested reviewers

  • qw4990
  • terry1purcell

Poem

🐰 The planner hops with joy today,
Conditions align along the way,
Join orders bloom in schema's light,
Outer joins reorder left and right!
With testify's assert, we diagnose,
No more failpoints—the session knows! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
  • Description check — ⚠️ Warning. The PR description lacks critical required information, including a proper issue link, a detailed problem statement, and an explanation of the changes. Resolution: add a proper issue link (close #66088), provide a detailed problem summary explaining why the cd-c algorithm is being enabled by default, and explain what changed and how it works.
✅ Passed checks (1 passed)
  • Title check — ✅ Passed. The PR title clearly identifies the main change: enabling the cd-c algorithm by default for join reorder in the planner. This is the primary objective of the changeset.





@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
pkg/planner/core/rule_join_reorder.go (2)

295-300: Remove stale failpoint placeholder block.

The commented failpoint block is dead code and now just adds confusion around the active control path.

🧹 Proposed cleanup
-	// failpoint.Inject("enableCDCJoinReorder", func(val failpoint.Value) {
-	// 	if val.(bool) {
-	// 		p2, err := joinorder.Optimize(p)
-	// 		failpoint.Return(p2, false, err)
-	// 	}
-	// })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/rule_join_reorder.go` around lines 295 - 300, Remove the
stale commented failpoint block that references
failpoint.Inject("enableCDCJoinReorder") and the inner call to
joinorder.Optimize(p) (variables p and p2); simply delete the entire
commented-out block so the active control flow is not obscured by dead code, and
ensure no other references to enableCDCJoinReorder remain in this scope.

291-303: Please attach targeted planner test evidence for this optimizer-path switch.

Given this changes join reorder dispatch logic, include the exact planner and integration test commands/results in the PR thread before merge.

As per coding guidelines pkg/planner/**: Run targeted planner unit tests (go test -run <TestName> -tags=intest,deadlock) and update rule testdata when needed for changes to planner rules or logical/physical plans.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/rule_join_reorder.go` around lines 291 - 303, Add targeted
planner unit and integration tests that exercise both branches of the join
reorder dispatch (when p.SCtx().GetSessionVars().EnableOuterJoinReorder is true
and false) so CI and reviewers can see exact planner outputs; create/modify
tests (e.g., in a new or existing pkg/planner/*_test.go such as
rule_join_reorder_test.go) that invoke the planner on representative queries and
assert or capture the produced logical/physical plans for the Optimize and
optimizeRecursive paths, update any rule testdata expected outputs under
pkg/planner/testdata as needed, and run the commands to produce evidence: go
test -run <TestName> -tags=intest,deadlock (and the relevant unit test runs)
then paste the exact command lines and their outputs (planner and integration
results) into the PR thread for review.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 255b3fb and 631165e.

📒 Files selected for processing (26)
  • pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_out.json
  • pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_xut.json
  • pkg/planner/core/casetest/cbotest/testdata/analyze_suite_out.json
  • pkg/planner/core/casetest/cbotest/testdata/analyze_suite_xut.json
  • pkg/planner/core/casetest/enforcempp/BUILD.bazel
  • pkg/planner/core/casetest/enforcempp/enforce_mpp_test.go
  • pkg/planner/core/casetest/hint/testdata/integration_suite_out.json
  • pkg/planner/core/casetest/hint/testdata/integration_suite_xut.json
  • pkg/planner/core/casetest/mpp/testdata/integration_suite_out.json
  • pkg/planner/core/casetest/mpp/testdata/integration_suite_xut.json
  • pkg/planner/core/casetest/rule/rule_outer2inner_test.go
  • pkg/planner/core/casetest/rule/testdata/join_reorder_suite_out.json
  • pkg/planner/core/casetest/rule/testdata/outer2inner_out.json
  • pkg/planner/core/casetest/rule/testdata/outer2inner_xut.json
  • pkg/planner/core/casetest/testdata/json_plan_suite_out.json
  • pkg/planner/core/casetest/testdata/json_plan_suite_xut.json
  • pkg/planner/core/casetest/tpcds/testdata/tpcds_suite_out.json
  • pkg/planner/core/casetest/tpcds/testdata/tpcds_suite_xut.json
  • pkg/planner/core/rule_join_reorder.go
  • tests/integrationtest/r/planner/core/casetest/index/index.result
  • tests/integrationtest/r/planner/core/casetest/rule/rule_join_reorder.result
  • tests/integrationtest/r/planner/core/issuetest/planner_issue.result
  • tests/integrationtest/r/planner/core/join_reorder2.result
  • tests/integrationtest/r/planner/core/plan_cache.result
  • tests/integrationtest/r/planner/core/rule_join_reorder.result
  • tests/integrationtest/r/select.result

Signed-off-by: guo-shaoge <shaoge1994@163.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1



📥 Commits

Reviewing files that changed from the base of the PR and between 631165e and 0364655.

📒 Files selected for processing (1)
  • tests/integrationtest/r/tpch.result

├─TableReader(Build) root data:TableFullScan
│ └─TableFullScan cop[tikv] table:customer keep order:false
└─HashJoin(Probe) root inner join, equal:[eq(tpch50.lineitem.l_orderkey, tpch50.orders.o_orderkey)]
└─HashJoin root inner join, equal:[eq(tpch50.customer.c_nationkey, tpch50.nation.n_nationkey)], other cond:or(and(eq(tpch50.nation.n_name, "JAPAN"), eq(tpch50.nation.n_name, "INDIA")), and(eq(tpch50.nation.n_name, "INDIA"), eq(tpch50.nation.n_name, "JAPAN")))

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the tpch.result file
cd /repo 2>/dev/null || cd . 
fd -t f "tpch.result" | head -5

Repository: pingcap/tidb

Length of output: 91


🏁 Script executed:

# Get context around line 447 - show lines 430-470 to see full segment
if [ -f "tests/integrationtest/r/tpch.result" ]; then
  sed -n '430,470p' tests/integrationtest/r/tpch.result | cat -n
fi

Repository: pingcap/tidb

Length of output: 3462


🏁 Script executed:

# Find Q7 SQL query - typically before the EXPLAIN output
# Look backwards from line 447 for the SQL query
if [ -f "tests/integrationtest/r/tpch.result" ]; then
  sed -n '1,450p' tests/integrationtest/r/tpch.result | grep -n "Q7" -A 30 | head -50
fi

Repository: pingcap/tidb

Length of output: 1152


🏁 Script executed:

# Search for the Q7 SQL definition in the codebase
rg "SELECT.*nation.*n1.*nation.*n2" -A 5 -B 5 2>/dev/null | head -100

Repository: pingcap/tidb

Length of output: 38


🏁 Script executed:

# Look for TPCHqueries or test data defining Q7
rg "from.*nation.*n1.*from.*nation.*n2\|from.*nation.*n1.*,.*nation.*n2" 2>/dev/null | head -20

Repository: pingcap/tidb

Length of output: 38


EXPLAIN output loses alias distinction in predicate rendering.

Line 447 displays other cond:or(and(eq(tpch50.nation.n_name, "JAPAN"), eq(tpch50.nation.n_name, "INDIA")), ...), which appears textually unsatisfiable. While the planner correctly executes the self-join via distinct table:n1 and table:n2 accesses, the predicate rendering should preserve alias information to clarify that the comparisons apply to different nation instances (e.g., eq(n1.n_name, "JAPAN"), eq(n2.n_name, "INDIA")). The current formatting masks the actual join semantics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integrationtest/r/tpch.result` at line 447, The EXPLAIN predicate
printer is dropping relation/alias qualifiers, producing unsatisfiable text like
eq(tpch50.nation.n_name, "JAPAN") for joined instances; update the
predicate/expr formatter (e.g., the function that formats scalar expressions /
column references used by the HashJoin explain path) to include the per-source
alias or qualifier (the join input alias such as n1/n2) when rendering ColumnRef
nodes so the predicate shows eq(n1.n_name, "JAPAN") and eq(n2.n_name, "INDIA");
locate the formatter used by HashJoin explain output and change it to prefer the
relation alias (or join alias) over the fully qualified catalog.table name when
an alias is present.

Signed-off-by: guo-shaoge <shaoge1994@163.com>
@codecov

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 80.85106% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.0531%. Comparing base (999e8f4) to head (9805e41).
⚠️ Report is 17 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #66349        +/-   ##
================================================
- Coverage   77.6833%   77.0531%   -0.6303%     
================================================
  Files          2008       1930        -78     
  Lines        549801     539861      -9940     
================================================
- Hits         427104     415980     -11124     
- Misses       120989     123863      +2874     
+ Partials       1708         18      -1690     
Flag Coverage Δ
integration 41.1245% <73.4042%> (-7.0632%) ⬇️
unit 75.9567% <80.8510%> (-0.2855%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 56.7974% <ø> (+0.0929%) ⬆️
parser ∅ <ø> (∅)
br 48.6028% <ø> (-12.2972%) ⬇️

Signed-off-by: guo-shaoge <shaoge1994@163.com>
@pantheon-ai

pantheon-ai bot commented Mar 3, 2026

Review Complete

Findings: 0 issues
Posted: 0
Duplicates/Skipped: 0



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/planner/core/joinorder/conflict_detector.go`:
- Around line 785-802: The cloned ScalarFunction's cached hash becomes stale
after you mutate its arguments in the block handling *expression.ScalarFunction
(where you call e.Clone(), assign newFunc.GetArgs()[i] = newArgs[i] after
alignExprNotNullWithSchema), so update the code to invalidate/recompute the hash
cache: after the mutation loop call expression.ReHashCode(newFunc) (or rebuild
the function via expression.NewFunctionInternal(...) instead of mutating the
clone) so the canonicalhashcode reflects the new arguments.


📥 Commits

Reviewing files that changed from the base of the PR and between 0364655 and 0b357bc.

📒 Files selected for processing (5)
  • pkg/planner/core/casetest/rule/testdata/outer_to_semi_join_suite_out.json
  • pkg/planner/core/casetest/rule/testdata/outer_to_semi_join_suite_xut.json
  • pkg/planner/core/casetest/tpch/testdata/tpch_suite_out.json
  • pkg/planner/core/casetest/tpch/testdata/tpch_suite_xut.json
  • pkg/planner/core/joinorder/conflict_detector.go

Comment on lines +785 to +802
case *expression.ScalarFunction:
	args := e.GetArgs()
	changed := false
	newArgs := make([]expression.Expression, len(args))
	for i := range args {
		newArgs[i] = alignExprNotNullWithSchema(args[i], schema)
		if newArgs[i] != args[i] {
			changed = true
		}
	}
	if !changed {
		return e
	}
	newFunc := e.Clone().(*expression.ScalarFunction)
	for i := range newArgs {
		newFunc.GetArgs()[i] = newArgs[i]
	}
	return newFunc
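The hazard flagged for this snippet can be reproduced with a self-contained toy model of a cached hash code; the types below mimic the clone-then-mutate pattern, not TiDB's actual expression structs:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// expr is a toy stand-in for a scalar function with a cached hash code:
// Clone() copies the cache, so mutating the clone's args leaves the cached
// hash describing the old arguments unless it is explicitly invalidated
// (the role ReHashCode plays in the review suggestion).
type expr struct {
	args []string
	hash []byte // cached; nil means "recompute on next HashCode()"
}

func (e *expr) HashCode() []byte {
	if e.hash == nil {
		sum := sha256.Sum256([]byte(strings.Join(e.args, ",")))
		e.hash = sum[:]
	}
	return e.hash
}

func (e *expr) Clone() *expr {
	return &expr{args: append([]string(nil), e.args...), hash: e.hash}
}

func main() {
	orig := &expr{args: []string{"a", "b"}}
	orig.HashCode() // populate the cache

	clone := orig.Clone()
	clone.args[0] = "a_not_null" // in-place arg mutation, as in the snippet
	fmt.Println("stale without invalidation:",
		string(clone.HashCode()) == string(orig.HashCode())) // true

	clone.hash = nil // the fix: invalidate, forcing a recompute
	fmt.Println("stale after invalidation:",
		string(clone.HashCode()) == string(orig.HashCode())) // false
}
```

The same reasoning applies whether the cache is invalidated directly or the function is rebuilt instead of mutated.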

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect ScalarFunction struct definition and Clone() implementation
echo "=== ScalarFunction struct and Clone() implementation ==="
rg -n -A20 'type ScalarFunction struct' pkg/expression/scalar_function.go

echo -e "\n=== ScalarFunction Clone() method ==="
rg -n -A30 'func \(sf \*ScalarFunction\) Clone\(' pkg/expression/scalar_function.go

echo -e "\n=== ScalarFunction GetArgs() method ==="
rg -n -A15 'func \(sf \*ScalarFunction\) GetArgs\(' pkg/expression/scalar_function.go

echo -e "\n=== Hash-related fields/methods in ScalarFunction ==="
rg -n '(hashcode|canonicalhashcode)' pkg/expression/scalar_function.go

echo -e "\n=== Expression interface hashcode/canonicalhashcode ==="
rg -n -A5 'func \(sf \*ScalarFunction\) (HashCode|CanonicalHashCode)\(' pkg/expression/

Repository: pingcap/tidb

Length of output: 1723


🌐 Web query:

TiDB ScalarFunction Clone() implementation arg mutation safety

💡 Result:

TiDB’s (*ScalarFunction).Clone() is arg-mutation safe (with one important caveat about RetType).

Why args are safe to mutate after Clone()

ScalarFunction.Clone() builds a new ScalarFunction and clones the underlying builtinFunc via sf.Function.Clone(). (raw.githubusercontent.com)

All builtin signatures embed baseBuiltinFunc, whose cloneFrom() deep-clones the argument expressions.

So after cloning:

  • the clone’s GetArgs() slice is independent
  • each argument expression node is also cloned
  • replacing/rewriting args in the clone (e.g. clone.GetArgs()[i] = newExpr) will not affect the original

Caveat: RetType is shallow-copied

Both ScalarFunction.RetType and baseBuiltinFunc.tp are copied by pointer (explicitly marked shallow in code via struct tags / assignments). (raw.githubusercontent.com)

So do not mutate clone.RetType (or the builtin ret type) in-place unless you first clone the FieldType; otherwise you risk affecting other references to the same *types.FieldType.



🌐 Web query:

TiDB expression.ScalarFunction cached hash Clone behavior

💡 Result:

In TiDB (pkg/expression), ScalarFunction caches its hash in two fields: hashcode (used by HashCode()) and canonicalhashcode (used by CanonicalHashCode()). (raw.githubusercontent.com)

Clone behavior (cached hash):

  • (*ScalarFunction).Clone() does not copy the cached hashcode into the clone; it only clones the underlying builtin Function and copies metadata. So a cloned ScalarFunction will typically have an empty hashcode cache and will recompute it on first HashCode() call. (raw.githubusercontent.com)
  • Clone() does copy canonicalhashcode if it is already present (slices.Clone(sf.canonicalhashcode)). (raw.githubusercontent.com)

How the cache is (re)computed:

  • HashCode() returns the cached sf.hashcode if it’s already set; otherwise it calls ReHashCode(sf) to rebuild it. (raw.githubusercontent.com)
  • If you mutate a scalar function’s arguments “in place”, TiDB expects you to call ReHashCode(sf) afterward (“used after we change the argument in place”). (raw.githubusercontent.com)

Related gotcha:

  • RemapColumn() explicitly clears the cache (newSf.hashcode = nil) after rewriting args, ensuring the hash will be recomputed. (raw.githubusercontent.com)



🌐 Web query:

TiDB pkg/expression ScalarFunction architecture design

💡 Result:

TiDB pkg/expression ScalarFunction: architecture/design (high level)

1) Where ScalarFunction sits in the expression system

  • TiDB builds an expression tree (columns/constants/scalar-functions, etc.) and evaluates it during execution. The “refactor builtin” design emphasizes compile-time selection of typed function signatures so execution avoids repeated type-branching and Datum conversions. [5]
  • ScalarFunction is the expression node that represents “a function returning a value”, and it mainly wraps:
    • FuncName (case-insensitive string)
    • RetType (static return FieldType)
    • Function (a builtinFunc implementation that does the real work) [1]

2) Creation path: name → function class → concrete signature (builtinFunc)

  • The central factory is newFunctionImpl(...), used by NewFunction, NewFunctionBase, NewFunctionTryFold, etc. [1]
  • Flow (simplified):
    1. Handle a few special forms (e.g., CAST, GET_VAR, sysdate-now rewrite). [1]
    2. Lookup funcName in the global funcs registry (and extension function registry). If missing, return “function not exists”. [1]
    3. Do some argument/type preprocessing (e.g., infer NULL constant types for most functions). [1]
    4. Call functionClass.getFunction(ctx, args) to select/build the best-matching builtin signature (the actual builtinFunc). [1][4][5]
    5. Construct ScalarFunction{ FuncName, RetType, Function }. Optionally run an init/check callback. [1]
    6. Optionally run constant folding (default NewFunction folds). [1]

3) builtinFunc and “signature structs” (how a function is implemented)

  • TiDB models a SQL function as:
    • a function class (does arg-count checks, return-type inference, overload resolution), and
    • one or more signatures (concrete implementations), each implementing typed evalXX methods. This pattern is explicitly described in the refactor blog and dev guide. [4][5]
  • Best-practice implementation steps (from the dev guide):
    • Register the function in funcs map (name → functionClass)
    • Implement getFunction(...) to build a baseBuiltinFunc (via helpers like newBaseBuiltinFuncWithTp) and return a signature struct
    • Implement evalString/evalInt/evalReal/… depending on return type. [4][5]
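
The class/signature split above can be sketched in a few lines of standalone Go; plusClass, plusIntSig, and the funcs map are toy stand-ins for the real registry and are not TiDB's actual types:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// builtinFunc is the concrete, typed implementation (the "signature").
type builtinFunc interface {
	evalInt(args []int64) int64
}

// functionClass does arg checking and picks a concrete signature.
type functionClass interface {
	getFunction(args []int64) (builtinFunc, error)
}

// plusClass resolves to a single signature after checking arity.
type plusClass struct{}

func (plusClass) getFunction(args []int64) (builtinFunc, error) {
	if len(args) != 2 {
		return nil, errors.New("PLUS expects exactly 2 arguments")
	}
	return plusIntSig{}, nil
}

type plusIntSig struct{}

func (plusIntSig) evalInt(args []int64) int64 { return args[0] + args[1] }

// funcs plays the role of the global name -> functionClass registry.
var funcs = map[string]functionClass{
	"plus": plusClass{},
}

// newFunction mimics the factory: look the (case-insensitive) name up
// in the registry, then delegate to the class to build the signature.
func newFunction(name string, args []int64) (builtinFunc, error) {
	fc, ok := funcs[strings.ToLower(name)]
	if !ok {
		return nil, fmt.Errorf("function %s not exists", name)
	}
	return fc.getFunction(args)
}

func main() {
	f, err := newFunction("PLUS", []int64{2, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(f.evalInt([]int64{2, 3})) // 5
}
```

The real factory also handles special forms, NULL-type inference, and constant folding, but the name-to-class-to-signature shape is the same.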

4) Common embedding: baseBuiltinFunc carries shared metadata and behavior

  • Most builtin signature structs embed baseBuiltinFunc, which stores:
    • args []Expression, return type tp *types.FieldType
    • pushdown signature id pbCode tipb.ScalarFuncSig
    • collation/coercibility metadata
    • vectorization and thread-safety/shareability flags, plus notes that new fields must be thread-safe/immutable if the expression can be shared across sessions. [2]

5) Execution: row-based and vectorized evaluation

  • ScalarFunction exposes vectorized evaluation entrypoints like VecEvalInt/Real/String/Decimal/Time/Duration/JSON/..., which delegate to the underlying builtinFunc’s vecEvalXX methods. [1]
  • A ScalarFunction reports itself vectorizable only if both the function and its children are vectorizable (Vectorized() checks function + children). [1]
  • Builtin vectorization code is partly generator-driven (//go:generate ... *_vec.go). [2]
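
The "function + children" rule is just a tree recursion; a minimal standalone sketch (exprNode is an illustrative type, not TiDB's):

```go
package main

import "fmt"

// exprNode is a toy expression-tree node: vectorizable says whether
// this node's own implementation supports vectorized evaluation.
type exprNode struct {
	vectorizable bool
	children     []*exprNode
}

// Vectorized mirrors the check above: any non-vectorizable node
// anywhere in the subtree disables vectorization for the whole tree.
func (e *exprNode) Vectorized() bool {
	if !e.vectorizable {
		return false
	}
	for _, c := range e.children {
		if !c.Vectorized() {
			return false
		}
	}
	return true
}

func main() {
	leafOK := &exprNode{vectorizable: true}
	leafNo := &exprNode{vectorizable: false}
	fmt.Println((&exprNode{vectorizable: true, children: []*exprNode{leafOK}}).Vectorized()) // true
	fmt.Println((&exprNode{vectorizable: true, children: []*exprNode{leafNo}}).Vectorized()) // false
}
```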

6) Pushdown architecture hook: protobuf signature + PB conversion

  • Builtin functions carry a tipb.ScalarFuncSig (pbCode) used for coprocessor/TiKV/TiFlash pushdown encoding. [2]
  • PbConverter converts an expression tree into tipb.Expr protobuf; if conversion fails, TiDB treats the expression as not pushdown-able (internal error “cannot be pushed down”). [3]

Key takeaway: ScalarFunction is a thin expression wrapper around a selected, typed builtinFunc signature. Most “architecture” lives in (a) the factory (newFunctionImpl) that selects a signature from the funcs registry via functionClass.getFunction, (b) the shared baseBuiltinFunc metadata (types/collation/pushdown id/vectorization), and (c) the execution/pushdown adapters (VecEval* and PB conversion). [1][2][3][4][5]


Clear hash caches after mutating cloned ScalarFunction arguments.

On lines 798–801, the code mutates newFunc.GetArgs()[i] after Clone(). While the argument slice itself is safely deep-cloned, the ScalarFunction carries a cached canonical hash code (canonicalhashcode) that is copied into the clone. Mutating the args without invalidating this cache leaves it stale and inconsistent with the new argument values.

Add expression.ReHashCode(newFunc) after the mutation loop to clear and recompute the hash caches, or alternatively use expression.NewFunctionInternal(...) to reconstruct the function (as suggested in the original review).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/joinorder/conflict_detector.go` around lines 785 - 802, the
cloned ScalarFunction's cached hash becomes stale after you mutate its arguments
in the block handling *expression.ScalarFunction (where you call e.Clone(), then
assign newFunc.GetArgs()[i] = newArgs[i] after alignExprNotNullWithSchema).
Update the code to invalidate/recompute the hash cache: after the mutation loop,
call expression.ReHashCode(newFunc) (or rebuild the function via
expression.NewFunctionInternal(...) instead of mutating the clone) so the
canonicalhashcode reflects the new arguments.

Signed-off-by: guo-shaoge <shaoge1994@163.com>
…/tidb into post_cdc_impl

@guo-shaoge
Collaborator Author

/retest

Signed-off-by: guo-shaoge <shaoge1994@163.com>
"Warn": [
"Warning 1815 leading hint is inapplicable, check if the leading hint table has join conditions with other tables"
]
"Warn": null
Contributor

Great, also an improvement for leading hint.

Signed-off-by: guo-shaoge <shaoge1994@163.com>
"Selection root isnull(test.t3.i)",
"└─HashJoin root left outer join, left side:TableReader, equal:[eq(test.t1.i, test.t2.i)]",
"Projection root test.t1.i, test.t2.i, <nil>->test.t3.i",
"└─HashJoin root anti semi join, left side:HashJoin, equal:[eq(test.t2.i, test.t3.i)]",
Contributor

Is this expected? outer join --> anti semi

Collaborator Author

@guo-shaoge guo-shaoge Mar 10, 2026

Because we have the rule that converts an outer join to an anti semi join (#64959), and for this case the first scenario matches (IS NULL on the join condition column), the outer join is converted to an anti semi join.

Comment on lines +750 to +751
alignedEQConds = alignScalarFuncsNotNullWithSchema(alignedEQConds, join.Schema())
alignedNonEQConds := alignExprsNotNullWithSchema(e.nonEQConds, join.Schema())
Contributor

Seems tricky, could you add some comments for these 2 lines

Signed-off-by: guo-shaoge <shaoge1994@163.com>
@tiprow

tiprow bot commented Mar 10, 2026

@guo-shaoge: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
fast_test_tiprow 9805e41 link true /test fast_test_tiprow

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ti-chi-bot

ti-chi-bot bot commented Mar 10, 2026

@guo-shaoge: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
idc-jenkins-ci-tidb/check_dev_2 42b1dae link true /test check-dev2
pull-build-next-gen 9805e41 link true /test pull-build-next-gen
idc-jenkins-ci-tidb/build 9805e41 link true /test build
idc-jenkins-ci-tidb/unit-test 9805e41 link true /test unit-test



Labels

release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pre-refactor for join reorder conflict detection algorithm

2 participants