Skip to content

*: fix potential bug by ai #10312

Open
bufferflies wants to merge 2 commits intotikv:masterfrom
bufferflies:bugfix/ai
Open

*: fix potential bug by ai #10312
bufferflies wants to merge 2 commits intotikv:masterfrom
bufferflies:bugfix/ai

Conversation

@bufferflies
Copy link
Contributor

@bufferflies bufferflies commented Mar 7, 2026

What problem does this PR solve?

Issue Number: Ref #10326

What is changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

Summary by CodeRabbit

  • Bug Fixes

    • Stricter input validation and clearer client-facing errors for scheduler config updates (engine, rule, alias, timeouts, store IDs, ranges), safer control flow to avoid runtime errors and unintended pauses/resumes, and more accurate HTTP status codes for invalid input.
  • Tests

    • Added tests for invalid field types, malformed ranges, wrong scheduler types, and config update error cases to prevent regressions.

Signed-off-by: tongjian <1045931706@qq.com>
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 7, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds defensive input validation and type checks across multiple scheduler handlers, tightens HTTP error responses and control flow to avoid panics/nil derefs, and introduces HTTP/unit tests covering invalid input scenarios.

Changes

Cohort / File(s) Summary
Balance Range Scheduler
pkg/schedule/schedulers/balance_range.go, pkg/schedule/schedulers/balance_range_test.go
Validate engine, rule, alias as strings and restrict engine to allowed values; convert/validate rule via core.NewRule; parse optional timeout string to duration with positivity check; update error messages. Adds test TestAddBalanceRangeJobWithInvalidFieldType.
Evict Leader Scheduler
pkg/schedule/schedulers/evict_leader.go, pkg/schedule/schedulers/evict_leader_test.go
pauseLeaderTransferIfStoreNotExist early-returns when store tracked and surfaces PauseLeaderTransfer errors. updateConfig accepts ranges as []string or []any with per-element validation and falls back to existing ranges when id present. Adds HTTP tests for valid string-array ranges and invalid element types.
Grant Hot Region Scheduler
pkg/schedule/schedulers/grant_hot_region.go, pkg/schedule/schedulers/grant_hot_region_test.go
Validate "store-leader-id" is a string before parsing to uint64; return 400 on wrong type. Adds test asserting 400 Bad Request for invalid store-leader-id type.
Transfer Witness Leader Scheduler
pkg/schedule/schedulers/transfer_witness_leader.go, pkg/schedule/schedulers/transfer_witness_leader_test.go
RecvRegionInfo now safely checks type assertion and returns nil on mismatch to avoid panics. Adds test TestRecvRegionInfoWithWrongSchedulerType.
Scheduler Controller
pkg/schedule/schedulers/scheduler_controller.go
Cache RecvRegionInfo result in CheckTransferWitnessLeader and return early on nil to prevent nil channel usage.
Scheduler API
server/api/scheduler.go
Change error responses: use BadRequest for invalid client inputs, preserve default timeout when option missing, use upstream response status on redirect delete errors, and ensure early returns after sending error responses.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • okJiang
  • JmPotato

Poem

🐰 I hopped through fields with twitching nose,

engines checked, rules lined in tidy rows,
timeouts parsed, and ranges neatly typed,
no panics, no nils — my work's well-typed,
carrot cake later, for careful code that glows.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The PR title '*: fix potential bug by ai' is overly vague and generic, using non-descriptive phrasing like 'bug' without specifying what issues are being fixed or what areas are affected. Revise the title to be more specific and descriptive, e.g., 'Add input validation and nil checks to scheduler handlers' or 'Fix nil dereference and type assertion bugs in schedulers'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed The PR description follows the required template structure with sections for issue number, changes, checklist, and release notes. However, the 'What is changed and how does it work?' section lacks a detailed commit message explaining the actual implementation details and rationale for the changes across multiple scheduler files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/api/scheduler.go (1)

145-151: ⚠️ Potential issue | 🟠 Major

Malformed timeout values still fall back to 30m.

apiutil.CollectStringOption returns ErrOptionNotExist both when timeout is absent and when it is present but not a string (pkg/utils/apiutil/apiutil.go:337-345). With this branch, a payload like {"timeout":123} still appends the default and creates the scheduler instead of returning 400. Check raw key presence in input before applying the default.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/scheduler.go` around lines 145 - 151, The branch treating
ErrOptionNotExist as "use default" is incorrect because
apiutil.CollectStringOption returns ErrOptionNotExist both for a missing key and
for a key with a non-string value; update the logic in the scheduler handler
around apiutil.CollectStringOption("timeout", input, collector) to first check
raw presence of the "timeout" key in input (e.g., via a map lookup on input) and
only call collector(defaultTimeout) when the key is absent; if the key exists
but CollectStringOption returns ErrOptionNotExist (meaning the value is
malformed), return a 400 using h.r.JSON with the error message instead of
falling back to default; keep the existing behavior for other errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/schedule/schedulers/balance_range.go`:
- Around line 351-353: The early-nil-start guard in checkJob is letting an
invalid running job be marked finished but then allows checkJob to continue and
reach prepare/Schedule; fix this by detecting job == nil or job.Start == nil,
calling finishLocked(job) to mark it finished/clean up, and then immediately
return (stop further processing) from checkJob so it cannot proceed to prepare
or Schedule; update the branch around job == nil || job.Start == nil in checkJob
to perform finishLocked and return immediately.

In `@pkg/schedule/schedulers/evict_leader.go`:
- Around line 176-182: The helper currently conflates "store already configured"
with "we paused it", causing rollbacks in updateConfig to resume stores that
this request never paused; change the helper used in evict_leader logic (the
function that checks conf.StoreIDWithRanges and calls
conf.cluster.PauseLeaderTransfer) to return separate flags (e.g., exists and
pausedByRequest or a small result struct) instead of a single boolean, set
pausedByRequest only when conf.cluster.PauseLeaderTransfer(id, constant.In)
succeeds, and update all callers (including resumeLeaderTransferIfExist and the
rollback paths in updateConfig) to only call resumeLeaderTransfer when
pausedByRequest is true (leave stores that merely existed untouched).

In `@pkg/schedule/schedulers/grant_leader.go`:
- Around line 268-293: The error branches in the ranges parsing block always
call handler.config.cluster.ResumeLeaderTransfer(id, constant.Out) which can
unpause existing stores; change the logic in
pkg/schedule/schedulers/grant_leader.go so ResumeLeaderTransfer is only called
if this request previously paused the store. Introduce a local boolean (e.g.,
paused) set to true where PauseLeaderTransfer is invoked for new stores (the
existing PauseLeaderTransfer call around lines ~257-263) and then check that
paused flag before calling handler.config.cluster.ResumeLeaderTransfer(id,
constant.Out) in the ranges validation error/default branches inside the ranges
handling switch.

---

Outside diff comments:
In `@server/api/scheduler.go`:
- Around line 145-151: The branch treating ErrOptionNotExist as "use default" is
incorrect because apiutil.CollectStringOption returns ErrOptionNotExist both for
a missing key and for a key with a non-string value; update the logic in the
scheduler handler around apiutil.CollectStringOption("timeout", input,
collector) to first check raw presence of the "timeout" key in input (e.g., via
a map lookup on input) and only call collector(defaultTimeout) when the key is
absent; if the key exists but CollectStringOption returns ErrOptionNotExist
(meaning the value is malformed), return a 400 using h.r.JSON with the error
message instead of falling back to default; keep the existing behavior for other
errors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d0e9d009-8d30-4d17-8018-37955c835928

📥 Commits

Reviewing files that changed from the base of the PR and between 95cde21 and db0ecac.

📒 Files selected for processing (12)
  • pkg/schedule/schedulers/balance_range.go
  • pkg/schedule/schedulers/balance_range_test.go
  • pkg/schedule/schedulers/evict_leader.go
  • pkg/schedule/schedulers/evict_leader_test.go
  • pkg/schedule/schedulers/grant_hot_region.go
  • pkg/schedule/schedulers/grant_hot_region_test.go
  • pkg/schedule/schedulers/grant_leader.go
  • pkg/schedule/schedulers/grant_leader_test.go
  • pkg/schedule/schedulers/scheduler_controller.go
  • pkg/schedule/schedulers/transfer_witness_leader.go
  • pkg/schedule/schedulers/transfer_witness_leader_test.go
  • server/api/scheduler.go

Comment on lines +351 to +353
if job == nil || job.Start == nil {
return true
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Nil-start jobs are still eligible for one scheduling round.

This avoids the panic, but checkJob keeps going after finishLocked (Lines 470-496). A running job restored without Start will be marked finished here and can still reach prepare/Schedule once in the same tick. Return immediately after finishing that invalid job state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/schedule/schedulers/balance_range.go` around lines 351 - 353, The
early-nil-start guard in checkJob is letting an invalid running job be marked
finished but then allows checkJob to continue and reach prepare/Schedule; fix
this by detecting job == nil or job.Start == nil, calling finishLocked(job) to
mark it finished/clean up, and then immediately return (stop further processing)
from checkJob so it cannot proceed to prepare or Schedule; update the branch
around job == nil || job.Start == nil in checkJob to perform finishLocked and
return immediately.

Comment on lines +176 to +182
if _, exist := conf.StoreIDWithRanges[id]; exist {
return true, nil
}
if err := conf.cluster.PauseLeaderTransfer(id, constant.In); err != nil {
return false, err
}
return true, nil
return false, nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Bad updates can now unpause an existing evict-leader store.

Because this helper now returns true for stores that are already configured, the rollback paths in updateConfig still call resumeLeaderTransferIfExist(id) on 400s (Lines 424, 442, 449, 454, and 464) even though this request never paused that store. A request like {"store_id":1,"batch":0} will now fail validation and also clear the scheduler’s pause state for store 1. Track pausedByRequest separately from exists, and only resume when this request actually called PauseLeaderTransfer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/schedule/schedulers/evict_leader.go` around lines 176 - 182, The helper
currently conflates "store already configured" with "we paused it", causing
rollbacks in updateConfig to resume stores that this request never paused;
change the helper used in evict_leader logic (the function that checks
conf.StoreIDWithRanges and calls conf.cluster.PauseLeaderTransfer) to return
separate flags (e.g., exists and pausedByRequest or a small result struct)
instead of a single boolean, set pausedByRequest only when
conf.cluster.PauseLeaderTransfer(id, constant.In) succeeds, and update all
callers (including resumeLeaderTransferIfExist and the rollback paths in
updateConfig) to only call resumeLeaderTransfer when pausedByRequest is true
(leave stores that merely existed untouched).

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/schedule/schedulers/balance_range.go (1)

350-351: ⚠️ Potential issue | 🔴 Critical

shouldFinished still panics on nil job or missing Start.

This still dereferences job.Start unconditionally, so the new regression case with job == nil / job.Start == nil will panic instead of returning safely.

Suggested fix
 func (job *balanceRangeSchedulerJob) shouldFinished() bool {
+	if job == nil || job.Start == nil {
+		return true
+	}
 	return time.Since(*job.Start) > job.Timeout
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/schedule/schedulers/balance_range.go` around lines 350 - 351, The
shouldFinished method currently dereferences job.Start unconditionally causing a
panic when job or job.Start is nil; update
balanceRangeSchedulerJob.shouldFinished to first check for job != nil and
job.Start != nil and return false (not finished) if either is nil, then perform
the time.Since(*job.Start) > job.Timeout comparison; ensure you reference the
balanceRangeSchedulerJob receiver, its Start field and Timeout field when adding
the nil-guards so the behavior is safe for the regression case.
♻️ Duplicate comments (1)
pkg/schedule/schedulers/balance_range.go (1)

467-475: ⚠️ Potential issue | 🟠 Major

Return immediately after timing out the job.

Once this branch marks the job as finished, checkJob still falls through into prepare and can schedule one more operator in the same tick. Stop here after cleanup.

Suggested fix
 	if job.shouldFinished() {
 		if err := s.conf.finishLocked(index); err != nil {
 			balancePersistFailedCounter.Inc()
 			return false
 		}
 		s.cleanJobStatus(cluster, job)
 		balanceRangeExpiredCounter.Inc()
 		log.Info("balance key range job finished due to timeout", zap.String("alias", job.Alias), zap.Uint64("job-id", job.JobID))
+		return false
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/schedule/schedulers/balance_range.go` around lines 467 - 475, The
checkJob branch that handles timeout needs to return immediately after marking
the job finished to avoid falling through into prepare; update the block inside
checkJob where it calls job.shouldFinished(), s.conf.finishLocked(index),
s.cleanJobStatus(cluster, job), balanceRangeExpiredCounter.Inc(), and the
log.Info call so that it returns false (or otherwise exits) right after cleanup
and logging, preventing prepare/prepareRange from scheduling another operator in
the same tick.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@pkg/schedule/schedulers/balance_range.go`:
- Around line 350-351: The shouldFinished method currently dereferences
job.Start unconditionally causing a panic when job or job.Start is nil; update
balanceRangeSchedulerJob.shouldFinished to first check for job != nil and
job.Start != nil and return false (not finished) if either is nil, then perform
the time.Since(*job.Start) > job.Timeout comparison; ensure you reference the
balanceRangeSchedulerJob receiver, its Start field and Timeout field when adding
the nil-guards so the behavior is safe for the regression case.

---

Duplicate comments:
In `@pkg/schedule/schedulers/balance_range.go`:
- Around line 467-475: The checkJob branch that handles timeout needs to return
immediately after marking the job finished to avoid falling through into
prepare; update the block inside checkJob where it calls job.shouldFinished(),
s.conf.finishLocked(index), s.cleanJobStatus(cluster, job),
balanceRangeExpiredCounter.Inc(), and the log.Info call so that it returns false
(or otherwise exits) right after cleanup and logging, preventing
prepare/prepareRange from scheduling another operator in the same tick.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c16d46e1-b0bf-4117-a0bd-4a0dfe8d55ed

📥 Commits

Reviewing files that changed from the base of the PR and between 9d2b70f and 65ccf3a.

📒 Files selected for processing (3)
  • pkg/schedule/schedulers/balance_range.go
  • pkg/schedule/schedulers/balance_range_test.go
  • pkg/schedule/schedulers/evict_leader.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/schedule/schedulers/evict_leader.go

@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 10, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/schedule/schedulers/evict_leader.go (1)

176-180: ⚠️ Potential issue | 🟠 Major

Keep alreadyConfigured separate from pausedByRequest.

This helper now returns the same true value for two different states, but the rollback paths at Line 424, Line 442, Line 449, Line 454, and Line 464 still treat it as “this request paused the store”. A rejected update can therefore clear the existing pause on an already-configured store even though this request never called PauseLeaderTransfer.

💡 One way to fix it
+type pauseLeaderTransferResult struct {
+	alreadyConfigured bool
+	pausedByRequest   bool
+}
+
-func (conf *evictLeaderSchedulerConfig) pauseLeaderTransferIfStoreNotExist(id uint64) (bool, error) {
+func (conf *evictLeaderSchedulerConfig) pauseLeaderTransferIfStoreNotExist(id uint64) (pauseLeaderTransferResult, error) {
 	conf.RLock()
 	defer conf.RUnlock()
 	if _, exist := conf.StoreIDWithRanges[id]; exist {
-		return true, nil
+		return pauseLeaderTransferResult{alreadyConfigured: true}, nil
 	}
 	if err := conf.cluster.PauseLeaderTransfer(id, constant.In); err != nil {
-		return false, err
+		return pauseLeaderTransferResult{}, err
 	}
-	return true, nil
+	return pauseLeaderTransferResult{pausedByRequest: true}, nil
 }
-	exist, err = handler.config.pauseLeaderTransferIfStoreNotExist(id)
+	pauseState, err = handler.config.pauseLeaderTransferIfStoreNotExist(id)
 	...
-		handler.config.resumeLeaderTransferIfExist(id)
+		if pauseState.pausedByRequest {
+			handler.config.resumeLeaderTransferIfExist(id)
+		}
 		...
-	} else if exist {
+	} else if pauseState.alreadyConfigured {
 		ranges = handler.config.getRanges(id)
 	}

Also applies to: 423-465

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/schedule/schedulers/evict_leader.go` around lines 176 - 180, The helper
that checks/conflicts and calls conf.cluster.PauseLeaderTransfer conflates
"alreadyConfigured" and "pausedByRequest" into a single boolean; change it to
return two distinct booleans (e.g., alreadyConfigured, pausedByRequest) so
callers can distinguish stores that were paused by this request from stores that
were already configured; update the helper (the code that reads
conf.StoreIDWithRanges and calls conf.cluster.PauseLeaderTransfer) to set
alreadyConfigured=true when the store exists in conf.StoreIDWithRanges and set
pausedByRequest=true only when PauseLeaderTransfer is actually invoked
successfully; then update all rollback paths that currently assume the single
true to only call the resume/unpause logic when pausedByRequest is true (leave
alreadyConfigured alone).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/schedule/schedulers/balance_range.go`:
- Around line 121-135: The parsed timeout can be negative which causes
shouldFinished() to immediately return true; after parsing timeoutStr with
time.ParseDuration in the block surrounding timeoutStr and err, add a validation
that the resulting timeout is strictly positive (e.g., timeout > 0) and if not,
return a 400 via handler.rd.JSON with a clear message (e.g., "timeout must be
positive") and do not assign job.Timeout; keep existing handling of empty string
and parse errors intact and reference timeoutStr, time.ParseDuration,
job.Timeout and shouldFinished() when locating the change.

---

Duplicate comments:
In `@pkg/schedule/schedulers/evict_leader.go`:
- Around line 176-180: The helper that checks/conflicts and calls
conf.cluster.PauseLeaderTransfer conflates "alreadyConfigured" and
"pausedByRequest" into a single boolean; change it to return two distinct
booleans (e.g., alreadyConfigured, pausedByRequest) so callers can distinguish
stores that were paused by this request from stores that were already
configured; update the helper (the code that reads conf.StoreIDWithRanges and
calls conf.cluster.PauseLeaderTransfer) to set alreadyConfigured=true when the
store exists in conf.StoreIDWithRanges and set pausedByRequest=true only when
PauseLeaderTransfer is actually invoked successfully; then update all rollback
paths that currently assume the single true to only call the resume/unpause
logic when pausedByRequest is true (leave alreadyConfigured alone).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0dfccb36-3fe0-44a2-9af9-d9cd001700a9

📥 Commits

Reviewing files that changed from the base of the PR and between 65ccf3a and 1d91e5f.

📒 Files selected for processing (3)
  • pkg/schedule/schedulers/balance_range.go
  • pkg/schedule/schedulers/balance_range_test.go
  • pkg/schedule/schedulers/evict_leader.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/schedule/schedulers/balance_range_test.go

Comment on lines +121 to 135
if timeoutVal, exists := input["timeout"]; exists {
timeoutStr, ok := timeoutVal.(string)
if !ok {
handler.rd.JSON(w, http.StatusBadRequest, "timeout must be a string")
return
}
job.Timeout = timeout
if len(timeoutStr) > 0 {
timeout, err := time.ParseDuration(timeoutStr)
if err != nil {
handler.rd.JSON(w, http.StatusBadRequest, fmt.Sprintf("timeout:%s is invalid", timeoutStr))
return
}
job.Timeout = timeout
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Consider validating that timeout is positive.

time.ParseDuration accepts negative durations (e.g., "-5m"). A negative timeout would cause shouldFinished() to return true immediately, finishing the job without performing any work. Add a check to reject non-positive timeouts.

🛡️ Suggested fix
 	if len(timeoutStr) > 0 {
 		timeout, err := time.ParseDuration(timeoutStr)
 		if err != nil {
 			handler.rd.JSON(w, http.StatusBadRequest, fmt.Sprintf("timeout:%s is invalid", timeoutStr))
 			return
 		}
+		if timeout <= 0 {
+			handler.rd.JSON(w, http.StatusBadRequest, "timeout must be positive")
+			return
+		}
 		job.Timeout = timeout
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/schedule/schedulers/balance_range.go` around lines 121 - 135, The parsed
timeout can be negative which causes shouldFinished() to immediately return
true; after parsing timeoutStr with time.ParseDuration in the block surrounding
timeoutStr and err, add a validation that the resulting timeout is strictly
positive (e.g., timeout > 0) and if not, return a 400 via handler.rd.JSON with a
clear message (e.g., "timeout must be positive") and do not assign job.Timeout;
keep existing handling of empty string and parse errors intact and reference
timeoutStr, time.ParseDuration, job.Timeout and shouldFinished() when locating
the change.

@codecov
Copy link

codecov bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 37.97468% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.82%. Comparing base (525524f) to head (1d91e5f).
⚠️ Report is 12 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10312      +/-   ##
==========================================
+ Coverage   78.78%   78.82%   +0.04%     
==========================================
  Files         525      527       +2     
  Lines       70824    70973     +149     
==========================================
+ Hits        55796    55944     +148     
- Misses      11004    11012       +8     
+ Partials     4024     4017       -7     
Flag Coverage Δ
unittests 78.82% <37.97%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@liyishuai liyishuai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tesed on my side. The AI review is worth considering. Otherwise LGTM.

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 10, 2026

@liyishuai: adding LGTM is restricted to approvers and reviewers in OWNERS files.

Details

In response to this:

Tesed on my side. The AI review is worth considering. Otherwise LGTM.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 10, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: liyishuai
Once this PR has been reviewed and has the lgtm label, please assign bufferflies for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: tongjian <1045931706@qq.com>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 10, 2026
@bufferflies bufferflies changed the title *: fix potential panic by ai *: fix potential bug by ai Mar 10, 2026
@bufferflies bufferflies requested a review from okJiang March 10, 2026 08:26
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 10, 2026

@bufferflies: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-unit-test-next-gen-1 c4f9234 link true /test pull-unit-test-next-gen-1
pull-unit-test-next-gen-2 c4f9234 link true /test pull-unit-test-next-gen-2

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Member

@okJiang okJiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please link the corresponding issue and explain your changes in the PR description.

Even when using AI to fix bugs, we should still follow the process of submitting a bug issue -> pull request. The process of submitting a bug issue can also be completed by AI, and we should try to follow atomic commits, with one PR fixing one issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dco-signoff: yes Indicates the PR's author has signed the dco. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants