tests: fix flaky TestTransferLeaderForScheduler#10306
tests: fix flaky TestTransferLeaderForScheduler#10306JmPotato wants to merge 2 commits intotikv:masterfrom
Conversation
Replace `time.Sleep(time.Second)` followed by a direct `re.True(IsPrepared())` assertion with `testutil.Eventually` to properly wait for the raft cluster to become prepared after leader transfer. This matches the pattern already used in the third leader transfer block of the same test. Under CI resource pressure, a fixed 1-second sleep may not be enough for the new leader to fully prepare, causing the subsequent scheduler count check to time out. Close tikv#10305 Signed-off-by: JmPotato <github@ipotato.me>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughReplaces a fixed sleep and immediate readiness check after region heartbeat with a polling-based Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/retest |
[LGTM Timeline notifier]Timeline:
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #10306 +/- ##
==========================================
+ Coverage 78.78% 78.89% +0.11%
==========================================
Files 525 527 +2
Lines 70824 70920 +96
==========================================
+ Hits 55796 55950 +154
+ Misses 11004 10964 -40
+ Partials 4024 4006 -18
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
That said, it does not fully justify issue #10305. The reported failure is at the scheduler-count check on line 1658. If the new leader were not prepared yet, the old test should have already failed at the previous A more targeted fix would be to increase the wait at line 1658/1659: |
Replace `IsPrepared()` with `AreSchedulersInitialized()` in the post-leader-transfer wait. `IsPrepared()` only indicates that the coordinator's prepare checker has collected enough region info, but the Run() goroutine still needs to call `InitSchedulers(true)` afterwards to actually register schedulers. Under CI resource pressure, the gap between `IsPrepared()` becoming true and `InitSchedulers` completing can exceed the default 20-second Eventually timeout. `AreSchedulersInitialized()` is set at the end of `InitSchedulers`, so waiting on it directly ensures all schedulers are registered before the subsequent scheduler-count assertion. Signed-off-by: kaijikikou <kaijikikou@gmail.com> Signed-off-by: JmPotato <github@ipotato.me>
|
@JmPotato: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@YuhaoZhang00: adding LGTM is restricted to approvers and reviewers in OWNERS files. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: okJiang, YuhaoZhang00 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What problem does this PR solve?
Issue Number: Close #10305
What is changed and how does it work?
Check List
Tests
Release note
Summary by CodeRabbit