tests: stabilize TestSwitchModeDuringWorkload #10316
okJiang wants to merge 1 commit into tikv:master
Signed-off-by: okjiang <819421878@qq.com>
CodeRabbit walkthrough
This change adds a recovery verification step to a resource manager integration test. After switching deployment modes, the test now waits for the ResourceGroupController to leave its degraded state by polling its status for up to 30 seconds, ensuring the controller has stabilized before proceeding.
Codecov Report
All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #10316 +/- ##
==========================================
+ Coverage 78.78% 78.88% +0.10%
==========================================
Files 527 527
Lines 70916 70920 +4
==========================================
+ Hits 55870 55945 +75
+ Misses 11026 10975 -51
+ Partials 4020 4000 -20
What problem does this PR solve?
Issue Number: ref #10154
TestSwitchModeDuringWorkload/pd-to-standalone is flaky.
Root-cause evidence chain
- Failure: "Condition never satisfied." at pkg/utils/testutil/testutil.go:68 and tests/integrations/mcs/resourcemanager/resource_manager_test.go:492.
- Logs (22856930709/66299693231) show transient RM discovery/connectivity errors around the switch window, e.g. repeated "resource manager error" with "connection refused", while the test has already started counting post-switch successes.
- The test sets switched=true, but the controller may remain in a transient degraded mode until the first successful token-bucket response after the endpoint switch.
- As a result, okAfter can stay below the threshold within the bounded wait even though the switch eventually recovers.
Historical analog
- flaky_stabilization + test_harness_alignment from the flaky fix playbook.
What is changed and how does it work?
- In TestSwitchModeDuringWorkload, after the switch and the leader/service routing checks, and before enabling post-switch counting, add an explicit wait: testutil.Eventually(... !rgController.IsDegraded() ...)
Risk
Verification
- cd tests/integrations && make gotest GOTEST_ARGS='-tags without_dashboard ./mcs/resourcemanager -run TestSwitchModeDuringWorkload -count=3'
  ok github.com/tikv/pd/tests/integrations/mcs/resourcemanager 431.983s
- make basic-test
  - pkg/gctuner: goleak (unexpected goroutine in memory_limit_tuner)
  - pkg/storage/endpoint: TestDataPhysicalRepresentation expected path mismatch (/pd/0/... vs /pd/<cluster-id>/...)