*: upgrade testify from v1.10.0 to v1.11.1#10295
*: upgrade testify from v1.10.0 to v1.11.1#10295JmPotato wants to merge 17 commits intotikv:masterfrom
Conversation
Remove the `replace` directive that pinned testify to v1.10.0 and fix all Eventually condition functions that used testify assertions (re.NoError, re.Equal, ReadGetJSON, etc.) inside the condition, which permanently marks tests as failed on first check with testify v1.11+. The key changes are: - Remove `replace` directives in root, tools, and tests/integrations go.mod files - Upgrade client/go.mod from v1.9.0 to v1.11.1 - Add Try* utility functions to pkg/utils/testutil/api_check.go (TryReadGetJSON, TryCheckGetJSON, TryCheckPostJSON, etc.) that return errors instead of using testify assertions - Replace all testify assertions inside Eventually condition functions with pure Go boolean expressions or Try* variants across ~50 test files Issue Number: close tikv#10164 Signed-off-by: JmPotato <github@ipotato.me>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughTests and test utilities were changed to use non-panicking, error-returning "try" helpers and guarded predicates inside Eventually loops; multiple temporary testify replace directives were removed, normalizing the testify dependency version across modules and tightening some test error-propagation checks. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
tools/pd-ctl/tests/member/member_test.go (1)
53-53:⚠️ Potential issue | 🟠 MajorDuplicate
defer cluster.Destroy()call.Line 40 already has
defer cluster.Destroy(). This duplicate defer at line 53 will causeDestroy()to be called twice when the function exits, which could lead to double-close errors or undefined behavior.Proposed fix
client := cluster.GetServer("pd1").GetEtcdClient() - defer cluster.Destroy()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/pd-ctl/tests/member/member_test.go` at line 53, The test contains a duplicate deferred call to cluster.Destroy() (the variable cluster and its Destroy() method are deferred twice); remove the redundant defer so Destroy() is only scheduled once—leave the original defer cluster.Destroy() (the one at line ~40) and delete the second defer cluster.Destroy() at line ~53 to avoid double destruction and potential double-close errors.tests/server/member/member_test.go (1)
112-137:⚠️ Potential issue | 🟡 MinorRemove unused
checkMemberListfunction.The
checkMemberListfunction at lines 112-137 is dead code and not called anywhere in the codebase. It has been superseded bytryCheckMemberListin the same file. Remove the function to maintain code cleanliness.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/server/member/member_test.go` around lines 112 - 137, Delete the unused function checkMemberList from the file (the entire function block starting with func checkMemberList(...)) since it's dead code and replaced by tryCheckMemberList; remove any now-unused imports that were only required by checkMemberList (e.g., io, json, pdpb, errors, or tests.TestDialClient if they become unused), and run go vet/go build to ensure no remaining references to checkMemberList exist.tools/pd-ctl/tests/config/config_test.go (1)
660-668:⚠️ Potential issue | 🟡 MinorTighten the
Eventuallysuccess condition before exiting.On Line 665,
!strings.Contains(..., "404")can return true for non-JSON/transient responses, then Line 667 fails outside the retry loop. Keep JSON validation inside the predicate.🔧 Suggested fix
var group placement.RuleGroup testutil.Eventually(re, func() bool { // wait for the config to be synced to the scheduling server output, err = tests.ExecuteCommand(cmd, "-u", pdAddr, "config", "placement-rules", "rule-group", "show", placement.DefaultGroupID) if err != nil { return false } - return !strings.Contains(string(output), "404") + if strings.Contains(string(output), "404") { + return false + } + if json.Unmarshal(output, &group) != nil { + return false + } + return group.ID == placement.DefaultGroupID }) - re.NoError(json.Unmarshal(output, &group), string(output)) re.Equal(placement.RuleGroup{ID: placement.DefaultGroupID}, group)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/pd-ctl/tests/config/config_test.go` around lines 660 - 668, The Eventually predicate is too lax—it only checks for absence of "404" and lets invalid or non-JSON responses escape the retry loop causing json.Unmarshal and re.Equal on group to fail; update the testutil.Eventually predicate (the anonymous func passed to testutil.Eventually that calls tests.ExecuteCommand with cmd and pdAddr) to attempt json.Unmarshal(output, &group) and verify the unmarshaled value equals placement.RuleGroup{ID: placement.DefaultGroupID} inside the predicate, returning true only when unmarshalling succeeds and the group matches; leave the external re.NoError/re.Equal assertions out of the outer scope since the predicate will now ensure valid JSON and the expected group before exiting.pkg/mcs/discovery/register_test.go (1)
73-92:⚠️ Potential issue | 🟠 MajorAvoid
time.SleepinsideEventuallypredicate—can make the test extremely slow.At lines 89–92,
testutil.EventuallycallstryGetKeyAfterLeaseExpired, which sleeps forDefaultLeaseInSeconds(5 seconds) plus 500 ms on every predicate evaluation. IfEventuallyretries N times, the total sleep is multiplied by N, potentially pushing this test into the 30–60+ second range, especially given the 3-iteration loop context.Refactor to compute a deadline once before
Eventually, then returnfalseuntil that deadline passes, before doing a fastclient.Getpoll:Restructure with deadline gate
+func tryGetKey(ctx context.Context, client *clientv3.Client, key string) (string, error) { + resp, err := client.Get(ctx, key) + if err != nil { + return "", err + } + if len(resp.Kvs) == 0 { + return "", nil + } + return string(resp.Kvs[0].Value), nil +} ... <-etcd.Server.ReadyNotify() + // Wait once for the lease-expiry window, then poll quickly. + expireAt := time.Now().Add(DefaultLeaseInSeconds*time.Second + 500*time.Millisecond) testutil.Eventually(re, func() bool { - val, err := tryGetKeyAfterLeaseExpired(ctx, client, sr.key) + if time.Now().Before(expireAt) { + return false + } + val, err := tryGetKey(ctx, client, sr.key) return err == nil && val == "127.0.0.1:2" })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/mcs/discovery/register_test.go` around lines 73 - 92, The predicate passed to testutil.Eventually must not call tryGetKeyAfterLeaseExpired (which sleeps DefaultLeaseInSeconds+500ms); instead compute a single deadline before calling testutil.Eventually (e.g. deadline := time.Now().Add(time.Duration(DefaultLeaseInSeconds+500)*time.Millisecond)) and have the predicate return false until time.Now() >= deadline, after which do a fast non-sleeping client.Get loop (or a quick single Get) to check sr.key and compare the value to "127.0.0.1:2"; replace the current call to tryGetKeyAfterLeaseExpired inside testutil.Eventually with this deadline-gated check so retries don’t multiply the sleep.pkg/utils/etcdutil/etcdutil_test.go (1)
193-201:⚠️ Potential issue | 🟠 MajorAlways verify member convergence via list check, not removal RPC success.
The predicate currently returns
trueas soon asRemoveEtcdMembersucceeds without confirming the member list reflects the removal. Since the test wraps this inEventuallyspecifically to tolerate removal timing issues (member shutdown can cause transient errors), it's safer to always validate end-state via member list rather than trust the RPC return value.Suggested fix
testutil.Eventually(re, func() bool { - _, err := RemoveEtcdMember(client1, memIDToRemove) - if err == nil { - return true - } - // Verify if the member was actually removed by checking the member list - listResp, listErr := ListEtcdMembers(client1.Ctx(), client1) - return listErr == nil && len(listResp.Members) == 1 + _, _ = RemoveEtcdMember(client1, memIDToRemove) + // Always verify convergence via member list (covers both nil and non-nil remove errors). + listResp, listErr := ListEtcdMembers(client1.Ctx(), client1) + return listErr == nil && len(listResp.Members) == 1 })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/utils/etcdutil/etcdutil_test.go` around lines 193 - 201, The predicate in the test currently returns true immediately when RemoveEtcdMember succeeds without verifying the cluster state; change the predicate used in testutil.Eventually to always call ListEtcdMembers(client1.Ctx(), client1) and only return true when the returned member list reflects that memIDToRemove is gone (e.g., len(listResp.Members) == 1 or the specific member ID is absent), treating RemoveEtcdMember errors as transient and not as a success condition; update the code around RemoveEtcdMember, ListEtcdMembers, client1 and memIDToRemove so the end-state check (member list) determines success.tests/server/api/rule_test.go (1)
548-559:⚠️ Potential issue | 🟠 MajorReturn
falsewhen the expected rule is missing.In the success path, the predicate currently returns
trueif no rule withGroupID == "e"is present, which can make this test pass even when the target rule is absent.🔧 Suggested fix
if testCase.success { testutil.Eventually(re, func() bool { if testutil.TryReadGetJSON(tests.TestDialClient, url, &resp) != nil { return false } for _, r := range resp { if r.GroupID == "e" { return compareRule(r, &rule) } } - return true + return false }) } else {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/server/api/rule_test.go` around lines 548 - 559, The predicate passed to testutil.Eventually incorrectly returns true when no rule with GroupID == "e" is found, allowing the success path to pass incorrectly; update the closure used in the testCase.success branch (the anonymous func passed to testutil.Eventually that calls testutil.TryReadGetJSON and iterates resp) so that if no r with r.GroupID == "e" is found it returns false (and only returns the result of compareRule(r, &rule) when a matching rule is present), ensuring Eventually keeps polling until the expected rule appears and compareRule returns true.
🧹 Nitpick comments (7)
tests/integrations/tso/consistency_test.go (1)
139-141: Preserve stream close/send errors for timeout diagnostics.At Line 140, dropping
CloseSend()errors makes eventual timeout root-cause analysis harder. Consider tracking alastErrin outer scope (also for Line 136/142 failures) and assert it afterEventuallyfails.💡 Suggested refactor
- var resp *tsopb.TsoResponse + var ( + resp *tsopb.TsoResponse + lastErr error + ) testutil.Eventually(re, func() bool { tsoClient, err := suite.tsoClient.Tso(ctx) if err != nil { + lastErr = err return false } defer func() { - _ = tsoClient.CloseSend() + if err := tsoClient.CloseSend(); err != nil && lastErr == nil { + lastErr = err + } }() if err := tsoClient.Send(req); err != nil { + lastErr = err return false } resp, err = tsoClient.Recv() - return err == nil && resp != nil + if err != nil || resp == nil { + lastErr = err + return false + } + lastErr = nil + return true }) + re.NoError(lastErr)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integrations/tso/consistency_test.go` around lines 139 - 141, The defer currently swallows errors from tsoClient.CloseSend(), which loses useful failure context; introduce an outer-scope variable (e.g., lastErr) and assign CloseSend()'s error to it instead of discarding, also set lastErr for the other failure points referenced (the checks at the other failing calls around the Eventually block), and update the test teardown/assertion so that if the Eventually assertion fails you surface lastErr (assert or t.Log/fatal with lastErr) to preserve the root-cause error for timeout diagnostics; reference the tsoClient.CloseSend() call, the Eventually assertion, and the new lastErr variable when making changes.client/clients/tso/stream_test.go (1)
361-365: Prefer accepting either error path in the Eventually predicate.At Line 365, requiring both
processErrandcallbackErrcan make this check brittle if closure semantics return only one error path (sync return or callback). Consider loosening to either-path success.Suggested change
- return processErr != nil && callbackErr != nil + return processErr != nil || callbackErr != nil🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/clients/tso/stream_test.go` around lines 361 - 365, The test's Eventually predicate currently requires both processErr and callbackErr to be non-nil, which is brittle; update the predicate used around s.stream.processRequests (referencing processErr and callbackErr and the callback passed to stream.processRequests) to accept either error path by returning processErr != nil || callbackErr != nil so the test succeeds if the error is delivered synchronously or via the callback.tests/integrations/mcs/tso/keyspace_group_manager_test.go (1)
242-244: Please validate with failpoint-aware test targets.Run touched tests with
make basic-test/make gotest(not plaingo test) so failpoints are correctly managed.Based on learnings: Applies to **/*_test.go : Tests must handle failpoints using github.com/pingcap/failpoint. Do not run
go testdirectly unless failpoints are enabled/disabled correctly. Usemake test,make basic-test, ormake gotestcommands to auto enable/disable failpoints.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integrations/mcs/tso/keyspace_group_manager_test.go` around lines 242 - 244, The test in tests/integrations/mcs/tso/keyspace_group_manager_test.go relies on failpoints and must be executed with failpoint-aware targets; run it via make basic-test or make gotest (not plain go test). Ensure the test file imports github.com/pingcap/failpoint and that failpoints are enabled/disabled properly (e.g., enable required failpoints in TestMain or setup and call failpoint.Disable in cleanup) so the branch guarded by the if err != nil || tam == nil check behaves deterministically under test; re-run the touched *_test.go with the make targets to validate the fix.tests/server/apiv2/handlers/testutil.go (1)
230-252: Refactor recommendation: deduplicate keyspace-group load paths.
tryLoadKeyspaceGroupByIDandTryLoadKeyspaceGroupByIDcurrently duplicate request/read/unmarshal logic. Consolidating them will reduce drift risk.♻️ Proposed refactor
-func tryLoadKeyspaceGroupByID(server *tests.TestServer, id uint32) (*endpoint.KeyspaceGroup, error) { +func tryLoadKeyspaceGroupByID(server *tests.TestServer, id uint32) (*endpoint.KeyspaceGroup, int, error) { httpReq, err := http.NewRequest(http.MethodGet, server.GetAddr()+keyspaceGroupsPrefix+fmt.Sprintf("/%d", id), http.NoBody) if err != nil { - return nil, err + return nil, 0, err } resp, err := tests.TestDialClient.Do(httpReq) if err != nil { - return nil, err + return nil, 0, err } defer resp.Body.Close() data, err := io.ReadAll(resp.Body) if err != nil { - return nil, err + return nil, resp.StatusCode, err } if resp.StatusCode != http.StatusOK { - return nil, fmt.Errorf("unexpected status code: %d", resp.StatusCode) + return nil, resp.StatusCode, fmt.Errorf("unexpected status code: %d", resp.StatusCode) } var kg endpoint.KeyspaceGroup if err := json.Unmarshal(data, &kg); err != nil { - return nil, err + return nil, resp.StatusCode, err } - return &kg, nil + return &kg, resp.StatusCode, nil } func MustLoadKeyspaceGroupByID(re *require.Assertions, server *tests.TestServer, id uint32) *endpoint.KeyspaceGroup { var kg *endpoint.KeyspaceGroup testutil.Eventually(re, func() bool { - var err error - kg, err = tryLoadKeyspaceGroupByID(server, id) + var err error + kg, _, err = tryLoadKeyspaceGroupByID(server, id) return err == nil && kg != nil }) return kg } func TryLoadKeyspaceGroupByID(re *require.Assertions, server *tests.TestServer, id uint32) (*endpoint.KeyspaceGroup, int) { - httpReq, err := http.NewRequest(http.MethodGet, server.GetAddr()+keyspaceGroupsPrefix+fmt.Sprintf("/%d", id), http.NoBody) - re.NoError(err) - resp, err := tests.TestDialClient.Do(httpReq) - re.NoError(err) - defer resp.Body.Close() - data, err := io.ReadAll(resp.Body) - re.NoError(err) - if resp.StatusCode != http.StatusOK { - return nil, resp.StatusCode - } - - var kg endpoint.KeyspaceGroup - re.NoError(json.Unmarshal(data, &kg)) - return &kg, resp.StatusCode + kg, code, err := tryLoadKeyspaceGroupByID(server, id) + if err != nil { + return nil, code + } + return kg, code }Also applies to: 255-270
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/server/apiv2/handlers/testutil.go` around lines 230 - 252, tryLoadKeyspaceGroupByID and TryLoadKeyspaceGroupByID duplicate the same HTTP request, response read and json.Unmarshal logic; extract the shared logic into a single helper (e.g., a new function like doGetAndUnmarshal or loadJSONInto) that takes the request URL (or http.Request) and a target interface{} to unmarshal into, then reimplement both tryLoadKeyspaceGroupByID and TryLoadKeyspaceGroupByID to call that helper and return the resulting *endpoint.KeyspaceGroup; update error propagation so behavior remains identical.tools/pd-ctl/tests/helper.go (1)
108-123: Include command output in JSON unmarshal errors for better debuggability.When JSON decoding fails, callers who pass a struct argument lose the raw output that typically explains the failure. The current implementation returns
("", err)without the output context. Since most callers in scheduler_test.go don't capture output when passing a struct for unmarshal, wrapping the error with the actual output would significantly aid debugging.Proposed fix
func TryExec(cmd *cobra.Command, args []string, v any) (string, error) { output, err := ExecuteCommand(cmd, args...) if err != nil { return "", err } if v == nil { return string(output), nil } - if err := json.Unmarshal(output, v); err != nil { - return "", err + if err := json.Unmarshal(output, v); err != nil { + return "", fmt.Errorf("unmarshal pd-ctl output: %w: %s", err, string(output)) } return "", nil }Requires adding
fmtto imports.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/pd-ctl/tests/helper.go` around lines 108 - 123, TryExec's JSON unmarshal branch currently returns the unmodified error without the command output, losing useful debug info; update TryExec to wrap the json.Unmarshal error with the raw command output (e.g. using fmt.Errorf("json unmarshal failed: %w; output: %s", err, string(output))) so callers receive the output in the error message, and add fmt to the imports; reference the TryExec function and the json.Unmarshal error branch when making this change.tests/testutil.go (1)
463-562: Please run failpoint-safe test commands for this change set.Given these retry/Eventually changes in test infrastructure, validate with failpoint-aware targets (
make checkplus the narrowestmake basic-test/make gotestscopes for touched packages).Based on learnings, "Before PR: run
make checkand the narrowest relevantgo test(with tags). At minimum runmake basic-testfor touched packages."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/testutil.go` around lines 463 - 562, Run failpoint-aware test targets to validate the test changes: execute make check, then run the narrowest package-level tests (e.g., make basic-test or make gotest for the packages that contain TestDialClient, testutil.Eventually, TryCheckGetJSON, CheckPostJSON and related testhelpers). Ensure these commands use the failpoint-enabled test configuration (the project’s failpoint build/test flags or tags) so the retry/Eventually changes are exercised under failpoints.tests/server/api/region_test.go (1)
449-455: Use the current response variable in this polling block.This predicate fetches into
r8but callsr4.Adjust(). That looks like a stale copy-paste and makes the check harder to read.🔧 Suggested cleanup
testutil.Eventually(re, func() bool { if err := testutil.TryReadGetJSON(tests.TestDialClient, url, r8); err != nil { return false } - r4.Adjust() + r8.Adjust() return r8.Count == 1 && len(r8.Regions) > 0 && r.GetID() == r8.Regions[0].ID })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/server/api/region_test.go` around lines 449 - 455, The polling predicate is reading the response into r8 but mistakenly calls r4.Adjust(), which is a stale copy/paste; update the predicate to call r8.Adjust() so the check uses the current response variable (inside the testutil.Eventually closure that calls testutil.TryReadGetJSON and then compares r8.Count, r8.Regions, and r.GetID()).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integrations/tso/server_test.go`:
- Around line 128-164: tryRequest can block forever on tsoClient.Recv/Send; wrap
each attempt in a short per-attempt context with timeout (e.g.
context.WithTimeout(ctx, shortDuration)) and use that derived context when
calling suite.pdClient.Tso(ctx) / suite.tsoClient.Tso(ctx) and any gRPC ops;
ensure you call cancel() in a defer, keep the existing CloseSend() cleanup, and
apply the same timeout-wrapped context for both the legacy (pdpb.TsoRequest) and
non-legacy (tsopb.TsoRequest) branches so each Send/Recv is bounded and the
Eventually loop cannot hang indefinitely.
In `@tests/server/api/api_test.go`:
- Around line 945-948: The test is unmarshalling a stale `output` variable
instead of the current HTTP response body for the polling iteration, which can
hide failures; update the assertion to read and unmarshal the fresh response
body (the bytes you just got from the request in this loop, e.g., the current
`resp.Body`/`body` variable) into `p` and then assert p.Action == "removing" &&
p.Progress == 0.0 && p.CurrentSpeed == 0.0 && p.LeftSeconds == math.MaxFloat64
so the check uses the latest response rather than the previously captured
`output`.
In `@tests/server/api/store_test.go`:
- Around line 528-546: The function tryRequestStatusBody leaks resp.Body when
io.ReadAll returns an error; update tryRequestStatusBody to ensure resp.Body is
always closed (use defer resp.Body.Close() immediately after client.Do succeeds)
and handle any Close() error if needed before returning the read error, so every
early return after client.Do cleans up the response body.
In `@tools/pd-ctl/tests/config/config_test.go`:
- Around line 856-870: The polling predicate used with testutil.Eventually (in
checkLoadRuleBundle) only checks length/presence which can hide
stale-but-same-length data; change the predicate to validate bundle contents
inside Eventually: in the anonymous func passed to testutil.Eventually (the code
that calls ExecuteCommand, os.ReadFile and json.Unmarshal into bundles), perform
a deterministic deep comparison between bundles and expectValues (e.g.,
reflect.DeepEqual or the test helper used by assertBundles) and return true only
when they match; keep all error cases returning false so the retry continues,
and remove the separate assertBundles call outside Eventually (or keep it as a
final assertion but ensure the predicate already guarantees content equality).
In `@tools/pd-ctl/tests/scheduler/scheduler_test.go`:
- Around line 385-392: The Eventually loop calls json.Unmarshal into a reused
result map without checking errors, which can leave stale data from prior
iterations and cause false positives; update the closure used by
testutil.Eventually (the one invoking tests.ExecuteCommand and json.Unmarshal)
to reinitialize or reset the result destination (e.g., create a fresh local map
variable) at the start of each iteration and capture the error returned by
json.Unmarshal, returning false on unmarshal failure so only successfully
parsed, current command output can satisfy the predicate. Ensure the change is
applied to the same pattern found around testutil.Eventually usages that call
tests.ExecuteCommand and json.Unmarshal.
---
Outside diff comments:
In `@pkg/mcs/discovery/register_test.go`:
- Around line 73-92: The predicate passed to testutil.Eventually must not call
tryGetKeyAfterLeaseExpired (which sleeps DefaultLeaseInSeconds+500ms); instead
compute a single deadline before calling testutil.Eventually (e.g. deadline :=
time.Now().Add(time.Duration(DefaultLeaseInSeconds+500)*time.Millisecond)) and
have the predicate return false until time.Now() >= deadline, after which do a
fast non-sleeping client.Get loop (or a quick single Get) to check sr.key and
compare the value to "127.0.0.1:2"; replace the current call to
tryGetKeyAfterLeaseExpired inside testutil.Eventually with this deadline-gated
check so retries don’t multiply the sleep.
In `@pkg/utils/etcdutil/etcdutil_test.go`:
- Around line 193-201: The predicate in the test currently returns true
immediately when RemoveEtcdMember succeeds without verifying the cluster state;
change the predicate used in testutil.Eventually to always call
ListEtcdMembers(client1.Ctx(), client1) and only return true when the returned
member list reflects that memIDToRemove is gone (e.g., len(listResp.Members) ==
1 or the specific member ID is absent), treating RemoveEtcdMember errors as
transient and not as a success condition; update the code around
RemoveEtcdMember, ListEtcdMembers, client1 and memIDToRemove so the end-state
check (member list) determines success.
In `@tests/server/api/rule_test.go`:
- Around line 548-559: The predicate passed to testutil.Eventually incorrectly
returns true when no rule with GroupID == "e" is found, allowing the success
path to pass incorrectly; update the closure used in the testCase.success branch
(the anonymous func passed to testutil.Eventually that calls
testutil.TryReadGetJSON and iterates resp) so that if no r with r.GroupID == "e"
is found it returns false (and only returns the result of compareRule(r, &rule)
when a matching rule is present), ensuring Eventually keeps polling until the
expected rule appears and compareRule returns true.
In `@tests/server/member/member_test.go`:
- Around line 112-137: Delete the unused function checkMemberList from the file
(the entire function block starting with func checkMemberList(...)) since it's
dead code and replaced by tryCheckMemberList; remove any now-unused imports that
were only required by checkMemberList (e.g., io, json, pdpb, errors, or
tests.TestDialClient if they become unused), and run go vet/go build to ensure
no remaining references to checkMemberList exist.
In `@tools/pd-ctl/tests/config/config_test.go`:
- Around line 660-668: The Eventually predicate is too lax—it only checks for
absence of "404" and lets invalid or non-JSON responses escape the retry loop
causing json.Unmarshal and re.Equal on group to fail; update the
testutil.Eventually predicate (the anonymous func passed to testutil.Eventually
that calls tests.ExecuteCommand with cmd and pdAddr) to attempt
json.Unmarshal(output, &group) and verify the unmarshaled value equals
placement.RuleGroup{ID: placement.DefaultGroupID} inside the predicate,
returning true only when unmarshalling succeeds and the group matches; leave the
external re.NoError/re.Equal assertions out of the outer scope since the
predicate will now ensure valid JSON and the expected group before exiting.
In `@tools/pd-ctl/tests/member/member_test.go`:
- Line 53: The test contains a duplicate deferred call to cluster.Destroy() (the
variable cluster and its Destroy() method are deferred twice); remove the
redundant defer so Destroy() is only scheduled once—leave the original defer
cluster.Destroy() (the one at line ~40) and delete the second defer
cluster.Destroy() at line ~53 to avoid double destruction and potential
double-close errors.
---
Nitpick comments:
In `@client/clients/tso/stream_test.go`:
- Around line 361-365: The test's Eventually predicate currently requires both
processErr and callbackErr to be non-nil, which is brittle; update the predicate
used around s.stream.processRequests (referencing processErr and callbackErr and
the callback passed to stream.processRequests) to accept either error path by
returning processErr != nil || callbackErr != nil so the test succeeds if the
error is delivered synchronously or via the callback.
In `@tests/integrations/mcs/tso/keyspace_group_manager_test.go`:
- Around line 242-244: The test in
tests/integrations/mcs/tso/keyspace_group_manager_test.go relies on failpoints
and must be executed with failpoint-aware targets; run it via make basic-test or
make gotest (not plain go test). Ensure the test file imports
github.com/pingcap/failpoint and that failpoints are enabled/disabled properly
(e.g., enable required failpoints in TestMain or setup and call
failpoint.Disable in cleanup) so the branch guarded by the if err != nil || tam
== nil check behaves deterministically under test; re-run the touched *_test.go
with the make targets to validate the fix.
In `@tests/integrations/tso/consistency_test.go`:
- Around line 139-141: The defer currently swallows errors from
tsoClient.CloseSend(), which loses useful failure context; introduce an
outer-scope variable (e.g., lastErr) and assign CloseSend()'s error to it
instead of discarding, also set lastErr for the other failure points referenced
(the checks at the other failing calls around the Eventually block), and update
the test teardown/assertion so that if the Eventually assertion fails you
surface lastErr (assert or t.Log/fatal with lastErr) to preserve the root-cause
error for timeout diagnostics; reference the tsoClient.CloseSend() call, the
Eventually assertion, and the new lastErr variable when making changes.
In `@tests/server/api/region_test.go`:
- Around line 449-455: The polling predicate is reading the response into r8 but
mistakenly calls r4.Adjust(), which is a stale copy/paste; update the predicate
to call r8.Adjust() so the check uses the current response variable (inside the
testutil.Eventually closure that calls testutil.TryReadGetJSON and then compares
r8.Count, r8.Regions, and r.GetID()).
In `@tests/server/apiv2/handlers/testutil.go`:
- Around line 230-252: tryLoadKeyspaceGroupByID and TryLoadKeyspaceGroupByID
duplicate the same HTTP request, response read and json.Unmarshal logic; extract
the shared logic into a single helper (e.g., a new function like
doGetAndUnmarshal or loadJSONInto) that takes the request URL (or http.Request)
and a target interface{} to unmarshal into, then reimplement both
tryLoadKeyspaceGroupByID and TryLoadKeyspaceGroupByID to call that helper and
return the resulting *endpoint.KeyspaceGroup; update error propagation so
behavior remains identical.
In `@tests/testutil.go`:
- Around line 463-562: Run failpoint-aware test targets to validate the test
changes: execute make check, then run the narrowest package-level tests (e.g.,
make basic-test or make gotest for the packages that contain TestDialClient,
testutil.Eventually, TryCheckGetJSON, CheckPostJSON and related testhelpers).
Ensure these commands use the failpoint-enabled test configuration (the
project’s failpoint build/test flags or tags) so the retry/Eventually changes
are exercised under failpoints.
In `@tools/pd-ctl/tests/helper.go`:
- Around line 108-123: TryExec's JSON unmarshal branch currently returns the
unmodified error without the command output, losing useful debug info; update
TryExec to wrap the json.Unmarshal error with the raw command output (e.g. using
fmt.Errorf("json unmarshal failed: %w; output: %s", err, string(output))) so
callers receive the output in the error message, and add fmt to the imports;
reference the TryExec function and the json.Unmarshal error branch when making
this change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 762c67aa-0c1e-4db7-b55d-2ebfdf84ad4d
⛔ Files ignored due to path filters (4)
client/go.sumis excluded by!**/*.sumgo.sumis excluded by!**/*.sumtests/integrations/go.sumis excluded by!**/*.sumtools/go.sumis excluded by!**/*.sum
📒 Files selected for processing (47)
client/clients/tso/stream_test.goclient/go.modgo.modpkg/election/leadership_test.gopkg/mcs/discovery/register_test.gopkg/mcs/resourcemanager/server/manager_test.gopkg/schedule/scatter/region_scatterer_test.gopkg/storage/leveldb_backend_test.gopkg/tso/keyspace_group_manager_test.gopkg/utils/etcdutil/etcdutil_test.gopkg/utils/testutil/api_check.gotests/integrations/client/client_test.gotests/integrations/client/http_client_test.gotests/integrations/client/router_client_test.gotests/integrations/go.modtests/integrations/mcs/resourcemanager/resource_manager_test.gotests/integrations/mcs/router/server_test.gotests/integrations/mcs/scheduling/api_test.gotests/integrations/mcs/scheduling/config_test.gotests/integrations/mcs/scheduling/rule_test.gotests/integrations/mcs/scheduling/server_test.gotests/integrations/mcs/tso/keyspace_group_manager_test.gotests/integrations/mcs/tso/server_test.gotests/integrations/realcluster/scheduler_test.gotests/integrations/tso/consistency_test.gotests/integrations/tso/server_test.gotests/server/api/admin_test.gotests/server/api/api_test.gotests/server/api/diagnostic_test.gotests/server/api/min_resolved_ts_test.gotests/server/api/region_test.gotests/server/api/rule_test.gotests/server/api/scheduler_test.gotests/server/api/store_test.gotests/server/apiv2/handlers/affinity_test.gotests/server/apiv2/handlers/testutil.gotests/server/cluster/cluster_test.gotests/server/config/config_test.gotests/server/member/member_test.gotests/testutil.gotools/go.modtools/pd-ctl/tests/config/config_test.gotools/pd-ctl/tests/helper.gotools/pd-ctl/tests/keyspace/keyspace_group_test.gotools/pd-ctl/tests/keyspace/keyspace_test.gotools/pd-ctl/tests/member/member_test.gotools/pd-ctl/tests/scheduler/scheduler_test.go
💤 Files with no reviewable changes (3)
- tools/go.mod
- tests/integrations/go.mod
- go.mod
- Fix confusing-naming: rename tryLoadKeyspaceGroupByID to loadKeyspaceGroupByIDNoAssert - Fix staticcheck S1008: simplify if-return patterns - Fix unparam: remove unused expectedCode parameter from tryCheckResp - Fix unused: delete checkMemberList and mustGetAffinityGroup that were superseded by try* variants - Fix nil panic in TestResourceGroupController: add nil check before calling err.Error() inside Eventually Issue Number: ref tikv#10164 Signed-off-by: JmPotato <github@ipotato.me>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/server/api/api_test.go (1)
945-947:⚠️ Potential issue | 🟠 MajorUse current response body instead of stale
outputin this polling iteration.Line 945 unmarshals
outputfrom outside this request path, so this branch can pass/fail against stale data instead of the response just fetched. That can mask real regressions.💡 Suggested fix
defer resp.Body.Close() if resp.StatusCode != http.StatusOK { return false } + body, err := io.ReadAll(resp.Body) + if err != nil { + return false + } // is not prepared - if json.Unmarshal(output, &p) != nil { + if json.Unmarshal(body, &p) != nil { return false } return p.Action == "removing" && p.Progress == 0.0 && p.CurrentSpeed == 0.0 && p.LeftSeconds == math.MaxFloat64🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/server/api/api_test.go` around lines 945 - 947, The test is unmarshalling the stale variable `output` instead of the current request's response bytes, so replace `json.Unmarshal(output, &p)` with `json.Unmarshal(<currentResponseBody>, &p)` using the local variable holding the latest response in this polling iteration (the byte slice produced by the current request), and propagate/check the returned error from json.Unmarshal; locate this change around the polling loop where `output`, `p`, and the HTTP response/body are handled (search for `output`, `p`, and `json.Unmarshal` in the test) and update the call to use the fresh response body variable.
🧹 Nitpick comments (1)
tests/server/member/member_test.go (1)
117-124: Wrap helper errors with operation context.
return errhere loses key context in Eventually logs. Wrapping these failures makes retries debuggable.🧩 Proposed refactor
res, err := tests.TestDialClient.Get(addr) if err != nil { - return err + return errors.Wrap(err, "get /pd/api/v1/members") } defer res.Body.Close() buf, err := io.ReadAll(res.Body) if err != nil { - return err + return errors.Wrap(err, "read members response body") } @@ data := &pdpb.GetMembersResponse{} if err = json.Unmarshal(buf, data); err != nil { - return err + return errors.Wrap(err, "unmarshal members response") }As per coding guidelines, "Wrap errors with
github.com/pingcap/errors(errors.Wrap/Annotate) orfmt.Errorf("...: %w", err)."Also applies to: 129-131
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/server/member/member_test.go` around lines 117 - 124, The helper currently returns raw errors from response operations (res.Body.Close(), io.ReadAll) which lose context; update the return statements in the helper in tests/server/member/member_test.go to wrap errors (e.g., use fmt.Errorf("closing response body: %w", err) and fmt.Errorf("reading response body: %w", err) or errors.Wrap/Annotate) so Eventually logs show operation context, and apply the same wrapping to the other occurrences referenced around the helper (the subsequent returns at the other read/close points).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/server/member/member_test.go`:
- Around line 132-146: The test can false-pass because the nested loops only
validate URLs when names randomly match; change the validation to build a map of
returned members keyed by member.GetName() (from data.GetMembers()), assert the
map size equals len(configs), then for each cfg in configs look up cfg.Name in
that map and return an error if missing; for each found member verify
len(member.ClientUrls)==1 && member.ClientUrls[0]==cfg.ClientUrls and
len(member.PeerUrls)==1 && member.PeerUrls[0]==cfg.PeerUrls, returning
descriptive errors on mismatch.
---
Duplicate comments:
In `@tests/server/api/api_test.go`:
- Around line 945-947: The test is unmarshalling the stale variable `output`
instead of the current request's response bytes, so replace
`json.Unmarshal(output, &p)` with `json.Unmarshal(<currentResponseBody>, &p)`
using the local variable holding the latest response in this polling iteration
(the byte slice produced by the current request), and propagate/check the
returned error from json.Unmarshal; locate this change around the polling loop
where `output`, `p`, and the HTTP response/body are handled (search for
`output`, `p`, and `json.Unmarshal` in the test) and update the call to use the
fresh response body variable.
---
Nitpick comments:
In `@tests/server/member/member_test.go`:
- Around line 117-124: The helper currently returns raw errors from response
operations (res.Body.Close(), io.ReadAll) which lose context; update the return
statements in the helper in tests/server/member/member_test.go to wrap errors
(e.g., use fmt.Errorf("closing response body: %w", err) and fmt.Errorf("reading
response body: %w", err) or errors.Wrap/Annotate) so Eventually logs show
operation context, and apply the same wrapping to the other occurrences
referenced around the helper (the subsequent returns at the other read/close
points).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 710ff17b-cce5-4019-a501-9a75ce9dbb0d
📒 Files selected for processing (6)
pkg/utils/testutil/api_check.gotests/integrations/mcs/resourcemanager/resource_manager_test.gotests/server/api/api_test.gotests/server/apiv2/handlers/affinity_test.gotests/server/apiv2/handlers/testutil.gotests/server/member/member_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/server/apiv2/handlers/affinity_test.go
- pkg/utils/testutil/api_check.go
- tests/server/apiv2/handlers/testutil.go
The function was superseded by tests.TryExec and tests.ExecuteCommand calls inside Eventually. Issue Number: ref tikv#10164 Signed-off-by: JmPotato <github@ipotato.me>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
tools/pd-ctl/tests/scheduler/scheduler_test.go (1)
385-392:⚠️ Potential issue | 🟠 MajorHandle
json.Unmarshalerrors and reset decode destinations in retry loops.At Line 390, Line 568, Line 588, and Line 903, decode errors are ignored while reusing destination variables across retries. That can preserve stale values and let predicates pass on invalid current output.
Suggested fix (apply same pattern to all affected blocks)
result := make(map[string]any) testutil.Eventually(re, func() bool { - output, err := tests.ExecuteCommand(cmd, []string{"-u", pdAddr, "scheduler", "describe", "balance-leader-scheduler"}...) - if err != nil { - return false - } - json.Unmarshal(output, &result) + output, err := tests.ExecuteCommand(cmd, []string{"-u", pdAddr, "scheduler", "describe", "balance-leader-scheduler"}...) + if err != nil { + return false + } + result = make(map[string]any) + if err := json.Unmarshal(output, &result); err != nil { + return false + } return len(result) != 0 && result["status"] == "paused" && result["summary"] == "" }, testutil.WithWaitFor(30*time.Second))testutil.Eventually(re, func() bool { - output, err := tests.ExecuteCommand(cmd, []string{"-u", pdAddr, "scheduler", "config", "balance-range-scheduler"}...) - if err != nil { - return false - } - json.Unmarshal(output, &rangeConf) + output, err := tests.ExecuteCommand(cmd, []string{"-u", pdAddr, "scheduler", "config", "balance-range-scheduler"}...) + if err != nil { + return false + } + rangeConf = nil + if err := json.Unmarshal(output, &rangeConf); err != nil { + return false + } if len(rangeConf) == 0 { return false }#!/bin/bash set -euo pipefail FILE="tools/pd-ctl/tests/scheduler/scheduler_test.go" # Show Eventually blocks that still call json.Unmarshal without local error handling nearby. rg -n 'testutil\.Eventually|json\.Unmarshal\(' "$FILE" -C 3 # Inspect the previously flagged ranges directly. sed -n '380,395p' "$FILE" sed -n '558,592p' "$FILE" sed -n '891,908p' "$FILE"Also applies to: 563-574, 584-590, 896-907
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/pd-ctl/tests/scheduler/scheduler_test.go` around lines 385 - 392, The Eventually blocks calling tests.ExecuteCommand then json.Unmarshal into the reused variable result ignore decode errors and can leave stale values passing the predicate; update each failing loop (the testutil.Eventually closure around tests.ExecuteCommand for cmd and scheduler "describe" calls) to reset or reinitialize the decode destination (e.g., create a fresh local map/struct inside the closure) and check the error returned by json.Unmarshal, returning false on unmarshal error so the retry continues; ensure you handle and log or surface the unmarshal error locally rather than letting previous contents of result be used in the predicate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tools/pd-ctl/tests/scheduler/scheduler_test.go`:
- Around line 385-392: The Eventually blocks calling tests.ExecuteCommand then
json.Unmarshal into the reused variable result ignore decode errors and can
leave stale values passing the predicate; update each failing loop (the
testutil.Eventually closure around tests.ExecuteCommand for cmd and scheduler
"describe" calls) to reset or reinitialize the decode destination (e.g., create
a fresh local map/struct inside the closure) and check the error returned by
json.Unmarshal, returning false on unmarshal error so the retry continues;
ensure you handle and log or surface the unmarshal error locally rather than
letting previous contents of result be used in the predicate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 59e99d44-e1f3-4c0e-90af-ab2b30b486fa
📒 Files selected for processing (1)
tools/pd-ctl/tests/scheduler/scheduler_test.go
Fix remaining Eventually anti-patterns and timing issues found in CI: - scheduler_test.go (server/api): wrap scheduler existence check in Eventually with TryReadGetJSON/TryCheckPostJSON to handle API not ready in microservice mode - scheduler_test.go (tools): move status/timeout/ranges assertions inside Eventually condition using plain Go comparisons - config_test.go: rewrite assertTTLConfig to use checkTTLConfig with plain Go comparisons inside Eventually, fixing race where only one field was checked in Eventually while all fields were asserted outside - rule_test.go: extend Eventually condition to verify rule content (not just count) since count stays at 2 during rule transitions - tso_keyspace_group_test.go: add non-asserting getKeyspaceGroup method and use it in all Eventually blocks instead of tryGetKeyspaceGroup which uses re.NoError internally - operator_test.go: wrap operator add command in Eventually to handle scheduling server not ready in microservice mode Issue Number: ref tikv#10164 Signed-off-by: JmPotato <github@ipotato.me>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
tools/pd-ctl/tests/scheduler/scheduler_test.go (1)
385-392:⚠️ Potential issue | 🟠 MajorUnchecked
json.UnmarshalinEventuallypredicates can produce stale-state false positives.Line 390, Line 568, Line 601, and Line 916 ignore unmarshal errors while reusing destination variables across retries. A bad JSON response can leave prior values and incorrectly satisfy the predicate.
✅ Proposed fix (use `TryExec` + fresh destination per retry)
@@ result := make(map[string]any) testutil.Eventually(re, func() bool { - output, err := tests.ExecuteCommand(cmd, []string{"-u", pdAddr, "scheduler", "describe", "balance-leader-scheduler"}...) - if err != nil { + curr := make(map[string]any) + if _, err := tests.TryExec(cmd, []string{"-u", pdAddr, "scheduler", "describe", "balance-leader-scheduler"}, &curr); err != nil { return false } - json.Unmarshal(output, &result) + result = curr return len(result) != 0 && result["status"] == "paused" && result["summary"] == "" }, testutil.WithWaitFor(30*time.Second)) @@ testutil.Eventually(re, func() bool { - output, err := tests.ExecuteCommand(cmd, []string{"-u", pdAddr, "scheduler", "config", "balance-range-scheduler"}...) - if err != nil { + curr := []map[string]any{} + if _, err := tests.TryExec(cmd, []string{"-u", pdAddr, "scheduler", "config", "balance-range-scheduler"}, &curr); err != nil { return false } - json.Unmarshal(output, &rangeConf) + rangeConf = curr if len(rangeConf) == 0 { return false } @@ testutil.Eventually(re, func() bool { - output, err := tests.ExecuteCommand(cmd, []string{"-u", pdAddr, "scheduler", "config", "balance-range-scheduler"}...) - if err != nil { + curr := []map[string]any{} + if _, err := tests.TryExec(cmd, []string{"-u", pdAddr, "scheduler", "config", "balance-range-scheduler"}, &curr); err != nil { return false } - json.Unmarshal(output, &rangeConf) + rangeConf = curr return len(rangeConf) == 2 }) @@ checkSchedulerDescribeCommand := func(schedulerName, expectedStatus, expectedSummary string) { result := make(map[string]any) testutil.Eventually(re, func() bool { - output, err := tests.ExecuteCommand(cmd, []string{"-u", pdAddr, "scheduler", "describe", schedulerName}...) - if err != nil { + curr := make(map[string]any) + if _, err := tests.TryExec(cmd, []string{"-u", pdAddr, "scheduler", "describe", schedulerName}, &curr); err != nil { return false } - json.Unmarshal(output, &result) + result = curr suite.T().Log(result) return len(result) != 0 && expectedStatus == result["status"] && expectedSummary == result["summary"] }, testutil.WithTickInterval(50*time.Millisecond)) }#!/bin/bash set -euo pipefail # Verify the remaining Eventually+ExecuteCommand+json.Unmarshal sites with unchecked unmarshal errors. f="tools/pd-ctl/tests/scheduler/scheduler_test.go" echo "=== candidate lines ===" rg -n 'testutil\.Eventually|ExecuteCommand\(|json\.Unmarshal\(' "$f" -C 2 echo echo "=== focused contexts ===" sed -n '380,395p' "$f" sed -n '560,605p' "$f" sed -n '905,920p' "$f"Also applies to: 563-592, 596-603, 909-919
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/pd-ctl/tests/scheduler/scheduler_test.go` around lines 385 - 392, The predicate in testutil.Eventually uses tests.ExecuteCommand + json.Unmarshal into a reused variable (result) and ignores Unmarshal errors, which can reuse stale data and cause false positives; change the predicate to use tests.TryExec (or ExecuteCommand but capture and return errors) and allocate a fresh local variable for unmarshalling on each retry, check and fail the predicate if json.Unmarshal returns an error, and only then evaluate fields like result["status"]/"summary" (apply this pattern to the predicates around testutil.Eventually, tests.ExecuteCommand, json.Unmarshal and the reused result variable for the scheduler tests referencing cmd and pdAddr).
🧹 Nitpick comments (2)
tests/integrations/mcs/keyspace/tso_keyspace_group_test.go (1)
563-584: Add context when propagating helper errors.
getKeyspaceGroupreturns raw errors from multiple operations; wrapping them will make CI/debug triage much faster.As per coding guidelines, "Wrap errors with `github.com/pingcap/errors` (`errors.Wrap`/`Annotate`) or `fmt.Errorf(\"...: %w\", err)`."♻️ Proposed refactor
func (suite *keyspaceGroupTestSuite) getKeyspaceGroup(id uint32) (*endpoint.KeyspaceGroup, int, error) { httpReq, err := http.NewRequest(http.MethodGet, suite.server.GetAddr()+keyspaceGroupsPrefix+fmt.Sprintf("/%d", id), http.NoBody) if err != nil { - return nil, 0, err + return nil, 0, fmt.Errorf("create get keyspace group request: %w", err) } resp, err := tests.TestDialClient.Do(httpReq) if err != nil { - return nil, 0, err + return nil, 0, fmt.Errorf("send get keyspace group request: %w", err) } defer resp.Body.Close() kg := &endpoint.KeyspaceGroup{} if resp.StatusCode == http.StatusOK { bodyBytes, err := io.ReadAll(resp.Body) if err != nil { - return nil, resp.StatusCode, err + return nil, resp.StatusCode, fmt.Errorf("read get keyspace group response body: %w", err) } if err := json.Unmarshal(bodyBytes, kg); err != nil { - return nil, resp.StatusCode, err + return nil, resp.StatusCode, fmt.Errorf("decode get keyspace group response: %w", err) } } return kg, resp.StatusCode, nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integrations/mcs/keyspace/tso_keyspace_group_test.go` around lines 563 - 584, The getKeyspaceGroup helper currently returns raw errors from http.NewRequest, tests.TestDialClient.Do, io.ReadAll, and json.Unmarshal; update getKeyspaceGroup to wrap those errors with context (using fmt.Errorf("...: %w", err) or github.com/pingcap/errors.Wrap/Annotate) so callers see which operation failed and the id involved — wrap the error from http.NewRequest with the request context and id, wrap the TestDialClient.Do error indicating the GET to keyspaceGroupsPrefix and id, wrap io.ReadAll errors indicating reading the response body for that id, and wrap json.Unmarshal errors indicating decoding the KeyspaceGroup for that id.tests/server/api/scheduler_test.go (1)
916-930: Make cancelled-job assertion resilient to job-id ordering assumptions.Line 930 hardcodes
scheduler[1]after sorting. This is brittle if job-id allocation baseline changes. Match the cancelled record byjob-id(the request cancelsjob-id=1) instead of fixed index.Proposed change
- return scheduler[1]["status"] == "cancelled" + for _, job := range scheduler { + id, okID := job["job-id"].(float64) + status, okStatus := job["status"].(string) + if okID && okStatus && id == 1 && status == "cancelled" { + return true + } + } + return false🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/server/api/scheduler_test.go` around lines 916 - 930, The test currently assumes the cancelled job is at scheduler[1] after sorting; instead iterate the scheduler slice (the variable scheduler) and find the map entry whose "job-id" (a float64 in the maps) equals 1 (or 1.0), then assert that that entry's "status" == "cancelled". Replace the hardcoded scheduler[1] check with a search over scheduler maps (checking map["job-id"] and then map["status"]) and fail the test if no matching job-id is found or its status is not "cancelled".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integrations/mcs/scheduling/rule_test.go`:
- Around line 205-214: The eventual predicate currently only verifies
labelRules[1].ID == labelRule.ID but doesn't assert removal of the old "rule1";
update the predicate that inspects labelRules (in the test around the sorted
slice check using labelRules and labelRule.ID) to explicitly assert both that no
element in labelRules has ID == "rule1" and that one element equals labelRule.ID
(the expected "rule2"); keep the existing length check and sorting (sort.Slice
on labelRules) and return true only when length==2, "rule1" is absent, and
labelRule.ID is present.
In `@tools/pd-ctl/tests/operator/operator_test.go`:
- Around line 197-201: The range variable testCase is captured by the closure
passed to testutil.Eventually; create a per-iteration local copy (e.g., tc :=
testCase) immediately inside the loop and use tc in the closure and subsequent
calls (including tests.ExecuteCommand and any checks) instead of testCase to
avoid capturing the loop variable. Ensure the new local name (tc) replaces
testCase everywhere inside that iteration's closure and following statements.
---
Duplicate comments:
In `@tools/pd-ctl/tests/scheduler/scheduler_test.go`:
- Around line 385-392: The predicate in testutil.Eventually uses
tests.ExecuteCommand + json.Unmarshal into a reused variable (result) and
ignores Unmarshal errors, which can reuse stale data and cause false positives;
change the predicate to use tests.TryExec (or ExecuteCommand but capture and
return errors) and allocate a fresh local variable for unmarshalling on each
retry, check and fail the predicate if json.Unmarshal returns an error, and only
then evaluate fields like result["status"]/"summary" (apply this pattern to the
predicates around testutil.Eventually, tests.ExecuteCommand, json.Unmarshal and
the reused result variable for the scheduler tests referencing cmd and pdAddr).
---
Nitpick comments:
In `@tests/integrations/mcs/keyspace/tso_keyspace_group_test.go`:
- Around line 563-584: The getKeyspaceGroup helper currently returns raw errors
from http.NewRequest, tests.TestDialClient.Do, io.ReadAll, and json.Unmarshal;
update getKeyspaceGroup to wrap those errors with context (using
fmt.Errorf("...: %w", err) or github.com/pingcap/errors.Wrap/Annotate) so
callers see which operation failed and the id involved — wrap the error from
http.NewRequest with the request context and id, wrap the TestDialClient.Do
error indicating the GET to keyspaceGroupsPrefix and id, wrap io.ReadAll errors
indicating reading the response body for that id, and wrap json.Unmarshal errors
indicating decoding the KeyspaceGroup for that id.
In `@tests/server/api/scheduler_test.go`:
- Around line 916-930: The test currently assumes the cancelled job is at
scheduler[1] after sorting; instead iterate the scheduler slice (the variable
scheduler) and find the map entry whose "job-id" (a float64 in the maps) equals
1 (or 1.0), then assert that that entry's "status" == "cancelled". Replace the
hardcoded scheduler[1] check with a search over scheduler maps (checking
map["job-id"] and then map["status"]) and fail the test if no matching job-id is
found or its status is not "cancelled".
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fa9ab7b3-d03d-416b-9305-1bce3d632b38
📒 Files selected for processing (6)
tests/integrations/mcs/keyspace/tso_keyspace_group_test.gotests/integrations/mcs/scheduling/rule_test.gotests/server/api/scheduler_test.gotests/server/config/config_test.gotools/pd-ctl/tests/operator/operator_test.gotools/pd-ctl/tests/scheduler/scheduler_test.go
| if len(labelRules) != 2 { | ||
| return false | ||
| } | ||
| sort.Slice(labelRules, func(i, j int) bool { | ||
| return labelRules[i].ID < labelRules[j].ID | ||
| }) | ||
| // After the patch, "rule1" should be deleted and "rule2" should be present. | ||
| // We must check the content, not just the count, because the count stays at 2 | ||
| // during the transition (old rules -> new rules). | ||
| return labelRules[1].ID == labelRule.ID |
There was a problem hiding this comment.
Eventually predicate doesn’t explicitly assert rule1 removal.
At Line 214, the check only confirms rule2 is at index 1 after sorting. It does not directly assert that rule1 is gone, which is what Lines 211-213 claim.
Suggested predicate update
testutil.Eventually(re, func() bool {
labelRules = regionLabeler.GetAllLabelRules()
if len(labelRules) != 2 {
return false
}
- sort.Slice(labelRules, func(i, j int) bool {
- return labelRules[i].ID < labelRules[j].ID
- })
- // After the patch, "rule1" should be deleted and "rule2" should be present.
- // We must check the content, not just the count, because the count stays at 2
- // during the transition (old rules -> new rules).
- return labelRules[1].ID == labelRule.ID
+ hasRule2 := false
+ hasRule1 := false
+ for _, r := range labelRules {
+ if r.ID == labelRule.ID {
+ hasRule2 = true
+ }
+ if r.ID == "rule1" {
+ hasRule1 = true
+ }
+ }
+ // After the patch, "rule1" should be deleted and "rule2" should be present.
+ return hasRule2 && !hasRule1
})🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integrations/mcs/scheduling/rule_test.go` around lines 205 - 214, The
eventual predicate currently only verifies labelRules[1].ID == labelRule.ID but
doesn't assert removal of the old "rule1"; update the predicate that inspects
labelRules (in the test around the sorted slice check using labelRules and
labelRule.ID) to explicitly assert both that no element in labelRules has ID ==
"rule1" and that one element equals labelRule.ID (the expected "rule2"); keep
the existing length check and sorting (sort.Slice on labelRules) and return true
only when length==2, "rule1" is absent, and labelRule.ID is present.
| for _, testCase := range testCases { | ||
| output, err = tests.ExecuteCommand(cmd, testCase.cmd...) | ||
| re.NoError(err) | ||
| re.NotContains(string(output), "Failed") | ||
| testutil.Eventually(re, func() bool { | ||
| output, err = tests.ExecuteCommand(cmd, testCase.cmd...) | ||
| return err == nil && !strings.Contains(string(output), "Failed") | ||
| }) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Checking Go version in go.mod ==="
head -5 go.mod
echo ""
echo "=== Reading the full context of the flagged code (lines 190-215) ==="
sed -n '190,215p' tools/pd-ctl/tests/operator/operator_test.goRepository: tikv/pd
Length of output: 1374
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Finding testutil.Eventually implementation ==="
find . -type f -name "*.go" | xargs grep -l "func Eventually"
echo ""
echo "=== Examining testutil.Eventually ==="
rg -n -A 15 'func Eventually\(' --type goRepository: tikv/pd
Length of output: 1979
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Full loop context with line numbers (lines 195-215) ==="
sed -n '195,215p' tools/pd-ctl/tests/operator/operator_test.go | cat -n -v
echo ""
echo "=== Go version check (should be 1.22+) ==="
grep '^go ' go.mod
echo ""
echo "=== Verify: Are output and err declared outside the loop? ==="
sed -n '180,200p' tools/pd-ctl/tests/operator/operator_test.go | head -30Repository: tikv/pd
Length of output: 2248
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Searching for variable declarations before the loop (output, err) ==="
sed -n '1,196p' tools/pd-ctl/tests/operator/operator_test.go | grep -n "var \|output\|err"
echo ""
echo "=== Confirm: Check if Eventually modifies outer scope vars ==="
sed -n '197,210p' tools/pd-ctl/tests/operator/operator_test.go | cat -nRepository: tikv/pd
Length of output: 1204
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Context: Full function signature and variable declarations ==="
sed -n '75,120p' tools/pd-ctl/tests/operator/operator_test.go | head -40Repository: tikv/pd
Length of output: 1215
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check AGENTS.md for the exact guideline text ==="
cat AGENTS.md | grep -A 3 -B 3 "loop var"
echo ""
echo "=== Verify testCase is used outside the closure ==="
sed -n '197,212p' tools/pd-ctl/tests/operator/operator_test.go | cat -nRepository: tikv/pd
Length of output: 1192
Capture loop variable in a local copy before using in closure.
testCase is captured directly in the closure on line 198. Repository guidelines explicitly require making a local copy for range variables used in closures: "Do not capture loop vars by pointer; copy inside loop (v := val). Range-in-closure/address rules enforced; make local copies."
Use a per-iteration copy (tc := testCase) and reference tc inside and after the closure.
🔧 Proposed fix
for _, testCase := range testCases {
+ tc := testCase
testutil.Eventually(re, func() bool {
- output, err = tests.ExecuteCommand(cmd, testCase.cmd...)
- return err == nil && !strings.Contains(string(output), "Failed")
+ out, execErr := tests.ExecuteCommand(cmd, tc.cmd...)
+ return execErr == nil && !strings.Contains(string(out), "Failed")
})
- output, err = tests.ExecuteCommand(cmd, testCase.show...)
+ output, err = tests.ExecuteCommand(cmd, tc.show...)
re.NoError(err)
- re.Contains(string(output), testCase.expect)
+ re.Contains(string(output), tc.expect)
start := time.Now()
- _, err = tests.ExecuteCommand(cmd, testCase.reset...)
+ _, err = tests.ExecuteCommand(cmd, tc.reset...)
re.NoError(err)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/pd-ctl/tests/operator/operator_test.go` around lines 197 - 201, The
range variable testCase is captured by the closure passed to
testutil.Eventually; create a per-iteration local copy (e.g., tc := testCase)
immediately inside the loop and use tc in the closure and subsequent calls
(including tests.ExecuteCommand and any checks) instead of testCase to avoid
capturing the loop variable. Ensure the new local name (tc) replaces testCase
everywhere inside that iteration's closure and following statements.
…Change - affinity_test.go: wrap group list check in Eventually with tryGetAllAffinityGroups to handle sync delay in microservice mode - cluster_test.go: replace timing-based failpoint with deterministic channel-based synchronization using failpoint.InjectCall/EnableCall, eliminating the race between goroutine and main thread Issue Number: ref tikv#10164 Signed-off-by: JmPotato <github@ipotato.me>
Signed-off-by: JmPotato <github@ipotato.me>
- rule_test.go: wrap region fit check in Eventually with TryReadGetJSON since rules may not have propagated to the scheduling server yet - scheduler_test.go: rewrite checkStorePause to use Eventually with plain Go comparisons, ensuring store pause state has propagated to both the PD leader and the scheduling primary server Issue Number: ref tikv#10164 Signed-off-by: JmPotato <github@ipotato.me>
…terTransferLeader - TestGlobalMemoryTuner: increase require.Eventually timeout from 5s to 20s to accommodate slower GC cycles in CI environments. - TestTSOStreamBasic: fix impossible condition that checks both processErr and callbackErr. When the stream is in closing state, processRequests returns an error without invoking the callback, so callbackErr is always nil. Only check processErr != nil. - TestGetTSAfterTransferLeader: wrap GetTS call in testutil.Eventually to tolerate transient errors during TSO stream reconnection after leader transfer. Signed-off-by: JmPotato <github@ipotato.me>
- TestConfig: wrap config show assertion in testutil.Eventually to wait for the scheduling microservice to sync config from etcd. After MustPutStore triggers config persistence, the scheduling server's watcher needs time to process the update. - stream_test.go: rename unused callback parameter err to _err to satisfy revive unused-parameter lint rule. Signed-off-by: JmPotato <github@ipotato.me>
- TestGlobalMemoryTuner: add explicit runtime.GC() calls inside Eventually callbacks to force GC cycles. The test relies on the tuner's finalizer-based callback which only fires after GC, but CI environments may not trigger automatic GC promptly enough from just the soft memory limit. - TestHotRegionStorage: wrap read hot region assertions in testutil.Eventually. Read hot regions are processed asynchronously via CheckReadAsync, so they may not be flushed to storage by the time write hot regions are available. Signed-off-by: JmPotato <github@ipotato.me>
…rage The scheduling server's rule watcher deleteFn tried to load rule data from etcd storage when processing DELETE watch events. However, by the time the watcher processes the event, the rule has already been deleted from storage by the PD leader, causing "unexpected end of JSON input" errors. This left ghost rules in the scheduling server's cache, corrupting the rule state for subsequent operations. Fix by parsing groupID and ruleID directly from the etcd key (which uses hex-encoded format) instead of loading from storage. The suspect key ranges are obtained from the rule manager's cached copy. Signed-off-by: JmPotato <github@ipotato.me>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10295 +/- ##
==========================================
+ Coverage 78.78% 78.86% +0.08%
==========================================
Files 525 527 +2
Lines 70824 70956 +132
==========================================
+ Hits 55796 55959 +163
+ Misses 11004 10998 -6
+ Partials 4024 3999 -25
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
/retest |
Code reviewFound 2 issues:
pd/pkg/utils/testutil/api_check.go Lines 183 to 210 in da9cbbb
pd/tests/server/api/store_test.go Lines 527 to 544 in da9cbbb 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
- Remove duplicate TryCheckGetJSON/TryReadGetJSONWithBody functions, consolidate all call sites to use TryReadGetJSON - Fix resp.Body leak in tryRequestStatusBody by using defer - Fix stale output variable in TestRemovingProgress by reading resp.Body - Fix ignored json.Unmarshal errors in scheduler_test.go by checking return value and moving destination variables inside closures - Add per-attempt timeout to tryRequest gRPC calls to prevent blocking - Strengthen config_test.go predicates by moving content checks inside Eventually closures to prevent false passes on stale data - Fix tryCheckMemberList false-pass by tracking matched members Signed-off-by: JmPotato <github@ipotato.me>
The RegionFit response may contain more RuleFits than expected values. Use >= instead of == to match the original behavior. Signed-off-by: JmPotato <github@ipotato.me>
- requestStatusBody now wraps tryRequestStatusBody - TryLoadKeyspaceGroupByID now wraps loadKeyspaceGroupByIDNoAssert - tryGetKeyspaceGroup now wraps getKeyspaceGroup Signed-off-by: JmPotato <github@ipotato.me>
Extract parseRuleStoreKey helper from the deleteFn closure so it can be unit-tested. Add TestParseRuleStoreKey covering roundtrip with Rule.StoreKey(), invalid format, and invalid hex inputs. Signed-off-by: JmPotato <github@ipotato.me>
|
/retest |
- register_test: remove sleep from tryGetKey since Eventually's 20s timeout covers the 5s lease expiry; rename for clarity - watcher: use explicit ok bool return from parseRuleStoreKey to match original deleteFn behavior exactly (only skip on missing separator) - api_check: remove dead TryCheckPatchJSON (zero call sites) Signed-off-by: JmPotato <github@ipotato.me>
Unify the two remaining call sites to use Eventually polling instead of blocking sleep, then delete the now-unused getKeyAfterLeaseExpired. Signed-off-by: JmPotato <github@ipotato.me>
|
@JmPotato: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: Close #10164
What is changed and how does it work?
Check List
Tests
Release note
Summary by CodeRabbit
Tests
Chores