Skip to content

RU: AcquireTokenBuckets fallback#10296

Open
ystaticy wants to merge 2 commits intotikv:masterfrom
ystaticy:ystaticy/use_ru_fallback
Open

RU: AcquireTokenBuckets fallback#10296
ystaticy wants to merge 2 commits intotikv:masterfrom
ystaticy:ystaticy/use_ru_fallback

Conversation

@ystaticy
Copy link
Contributor

@ystaticy ystaticy commented Mar 4, 2026

What problem does this PR solve?

Issue Number: Close #xxx

What is changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

Summary by CodeRabbit

  • New Features

    • Added degraded fallback mode for token requests during RPC/transport errors to allow graceful request handling and informational logging when used.
  • Bug Fixes

    • Improved classification and handling of transport-level errors (Unavailable, DeadlineExceeded, Canceled, Internal, ResourceExhausted) to trigger fallback only when appropriate while preserving existing latency metrics.
  • Tests

    • Added tests covering error classification, degraded fallback responses, and fallback vs. logical error behavior.

Signed-off-by: ystaticy <y_static_y@sina.com>
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. labels Mar 4, 2026
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 4, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andremouche for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added contribution This PR is from a community contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Mar 4, 2026
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 4, 2026

Hi @ystaticy. Thanks for your PR.

I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 4, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 4, 2026

📝 Walkthrough

Walkthrough

Adds RPC error classification and a local degraded-token-bucket fallback path used when AcquireTokenBuckets RPCs fail; integrates degraded response generation into the token-bucket request flow and adds tests for classification and fallback behavior.

Changes

Cohort / File(s) Summary
Token Bucket Error Handling & Degraded Fallback
client/resource_group/controller/global_controller.go
Added isAcquireTokenBucketsRPCError() to classify RPC/transport errors and buildDegradedTokenBucketResponses() to synthesize local degraded TokenBucketResponses. Integrated fallback usage into sendTokenBucketRequests() with logging and preserved existing metrics and non-fallback behavior.
Test Coverage for Degraded Fallback
client/resource_group/controller/global_controller_test.go
Added tests: TestIsAcquireTokenBucketsRPCError, TestBuildDegradedTokenBucketResponses, TestAcquireTokenBucketsRPCErrorUsesDegradedFallback, TestAcquireTokenBucketsLogicalErrorNoFallback. Introduced gRPC status/codes imports and coverage for degraded vs logical-error paths.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ResourceGroupsController as Controller
  participant TokenBucketService as RPC_Server
  participant LocalFallback as Fallback_Generator

  Client->>Controller: Request token buckets
  Controller->>RPC_Server: AcquireTokenBuckets RPC
  alt RPC success
    RPC_Server-->>Controller: TokenBucketResponses
    Controller-->>Client: Return responses
  else RPC transport error (Unavailable/DeadlineExceeded/...)
    RPC_Server--x Controller: RPC error
    Controller->>Fallback_Generator: buildDegradedTokenBucketResponses(requests, degradedSettings)
    Fallback_Generator-->>Controller: Degraded TokenBucketResponses
    Controller-->>Client: Return degraded responses (logs info, records latency)
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • rleungx
  • JmPotato

Poem

🐇 I sniffed the RPC winds that blew,

When servers hiccuped, I brewed something new.
Degraded buckets trickle but keep the pace,
Tests hop around to guard the grace.
Hooray — tokens still find their place!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is incomplete, containing only the template structure with placeholder values and no actual details filled in (e.g., 'Issue Number: Close #xxx' unlinked, empty problem statement, empty 'What is changed' section, empty commit-message block, unchecked checklist items). Add a linked issue number, describe the problem being solved, explain the implementation details including the degraded fallback mechanism, check relevant checklist items, and provide a meaningful commit message.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'RU: AcquireTokenBuckets fallback' is directly related to the main change: implementing a fallback mechanism for AcquireTokenBuckets with degraded RU settings.
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
client/resource_group/controller/global_controller_test.go (1)

539-540: Replace fixed sleeps with eventual assertions to reduce flakiness.

Line 539 and Line 591 use hard sleeps for async state propagation; this can be unstable in slower CI environments.

Example refactor pattern
-	time.Sleep(500 * time.Millisecond)
+	require.Eventually(t, func() bool {
+		reqCtx, reqCancel := context.WithTimeout(ctx, 200*time.Millisecond)
+		defer reqCancel()
+		requestInfo := NewTestRequestInfo(false, 0, 1, AccessUnknown)
+		_, _, _, _, err := controller.OnRequestWait(reqCtx, defaultResourceGroupName, requestInfo)
+		return err == nil
+	}, 3*time.Second, 50*time.Millisecond)
-	time.Sleep(500 * time.Millisecond)
+	require.Eventually(t, func() bool {
+		return gc.run.requestInProgress
+	}, 3*time.Second, 50*time.Millisecond)

Also applies to: 591-592

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/resource_group/controller/global_controller_test.go` around lines 539
- 540, Replace the fixed time.Sleep(500 * time.Millisecond) (and the similar
sleeps at lines ~591-592) with an eventual/polling assertion that repeatedly
checks the async state change until a timeout; for example use
wait.PollImmediate/require.Eventually (or your test util) to poll the relevant
condition (the resource status update or controller side-effect the test
expects) instead of sleeping, referencing the test in global_controller_test.go
that currently uses time.Sleep so the loop queries the same function/field used
in the later assertion and fails only after a bounded timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@client/resource_group/controller/global_controller_test.go`:
- Around line 539-540: Replace the fixed time.Sleep(500 * time.Millisecond) (and
the similar sleeps at lines ~591-592) with an eventual/polling assertion that
repeatedly checks the async state change until a timeout; for example use
wait.PollImmediate/require.Eventually (or your test util) to poll the relevant
condition (the resource status update or controller side-effect the test
expects) instead of sleeping, referencing the test in global_controller_test.go
that currently uses time.Sleep so the loop queries the same function/field used
in the later assertion and fails only after a bounded timeout.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b0388f83-0d4f-4b3a-851d-39cc48871be1

📥 Commits

Reviewing files that changed from the base of the PR and between 4253bee and ce39296.

📒 Files selected for processing (2)
  • client/resource_group/controller/global_controller.go
  • client/resource_group/controller/global_controller_test.go

Signed-off-by: ystaticy <y_static_y@sina.com>
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 4, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456, multiple issues should use full syntax for each issue and be separated by a comma, like: Issue Number: close #123, ref #456.

📖 For more info, you can check the "Linking issues" section in the CONTRIBUTING.md.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
client/resource_group/controller/global_controller_test.go (1)

453-484: LGTM!

Good test coverage for buildDegradedTokenBucketResponses. The nil-settings case is tested, and the response shape is validated including settings propagation and TrickleTimeMs.

💡 Optional: Additional nil edge cases

Consider adding tests for partial nil cases (nil RU or nil RU.Settings) to fully exercise the guard at line 527 of the implementation:

// Nil RU -> nil response
re.Nil(buildDegradedTokenBucketResponses(requests, &rmpb.GroupRequestUnitSettings{RU: nil}))

// Nil Settings -> nil response  
re.Nil(buildDegradedTokenBucketResponses(requests, &rmpb.GroupRequestUnitSettings{
    RU: &rmpb.TokenBucket{Settings: nil},
}))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/resource_group/controller/global_controller_test.go` around lines 453
- 484, Add two additional unit cases to TestBuildDegradedTokenBucketResponses to
cover partial-nil inputs: call buildDegradedTokenBucketResponses with
&rmpb.GroupRequestUnitSettings{RU: nil} and with
&rmpb.GroupRequestUnitSettings{RU: &rmpb.TokenBucket{Settings: nil}} and assert
the result is nil for each; these tests target the nil-guard logic inside
buildDegradedTokenBucketResponses and ensure it properly handles missing RU or
missing TokenBucket.Settings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@client/resource_group/controller/global_controller_test.go`:
- Around line 453-484: Add two additional unit cases to
TestBuildDegradedTokenBucketResponses to cover partial-nil inputs: call
buildDegradedTokenBucketResponses with &rmpb.GroupRequestUnitSettings{RU: nil}
and with &rmpb.GroupRequestUnitSettings{RU: &rmpb.TokenBucket{Settings: nil}}
and assert the result is nil for each; these tests target the nil-guard logic
inside buildDegradedTokenBucketResponses and ensure it properly handles missing
RU or missing TokenBucket.Settings.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1e22cdc0-a012-44e6-bec2-e63f11de97a9

📥 Commits

Reviewing files that changed from the base of the PR and between ce39296 and 7f3ed33.

📒 Files selected for processing (2)
  • client/resource_group/controller/global_controller.go
  • client/resource_group/controller/global_controller_test.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contribution This PR is from a community contributor. dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/needs-linked-issue needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant