
metrics: align store used panel with storage definition #10277

Open
nolouch wants to merge 1 commit into tikv:master from nolouch:fix-used

Conversation


@nolouch nolouch commented Mar 3, 2026

Update the Grafana store used query to use capacity minus available so the panel reflects actual disk usage semantics, and adjust the panel description accordingly.

What problem does this PR solve?

Issue Number: Close #10276

What is changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

Summary by CodeRabbit

  • Bug Fixes

    • Refined Store used capacity metric calculation to ensure accurate monitoring. The metric now derives from total capacity minus available storage.
  • Documentation

    • Updated monitoring dashboard descriptions to clarify metric calculation methodology for improved transparency.

Update the Grafana store used query to use capacity minus available so the panel reflects actual disk usage semantics, and adjust the panel description accordingly.

Signed-off-by: nolouch <nolouch@gmail.com>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/needs-triage-completed labels Mar 3, 2026

ti-chi-bot bot commented Mar 3, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yisaer for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 3, 2026

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

A Grafana dashboard configuration file was updated to correct the "Store used" metric calculation. The panel's description was refined, and its PromQL expression was modified to derive used storage from capacity minus available capacity, rather than directly querying a dedicated store_used metric.

Changes

Cohort / File(s) | Summary
--- | ---
Grafana Dashboard Configuration: metrics/grafana/pd.json | Updated "Store used" panel: refined description text and changed PromQL expression from {...type="store_used"} to {...type="store_capacity"} - {...type="store_available"} for correct capacity-based calculation.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 Hops with glee, a fix so neat,
Capacity minus available—the story's complete!
No more confusion in Grafana's display,
The metrics now match what we meant them to say! 📊✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

Check name | Status | Explanation
--- | --- | ---
Title check | ✅ Passed | The title 'metrics: align store used panel with storage definition' accurately describes the main change: updating the Grafana Store used panel query to align with correct storage semantics.
Description check | ✅ Passed | The PR description fills required sections (problem statement, issue link, change explanation) and follows the template structure. Issue #10276 is properly referenced with 'Close' keyword. Description clearly explains the PromQL query change.
Linked Issues check | ✅ Passed | The PR successfully addresses issue #10276 by changing the Store used panel query from pd_scheduler_store_status{type='store_used'} to the expected pd_scheduler_store_status{type='store_capacity'} minus pd_scheduler_store_status{type='store_available'}, aligning with capacity-minus-available semantics.
Out of Scope Changes check | ✅ Passed | All changes are in-scope: the PromQL expression and panel description update directly address issue #10276's requirements. No unrelated modifications are present in the metrics/grafana/pd.json file.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@metrics/grafana/pd.json`:
- Line 3848: The PromQL subtraction uses pd_scheduler_store_status with
differing type labels ("store_capacity" vs "store_available") which prevents
matching; update the expression that currently subtracts
pd_scheduler_store_status{k8s_cluster="$k8s_cluster",
tidb_cluster="$tidb_cluster", store=~"$store", type="store_capacity"} -
pd_scheduler_store_status{..., type="store_available"} to perform vector
matching ignoring the type label (e.g. add ignoring(type)) so rows are matched
by shared labels (k8s_cluster, tidb_cluster, store) rather than the differing
type.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27ee39b and 7021367.

📒 Files selected for processing (1)
  • metrics/grafana/pd.json

           "targets": [
             {
-              "expr": "pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_used\"}",
+              "expr": "pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_capacity\"} - pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_available\"}",
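
The backslash-escaped quotes in the `expr` values come from embedding PromQL inside a JSON string. As a rough illustration only, the Python sketch below assembles the capacity-minus-available expression from this PR and shows how it serializes. The helper name `store_status_selector` is hypothetical and is not part of the PD repository; the `$`-prefixed names are the Grafana template variables already used by the panel.

```python
import json


def store_status_selector(status_type: str) -> str:
    """Return a pd_scheduler_store_status selector for one `type` value.

    Hypothetical helper for illustration only.
    """
    return (
        'pd_scheduler_store_status{k8s_cluster="$k8s_cluster", '
        'tidb_cluster="$tidb_cluster", store=~"$store", '
        f'type="{status_type}"}}'
    )


# Used space as defined in this PR: capacity minus available.
expr = f"{store_status_selector('store_capacity')} - {store_status_selector('store_available')}"

# json.dumps escapes the inner double quotes, which yields the \" form seen in pd.json.
target_json = json.dumps({"expr": expr})
print(target_json)
```

Decoding `target_json` with `json.loads` returns the original PromQL text, which is a quick way to read a long escaped expression when reviewing dashboard diffs.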

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check file existence and line count
wc -l metrics/grafana/pd.json

Repository: tikv/pd

Length of output: 80


🏁 Script executed:

# View the specific line mentioned (3848) with context
sed -n '3845,3851p' metrics/grafana/pd.json | cat -n

Repository: tikv/pd

Length of output: 595


🏁 Script executed:

# Run the verification patterns from the review comment
echo "=== Pattern 1: Looking for store_capacity - store_available without matching ==="
rg -nP 'type="store_capacity"\}[^"]*-[^"]*type="store_available"' metrics/grafana/pd.json

echo "=== Pattern 2: Looking for existing ignoring/on modifiers for type label ==="
rg -nP 'ignoring\s*\(\s*type\s*\)|on\s*\([^)]*type[^)]*\)' metrics/grafana/pd.json

Repository: tikv/pd

Length of output: 208


🏁 Script executed:

# Search for other subtraction patterns in the file to see common patterns
rg -n '\} - ' metrics/grafana/pd.json | head -20

Repository: tikv/pd

Length of output: 1796


🏁 Script executed:

# Search for existing uses of ignoring() or on() modifiers in the file
rg -n 'ignoring\|on\(' metrics/grafana/pd.json | head -20

Repository: tikv/pd

Length of output: 33


🏁 Script executed:

# Get more context around line 3848 to understand the dashboard panel
sed -n '3840,3855p' metrics/grafana/pd.json | cat -n

Repository: tikv/pd

Length of output: 885


🏁 Script executed:

# Count subtraction patterns that might have the same vector matching issue
# Look for patterns with different label values being subtracted
rg -n 'type="[^"]*"\}[^"]*-.*type="[^"]*"\}' metrics/grafana/pd.json | wc -l

Repository: tikv/pd

Length of output: 52


🏁 Script executed:

# Search for metric definition or other references to pd_scheduler_store_status
rg -n 'pd_scheduler_store_status' metrics/grafana/pd.json | head -20

Repository: tikv/pd

Length of output: 3767


🏁 Script executed:

# Check if there are any uses of matching modifiers (ignoring, on, group_left, group_right) anywhere
rg -n '\b(ignoring|on|group_left|group_right)\s*\(' metrics/grafana/pd.json | wc -l

Repository: tikv/pd

Length of output: 52


Add vector matching modifier to PromQL subtraction at line 3848.

This expression subtracts two vectors whose type labels differ. PromQL's default one-to-one matching pairs samples only when all of their labels are identical, so store_capacity never pairs with store_available and the result is empty. Use ignoring(type) to match on the remaining labels (cluster, store) while ignoring the type difference:

-              "expr": "pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_capacity\"} - pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_available\"}",
+              "expr": "pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_capacity\"} - ignoring(type) pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_available\"}",
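
To illustrate why the modifier matters, here is a minimal Python sketch of one-to-one vector matching: two samples pair up only when their label sets are identical after dropping any labels listed in `ignoring(...)`. The sample values and the `match_key` and `subtract` helpers are made up for illustration. This is a simplified model under those assumptions, not Prometheus's implementation.

```python
from typing import Dict, FrozenSet, List, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value)
Key = FrozenSet[Tuple[str, str]]


def match_key(labels: Dict[str, str], ignoring: FrozenSet[str]) -> Key:
    """Label set used for matching, minus any ignored label names."""
    return frozenset((k, v) for k, v in labels.items() if k not in ignoring)


def subtract(lhs: List[Sample], rhs: List[Sample],
             ignoring: FrozenSet[str] = frozenset()) -> Dict[Key, float]:
    """Simplified one-to-one `lhs - rhs`; samples without a partner are dropped."""
    rhs_by_key = {match_key(labels, ignoring): value for labels, value in rhs}
    result: Dict[Key, float] = {}
    for labels, value in lhs:
        key = match_key(labels, ignoring)
        if key in rhs_by_key:
            result[key] = value - rhs_by_key[key]
    return result


# Hypothetical samples for a single store (values in bytes).
capacity = [({"store": "1", "type": "store_capacity"}, 1000.0)]
available = [({"store": "1", "type": "store_available"}, 400.0)]

without_modifier = subtract(capacity, available)
with_modifier = subtract(capacity, available, ignoring=frozenset({"type"}))

print(without_modifier)  # empty: the `type` values differ, so nothing pairs up
print(with_modifier)     # one entry for store 1 with value 600.0
```

In this model the unmodified subtraction yields no series at all, which corresponds to an empty panel, while ignoring `type` leaves `store` as the only matching label and produces one result per store.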
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-              "expr": "pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_capacity\"} - pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_available\"}",
+              "expr": "pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_capacity\"} - ignoring(type) pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_available\"}",


ti-chi-bot bot commented Mar 3, 2026

@nolouch: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
pull-integration-realcluster-test | 7021367 | link | true | /test pull-integration-realcluster-test


Development

Successfully merging this pull request may close these issues.

bug: Grafana Store used panel uses store_used instead of capacity-available

1 participant