metrics: align store used panel with storage definition #10277
nolouch wants to merge 1 commit into tikv:master
Update the Grafana store used query to use capacity minus available so the panel reflects actual disk usage semantics, and adjust the panel description accordingly. Signed-off-by: nolouch <nolouch@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED
📝 Walkthrough
A Grafana dashboard configuration file was updated to correct the "Store used" metric calculation. The panel's description was refined, and its PromQL expression now derives used storage as capacity minus available (the type="store_capacity" series minus the type="store_available" series of pd_scheduler_store_status) instead of querying the type="store_used" series directly.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@metrics/grafana/pd.json`:
- Line 3848: The PromQL subtraction uses pd_scheduler_store_status with
differing type labels ("store_capacity" vs "store_available") which prevents
matching; update the expression that currently subtracts
pd_scheduler_store_status{k8s_cluster="$k8s_cluster",
tidb_cluster="$tidb_cluster", store=~"$store", type="store_capacity"} -
pd_scheduler_store_status{..., type="store_available"} to perform vector
matching ignoring the type label (e.g. add ignoring(type)) so rows are matched
by shared labels (k8s_cluster, tidb_cluster, store) rather than the differing
type.
"targets": [
{
- "expr": "pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_used\"}",
+ "expr": "pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_capacity\"} - pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_available\"}",
🧩 Analysis chain
🏁 Script executed:
# Check file existence and line count
wc -l metrics/grafana/pd.json
Repository: tikv/pd
Length of output: 80
🏁 Script executed:
# View the specific line mentioned (3848) with context
sed -n '3845,3851p' metrics/grafana/pd.json | cat -n
Repository: tikv/pd
Length of output: 595
🏁 Script executed:
# Run the verification patterns from the review comment
echo "=== Pattern 1: Looking for store_capacity - store_available without matching ==="
rg -nP 'type="store_capacity"\}[^"]*-[^"]*type="store_available"' metrics/grafana/pd.json
echo "=== Pattern 2: Looking for existing ignoring/on modifiers for type label ==="
rg -nP 'ignoring\s*\(\s*type\s*\)|on\s*\([^)]*type[^)]*\)' metrics/grafana/pd.json
Repository: tikv/pd
Length of output: 208
🏁 Script executed:
# Search for other subtraction patterns in the file to see common patterns
rg -n '\} - ' metrics/grafana/pd.json | head -20
Repository: tikv/pd
Length of output: 1796
🏁 Script executed:
# Search for existing uses of ignoring() or on() modifiers in the file
rg -n 'ignoring\|on\(' metrics/grafana/pd.json | head -20
Repository: tikv/pd
Length of output: 33
🏁 Script executed:
# Get more context around line 3848 to understand the dashboard panel
sed -n '3840,3855p' metrics/grafana/pd.json | cat -nRepository: tikv/pd
Length of output: 885
🏁 Script executed:
# Count subtraction patterns that might have the same vector matching issue
# Look for patterns with different label values being subtracted
rg -n 'type="[^"]*"\}[^"]*-.*type="[^"]*"\}' metrics/grafana/pd.json | wc -l
Repository: tikv/pd
Length of output: 52
🏁 Script executed:
# Search for metric definition or other references to pd_scheduler_store_status
rg -n 'pd_scheduler_store_status' metrics/grafana/pd.json | head -20
Repository: tikv/pd
Length of output: 3767
🏁 Script executed:
# Check if there are any uses of matching modifiers (ignoring, on, group_left, group_right) anywhere
rg -n '\b(ignoring|on|group_left|group_right)\s*\(' metrics/grafana/pd.json | wc -l
Repository: tikv/pd
Length of output: 52
Add vector matching modifier to PromQL subtraction at line 3848.
This expression subtracts vectors with mismatched type labels; PromQL's default one-to-one matching won't pair store_capacity with store_available, resulting in an empty output. Use ignoring(type) to match on shared labels (cluster, store) while ignoring the type difference:
- "expr": "pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_capacity\"} - pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_available\"}",
+ "expr": "pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_capacity\"} - ignoring(type) pd_scheduler_store_status{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", store=~\"$store\", type=\"store_available\"}",
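To make the matching behaviour concrete, here is a minimal Python sketch that imitates PromQL's one-to-one vector matching for a subtraction. It is only an illustration, not Prometheus code: the helper names (`series`, `subtract`), the store IDs, and the sample values are hypothetical, and it does not model many-to-one matching or duplicate-series errors.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Labels = FrozenSet[Tuple[str, str]]
Vector = Dict[Labels, float]


def series(value: float, **labels: str) -> Tuple[Labels, float]:
    """Build one sample: an immutable label set plus its value."""
    return frozenset(labels.items()), value


def subtract(lhs: Vector, rhs: Vector, ignoring: Iterable[str] = ()) -> Vector:
    """Imitate 'lhs - rhs' with one-to-one matching.

    Two samples pair up only if their label sets are identical after
    dropping the labels listed in `ignoring`.
    """
    ignored = set(ignoring)

    def key(labels: Labels) -> Labels:
        return frozenset((k, v) for k, v in labels if k not in ignored)

    rhs_by_key = {key(labels): value for labels, value in rhs.items()}
    result: Vector = {}
    for labels, value in lhs.items():
        k = key(labels)
        if k in rhs_by_key:  # unmatched samples are silently dropped
            result[k] = value - rhs_by_key[k]
    return result


# Hypothetical samples for two stores (values are made up).
capacity = dict([
    series(100.0, store="1", type="store_capacity"),
    series(200.0, store="2", type="store_capacity"),
])
available = dict([
    series(40.0, store="1", type="store_available"),
    series(50.0, store="2", type="store_available"),
])

# Default matching compares every label, so the differing `type` blocks all pairs.
without_modifier = subtract(capacity, available)

# Ignoring `type` lets samples pair up on the remaining labels (here: store).
with_ignoring = subtract(capacity, available, ignoring=["type"])
```

In this sketch `without_modifier` is empty, while `with_ignoring` holds one value per store, which mirrors the difference between the plain subtraction and the `ignoring(type)` variant suggested above.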
@nolouch: The following test failed, say
Update the Grafana store used query to use capacity minus available so the panel reflects actual disk usage semantics, and adjust the panel description accordingly.
What problem does this PR solve?
Issue Number: Close #10276
What is changed and how does it work?
Check List
Tests
Code changes
Side effects
Related changes
pingcap/docs / pingcap/docs-cn:
pingcap/tiup:
Release note
Summary by CodeRabbit
Bug Fixes
Documentation