Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request improves the performance of the provider cache-hit-rate leaderboard by optimizing database queries and introducing a new index. The core change collapses the previous two-level database aggregation into a single model-level aggregation, with summarization moved to the application layer, reducing database load. Normalizing the model field additionally improves the accuracy and usability of the leaderboard data.

Highlights
Changelog
Note: Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Walkthrough: Adds a partial index over rows matching the cache-related conditions and not blocked, and exports it from the schema; changes the provider aggregation to aggregate by model first and merge in memory; updates the corresponding unit tests and the migration journal entry.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Code Review
This PR aims to optimize the query performance of the cache hit rate leaderboard. No security vulnerabilities were found. The changes are excellent, effectively reducing database pressure by consolidating two database aggregation queries into one and summarizing data at the application layer. The new partial index idx_usage_ledger_cache_required_created_at for usage_ledger is well-designed, and model names are normalized using TRIM and NULLIF. The use of the cached getCachedSystemSettings instead of getSystemSettings is good practice, and the updated unit tests ensure the correctness of the refactored logic.
src/repository/leaderboard.ts (outdated diff excerpt):

```ts
agg.totalInputTokens += row.totalInputTokens;
providerAggById.set(row.providerId, agg);

const model = row.model?.trim();
```
🧹 Nitpick comments (1)
src/repository/leaderboard.ts (1)
575-576: Optional: redundant empty-value check.

The empty-value check on `row.model?.trim()` is already handled at the database layer via `NULLIF(TRIM(...), '')`. The `?.trim()` and `if (!model) continue` here are acceptable as defensive code, but in normal operation `row.model` is either a non-empty string or `null` (never an empty string).
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
📒 Files selected for processing (6)
- drizzle/0078_dear_alex_power.sql
- drizzle/meta/0078_snapshot.json
- drizzle/meta/_journal.json
- src/drizzle/schema.ts
- src/repository/leaderboard.ts
- tests/unit/repository/leaderboard-provider-metrics.test.ts
Code Review Summary
This PR optimizes the cache hit rate leaderboard query by consolidating two database aggregations (provider-level + model-level) into a single model-level aggregation with application-level provider summarization. The changes also add a partial index to accelerate time-range queries for cache-required requests.
PR Size: L
- Lines changed: 4,149 (additions: 4,048, deletions: 101)
- Files changed: 6
Note: The large line count is primarily due to the generated Drizzle snapshot file (0078_snapshot.json - ~3,900 lines). The actual functional changes are small (~160 lines across schema, leaderboard, and tests).
Review Coverage
- Logic and correctness - Clean
- Security (OWASP Top 10) - Clean
- Error handling - Clean
- Type safety - Clean
- Documentation accuracy - Clean
- Test coverage - Adequate
- Code clarity - Good
Technical Assessment
Optimization Strategy: The approach of reducing from 2 DB queries to 1 with in-memory aggregation is sound for this use case.
Index Design: The partial index idx_usage_ledger_cache_required_created_at correctly targets the cacheRequiredCondition filter pattern, enabling efficient time-range scans.
COALESCE Removal: The simplification of cacheRequiredCondition (removing the COALESCE wrappers) is safe because:
- The partial index predicate `WHERE blocked_by IS NULL AND (cache_creation_input_tokens > 0 OR cache_read_input_tokens > 0)` inherently excludes NULL values
- NULL comparisons in SQL evaluate to unknown (so the row is filtered out), meaning rows with NULL token values are already excluded by the index condition
Model Normalization: The NULLIF(TRIM(...), '') pattern properly handles empty/whitespace model names, preventing meaningless groupings.
Test Updates: Tests correctly verify that provider-level metrics are properly derived from model-level aggregation.
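As a rough illustration of the in-memory summarization step assessed above, here is a hedged TypeScript sketch; the type and function names are assumed for illustration and are not the PR's actual code:

```typescript
// Sketch: fold model-level rows (the single DB query's output) into
// provider-level totals in memory, replacing the second aggregation query.
interface ModelRow {
  providerId: number;
  model: string | null;
  totalRequests: number;
  totalInputTokens: number;
  cacheReadTokens: number;
}

interface ProviderAgg {
  totalRequests: number;
  totalInputTokens: number;
  cacheReadTokens: number;
  cacheHitRate: number;
}

function aggregateProviders(rows: ModelRow[]): Map<number, ProviderAgg> {
  const byProvider = new Map<number, ProviderAgg>();
  for (const row of rows) {
    const agg = byProvider.get(row.providerId) ?? {
      totalRequests: 0,
      totalInputTokens: 0,
      cacheReadTokens: 0,
      cacheHitRate: 0,
    };
    // Rows with a null (blank) model still count toward provider totals,
    // matching the behavior discussed in this review.
    agg.totalRequests += row.totalRequests;
    agg.totalInputTokens += row.totalInputTokens;
    agg.cacheReadTokens += row.cacheReadTokens;
    agg.cacheHitRate =
      agg.totalInputTokens > 0 ? agg.cacheReadTokens / agg.totalInputTokens : 0;
    byProvider.set(row.providerId, agg);
  }
  return byProvider;
}
```

This trades one extra pass over the (already grouped, hence small) result set for one fewer aggregation scan in the database.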
Automated review by Claude AI
src/repository/leaderboard.ts (outdated diff excerpt):

```ts
billingModelSource === "original"
  ? sql<string>`COALESCE(${usageLedger.originalModel}, ${usageLedger.model})`
  : sql<string>`COALESCE(${usageLedger.model}, ${usageLedger.originalModel})`;
const modelField = sql<string>`NULLIF(TRIM(${rawModelField}), '')`;
```
Empty/blank model names are now excluded from both the SQL grouping and `modelStats` via `NULLIF(TRIM())`, but they still contribute to provider totals. Verify this behavior change is intentional (src/repository/leaderboard.ts, line 507).
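For intuition, the `NULLIF(TRIM())` behavior in question can be mirrored in plain TypeScript; this is a sketch under the assumption that the SQL maps blank or whitespace-only names to NULL, and the helper name is illustrative, not from the PR:

```typescript
// Illustrative helper mirroring SQL NULLIF(TRIM(model), ''):
// trims whitespace and maps empty/whitespace-only names to null,
// which is why such rows vanish from model grouping.
function normalizeModelName(model: string | null): string | null {
  if (model === null) return null;
  const trimmed = model.trim();
  return trimmed === "" ? null : trimmed;
}
```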
Additional Comments (1)

tests/unit/repository/leaderboard-provider-metrics.test.ts, lines 89-91:

Wrong module mocked — the code now uses `getCachedSystemSettings` from `@/lib/config/system-settings-cache`, not `getSystemSettings` from `@/repository/system-config`.

```suggestion
vi.mock("@/lib/config/system-settings-cache", () => ({
  getCachedSystemSettings: mocks.getSystemSettings,
}));
```
🧹 Nitpick comments (1)
src/repository/leaderboard.ts (1)
503-507: Consider extracting the model-field normalization logic to avoid the two copies drifting apart.

The `rawModelField` + `NULLIF(TRIM(...), '')` pattern is duplicated in two functions; if the rule changes later, it is easy to miss one copy. Suggest extracting a small helper and reusing it. A minimal change for reference:

```diff
 const clampRatio01 = (value: number | null | undefined) => Math.min(Math.max(value ?? 0, 0), 1);
+
+function buildNormalizedModelField(billingModelSource: "original" | "redirected") {
+  const rawModelField =
+    billingModelSource === "original"
+      ? sql<string>`COALESCE(${usageLedger.originalModel}, ${usageLedger.model})`
+      : sql<string>`COALESCE(${usageLedger.model}, ${usageLedger.originalModel})`;
+  return sql<string>`NULLIF(TRIM(${rawModelField}), '')`;
+}
@@
-  const rawModelField =
-    billingModelSource === "original"
-      ? sql<string>`COALESCE(${usageLedger.originalModel}, ${usageLedger.model})`
-      : sql<string>`COALESCE(${usageLedger.model}, ${usageLedger.originalModel})`;
-  const modelField = sql<string>`NULLIF(TRIM(${rawModelField}), '')`;
+  const modelField = buildNormalizedModelField(billingModelSource);
@@
-  const rawModelField =
-    billingModelSource === "original"
-      ? sql<string>`COALESCE(${usageLedger.originalModel}, ${usageLedger.model})`
-      : sql<string>`COALESCE(${usageLedger.model}, ${usageLedger.originalModel})`;
-  const modelField = sql<string>`NULLIF(TRIM(${rawModelField}), '')`;
+  const modelField = buildNormalizedModelField(billingModelSource);
```

Also applies to: 692-697
Adjusted per CodeRabbit's nitpicks. All Actions checks related to this PR have passed. If you see earlier …
🧹 Nitpick comments (4)
src/repository/leaderboard.ts (2)
713-720: Consider pushing the empty-model filter down into SQL to avoid returning one useless group.

Lines 718-720 currently filter out `null`/`""` at the application layer, which is semantically correct; but the query condition could exclude empty models directly, further reducing transfer and map/filter cost.
592-613: Consider an explicit stable sort of `modelStats` within each provider.

Line 607 currently relies on insertion order. That is mostly fine today, but when `cacheHitRate` and `totalRequests` tie, the order can still drift. Suggest sorting locally when backfilling `modelStats`, with a stable fallback key. Optional change:

```diff
 const entries: ProviderCacheHitRateLeaderboardEntry[] = Array.from(providerAggById.entries()).map(
   ([providerId, agg]) => {
+    const modelStats = [...(modelStatsByProvider.get(providerId) ?? [])].sort(
+      (a, b) =>
+        b.cacheHitRate - a.cacheHitRate ||
+        b.totalRequests - a.totalRequests ||
+        a.model.localeCompare(b.model)
+    );
+
     const cacheHitRate = agg.totalInputTokens > 0 ? agg.cacheReadTokens / agg.totalInputTokens : 0;
     return {
       providerId,
       providerName: agg.providerName,
       totalRequests: agg.totalRequests,
       totalCost: agg.totalCost,
       cacheReadTokens: agg.cacheReadTokens,
       cacheCreationCost: agg.cacheCreationCost,
       totalInputTokens: agg.totalInputTokens,
       totalTokens: agg.totalInputTokens, // deprecated, for backward compatibility
       cacheHitRate: clampRatio01(cacheHitRate),
-      modelStats: modelStatsByProvider.get(providerId) ?? [],
+      modelStats,
     };
   }
 );
```
tests/unit/repository/leaderboard-provider-metrics.test.ts (2)
344-354: In the "empty model filter" case, also assert that provider aggregate totals are not lost.

This scenario now verifies that `modelStats` filtering is correct, but it still lacks provider-level total assertions (e.g. `totalRequests`/`totalInputTokens`); without them, a future change that moves the filter ahead of aggregation could silently undercount.

Also applies to: 372-383
100-101: Consider adding a test case for the `billingModelSource: "original"` branch.

Both `beforeEach` blocks currently pin `"redirected"`, so the newly introduced branch is only half covered. Suggest adding a minimal case that also exercises the `"original"` path.

As per coding guidelines: `**/*.test.{ts,tsx,js,jsx}`: All new features must have unit test coverage of at least 80%.

Also applies to: 246-247
📒 Files selected for processing (2)
- src/repository/leaderboard.ts
- tests/unit/repository/leaderboard-provider-metrics.test.ts
🧹 Nitpick comments (1)
src/repository/leaderboard.ts (1)
621-622: Consider adding a stable third sort key for the provider-level sort.

`modelStats` uses a three-key sort (cacheHitRate → totalRequests → model), but the provider-level sort uses only two keys. When both `cacheHitRate` and `totalRequests` are equal, the resulting order may be unstable. Suggested change:

```diff
- entries.sort((a, b) => b.cacheHitRate - a.cacheHitRate || b.totalRequests - a.totalRequests);
+ entries.sort(
+   (a, b) =>
+     b.cacheHitRate - a.cacheHitRate ||
+     b.totalRequests - a.totalRequests ||
+     a.providerId - b.providerId
+ );
```
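The suggested three-key comparator can be sketched as a standalone function; the entry type and field names are assumed for illustration:

```typescript
// Illustrative three-key comparator: cacheHitRate desc, totalRequests desc,
// then providerId asc as a deterministic tie-breaker.
interface Entry {
  providerId: number;
  cacheHitRate: number;
  totalRequests: number;
}

function compareEntries(a: Entry, b: Entry): number {
  return (
    b.cacheHitRate - a.cacheHitRate ||
    b.totalRequests - a.totalRequests ||
    a.providerId - b.providerId
  );
}
```

Because the numeric `||` chain falls through only on exact ties (a difference of 0), the final `providerId` term guarantees a total, deterministic order regardless of the sort algorithm's stability.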
📒 Files selected for processing (2)
- src/repository/leaderboard.ts
- tests/unit/repository/leaderboard-provider-metrics.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/repository/leaderboard-provider-metrics.test.ts
Background

When the cache misses, the provider "cache hit rate" leaderboard ran two aggregations over `usage_ledger` (provider level + provider+model level), which became expensive at larger data volumes.

Changes

- `src/repository/leaderboard.ts`: the cache-hit-rate leaderboard now runs a single provider+model aggregation query and summarizes provider-level statistics at the application layer (saving one DB aggregation scan); each provider's `modelStats` is stably sorted to avoid order drift, and the provider ranking gains a stable third sort key. Model names are normalized with `TRIM + NULLIF` so blank names don't create meaningless groups; the model leaderboard excludes empty models at the SQL layer to avoid returning useless groups.
- `src/drizzle/schema.ts` + migration: adds the partial index `idx_usage_ledger_cache_required_created_at` on `usage_ledger` (`blocked_by IS NULL` and cache tokens > 0) to speed up time-range statistics.
- `tests/unit/repository/leaderboard-provider-metrics.test.ts`: adds coverage for the `billingModelSource: "original"` branch, and the empty-model-filter case additionally asserts that provider aggregate totals are not lost.

Compatibility

- `totalTokens` keeps its original semantics (deprecated, equal to `totalInputTokens`).

Verification

- bun run lint
- bun run typecheck
- bun run test
- bun run build

Greptile Summary
This PR successfully optimizes the cache hit rate leaderboard by reducing database load from 2 aggregation queries to 1, with provider-level statistics now aggregated at the application layer.
Key improvements:
- New partial index `idx_usage_ledger_cache_required_created_at` on `usage_ledger` to accelerate time-range queries for cache-enabled entries
- `buildNormalizedModelField()` helper using `TRIM + NULLIF` to normalize empty/whitespace model names to NULL
- Stable sort of `modelStats` by (cacheHitRate desc, totalRequests desc, model name asc) to prevent ordering drift
- SQL-level `isNotNull(modelField)` filter instead of post-query filtering
- Switched from `getSystemSettings` to `getCachedSystemSettings` for better cache utilization

Behavior preservation:
- Blank model names are excluded from the `modelStats` array but still contribute to provider-level totals (intentional, verified by tests)
- `totalTokens` deprecated field maintained for backward compatibility
- Tests cover the `billingModelSource: "original"` branch and empty model handling

Confidence Score: 4/5
Important Files Changed
- Partial index on `usage_ledger.created_at` with cache-required filter for optimized leaderboard queries
- Tests add the `billingModelSource: "original"` branch and verify that empty model filtering preserves provider-level totals

Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start[Cache Hit Rate Leaderboard Request] --> OldOrNew{Implementation}
    OldOrNew -->|Before| Old1[Query 1: Provider-level aggregation<br/>GROUP BY providerId]
    Old1 --> Old2[Query 2: Model-level aggregation<br/>GROUP BY providerId, model]
    Old2 --> OldMerge[Merge results<br/>Attach modelStats to providers]
    OldMerge --> OldReturn[Return leaderboard]
    OldOrNew -->|After| New1[Single Query: Model-level aggregation<br/>GROUP BY providerId, providerName, model<br/>with NULLIF TRIM model normalization]
    New1 --> NewAgg[Application-layer aggregation:<br/>- Accumulate provider totals from model rows<br/>- Filter null models from modelStats<br/>- Stable sort modelStats]
    NewAgg --> NewSort[Sort providers by:<br/>cacheHitRate desc, totalRequests desc, providerId asc]
    NewSort --> NewReturn[Return leaderboard]
    NewReturn -.Index acceleration.-> Index[Partial Index:<br/>idx_usage_ledger_cache_required_created_at<br/>WHERE blockedBy IS NULL AND cache tokens > 0]
    style Old1 fill:#ffcccc
    style Old2 fill:#ffcccc
    style New1 fill:#ccffcc
    style Index fill:#cce5ff
```

Last reviewed commit: 0f1f682