Skip to content

perf(leaderboard): 优化缓存命中率榜单查询并添加索引#856

Open
tesgth032 wants to merge 5 commits intoding113:devfrom
tesgth032:perf/leaderboard-query-index
Open

perf(leaderboard): 优化缓存命中率榜单查询并添加索引#856
tesgth032 wants to merge 5 commits intoding113:devfrom
tesgth032:perf/leaderboard-query-index

Conversation

@tesgth032
Copy link
Contributor

@tesgth032 tesgth032 commented Mar 2, 2026

背景

供应商「缓存命中率」排行榜在缓存未命中时会对 usage_ledger 做两次聚合(provider 级 + provider+model 级),在数据量较大时开销偏高。

改动

  • src/repository/leaderboard.ts:将缓存命中率榜单改为仅执行一次 provider+model 聚合查询,并在应用层汇总 provider 级统计(减少一次 DB 聚合扫描);同时对每个 provider 的 modelStats 做稳定排序,避免顺序漂移,并为 provider 排名增加稳定第三排序键。
  • 对 model 字段做 TRIM + NULLIF 归一化,避免空白模型名造成无意义分组;模型排行榜在 SQL 层排除空模型,减少无效分组回传。
  • src/drizzle/schema.ts + 迁移:为 usage_ledger 增加 partial index idx_usage_ledger_cache_required_created_atblocked_by IS NULL 且 cache tokens > 0),加速按时间范围统计。
  • tests/unit/repository/leaderboard-provider-metrics.test.ts:补齐 billingModelSource: "original" 分支覆盖;并在空模型过滤用例里额外断言 provider 聚合总量不丢失。

兼容性

  • 返回字段与排序规则保持不变;totalTokens 仍按原语义保留(deprecated,等同 totalInputTokens)。

验证

  • bun run lint
  • bun run typecheck
  • bun run test
  • bun run build

Greptile Summary

This PR successfully optimizes the cache hit rate leaderboard by reducing database load from 2 aggregation queries to 1, with provider-level statistics now aggregated at the application layer.

Key improvements:

  • Adds partial index idx_usage_ledger_cache_required_created_at on usage_ledger to accelerate time-range queries for cache-enabled entries
  • Introduces buildNormalizedModelField() helper using TRIM + NULLIF to normalize empty/whitespace model names to NULL
  • Consolidates cache hit rate query: executes single provider+model aggregation, then derives provider totals in application code (eliminates redundant DB scan)
  • Implements stable sorting for modelStats by (cacheHitRate desc, totalRequests desc, model name asc) to prevent ordering drift
  • Filters empty models at SQL level for model leaderboard using isNotNull(modelField) instead of post-query filtering
  • Changes from getSystemSettings to getCachedSystemSettings for better cache utilization

Behavior preservation:

  • Empty/blank model names excluded from modelStats array but still contribute to provider-level totals (intentional, verified by tests)
  • Return fields and ranking order unchanged; totalTokens deprecated field maintained for backward compatibility
  • Test coverage expanded for billingModelSource: "original" branch and empty model handling

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk - it's a well-tested performance optimization with no breaking changes
  • Score reflects solid implementation of a performance optimization with comprehensive test coverage. The logic change from dual DB queries to single query + application aggregation is sound and tested. Model name normalization and stable sorting prevent edge cases. Minor confidence reduction due to complexity of aggregation logic migration, but extensive test assertions verify correctness including edge cases (empty models, multiple providers, sorting stability).
  • No files require special attention - the changes are well-structured and tested

Important Files Changed

Filename Overview
drizzle/0078_dear_alex_power.sql Adds partial index on usage_ledger.created_at with cache-required filter for optimized leaderboard queries
src/drizzle/schema.ts Adds index definition matching migration, properly scoped to cache-enabled and non-blocked ledger entries
src/repository/leaderboard.ts Optimizes cache hit rate query from 2 DB aggregations to 1, adds model name normalization, stable sorting for modelStats, and filters empty models at SQL level
tests/unit/repository/leaderboard-provider-metrics.test.ts Adds coverage for billingModelSource: "original" branch and verifies empty model filtering preserves provider-level totals

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start[Cache Hit Rate Leaderboard Request] --> OldOrNew{Implementation}
    
    OldOrNew -->|Before| Old1[Query 1: Provider-level aggregation<br/>GROUP BY providerId]
    Old1 --> Old2[Query 2: Model-level aggregation<br/>GROUP BY providerId, model]
    Old2 --> OldMerge[Merge results<br/>Attach modelStats to providers]
    OldMerge --> OldReturn[Return leaderboard]
    
    OldOrNew -->|After| New1[Single Query: Model-level aggregation<br/>GROUP BY providerId, providerName, model<br/>with NULLIF TRIM model normalization]
    New1 --> NewAgg[Application-layer aggregation:<br/>- Accumulate provider totals from model rows<br/>- Filter null models from modelStats<br/>- Stable sort modelStats]
    NewAgg --> NewSort[Sort providers by:<br/>cacheHitRate desc, totalRequests desc, providerId asc]
    NewSort --> NewReturn[Return leaderboard]
    
    NewReturn -.Index acceleration.-> Index[Partial Index:<br/>idx_usage_ledger_cache_required_created_at<br/>WHERE blockedBy IS NULL AND cache tokens > 0]
    
    style Old1 fill:#ffcccc
    style Old2 fill:#ffcccc
    style New1 fill:#ccffcc
    style Index fill:#cce5ff
Loading

Last reviewed commit: 0f1f682

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

此拉取请求旨在通过优化数据库查询和引入新的索引来提升供应商缓存命中率排行榜的性能。核心改动在于将原有的双层数据库聚合查询精简为单层模型级别聚合,并在应用层进行数据汇总,从而减少了数据库负载。同时,通过模型字段的归一化处理,提升了排行榜数据的准确性和可用性。

Highlights

  • 性能优化: 重构了供应商缓存命中率排行榜的查询逻辑,将两次数据库聚合查询(供应商级别和供应商+模型级别)优化为仅执行一次供应商+模型级别的聚合,并在应用层汇总供应商级别统计,显著降低了大数据量下的查询开销。
  • 数据库索引: 为 usage_ledger 表新增了一个名为 idx_usage_ledger_cache_required_created_at 的部分索引,以加速涉及缓存请求的时间范围查询。
  • 数据归一化: 对模型字段(model)应用了 TRIM + NULLIF 函数进行归一化处理,避免因空白模型名导致无意义的分组,提高了数据准确性。
  • 测试更新: 更新了相关的单元测试,以适应新的排行榜计算逻辑和数据结构,确保了变更的正确性。
Changelog
  • drizzle/0078_dear_alex_power.sql
    • 新增了一个迁移脚本,用于在 usage_ledger 表上创建部分索引 idx_usage_ledger_cache_required_created_at
  • drizzle/meta/_journal.json
    • 更新了 Drizzle 迁移日志,记录了新的迁移文件 0078_dear_alex_power.sql
  • src/drizzle/schema.ts
    • usageLedger 表的模式定义中,添加了新的部分索引 usageLedgerCacheRequiredCreatedAtIdx,以优化缓存命中率排行榜的查询性能。
  • src/repository/leaderboard.ts
    • getSystemSettings 替换为 getCachedSystemSettings,以利用缓存的系统设置。
    • 简化了 cacheRequiredCondition 的 SQL 表达式,移除了 COALESCE 函数。
    • 移除了原有的直接针对供应商级别的聚合查询,改为在应用层从模型级别聚合数据中派生。
    • 引入了 clampRatio01 辅助函数,用于将比率限制在 0 到 1 之间。
    • 修改了模型字段的选取逻辑,增加了 TRIMNULLIF 以归一化模型名称。
    • 在模型级别查询中增加了 providerNametotalCostcacheCreationCost 字段。
    • 重构了排行榜的计算逻辑,先进行模型级别聚合,然后在应用层根据这些聚合数据计算并排序供应商级别的排行榜条目。
  • tests/unit/repository/leaderboard-provider-metrics.test.ts
    • 更新了单元测试中的模拟数据和断言,以适应新的排行榜计算逻辑,特别是供应商级别字段现在是从模型聚合中派生的这一变化。
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link

coderabbitai bot commented Mar 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

新增针对此次缓存相关条件且未被阻塞行的部分索引并在 schema 中导出;将 provider 聚合改为先按模型聚合再在内存合并,更新相应单元测试与迁移日记条目。

Changes

Cohort / File(s) Summary
数据库迁移与元数据
drizzle/0078_dear_alex_power.sql, drizzle/meta/_journal.json
新增 SQL 迁移:按需创建部分索引 idx_usage_ledger_cache_required_created_at(基于 created_at,WHERE: blocked_by IS NULL AND (cache_creation_input_tokens > 0 OR cache_read_input_tokens > 0));在 _journal.json 中插入对应迁移条目。
数据库模式声明
src/drizzle/schema.ts
在导出的 schema 中新增部分索引声明 usageLedgerCacheRequiredCreatedAtIdx,与迁移中的 WHERE 子句一致并导出该索引标识。
排行榜聚合与逻辑重构
src/repository/leaderboard.ts
将 provider 级聚合改为先按模型汇总再在内存合并;引入 getCachedSystemSettings、模型字段规范化 helper、clampRatio01,扩展 modelStats 与 provider 层聚合字段(如 totalCost、cacheCreationCost、totalInputTokens、cacheHitRate 等),并调整排序与空模型过滤逻辑。
单元测试更新
tests/unit/repository/leaderboard-provider-metrics.test.ts
测试改为使用 getCachedSystemSettings 的 mock,更新 mock 数据以包含 providerName/model 与数值型成本字段,调整断言以匹配新的聚合输出及模型明细。

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed 标题准确总结了主要改动:性能优化缓存命中率榜单查询并添加数据库索引,与变更内容(减少数据库聚合次数、添加partial index)相符。
Description check ✅ Passed 拉取请求的描述与变更集相关且详细说明了所有关键改动,包括数据库查询优化、索引添加、模型字段归一化和测试更新。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added enhancement New feature or request area:statistics labels Mar 2, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This PR aims to optimize the query performance of the cache hit rate leaderboard. No security vulnerabilities were found. The changes are excellent, effectively reducing database pressure by optimizing two database aggregate queries into one and summarizing data at the application layer. The new partial index idx_usage_ledger_cache_required_created_at for usage_ledger is well-designed, and model names are normalized using TRIM and NULLIF. The use of cached getCachedSystemSettings instead of getSystemSettings is a good practice, and updated unit tests ensure the correctness of the refactored logic.

agg.totalInputTokens += row.totalInputTokens;
providerAggById.set(row.providerId, agg);

const model = row.model?.trim();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

在数据库查询中,modelField 已经通过 NULLIF(TRIM(${rawModelField}), '') 进行了处理。这意味着从数据库返回的 row.model 字段要么是 null,要么是已经去除前后空格的字符串。因此,此处的 .trim() 是不必要的。

建议直接使用 row.model 以避免冗余操作。

Suggested change
const model = row.model?.trim();
const model = row.model;

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/repository/leaderboard.ts (1)

575-576: 可选优化:冗余的空值检查。

row.model?.trim() 的空值检查在数据库层已通过 NULLIF(TRIM(...), '') 处理,此处的 ?.trim()if (!model) continue 作为防御性代码是可接受的,但理论上在正常情况下 row.model 要么是非空字符串,要么是 null(不会是空字符串)。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/repository/leaderboard.ts` around lines 575 - 576, The DB already applies
NULLIF(TRIM(...), ''), so remove the redundant optional chaining and trimming:
replace the use of "row.model?.trim()" with direct "row.model" and change the
guard to check explicitly for null (e.g., "if (model === null) continue") or
remove the guard entirely if callers already assume non-null; update references
to the "model" variable and the existing "if (!model) continue" guard in the
block to use this simplified check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/repository/leaderboard.ts`:
- Around line 575-576: The DB already applies NULLIF(TRIM(...), ''), so remove
the redundant optional chaining and trimming: replace the use of
"row.model?.trim()" with direct "row.model" and change the guard to check
explicitly for null (e.g., "if (model === null) continue") or remove the guard
entirely if callers already assume non-null; update references to the "model"
variable and the existing "if (!model) continue" guard in the block to use this
simplified check.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

📥 Commits

Reviewing files that changed from the base of the PR and between 2769a75 and db421e2.

📒 Files selected for processing (6)
  • drizzle/0078_dear_alex_power.sql
  • drizzle/meta/0078_snapshot.json
  • drizzle/meta/_journal.json
  • src/drizzle/schema.ts
  • src/repository/leaderboard.ts
  • tests/unit/repository/leaderboard-provider-metrics.test.ts

@github-actions github-actions bot added the size/L Large PR (< 1000 lines) label Mar 2, 2026
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR optimizes the cache hit rate leaderboard query by consolidating two database aggregations (provider-level + model-level) into a single model-level aggregation with application-level provider summarization. The changes also add a partial index to accelerate time-range queries for cache-required requests.

PR Size: L

  • Lines changed: 4,149 (additions: 4,048, deletions: 101)
  • Files changed: 6

Note: The large line count is primarily due to the generated Drizzle snapshot file (0078_snapshot.json - ~3,900 lines). The actual functional changes are small (~160 lines across schema, leaderboard, and tests).

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - Clean
  • Type safety - Clean
  • Documentation accuracy - Clean
  • Test coverage - Adequate
  • Code clarity - Good

Technical Assessment

Optimization Strategy: The approach of reducing from 2 DB queries to 1 with in-memory aggregation is sound for this use case.

Index Design: The partial index idx_usage_ledger_cache_required_created_at correctly targets the cacheRequiredCondition filter pattern, enabling efficient time-range scans.

COALESCE Removal: The simplification of cacheRequiredCondition (removing COALESCE wrappers) is safe because:

  1. The partial index predicate WHERE blocked_by IS NULL AND (cache_creation_input_tokens > 0 OR cache_read_input_tokens > 0) inherently excludes NULL values
  2. NULL comparisons in SQL return false, so rows with NULL token values are already excluded by the index condition

Model Normalization: The NULLIF(TRIM(...), '') pattern properly handles empty/whitespace model names, preventing meaningless groupings.

Test Updates: Tests correctly verify that provider-level metrics are properly derived from model-level aggregation.


Automated review by Claude AI

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

billingModelSource === "original"
? sql<string>`COALESCE(${usageLedger.originalModel}, ${usageLedger.model})`
: sql<string>`COALESCE(${usageLedger.model}, ${usageLedger.originalModel})`;
const modelField = sql<string>`NULLIF(TRIM(${rawModelField}), '')`;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty/blank model names now excluded from both SQL grouping and modelStats via NULLIF(TRIM()), but still contribute to provider totals. Verify this behavior change is intentional

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/leaderboard.ts
Line: 507

Comment:
Empty/blank model names now excluded from both SQL grouping and `modelStats` via `NULLIF(TRIM())`, but still contribute to provider totals. Verify this behavior change is intentional

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Copy link

greptile-apps bot commented Mar 2, 2026

Additional Comments (1)

tests/unit/repository/leaderboard-provider-metrics.test.ts
Wrong module mocked - code now uses getCachedSystemSettings from @/lib/config/system-settings-cache, not getSystemSettings from @/repository/system-config

vi.mock("@/lib/config/system-settings-cache", () => ({
  getCachedSystemSettings: mocks.getSystemSettings,
}));
Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/unit/repository/leaderboard-provider-metrics.test.ts
Line: 89-91

Comment:
Wrong module mocked - code now uses `getCachedSystemSettings` from `@/lib/config/system-settings-cache`, not `getSystemSettings` from `@/repository/system-config`

```suggestion
vi.mock("@/lib/config/system-settings-cache", () => ({
  getCachedSystemSettings: mocks.getSystemSettings,
}));
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/repository/leaderboard.ts (1)

503-507: 建议抽取模型字段归一化逻辑,避免双处漂移

rawModelField + NULLIF(TRIM(...), '') 在两个函数里重复,后续若变更规则容易漏改。建议抽成一个小工具函数复用。

可参考的最小改动
 const clampRatio01 = (value: number | null | undefined) => Math.min(Math.max(value ?? 0, 0), 1);
+
+function buildNormalizedModelField(billingModelSource: "original" | "redirected") {
+  const rawModelField =
+    billingModelSource === "original"
+      ? sql<string>`COALESCE(${usageLedger.originalModel}, ${usageLedger.model})`
+      : sql<string>`COALESCE(${usageLedger.model}, ${usageLedger.originalModel})`;
+  return sql<string>`NULLIF(TRIM(${rawModelField}), '')`;
+}
@@
-  const rawModelField =
-    billingModelSource === "original"
-      ? sql<string>`COALESCE(${usageLedger.originalModel}, ${usageLedger.model})`
-      : sql<string>`COALESCE(${usageLedger.model}, ${usageLedger.originalModel})`;
-  const modelField = sql<string>`NULLIF(TRIM(${rawModelField}), '')`;
+  const modelField = buildNormalizedModelField(billingModelSource);
@@
-  const rawModelField =
-    billingModelSource === "original"
-      ? sql<string>`COALESCE(${usageLedger.originalModel}, ${usageLedger.model})`
-      : sql<string>`COALESCE(${usageLedger.model}, ${usageLedger.originalModel})`;
-  const modelField = sql<string>`NULLIF(TRIM(${rawModelField}), '')`;
+  const modelField = buildNormalizedModelField(billingModelSource);

Also applies to: 692-697

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/repository/leaderboard.ts` around lines 503 - 507, Extract the repeated
model normalization logic into a small helper (e.g., a function named
normalizeModelField or buildModelField) that takes the two columns and
billingModelSource (or the chosen primary/secondary columns) and returns the
sql<string>`NULLIF(TRIM(COALESCE(...)), '')` expression; then replace the
duplicated rawModelField + modelField expressions (references: rawModelField and
modelField) in both locations with calls to that helper so all normalization
rules live in one place and are reused.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/repository/leaderboard.ts`:
- Around line 503-507: Extract the repeated model normalization logic into a
small helper (e.g., a function named normalizeModelField or buildModelField)
that takes the two columns and billingModelSource (or the chosen
primary/secondary columns) and returns the
sql<string>`NULLIF(TRIM(COALESCE(...)), '')` expression; then replace the
duplicated rawModelField + modelField expressions (references: rawModelField and
modelField) in both locations with calls to that helper so all normalization
rules live in one place and are reused.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

📥 Commits

Reviewing files that changed from the base of the PR and between db421e2 and 2112716.

📒 Files selected for processing (1)
  • src/repository/leaderboard.ts

@tesgth032 tesgth032 changed the title perf(leaderboard): 优化缓存命中率榜单查询并添加索引 [未完成] perf(leaderboard): 优化缓存命中率榜单查询并添加索引 Mar 2, 2026
@tesgth032
Copy link
Contributor Author

已根据 CodeRabbit 的 nitpick 调整:modelStats 构建时移除冗余的 row.model?.trim(),改为直接使用已在 SQL 侧归一化(NULLIF(TRIM(...), ''))后的 row.model

当前 PR 相关 Actions checks 均已通过。若看到早期 Codex PR Review 失败,为 fork PR 在 pull_request_target 下的权限校验所致(与代码无关)。

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
src/repository/leaderboard.ts (2)

713-720: 可把空模型过滤前移到 SQL,减少一次无效分组回传

Line 718-720 在应用层再过滤 null/"",语义正确;但这里可以在查询条件中直接排除空模型,进一步降低传输与 map/filter 成本。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/repository/leaderboard.ts` around lines 713 - 720, Move the empty/null
model filtering into the SQL query to avoid grouping and returning useless rows:
update the query that builds rankings (the .where(...) chain using
LEDGER_BILLING_CONDITION and buildDateCondition(period, timezone, dateRange)) to
add a predicate excluding model IS NULL and model = '' (or equivalent non-empty
check on modelField) before .groupBy(modelField)/.orderBy(...); this removes the
need for the post-query .filter((entry) => entry.model !== null && entry.model
!== "") and the subsequent type assertion in the .map, reducing data transfer
and CPU work.

592-613: 建议在 provider 内对 modelStats 做一次显式稳定排序

Line 607 目前直接使用插入顺序。虽然当前大多可用,但当 cacheHitRatetotalRequests 并列时,顺序仍可能漂移。建议在回填 modelStats 时增加本地排序并加一个稳定兜底键。

可选改法
   const entries: ProviderCacheHitRateLeaderboardEntry[] = Array.from(providerAggById.entries()).map(
     ([providerId, agg]) => {
+      const modelStats = [...(modelStatsByProvider.get(providerId) ?? [])].sort(
+        (a, b) =>
+          b.cacheHitRate - a.cacheHitRate ||
+          b.totalRequests - a.totalRequests ||
+          a.model.localeCompare(b.model)
+      );
+
       const cacheHitRate =
         agg.totalInputTokens > 0 ? agg.cacheReadTokens / agg.totalInputTokens : 0;
 
       return {
         providerId,
         providerName: agg.providerName,
         totalRequests: agg.totalRequests,
         totalCost: agg.totalCost,
         cacheReadTokens: agg.cacheReadTokens,
         cacheCreationCost: agg.cacheCreationCost,
         totalInputTokens: agg.totalInputTokens,
         totalTokens: agg.totalInputTokens, // deprecated, for backward compatibility
         cacheHitRate: clampRatio01(cacheHitRate),
-        modelStats: modelStatsByProvider.get(providerId) ?? [],
+        modelStats,
       };
     }
   );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/repository/leaderboard.ts` around lines 592 - 613, When building entries
in the mapping, the modelStats array is left in insertion order which can drift
when providers tie on cacheHitRate and totalRequests; change the mapping for
ProviderCacheHitRateLeaderboardEntry so you create a local copy of modelStats
(from modelStatsByProvider.get(providerId) ?? []) and perform an explicit stable
sort (e.g., sort by desired keys such as cacheHitRate desc, totalRequests desc,
then a stable fallback like modelId or modelName) before assigning it to the
modelStats property so ordering is deterministic even when top-level ties occur.
tests/unit/repository/leaderboard-provider-metrics.test.ts (2)

344-354: 建议在“空模型过滤”用例里再断言 provider 聚合总量不丢失

这个场景现在验证了 modelStats 过滤正确,但还缺一组 provider 级总量断言(如 totalRequests/totalInputTokens),否则未来若把过滤提前到聚合前,可能静默少算。

Also applies to: 372-383

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/repository/leaderboard-provider-metrics.test.ts` around lines 344
- 354, The "empty model filter" test currently asserts modelStats filtering but
doesn't verify provider-level aggregates; update the test in
leaderboard-provider-metrics.test.ts to also assert that the provider object
(providerId: 1 / providerName: "provider-a") still reports correct totalRequests
and totalInputTokens (and optionally
totalCost/cacheReadTokens/cacheCreationCost) after filtering so aggregation
isn't lost; make the same addition in the other similar test around the 372-383
block to ensure provider-level totals are preserved even if model filtering is
applied before or after aggregation.

100-101: 建议补一组 billingModelSource: "original" 分支用例

当前两个 beforeEach 都固定 "redirected",新引入的分支只覆盖到一半。建议加一个最小用例把 "original" 路径也跑通。

As per coding guidelines: **/*.test.{ts,tsx,js,jsx}: All new features must have unit test coverage of at least 80%.

Also applies to: 246-247

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/repository/leaderboard-provider-metrics.test.ts` around lines 100
- 101, Add a unit test that exercises the "original" billingModelSource path:
create a new test case in the leaderboard-provider-metrics test suite that mocks
mocks.getCachedSystemSettings to resolve { billingModelSource: "original" } and
asserts the metrics/behavior differs as expected compared to the existing
"redirected" case; ensure the new test targets the same functions/logic
exercised by the existing tests (e.g., the leaderboard metrics collection code
under test) so the "original" branch is covered and overall test coverage for
this file increases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/repository/leaderboard.ts`:
- Around line 713-720: Move the empty/null model filtering into the SQL query to
avoid grouping and returning useless rows: update the query that builds rankings
(the .where(...) chain using LEDGER_BILLING_CONDITION and
buildDateCondition(period, timezone, dateRange)) to add a predicate excluding
model IS NULL and model = '' (or equivalent non-empty check on modelField)
before .groupBy(modelField)/.orderBy(...); this removes the need for the
post-query .filter((entry) => entry.model !== null && entry.model !== "") and
the subsequent type assertion in the .map, reducing data transfer and CPU work.
- Around line 592-613: When building entries in the mapping, the modelStats
array is left in insertion order which can drift when providers tie on
cacheHitRate and totalRequests; change the mapping for
ProviderCacheHitRateLeaderboardEntry so you create a local copy of modelStats
(from modelStatsByProvider.get(providerId) ?? []) and perform an explicit stable
sort (e.g., sort by desired keys such as cacheHitRate desc, totalRequests desc,
then a stable fallback like modelId or modelName) before assigning it to the
modelStats property so ordering is deterministic even when top-level ties occur.

In `@tests/unit/repository/leaderboard-provider-metrics.test.ts`:
- Around line 344-354: The "empty model filter" test currently asserts
modelStats filtering but doesn't verify provider-level aggregates; update the
test in leaderboard-provider-metrics.test.ts to also assert that the provider
object (providerId: 1 / providerName: "provider-a") still reports correct
totalRequests and totalInputTokens (and optionally
totalCost/cacheReadTokens/cacheCreationCost) after filtering so aggregation
isn't lost; make the same addition in the other similar test around the 372-383
block to ensure provider-level totals are preserved even if model filtering is
applied before or after aggregation.
- Around line 100-101: Add a unit test that exercises the "original"
billingModelSource path: create a new test case in the
leaderboard-provider-metrics test suite that mocks mocks.getCachedSystemSettings
to resolve { billingModelSource: "original" } and asserts the metrics/behavior
differs as expected compared to the existing "redirected" case; ensure the new
test targets the same functions/logic exercised by the existing tests (e.g., the
leaderboard metrics collection code under test) so the "original" branch is
covered and overall test coverage for this file increases.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

📥 Commits

Reviewing files that changed from the base of the PR and between 2112716 and c9c4b96.

📒 Files selected for processing (2)
  • src/repository/leaderboard.ts
  • tests/unit/repository/leaderboard-provider-metrics.test.ts

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/repository/leaderboard.ts (1)

621-622: 建议为 provider 排序添加稳定的第三排序键。

modelStats 的排序使用了三级排序(cacheHitRate → totalRequests → model),但 provider 级排序只用了两级。当 cacheHitRatetotalRequests 都相等时,排序结果可能不稳定。

♻️ 建议的修改
-  entries.sort((a, b) => b.cacheHitRate - a.cacheHitRate || b.totalRequests - a.totalRequests);
+  entries.sort(
+    (a, b) =>
+      b.cacheHitRate - a.cacheHitRate ||
+      b.totalRequests - a.totalRequests ||
+      a.providerId - b.providerId
+  );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/repository/leaderboard.ts` around lines 621 - 622, The provider-level
sort using entries.sort((a, b) => b.cacheHitRate - a.cacheHitRate ||
b.totalRequests - a.totalRequests) lacks a tertiary key causing instability when
cacheHitRate and totalRequests tie; update that comparator (in
src/repository/leaderboard.ts where entries.sort is used) to add a deterministic
third key such as comparing a.provider (or a.providerId) lexicographically
(e.g., localeCompare) or by id to ensure stable ordering; mirror the three-key
approach used by modelStats (cacheHitRate → totalRequests → model) so ties are
resolved consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/repository/leaderboard.ts`:
- Around line 621-622: The provider-level sort using entries.sort((a, b) =>
b.cacheHitRate - a.cacheHitRate || b.totalRequests - a.totalRequests) lacks a
tertiary key causing instability when cacheHitRate and totalRequests tie; update
that comparator (in src/repository/leaderboard.ts where entries.sort is used) to
add a deterministic third key such as comparing a.provider (or a.providerId)
lexicographically (e.g., localeCompare) or by id to ensure stable ordering;
mirror the three-key approach used by modelStats (cacheHitRate → totalRequests →
model) so ties are resolved consistently.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

📥 Commits

Reviewing files that changed from the base of the PR and between c9c4b96 and a2a66f9.

📒 Files selected for processing (2)
  • src/repository/leaderboard.ts
  • tests/unit/repository/leaderboard-provider-metrics.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/repository/leaderboard-provider-metrics.test.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:statistics enhancement New feature or request size/L Large PR (< 1000 lines)

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

1 participant