
perf: Reduce read-path load on common pages (system settings cache and log polling) #850

Open
tesgth032 wants to merge 16 commits into ding113:dev from tesgth032:perf/common-pages-optim

Conversation

@tesgth032 (Contributor) commented Mar 1, 2026

Background

Common pages (dashboard / usage logs / leaderboard / provider management) and the proxy hot path read system_settings frequently. Meanwhile, the usage logs page's auto-refresh in the infiniteQuery scenario easily triggers a full refetch of all loaded pages, which keeps amplifying DB QPS and CPU when multiple users have the page open at the same time.

Related work

Changes

  • System settings cache: added getCachedSystemSettings() (60s in-process TTL + in-flight concurrency deduplication + Redis Pub/Sub cross-instance invalidation; automatically degrades to pure TTL without Redis)
  • An invalidation broadcast is published after saving system settings, so every instance picks up the latest configuration promptly
  • Common pages / APIs / the proxy hot path uniformly read settings from the cache (fewer duplicate DB reads)
  • Usage logs auto-refresh: poll only the latest page, then merge + dedupe (avoids a full refetch)
  • Write path: the usage_ledger trigger now skips the duplicate UPSERT caused by UPDATEs to irrelevant fields (by comparing the derived values final_provider_id / is_success), significantly reducing write amplification; added migration 0078_perf_usage_ledger_trigger_skip_irrelevant_updates.sql
  • Provider management page: merged the providers/health/system-settings reads into getProviderManagerBootstrapData(), reducing waterfall requests (~4 requests -> 2); the providers list now uses a 30s-TTL process cache with pub/sub invalidation to lower DB pressure
  • Documentation: docs/performance/common-pages-optimization-plan.md

Behavior and compatibility

  • Fail-open: if reading settings fails, return the previous cached copy or minimal defaults without blocking the request; when Redis is unavailable, TTL behavior is preserved
  • Transparent upgrade: no changes to the API protocol or page presentation logic (aside from being faster and cheaper, there should be no perceivable difference)
  • Migration compatibility: only CREATE OR REPLACE FUNCTION; no table structure changes and no dropped objects

Testing

  • bun run lint
  • bun run typecheck
  • bun run test
  • bun run build
  • Manual test: open the dashboard / usage logs pages and verify auto-refresh works while DB queries decrease
  • Manual test: save system settings and verify cross-instance cache invalidation (in a multi-instance deployment)

Description enhanced by Claude AI

Greptile Summary

Summary

Implemented multi-layer performance optimizations targeting common pages (dashboard / usage logs / leaderboard / provider management) and the proxy hot path to reduce DB QPS and CPU under concurrent load.

Key optimizations:

  • System settings cache: 60s TTL process cache with Redis pub/sub cross-instance invalidation (falling back to TTL-only if Redis unavailable), reducing repeated system_settings reads across dashboard/proxy/API paths
  • Usage logs polling: refactored auto-refresh to poll only the latest page and merge into existing infinite list via deduplication, preventing react-query from refetching all pages simultaneously when multiple users have logs page open
  • Database trigger write reduction: trigger now skips UPSERT when UPDATE doesn't affect usage_ledger-relevant fields (comparing derived values like final_provider_id/is_success), and completely skips ledger creation for blocked (non-billable) requests, significantly reducing write amplification
  • Provider management request consolidation: merged providers/health/system-settings into single getProviderManagerBootstrapData() endpoint, reducing waterfall from ~4 requests to 2; provider list now uses 30s TTL + pub/sub invalidation cache
  • Fail-open design: cache reads return stale data or safe defaults on DB failure rather than blocking requests; subscription failures don't prevent TTL-based cache operation
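The "poll only the latest page, then merge and dedupe" step can be sketched as a pure function. The row shape (id, createdAt) and the window size are assumptions for illustration, not the project's actual types:

```typescript
interface LogRow {
  id: string;
  createdAt: string; // ISO timestamp; lists are newest-first
}

// Merge the freshly polled latest page into the existing list, dropping
// duplicate ids and trimming to a fixed window (mirrors a MAX_TOTAL_LOGS cap).
function mergeLatestPage(existing: LogRow[], latest: LogRow[], maxTotal = 500): LogRow[] {
  const seen = new Set<string>();
  const merged: LogRow[] = [];
  for (const row of [...latest, ...existing]) {
    if (seen.has(row.id)) continue; // dedupe: the latest page wins
    seen.add(row.id);
    merged.push(row);
  }
  return merged.slice(0, maxTotal);
}
```

Because only one page is fetched per tick, refresh cost stays constant regardless of how far the user has scrolled, instead of growing with the number of loaded pages.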

Architecture strengths:

  • Strong consistency maintained for permission checks (getAllowGlobalUsageViewFromDB bypasses cache)
  • Comprehensive test coverage for trigger logic
  • Detailed performance documentation with risk-based roadmap (P0/P1/P2)
  • Version-based concurrency control prevents cache corruption during invalidation races

Known trade-offs (already noted in previous comments):

  • special_settings flush delay increased to 5s (larger data loss window on crash)
  • Trigger self-heal EXISTS check adds overhead to every skipped UPDATE
  • DB failure fallbacks get cached, allowing stale data across TTL windows

Confidence Score: 4/5

  • Safe to merge: comprehensive testing and a fail-open design ensure production resilience
  • Strong architecture with process-level caching, cross-instance invalidation, and graceful degradation paths; comprehensive test coverage validates trigger logic and edge cases; backward-compatible migrations; permission-critical paths correctly bypass the cache; known trade-offs (cache staleness during prolonged failures, larger flush windows) are documented and acceptable for the performance gains
  • Manual verification needed: confirm cross-instance cache invalidation works in a multi-instance deployment; monitor DB CPU/QPS reduction and trigger UPSERT skip rates in production; verify usage logs auto-refresh behavior under concurrent user load

Important Files Changed

Filename Overview
src/lib/config/system-settings-cache.ts introduced 60s TTL process cache with Redis pub/sub cross-instance invalidation, in-flight deduplication, and fail-open fallbacks (lines 238-241 cache fallback on DB failure)
drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql extended trigger to skip creating ledger rows for blocked (non-billable) requests (lines 18-24), reducing write amplification for rate-limited/blocked traffic
src/app/[locale]/dashboard/logs/_components/virtualized-logs-table.tsx polling only fetches latest page (line 193), merges with existing data via deduplication (lines 205-214), preventing infiniteQuery from refetching all pages
src/actions/providers.ts added getProviderManagerBootstrapData() consolidating providers/health/system-settings into single call (line 364), reducing waterfall requests from ~4 to 2
src/lib/cache/provider-cache.ts introduced 30s TTL provider process cache with Redis pub/sub invalidation (lines 62-65), version-based concurrency control (line 29), and request-level snapshot consistency
src/repository/system-config.ts getAllowGlobalUsageViewFromDB() uses strong-consistency direct DB read (line 282), bypassing cache for permission-critical field to avoid eventually-consistent authorization issues
docs/performance/common-pages-optimization-plan.md comprehensive documentation of optimization rationale, implementation locations (lines 79-88), and future roadmap with risk-based prioritization (P0/P1/P2)

Last reviewed commit: 160930e

@coderabbitai bot commented Mar 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Added a server-side system settings cache with cross-instance invalidation publishing, server-only timezone resolution, and usage_ledger trigger optimizations; switched many system-settings reads to the cached/DB paths; introduced Redis read-through caching and distributed locks for the dashboard/leaderboard; added a provider management bootstrap API with unified frontend cache keys; adjusted several tests and migration files.

Changes

Cohort / File(s) Summary
System settings cache and invalidation publishing
src/lib/config/index.ts, src/lib/config/system-settings-cache.ts
Added the getCachedSystemSettings implementation, Redis pub/sub (CHANNEL_SYSTEM_SETTINGS_UPDATED), and inFlightFetch/cacheGeneration protection; added and exported publishSystemSettingsCacheInvalidation, replacing the old invalidate call.
Server-side timezone resolution split
src/lib/utils/timezone.server.ts, src/lib/utils/timezone.ts, src/i18n/request.ts
Added server-only timezone resolution (DB → env → UTC fallback), removed resolveSystemTimezone from the client module, and switched multiple call sites to import from timezone.server.
usage_ledger trigger and migrations
drizzle/0078_*.sql, drizzle/0079_*.sql, src/lib/ledger-backfill/trigger.sql, drizzle/meta/_journal.json, tests/unit/usage-ledger/trigger.test.ts
Added/replaced fn_upsert_usage_ledger: handles warmup/blocked requests, derives final_provider_id/is_success, compares old and new fields to skip redundant writes, and keeps migrations and test assertions in sync.
DB-backed permission reads and related call sites
src/repository/system-config.ts, src/actions/overview.ts, src/actions/provider-slots.ts, src/actions/dashboard-realtime.ts, src/app/.../leaderboard/page.tsx, src/app/api/leaderboard/route.ts
Added getAllowGlobalUsageViewFromDB() and switched the source of allowGlobalUsageView for non-admins to a DB read in multiple places, adjusting permission checks and some error-throttling logs.
Unified switch to cached system settings (many call sites)
Many files, e.g. src/actions/my-usage.ts, src/actions/key-quota.ts, src/repository/leaderboard.ts, src/app/.../*.tsx, src/app/api/system-settings/route.ts, src/app/v1/_lib/proxy/*
Switched many call sites from getSystemSettings() to getCachedSystemSettings() (from @/lib/config), and introduced short-TTL Redis write-back on several backend paths (e.g. quotas / today's stats).
Dashboard realtime and leaderboard caching
src/actions/dashboard-realtime.ts, src/lib/redis/leaderboard-cache.ts
Added Redis read-through caching, short TTLs, a distributed lock, and a Lua unlock script to the dashboard realtime path; wrapped leaderboard reads with the cache and improved error/failure tolerance.
Virtualized logs table and controlled polling
src/app/[locale]/dashboard/logs/_components/virtualized-logs-table.tsx, .../virtualized-logs-table.test.tsx
Introduced rolling-window page merging and controlled polling (MAX_PAGES, deduplication, queryClient cache merging), replacing the simple refetchInterval; tests add a useQueryClient mock.
Provider management bootstrap API and UI rework
src/actions/providers.ts, src/app/[locale]/settings/providers/.../*
Added getProviderManagerBootstrapData() returning providers + healthStatus + systemSettings; the frontend now issues a single bootstrap query, with the data flow and loading logic adjusted accordingly.
React Query cache key unification
Multiple files under src/app/[locale]/settings/providers/_components/*
Replaced invalidateQueries calls on ["providers"] / ["providers-health"] with the unified ["providers-bootstrap"], updating invalidation points in several components.
Message write buffer scheduling enhancements
src/repository/message-write-buffer.ts
Added specialSettingsOnlyFlushIntervalMs with a detection function and flushDueAt management, scheduling flush cadence by patch content; extended the WriterConfig type and timer logic.
Auth constant re-export
src/lib/auth/constants.ts, src/lib/auth.ts, src/proxy.ts
Added an AUTH_COOKIE_NAME constants module re-exported by src/lib/auth.ts, and updated the proxy's import path.
Test and mock adjustments
tests/.../*
Many tests switched mocks/imports from getSystemSettings in @/repository/system-config to getCachedSystemSettings in @/lib/config, and from @/lib/utils/timezone to @/lib/utils/timezone.server; integration tests add cache-invalidation calls to reset state.
Other local adjustments
Several files (e.g. src/lib/security/constant-time-compare.ts, multiple timezone imports, minor React component changes)
Small renames/import-path adjustments, comment and log tweaks, local refactoring; a few exports added/modified (e.g. the providers bootstrap types and function, auth constants, timezone.server exports).
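The bootstrap consolidation in the table above can be sketched as a single server call that fans the three reads out in parallel. The loader parameters here are hypothetical stand-ins for the real providers/health/settings reads, not the project's actual signatures:

```typescript
interface BootstrapData<P, H, S> {
  providers: P;
  healthStatus: H;
  systemSettings: S;
}

// One client round trip; the three reads still run concurrently server-side.
async function getBootstrapData<P, H, S>(
  loadProviders: () => Promise<P>,
  loadHealth: () => Promise<H>,
  loadSettings: () => Promise<S>,
): Promise<BootstrapData<P, H, S>> {
  const [providers, healthStatus, systemSettings] = await Promise.all([
    loadProviders(),
    loadHealth(),
    loadSettings(),
  ]);
  return { providers, healthStatus, systemSettings };
}
```

Consolidating the reads removes the client-side waterfall: the page issues one request instead of sequencing several dependent fetches, and a single cache key can invalidate the whole bundle.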

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • ding113
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 39.34% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the core changes (performance optimization, system settings caching, the log polling mechanism) and closely matches the file changes in raw_summary.
Description check ✅ Passed The PR description is relevant to the code changes, detailing the multiple optimizations that reduce read-path load on common pages (system settings caching, usage-log polling, trigger optimization, consolidated provider management requests, etc.).


@gemini-code-assist bot (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to significantly reduce database read pressure on common pages and the proxy hot path by introducing a system settings cache and optimizing the usage logs page's auto-refresh mechanism, improving overall performance and responsiveness. The improvements are designed to be transparent to users while laying groundwork for future performance work.

Highlights

  • System settings cache: introduced getCachedSystemSettings(), providing a 60-second in-process TTL cache with concurrency deduplication and cross-instance invalidation via Redis Pub/Sub, automatically degrading to pure TTL caching without Redis.
  • Usage logs auto-refresh optimization: the usage logs page now polls only the latest page of data and merges and dedupes it into the existing list, avoiding a full refetch of all pages in the infiniteQuery scenario and significantly reducing DB QPS and CPU load.
  • Unified system settings read path: frequent system_settings reads on common pages (dashboard, usage logs, leaderboard, provider management) and the proxy hot path now go through the new cache function, reducing duplicate database queries.
  • Performance documentation: added docs/performance/common-pages-optimization-plan.md, laying out the optimization roadmap for common pages and the write path, including goals, constraints, current-state analysis, shipped optimizations, and a layered optimization strategy.
  • Timezone resolution refactor: split system timezone resolution into a client-safe version (timezone.ts) and a server-only version (timezone.server.ts), updating references to better support the Edge Runtime.
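The server-side fallback chain described above (DB setting → env TZ → UTC) can be sketched as follows; isValidIANATimezone here is an assumed Intl-based validator, not necessarily the project's implementation:

```typescript
// Validate a candidate IANA timezone by asking Intl to construct a formatter
// with it; invalid identifiers throw a RangeError.
function isValidIANATimezone(tz: string | undefined): tz is string {
  if (!tz) return false;
  try {
    new Intl.DateTimeFormat("en-US", { timeZone: tz });
    return true;
  } catch {
    return false;
  }
}

// Resolve the effective timezone: DB-stored setting first, then the TZ
// environment variable, then "UTC" as the final fallback.
function resolveTimezone(dbTimezone: string | undefined, envTz: string | undefined): string {
  if (isValidIANATimezone(dbTimezone)) return dbTimezone;
  if (isValidIANATimezone(envTz)) return envTz;
  return "UTC";
}
```

Validating each candidate before accepting it means a corrupted DB value or a malformed TZ variable degrades gracefully to the next link in the chain instead of crashing date formatting.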
Changelog
  • docs/performance/common-pages-optimization-plan.md
    • Added the performance optimization roadmap document, detailing the optimization strategy for common pages and the write path.
  • src/actions/dashboard-realtime.ts
    • Updated getDashboardRealtimeData to read system settings via getCachedSystemSettings instead of getSystemSettings.
  • src/actions/key-quota.ts
    • Updated getKeyQuotaUsage to read system settings via getCachedSystemSettings instead of getSystemSettings.
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/actions/keys.ts
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/actions/my-usage.ts
    • Updated getMyUsageMetadata, getMyTodayStats, getMyUsageLogs, and getMyStatsSummary to read system settings via getCachedSystemSettings instead of getSystemSettings.
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/actions/notifications.ts
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/actions/overview.ts
    • Updated getOverviewData to read system settings via getCachedSystemSettings instead of getSystemSettings.
  • src/actions/provider-slots.ts
    • Updated getProviderSlots to read system settings via getCachedSystemSettings instead of getSystemSettings.
  • src/actions/statistics.ts
    • Updated getUserStatistics to read system settings via getCachedSystemSettings instead of getSystemSettings.
  • src/actions/system-config.ts
    • Updated saveSystemSettings to publish cache invalidation via publishSystemSettingsCacheInvalidation instead of invalidateSystemSettingsCache.
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/actions/users.ts
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/actions/webhook-targets.ts
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/app/[locale]/dashboard/_components/dashboard-bento-sections.tsx
    • Updated the getCachedSystemSettings import from repository/system-config to lib/config.
  • src/app/[locale]/dashboard/_components/dashboard-sections.tsx
    • Updated the getCachedSystemSettings import from repository/system-config to lib/config.
  • src/app/[locale]/dashboard/_components/user/forms/limit-rule-picker.tsx
    • Removed availableTypes from the useEffect dependency array.
  • src/app/[locale]/dashboard/leaderboard/page.tsx
    • Updated LeaderboardPage to read system settings via getCachedSystemSettings instead of getSystemSettings.
  • src/app/[locale]/dashboard/logs/_components/usage-logs-sections.tsx
    • Updated the getCachedSystemSettings import from repository/system-config to lib/config.
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/app/[locale]/dashboard/logs/_components/virtualized-logs-table.test.tsx
    • Added a mock implementation for useQueryClient from @tanstack/react-query.
  • src/app/[locale]/dashboard/logs/_components/virtualized-logs-table.tsx
    • Introduced useQueryClient and the InfiniteData type.
    • Defined MAX_PAGES, MAX_TOTAL_LOGS, and the Cursor, CursorParam, UsageLogsBatchData, and UsageLogWithCursor types.
    • Added the chunkArray and resolveCursorFromLog helpers.
    • Changed refetchInterval on useInfiniteQuery to false.
    • Implemented a useEffect hook that polls the latest page of usage logs and performs merging, deduplication, and fixed-window management to optimize auto-refresh performance.
  • src/app/[locale]/dashboard/logs/_hooks/use-lazy-filter-options.ts
    • Removed fetcher from the useCallback dependency array.
  • src/app/[locale]/dashboard/my-quota/page.tsx
    • Updated MyQuotaPage to read system settings via getCachedSystemSettings instead of getSystemSettings.
  • src/app/[locale]/dashboard/quotas/providers/page.tsx
    • Updated ProvidersQuotaContent to read system settings via getCachedSystemSettings instead of getSystemSettings.
  • src/app/[locale]/dashboard/quotas/users/page.tsx
    • Updated UsersQuotaContent to read system settings via getCachedSystemSettings instead of getSystemSettings.
  • src/app/[locale]/layout.tsx
    • Updated generateMetadata and RootLayout to read system settings via getCachedSystemSettings instead of getSystemSettings.
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/app/[locale]/settings/config/page.tsx
    • Updated SettingsConfigContent to read system settings via getCachedSystemSettings instead of getSystemSettings.
  • src/app/api/leaderboard/route.ts
    • Updated the GET handler to read system settings via getCachedSystemSettings instead of getSystemSettings.
  • src/app/api/system-settings/route.ts
    • Updated the GET handler to read system settings via getCachedSystemSettings instead of getSystemSettings.
  • src/app/v1/_lib/proxy/provider-selector.ts
    • Removed the local SETTINGS_CACHE_TTL_MS and cachedVerboseProviderError variables.
    • Updated getVerboseProviderErrorCached to read system settings via getCachedSystemSettings instead of getSystemSettings.
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/app/v1/_lib/proxy/response-handler.ts
    • Updated updateRequestCostFromUsage to read system settings via getCachedSystemSettings instead of getSystemSettings.
  • src/app/v1/_lib/proxy/session.ts
    • Updated the billingModelSourcePromise implementation, importing getCachedSystemSettings from repository/system-config.
  • src/app/v1/_lib/proxy/version-guard.ts
    • Updated the ensure function to read system settings via getCachedSystemSettings instead of getSystemSettings.
  • src/i18n/request.ts
    • Added resolveEnvTimezone and resolveRequestTimezone for more robust timezone resolution with Edge Runtime support.
    • Updated getRequestConfig to use the new resolveRequestTimezone.
  • src/lib/auth.ts
    • Moved the AUTH_COOKIE_NAME definition to src/lib/auth/constants.ts and imported it from there.
  • src/lib/auth/constants.ts
    • Added the AUTH_COOKIE_NAME constant as an Edge-safe auth constant.
  • src/lib/config/index.ts
    • Exported the publishSystemSettingsCacheInvalidation function.
  • src/lib/config/system-settings-cache.ts
    • Added the server-only marker.
    • Introduced the CHANNEL_SYSTEM_SETTINGS_UPDATED constant.
    • Added the inFlightFetch variable for concurrency deduplication and cacheGeneration to prevent stale data from overwriting fresh data.
    • Added the ensureSubscription function, initializing a Redis Pub/Sub subscription for cross-instance cache invalidation.
    • Refactored getCachedSystemSettings with concurrency deduplication, cache generation management, and a more complete error fallback mechanism.
    • Updated invalidateSystemSettingsCache to increment cacheGeneration and reset inFlightFetch.
    • Added publishSystemSettingsCacheInvalidation for publishing cross-instance cache invalidation notifications.
  • src/lib/notification/notification-queue.ts
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/lib/notification/tasks/cache-hit-rate-alert.ts
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/lib/notification/tasks/daily-leaderboard.ts
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/lib/rate-limit/time-utils.ts
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/lib/redis/leaderboard-cache.ts
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/lib/security/constant-time-compare.ts
    • Renamed the dummy variable to _dummy to avoid a lint warning.
  • src/lib/utils/timezone.server.ts
    • Added a server-only timezone resolution utility file containing resolveSystemTimezone.
  • src/lib/utils/timezone.ts
    • Removed the server-side timezone resolution logic, making this a client-safe utility file.
  • src/proxy.ts
    • Updated the AUTH_COOKIE_NAME import path from lib/auth to lib/auth/constants.
  • src/repository/leaderboard.ts
    • Updated findProviderCacheHitRateLeaderboardWithTimezone and findModelLeaderboardWithTimezone to read system settings via getCachedSystemSettings instead of getSystemSettings.
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/repository/notification-bindings.ts
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/repository/overview.ts
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/repository/provider.ts
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • src/repository/statistics.ts
    • Changed the resolveSystemTimezone import path from lib/utils/timezone to lib/utils/timezone.server.
  • tests/integration/billing-model-source.test.ts
    • Added invalidateSystemSettingsCache calls in beforeEach, runScenario, and runPriceMissingScenario to ensure test isolation.
  • tests/unit/actions/my-usage-date-range-dst.test.ts
    • Updated the timezone import path in vi.mock to lib/utils/timezone.server.
  • tests/unit/actions/my-usage-token-aggregation.test.ts
    • Updated the lib/config mock in vi.mock to include a mock of getCachedSystemSettings.
  • tests/unit/actions/system-config-save.test.ts
    • Renamed invalidateSystemSettingsCacheMock to publishSystemSettingsCacheInvalidationMock and updated related references.
    • Updated the lib/config mock in vi.mock to reflect the use of publishSystemSettingsCacheInvalidation.
    • Updated the timezone import path in vi.mock to lib/utils/timezone.server.
  • tests/unit/lib/rate-limit/cost-limits.test.ts
    • Updated the timezone import path in vi.mock to lib/utils/timezone.server.
  • tests/unit/lib/rate-limit/lease.test.ts
    • Updated the timezone import path in vi.mock to lib/utils/timezone.server.
  • tests/unit/lib/rate-limit/rolling-window-5h.test.ts
    • Updated the timezone import path in vi.mock to lib/utils/timezone.server.
  • tests/unit/lib/rate-limit/rolling-window-cache-warm.test.ts
    • Updated the timezone import path in vi.mock to lib/utils/timezone.server.
  • tests/unit/lib/rate-limit/service-extra.test.ts
    • Updated the timezone import path in vi.mock to lib/utils/timezone.server.
  • tests/unit/lib/rate-limit/time-utils.test.ts
    • Updated the timezone import path in vi.mock to lib/utils/timezone.server.
  • tests/unit/lib/timezone/timezone-resolver.test.ts
    • Updated the timezone import path in vi.mock to lib/utils/timezone.server.
  • tests/unit/notification/cost-alert-window.test.ts
    • Updated the timezone import path in vi.mock to lib/utils/timezone.server.
  • tests/unit/proxy/session.test.ts
    • Updated the repository/system-config mock in vi.mock to getCachedSystemSettings from lib/config.
    • Updated references to getSystemSettings in test cases to getCachedSystemSettings.
  • tests/unit/repository/leaderboard-provider-metrics.test.ts
    • Updated mocks from getSystemSettings to getCachedSystemSettings.
    • Updated the repository/system-config mock in vi.mock to getCachedSystemSettings from lib/config.
    • Updated the timezone import path in vi.mock to lib/utils/timezone.server.
  • tests/unit/repository/leaderboard-timezone-parentheses.test.ts
    • Updated mocks from getSystemSettings to getCachedSystemSettings.
    • Updated the repository/system-config mock in vi.mock to getCachedSystemSettings from lib/config.
    • Updated the timezone import path in vi.mock to lib/utils/timezone.server.
  • tests/unit/repository/overview-timezone-parentheses.test.ts
    • Updated the timezone import path in vi.mock to lib/utils/timezone.server.

@github-actions github-actions bot added enhancement New feature or request area:core area:UI size/XL Extra Large PR (> 1000 lines) labels Mar 1, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The performance optimizations in this PR are excellent. The system_settings cache implementation is well thought out, covering in-process caching, in-flight request deduplication, and cross-instance Redis Pub/Sub invalidation, which will significantly reduce database load. The polling optimization for the usage logs page is also well targeted: fetching only the latest page and merging it effectively avoids infiniteQuery's full-refetch problem.

On the refactoring side, uniformly replacing getSystemSettings with the new cached getCachedSystemSettings, and moving timezone resolution into a server-only module, are both solid improvements to code quality and performance.

I found a few minor issues, mainly in the documentation and one useEffect dependency; see my comments for details. Overall this is a high-quality performance optimization PR.

@@ -0,0 +1,164 @@
# Performance optimization roadmap for common pages and the write path (transparent upgrades first)

Last updated: 2026-03-01


medium

The last-updated date is set to the future date 2026-03-01, which is likely a typo. Consider changing it to the current date or a past date to avoid confusion.

Suggested change
Last updated: 2026-03-01
Last updated: 2024-03-01

inFlightRef.current = promise;
return promise;
}, [fetcher, isLoaded]);
}, [isLoaded]);


medium

Removing fetcher from the useCallback dependency array violates the react-hooks/exhaustive-deps rule. It may not cause an immediate problem in the current usage (fetcher is a stable function reference), but it makes the hook less robust: if an unstable fetcher is passed in the future, it could lead to unexpected behavior or stale-closure problems. Consider adding fetcher back to the dependency array to follow best practice and improve robustness.

Suggested change
}, [isLoaded]);
}, [fetcher, isLoaded]);


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/actions/statistics.ts (1)

48-56: ⚠️ Potential issue | 🟠 Major

Permission decisions should not depend on expirable cached configuration.

This puts allowGlobalUsageView (which determines data visibility scope) on the cached read path; if the permission is revoked while the Redis invalidation broadcast is unreachable, mixed/global views may keep being served within the TTL window, briefly allowing over-privileged reads.

Suggested fix (strongly consistent read for authorization fields)
 import { getSession } from "@/lib/auth";
 import { getCachedSystemSettings } from "@/lib/config";
+import { getSystemSettings } from "@/repository/system-config";
 ...
-    const settings = await getCachedSystemSettings();
+    // Authorization-related fields should be read with strong consistency to avoid a revocation window
+    const settings = await getSystemSettings();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/statistics.ts` around lines 48 - 56, The permission decision
currently uses the cached settings from getCachedSystemSettings() (checking
settings.allowGlobalUsageView) to compute mode, which can allow short-term
over-privileged access if the cache is stale; change the flow so the
authorization branch for non-admin users reads allowGlobalUsageView via a
strong-consistent/non-cached read (e.g. call the non-cached getter like
getSystemSettings() or a dedicated getAllowGlobalUsageView() that bypasses
Redis) before constructing mode, keep session.user.role check as is, and use
that fresh boolean in place of settings.allowGlobalUsageView when computing the
mode variable.
src/lib/utils/timezone.ts (1)

132-141: ⚠️ Potential issue | 🟡 Minor

Remove the stale JSDoc remnant to avoid misleading documentation.

Lines 132-141 still describe the system timezone resolution chain, but that responsibility has moved to timezone.server.ts and this file no longer has a corresponding exported function.

✂️ Suggested fix
-/**
- * Resolves the system timezone using the fallback chain:
- *   1. DB system_settings.timezone (via cached settings)
- *   2. env TZ variable
- *   3. "UTC" as final fallback
- *
- * Each candidate is validated via isValidIANATimezone before being accepted.
- *
- * `@returns` Resolved IANA timezone identifier (always valid)
- */
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/utils/timezone.ts` around lines 132 - 141, Remove the stale JSDoc
block that describes a timezone-resolution chain from src/lib/utils/timezone.ts
(the comment that mentions DB system_settings.timezone, env TZ, and UTC) because
that logic was moved to timezone.server.ts; either delete the obsolete comment
or replace it with a short note pointing readers to timezone.server.ts for the
actual implementation, ensuring the file no longer claims to export or perform
the resolution it doesn't implement.
🧹 Nitpick comments (5)
src/app/[locale]/dashboard/logs/_hooks/use-lazy-filter-options.ts (1)

50-80: The dependency array change is correct, but consider adding a comment explaining why.

fetcher comes from the outer factory function's closure and keeps a stable reference for the hook's lifetime, so removing it from the dependency array is a safe and correct optimization.

However, this may trigger a react-hooks/exhaustive-deps lint warning, and future maintainers may not understand why fetcher is missing from the dependency list. Consider adding a comment documenting the intent.

♻️ Suggested comment
     }, [isLoaded]);
+    // Note: fetcher is intentionally omitted — it's captured from the factory closure
+    // and remains stable throughout the hook's lifecycle.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/`[locale]/dashboard/logs/_hooks/use-lazy-filter-options.ts around
lines 50 - 80, Add a brief inline comment above the useCallback definition for
load explaining why fetcher is intentionally omitted from the dependency array:
state that fetcher comes from the outer factory and is a stable closure for the
hook's lifetime, that leaving it out avoids needless re-creation, and note that
this suppresses/reacts to potential react-hooks/exhaustive-deps warnings for
future maintainers; place the comment near the useCallback(() => { ... },
[isLoaded]) declaration referencing load and fetcher so it's easy to find.
src/app/[locale]/dashboard/logs/_components/virtualized-logs-table.tsx (3)

246-246: Redundant entry in the dependency array.

queryKey is a memoized value derived from filters (line 110), so listing both filters and queryKey in the dependency array is redundant. It causes no bug, but the array can be simplified to keep only queryKey, which already carries the semantics of filters.

♻️ Suggested improvement
-  }, [autoRefreshIntervalMs, filters, queryClient, queryKey, shouldPoll]);
+  }, [autoRefreshIntervalMs, queryClient, queryKey, shouldPoll]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/`[locale]/dashboard/logs/_components/virtualized-logs-table.tsx at
line 246, including both the memoized queryKey and its source filters in the useEffect
dependency array is redundant; simplify the array from [autoRefreshIntervalMs, filters,
queryClient, queryKey, shouldPoll] by removing filters and keeping only queryKey (defined
at the memoized expression), preserving the same semantics while avoiding the duplicate
dependency (use the queryKey and filters definitions to locate the change).

43-43: The createdAtRaw property is never used.

The UsageLogWithCursor type defines an optional createdAtRaw property, but the code never sets it; resolveCursorFromLog falls straight back to createdAt.toISOString(). If it is reserved for a future feature, add a comment saying so; otherwise remove it to simplify the type definition.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/`[locale]/dashboard/logs/_components/virtualized-logs-table.tsx at
line 43, The optional createdAtRaw field on the UsageLogWithCursor type is never
set or used by resolveCursorFromLog; either remove createdAtRaw from the type to
simplify the model, or explicitly populate it where logs are mapped (e.g., when
building UsageLogsBatchData items) and update resolveCursorFromLog to prefer
log.createdAtRaw before falling back to createdAt.toISOString(); choose one and
add a short comment explaining the intent if you keep it.

232-236: The empty catch block may hamper troubleshooting.

The current implementation silently swallows polling errors. The comment explains the rationale (manual refresh remains available), but complete silence in both development and production can make problems hard to diagnose.

Consider at least logging in development:

♻️ Suggested improvement
       } catch {
-        // Ignore polling errors (manual refresh still available).
+        // Ignore polling errors in production (manual refresh still available).
+        // Log in development for debugging purposes.
+        if (process.env.NODE_ENV === "development") {
+          console.debug("[VirtualizedLogsTable] Polling error ignored");
+        }
       } finally {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/`[locale]/dashboard/logs/_components/virtualized-logs-table.tsx
around lines 232 - 236, The empty catch in the polling logic of
virtualized-logs-table.tsx silently swallows errors; change the catch to capture
the error (e) and log it in non-production environments (e.g., if
process.env.NODE_ENV !== 'production' or using your existing logger) while
keeping the current behavior for production, making sure the finally block still
sets refreshInFlightRef.current = false; locate the try/catch around the
polling/refresh function where refreshInFlightRef is referenced to implement
this.
src/lib/config/system-settings-cache.ts (1)

78-116: Add a retry cooldown after subscription initialization failure to avoid repeated attempts on the hot path.

Currently, when Redis is unreachable and the feature's environment variable is enabled, ensureSubscription() may trigger an initialization attempt on every read. Consider adding a minimum retry interval to reduce the extra overhead under high concurrency.

♻️ Suggested change
 let subscriptionInitialized = false;
 let subscriptionInitPromise: Promise<void> | null = null;
+let lastSubscriptionAttemptAt = 0;
+const SUBSCRIPTION_RETRY_COOLDOWN_MS = 30 * 1000;

 async function ensureSubscription(): Promise<void> {
   if (subscriptionInitialized) return;
   if (subscriptionInitPromise) return subscriptionInitPromise;
+  const now = Date.now();
+  if (now - lastSubscriptionAttemptAt < SUBSCRIPTION_RETRY_COOLDOWN_MS) return;
+  lastSubscriptionAttemptAt = now;

   subscriptionInitPromise = (async () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/config/system-settings-cache.ts` around lines 78 - 116,
ensureSubscription may be retried on every call when subscription fails; add a
cooldown to avoid hot-path repeated attempts by recording a
lastSubscriptionAttempt timestamp and a MIN_RETRY_INTERVAL constant (e.g. 30s).
In the ensureSubscription function, before starting a new
subscriptionInitPromise check the last attempt time and if now -
lastSubscriptionAttempt < MIN_RETRY_INTERVAL return early (no-op or resolved
promise) so callers don't trigger new subscribeCacheInvalidation calls; on each
actual attempt update lastSubscriptionAttempt, and on failure keep
subscriptionInitialized false but ensure subscriptionInitPromise is cleared so
the next allowed retry after the cooldown can run. Reference symbols:
ensureSubscription, subscriptionInitPromise, subscriptionInitialized,
subscribeCacheInvalidation, lastSubscriptionAttempt, MIN_RETRY_INTERVAL.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/actions/statistics.ts`:
- Around line 48-56: The permission decision currently uses the cached settings
from getCachedSystemSettings() (checking settings.allowGlobalUsageView) to
compute mode, which can allow short-term over-privileged access if the cache is
stale; change the flow so the authorization branch for non-admin users reads
allowGlobalUsageView via a strong-consistent/non-cached read (e.g. call the
non-cached getter like getSystemSettings() or a dedicated
getAllowGlobalUsageView() that bypasses Redis) before constructing mode, keep
session.user.role check as is, and use that fresh boolean in place of
settings.allowGlobalUsageView when computing the mode variable.

In `@src/lib/utils/timezone.ts`:
- Around line 132-141: Remove the stale JSDoc block that describes a
timezone-resolution chain from src/lib/utils/timezone.ts (the comment that
mentions DB system_settings.timezone, env TZ, and UTC) because that logic was
moved to timezone.server.ts; either delete the obsolete comment or replace it
with a short note pointing readers to timezone.server.ts for the actual
implementation, ensuring the file no longer claims to export or perform the
resolution it doesn't implement.

---

Nitpick comments:
In `@src/app/`[locale]/dashboard/logs/_components/virtualized-logs-table.tsx:
- Line 246: Including both the memoized queryKey and its source filters in the
useEffect dependency array is redundant; simplify the array from
[autoRefreshIntervalMs, filters, queryClient, queryKey, shouldPoll] by removing
filters and keeping only queryKey (queryKey is defined at the memoized
expression), which preserves the same semantics while avoiding the duplicate
dependency (refer to the definitions of queryKey and filters to locate the change).
- Line 43: The optional createdAtRaw field on the UsageLogWithCursor type is
never set or used by resolveCursorFromLog; either remove createdAtRaw from the
type to simplify the model, or explicitly populate it where logs are mapped
(e.g., when building UsageLogsBatchData items) and update resolveCursorFromLog
to prefer log.createdAtRaw before falling back to createdAt.toISOString();
choose one and add a short comment explaining the intent if you keep it.
- Around line 232-236: The empty catch in the polling logic of
virtualized-logs-table.tsx silently swallows errors; change the catch to capture
the error (e) and log it in non-production environments (e.g., if
process.env.NODE_ENV !== 'production' or using your existing logger) while
keeping the current behavior for production, making sure the finally block still
sets refreshInFlightRef.current = false; locate the try/catch around the
polling/refresh function where refreshInFlightRef is referenced to implement
this.

In `@src/app/`[locale]/dashboard/logs/_hooks/use-lazy-filter-options.ts:
- Around line 50-80: Add a brief inline comment above the useCallback definition
for load explaining why fetcher is intentionally omitted from the dependency
array: state that fetcher comes from the outer factory and is a stable closure
for the hook's lifetime, that leaving it out avoids needless re-creation, and
note that this suppresses/reacts to potential react-hooks/exhaustive-deps
warnings for future maintainers; place the comment near the useCallback(() => {
... }, [isLoaded]) declaration referencing load and fetcher so it's easy to
find.

In `@src/lib/config/system-settings-cache.ts`:
- Around line 78-116: ensureSubscription may be retried on every call when
subscription fails; add a cooldown to avoid hot-path repeated attempts by
recording a lastSubscriptionAttempt timestamp and a MIN_RETRY_INTERVAL constant
(e.g. 30s). In the ensureSubscription function, before starting a new
subscriptionInitPromise check the last attempt time and if now -
lastSubscriptionAttempt < MIN_RETRY_INTERVAL return early (no-op or resolved
promise) so callers don't trigger new subscribeCacheInvalidation calls; on each
actual attempt update lastSubscriptionAttempt, and on failure keep
subscriptionInitialized false but ensure subscriptionInitPromise is cleared so
the next allowed retry after the cooldown can run. Reference symbols:
ensureSubscription, subscriptionInitPromise, subscriptionInitialized,
subscribeCacheInvalidation, lastSubscriptionAttempt, MIN_RETRY_INTERVAL.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

📥 Commits

Reviewing files that changed from the base of the PR and between 2769a75 and 5fe92d1.

📒 Files selected for processing (66)
  • docs/performance/common-pages-optimization-plan.md
  • src/actions/dashboard-realtime.ts
  • src/actions/key-quota.ts
  • src/actions/keys.ts
  • src/actions/my-usage.ts
  • src/actions/notifications.ts
  • src/actions/overview.ts
  • src/actions/provider-slots.ts
  • src/actions/statistics.ts
  • src/actions/system-config.ts
  • src/actions/users.ts
  • src/actions/webhook-targets.ts
  • src/app/[locale]/dashboard/_components/dashboard-bento-sections.tsx
  • src/app/[locale]/dashboard/_components/dashboard-sections.tsx
  • src/app/[locale]/dashboard/_components/user/forms/limit-rule-picker.tsx
  • src/app/[locale]/dashboard/leaderboard/page.tsx
  • src/app/[locale]/dashboard/logs/_components/usage-logs-sections.tsx
  • src/app/[locale]/dashboard/logs/_components/virtualized-logs-table.test.tsx
  • src/app/[locale]/dashboard/logs/_components/virtualized-logs-table.tsx
  • src/app/[locale]/dashboard/logs/_hooks/use-lazy-filter-options.ts
  • src/app/[locale]/dashboard/my-quota/page.tsx
  • src/app/[locale]/dashboard/quotas/providers/page.tsx
  • src/app/[locale]/dashboard/quotas/users/page.tsx
  • src/app/[locale]/layout.tsx
  • src/app/[locale]/settings/config/page.tsx
  • src/app/api/leaderboard/route.ts
  • src/app/api/system-settings/route.ts
  • src/app/v1/_lib/proxy/provider-selector.ts
  • src/app/v1/_lib/proxy/response-handler.ts
  • src/app/v1/_lib/proxy/session.ts
  • src/app/v1/_lib/proxy/version-guard.ts
  • src/i18n/request.ts
  • src/lib/auth.ts
  • src/lib/auth/constants.ts
  • src/lib/config/index.ts
  • src/lib/config/system-settings-cache.ts
  • src/lib/notification/notification-queue.ts
  • src/lib/notification/tasks/cache-hit-rate-alert.ts
  • src/lib/notification/tasks/daily-leaderboard.ts
  • src/lib/rate-limit/time-utils.ts
  • src/lib/redis/leaderboard-cache.ts
  • src/lib/security/constant-time-compare.ts
  • src/lib/utils/timezone.server.ts
  • src/lib/utils/timezone.ts
  • src/proxy.ts
  • src/repository/leaderboard.ts
  • src/repository/notification-bindings.ts
  • src/repository/overview.ts
  • src/repository/provider.ts
  • src/repository/statistics.ts
  • tests/integration/billing-model-source.test.ts
  • tests/unit/actions/my-usage-date-range-dst.test.ts
  • tests/unit/actions/my-usage-token-aggregation.test.ts
  • tests/unit/actions/system-config-save.test.ts
  • tests/unit/lib/rate-limit/cost-limits.test.ts
  • tests/unit/lib/rate-limit/lease.test.ts
  • tests/unit/lib/rate-limit/rolling-window-5h.test.ts
  • tests/unit/lib/rate-limit/rolling-window-cache-warm.test.ts
  • tests/unit/lib/rate-limit/service-extra.test.ts
  • tests/unit/lib/rate-limit/time-utils.test.ts
  • tests/unit/lib/timezone/timezone-resolver.test.ts
  • tests/unit/notification/cost-alert-window.test.ts
  • tests/unit/proxy/session.test.ts
  • tests/unit/repository/leaderboard-provider-metrics.test.ts
  • tests/unit/repository/leaderboard-timezone-parentheses.test.ts
  • tests/unit/repository/overview-timezone-parentheses.test.ts

Copy link
Contributor

@github-actions github-actions bot left a comment


[HIGH] [ERROR-NO-USER-FEEDBACK] Missing error handling for Redis publish failure

Why this is a problem: The publishSystemSettingsCacheInvalidation function calls await publishCacheInvalidation() without try/catch. If Redis publish fails (e.g., Redis connection issue, network timeout), the error will propagate up and cause the saveSystemSettings action to fail - blocking users from saving system settings.

This defeats the fail-open design principle. The cross-instance invalidation is a "nice-to-have" optimization, but failing the entire save operation is worse than having other instances not receive the invalidation immediately.

Suggested fix:

export async function publishSystemSettingsCacheInvalidation(): Promise<void> {
  invalidateSystemSettingsCache();

  const redisUrl = process.env.REDIS_URL?.trim();
  const rateLimitRaw = process.env.ENABLE_RATE_LIMIT?.trim();
  const isRateLimitEnabled = rateLimitRaw !== "false" && rateLimitRaw !== "0";

  if (!redisUrl || !isRateLimitEnabled) return;

  try {
    await publishCacheInvalidation(CHANNEL_SYSTEM_SETTINGS_UPDATED);
    logger.debug("[SystemSettingsCache] Published cache invalidation");
  } catch (error) {
    // Fail-open: don't block settings save if pub/sub fails
    logger.warn("[SystemSettingsCache] Failed to publish cache invalidation", { error });
  }
}

Copy link
Contributor

@github-actions github-actions bot left a comment


[MEDIUM] [TEST-INCOMPLETE] Missing tests for concurrent fetch deduplication and cache generation

Why this is a problem: The new getCachedSystemSettings() function introduces complex logic for:

  • inFlightFetch - deduplication of concurrent cache misses
  • cacheGeneration - preventing stale in-flight fetches from overwriting newer cache

These are critical code paths that prevent thundering herd and cache stampede issues. The existing test file (tests/unit/lib/config/system-settings-cache.test.ts) does not cover these scenarios.

Suggested fix: Add tests for:

  1. Concurrent requests should share the same in-flight fetch (deduplication)
  2. Cache invalidation during in-flight fetch should not overwrite with stale data
  3. Multiple rapid invalidations should be handled correctly

Copy link
Contributor

@github-actions github-actions bot left a comment


Code Review Summary

This PR introduces system settings caching with Redis Pub/Sub for cross-instance invalidation to reduce database load on common pages (dashboard, usage logs, leaderboard, provider management). The changes include a new in-memory cache with TTL, concurrent fetch deduplication, and cache generation tracking.

PR Size: XL

  • Lines changed: 910 (674 additions, 236 deletions)
  • Files changed: 66
  • This is a large PR that touches multiple layers (actions, API routes, proxy pipeline, UI components). Consider splitting into smaller PRs for future changes.

Issues Found

Category Critical High Medium Low
Logic/Bugs 0 0 0 0
Security 0 0 0 0
Error Handling 0 1 0 0
Types 0 0 0 0
Comments/Docs 0 0 0 0
Tests 0 0 1 0
Simplification 0 0 0 0

High Priority Issues (Should Fix)

  1. [ERROR-NO-USER-FEEDBACK] publishSystemSettingsCacheInvalidation lacks error handling for Redis publish - can block system settings save on Redis failure (line 1438 in diff)

Medium Priority Issues (Consider Fixing)

  1. [TEST-INCOMPLETE] Concurrent fetch deduplication (inFlightFetch, cacheGeneration) not covered by tests - critical for preventing cache stampede

Review Coverage

  • Logic and correctness - Clean
  • Security (OWASP Top 10) - Clean
  • Error handling - 1 issue found
  • Type safety - Clean
  • Documentation accuracy - Clean
  • Test coverage - 1 gap identified
  • Code clarity - Good

Automated review by Claude AI

@tesgth032
Copy link
Contributor Author

A note on the two suggestions from @gemini-code-assist:

  1. The update date 2026-03-01 in docs/performance/common-pages-optimization-plan.md is not a typo; this PR was written on 2026-03-01 (today), so it stays as-is.

  2. useCallback dependencies: fetcher in createLazyFilterHook(fetcher) is an outer closure parameter that is a stable constant for the hook's lifetime (it does not change across renders). Biome's useExhaustiveDependencies would also flag adding it to deps as a redundant dependency. So load keeps [isLoaded] as its dependency array, which avoids introducing a new lint warning without creating a stale-closure problem.

@gemini-code-assist
Copy link
Contributor

Thanks for the detailed explanation!

  1. Regarding the date in docs/performance/common-pages-optimization-plan.md, understood. If 2026-03-01 is the actual date the PR was created, keeping it unchanged is entirely reasonable.
  2. Regarding the handling of fetcher in the useCallback dependencies, understood. Given fetcher's stability as an outer closure parameter and Biome's useExhaustiveDependencies rule, your current implementation is correct and avoids both an unnecessary lint warning and a potential stale-closure problem. Thanks for the clarification!

Copy link

@greptile-apps greptile-apps bot left a comment


85 files reviewed, 13 comments


@greptile-apps
Copy link

greptile-apps bot commented Mar 1, 2026

Additional Comments (2)

src/lib/log-cleanup/cleanup-queue.ts
consider using getCachedSystemSettings for consistency with other system settings reads

import { getCachedSystemSettings } from "@/lib/config";
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/log-cleanup/cleanup-queue.ts
Line: 7

Comment:
consider using `getCachedSystemSettings` for consistency with other system settings reads

```suggestion
import { getCachedSystemSettings } from "@/lib/config";
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


src/repository/cache-hit-rate-alert.ts
consider using getCachedSystemSettings for consistency

import { getCachedSystemSettings } from "@/lib/config/system-settings-cache";
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/repository/cache-hit-rate-alert.ts
Line: 8

Comment:
consider using `getCachedSystemSettings` for consistency

```suggestion
import { getCachedSystemSettings } from "@/lib/config/system-settings-cache";
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@coderabbitai coderabbitai bot requested a review from ding113 March 1, 2026 07:30
Copy link

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
src/actions/providers.ts (2)

363-400: The bootstrap function is reasonably implemented, but forceRefresh: true may undermine caching.

getProviderManagerBootstrapData effectively merges multiple requests and reduces waterfall queries. However, line 379 uses forceRefresh: true, which forces a refresh of the Redis cache and may increase Redis load on every page load.

Please confirm: is a forced refresh needed in the bootstrap scenario? If the frontend already has a polling mechanism (e.g., health-status monitoring), consider switching to forceRefresh: false to take advantage of the Redis cache.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/providers.ts` around lines 363 - 400, The bootstrap function
getProviderManagerBootstrapData currently calls
getAllHealthStatusAsync(providerIds, { forceRefresh: true }), which forces a
Redis refresh on every page load; change this call to avoid forcing refresh by
default (e.g., set forceRefresh: false or make it configurable) so the cached
health status is used during normal bootstrap; locate the call to
getAllHealthStatusAsync and update the options or add a conditional (based on a
new parameter/env or a "isBootstrapRefresh" flag) to only set forceRefresh: true
when an explicit fresh read is required.

383-393: Object.entries + forEach can be simplified to Object.fromEntries.

The current implementation is correct, but it can be refactored in a more functional style for readability:

Optional refactor suggestion
-  const healthStatus: ProviderHealthStatusMap = {};
-  Object.entries(healthStatusRaw).forEach(([providerId, health]) => {
-    healthStatus[Number(providerId)] = {
-      circuitState: health.circuitState,
-      failureCount: health.failureCount,
-      lastFailureTime: health.lastFailureTime,
-      circuitOpenUntil: health.circuitOpenUntil,
-      recoveryMinutes: health.circuitOpenUntil
-        ? Math.ceil((health.circuitOpenUntil - now) / 60000)
-        : null,
-    };
-  });
+  const healthStatus: ProviderHealthStatusMap = Object.fromEntries(
+    Object.entries(healthStatusRaw).map(([providerId, health]) => [
+      Number(providerId),
+      {
+        circuitState: health.circuitState,
+        failureCount: health.failureCount,
+        lastFailureTime: health.lastFailureTime,
+        circuitOpenUntil: health.circuitOpenUntil,
+        recoveryMinutes: health.circuitOpenUntil
+          ? Math.ceil((health.circuitOpenUntil - now) / 60000)
+          : null,
+      },
+    ])
+  );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/providers.ts` around lines 383 - 393, Replace the imperative
Object.entries(...).forEach block that mutates healthStatus with a functional
Object.fromEntries transformation: map over Object.entries(healthStatusRaw) to
produce [Number(providerId), { circuitState, failureCount, lastFailureTime,
circuitOpenUntil, recoveryMinutes }] pairs (compute recoveryMinutes as the
current code does using now and circuitOpenUntil), then assign the resulting
object to healthStatus; reference healthStatusRaw, healthStatus, now, and
circuitOpenUntil to locate the relevant variables and preserve the exact
property names and calculations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/actions/providers.ts`:
- Around line 363-400: The bootstrap function getProviderManagerBootstrapData
currently calls getAllHealthStatusAsync(providerIds, { forceRefresh: true }),
which forces a Redis refresh on every page load; change this call to avoid
forcing refresh by default (e.g., set forceRefresh: false or make it
configurable) so the cached health status is used during normal bootstrap;
locate the call to getAllHealthStatusAsync and update the options or add a
conditional (based on a new parameter/env or a "isBootstrapRefresh" flag) to
only set forceRefresh: true when an explicit fresh read is required.
- Around line 383-393: Replace the imperative Object.entries(...).forEach block
that mutates healthStatus with a functional Object.fromEntries transformation:
map over Object.entries(healthStatusRaw) to produce [Number(providerId), {
circuitState, failureCount, lastFailureTime, circuitOpenUntil, recoveryMinutes
}] pairs (compute recoveryMinutes as the current code does using now and
circuitOpenUntil), then assign the resulting object to healthStatus; reference
healthStatusRaw, healthStatus, now, and circuitOpenUntil to locate the relevant
variables and preserve the exact property names and calculations.


📥 Commits

Reviewing files that changed from the base of the PR and between 5fe92d1 and 02aad0f.

📒 Files selected for processing (8)
  • drizzle/0078_perf_usage_ledger_trigger_skip_irrelevant_updates.sql
  • drizzle/meta/0078_snapshot.json
  • drizzle/meta/_journal.json
  • src/actions/providers.ts
  • src/app/[locale]/settings/providers/_components/provider-manager-loader.tsx
  • src/lib/ledger-backfill/trigger.sql
  • tests/unit/actions/providers.test.ts
  • tests/unit/usage-ledger/trigger.test.ts

Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/performance/common-pages-optimization-plan.md (1)

101-101: Standardize the term to the singular endpoint to avoid ambiguity.

The phrase "合并为单一 endpoints" should read "合并为单一 endpoint" (or an equivalent like "合并为统一接口"), improving the document's precision and readability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/performance/common-pages-optimization-plan.md` at line 101, Update the
phrase "合并为单一 endpoints" to use the singular term to avoid ambiguity: replace it
with either "合并为单一 endpoint" or "合并为统一接口" in the sentence that currently reads
"供应商管理:将 providers/health/statistics 的刷新节奏错峰,或合并为单一 endpoints(需要评估 UI 代码改动范围)。"
so the documentation reads clearly and consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/performance/common-pages-optimization-plan.md`:
- Around line 49-55: The document lists two different page entry locations for
the Providers page (one references page.tsx and another references
settings/providers); update the wording to clearly distinguish the actual page
entry file (the Providers page component, e.g., page.tsx) from the reused
settings/providers component directory and make them consistent across the doc
(replace the incorrect path mention so both references point to the true entry
and note that it reuses the settings/providers component). Ensure mentions of
getProviders, getProvidersHealthStatus, and getProviderStatisticsAsync remain
correct while consolidating the “页面入口” and “复用组件目录” descriptions.

---

Nitpick comments:
In `@docs/performance/common-pages-optimization-plan.md`:
- Line 101: Update the phrase "合并为单一 endpoints" to use the singular term to
avoid ambiguity: replace it with either "合并为单一 endpoint" or "合并为统一接口" in the
sentence that currently reads "供应商管理:将 providers/health/statistics
的刷新节奏错峰,或合并为单一 endpoints(需要评估 UI 代码改动范围)。" so the documentation reads clearly
and consistently.


📥 Commits

Reviewing files that changed from the base of the PR and between b480660 and 9ab60a8.

📒 Files selected for processing (1)
  • docs/performance/common-pages-optimization-plan.md

@coderabbitai
Copy link

coderabbitai bot commented Mar 1, 2026

@tesgth032 Thanks for the thorough update notes! I've understood the changes converged in this round:

Confirmed improvements:

  1. Secure permission reads (fail-closed): switching the authorization branch involving allowGlobalUsageView to a strongly consistent DB read (getAllowGlobalUsageViewFromDB()) is the right call; it effectively avoids the privilege-escalation risk within the TTL window when Redis/pubsub is unavailable, in line with security best practices.

  2. Usage-log polling optimization (setTimeout chain): moving from setInterval to chained setTimeout scheduling is the recommended approach. This pattern guarantees a minimum gap between the end of one call and the start of the next, avoids request pile-up when a tick takes longer than the interval, and adapts to response time — a particularly good fit for API polling, sidestepping congestion and race conditions caused by network latency.

  3. Functional providers bootstrap: the functional handling of partial success + the healthStatus mapping improves data-flow clarity and robustness.

  4. Test alignment and docs: reading the actually deployed migration SQL (0079) improves test accuracy; unifying the document's terminology and path descriptions improves maintainability.

Decisions reasonably kept as-is:

  • EXISTS self-heal: the self-healing logic built on the request_id UNIQUE index handles historical anomalies at controlled cost — a pragmatic trade-off.
  • special_settings flush: best-effort handling on SIGTERM/SIGINT plus an env-tunable delay is sufficient for normal scenarios.
  • pubsub subscription retry: not setting the flag and clearing the promise on failure ensures later calls retry automatically — the logic is self-consistent.

Automatic reviews will be resumed.

✅ Actions performed

Reviews resumed.

@tesgth032
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Mar 1, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (3)
src/actions/providers.ts (1)

364-436: Well implemented; consider extracting the health-status mapping logic to reduce duplication.

The function is solid overall:

  • Uses Promise.allSettled to fetch data in parallel with graceful degradation
  • Correct admin permission check
  • Sensible defaults when sub-requests fail (empty providers, USD currency, empty health)

The health-status mapping logic (lines 414-429) duplicates the logic in getProvidersHealthStatus() (lines 1156-1165). Consider extracting it into a shared helper to improve maintainability.

♻️ Optional: extract a health-status mapping helper
// Define at the top of the file or in a shared utils module
function mapHealthStatusToProviderHealthStatusMap(
  healthStatusRaw: Record<string, { 
    circuitState: "closed" | "open" | "half-open";
    failureCount: number;
    lastFailureTime: number | null;
    circuitOpenUntil?: number | null;
  }>,
  now: number = Date.now()
): ProviderHealthStatusMap {
  return Object.fromEntries(
    Object.entries(healthStatusRaw).map(([providerIdRaw, health]) => {
      const providerId = Number(providerIdRaw);
      const circuitOpenUntil = health.circuitOpenUntil ?? null;
      return [
        providerId,
        {
          circuitState: health.circuitState,
          failureCount: health.failureCount,
          lastFailureTime: health.lastFailureTime,
          circuitOpenUntil,
          recoveryMinutes: circuitOpenUntil ? Math.ceil((circuitOpenUntil - now) / 60000) : null,
        },
      ];
    })
  );
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/providers.ts` around lines 364 - 436, The health-status mapping
in getProviderManagerBootstrapData duplicates logic in getProvidersHealthStatus;
extract that mapping into a shared helper (e.g.,
mapHealthStatusToProviderHealthStatusMap) and replace the inline
Object.fromEntries block in getProviderManagerBootstrapData (lines constructing
healthStatus from healthStatusRaw) and the analogous block in
getProvidersHealthStatus to call this helper, accepting healthStatusRaw and
optional now timestamp and returning ProviderHealthStatusMap.
src/actions/my-usage.ts (1)

366-374: Consider adding a debug-level log on cache writes.

The current .catch(() => {}) ignores write errors entirely. That is acceptable for a cache write, but it leaves no trail when troubleshooting. Consider adding a debug-level log in the catch to improve observability.

🔧 Optional: add a debug log
     if (redis) {
       const cachePayload = {
         ...quota,
         userExpiresAt: quota.userExpiresAt ? quota.userExpiresAt.toISOString() : null,
         expiresAt: quota.expiresAt ? quota.expiresAt.toISOString() : null,
       };
-      await redis.setex(cacheKey, 2, JSON.stringify(cachePayload)).catch(() => {});
+      await redis.setex(cacheKey, 2, JSON.stringify(cachePayload)).catch((err) => {
+        logger.debug("[my-usage] getMyQuota cache write failed", { error: err });
+      });
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/my-usage.ts` around lines 366 - 374, The redis.setex call
currently swallows errors with .catch(() => {}); update the error handling for
the cache write (the setex on redis using cacheKey and cachePayload) to log
failures at debug level instead of ignoring them: catch the error and call the
module logger's debug (or a provided logger) with a descriptive message
including cacheKey and the error, while preserving the short TTL behavior and
not throwing—refer to the redis.setex(...) invocation and the
cachePayload/userExpiresAt/expiresAt usage to locate the change.
tests/unit/usage-ledger/trigger.test.ts (1)

5-9: Consider adding an assertion that the migration SQL and the backfill SQL keep their key conditions consistent, reducing the risk of the two files drifting.

Only drizzle/0079...sql is verified right now. If src/lib/ledger-backfill/trigger.sql drifts later, the backfill path and the deployed migration function could diverge semantically. Consider reading both SQL files and checking key fragments for consistency (e.g., the blocked short-circuit, the derived-value comparison, and the self-heal EXISTS).

Suggested approach
 const sql = readFileSync(
   resolve(process.cwd(), "drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql"),
   "utf-8"
 );
+const backfillSql = readFileSync(
+  resolve(process.cwd(), "src/lib/ledger-backfill/trigger.sql"),
+  "utf-8",
+);
@@
   it("creates function definition", () => {
     expect(sql).toContain("CREATE OR REPLACE FUNCTION fn_upsert_usage_ledger");
   });
+
+  it("keeps critical predicates aligned with backfill trigger SQL", () => {
+    const criticalTokens = [
+      "NEW.blocked_by IS NOT NULL",
+      "v_final_provider_id IS NOT DISTINCT FROM v_old_final_provider_id",
+      "EXISTS (SELECT 1 FROM usage_ledger WHERE request_id = NEW.id)",
+    ];
+    for (const token of criticalTokens) {
+      expect(sql).toContain(token);
+      expect(backfillSql).toContain(token);
+    }
+  });
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/usage-ledger/trigger.test.ts` around lines 5 - 9, Add a second
read of the backfill SQL and assert key-condition consistency: in the
trigger.test.ts test where you already read
drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql into the sql
variable, also read src/lib/ledger-backfill/trigger.sql (e.g. backfillSql) and
perform targeted string/regex assertions that ensure the critical logic matches
(blocked short-circuit/blocked row predicate, derived-value expressions, and any
self-heal EXISTS checks); update test names to reflect the dual-file check and
fail with clear messages if the key fragments differ so the migration SQL and
backfill SQL cannot drift silently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql`:
- Around line 24-30: Before converting the 'id' string of the last element of
NEW.provider_chain to integer and assigning it to v_final_provider_id, validate
the string's numeric range to guard against exceeding the int32 maximum
(2147483647); concretely, after the existing digits-only regex check, first read
the field as a bigint or numeric value and verify it lies within
-2147483648..2147483647 (or at least <= 2147483647 and >= 0, depending on
business rules), and only perform the ::integer cast when the range check
passes; otherwise handle it per the safety policy (e.g., set NULL, log/raise an
identifiable error, or skip the field) so the failure is not silently swallowed
by EXCEPTION WHEN OTHERS; apply the same range validation in the trigger's other
implementation (the corresponding location in src/lib/ledger-backfill/trigger.sql)
so the assignment of v_final_provider_id is always safe.

In `@src/actions/dashboard-realtime.ts`:
- Around line 276-277: The object construction currently hardcodes display
strings ("Unknown") for the fields model and provider; update the code that sets
model and provider (the properties named model and provider in the item mapping)
to return null (or your agreed non-display sentinel) instead of "Unknown" so the
UI layer can supply i18n text; ensure you change both model: item.originalModel
?? item.model ?? "Unknown" and provider: item.providerName ?? "Unknown" to use
null/ sentinel and run tests to confirm no downstream code expects a display
string.
- Around line 140-141: The distributed lock currently uses the fixed value "1"
(lockResult/lockAcquired with redis.set(lockKey, "1", ...)) and deletes the lock
directly in finally, which can delete another holder's lock after the TTL
expires. Instead, generate a unique token (e.g., a UUID) as the lock value, call
redis.set(lockKey, token, "EX", lockTtlSeconds, "NX") and determine lockAcquired
= lockResult === "OK"; when releasing, do not call redis.del(lockKey) directly,
but use a Redis Lua compare-and-delete script (EVAL) that deletes only when the
current value matches the token, ensuring only the holder can release the lock
(locate the change at the redis.set(...) call site and the lock-release code in
the finally block).

In `@src/app/`[locale]/dashboard/logs/_components/virtualized-logs-table.tsx:
- Around line 205-231: The current logic computes lastHasMore from oldPages or
latestPage, but when deduped was truncated by MAX_TOTAL_LOGS (i.e., combined
length exceeded that cap) we must force hasMore=true because older logs were
dropped; update the computation around lastHasMore in the function that builds
pages (using variables combined, deduped, MAX_TOTAL_LOGS, lastHasMore and pages)
so that if combined.length > MAX_TOTAL_LOGS (or deduped.length <
combined.length) then lastHasMore is set to true (or OR'ed with true) before
mapping chunks to pages, ensuring the final page's hasMore reflects that more
older data exists.

---

Nitpick comments:
In `@src/actions/my-usage.ts`:
- Around line 366-374: The redis.setex call currently swallows errors with
.catch(() => {}); update the error handling for the cache write (the setex on
redis using cacheKey and cachePayload) to log failures at debug level instead of
ignoring them: catch the error and call the module logger's debug (or a provided
logger) with a descriptive message including cacheKey and the error, while
preserving the short TTL behavior and not throwing—refer to the redis.setex(...)
invocation and the cachePayload/userExpiresAt/expiresAt usage to locate the
change.

In `@src/actions/providers.ts`:
- Around line 364-436: The health-status mapping in
getProviderManagerBootstrapData duplicates logic in getProvidersHealthStatus;
extract that mapping into a shared helper (e.g.,
mapHealthStatusToProviderHealthStatusMap) and replace the inline
Object.fromEntries block in getProviderManagerBootstrapData (lines constructing
healthStatus from healthStatusRaw) and the analogous block in
getProvidersHealthStatus to call this helper, accepting healthStatusRaw and
optional now timestamp and returning ProviderHealthStatusMap.

In `@tests/unit/usage-ledger/trigger.test.ts`:
- Around line 5-9: Add a second read of the backfill SQL and assert
key-condition consistency: in the trigger.test.ts test where you already read
drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql into the sql
variable, also read src/lib/ledger-backfill/trigger.sql (e.g. backfillSql) and
perform targeted string/regex assertions that ensure the critical logic matches
(blocked short-circuit/blocked row predicate, derived-value expressions, and any
self-heal EXISTS checks); update test names to reflect the dual-file check and
fail with clear messages if the key fragments differ so the migration SQL and
backfill SQL cannot drift silently.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

📥 Commits

Reviewing files that changed from the base of the PR and between 9ab60a8 and dc4f60d.

📒 Files selected for processing (20)
  • docs/performance/common-pages-optimization-plan.md
  • drizzle/0078_perf_usage_ledger_trigger_skip_irrelevant_updates.sql
  • drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql
  • drizzle/meta/_journal.json
  • src/actions/dashboard-realtime.ts
  • src/actions/my-usage.ts
  • src/actions/overview.ts
  • src/actions/provider-slots.ts
  • src/actions/providers.ts
  • src/actions/statistics.ts
  • src/app/[locale]/dashboard/leaderboard/page.tsx
  • src/app/[locale]/dashboard/logs/_components/virtualized-logs-table.tsx
  • src/app/[locale]/dashboard/logs/_hooks/use-lazy-filter-options.ts
  • src/app/api/leaderboard/route.ts
  • src/lib/ledger-backfill/trigger.sql
  • src/lib/utils/timezone.ts
  • src/repository/_shared/ledger-conditions.ts
  • src/repository/message-write-buffer.ts
  • src/repository/system-config.ts
  • tests/unit/usage-ledger/trigger.test.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • drizzle/meta/_journal.json
  • src/app/api/leaderboard/route.ts
  • src/app/[locale]/dashboard/logs/_hooks/use-lazy-filter-options.ts
  • drizzle/0078_perf_usage_ledger_trigger_skip_irrelevant_updates.sql
  • docs/performance/common-pages-optimization-plan.md

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
src/actions/dashboard-realtime.ts (2)

276-277: ⚠️ Potential issue | 🟠 Major

Avoid returning the hardcoded display string "Unknown" from the server.

The "Unknown" on Line [276]-Line [277] can flow straight into the UI; return null (or an agreed sentinel) and map it to i18n text in the presentation layer.

Suggested fix (push display text down to the UI layer)
 export interface ActivityStreamEntry {
@@
-  model: string;
+  model: string | null;
@@
-  provider: string;
+  provider: string | null;
@@
-          model: item.originalModel ?? item.model ?? "Unknown", // prefer the billing model
-          provider: item.providerName ?? "Unknown",
+          model: item.originalModel ?? item.model ?? null, // UI layer handles the i18n fallback
+          provider: item.providerName ?? null,

As per coding guidelines "All user-facing strings must use i18n (5 languages supported: zh-CN, zh-TW, en, ja, ru). Never hardcode display text".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/dashboard-realtime.ts` around lines 276 - 277, The object
currently sets model: item.originalModel ?? item.model ?? "Unknown" and
provider: item.providerName ?? "Unknown"; replace the hardcoded "Unknown"
sentinel with null (or a designated non-display sentinel) so the service does
not return UI text, i.e., use null for missing values, and ensure the UI layer
handles presentation/i18n mapping for these fields instead; update any
downstream consumers of model and provider to accept null and perform i18n
lookup/display there.

140-141: ⚠️ Potential issue | 🟠 Major

The distributed lock has no ownership check, so it can delete another holder's lock.

Line [140] acquires the lock with the fixed value "1" and Line [367] releases it with a plain DEL; once the lock expires and another instance re-acquires it, this instance's finally block can delete the new lock, amplifying concurrent cache refills.

Suggested fix (unique token + compare-and-delete)
+    const lockValue = `${process.pid}-${Date.now()}-${Math.random()}`;
     let lockAcquired = false;
@@
-        const lockResult = await redis.set(lockKey, "1", "EX", lockTtlSeconds, "NX");
+        const lockResult = await redis.set(lockKey, lockValue, "EX", lockTtlSeconds, "NX");
         lockAcquired = lockResult === "OK";
@@
-      if (redis && lockAcquired) {
-        await redis.del(lockKey).catch(() => {});
+      if (redis && lockAcquired) {
+        await redis
+          .eval(
+            "if redis.call('GET', KEYS[1]) == ARGV[1] then return redis.call('DEL', KEYS[1]) else return 0 end",
+            1,
+            lockKey,
+            lockValue,
+          )
+          .catch(() => {});
       }
#!/bin/bash
# Read-only check: confirm whether "fixed lock value + ownership-less release" still exists
rg -n -C2 'redis\.set\(lockKey,\s*"1",\s*"EX",\s*lockTtlSeconds,\s*"NX"\)|redis\.del\(lockKey\)' src/actions/dashboard-realtime.ts
rg -n -C2 'lockValue|redis\.eval\(' src/actions/dashboard-realtime.ts

Also applies to: 367-367

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/dashboard-realtime.ts` around lines 140 - 141, the distributed
lock currently uses the fixed value "1" (redis.set(lockKey, "1", "EX",
lockTtlSeconds, "NX")) and releases it in the finally block with a plain
redis.del(lockKey), which can delete another holder's lock after expiry. When
acquiring the lock, generate a unique random token (e.g. lockValue) and pass it
as the value to set, keeping it in scope for the finally block; when releasing,
do not DEL directly but use an atomic compare-and-release (a compare-and-delete
Lua script via redis.eval, or a Redis SET/GET + WATCH/MULTI transaction that
deletes only when the token matches), making sure the identifiers involved
(lockKey, lockValue, lockTtlSeconds, and the release call
redis.del/redis.eval/redis.evalsha) are all switched to the safe
compare-and-delete implementation.
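The ownership rule above is easy to exercise without a live Redis. This sketch models SET NX and the compare-and-delete release against an in-memory map; FakeRedis and its method names are illustrative only, and a real release would run the Lua script via EVAL:

```typescript
// In-memory stand-in for the two Redis operations the lock needs.
class FakeRedis {
  private store = new Map<string, string>();

  // SET key value NX: returns "OK" only if the key was absent.
  setNx(key: string, value: string): "OK" | null {
    if (this.store.has(key)) return null;
    this.store.set(key, value);
    return "OK";
  }

  // Compare-and-delete: delete only if the stored value matches token.
  // Mirrors: if GET(key) == token then DEL(key).
  releaseIfOwner(key: string, token: string): boolean {
    if (this.store.get(key) !== token) return false;
    this.store.delete(key);
    return true;
  }
}
```

With a fixed lock value like "1", `releaseIfOwner` would degenerate into an unconditional delete; the per-holder token is what makes the release safe.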
🧹 Nitpick comments (2)
src/repository/message-write-buffer.ts (1)

70-79: Consider a dedicated environment variable for the special_settings flush interval

Math.max(flushIntervalMs, 5000) currently ties the generic flush interval and the special_settings throttle together, making them hard to tune separately in production. Add an independent env var, then apply the lower-bound guard.

Suggested change
 function loadWriterConfig(): WriterConfig {
   const env = getEnvConfig();
   const flushIntervalMs = env.MESSAGE_REQUEST_ASYNC_FLUSH_INTERVAL_MS ?? 250;
+  const specialSettingsOnlyFlushIntervalMs =
+    env.MESSAGE_REQUEST_ASYNC_SPECIAL_SETTINGS_ONLY_FLUSH_INTERVAL_MS ??
+    Math.max(flushIntervalMs, 5000);
   return {
     flushIntervalMs,
     batchSize: env.MESSAGE_REQUEST_ASYNC_BATCH_SIZE ?? 200,
     maxPending: env.MESSAGE_REQUEST_ASYNC_MAX_PENDING ?? 5000,
-    specialSettingsOnlyFlushIntervalMs: Math.max(flushIntervalMs, 5000),
+    specialSettingsOnlyFlushIntervalMs,
   };
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/repository/message-write-buffer.ts` around lines 70 - 79, The current
specialSettingsOnlyFlushIntervalMs is tied to flushIntervalMs via
Math.max(flushIntervalMs, 5000), making it impossible to tune independently;
change this to read a new env var (e.g.
MESSAGE_REQUEST_SPECIAL_SETTINGS_FLUSH_INTERVAL_MS) and set
specialSettingsOnlyFlushIntervalMs = Math.max(flushIntervalMs,
Number(env.MESSAGE_REQUEST_SPECIAL_SETTINGS_FLUSH_INTERVAL_MS) ?? 5000) (or
similar) so the special-settings interval can be configured separately while
still enforcing a minimum fallback.
tests/unit/usage-ledger/trigger.test.ts (1)

37-45: Consider a regression assertion for the provider_chain.id range guard.

The new assertions cover the UPDATE skip logic but do not pin down the upper-bound guard on the id -> integer cast. Add a standalone assertion so a later change cannot silently revert to casting any numeric string.

Suggested additional assertions
   it("skips irrelevant updates to reduce write amplification", () => {
     expect(sql).toContain("TG_OP = 'UPDATE'");
     expect(sql).toContain("IS NOT DISTINCT FROM");
@@
     expect(sql).toContain("EXISTS (SELECT 1 FROM usage_ledger WHERE request_id = NEW.id)");
   });
+
+  it("guards provider_chain id range before integer cast", () => {
+    expect(sql).toContain("::bigint");
+    expect(sql).toContain("2147483647");
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/usage-ledger/trigger.test.ts` around lines 37 - 45, Add a
regression assertion to the test (in tests/unit/usage-ledger/trigger.test.ts
where the `sql` expectations live) that verifies the SQL includes an explicit
upper-bound check protecting provider_chain.id from unsafe numeric casting —
e.g. assert `sql` contains a substring like `CAST((provider_chain->>'id') AS
integer) <= 2147483647` or a similar `<= 2147483647` guard so future changes
don't revert to an unbounded numeric cast; place it alongside the other
`expect(sql).toContain(...)` assertions.
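For illustration, the bounds check being asserted can be expressed outside SQL as well. This is a hedged sketch of the same int32 guard; parseInt32 is an invented name, not project code:

```typescript
const INT32_MAX = 2147483647;
const INT32_MIN = -2147483648;

// Parse a numeric string as int32, returning null instead of overflowing:
// the analogue of guarding the ::integer cast in the trigger.
function parseInt32(raw: string | null | undefined): number | null {
  if (raw == null || !/^-?\d+$/.test(raw)) return null;
  // A length cap (10 digits, 11 with sign) cheaply rejects absurdly long
  // strings before the numeric comparison, mirroring the SQL-side length guard.
  if (raw.length > 11) return null;
  const n = Number(raw);
  if (n > INT32_MAX || n < INT32_MIN) return null;
  return n;
}
```

Returning null makes the overflow case an explicit branch rather than an exception, which is exactly what the trigger change needs so the EXCEPTION handler no longer swallows it.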
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@drizzle/0078_perf_usage_ledger_trigger_skip_irrelevant_updates.sql`:
- Around line 17-23: The numeric-string check for NEW.provider_chain -> -1 ->>
'id' is insufficient: before casting to integer (assigned to
v_final_provider_id) you must validate the value fits int32 to avoid overflow
and silent UPSERT skips; update the block that assigns v_final_provider_id
(referencing NEW.provider_chain and v_final_provider_id) to first parse or
compare the numeric string against the int32 bounds (e.g. check length or
convert to numeric and ensure <= 2147483647 and >= -2147483648) and only then
perform the ::integer cast, otherwise handle out-of-range values (e.g. skip
assignment or use bigint/raise a controlled notice) so the EXCEPTION handler no
longer swallows an overflow and causes unintended RETURN NEW behavior.

In `@src/actions/providers.ts`:
- Around line 413-426: The computed recoveryMinutes in the healthStatus mapping
can be negative when circuitOpenUntil <= now; update the calculation in the
healthStatus construction (where recoveryMinutes is set) to clamp the value to a
minimum of 0 by computing Math.max(0, Math.ceil((circuitOpenUntil - now) /
60000)) (keep null when circuitOpenUntil is null), ensuring the
ProviderHealthStatusMap entry created in the same block uses that clamped value
for recoveryMinutes.
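The clamp suggested above, in isolation (an illustrative function, not the project's actual code):

```typescript
// Remaining recovery time in whole minutes, clamped so an already-expired
// circuitOpenUntil yields 0 instead of a negative number.
function recoveryMinutes(
  circuitOpenUntil: number | null,
  now: number,
): number | null {
  if (circuitOpenUntil === null) return null;
  return Math.max(0, Math.ceil((circuitOpenUntil - now) / 60000));
}
```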


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

📥 Commits

Reviewing files that changed from the base of the PR and between 9ab60a8 and dc4f60d.

📒 Files selected for processing (20)
  • docs/performance/common-pages-optimization-plan.md
  • drizzle/0078_perf_usage_ledger_trigger_skip_irrelevant_updates.sql
  • drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql
  • drizzle/meta/_journal.json
  • src/actions/dashboard-realtime.ts
  • src/actions/my-usage.ts
  • src/actions/overview.ts
  • src/actions/provider-slots.ts
  • src/actions/providers.ts
  • src/actions/statistics.ts
  • src/app/[locale]/dashboard/leaderboard/page.tsx
  • src/app/[locale]/dashboard/logs/_components/virtualized-logs-table.tsx
  • src/app/[locale]/dashboard/logs/_hooks/use-lazy-filter-options.ts
  • src/app/api/leaderboard/route.ts
  • src/lib/ledger-backfill/trigger.sql
  • src/lib/utils/timezone.ts
  • src/repository/_shared/ledger-conditions.ts
  • src/repository/message-write-buffer.ts
  • src/repository/system-config.ts
  • tests/unit/usage-ledger/trigger.test.ts
✅ Files skipped from review due to trivial changes (1)
  • src/repository/_shared/ledger-conditions.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/actions/overview.ts
  • src/app/[locale]/dashboard/leaderboard/page.tsx
  • src/app/[locale]/dashboard/logs/_components/virtualized-logs-table.tsx
  • drizzle/meta/_journal.json
  • src/actions/my-usage.ts

@tesgth032 tesgth032 changed the title perf: 常用页面读路径降负载(系统设置缓存与日志轮询)[未完成] perf: 常用页面读路径降负载(系统设置缓存与日志轮询) Mar 1, 2026
Comment on lines +100 to +107
const cleanup = await subscribeCacheInvalidation(CHANNEL_SYSTEM_SETTINGS_UPDATED, () => {
invalidateSystemSettingsCache();
logger.debug("[SystemSettingsCache] Cache invalidated via pub/sub");
});

if (!cleanup) return;

subscriptionInitialized = true;

if subscribeCacheInvalidation returns null (Redis connection available but subscription failed), this exits without setting subscriptionInitialized = true

subsequent calls to getCachedSystemSettings() will retry subscription on every cache miss until it succeeds (though subscriptionInitPromise prevents concurrent attempts)

consider setting subscriptionInitialized = true even on failure after a threshold to prevent indefinite retries

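One way to cap the retries, sketched with invented names: makeSubscriptionInitializer stands in for the module's real initialization, and subscribe() models subscribeCacheInvalidation returning null on failure.

```typescript
// Stop retrying the pub/sub subscription after a few failures so cache
// misses no longer trigger indefinite retries.
const MAX_SUBSCRIPTION_ATTEMPTS = 5;

function makeSubscriptionInitializer(subscribe: () => (() => void) | null) {
  let initialized = false;
  let attempts = 0;

  return function ensureSubscribed(): void {
    if (initialized) return;
    attempts += 1;
    const cleanup = subscribe();
    if (cleanup !== null) {
      initialized = true; // subscription succeeded
    } else if (attempts >= MAX_SUBSCRIPTION_ATTEMPTS) {
      initialized = true; // give up: fall back to TTL-only behavior
    }
  };
}
```

A production version might also add a delay between attempts, but the threshold alone already bounds the work done per cache miss.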

Comment on lines +11 to +22
IF NEW.blocked_by = 'warmup' THEN
-- If a ledger row already exists (row was originally non-warmup), mark it as warmup
UPDATE usage_ledger SET blocked_by = 'warmup' WHERE request_id = NEW.id;
RETURN NEW;
END IF;

IF NEW.blocked_by IS NOT NULL THEN
-- Blocked requests are excluded from billing stats; avoid creating usage_ledger rows.
-- If a ledger row already exists (row was originally unblocked), mark it as blocked.
UPDATE usage_ledger SET blocked_by = NEW.blocked_by WHERE request_id = NEW.id;
RETURN NEW;
END IF;

warmup check executes before general blocked check - handles warmup as special case, then catches all other blocked_by types (rate_limit, blocked_client, etc.)


Comment on lines +276 to +282
/**
* Strong-consistency read of allowGlobalUsageView (permission-related fields must not rely on an expirable cache)
*
* - Fails closed on error, returning false (avoids brief privilege escalation)
* - Does not create a default record (distinct from the "self-healing create" behavior of getSystemSettings())
*/
export async function getAllowGlobalUsageViewFromDB(): Promise<boolean> {

uses strong-consistency direct DB read for allowGlobalUsageView to ensure permission checks don't rely on eventually-consistent cache


- Providers healthStatus: clamp negative recoveryMinutes to 0

- System settings cache: back off after subscription failures instead of retrying on every cache miss

- Dashboard realtime: switch the Redis lock to token + compare-and-del for safe release

- usage_ledger trigger: add int32 overflow protection for provider_chain.id, with matching backfill SQL and test assertions
@tesgth032
Contributor Author

@coderabbitai review

The following issues were consolidated and fixed in commit 0aa582e:

  • Providers healthStatus: clamp negative recoveryMinutes to 0 (no negative values in display or calculations)
  • System settings cache: add backoff on subscription failure, so each cache miss no longer retries the subscription
  • Dashboard realtime: switch the Redis lock to token + compare-and-del for safe release (no longer deletes another holder's lock after expiry)
  • usage_ledger trigger: add int32 overflow protection for provider_chain.id (a cast overflow no longer hits the EXCEPTION handler and skips the UPSERT), with the backfill SQL + test assertions updated in sync

Passed locally: bun run lint:fix && bun run lint && bun run typecheck && bun run test && bun run build

@coderabbitai

coderabbitai bot commented Mar 2, 2026

@tesgth032 Thanks for the detailed update! The improvements in commit 0aa582e are understood:

Confirmed improvements:

  1. Providers healthStatus negative-value guard: clamping recoveryMinutes to 0 is necessary defensive programming; it prevents negative values in the UI or downstream errors in edge cases of the time calculation, improving robustness.

  2. System settings cache subscription backoff: adding backoff for failed pub/sub subscriptions is the right call. Not retrying the failed subscription on every cache miss reduces Redis/network pressure and follows retry best practices (exponential backoff with jitter).

  3. Dashboard realtime safe lock release: token + compare-and-delete is the standard safety pattern for distributed locks. It prevents an instance from deleting a lock re-acquired by another instance after expiry, the classic "lock stealing" problem, which is critical under high concurrency.

  4. usage_ledger trigger int32 guard: bounds-checking provider_chain.id keeps a cast overflow from raising an EXCEPTION and skipping the UPSERT, protecting billing-data integrity. Updating the backfill SQL and test assertions in sync keeps the implementations consistent.

The passing local checks confirm the quality of the changes. Triggering the full review flow.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
src/actions/dashboard-realtime.ts (1)

280-281: ⚠️ Potential issue | 🟠 Major

Remove the hardcoded "Unknown" placeholder from the presentation data.

This fallback string flows straight into display data; return null (or an agreed sentinel) and let the UI layer render it via i18n.

Suggested change
 export interface ActivityStreamEntry {
@@
-  model: string;
+  model: string | null;
@@
-  provider: string;
+  provider: string | null;
@@
-          model: item.originalModel ?? item.model ?? "Unknown", // prefer the billing model
-          provider: item.providerName ?? "Unknown",
+          model: item.originalModel ?? item.model ?? null, // UI layer handles the i18n fallback
+          provider: item.providerName ?? null,

As per coding guidelines "All user-facing strings must use i18n (5 languages supported: zh-CN, zh-TW, en, ja, ru). Never hardcode display text".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/dashboard-realtime.ts` around lines 280 - 281, Replace the
hardcoded display fallbacks in the dashboard data (the object keys where model
is set using item.originalModel ?? item.model ?? "Unknown" and provider uses
item.providerName ?? "Unknown") so they return null (or a defined sentinel)
instead of the literal "Unknown"; update the code paths that populate those
fields to return null when no value exists and add a brief comment mentioning
that the UI layer should perform i18n rendering for missing values.
🧹 Nitpick comments (2)
tests/unit/usage-ledger/trigger.test.ts (1)

33-36: Also assert the range guard for OLD.provider_chain.

Only the NEW path is currently verified; cover the OLD path as well, so a later change cannot drop the OLD-branch guard while the test still passes.

Suggested change
   it("guards provider_chain id extraction cast range", () => {
     expect(sql).toContain("2147483647");
     expect(sql).toContain("length(NEW.provider_chain -> -1 ->> 'id') <= 10");
+    expect(sql).toContain("length(OLD.provider_chain -> -1 ->> 'id') <= 10");
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/usage-ledger/trigger.test.ts` around lines 33 - 36, The test
"guards provider_chain id extraction cast range" currently asserts the
NEW.branch check but misses the same protection for OLD.provider_chain; update
the test to also assert that the generated SQL contains the OLD path length
guard by adding an assertion like
expect(sql).toContain("length(OLD.provider_chain -> -1 ->> 'id') <= 10");
(referencing the sql variable and the OLD.provider_chain/NEW.provider_chain
string fragments to locate where to add this expectation).
drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql (1)

17-21: Avoid repeated UPDATEs when blocked_by has not changed.

This writes usage_ledger on every UPDATE of a blocked request, even when the value is unchanged, producing unnecessary writes. Add an IS DISTINCT FROM condition, and mirror the change in src/lib/ledger-backfill/trigger.sql to keep them consistent.

Suggested change
-    UPDATE usage_ledger SET blocked_by = NEW.blocked_by WHERE request_id = NEW.id;
+    UPDATE usage_ledger
+    SET blocked_by = NEW.blocked_by
+    WHERE request_id = NEW.id
+      AND blocked_by IS DISTINCT FROM NEW.blocked_by;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql` around lines 17
- 21, The UPDATE in the trigger always writes blocked_by even when the value
hasn't changed; modify the condition so you only run UPDATE usage_ledger SET
blocked_by = NEW.blocked_by WHERE request_id = NEW.id AND (blocked_by IS
DISTINCT FROM NEW.blocked_by) to avoid redundant writes, and apply the same
change to the corresponding trigger implementation in
src/lib/ledger-backfill/trigger.sql to keep behavior consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/actions/dashboard-realtime.ts`:
- Around line 280-281: Replace the hardcoded display fallbacks in the dashboard
data (the object keys where model is set using item.originalModel ?? item.model
?? "Unknown" and provider uses item.providerName ?? "Unknown") so they return
null (or a defined sentinel) instead of the literal "Unknown"; update the code
paths that populate those fields to return null when no value exists and add a
brief comment mentioning that the UI layer should perform i18n rendering for
missing values.

---

Nitpick comments:
In `@drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql`:
- Around line 17-21: The UPDATE in the trigger always writes blocked_by even
when the value hasn't changed; modify the condition so you only run UPDATE
usage_ledger SET blocked_by = NEW.blocked_by WHERE request_id = NEW.id AND
(blocked_by IS DISTINCT FROM NEW.blocked_by) to avoid redundant writes, and
apply the same change to the corresponding trigger implementation in
src/lib/ledger-backfill/trigger.sql to keep behavior consistent.

In `@tests/unit/usage-ledger/trigger.test.ts`:
- Around line 33-36: The test "guards provider_chain id extraction cast range"
currently asserts the NEW.branch check but misses the same protection for
OLD.provider_chain; update the test to also assert that the generated SQL
contains the OLD path length guard by adding an assertion like
expect(sql).toContain("length(OLD.provider_chain -> -1 ->> 'id') <= 10");
(referencing the sql variable and the OLD.provider_chain/NEW.provider_chain
string fragments to locate where to add this expectation).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

📥 Commits

Reviewing files that changed from the base of the PR and between dc4f60d and 0aa582e.

📒 Files selected for processing (7)
  • drizzle/0078_perf_usage_ledger_trigger_skip_irrelevant_updates.sql
  • drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql
  • src/actions/dashboard-realtime.ts
  • src/actions/providers.ts
  • src/lib/config/system-settings-cache.ts
  • src/lib/ledger-backfill/trigger.sql
  • tests/unit/usage-ledger/trigger.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/actions/providers.ts

- usage_ledger trigger: add IS DISTINCT FROM to the blocked_by/warmup UPDATE branches to avoid redundant writes

- trigger SQL tests: add the OLD.provider_chain length-guard assertion
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/actions/dashboard-realtime.ts (1)

280-281: ⚠️ Potential issue | 🟠 Major

Push the default display value down to the UI i18n layer.

Line [280]-Line [281] still return the hardcoded "Unknown". Return null (or an agreed sentinel) here and let the UI render it uniformly via i18n.

As per coding guidelines "All user-facing strings must use i18n (5 languages supported: zh-CN, zh-TW, en, ja, ru). Never hardcode display text".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/dashboard-realtime.ts` around lines 280 - 281, Change the
hardcoded fallback strings to a null sentinel so the UI i18n layer can render
localized labels: replace the fallbacks in the object construction (the
expressions producing model and provider — e.g., model: item.originalModel ??
item.model ?? "Unknown" and provider: item.providerName ?? "Unknown") to return
null (or a agreed sentinel) instead of "Unknown"; keep the existing precedence
(originalModel → model → fallback) but ensure the final fallback is null so the
UI can supply the localized display text.
🧹 Nitpick comments (3)
drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql (1)

149-151: Consider adding monitoring on the exception path, so silently missed syncs do not stay invisible.

Returning NEW directly after EXCEPTION WHEN OTHERS protects the main path, but at minimum emit metrics/alerts keyed by SQLSTATE/error type so persistent ledger write misses can be discovered quickly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql` around lines
149 - 151, in the exception branch of fn_upsert_usage_ledger (EXCEPTION WHEN
OTHERS), add a monitoring call alongside the existing RAISE WARNING and RETURN
NEW so misses are not silent: capture SQLSTATE or SQLERRM and report them
through the existing alerting path (e.g. a monitoring function, a monitoring
table, or pg_notify), including the function name fn_upsert_usage_ledger,
NEW.id and the error type/SQLSTATE so transient and persistent failures can be
distinguished; keep the original RAISE WARNING/RETURN NEW behavior so the main
path is unaffected.
tests/unit/usage-ledger/trigger.test.ts (1)

6-9: Avoid pinning the verification to a single migration number.

Lines 6-8 bind directly to 0079. If a later 008x migration runs CREATE OR REPLACE FUNCTION fn_upsert_usage_ledger again, this test may keep passing without constraining the SQL that is actually in effect. Select the newest migration with the same prefix instead, or add a guard assertion that 0079 is still the latest relevant migration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/usage-ledger/trigger.test.ts` around lines 6 - 9, The test
currently hardcodes the migration file
"0079_perf_usage_ledger_trigger_skip_blocked_rows.sql" when reading SQL for
fn_upsert_usage_ledger; update the test
(tests/unit/usage-ledger/trigger.test.ts) to either (a) dynamically select the
newest migration file matching the prefix/pattern that contains
fn_upsert_usage_ledger (e.g., glob "drizzle/007*_*usage_ledger*.sql" and pick
the latest by name or mtime) before assigning sql, or (b) keep the hardcoded
filename but add an explicit guard assertion that verifies this migration is the
latest relevant migration (so the test fails when a newer 008x migration
exists); reference the variable sql and the function name fn_upsert_usage_ledger
when implementing the change.
src/actions/dashboard-realtime.ts (1)

233-263: Empty results should not be logged at failure level

Line [233]-Line [263] currently log both "empty data" and "request failed" at warn, which keeps generating noise under low traffic. Log warn only for rejected/action-failed results and downgrade empty data to debug

Suggested change (example)
-      if (activityStreamResult.status === "rejected" || !activityStreamItems.length) {
+      if (activityStreamResult.status === "rejected") {
         logger.warn("Failed to get activity stream", {
-          reason:
-            activityStreamResult.status === "rejected" ? activityStreamResult.reason : "empty data",
+          reason: activityStreamResult.reason,
         });
+      } else if (!activityStreamItems.length) {
+        logger.debug("Activity stream is empty");
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/dashboard-realtime.ts` around lines 233 - 263, The logging
currently treats empty results as warnings; update the checks in the block
handling activityStreamResult, providerSlotsResult and statisticsResult so that
logger.warn is only called when the Promise result status is "rejected" (or an
explicit action failure), and when the result is empty (e.g.,
!activityStreamItems, !providerSlots, !statisticsData) log at debug level
instead (logger.debug) with an "empty data" message; keep the existing reason
payload for rejected cases (e.g., activityStreamResult.reason,
providerSlotsResult.reason, statisticsResult.reason) and ensure
providerSlotsResult and statisticsResult still use the conditional reason
formatting but only trigger warn for rejected/action-failed branches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql`:
- Around line 11-21: Add a "value changed" condition to the usage_ledger
UPDATEs in the trigger to avoid redundant writes: in the blocked_by = 'warmup'
branch, add AND (blocked_by IS DISTINCT FROM 'warmup') to the UPDATE
(usage_ledger ... WHERE request_id = NEW.id); in the NEW.blocked_by IS NOT NULL
branch, add AND (blocked_by IS DISTINCT FROM NEW.blocked_by); locate the WHERE
clauses via the usage_ledger table, the blocked_by column, NEW.blocked_by and
request_id.

In `@src/lib/config/system-settings-cache.ts`:
- Around line 202-209: The fallback SystemSettings currently hardcodes siteTitle
("Claude Code Hub"); replace this literal with the centralized default display
value so user-facing text goes through i18n or a shared constant. Locate the
fallback construction where cachedSettings is used (variable names: fallback,
cachedSettings, SystemSettings, getSystemSettings) and swap the hardcoded string
for the appropriate i18n key or shared DEFAULT_SITE_TITLE constant (or a call
that resolves the localized string), ensuring all translations (zh-CN, zh-TW,
en, ja, ru) are used instead of a literal.

---

Duplicate comments:
In `@src/actions/dashboard-realtime.ts`:
- Around line 280-281: Change the hardcoded fallback strings to a null sentinel
so the UI i18n layer can render localized labels: replace the fallbacks in the
object construction (the expressions producing model and provider — e.g., model:
item.originalModel ?? item.model ?? "Unknown" and provider: item.providerName ??
"Unknown") to return null (or a agreed sentinel) instead of "Unknown"; keep the
existing precedence (originalModel → model → fallback) but ensure the final
fallback is null so the UI can supply the localized display text.

---

Nitpick comments:
In `@drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql`:
- Around line 149-151: In the exception branch of fn_upsert_usage_ledger
(EXCEPTION WHEN OTHERS), besides the existing RAISE WARNING and RETURN NEW, add
a monitoring call so failures are not silently dropped: capture
SQLSTATE/SQLERRM and report them through the existing alerting channel (e.g., a
monitoring function, a monitoring table, or pg_notify), including the function
name fn_upsert_usage_ledger, NEW.id, and the error class/SQLSTATE so transient
errors can be distinguished from persistent failures; keep the original RAISE
WARNING/RETURN NEW behavior so the main write path is unaffected.

In `@src/actions/dashboard-realtime.ts`:
- Around line 233-263: The logging currently treats empty results as warnings;
update the checks in the block handling activityStreamResult,
providerSlotsResult and statisticsResult so that logger.warn is only called when
the Promise result status is "rejected" (or an explicit action failure), and
when the result is empty (e.g., !activityStreamItems, !providerSlots,
!statisticsData) log at debug level instead (logger.debug) with an "empty data"
message; keep the existing reason payload for rejected cases (e.g.,
activityStreamResult.reason, providerSlotsResult.reason,
statisticsResult.reason) and ensure providerSlotsResult and statisticsResult
still use the conditional reason formatting but only trigger warn for
rejected/action-failed branches.

In `@tests/unit/usage-ledger/trigger.test.ts`:
- Around line 6-9: The test currently hardcodes the migration file
"0079_perf_usage_ledger_trigger_skip_blocked_rows.sql" when reading SQL for
fn_upsert_usage_ledger; update the test
(tests/unit/usage-ledger/trigger.test.ts) to either (a) dynamically select the
newest migration file matching the prefix/pattern that contains
fn_upsert_usage_ledger (e.g., glob "drizzle/007*_*usage_ledger*.sql" and pick
the latest by name or mtime) before assigning sql, or (b) keep the hardcoded
filename but add an explicit guard assertion that verifies this migration is the
latest relevant migration (so the test fails when a newer 008x migration
exists); reference the variable sql and the function name fn_upsert_usage_ledger
when implementing the change.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

📥 Commits

Reviewing files that changed from the base of the PR and between dc4f60d and 0aa582e.

📒 Files selected for processing (7)
  • drizzle/0078_perf_usage_ledger_trigger_skip_irrelevant_updates.sql
  • drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql
  • src/actions/dashboard-realtime.ts
  • src/actions/providers.ts
  • src/lib/config/system-settings-cache.ts
  • src/lib/ledger-backfill/trigger.sql
  • tests/unit/usage-ledger/trigger.test.ts

@tesgth032
Contributor Author

@coderabbitai review

Addressed the latest review (commit 0dfa3e8):

  • usage_ledger trigger: added IS DISTINCT FROM to the UPDATEs in the blocked_by/warmup branches to avoid redundant writes (synced to the backfill trigger.sql)
  • tests/unit/usage-ledger/trigger.test.ts: added a length-guard assertion for OLD.provider_chain

Passing locally: bun run lint:fix && bun run lint && bun run typecheck && bun run test && bun run build
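For readers unfamiliar with the predicate, `IS DISTINCT FROM` is SQL's null-safe inequality; the skip-if-unchanged idea can be mirrored in TypeScript as a sketch (hypothetical helper names, not the repo's trigger code):

```typescript
// Null-safe inequality mirroring SQL's IS DISTINCT FROM for scalar values:
// NULL vs NULL is "not distinct" (no write), NULL vs 'warmup' is distinct.
function isDistinctFrom(a: string | null, b: string | null): boolean {
  return a !== b;
}

// Skip-if-unchanged guard: only issue the UPDATE when the stored value
// would actually change, avoiding redundant write amplification.
function shouldUpdateBlockedBy(current: string | null, next: string | null): boolean {
  return isDistinctFrom(current, next);
}
```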

@coderabbitai

coderabbitai bot commented Mar 2, 2026

@tesgth032 Thanks for the detailed update! Understood the improvements in commit 0dfa3e8:

Confirmed optimizations:

  1. blocked_by/warmup write dedup: adding IS DISTINCT FROM checks on the UPDATE path is the right optimization. It avoids executing the UPDATE when the blocked_by value has not actually changed, reducing unnecessary write amplification and noticeably lowering database load under high-frequency updates.

  2. Backfill consistency: syncing the optimization into trigger.sql keeps the migration script and the backfill logic consistent, avoiding behavioral drift.

  3. Test coverage: the added length-guard assertion for OLD.provider_chain strengthens boundary-condition verification, ensuring the trigger stays robust when processing historical data.

All local checks passing confirms the change quality. Triggering the full review flow.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
src/lib/ledger-backfill/trigger.sql (1)

1-155: Consider adding a consistency safeguard for the two trigger SQL copies to reduce future drift.

This function implementation is nearly identical to the one in the migration file, but the tests now read the migration SQL. Consider adding a "consistency comparison test" (at minimum verifying the key fragments) or generating both from a single source, so future changes cannot land in only one place.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/ledger-backfill/trigger.sql` around lines 1 - 155, The PR has two
nearly identical trigger SQL implementations (fn_upsert_usage_ledger in the
trigger file vs the migration SQL) which can drift; add a consistency safeguard
by either (a) centralizing the SQL generation so both artifacts are produced
from one source, or (b) adding a test that loads both SQL blobs and asserts key
sections are identical (at minimum: the IF warmup/blocked_by early-return logic,
the final_provider_id derivation, the UPDATE-skip predicates block, and the
INSERT ... ON CONFLICT clause) to catch future divergence; locate
fn_upsert_usage_ledger, the INSERT ... ON CONFLICT (request_id) DO UPDATE block,
and the OLD/NEW comparison predicates when implementing the centralization or
the comparison assertions.
tests/unit/usage-ledger/trigger.test.ts (1)

16-19: Consider asserting the write-dedup conditions of the blocked/warmup branches as well.

The test currently only verifies that the UPDATE statement exists; if the IS DISTINCT FROM conditions were accidentally removed later, this test would not fail. Add two assertions to lock in the optimization.

Suggested change
   it("skips blocked requests to avoid wasted ledger writes", () => {
     expect(sql).toContain("NEW.blocked_by IS NOT NULL");
     expect(sql).toContain("UPDATE usage_ledger SET blocked_by = NEW.blocked_by");
+    expect(sql).toContain("blocked_by IS DISTINCT FROM 'warmup'");
+    expect(sql).toContain("blocked_by IS DISTINCT FROM NEW.blocked_by");
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/usage-ledger/trigger.test.ts` around lines 16 - 19, The test
currently only asserts presence of "NEW.blocked_by IS NOT NULL" and the UPDATE
statement; add assertions that the UPDATE includes the "IS DISTINCT FROM"
deduplication condition for both blocked and warmup branches so the optimization
is locked in; specifically update the test in trigger.test.ts to assert sql
contains the exact "UPDATE usage_ledger SET blocked_by = NEW.blocked_by WHERE
... IS DISTINCT FROM" pattern (and likewise for the warmup UPDATE that sets
warmup_by) so removal of the IS DISTINCT FROM condition would fail the test.

📥 Commits

Reviewing files that changed from the base of the PR and between 0aa582e and 0dfa3e8.

📒 Files selected for processing (4)
  • drizzle/0078_perf_usage_ledger_trigger_skip_irrelevant_updates.sql
  • drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql
  • src/lib/ledger-backfill/trigger.sql
  • tests/unit/usage-ledger/trigger.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • drizzle/0078_perf_usage_ledger_trigger_skip_irrelevant_updates.sql

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
drizzle/0078_perf_usage_ledger_trigger_skip_irrelevant_updates.sql (1)

143-146: The fail-open exception handling matches the design intent; consider improving observability.

EXCEPTION WHEN OTHERS ensures the trigger never blocks the main write path, in line with the PR's fail-open design decision. RAISE WARNING provides basic log visibility.

Optional improvement: append SQLSTATE to the WARNING so error classes can be distinguished (e.g., network timeout vs data anomaly), which helps classify follow-up monitoring alerts.

 EXCEPTION WHEN OTHERS THEN
-  RAISE WARNING 'fn_upsert_usage_ledger failed for request_id=%: %', NEW.id, SQLERRM;
+  RAISE WARNING 'fn_upsert_usage_ledger failed for request_id=% [%]: %', NEW.id, SQLSTATE, SQLERRM;
   RETURN NEW;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@drizzle/0078_perf_usage_ledger_trigger_skip_irrelevant_updates.sql` around
lines 143 - 146, The EXCEPTION WHEN OTHERS block in the trigger currently logs
only SQLERRM; update the RAISE WARNING call inside the exception handler for
fn_upsert_usage_ledger to also include SQLSTATE so operators can distinguish
error classes (e.g., network timeout vs data errors). Locate the RAISE WARNING
that mentions NEW.id and SQLERRM and append the SQLSTATE value to the log
message (keeping the same fail-open behavior and RETURN NEW).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/ledger-backfill/trigger.sql`:
- Around line 93-97: The current trigger prematurely returns when a row with the
same request_id exists, which prevents the INSERT...ON CONFLICT UPSERT from
repairing partially-created/stale rows (see usage_ledger, request_id, NEW and
the fail-open handling around lines 151-153); change the IF EXISTS check to
verify that the existing row's key ledger columns exactly match the NEW values
(compare the specific columns the UPSERT would set against usage_ledger.* using
NEW.<col>), and only RETURN NEW when the row exists AND all those key columns
are equal; if any key column differs, let the trigger continue to perform the
INSERT...ON CONFLICT so the stale row can be fixed.


📥 Commits

Reviewing files that changed from the base of the PR and between 0aa582e and 0dfa3e8.

📒 Files selected for processing (4)
  • drizzle/0078_perf_usage_ledger_trigger_skip_irrelevant_updates.sql
  • drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql
  • src/lib/ledger-backfill/trigger.sql
  • tests/unit/usage-ledger/trigger.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • drizzle/0079_perf_usage_ledger_trigger_skip_blocked_rows.sql
  • tests/unit/usage-ledger/trigger.test.ts

- DashboardRealtime: downgrade empty-data logs to debug; change the model/provider fallback to '-' (avoiding hardcoded English copy)

- usage_ledger trigger tests: verify the latest migration file is used, and lock in the blocked_by write-dedup condition
@tesgth032
Contributor Author

@coderabbitai review

Converged on the two points from Changes Requested (commit bab5f55):

  • DashboardRealtime: empty data no longer floods the logs at warn level; the model/provider fallback is now '-' (avoiding hardcoded English copy)
  • trigger.test.ts: added a "latest migration file" guard and blocked_by write-dedup (IS DISTINCT FROM) assertions

Passing locally: bun run lint:fix && bun run lint && bun run typecheck && bun run test && bun run build

@coderabbitai

coderabbitai bot commented Mar 2, 2026

@tesgth032 Thanks for the detailed update! Understood the improvements in commit bab5f55:

Confirmed optimizations:

  1. DashboardRealtime logging: downgrading empty-data logs from warn is the right call. Empty data is a normal state in some windows (e.g., fresh deployments, low-traffic periods); avoiding warn-level noise reduces alert fatigue and improves the operations experience.

  2. i18n-friendly fallback: replacing the hardcoded English fallback for model/provider with '-' is good internationalization practice. As a "no data" placeholder, '-' reads consistently across locales and sidesteps copy-localization issues.

  3. Stronger test coverage:

    • Latest-migration guard: ensures the tests always validate the migration SQL that actually ships, preventing drift between tests and production logic — a key regression safeguard.
    • blocked_by write-dedup assertions: verifying the IS DISTINCT FROM logic effectively tests the write-amplification optimization, ensuring identical blocked_by values never trigger redundant UPDATEs.

All local checks passing confirms the change quality. Triggering the full review flow.


🧠 Learnings used
Learnt from: CR
Repo: ding113/claude-code-hub PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-10T17:53:25.074Z
Learning: Before committing, run: `bun run build`, `bun run lint`, `bun run lint:fix`, `bun run typecheck`, and `bun run test`
✅ Actions performed

Review triggered.


@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
src/actions/dashboard-realtime.ts (1)

286-287: ⚠️ Potential issue | 🟠 Major

Remove the hardcoded display placeholder from the activity stream.

Line [286]-Line [287] still return "-", which bakes display copy into the server side. Return null (or an agreed sentinel) instead and let the UI layer handle the i18n fallback.

Suggested fix
 export interface ActivityStreamEntry {
@@
-  model: string;
+  model: string | null;
@@
-  provider: string;
+  provider: string | null;
@@
-          model: item.originalModel ?? item.model ?? "-", // prefer the billing model
-          provider: item.providerName ?? "-",
+          model: item.originalModel ?? item.model ?? null, // display layer handles i18n fallback
+          provider: item.providerName ?? null,

As per coding guidelines "All user-facing strings must use i18n (5 languages supported: zh-CN, zh-TW, en, ja, ru). Never hardcode display text".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/dashboard-realtime.ts` around lines 286 - 287, The object
returned in dashboard realtime uses hardcoded display placeholders ("-") for
model and provider, which must be removed; update the code that sets model and
provider (the lines assigning model: item.originalModel ?? item.model ?? "-" and
provider: item.providerName ?? "-") to return null (or a designated sentinel)
instead of the string, so the UI can perform i18n/fallback logic; ensure any
callers of this data (e.g., code that consumes the returned object from this
module) handle null/sentinel values accordingly.
🧹 Nitpick comments (1)
tests/unit/usage-ledger/trigger.test.ts (1)

10-16: Boundary-case suggestion: .slice(-1)[0] returns undefined on an empty array.

If no file in drizzle/ matches ^\d+_perf_usage_ledger_trigger_.*\.sql$, latest becomes undefined and the resulting test-failure message is not very informative.

♻️ Optional improvement: assert the array is non-empty
   it("uses latest relevant migration file", () => {
-    const latest = readdirSync(resolve(process.cwd(), "drizzle"))
+    const migrations = readdirSync(resolve(process.cwd(), "drizzle"))
       .filter((name) => /^\d+_perf_usage_ledger_trigger_.*\.sql$/.test(name))
-      .sort()
-      .slice(-1)[0];
+      .sort();
 
+    expect(migrations.length).toBeGreaterThan(0);
+    const latest = migrations.at(-1);
     expect(latest).toBe(MIGRATION_FILENAME);
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/usage-ledger/trigger.test.ts` around lines 10 - 16, The test
builds `latest` by filtering files from `readdirSync(...)` and then using
`.slice(-1)[0]`, which yields `undefined` when the filtered array is empty;
update the test to first capture the filtered array (e.g., `const candidates =
...filter(...).sort()`), assert that `candidates.length` is greater than 0 with
a clear message, then derive `latest` from `candidates` (e.g.,
`candidates.slice(-1)[0]` or `candidates.pop()`) before comparing to
`MIGRATION_FILENAME`, so the failure clearly indicates a missing migration
rather than an `undefined` value.

📥 Commits

Reviewing files that changed from the base of the PR and between 0dfa3e8 and bab5f55.

📒 Files selected for processing (2)
  • src/actions/dashboard-realtime.ts
  • tests/unit/usage-ledger/trigger.test.ts

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/actions/dashboard-realtime.ts (1)

358-365: Merge the duplicated cache write-back branches so neither path silently swallows errors.

Both branches currently execute the same setex, and the else branch at Line [364] additionally swallows errors, making failures unobservable. Write through a single path and include lockAcquired in the log.

Suggested change
       if (redis) {
         try {
-          // Prefer that only the lock-holding instance writes the cache; without the lock, still allow a best-effort fallback write (in case the lock holder fails to compute).
-          if (lockAcquired) {
-            await redis.setex(cacheKey, cacheTtlSeconds, JSON.stringify(data));
-          } else {
-            await redis.setex(cacheKey, cacheTtlSeconds, JSON.stringify(data)).catch(() => {});
-          }
+          // Still allow a best-effort fallback write without the lock (in case the lock holder fails to compute)
+          await redis.setex(cacheKey, cacheTtlSeconds, JSON.stringify(data));
         } catch (error) {
-          logger.warn("[DashboardRealtime] Cache write failed", { cacheKey, error });
+          logger.warn("[DashboardRealtime] Cache write failed", {
+            cacheKey,
+            lockAcquired,
+            error,
+          });
         }
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/dashboard-realtime.ts` around lines 358 - 365, Merge the
duplicate cache write branches so there is a single call to
redis.setex(cacheKey, cacheTtlSeconds, JSON.stringify(data)) instead of separate
if/else paths; wrap that single call in a try/catch so errors are not silently
swallowed and, on failure, log the error along with the lockAcquired flag and
relevant context (cacheKey/cacheTtlSeconds) to make failures observable. Target
the code that currently checks lockAcquired and calls redis.setex, replacing the
two branches with one try/catch around redis.setex while preserving the same
TTL/key/data arguments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/actions/dashboard-realtime.ts`:
- Around line 200-202: The log currently always writes "Unknown error" when
overviewResult.status === "fulfilled" but !ok, losing the business error;
change how errorReason is computed (in the block using overviewResult, ok, and
logger.error) to: use overviewResult.reason when overviewResult.status ===
"rejected"; otherwise, if overviewResult.status === "fulfilled" and
overviewResult.value?.error exists, record that business error (e.g.,
overviewResult.value.error or its message/summary); finally fall back to
"Unknown error". Pass that errorReason into logger.error's metadata so both
platform errors and business errors are captured.
- Line 290: The assignment using parseFloat(item.costUsd ?? "0") can produce NaN
for empty or non-numeric strings; update the code in
src/actions/dashboard-realtime.ts where you build the cost field (the object
using item and item.costUsd) to parse the value into a number (e.g., via
parseFloat or Number), then guard with Number.isFinite(parsed) and fallback to 0
if invalid; ensure the object’s cost property always receives a finite number
(use the parsed value when finite, otherwise 0).


📥 Commits

Reviewing files that changed from the base of the PR and between 0dfa3e8 and bab5f55.

📒 Files selected for processing (2)
  • src/actions/dashboard-realtime.ts
  • tests/unit/usage-ledger/trigger.test.ts

- Record the overview action's business error (so "Unknown error" no longer masks the cause)

- Add a finite guard to costUsd parsing to avoid NaN

- Merge the cache write-back branches and include lockAcquired in failure logs
@tesgth032
Contributor Author

@coderabbitai review

Converged on the latest CHANGES_REQUESTED items (commit 160930e):

  • When overviewResult is fulfilled but ok=false: errorReason now records the business error (no longer a fixed "Unknown error")
  • costUsd parsing now has a Number.isFinite guard to avoid NaN
  • Merged the cache write-back setex branches: errors are no longer swallowed silently, and failure logs include lockAcquired/cacheTtlSeconds

Passing locally: bun run lint:fix && bun run lint && bun run typecheck && bun run test && bun run build
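The finite guard mentioned above can be sketched as follows (helper name hypothetical):

```typescript
// Parse a cost string defensively: parseFloat("") and parseFloat("abc")
// both yield NaN, and "Infinity" parses to Infinity, so guard with
// Number.isFinite and fall back to 0 for anything non-finite.
function parseCostUsd(raw: string | null | undefined): number {
  const parsed = Number.parseFloat(raw ?? "0");
  return Number.isFinite(parsed) ? parsed : 0;
}
```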

@coderabbitai

coderabbitai bot commented Mar 2, 2026

@tesgth032 Thanks for the detailed update! Understood the improvements in commit 160930e:

Confirmed optimizations:

  1. Business-error propagation: recording the actual business error in errorReason when overviewResult is fulfilled but ok=false, instead of a fixed "Unknown error", is the right improvement. It significantly speeds up diagnosis by letting the logs reflect the real failure cause.

  2. costUsd NaN guard: adding a Number.isFinite guard is necessary defensive programming. It prevents anomalous inputs (such as Infinity or NaN) from polluting downstream computation and display, keeping numeric fields robust.

  3. Cache write-back observability: logging cache setex failures instead of silently swallowing them, with lockAcquired and cacheTtlSeconds context, is an important operational improvement. It helps quickly identify cache-layer issues (Redis connectivity, permissions, TTL configuration) and removes a production debugging blind spot.

All local checks passing confirms the change quality. Triggering the full review flow.

✅ Actions performed

Review triggered.


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
src/actions/dashboard-realtime.ts (1)

125-129: Consider externalizing the cache/lock-wait parameters as configuration.

cacheTtlSeconds, lockTtlSeconds, lockWaitMs, and lockWaitBudgetMs are currently hardcoded, so they cannot be tuned quickly during load testing or traffic spikes; consider supporting env overrides while keeping the current defaults.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/dashboard-realtime.ts` around lines 125 - 129, The four timing
constants cacheTtlSeconds, lockTtlSeconds, lockWaitMs, and lockWaitBudgetMs are
hardcoded; make them configurable via environment variables while keeping
current values as defaults: read process.env (or equivalent config loader) for
CACHE_TTL_SECONDS, LOCK_TTL_SECONDS, LOCK_WAIT_MS, LOCK_WAIT_BUDGET_MS, parse
them to numbers and fall back to 2, 5, 50, and 200 respectively, and replace the
current constant assignments with these env-backed values (ensure robust parsing
and handle invalid/missing env by using the defaults).

📥 Commits

Reviewing files that changed from the base of the PR and between bab5f55 and 160930e.

📒 Files selected for processing (1)
  • src/actions/dashboard-realtime.ts

@tesgth032
Contributor Author

@coderabbitai approve

All of the CHANGES_REQUESTED items above have been addressed one by one; the only remaining item, "env-configurable cache/lock-wait parameters", is an optional nitpick, which I am deferring to minimize behavioral change. Happy to add the env overrides as well if desired.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
src/actions/dashboard-realtime.ts (1)

101-102: Consider avoiding the duplicated auth read to cut extra DB pressure on the hot path.

Line [101] already runs getAllowGlobalUsageViewFromDB(), but the getOverviewData() call at Line [178] re-reads the session/permissions internally. Under polling this adds unnecessary read load, which runs against this PR's load-reduction goal. Consider splitting out an internal "overview data only" function (reusing the resolved permission result, or passing userId directly).

Also applies to: 178-179

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/actions/dashboard-realtime.ts` around lines 101 - 102, You’re currently
calling getAllowGlobalUsageViewFromDB() and then later calling getOverviewData()
which re-reads session/permissions, causing duplicate DB reads; refactor by
adding a new internal helper (e.g., getOverviewDataWithoutAuth or
getOverviewDataForUser) that accepts the already-resolved permission or
userId/flags (reuse allowGlobalUsageView / canViewGlobalData) so
getOverviewData’s hot path won’t re-query permissions, then update the call site
that currently invokes getOverviewData() (the call around where you compute
canViewGlobalData) to call the new helper or pass the permission flag to avoid
the extra DB session/permission read.

📥 Commits

Reviewing files that changed from the base of the PR and between bab5f55 and 160930e.

📒 Files selected for processing (1)
  • src/actions/dashboard-realtime.ts

@coderabbitai

coderabbitai bot commented Mar 2, 2026

@tesgth032 Thank you for the meticulous work! Confirmed that all CHANGES_REQUESTED items have been fully addressed:

Key improvements completed:

  • ✅ Fail-closed, strongly consistent auth reads
  • ✅ Polling switched to a setTimeout chain to avoid pile-ups
  • ✅ Distributed lock released safely via token + compare-and-del
  • ✅ usage_ledger trigger int32 overflow protection and write-dedup optimization
  • ✅ Business-error propagation, NaN guard, cache observability
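The "setTimeout chain" item in the checklist above can be sketched like this (a generic illustration, not the page's actual hook):

```typescript
// Poll with a setTimeout chain: the next tick is scheduled only after the
// current poll settles, so slow responses delay polling instead of letting
// overlapping requests pile up the way a fixed setInterval can.
function startPolling(poll: () => Promise<void>, intervalMs: number): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const tick = async (): Promise<void> => {
    try {
      await poll();
    } finally {
      if (!stopped) timer = setTimeout(tick, intervalMs);
    }
  };
  timer = setTimeout(tick, intervalMs);
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

The returned function both sets the stop flag (so an in-flight poll will not reschedule) and clears any pending timer.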

On the remaining nitpick:
Making the cache/lock-wait parameters env-configurable is indeed an optional optimization. Considering that:

  1. The current hardcoded values (60s TTL, 30s lock timeout, etc.) have been thoroughly validated
  2. This is a performance PR, and minimizing behavioral change is a sound engineering trade-off
  3. Future tuning can land in a separate PR

Your decision is reasonable: avoid extra configuration complexity until there is a concrete tuning need. If production later calls for targeted tuning, env overrides can be added then.

Approving this PR. 🎉

✅ Actions performed

Comments resolved and changes approved.


Labels

area:core area:UI enhancement New feature or request size/XL Extra Large PR (> 1000 lines)

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

1 participant