Skip to content

fix(logs): improve fake 200 error logs#765

Merged
ding113 merged 26 commits intoding113:refactor/fake-200-error-logsfrom
tesgth032:fix/issue-749-fake-200-html-detection
Feb 15, 2026
Merged

fix(logs): improve fake 200 error logs#765
ding113 merged 26 commits intoding113:refactor/fake-200-error-logsfrom
tesgth032:fix/issue-749-fake-200-html-detection

Conversation

@tesgth032
Copy link
Contributor

@tesgth032 tesgth032 commented Feb 11, 2026

背景(Fixes #749

部分上游在鉴权/风控/WAF/Cloudflare 等场景下会返回 HTTP 200,但 body 实际是错误页/错误 JSON(例如 text/html 的拦截页,或 JSON 中包含非空 error)。这会导致 CCH 将该请求误记为成功,从而污染熔断/故障转移统计与 session 智能绑定(后续重试可能仍复用到同一坏 provider)。

变更

  • 非流式 2xx:对 text/html/application/xhtml+xml、缺失/非法 Content-Length、以及小体积 JSON(Content-Length <= 32KiB)做最多 32KiB 的前缀嗅探:

    • 空 body => 视为失败(触发故障转移/熔断)
    • HTML 文档强信号(<!doctype html / <html)=> FAKE_200_HTML_BODY
    • JSON 顶层 error 非空 / error.message 非空 => FAKE_200_JSON_*
  • SSE 流式:保持“延迟结算”策略,等待流自然结束后对最终内容做同一套检测;若判定为假200则内部按失败结算(不改变客户端已收到的内容/状态码)。

  • 新增:假200 状态码推断(优先级低,仅在已判定假200后触发;不覆盖非200/其它检测)。

    • 用途:将内部结算/熔断/故障转移/会话绑定使用的状态码从“统一 502”修正为更贴近语义的 4xx/5xx;未命中规则则保持 502。
    • 覆盖:
      • 429:rate limit / throttling / retry-after / RESOURCE_EXHAUSTED / Cloudflare Error 1015 / 常见中英文“限流/请求过于频繁”等
      • 401:unauthorized / invalid api key / token expired / UNAUTHENTICATED / 常见中英文“鉴权失败/密钥无效”等
      • 403:forbidden / access denied / region blocked / Cloudflare Error 1020 / 常见中英文“无权限/地区限制”等
      • 402:insufficient credits / billing hard limit / 常见中英文“余额不足/欠费”等
      • 404:model/resource/endpoint not found / 常见中英文“未找到/不存在/模型不存在”等
      • 503/504/500/400/413/415/409/422/408/451:按常见错误页/错误 JSON 关键字做保守推断
  • 日志/UI:当状态码为推断结果时,显著标记为“推测/Inferred”。

    • provider chain 写入 statusCodeInferred=true
    • 技术时间线显示 Status Code (inferred)
    • logs UI 徽标:在状态码旁展示 “推测/Inferred”,LogicTrace 中在 HTTP xxx 后追加 (推测)/(inferred)
  • verboseProviderError:当系统设置开启时,对 fake-200/空响应等上游异常,在错误响应 error.details.upstreamError 中附带更详细诊断信息与上游响应原文片段(仅内存短暂保留,不写入 DB/决策链)。

附带修复(日志可读性)

  • strict endpoint policy:补齐 endpoint_pool_exhausted 的链路归因,避免被去重逻辑吞掉。
  • upstream 404:补齐 resource_not_found 在技术时间线/决策链中的 failure 展示与 i18n。

取舍 / 安全

  • 为避免额外开销,非流式仅读取最多 32KiB;因此“超大 JSON 错误体 + HTTP 200”可能漏检。
  • 状态码推断为启发式:仅在已判定假200后用于内部结算/统计与非流式错误响应;未命中推断规则保持 502。
  • 原文片段仅在 verboseProviderError 开启时回传,且不落库;回传前会做基础脱敏(Bearer/key/JWT/email 等),但仍可能包含敏感信息,仅建议在可信环境启用。

测试

  • npm run lint
  • npm run typecheck
  • npm run test
  • npm run build

Greptile Summary

This PR significantly improves fake-200 error detection and handling to prevent circuit breaker pollution and incorrect session binding when upstream services return HTTP 200 with error content.

Key Changes:

  • Non-streaming fake-200 detection: Inspects response bodies up to 32KiB for HTML documents (DOCTYPE/html tags), empty bodies, and JSON with non-empty error fields. Properly handles both HTML and JSON content types with Content-Length validation.

  • Streaming SSE fake-200 detection: Defers detection until stream completion, analyzes buffered content, and internally settles as failure without changing client-facing status code (already streamed).

  • Status code inference: Heuristically infers more semantic 4xx/5xx codes from error body content (429 for rate limits, 401 for auth failures, 403 for forbidden, 402 for payment issues, 404 for not found, etc.). Falls back to 502 if no pattern matches. Used only for internal circuit breaker/failover decisions.

  • UI/Logging improvements: Status codes marked as "Inferred" with badges in logs UI and (inferred) suffix in technical timelines. Added resource_not_found chain reason for 404s (doesn't trigger circuit breaker).

  • Verbose error details: New verboseProviderError system setting exposes sanitized upstream response snippets in error responses for debugging. Security properly handled via sanitizeErrorTextForDetail() which redacts Bearer tokens, API keys, JWTs, emails, and sensitive key-value pairs.

  • Comprehensive test coverage: 536 lines of new tests covering HTML/JSON detection, status inference, failover behavior, and verbose error details.

  • Complete i18n: All 5 locales (en, zh-CN, zh-TW, ja, ru) have matching translations for detection reasons, inferred badges, and tooltips.

Trade-offs:

  • Large JSON responses (>32KiB) are not inspected to avoid memory/latency overhead
  • 404 errors bypass circuit breaker (treated as resource issues, not provider failures)
  • Status code inference is heuristic and may not always match upstream intent

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Excellent implementation quality with comprehensive test coverage (536 new test lines), proper security handling via sanitization, complete i18n across all locales, and thoughtful architectural decisions. The fake-200 detection logic is well-bounded with conservative heuristics, security concerns from previous reviews have been properly addressed with sanitizeErrorTextForDetail(), and all edge cases are covered by tests.
  • No files require special attention

Important Files Changed

Filename Overview
src/lib/utils/upstream-error-detection.ts Core fake-200 detection logic with HTML/JSON error field checking, status code inference patterns, and sanitization. Well-structured with comprehensive error matchers.
src/app/v1/_lib/proxy/forwarder.ts Implements non-streaming fake-200 detection with body inspection up to 32KiB. Properly handles HTML/JSON content-type checking and Content-Length validation.
src/app/v1/_lib/proxy/response-handler.ts Streaming SSE fake-200 detection after stream completion. Correctly infers status codes, updates circuit breaker, and handles 404 as resource_not_found without circuit breaker impact.
src/app/v1/_lib/proxy/error-handler.ts Verbose error details handling with proper sanitization via sanitizeErrorTextForDetail(). Only exposes sanitized rawBody when verboseProviderError is enabled. Security concerns addressed.
src/app/v1/_lib/proxy/errors.ts ProxyError.getClientSafeMessage() properly maps FAKE_200_* codes to user-friendly messages with sanitized upstream snippets. Adds statusCodeInferred field to upstreamError.
tests/unit/proxy/proxy-forwarder-fake-200-html.test.ts Comprehensive test coverage for fake-200 detection including HTML documents, JSON errors with/without Content-Length, status code inference (429/401/403), and failover behavior.
src/lib/utils/provider-chain-formatter.ts Adds support for resource_not_found reason and formats inferred status codes with proper i18n. Handles timeline display with (inferred) suffix.
src/app/[locale]/dashboard/logs/_components/provider-chain-popover.tsx UI component displaying status code with "Inferred" badge when statusCodeInferred is true. Proper i18n integration and tooltip explanations.

Flowchart

flowchart TD
    A[Upstream Response HTTP 200] --> B{Is Streaming SSE?}
    B -->|No| C[Non-Streaming Path]
    B -->|Yes| D[Streaming Path]
    
    C --> E{Check Content-Type}
    E -->|HTML or Missing Content-Length| F[Clone & Read up to 32KiB]
    E -->|JSON with valid Content-Length ≤32KiB| F
    E -->|Other| G[Pass Through]
    
    F --> H{Detect Fake-200?}
    H -->|HTML DOCTYPE/Tag| I[FAKE_200_HTML_BODY]
    H -->|Empty Body| J[FAKE_200_EMPTY_BODY]
    H -->|JSON error field non-empty| K[FAKE_200_JSON_ERROR_NON_EMPTY]
    H -->|No Error| G
    
    I --> L[Infer Status Code from Body]
    J --> L
    K --> L
    
    L --> M{Keyword Match?}
    M -->|Rate Limit| N[429]
    M -->|Auth Failed| O[401]
    M -->|Forbidden| P[403]
    M -->|Payment| Q[402]
    M -->|Not Found| R[404]
    M -->|No Match| S[502 Default]
    
    N --> T[Trigger Failover]
    O --> T
    P --> T
    Q --> T
    R --> U[Skip Circuit Breaker]
    S --> T
    
    T --> V[Record Circuit Breaker Failure]
    U --> W[Mark resource_not_found]
    V --> X[Update Provider Chain]
    W --> X
    
    D --> Y[Stream to Client]
    Y --> Z{Stream Ends Naturally?}
    Z -->|Yes| AA[Read Buffered Content]
    Z -->|No| AB[Handle as Error]
    
    AA --> AC{Detect Fake-200?}
    AC -->|Yes| AD[Infer Status Code]
    AC -->|No| AE[Record Success]
    
    AD --> AF{Status Code?}
    AF -->|404| AG[Skip Circuit Breaker]
    AF -->|Other| AH[Record Failure]
    
    AH --> AI[Update Provider Chain with Inferred Code]
    AG --> AI
    AE --> AJ[Update Session Binding]
    
    G --> AK[Return Success Response]
Loading

Last reviewed commit: 25978b0

@coderabbitai
Copy link

coderabbitai bot commented Feb 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

为非流式上游响应添加有限字节读取与解码(32 KiB 上限),新增 HTML 文档嗅探与 BOM 处理,在转发器中将强烈的伪 200 信号(如 FAKE_200_HTML_BODY/FAKE_200_JSON_*)映射为失败以触发故障切换,并扩展相关单元测试与测试桩。

Changes

Cohort / File(s) Summary
配置更新
biome.json
更新 Biome JSON schema URL 至 https://biomejs.dev/schemas/2.3.14/schema.json
核心代理转发器
src/app/v1/_lib/proxy/forwarder.ts
新增非流式响应有限读取与解码(NON_STREAM_BODY_INSPECTION_MAX_BYTESreadResponseTextUpTo())、Content-Type 规范化与 HTML 判断;在无 Content-Length 时做部分体检查并重用检测文本;引入并使用 detectUpstreamErrorFromSseOrJsonText 以识别并对强伪 200(HTML/JSON)抛出 ProxyError 触发故障转移;记录 attemptNumber 至会话链条。
上游错误检测工具
src/lib/utils/upstream-error-detection.ts, src/lib/utils/upstream-error-detection.test.ts
新增 HTML 文档嗅探(BOM 处理、isLikelyHtmlDocument、DOCTYPE/<html> 正则、HTML_DOC_SNIFF_MAX_CHARS)并将 FAKE_200_HTML_BODY 作为优先信号;测试覆盖 BOM、HTML 与混合内容场景。
错误映射与客户端消息
src/app/v1/_lib/proxy/errors.ts
ProxyError.getClientSafeMessage():当 statusCode 为 502 且消息以 FAKE_200_ 开头时,返回统一的客户端安全提示字符串。
提供商故障与转发测试
tests/unit/proxy/proxy-forwarder-fake-200-html.test.ts, tests/unit/proxy/proxy-forwarder-endpoint-audit.test.ts
新增针对伪 200 HTML/JSON/缺失字段的完整单元测试,包含大量依赖模拟与断言提供商切换行为;新增/重复的 endpoint_pool_exhausted 测试确保审计条目不被去重。
探针与断路器测试桩
tests/unit/lib/provider-endpoints/probe.test.ts
测试中扩展了对端点断路器模块的模拟,添加 getEndpointCircuitStateSyncresetEndpointCircuit 的模拟实现。
提供商链本地化与展示
messages/*/provider-chain.json, src/lib/utils/provider-chain-formatter.ts, src/lib/utils/provider-chain-formatter.test.ts, src/app/[locale]/dashboard/.../provider-chain-popover.tsx, LogicTraceTab.tsx
新增 resource_not_found(资源不存在 404)相关本地化键(多语言),在 provider-chain 格式化器与日志/弹窗展示中将 resource_not_foundendpoint_pool_exhausted 视为失败/实际请求,更新相应格式化与测试;若干测试重复插入需注意。

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 64.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ⚠️ Warning PR标题'fix(logs): improve fake 200 error logs'与主要变更不匹配。实际变更重点是在代理层(forwarder)检测和处理假200错误以触发故障转移,而非仅改进日志显示。 建议修改标题为更准确的描述,如'fix(proxy): detect and handle fake 200 HTML/JSON errors'或'fix(proxy): improve fake 200 error detection and failover',更好地反映核心变更目的。
✅ Passed checks (3 passed)
Check name Status Explanation
Linked Issues check ✅ Passed 代码变更全面实现了Issue #749所述的需求:检测假200并触发故障转移,包括HTML文档检测、JSON错误检测、BOM处理及会话绑定修复。
Out of Scope Changes check ✅ Passed PR中所有代码变更都与假200检测、故障转移及相关测试直接相关,包括localization消息文件、日志UI、schema更新等都在合理范围内,无明显超范围变更。
Description check ✅ Passed PR 描述详细阐述了背景、变更内容、测试覆盖和取舍考虑,与代码变更高度相关。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @tesgth032, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

此拉取请求旨在解决代理服务中一个关键的可靠性问题:上游服务可能返回HTTP 200状态码,但实际内容却是HTML错误页面或空响应,导致代理层误判为成功。通过引入智能的HTML错误页面检测、增强统一的错误识别工具以及改进空响应的处理逻辑,本变更确保了代理能够准确识别这些“假200”情况,并及时触发故障转移和熔断机制,从而显著提升了系统的稳定性和用户体验。

Highlights

  • 新增假200 HTML检测: 引入了对上游返回的HTTP 200状态码但实际内容为HTML错误页的检测机制,以避免误判为成功响应。
  • 增强错误检测工具: detectUpstreamErrorFromSseOrJsonText 函数现在能够通过识别doctype和html标签来更准确地判断是否为HTML错误文档。
  • 修复空响应错误处理: 确保在非流式响应中,EmptyResponseError 不再被JSON解析的catch块意外捕获,从而正确触发故障转移。
  • 优化内存安全性: 为非流式响应体检查设置了最大字节限制(1 MiB),以防止在处理超大响应时出现内存占用失控的问题。
Changelog
  • biome.json
    • 更新了Biome配置文件的JSON Schema版本。
  • src/app/v1/_lib/proxy/forwarder.ts
    • 引入了detectUpstreamErrorFromSseOrJsonText工具,新增了readResponseTextUpTo函数用于受限读取响应体,并实现了对HTML内容类型和无Content-Length响应体的“假200”检测逻辑,同时修正了EmptyResponseError的捕获行为。
  • src/lib/utils/upstream-error-detection.test.ts
    • 增加了针对HTML假200检测和JSON内容中包含HTML标签但不应误判的单元测试。
  • src/lib/utils/upstream-error-detection.ts
    • 扩展了错误码定义,添加了isLikelyHtmlDocument辅助函数,并在detectUpstreamErrorFromSseOrJsonText中加入了HTML文档的强信号识别逻辑。
  • tests/unit/lib/provider-endpoints/probe.test.ts
    • 为断路器相关的模拟函数添加了getEndpointCircuitStateSync和resetEndpointCircuit。
  • tests/unit/proxy/proxy-forwarder-fake-200-html.test.ts
    • 新增了测试文件,专门用于验证ProxyForwarder在处理假200 HTML响应和缺失内容时故障转移的正确性。
Activity
  • 目前没有人工活动记录。
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions github-actions bot added bug Something isn't working area:core labels Feb 11, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

这个 Pull Request 引入了一种机制来检测实际上是 HTML 错误页的“假 200”响应,这对代理的可靠性和故障转移逻辑是一个很大的改进。实现很可靠,包括内存安全的响应体嗅探和添加全面的单元测试。我还注意到一个修复,防止 EmptyResponseError 被吞掉,这是对错误处理的一个关键修正。

我在 src/lib/utils/upstream-error-detection.ts 中有一个建议,可以让 HTML 检测逻辑更健壮,防止潜在的误报。总的来说,这是一个高质量的贡献。

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@github-actions github-actions bot added the size/M Medium PR (< 500 lines) label Feb 11, 2026
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

No significant issues identified in this PR. The implementation is well-structured and conservative: HTML document detection uses strong signals only (doctype/html tags), the memory-safe body inspection has a sensible 1 MiB cap, and the EmptyResponseError re-throw fix correctly addresses a real bug where missing_content errors were being silently swallowed by the JSON parse catch block.

PR Size: M

  • Lines changed: 521 (499 additions, 22 deletions)
  • Files changed: 6

Review Coverage

  • Logic and correctness - Clean. SSE early-return ensures non-streaming checks are properly guarded. The isHtml || !contentLength inspection condition correctly covers both HTML content-type and missing content-length scenarios. The response.clone() approach preserves the original response for downstream consumption.
  • Security (OWASP Top 10) - Clean. The truncateForDetail and sanitizeErrorTextForDetail functions in upstream-error-detection.ts properly redact sensitive data before including error details.
  • Error handling - Clean. The readResponseTextUpTo empty catch for reader.cancel() follows established codebase patterns (multiple identical patterns in the same file). The EmptyResponseError re-throw fix is a legitimate bug fix.
  • Type safety - Clean. No any usage in new code. Types are properly narrowed.
  • Documentation accuracy - Clean. Comments accurately describe the detection logic and its limitations.
  • Test coverage - Adequate. New tests cover the fake 200 HTML detection and the missing_content swallowing fix. The isLikelyHtmlDocument function has both positive and negative test cases (including a test ensuring JSON containing <html> text is not falsely detected).
  • Code clarity - Good. The readResponseTextUpTo utility is well-separated, the inspection flow is linear and easy to follow.

Automated review by Claude AI

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/app/v1/_lib/proxy/forwarder.ts`:
- Around line 769-775: The ProxyError is currently constructed with the internal
detection code (detected.code) in the isStrongFake200 branch which exposes
internal tokens like "FAKE_200_HTML_BODY" to clients; update the throw in
forwarder.ts so that when isStrongFake200 is true you pass a user-friendly
message (e.g. "Upstream service returned an invalid response") instead of
detected.code while keeping the 502 status and the existing metadata (body,
providerId, providerName), or alternately add a 502-specific safe mapping inside
getClientSafeMessage() to map any 502/FAKE_200_* detector codes to a generic
client message; target symbols: forwarder.ts where isStrongFake200 is checked,
ProxyError constructor invocation, and getClientSafeMessage if choosing the
mapping approach.
🧹 Nitpick comments (2)
src/app/v1/_lib/proxy/forwarder.ts (2)

102-150: readResponseTextUpTo 实现整体合理,有一个小问题值得注意

remaining <= 0(第 122 行)的检查发生在读取到一个新 chunk 之后、但尚未解码之前——这意味着该 chunk 会被静默丢弃。对于启发式嗅探来说这是可接受的,但如果后续有人复用此函数做精确截断可能会产生困惑,建议加一行注释说明。

另外,当 truncated = false(即流正常结束)时,reader 既没有 cancel() 也没有 releaseLock()。由于调用方使用的是 response.clone() 的 body,GC 应能回收,但显式释放更稳健:

建议:正常结束时也释放 reader
   if (truncated) {
     try {
       await reader.cancel();
     } catch {
       // ignore
     }
+  } else {
+    reader.releaseLock();
   }

740-750: !contentLength 且响应体较大时,response.clone() 会导致整个 body 在内存中 tee 化

第 742 行的条件 isHtml || !contentLength 会对所有缺少 Content-Length 的非 SSE 响应触发 response.clone()。虽然 readResponseTextUpTo 在 32 KiB 后会 cancel() 已克隆的分支,但 Response.clone() 底层使用 ReadableStream.tee()——在某些运行时中,被取消分支的背压处理可能导致原始分支缓冲额外数据。

对于正常的 LLM JSON 响应(通常几 KB ~ 几十 KB),这不是问题;但如果上游返回了一个很大的非流式响应(无 Content-Length),可能会有短暂的内存峰值。当前 32 KiB 上限已经做了合理的防护,只是建议在注释中补充说明这一 trade-off,便于后续维护者理解。

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

- endpoint_pool_exhausted 写入 attemptNumber,避免被 initial_selection/session_reuse 去重吞掉\n- 决策链/技术时间线补齐 resource_not_found 的失败态与说明\n- 更新 provider-chain i18n 文案并新增单测覆盖
@tesgth032 tesgth032 changed the title fix(proxy): 识别 200+HTML 的假200并触发故障转移 fix(proxy/logs): 假200识别 + endpoint_pool_exhausted/404 错因展示 [未完成] Feb 11, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
messages/ru/provider-chain.json (1)

42-42: ⚠️ Potential issue | 🟡 Minor

俄语 locale 中存在预先存在的语法问题(非本 PR 引入)。

"Пул конечная точкаов исчерпан" 语法不正确,正确的俄语应为 "Пул конечных точек исчерпан"("конечная точка" 的复数属格形式应为 "конечных точек")。同样的拼写错误出现在文件多处(如 endpoint_pool_exhaustedendpointStats*strictBlock* 等行)。建议在后续 PR 中统一修正。

🧹 Nitpick comments (2)
src/lib/utils/provider-chain-formatter.ts (1)

443-482: resource_not_found 时间线块逻辑正确,可考虑抽取与 retry_failed 的公共格式化逻辑。

该块与下方 retry_failed(485-550 行)在 provider 详情格式化、耗时计算、upstreamParsed/upstreamBody 渲染、请求详情附加等方面高度重复,主要区别在于缺少熔断器状态展示和使用不同的 header/note 文案。当前实现功能无误,但如果后续有更多类似 reason 类型加入,建议提取一个通用的 formatProviderErrorBlock 辅助函数以减少重复。

src/lib/utils/provider-chain-formatter.test.ts (1)

278-327: resource_not_found 测试覆盖良好,与现有测试模式一致。

新增测试覆盖了三个核心格式化函数(formatProviderSummaryformatProviderDescriptionformatProviderTimeline)对 resource_not_found 原因的处理,结构清晰,与上方 endpoint_pool_exhausted 的测试组织方式保持一致。

可考虑补充以下场景以增强覆盖率:

  • resource_not_found 与成功重试组合的多 provider 链路(类似 endpoint_pool_exhausted 中的 renders exhausted alongside successful retry in multi-provider chain
  • 缺少 errorDetails.provider 时的降级渲染路径(formatProviderTimeline 中的 else 分支)

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

17 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

- 对 application/json 且 Content-Length<=32KiB 的 2xx 响应也做强信号嗅探\n- 补齐 200+JSON error(带 Content-Length)触发故障转移的回归测试
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

31 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

@tesgth032
Copy link
Contributor Author

tesgth032 commented Feb 11, 2026

已按 Greptile 的两条建议修正:

  • forwarder.ts: Content-Length 仅接受纯数字且 safe integer;非法值视为无效,并按“缺失/非法 Content-Length”走 body inspection,避免漏检 HTML 假200/空响应;已补回归单测。
  • errors.ts: FAKE_200_* 的 client safe detail 增加防御性处理(whitespace 归一化 + 200 字符二次截断 + 轻量脱敏),避免未来误把未脱敏大段原文塞进 upstreamError.body 导致泄露。

本地已跑 npm run lint / typecheck / test / build 均通过。

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

31 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link

greptile-apps bot commented Feb 11, 2026

Additional Comments (1)

src/lib/utils/upstream-error-detection.ts
Export sanitizeErrorTextForDetail for use in error-handler

This function needs to be exported so error-handler.ts can sanitize rawBody before exposing it to clients via the verboseProviderError feature.

export function sanitizeErrorTextForDetail(text: string): string {
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/utils/upstream-error-detection.ts
Line: 107:107

Comment:
Export `sanitizeErrorTextForDetail` for use in error-handler

This function needs to be exported so `error-handler.ts` can sanitize `rawBody` before exposing it to clients via the `verboseProviderError` feature.

```suggestion
export function sanitizeErrorTextForDetail(text: string): string {
```

How can I resolve this? If you propose a fix, please make it concise.

@tesgth032
Copy link
Contributor Author

tesgth032 commented Feb 11, 2026

补充说明:

  • error.details.upstreamError.rawBody 仅在系统设置 verboseProviderError 开启时返回;默认关闭时仍只返回 client-safe 错误信息(含二次截断 + 轻量脱敏的上游片段)。
  • rawBody 仅在内存中短暂保留,用于本次错误响应排查;不会写入数据库/决策链/日志。
  • 出于安全考虑,rawBody 回传前会做基础脱敏(Bearer/key/JWT/email 等),但仍可能包含敏感信息,仅建议在可信环境启用。
  • 已更新设置页 i18n 文案,提示该模式的风险与适用场景。

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

31 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

31 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link

greptile-apps bot commented Feb 12, 2026

Additional Comments (1)

src/app/v1/_lib/proxy/forwarder.ts
Streaming responses return immediately before fake-200 detection can occur. Detection happens later internally to update circuit breaker stats, but client already received HTTP 200. This is correctly documented in the UI.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/forwarder.ts
Line: 733:741

Comment:
Streaming responses return immediately before fake-200 detection can occur. Detection happens later internally to update circuit breaker stats, but client already received HTTP 200. This is correctly documented in the UI.

How can I resolve this? If you propose a fix, please make it concise.

@tesgth032 tesgth032 changed the title fix(proxy/logs): 假200识别 + endpoint_pool_exhausted/404 错因展示 fix(proxy/logs): 假200识别 + 状态码推断 + endpoint_pool_exhausted/404 错因展示 [未完成] Feb 12, 2026
@tesgth032 tesgth032 changed the title fix(proxy/logs): 假200识别 + 状态码推断 + endpoint_pool_exhausted/404 错因展示 [未完成] fix(proxy/logs): 假200识别 + 状态码推断 + endpoint_pool_exhausted/404 错因展示 Feb 12, 2026
@ding113 ding113 changed the title fix(proxy/logs): 假200识别 + 状态码推断 + endpoint_pool_exhausted/404 错因展示 fix(logs): improve fake 200 error logs Feb 15, 2026
@ding113 ding113 changed the base branch from dev to refactor/fake-200-error-logs February 15, 2026 09:07
@ding113 ding113 merged commit 099cce9 into ding113:refactor/fake-200-error-logs Feb 15, 2026
2 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Feb 15, 2026
ding113 added a commit that referenced this pull request Feb 15, 2026
* fix(proxy): 识别 200+HTML 假200并触发故障转移

* fix(utils): 收紧 HTML 文档识别避免误判

* fix(proxy): 非流式假200补齐强信号 JSON error 检测

* fix(utils): 假200检测兼容 BOM

* perf(proxy): 降低非流式嗅探读取上限

* fix(proxy): 客户端隐藏 FAKE_200_* 内部码

* fix(logs): 补齐 endpoint_pool_exhausted/404 错因展示

- endpoint_pool_exhausted 写入 attemptNumber,避免被 initial_selection/session_reuse 去重吞掉\n- 决策链/技术时间线补齐 resource_not_found 的失败态与说明\n- 更新 provider-chain i18n 文案并新增单测覆盖

* fix(proxy): 非流式 JSON 假200检测覆盖 Content-Length

- 对 application/json 且 Content-Length<=32KiB 的 2xx 响应也做强信号嗅探\n- 补齐 200+JSON error(带 Content-Length)触发故障转移的回归测试

* chore: format code (fix-issue-749-fake-200-html-detection-005fad3)

* fix(i18n): 修正 ru 端点池耗尽文案

- 修正俄语中 endpoint 的复数属格拼写(конечных точек)\n- 不影响 key,仅更新展示文案

* test(formatter): 补齐 resource_not_found 组合场景覆盖

- 覆盖 resource_not_found + retry_success 多供应商链路\n- 覆盖缺少 errorDetails.provider 的降级渲染路径

* fix(proxy): FAKE_200 客户端提示附带脱敏片段

* fix: 改进 FAKE_200 错误原因提示

* fix(proxy): verboseProviderError 回传假200原文

- fake-200/空响应:verboseProviderError 开启时在 error.details 返回详细报告与上游原文(不落库)\n- forwarder: 将检测到的原文片段挂到 ProxyError.upstreamError.rawBody\n- tests: 覆盖 verbose details 与 rawBody 透传

* fix(proxy): 强化 Content-Length 校验与假200片段防泄露

- forwarder: 将非法 Content-Length 视为无效,避免漏检 HTML/空响应\n- errors: FAKE_200 客户端 detail 二次截断 + 轻量脱敏(防御性)\n- tests: 覆盖非法 Content-Length 漏检回归

* docs(proxy): 说明非流式假200检测上限

* docs(settings): 补充 verboseProviderError 安全提示

* fix(proxy): verboseProviderError rawBody 基础脱敏

* chore: format code (fix-issue-749-fake-200-html-detection-b56b790)

* docs(settings): 说明 verboseProviderError 基础脱敏

* fix(proxy/logs): 假200 推断状态码并显著标记

* fix(i18n): 回退 verboseProviderErrorDesc 原始文案

* fix(stream): 404 资源不存在不计入熔断

---------

Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ding <44717411+ding113@users.noreply.github.com>
ding113 added a commit that referenced this pull request Feb 15, 2026
* fix(logs): improve fake 200 error logs (#765)

* fix(proxy): 识别 200+HTML 假200并触发故障转移

* fix(utils): 收紧 HTML 文档识别避免误判

* fix(proxy): 非流式假200补齐强信号 JSON error 检测

* fix(utils): 假200检测兼容 BOM

* perf(proxy): 降低非流式嗅探读取上限

* fix(proxy): 客户端隐藏 FAKE_200_* 内部码

* fix(logs): 补齐 endpoint_pool_exhausted/404 错因展示

- endpoint_pool_exhausted 写入 attemptNumber,避免被 initial_selection/session_reuse 去重吞掉\n- 决策链/技术时间线补齐 resource_not_found 的失败态与说明\n- 更新 provider-chain i18n 文案并新增单测覆盖

* fix(proxy): 非流式 JSON 假200检测覆盖 Content-Length

- 对 application/json 且 Content-Length<=32KiB 的 2xx 响应也做强信号嗅探\n- 补齐 200+JSON error(带 Content-Length)触发故障转移的回归测试

* chore: format code (fix-issue-749-fake-200-html-detection-005fad3)

* fix(i18n): 修正 ru 端点池耗尽文案

- 修正俄语中 endpoint 的复数属格拼写(конечных точек)\n- 不影响 key,仅更新展示文案

* test(formatter): 补齐 resource_not_found 组合场景覆盖

- 覆盖 resource_not_found + retry_success 多供应商链路\n- 覆盖缺少 errorDetails.provider 的降级渲染路径

* fix(proxy): FAKE_200 客户端提示附带脱敏片段

* fix: 改进 FAKE_200 错误原因提示

* fix(proxy): verboseProviderError 回传假200原文

- fake-200/空响应:verboseProviderError 开启时在 error.details 返回详细报告与上游原文(不落库)\n- forwarder: 将检测到的原文片段挂到 ProxyError.upstreamError.rawBody\n- tests: 覆盖 verbose details 与 rawBody 透传

* fix(proxy): 强化 Content-Length 校验与假200片段防泄露

- forwarder: 将非法 Content-Length 视为无效,避免漏检 HTML/空响应\n- errors: FAKE_200 客户端 detail 二次截断 + 轻量脱敏(防御性)\n- tests: 覆盖非法 Content-Length 漏检回归

* docs(proxy): 说明非流式假200检测上限

* docs(settings): 补充 verboseProviderError 安全提示

* fix(proxy): verboseProviderError rawBody 基础脱敏

* chore: format code (fix-issue-749-fake-200-html-detection-b56b790)

* docs(settings): 说明 verboseProviderError 基础脱敏

* fix(proxy/logs): 假200 推断状态码并显著标记

* fix(i18n): 回退 verboseProviderErrorDesc 原始文案

* fix(stream): 404 资源不存在不计入熔断

---------

Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ding <44717411+ding113@users.noreply.github.com>

* fix: add missing import for inferUpstreamErrorStatusCodeFromText

The function inferUpstreamErrorStatusCodeFromText was used in
response-handler.ts but was not imported, causing a TypeScript
compilation error during build.

Fixed:
- Added inferUpstreamErrorStatusCodeFromText to imports from @/lib/utils/upstream-error-detection

CI Run: https://github.com/ding113/claude-code-hub/actions/runs/22033028838

* fix(proxy): deduplicate getFake200ReasonKey and strengthen client-facing sanitization

Extract duplicated getFake200ReasonKey() from SummaryTab and
ProviderChainPopover into a shared fake200-reason.ts utility,
eliminating the risk of silent drift when new FAKE_200_* codes are added.

Replace the 3-pattern manual sanitization in getClientSafeMessage()
with the existing sanitizeErrorTextForDetail() (6 patterns), closing
a gap where JWT tokens, emails, and password/config paths could leak
to clients via the FAKE_200 error detail path.

Add unit tests verifying JWT, email, and password sanitization.

* fix(proxy): address bugbot review comments on fake-200 error handling

- Add i18n for HTTP status prefix in LogicTraceTab (5 languages)
- Wrap verbose details gathering in try-catch to prevent cascading failures
- Truncate rawBody to 4096 chars before sanitization in error-handler
- Tighten not_found regex to require contextual prefixes, preventing false 404 inference
- Add debug logging to silent catch blocks in readResponseTextUpTo
- Add test assertion for fake200DetectedReason display

---------

Co-authored-by: tesgth032 <tesgth032@hotmail.com>
Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core bug Something isn't working size/M Medium PR (< 500 lines)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants