Copilot AI commented Dec 10, 2025

The LaTeX preprocessor regex was matching currency symbols like $20/月 and $10/月 as LaTeX delimiters, breaking markdown tables containing pricing information.

Changes

Two-pass processing

  • Pass 1: Match display math ($$...$$) and bracket notation (\[...\], \(...\))
  • Pass 2: Match inline math ($...$) with negative lookbehind/lookahead to prevent $$ conflicts
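Pass 1 can be sketched in Swift as follows. This assumes, without confirmation from this PR, that GMarkPreprocessor uses Foundation's `NSRegularExpression`. The function name `wrapDisplayAndBracketMath` is hypothetical, and the `<LaTex>...</LaTex>` output format is taken from the examples in this PR:

```swift
import Foundation

// Pass 1 (hypothetical helper): wrap display math ($$...$$) and bracket
// notation (\[...\], \(...\)) in <LaTex> tags.
func wrapDisplayAndBracketMath(_ text: String) -> String {
    // Raw string literal, so each backslash reaches the regex engine as written.
    // [\s\S]+? matches any character, newlines included, lazily, so each match
    // stops at the nearest closing delimiter.
    let pattern = #"\$\$[\s\S]+?\$\$|\\\[[\s\S]+?\\\]|\\\([\s\S]+?\\\)"#
    let regex = try! NSRegularExpression(pattern: pattern)
    let fullRange = NSRange(text.startIndex..., in: text)
    // In the template, $0 stands for the entire match.
    return regex.stringByReplacingMatches(
        in: text, range: fullRange, withTemplate: "<LaTex>$0</LaTex>")
}
```

When this pass runs first, every `$$` pair is already handled before the inline pass starts. The inline pattern's adjacency guards then prevent it from treating either `$` of a `$$` pair as an inline delimiter.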

Inline pattern with adjacency guards

// Prevents matching $ that's adjacent to another $
let inlinePattern = "(?<!\\$)\\$(?!\\$)([^$\\n]+?)\\$(?!\\$)"
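One way to see what the adjacency guards do and do not catch is to list the candidates the inline pattern finds. This sketch again assumes `NSRegularExpression`. `inlineCandidates(in:)` is a hypothetical helper, and `$` is escaped inside the character class, which is equivalent to the pattern above:

```swift
import Foundation

// The guarded inline pattern as a raw string; [^\$\n] is equivalent to [^$\n].
let inlineRegex = try! NSRegularExpression(
    pattern: #"(?<!\$)\$(?!\$)([^\$\n]+?)\$(?!\$)"#)

// Returns the captured content (group 1) of every inline-math candidate.
func inlineCandidates(in text: String) -> [String] {
    let fullRange = NSRange(text.startIndex..., in: text)
    return inlineRegex.matches(in: text, range: fullRange).compactMap { match in
        Range(match.range(at: 1), in: text).map { String(text[$0]) }
    }
}

print(inlineCandidates(in: "$x$"))
print(inlineCandidates(in: "$$E = mc^2$$"))
print(inlineCandidates(in: "| ($20/月) | (最低$10/月) |"))
```

The `$$E = mc^2$$` input yields no candidates, because each `$` either precedes or follows another `$`. The table row from the issue still yields one candidate spanning both prices. The guards alone therefore do not fix the bug, which is why the content validation is also needed.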

Content validation

  • Reject currency patterns: digits with separators (/, -, ,) and CJK characters (\u4e00-\u9fff)
  • Reject content containing table delimiters (|)
  • Require LaTeX indicators: backslashes, ^, _, or math operators with letters
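The three validation rules above can be sketched as a filter applied to each inline candidate before it is wrapped. `isLikelyLaTeX(_:)` and `wrapInlineMath(_:)` are hypothetical names. The exact details, such as which separators and operators count, are assumptions and not the PR's actual checks:

```swift
import Foundation

// Guarded inline pattern; [^\$\n] is equivalent to the PR's [^$\n].
let inlineRegex = try! NSRegularExpression(
    pattern: #"(?<!\$)\$(?!\$)([^\$\n]+?)\$(?!\$)"#)

// Hypothetical validator: true only for content that plausibly is LaTeX.
func isLikelyLaTeX(_ content: String) -> Bool {
    // A table cell delimiter means the candidate spans table cells.
    if content.contains("|") { return false }
    // CJK ideographs (U+4E00...U+9FFF), as in "20/月", indicate prose, not math.
    if content.unicodeScalars.contains(where: { (0x4E00...0x9FFF).contains($0.value) }) {
        return false
    }
    // Only digits and separators, e.g. "10" or "1,000-2,000", reads as a price.
    if content.allSatisfy({ $0.isNumber || "/-,. ".contains($0) }) { return false }
    // Otherwise require a LaTeX indicator: a backslash, ^, or _ ...
    if content.contains(where: { "\\^_".contains($0) }) { return true }
    // ... or a math operator together with at least one letter.
    let hasOperator = content.contains(where: { "+-*/=<>".contains($0) })
    let hasLetter = content.contains(where: { $0.isLetter })
    return hasOperator && hasLetter
}

// Wraps validated inline math in <LaTex> tags and copies everything else verbatim.
func wrapInlineMath(_ text: String) -> String {
    var result = ""
    var cursor = text.startIndex  // first character not yet copied to result
    let fullRange = NSRange(text.startIndex..., in: text)
    for match in inlineRegex.matches(in: text, range: fullRange) {
        guard let whole = Range(match.range, in: text),
              let inner = Range(match.range(at: 1), in: text),
              isLikelyLaTeX(String(text[inner])) else { continue }
        result.append(contentsOf: text[cursor..<whole.lowerBound])
        result += "<LaTex>" + String(text[whole]) + "</LaTex>"
        cursor = whole.upperBound
    }
    result.append(contentsOf: text[cursor...])
    return result
}
```

This match-then-validate design has a limitation. `NSRegularExpression` does not rescan text inside a match that the validator rejects. As a result, when a price is followed by real math on the same line (for example `$5 and $x^2$`), this sketch leaves the math unwrapped.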

Example

Before: Currency symbols incorrectly wrapped

| ($20/月) | (最低$10/月) |
→ | (<LaTex>$20/月) | (最低$</LaTex>10/月) |

After: Currency preserved, LaTeX still works

| ($20/月) | (最低$10/月) |
→ | ($20/月) | (最低$10/月) |

$x^2 + y^2 = z^2$ and $$E = mc^2$$
→ <LaTex>$x^2 + y^2 = z^2$</LaTex> and <LaTex>$$E = mc^2$$</LaTex>
Original prompt

This section details the original issue you should resolve

<issue_title>LaTeX regex matching is broken</issue_title>
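<issue_description>```

📝 In-depth comparison of core dimensions

| Dimension | ChatGPT | DeepSeek | 豆包 (Doubao) | Midjourney |
|---|---|---|---|---|
| Core strengths | All-rounder with no weak spots; the most complete ecosystem | King of cost-effectiveness and reasoning; open-source technology for everyone | Voice interaction and emotional companionship; exceptionally smooth mobile experience | World-class image aesthetics and artistry |
| Typical weaknesses | Expensive; hard to access from mainland China | Multimodal generation (images/video) is not currently a focus | Average at complex logical reasoning and programming tasks | Low control precision (hard to edit exactly the spot you point at); complex interaction |
| AI architecture | Transformer (Decoder-only)<br>Dynamic chain of thought | MoE (Mixture of Experts)<br>MLA (Multi-head Latent Attention) | MoE (Skylark/云雀 model)<br>Optimized for voice/short text | Diffusion Model<br>Latent diffusion + CLIP |
| Main use cases | Workflow automation, enterprise applications, general Q&A | Programming assistance, academic research, low-cost API access | Spoken-English practice, casual chat, daily-life assistant | Creative design, game concept art, poster design |
| Pricing | Free basic tier / Plus subscription ($20/月) | Completely free (web) / very low-cost API | Completely free | Paid subscription required (最低$10/月) |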

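🔍 2025 market trend analysis

  1. Architecture evolution: MoE goes mainstream
    Leading apps such as DeepSeek, Doubao and Kimi all use a MoE (Mixture of Experts) architecture [^1][^19]. This architecture sharply lowers inference cost while preserving high performance. It is the key technical reason DeepSeek can be offered free of charge with fast responses.
  2. Interaction shift: from Chat to Agent
    Claude and ChatGPT are driving a shift from plain "conversation" toward "agents". Claude's "Computer Use" feature lets the AI operate computer software directly. This represents a new direction for AI applications: not just generating content, but executing tasks [^9].
  3. The rise of Chinese AI
    In early 2025, DeepSeek's daily active users surpassed Doubao's and approached ChatGPT's. This marks a comprehensive breakthrough for Chinese AI, from the application layer (e.g., Meitu, Doubao) into the core technology layer (reasoning models) [^11][^12]. Chinese apps hold half of the Top 10 list (DeepSeek, Doubao, Kimi, Kling, Quark, Yuanbao).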
```
In the markdown above, GMarkPreprocessor matches the text between the $ in "$20/月" and the $ in "最低$10/月" as LaTeX.</issue_description>

## Comments on the Issue (you are @copilot in this section)

<comments>
</comments>



Copilot AI and others added 2 commits December 10, 2025 18:16
- Implemented two-pass processing: first for display math ($$), then for inline math ($)
- Added negative lookbehind/lookahead to prevent matching $ adjacent to another $
- Added validation to distinguish currency (e.g., $20/月, $10) from actual LaTeX expressions
- Currency patterns with just numbers and separators are now excluded
- LaTeX expressions must contain backslashes, superscripts, subscripts, or math operators with letters
- Added check to reject matches containing table cell delimiters (|)
- All test cases pass including the original issue scenario

Co-authored-by: GIKICoder <12043730+GIKICoder@users.noreply.github.com>
- Remove redundant negative lookbehind in inline pattern regex
- Add documentation comment for CJK Unicode range (\u4e00-\u9fff)
- Fix hyphen placement in character class to avoid escaping issues
- All tests still pass after these improvements

Co-authored-by: GIKICoder <12043730+GIKICoder@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix issues with LaTeX regular expressions" to "Fix LaTeX regex incorrectly matching currency symbols ($20/月, $10/月)" on Dec 10, 2025
Copilot AI requested a review from GIKICoder December 10, 2025 18:22

Development

Successfully merging this pull request may close these issues.

LaTeX regex matching is broken

2 participants