BYVoid · BYVoid · Jan 26, 2026 · Jan 1, 2026 · Jan 26, 2026
diff --git a/.claude/skills/opencc-fix-translation-workflow.md b/.claude/skills/opencc-fix-translation-workflow.md
@@ -0,0 +1,85 @@
+---
+name: opencc-fix-translation-workflow
+description: OpenCC translation fix and complete release workflow
+tags: [opencc, workflow, debugging]
+---
+
+# OpenCC Translation Fix Standard Operating Procedure
+
+This skill describes the complete lifecycle for fixing OpenCC conversion errors (such as "方程式" becoming "方程序"), including core dictionary correction, testing, and verification.
+
+## 1. Problem Diagnosis
+
+When a conversion error is discovered (e.g., A is incorrectly converted to B):
+
+1.  **Search for existing mappings**:
+    Use `grep` to search for the error source in `data/dictionary`.
+    ```bash
+    grep "error_term" data/dictionary/*.txt
+    ```
+2.  **Identify the interference source**:
+    The cause is usually Maximum Forward Matching (MaxMatch) segmentation: either a longer dictionary entry swallows the target word, or the target word has no entry of its own and a shorter entry matches inside it.
+    *Example*: "方程式" is incorrectly converted to "方程序" because the mapping "程式" → "程序" exists and "方程式" has no entry of its own, so the text is segmented as "方" + "程式".
+
+## 2. Fix Solution (Explicit Mapping)
+
+If the error originates from segmentation logic (as in the example above), the most robust fix is to **add an Explicit Mapping**.
+
+1.  **Select the correct dictionary file**:
+    - For s2twp and tw2sp: `TWPhrases.txt`
+
+2.  **Add the mapping**:
+    Map the word to itself so that it is matched as a whole and left unchanged, which prevents incorrect segmentation or conversion.
+    ```text
+    方程式	方程式
+    ```
+    *Note*: Keep the dictionary in its existing sort order (if applicable).
+
+## 3. Test-Driven (Test Cases)
+
+Before rebuilding the dictionaries, add test cases that verify the fix and guard against regression.
+
+1.  **Core tests**:
+    Edit `test/testcases/testcases.json`.
+    ```json
+    {
+      "id": "case_XXX",
+      "input": "方程式",
+      "expected": {
+        "tw2sp": "方程式"
+      }
+    }
+    ```
+
+## 4. Build and Verify
+
+OpenCC uses the CMake/Make system to build dictionaries.
+
+1.  **Rebuild dictionaries**:
+    ```bash
+    cd build/dbg  # or your build directory
+    make Dictionaries
+    ```
+    This step regenerates the `.ocd2` binary dictionaries.
+
+2.  **Manual verification**:
+    Test directly using the generated `opencc` tool.
+    ```bash
+    echo "方程式" | ./src/tools/opencc -c root/share/opencc/tw2sp.json
+    # Expected output: 方程式
+    ```
+
+3.  **Automated testing** (optional but recommended):
+    Run `make test` or `ctest`.
+
+
+## 5. Commit
+
+The dictionary and test changes may go into a single commit or into clearly separated commits, but together they must include:
+- Dictionary text file changes (`.txt`)
+- Core test changes (`test/testcases/testcases.json`)
+
+```bash
+git add data/dictionary/TWPhrases.txt test/testcases/testcases.json
+git commit -m "Fix(Dictionary): correct conversion for 'XYZ'"
+```
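The diagnosis and the identity-mapping fix in the skill file above can be reproduced with a toy model. The following is a minimal sketch of greedy forward maximum matching in Python, not OpenCC's actual C++ implementation: the helper names `longest_prefix` and `convert` are hypothetical, the dictionary holds only the entries from the example, and segmentation and conversion are collapsed into one pass for brevity.

```python
def longest_prefix(text, start, dictionary, max_len):
    """Return the longest dictionary key that starts at text[start], or None."""
    for length in range(min(max_len, len(text) - start), 0, -1):
        candidate = text[start:start + length]
        if candidate in dictionary:
            return candidate
    return None


def convert(text, dictionary):
    """Greedy forward maximum matching; unmatched characters pass through."""
    max_len = max(map(len, dictionary), default=1)
    out, i = [], 0
    while i < len(text):
        key = longest_prefix(text, i, dictionary, max_len)
        if key is None:
            out.append(text[i])          # no entry starts here: keep the character
            i += 1
        else:
            out.append(dictionary[key])  # replace the matched key with its value
            i += len(key)
    return "".join(out)


# Toy dictionary holding only the interfering entry from the example.
phrases = {"程式": "程序"}
before = convert("方程式", phrases)   # "方" passes through, "程式" is replaced

# Adding the identity mapping lets the longer key win at position 0.
phrases["方程式"] = "方程式"
after = convert("方程式", phrases)

print(before, after)
```

With only the short entry present, the matcher finds nothing at "方" and then rewrites "程式"; once the three-character identity entry exists, it is the longest match at position 0 and the word is emitted unchanged, while a standalone "程式" is still converted.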
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,52 @@
+# OpenCC Project Overview
+
+This document summarizes the Open Chinese Convert (OpenCC) project to help readers quickly become familiar with its code structure, data organization, and accompanying tools.
+
+## Project Overview
+- OpenCC is an open-source Chinese Simplified-Traditional and regional variant conversion tool, supporting Simplified↔Traditional, Hong Kong/Macau/Taiwan regional differences, Japanese Shinjitai/Kyujitai character forms, and other conversion schemes.
+- The project provides a C++ core library, C language interface, command-line tools, as well as Python, Node.js and other language bindings. The dictionary and program are decoupled for easy customization and extension.
+- Main dependencies: `rapidjson` for configuration parsing, `marisa-trie` for high-performance dictionaries (`.ocd2`), optional `Darts` for legacy `.ocd` support.
+
+## Data and Configuration
+- Dictionaries are maintained in `data/dictionary/*.txt`, with separate files for phrases, characters, regional differences, Japanese Shinjitai, and other topics; they are converted to `.ocd2` during the build for faster loading.
+- Default configurations are located in `data/config/`, such as `s2t.json`, `t2s.json`, `s2tw.json`, etc., defining segmenter types, dictionaries used, and combination methods.
+- `data/scheme` and `data/scripts` provide dictionary compilation scripts and specification validation tools.
+
+### Dictionary Binary Formats: `.ocd` and `.ocd2`
+- `.ocd` (legacy format) starts with the file header `OPENCCDARTS1`. The body is a serialized Darts double-array trie, combined with a `BinaryDict` structure that stores key-value offsets and concatenated buffers. The loading process is implemented in `src/DartsDict.cpp` and `src/BinaryDict.cpp`. This format is mainly used in environments that enable `ENABLE_DARTS` for compatibility.
+- `.ocd2` (default format) has `OPENCC_MARISA_0.2.5` as the file header, followed by `marisa::Trie` data, then uses the `SerializedValues` module to store all candidate value lists. See `src/MarisaDict.cpp`, `src/SerializedValues.cpp` for details. This format is smaller and loads faster (e.g., `NEWS.md` records `STPhrases` reduced from 4.3MB to 924KB).
+- The command-line tool `opencc_dict` supports `text ↔ ocd2` (and optionally `ocd`) conversion. When adding or adjusting dictionaries, first edit `.txt`, then run the tool to generate the target format.
+
+## Development and Testing
+- The top-level build system supports CMake, Bazel, Node.js `binding.gyp`, Python `pyproject.toml`, with cross-platform CI integration.
+- `src/*Test.cpp`, `test/` directories contain Google Test-style unit tests covering dictionary matching, conversion chains, segmentation, and other key logic.
+- Tools `opencc_dict`, `opencc_phrase_extract` (`src/tools/`) help developers convert dictionary formats and extract phrases.
+
+## Ecosystem Bindings
+- Python module is located in `python/`, providing the `OpenCC` class through the C API.
+- Node.js extension is in the `node/` directory, using N-API/Node-API to call the core library.
+- README lists third-party Swift, Java, Go, WebAssembly and other porting projects, showcasing ecosystem breadth.
+
+## Common Customization Steps
+1. Edit or add dictionary entries in `data/dictionary/*.txt`.
+2. Use `opencc_dict` to convert to `.ocd2`.
+3. Copy/modify configuration JSON in `data/config` and specify new dictionary files.
+4. Load custom configuration through `SimpleConverter`, command-line tools, or language bindings to verify results.
+
+> For deeper understanding, read the module documentation in `src/README.md`, or refer to test cases in `test/` to understand conversion chain combinations.
+
+### Common Deviations in Third-Party Implementations (Speculation)
+- **Missing segmentation and conversion chain order**: If the `group` configuration or dictionary priority is not reproduced, compound words may be split apart or overwritten by single-character mappings.
+- **Missing longest prefix logic**: Character-by-character replacement alone will miss idioms and multi-character word results.
+- **Improper UTF-8 handling**: Overlooking multi-byte characters (or surrogate pairs in runtimes whose strings are UTF-16) can easily cause offset or truncation issues.
+- **Incomplete dictionaries/configuration**: Missing segmentation dictionaries, regional differences and other `.ocd2` files will result in missing words in output.
+- **Path and loading process differences**: If OpenCC's path search and configuration parsing details are not followed, the actual loaded resources will differ from official ones, naturally leading to different results.
+
+## Further Reading
+
+### Contribution Guide
+- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Complete guide on how to contribute dictionary entries to OpenCC, write test cases, and execute testing procedures.
+
+### Project Documents
+- **[src/README.md](src/README.md)** - Detailed technical documentation for core modules.
+- **[README.md](README.md)** - Project overview, installation and usage guide.
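The two file headers named in the "Dictionary Binary Formats" section of AGENTS.md are enough to tell the formats apart without loading a dictionary. The sketch below is an illustration in Python, not part of OpenCC: the function name `detect_dict_format` is hypothetical, and it assumes the header strings sit at byte offset 0 exactly as quoted in the document.

```python
import io

# Header strings as quoted in AGENTS.md; assumed to start at byte offset 0.
HEADERS = {
    b"OPENCCDARTS1": "ocd",
    b"OPENCC_MARISA_0.2.5": "ocd2",
}


def detect_dict_format(stream):
    """Read just enough bytes from a binary stream to classify the format."""
    longest = max(len(magic) for magic in HEADERS)
    head = stream.read(longest)
    for magic, name in HEADERS.items():
        if head.startswith(magic):
            return name
    return "unknown"  # e.g. a plain-text .txt dictionary


# In-memory stand-ins for real files; the payload bytes are placeholders.
ocd2_like = io.BytesIO(b"OPENCC_MARISA_0.2.5" + b"\x00" * 8)
ocd_like = io.BytesIO(b"OPENCCDARTS1" + b"\x00" * 8)
text_like = io.BytesIO("方程式\t方程式\n".encode("utf-8"))

print(detect_dict_format(ocd2_like), detect_dict_format(ocd_like), detect_dict_format(text_like))
```

For a file on disk, the same function can be called with a handle opened in binary mode, for example `open(path, "rb")`.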
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1 @@
+@AGENTS.md
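Step 3 of "Common Customization Steps" in AGENTS.md (which CLAUDE.md imports) asks for a configuration JSON that names the new dictionary. The Python sketch below generates one under stated assumptions: the key layout (`segmentation`, `conversion_chain`, `dict`, `type`, `file`, `dicts`) follows my reading of the stock files in `data/config/` and should be checked against `s2t.json`; `MyPhrases.ocd2` and the helper `make_config` are hypothetical, and placing the custom dictionary first in the group is an assumption about how priority is expressed.

```python
import json


def make_config(name, custom_dict, stock_dicts):
    """Build a config dict whose dictionary group lists the custom file first."""
    group = {
        "type": "group",
        "dicts": [{"type": "ocd2", "file": f} for f in [custom_dict, *stock_dicts]],
    }
    return {
        "name": name,
        # The segmenter sees the same group, so custom phrases are
        # segmented as whole words before conversion.
        "segmentation": {"type": "mmseg", "dict": group},
        "conversion_chain": [{"dict": group}],
    }


config = make_config(
    "Simplified to Traditional with custom phrases",
    "MyPhrases.ocd2",                          # hypothetical custom dictionary
    ["STPhrases.ocd2", "STCharacters.ocd2"],   # stock dictionary names, assumed
)
print(json.dumps(config, ensure_ascii=False, indent=2))
```

The printed JSON can be saved next to the generated `.ocd2` files and then loaded through the command-line tool or a language binding, as described in step 4.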
diff --git a/src/README.md b/src/README.md
@@ -1,5 +1,29 @@
 # Source code
 
+## Code Modules and Flow
+1. **Configuration Loading (`src/Config.cpp`)**
+   - Reads JSON configuration (located in `data/config/*.json`), parses segmenter definitions and conversion chains.
+   - Loads different dictionary formats (plain text, `ocd2`, dictionary groups) based on the `type` field, with support for additional search paths.
+   - Creates `Converter` objects that hold segmenters and conversion chains.
+
+2. **Segmentation (`src/MaxMatchSegmentation.cpp`)**
+   - The default segmentation type is `mmseg`, i.e., Maximum Forward Matching.
+   - Performs longest prefix matching using the dictionary, splitting input into `Segments`; unmatched UTF-8 fragments are preserved by character length.
+
+3. **Conversion Chain (`src/ConversionChain.cpp`, `src/Conversion.cpp`)**
+   - The conversion chain is an ordered list of `Conversion` objects; each node uses a dictionary to replace segments with target values through longest prefix matching.
+   - Supports advanced scenarios like phrase priority, variant character replacement, and multi-stage composition.
+
+4. **Dictionary System**
+   - Abstract interface `Dict` unifies prefix matching, all-prefix matching, and dictionary traversal.
+   - `TextDict` (`.txt`) builds dictionaries from tab-delimited plain text; `MarisaDict` (`.ocd2`) provides high-performance trie structures; `DictGroup` can compose multiple dictionaries into a sequential collection.
+   - `SerializableDict` defines serialization and file loading logic, which command-line tools use to convert between different formats.
+
+5. **API Encapsulation**
+   - `SimpleConverter` (high-level C++ interface) encapsulates `Config + Converter`, providing various overloads for string, pointer buffer, and partial length conversion.
+   - `opencc.h` exposes the C API: `opencc_open`, `opencc_convert_utf8`, etc., for language bindings and command-line reuse.
+   - The command-line program `opencc` (`src/tools/CommandLine.cpp`) demonstrates batch conversion, stream reading, auto-flushing, and same-file input/output handling.
+
 ## Dictionary
 
 ### Interface