Releases: cometkim/unicode-segmenter

unicode-segmenter@0.15.0

28 Jan 19:46
fa2356d

Minor Changes

  • 97a871e: Update to Unicode® 17.0.0

    Unicode® Standard Annex #29 - Revision 47

    Tested with Node.js v25.5.0 (icu 78.2)

Patch Changes

  • 38a37f2: Fix TypeScript Node16 module resolution for CommonJS modules.

    More specifically, the "Masquerading as CJS" issue has been fixed by including re-export declaration files.

    Because the library continues to support CommonJS (at least up to v1), this change is necessary, although it slightly increases the size of node_modules.

    Also, pre-bundled files (unicode-segmenter/bundle/*) are included for browsers and miniprograms. They were missing in previous versions due to a path typo in the build script.

unicode-segmenter@0.14.5

29 Dec 00:02
a47e430

Patch Changes

  • 9d482aa: Inlined the grapheme boundary checking
    to avoid unnecessary function calls in the hot path and to consolidate internal state.

    This improved runtime perf by 2% and slightly reduced the bundle size.

  • d737dfe: Inlined the InCB=Linker checking for Indic scripts

unicode-segmenter@0.14.4

14 Dec 23:25
e50d821

Patch Changes

  • 41a7920: Inlined more Hangul ranges (Hangul Jamo Extended-B) to reduce index memory usage (8.5KB -> 7.4KB).
    This slightly improved the bundle size as well.

unicode-segmenter@0.14.3

14 Dec 22:18
6c8b4b4

Patch Changes

  • 65c38ce: Moved GB9c rule checking to run after the main boundary checking,
    to avoid unnecessary work as much as possible.

    No noticeable changes, but perf seems to be improved by ~2% for most cases.

  • 8b23df9: Two further optimizations:

    1. Remove inlined ranges from the data file.
    2. Add inlined range: 0xAC00-0xD7A3 (Hangul syllables) can easily be inlined.

    Item 1 is something I forgot in the #104 task, though it is only a slight change.

    Item 2 is a significant finding: it is a pretty extensive range to be newly inlined.
    Applying both optimizations significantly reduced the bundle size and memory footprint.

    • Size(min): 12,549 bytes -> 6,846 bytes (-45.5%)
    • Size(min+gz): 5,314 bytes -> 3,449 bytes (-35.1%)
    • Index memory usage: 14,272 bytes -> 8,686 bytes (-39.2%)

    Of course, without perf regression.
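
To illustrate why the Hangul syllables range is easy to inline, here is a minimal sketch, not the library's actual code. It relies only on the Unicode Hangul composition layout: within 0xAC00-0xD7A3, every 28th code point is an LV syllable and the rest are LVT. The function and type names are hypothetical.

```typescript
// Hypothetical names; this is a sketch, not the library's internal API.
type HangulSyllableCategory = 'LV' | 'LVT';

// Classify a code point with two comparisons and one modulo,
// instead of searching a range table.
function hangulSyllableCategory(cp: number): HangulSyllableCategory | null {
  // Outside the precomposed Hangul syllables block: not handled here.
  if (cp < 0xac00 || cp > 0xd7a3) return null;
  // 28 = number of trailing-consonant slots per LV pair (including "none").
  return (cp - 0xac00) % 28 === 0 ? 'LV' : 'LVT';
}
```

A block this regular needs no entries in the data file at all, which is consistent with the size reductions listed above.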

unicode-segmenter@0.14.2

12 Dec 20:59
c06976d

Patch Changes

  • b7a6e12: Optimizing grapheme break category lookup for better runtime trade-offs.

    See issue for the explanation.

    With this change, the library's constant memory footprint is reduced from 64 KB to 14 KB without performance regressions.
    However, the code size increases slightly due to inlining. It's still relatively small.

unicode-segmenter@0.14.1

05 Dec 15:00
ab6ff9b

Patch Changes

  • ac96013: Removed inefficient optimization code from grapheme segmenter.

    The single-range cache is barely hit once lookups go through the whole-BMP cache first,
    so it was removed to reduce code size and comparison count.

    Occupying 64 KB of linear memory for the BMP is worth it. It should definitely be acceptable, as it still uses less heap memory than executing graphemer's uncompressed code.

unicode-segmenter@0.14.0

06 Aug 17:01
5269397

Minor Changes

  • cbd1a07: Deprecated unicode-segmenter/utils entry.

    It is no longer used internally. It's too simple; better to inline it if needed.

Patch Changes

  • dbca35f: Improve runtime perf on the Unicode text processing.

    By using a precomputed lookup table for the grapheme categories of BMP characters, it improves perf by more than 10% for common cases, and by ~30% for some extreme cases.

    The lookup table consumes an additional 64 KB of memory, which is acceptable for most JavaScript runtime environments.

    This optimization was introduced by OpenCode w/ OpenAI's GPT-OSS-120B. It is the second successful attempt at meaningful optimization in this library.
    (The first one was Claude Code w/ Claude Opus 4.0)

  • 782290b: Several minor perf improvements and internal cleanup.

    Even with the new optimization paths, the bundle size has barely increased.
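
The precomputed BMP lookup table described in dbca35f can be sketched roughly as follows. This is an assumption about the general technique, not the library's implementation: the range format, the category numbers, and all function names are hypothetical. One byte per BMP code point gives 0x10000 bytes, which matches the 64 KB figure.

```typescript
// Hypothetical range format: inclusive [start, end] mapped to a category number.
type CategoryRange = [start: number, end: number, category: number];

// Expand sorted ranges into a flat table covering the BMP.
function buildBmpTable(ranges: CategoryRange[]): Uint8Array {
  const table = new Uint8Array(0x10000); // 65,536 bytes = 64 KB, default category 0
  for (const [start, end, category] of ranges) {
    if (start > 0xffff) continue; // non-BMP ranges stay in the search path
    table.fill(category, start, Math.min(end, 0xffff) + 1);
  }
  return table;
}

// Fallback for code points above the BMP: binary search over sorted ranges.
function searchRanges(ranges: CategoryRange[], cp: number): number {
  let lo = 0;
  let hi = ranges.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const range = ranges[mid];
    if (range === undefined) break;
    const [start, end, category] = range;
    if (cp < start) hi = mid - 1;
    else if (cp > end) lo = mid + 1;
    else return category;
  }
  return 0;
}

// BMP code points cost one array read; others fall back to the search.
function lookupCategory(table: Uint8Array, ranges: CategoryRange[], cp: number): number {
  return cp <= 0xffff ? (table[cp] ?? 0) : searchRanges(ranges, cp);
}
```

The trade-off is the one the note states: constant memory is spent up front so that the common case avoids a search entirely.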

unicode-segmenter@0.13.2

30 Jul 20:13
5bda67c

Patch Changes

  • f2018ed: Optimize grapheme segmenter.

    By eliminating unnecessary string concatenation, it significantly improved performance when creating large segments. (e.g. Demonic, Hindi, Flags, Skin tones)
    Also reduced the memory footprint by internal segment buffer.

  • fa9d58e: Optimize grapheme cluster boundary checking.
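
One common way to avoid per-code-point string concatenation (as mentioned in f2018ed) is to remember where the current segment starts and slice the input once per boundary. Whether the library does exactly this is an assumption; the sketch below uses a hypothetical boundary callback rather than real UAX #29 rules.

```typescript
// Hypothetical callback: returns true if there is a break between two code points.
type BoundaryFn = (prev: number, next: number) => boolean;

function segmentBySlicing(text: string, isBoundary: BoundaryFn): string[] {
  const out: string[] = [];
  if (text.length === 0) return out;

  let start = 0; // UTF-16 offset where the current segment begins
  let prev = text.codePointAt(0) ?? 0;
  let i = prev > 0xffff ? 2 : 1; // astral code points take two UTF-16 units

  while (i < text.length) {
    const cp = text.codePointAt(i) ?? 0;
    if (isBoundary(prev, cp)) {
      // One slice per segment, instead of `segment += char` per code point.
      out.push(text.slice(start, i));
      start = i;
    }
    prev = cp;
    i += cp > 0xffff ? 2 : 1;
  }
  out.push(text.slice(start));
  return out;
}
```

For long clusters (many combining marks, flag sequences, skin tone modifiers) this replaces a chain of intermediate strings with a single slice.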

unicode-segmenter@0.13.1

20 Jun 01:22
4653a00

Patch Changes

  • 88a22e2: grapheme: improve runtime perf by ~9% for most common use cases

unicode-segmenter@0.13.0

20 May 00:51
4b8d1d1

Minor Changes

  • 75492dc: Expose an internal state: _hd.

    It holds the first code point of a segment, which often needs to be bounds-checked.

    For example,

    for (const { segment } of graphemeSegments(text)) {
      const cp = segment.codePointAt(0)!;
      // The `!` non-null assertion is also needed in TypeScript.
      if (isBMP(cp)) {
        // ...
      }
    }

    This can be replaced by the _hd state, with no additional overhead.
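
A minimal sketch of the _hd variant follows, written to be self-contained. It assumes each yielded segment object carries a numeric _hd field as this note describes; the SegmentWithHead interface, the local isBMP helper, and the stand-in generator are hypothetical and exist only so the example runs without the library.

```typescript
// Hypothetical shape: only the two fields used in this example.
interface SegmentWithHead {
  segment: string;
  _hd: number; // first code point of the segment
}

// Local helper for illustration: true for code points in the BMP.
const isBMP = (cp: number): boolean => cp <= 0xffff;

// Stand-in producer for illustration only. It splits by code point,
// which is NOT real grapheme segmentation.
function* codePointSegments(text: string): Generator<SegmentWithHead> {
  for (const ch of text) {
    yield { segment: ch, _hd: ch.codePointAt(0) ?? 0 };
  }
}

// Reads _hd directly: no codePointAt call and no `!` assertion per segment.
function countBmpSegments(segments: Iterable<SegmentWithHead>): number {
  let count = 0;
  for (const { _hd } of segments) {
    if (isBMP(_hd)) count++;
  }
  return count;
}
```

With the real library, the same loop body would presumably consume the objects yielded by graphemeSegments(text) in place of the stand-in generator.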

Patch Changes

  • cd63858: Export bundled entries (/bundle/*.js)