Releases · cometkim/unicode-segmenter

28 Jan 19:46

github-actions

unicode-segmenter@0.15.0

fa2356d

unicode-segmenter@0.15.0 Latest

Minor Changes

97a871e: Update to Unicode® 17.0.0

Unicode® Standard Annex #29 - Revision 47

Tested with Node.js v25.5.0 (ICU 78.2)

Patch Changes

38a37f2: Fix TypeScript Node16 module resolution for CommonJS modules.

More specifically, the "Masquerading as CJS" issue has been fixed by including re-export declaration files.

Because the library continues to support CommonJS (at least up to v1), this change is necessary, and it slightly increases the size of node_modules.

Also, pre-bundled files (unicode-segmenter/bundle/*) are now included for browsers and miniprograms. They were missing in previous versions due to a path typo in the build script.

29 Dec 00:02

github-actions

unicode-segmenter@0.14.5

a47e430

unicode-segmenter@0.14.5

Patch Changes

9d482aa: Inlined the grapheme boundary checking to avoid unnecessary function calls in the hot path and to consolidate internal state.

This improved runtime perf by 2% and slightly reduced the bundle size.

d737dfe: Inlined the InCB=Linker checking for Indic scripts.

14 Dec 23:25

github-actions

unicode-segmenter@0.14.4

e50d821

unicode-segmenter@0.14.4

Patch Changes

41a7920: Inlined more Hangul ranges (Hangul Jamo Extended-B) to reduce index memory usage (8.5KB -> 7.4KB).
This slightly improved the bundle size as well.

14 Dec 22:18

github-actions

unicode-segmenter@0.14.3

6c8b4b4

unicode-segmenter@0.14.3

Patch Changes

65c38ce: Moved the GB9c rule check to run after the main boundary check, to avoid unnecessary work as much as possible.

No noticeable changes, but perf seems to be improved by ~2% for most cases.

8b23df9: Two further optimizations:
1. Remove inlined ranges from the data file.
2. Add an inlined range: 0xAC00-0xD7A3 (Hangul syllables), which can easily be inlined.

The first is something I forgot in the #104 task, but it was a slight chance.

Btw, the second is a huge finding. It is a pretty extensive range to be newly inlined.
Applying both optimizations significantly reduced the bundle size and memory footprint.
- Size(min): 12,549 bytes -> 6,846 bytes (-45.5%)
- Size(min+gz): 5,314 bytes -> 3,449 bytes (-35.1%)
- Index memory usage: 14,272 bytes -> 8,686 bytes (-39.2%)

Of course, with no perf regression.
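
To illustrate why the Hangul syllable range is cheap to inline, here is a minimal sketch. It relies on the regular layout of precomposed Hangul syllables in Unicode: each block of 28 code points starts with an LV syllable and is followed by 27 LVT syllables. The category constants and the function name are hypothetical and are not taken from the library's source.

```typescript
// Hypothetical category codes, for illustration only.
const CAT_OTHER = 0;
const CAT_LV = 1;
const CAT_LVT = 2;

const HANGUL_SYLLABLE_FIRST = 0xac00;
const HANGUL_SYLLABLE_LAST = 0xd7a3;
// Trailing-consonant slots per leading/vowel pair, including "no trailing consonant".
const T_COUNT = 28;

/** Classifies a precomposed Hangul syllable with arithmetic instead of a table entry. */
function hangulSyllableCategory(cp: number): number {
  if (cp < HANGUL_SYLLABLE_FIRST || cp > HANGUL_SYLLABLE_LAST) {
    // A real segmenter would fall through to its regular table lookup here.
    return CAT_OTHER;
  }
  // Offset 0 within each block of 28 has no trailing consonant (LV); the rest are LVT.
  return (cp - HANGUL_SYLLABLE_FIRST) % T_COUNT === 0 ? CAT_LV : CAT_LVT;
}
```

Because one comparison pair and one modulo cover 11,172 code points, none of those code points needs an entry in the data file under this scheme.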

12 Dec 20:59

github-actions

unicode-segmenter@0.14.2

c06976d

unicode-segmenter@0.14.2

Patch Changes

b7a6e12: Optimized the grapheme break category lookup for better runtime trade-offs.

See issue for the explanation.

With this change, the library's constant memory footprint is reduced from 64 KB to 14 KB without performance regressions.
However, the code size increases slightly due to inlining. It's still relatively small.

05 Dec 15:00

github-actions

unicode-segmenter@0.14.1

ab6ff9b

unicode-segmenter@0.14.1

Patch Changes

ac96013: Removed inefficient optimization code from the grapheme segmenter.

With the cache covering the entire BMP, the single range cache is barely ever hit.
So it was removed to reduce code size and the number of comparisons.

Occupying 64KB of linear memory for the BMP is worth it. It should definitely be acceptable, as it still uses less heap memory than executing graphemer's uncompressed code.

06 Aug 17:01

github-actions

unicode-segmenter@0.14.0

5269397

unicode-segmenter@0.14.0

Minor Changes

cbd1a07: Deprecated the unicode-segmenter/utils entry.

It is no longer used internally, and it is simple enough that inlining is the better option if needed.

Patch Changes

dbca35f: Improve runtime perf of Unicode text processing.

Using a precomputed lookup table for the grapheme categories of BMP characters improves perf by more than 10% for common cases, and by ~30% for some extreme cases.

The lookup table consumes an additional 64 KB of memory, which is acceptable for most JavaScript runtime environments.
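
As a rough illustration of the idea, the sketch below fills a table with one byte per BMP code point (0x10000 entries, which matches the 64 KB figure) and falls back to a binary search for code points above U+FFFF. The range data, the category numbers, and all names are hypothetical; the library's actual data layout may differ.

```typescript
// Hypothetical sorted, non-overlapping [start, end, category] ranges.
type CategoryRange = readonly [start: number, end: number, cat: number];

const RANGES: readonly CategoryRange[] = [
  [0x000a, 0x000a, 1], // LF
  [0x000d, 0x000d, 2], // CR
  [0x0300, 0x036f, 3], // combining diacritical marks (Extend)
  [0x1f1e6, 0x1f1ff, 4], // regional indicators (outside the BMP)
];

// One byte per BMP code point: 0x10000 entries = 64 KB, zero-filled by default.
const BMP_TABLE = new Uint8Array(0x10000);
for (const [start, end, cat] of RANGES) {
  for (let cp = start; cp <= end && cp <= 0xffff; cp++) {
    BMP_TABLE[cp] = cat;
  }
}

/** Binary search over the sorted ranges; returns 0 when no range matches. */
function searchRanges(cp: number): number {
  let lo = 0;
  let hi = RANGES.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const [start, end, cat] = RANGES[mid]!;
    if (cp < start) hi = mid - 1;
    else if (cp > end) lo = mid + 1;
    else return cat;
  }
  return 0;
}

/** Constant-time lookup inside the BMP, binary search elsewhere. */
function graphemeCategory(cp: number): number {
  return cp <= 0xffff ? (BMP_TABLE[cp] ?? 0) : searchRanges(cp);
}
```

The trade-off is the one described above: a fixed block of memory in exchange for skipping the search on the most common code points.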

This optimization was introduced by OpenCode with OpenAI's GPT-OSS-120B. It is the second successful attempt at a meaningful optimization in this library.
(The first one was Claude Code with Claude Opus 4.0.)

782290b: Several minor perf improvements and internal cleanup.

Even with the new optimization paths, the bundle size has barely increased.

30 Jul 20:13

github-actions

unicode-segmenter@0.13.2

5bda67c

unicode-segmenter@0.13.2

Patch Changes

f2018ed: Optimize grapheme segmenter.

Eliminating unnecessary string concatenation significantly improved performance when creating large segments (e.g. Demonic, Hindi, Flags, Skin tones).
Also reduced the memory footprint by internal segment buffer.
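
The general technique can be sketched as follows. This is not the library's code: the boundary predicate is a hypothetical stand-in (not the UAX #29 rules), and both function names are made up for the comparison. The first function builds each segment by repeated concatenation; the second only tracks the start index and slices once per segment.

```typescript
/** Hypothetical predicate: true when a break is allowed before position `i`. */
type IsBoundary = (input: string, i: number) => boolean;

// Concatenating shape: the segment string is rebuilt on every code unit.
function segmentsByConcat(input: string, isBoundary: IsBoundary): string[] {
  const out: string[] = [];
  let segment = "";
  for (let i = 0; i < input.length; i++) {
    if (i > 0 && isBoundary(input, i)) {
      out.push(segment);
      segment = "";
    }
    segment += input.charAt(i);
  }
  if (segment) out.push(segment);
  return out;
}

// Slicing shape: remember where the segment started and slice once at each boundary.
function segmentsBySlice(input: string, isBoundary: IsBoundary): string[] {
  const out: string[] = [];
  let start = 0;
  for (let i = 1; i < input.length; i++) {
    if (isBoundary(input, i)) {
      out.push(input.slice(start, i));
      start = i;
    }
  }
  if (start < input.length) out.push(input.slice(start));
  return out;
}

// Toy predicate for demonstration: break before every ASCII uppercase letter.
const breakBeforeUppercase: IsBoundary = (input, i) => {
  const ch = input.charAt(i);
  return ch >= "A" && ch <= "Z";
};
```

Both functions produce the same segments; the slicing shape avoids creating intermediate strings, which would be expected to matter most when a single segment spans many code units.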
fa9d58e: Optimize grapheme cluster boundary checking.

20 Jun 01:22

github-actions

unicode-segmenter@0.13.1

4653a00

unicode-segmenter@0.13.1

Patch Changes

88a22e2: grapheme: improve runtime perf by ~9% for most common use cases

20 May 00:51

github-actions

unicode-segmenter@0.13.0

4b8d1d1

unicode-segmenter@0.13.0

Minor Changes

75492dc: Expose an internal state: _hd

It is the first code point of a segment, whose bounds often need to be checked.

For example,

for (const { segment } of graphemeSegments(text)) {
  const cp = segment.codePointAt(0)!;
  // TypeScript also requires the `!` assertion here.
  if (isBMP(cp)) {
    // ...
  }
}

It can be replaced by the _hd state, with no additional overhead.

Patch Changes

cd63858: Export bundled entries (/bundle/*.js)
