Description
Summary
string-width 8.1.0 crashes with TypeError: Expected a code point, got 'undefined' when processing strings that contain certain Unicode Format characters that are not in the Default_Ignorable_Code_Point category.
Environment
- string-width version: 8.1.0
- get-east-asian-width version: 1.3.0
- Node.js version: 22.x / Bun 1.x
- OS: macOS, Linux, Windows (all affected)
Steps to Reproduce
import stringWidth from 'string-width';
// These Unicode Format characters cause the crash:
// They are in \p{Format} but NOT in \p{Default_Ignorable_Code_Point}
stringWidth('\u0600'); // ARABIC NUMBER SIGN - CRASH!
stringWidth('\u0601'); // ARABIC SIGN SANAH - CRASH!
stringWidth('\u0602'); // ARABIC FOOTNOTE MARKER - CRASH!
stringWidth('\u0603'); // ARABIC SIGN SAFHA - CRASH!
stringWidth('\u0604'); // ARABIC SIGN SAMVAT - CRASH!
stringWidth('\u0605'); // ARABIC NUMBER MARK ABOVE - CRASH!
stringWidth('\u06DD'); // ARABIC END OF AYAH - CRASH!
stringWidth('\u070F'); // SYRIAC ABBREVIATION MARK - CRASH!
stringWidth('\u0890'); // ARABIC POUND MARK ABOVE - CRASH!
stringWidth('\u0891'); // ARABIC PIASTRE MARK ABOVE - CRASH!
stringWidth('\u08E2'); // ARABIC DISPUTED END OF AYAH - CRASH!
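// Code points above U+FFFF need the \u{...} escape; '\u110BD' would parse as U+110B followed by 'D'
stringWidth('\u{110BD}'); // KAITHI NUMBER SIGN - CRASH!
stringWidth('\u{110CD}'); // KAITHI NUMBER SIGN ABOVE - CRASH!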
// Real-world example - Arabic text often contains these:
const arabicText = '\u0600\u0661\u0662\u0663'; // ARABIC NUMBER SIGN followed by Arabic-Indic digits
console.log(stringWidth(arabicText)); // CRASH!
Expected Behavior
The function should return 0 for zero-width Format characters, or handle them gracefully without throwing.
Actual Behavior
TypeError: Expected a code point, got `undefined`.
at validate (node_modules/get-east-asian-width/index.js:5:13)
at eastAsianWidth (node_modules/get-east-asian-width/index.js:16:2)
at node_modules/string-width/index.js:82:12
Root Cause Analysis
The bug is a mismatch between zeroWidthClusterRegex and leadingNonPrintingRegex in string-width/index.js.
The Problem
- zeroWidthClusterRegex uses \p{Default_Ignorable_Code_Point}:
  const zeroWidthClusterRegex = /^(?:\p{Default_Ignorable_Code_Point}|\p{Control}|\p{Mark}|\p{Surrogate})+$/v;
- leadingNonPrintingRegex uses \p{Format}:
  const leadingNonPrintingRegex = /^[\p{Default_Ignorable_Code_Point}\p{Control}\p{Format}\p{Mark}\p{Surrogate}]+/v;
- The gap: Some Unicode \p{Format} characters are NOT in \p{Default_Ignorable_Code_Point}. These include:
  - U+0600-U+0605 (Arabic number signs)
  - U+06DD (Arabic end of ayah)
  - U+070F (Syriac abbreviation mark)
  - U+0890-U+0891 (Arabic pound/piastre marks)
  - U+08E2 (Arabic disputed end of ayah)
  - U+110BD, U+110CD (Kaithi number signs)
  - And others...
- The fatal sequence (lines 66-73 in index.js):
  for (const {segment} of segmenter.segment(string)) {
    // Zero-width / non-printing clusters
    if (isZeroWidthCluster(segment)) {
      continue; // ❌ Does NOT catch \u0600 (not Default_Ignorable)
    }
    // ...
    const codePoint = baseVisible(segment).codePointAt(0);
    // ↑ baseVisible() strips \u0600 via \p{Format} and returns ''
    // ↑ ''.codePointAt(0) returns undefined
    width += eastAsianWidth(codePoint); // 💥 CRASH!
  }
Why these specific characters?
The Unicode Standard defines Default_Ignorable_Code_Point as characters that should be ignored in rendering if not supported. However, some Format characters like Arabic number signs are not default-ignorable because they carry semantic meaning in certain contexts (e.g., Quranic text formatting).
The leadingNonPrintingRegex correctly includes \p{Format} to strip these characters, but zeroWidthClusterRegex doesn't use \p{Format}. This creates a gap: a segment consisting only of these Format characters is not caught by the zero-width check, yet baseVisible() then strips it completely and returns an empty string.
Suggested Fix
There are three possible fixes:
Option 1: Guard against empty string after baseVisible() (Minimal fix)
// In string-width/index.js, after baseVisible() call:
const base = baseVisible(segment);
if (base.length === 0) {
  continue; // Skip segments that become empty after stripping ignorables
}
const codePoint = base.codePointAt(0);
Option 2: Expand isZeroWidthCluster() to catch all Format characters (More thorough)
// Add explicit check for Format-only segments before baseVisible():
const isFormatOnlyCluster = segment => /^[\p{Cf}]+$/u.test(segment);
// Then in the loop:
if (isControlCluster(segment) || isZeroWidthCluster(segment) || isFormatOnlyCluster(segment)) {
  continue;
}
Option 3: Fix in get-east-asian-width (Defense in depth)
The validate() function in get-east-asian-width could handle undefined gracefully:
function validate(codePoint) {
  if (codePoint === undefined || codePoint === null) {
    return; // Allow undefined, caller will handle
  }
  if (!Number.isSafeInteger(codePoint)) {
    throw new TypeError(`Expected a code point, got \`${typeof codePoint}\`.`);
  }
}
export function eastAsianWidth(codePoint, {ambiguousAsWide = false} = {}) {
  validate(codePoint);
  if (codePoint === undefined || codePoint === null) {
    return 1; // Default to narrow width for invalid input
  }
  // ... rest of function
}
Recommended Fix
Option 1 is the cleanest fix for string-width - it's minimal, targeted, and handles the root cause (empty string after stripping ignorables).
Workaround
Until this is fixed upstream, users can sanitize input before calling string-width:
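// Remove Format characters that are not Default_Ignorable_Code_Point before measuring.
// Bidirectional marks such as U+200E are default-ignorable and do not trigger the crash.
// Set subtraction requires the RegExp `v` flag, which string-width 8.x already uses.
const PROBLEMATIC_FORMAT_CHARS = /[\p{Format}--\p{Default_Ignorable_Code_Point}]/gv;
function safeStringWidth(str) {
  return stringWidth(str.replace(PROBLEMATIC_FORMAT_CHARS, ''));
}
Impact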
This bug affects any application using string-width 8.x that processes:
- Text copied from web pages (may contain invisible formatting characters)
- Text from PDFs (frequently includes formatting characters)
- Internationalized text (Arabic, Syriac, and Kaithi text can contain the Format characters listed above)
- User-generated content (may contain any Unicode characters)
Popular affected packages include:
- ink (React for CLI) - crashes during terminal rendering
- cli-table3, boxen, ora - any CLI tool measuring string widths
- prettier, eslint - when processing files with these characters
Related Issues
- Similar crash reported in badlogic/pi-mono#371 ("pi crashes on reading specific characters")
Test Cases
import stringWidth from 'string-width';
import { describe, it, expect } from 'your-test-framework';
describe('Format characters not in Default_Ignorable_Code_Point', () => {
  // These are the characters that ACTUALLY crash in 8.1.0
  it('should not crash on ARABIC NUMBER SIGN (U+0600)', () => {
    expect(() => stringWidth('\u0600')).not.toThrow();
  });
  it('should not crash on ARABIC SIGN SANAH (U+0601)', () => {
    expect(() => stringWidth('\u0601')).not.toThrow();
  });
  it('should not crash on ARABIC FOOTNOTE MARKER (U+0602)', () => {
    expect(() => stringWidth('\u0602')).not.toThrow();
  });
  it('should not crash on ARABIC SIGN SAFHA (U+0603)', () => {
    expect(() => stringWidth('\u0603')).not.toThrow();
  });
  it('should not crash on ARABIC SIGN SAMVAT (U+0604)', () => {
    expect(() => stringWidth('\u0604')).not.toThrow();
  });
  it('should not crash on ARABIC NUMBER MARK ABOVE (U+0605)', () => {
    expect(() => stringWidth('\u0605')).not.toThrow();
  });
  it('should not crash on ARABIC END OF AYAH (U+06DD)', () => {
    expect(() => stringWidth('\u06DD')).not.toThrow();
  });
  it('should not crash on SYRIAC ABBREVIATION MARK (U+070F)', () => {
    expect(() => stringWidth('\u070F')).not.toThrow();
  });
  it('should not crash on ARABIC POUND MARK ABOVE (U+0890)', () => {
    expect(() => stringWidth('\u0890')).not.toThrow();
  });
  it('should not crash on ARABIC PIASTRE MARK ABOVE (U+0891)', () => {
    expect(() => stringWidth('\u0891')).not.toThrow();
  });
  it('should not crash on ARABIC DISPUTED END OF AYAH (U+08E2)', () => {
    expect(() => stringWidth('\u08E2')).not.toThrow();
  });
  it('should handle Arabic text with number signs', () => {
    // U+0600 followed by Arabic-Indic digits
    expect(() => stringWidth('\u0600\u0661\u0662\u0663')).not.toThrow();
  });
});
Complete List of Affected Characters
All Unicode \p{Format} characters that are NOT in \p{Default_Ignorable_Code_Point}:
| Code Point | Name | Script |
|---|---|---|
| U+0600 | ARABIC NUMBER SIGN | Arabic |
| U+0601 | ARABIC SIGN SANAH | Arabic |
| U+0602 | ARABIC FOOTNOTE MARKER | Arabic |
| U+0603 | ARABIC SIGN SAFHA | Arabic |
| U+0604 | ARABIC SIGN SAMVAT | Arabic |
| U+0605 | ARABIC NUMBER MARK ABOVE | Arabic |
| U+06DD | ARABIC END OF AYAH | Arabic |
| U+070F | SYRIAC ABBREVIATION MARK | Syriac |
| U+0890 | ARABIC POUND MARK ABOVE | Arabic |
| U+0891 | ARABIC PIASTRE MARK ABOVE | Arabic |
| U+08E2 | ARABIC DISPUTED END OF AYAH | Arabic |
| U+110BD | KAITHI NUMBER SIGN | Kaithi |
| U+110CD | KAITHI NUMBER SIGN ABOVE | Kaithi |