# TypeError when string contains certain Unicode Format characters #70

## Summary

`string-width` 8.1.0 crashes with ``TypeError: Expected a code point, got `undefined`.`` when processing strings that contain certain Unicode Format characters that are **not** in the `Default_Ignorable_Code_Point` property.

## Environment

- **string-width version**: 8.1.0
- **get-east-asian-width version**: 1.3.0
- **Node.js version**: 22.x / Bun 1.x
- **OS**: macOS, Linux, Windows (all affected)

## Steps to Reproduce

```javascript
import stringWidth from 'string-width';

// These Unicode Format characters cause the crash:
// They are in \p{Format} but NOT in \p{Default_Ignorable_Code_Point}

stringWidth('\u0600'); // ARABIC NUMBER SIGN - CRASH!
stringWidth('\u0601'); // ARABIC SIGN SANAH - CRASH!
stringWidth('\u0602'); // ARABIC FOOTNOTE MARKER - CRASH!
stringWidth('\u0603'); // ARABIC SIGN SAFHA - CRASH!
stringWidth('\u0604'); // ARABIC SIGN SAMVAT - CRASH!
stringWidth('\u0605'); // ARABIC NUMBER MARK ABOVE - CRASH!
stringWidth('\u06DD'); // ARABIC END OF AYAH - CRASH!
stringWidth('\u070F'); // SYRIAC ABBREVIATION MARK - CRASH!
stringWidth('\u0890'); // ARABIC POUND MARK ABOVE - CRASH!
stringWidth('\u0891'); // ARABIC PIASTRE MARK ABOVE - CRASH!
stringWidth('\u08E2'); // ARABIC DISPUTED END OF AYAH - CRASH!
stringWidth('\u{110BD}'); // KAITHI NUMBER SIGN - CRASH!
stringWidth('\u{110CD}'); // KAITHI NUMBER SIGN ABOVE - CRASH!

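// Real-world Arabic text can contain these. A sign normally joins the grapheme
// cluster of the character after it, so the crash appears when a sign ends up
// in a cluster of its own, e.g. at the end of a string:
const arabicText = '\u0661\u0662\u0663\u0600'; // Arabic-Indic digits, then ARABIC NUMBER SIGN
console.log(stringWidth(arabicText)); // CRASH!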
```

## Expected Behavior

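`stringWidth()` should not throw. Returning `0` for a segment made up only of these Format characters would be consistent with how the other Format characters (the `Default_Ignorable_Code_Point` ones) are already treated.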

## Actual Behavior

```
TypeError: Expected a code point, got `undefined`.
    at validate (node_modules/get-east-asian-width/index.js:5:13)
    at eastAsianWidth (node_modules/get-east-asian-width/index.js:16:2)
    at node_modules/string-width/index.js:82:12
```

## Root Cause Analysis

The bug is a mismatch between `zeroWidthClusterRegex` and `leadingNonPrintingRegex` in `string-width/index.js`.

### The Problem

1. **`zeroWidthClusterRegex`** uses `\p{Default_Ignorable_Code_Point}`:
   ```javascript
   const zeroWidthClusterRegex = /^(?:\p{Default_Ignorable_Code_Point}|\p{Control}|\p{Mark}|\p{Surrogate})+$/v;
   ```

2. **`leadingNonPrintingRegex`** uses `\p{Format}`:
   ```javascript
   const leadingNonPrintingRegex = /^[\p{Default_Ignorable_Code_Point}\p{Control}\p{Format}\p{Mark}\p{Surrogate}]+/v;
   ```

3. **The gap**: Some Unicode `\p{Format}` characters are **NOT** in `\p{Default_Ignorable_Code_Point}`. These include:
   - U+0600-U+0605 (Arabic number signs)
   - U+06DD (Arabic end of ayah)
   - U+070F (Syriac abbreviation mark)
   - U+0890-U+0891 (Arabic pound/piastre marks)
   - U+08E2 (Arabic disputed end of ayah)
   - U+110BD, U+110CD (Kaithi number signs)
   - And others...

4. **The fatal sequence** (lines 66-73 in index.js):
   ```javascript
   for (const {segment} of segmenter.segment(string)) {
       // Zero-width / non-printing clusters
       if (isZeroWidthCluster(segment)) {
           continue;  // ❌ Does NOT catch \u0600 (not Default_Ignorable)
       }
       // ...
       const codePoint = baseVisible(segment).codePointAt(0);
       // baseVisible('\u0600') strips the character via \p{Format} and returns ''
       // ''.codePointAt(0) returns undefined
       width += eastAsianWidth(codePoint);  // 💥 CRASH!
   }
   ```
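
The mismatch can be checked in isolation, without installing `string-width`. The sketch below copies the two regexes quoted above (with the `u` flag instead of `v`, which does not change their meaning here) and assumes that `baseVisible()` simply deletes the `leadingNonPrintingRegex` match; `diagnose` and the two helpers are illustrative stand-ins, not the library's code.

```javascript
// Copies of the two regexes quoted above (u flag instead of v)
const zeroWidthClusterRegex = /^(?:\p{Default_Ignorable_Code_Point}|\p{Control}|\p{Mark}|\p{Surrogate})+$/u;
const leadingNonPrintingRegex = /^[\p{Default_Ignorable_Code_Point}\p{Control}\p{Format}\p{Mark}\p{Surrogate}]+/u;

// Hypothetical stand-ins, assumed to behave like the library's internal helpers
const isZeroWidthCluster = segment => zeroWidthClusterRegex.test(segment);
const baseVisible = segment => segment.replace(leadingNonPrintingRegex, '');

// Reports what each step of the loop would see for a single segment
function diagnose(segment) {
    const base = baseVisible(segment);
    return {
        zeroWidth: isZeroWidthCluster(segment), // true: the loop skips the segment
        base,                                   // text left after stripping
        codePoint: base.codePointAt(0),         // value passed to eastAsianWidth()
    };
}

console.log(diagnose('\u200B'));       // ZERO WIDTH SPACE: Format and Default_Ignorable, skipped
console.log(diagnose('\u0600'));       // ARABIC NUMBER SIGN: not skipped, codePoint is undefined
console.log(diagnose('\u0600\u0661')); // sign + digit: the digit survives stripping
```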

### Why these specific characters?

The Unicode Standard defines `Default_Ignorable_Code_Point` as characters that should be ignored in rendering if not supported. The characters listed here are explicitly excluded from that property: they have the `Prepended_Concatenation_Mark` property and are meant to be rendered visibly, spanning the digits or text that follow them (e.g., the Arabic number sign written beneath a number, or the end-of-ayah mark in Quranic text).

`leadingNonPrintingRegex` includes `\p{Format}` and therefore strips these characters, but `zeroWidthClusterRegex` does not. A segment consisting only of these Format characters is therefore not caught by the zero-width check, yet `baseVisible()` strips it to an empty string, and `''.codePointAt(0)` is `undefined`.
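
Whether a given string reaches this path depends on grapheme segmentation. The listed signs have `Grapheme_Cluster_Break=Prepend` in Unicode, so `Intl.Segmenter` normally attaches a sign to the character that follows it, and that cluster still has a visible base after stripping. Assuming `string-width` segments by grapheme, as the loop above suggests, a sign only crashes when it forms a cluster on its own: when it is the whole string, the last character, or followed by a line break. A dependency-free sketch that prints the clusters (`clusters` is an illustrative name):

```javascript
// Grapheme segmentation (the default granularity), assumed to match the
// clusters that string-width's loop iterates over
const segmenter = new Intl.Segmenter();

// Formats each grapheme cluster as a space-separated list of code points
function clusters(string) {
    return Array.from(segmenter.segment(string), ({segment}) =>
        Array.from(segment, char => 'U+' + char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')).join(' '));
}

console.log(clusters('\u0600\u0661\u0662')); // the sign joins the digit after it
console.log(clusters('\u0661\u0662\u0600')); // a trailing sign is a cluster on its own
console.log(clusters('\u0600\n'));           // a line break never joins, so the sign stands alone
```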

## Suggested Fix

There are three possible fixes:

### Option 1: Guard against empty string after `baseVisible()` (Minimal fix)

```javascript
// In string-width/index.js, after baseVisible() call:
const base = baseVisible(segment);
if (base.length === 0) {
    continue; // Skip segments that become empty after stripping leading non-printing characters
}
const codePoint = base.codePointAt(0);
```
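
To see where the Option 1 guard sits, here is a heavily simplified, hypothetical model of the loop. `simplifiedWidth` and `eastAsianWidthStub` are illustrative names; the stub counts every valid code point as 1 column, and only the zero-width and `baseVisible()` steps discussed in this issue are kept. The stub mirrors the validation behind the `TypeError` in the stack trace above.

```javascript
// Hypothetical, heavily simplified model of the string-width loop
const segmenter = new Intl.Segmenter();
const zeroWidthClusterRegex = /^(?:\p{Default_Ignorable_Code_Point}|\p{Control}|\p{Mark}|\p{Surrogate})+$/u;
const leadingNonPrintingRegex = /^[\p{Default_Ignorable_Code_Point}\p{Control}\p{Format}\p{Mark}\p{Surrogate}]+/u;

// Stand-in for eastAsianWidth(): same input check, but every valid code point is narrow
function eastAsianWidthStub(codePoint) {
    if (!Number.isSafeInteger(codePoint)) {
        throw new TypeError(`Expected a code point, got \`${typeof codePoint}\`.`);
    }
    return 1;
}

function simplifiedWidth(string, {guard = true} = {}) {
    let width = 0;
    for (const {segment} of segmenter.segment(string)) {
        if (zeroWidthClusterRegex.test(segment)) {
            continue;
        }
        const base = segment.replace(leadingNonPrintingRegex, '');
        if (guard && base.length === 0) {
            continue; // Option 1: nothing visible is left, so the segment adds 0 columns
        }
        width += eastAsianWidthStub(base.codePointAt(0));
    }
    return width;
}

console.log(simplifiedWidth('\u0661\u0662\u0663\u0600')); // 3
try {
    simplifiedWidth('\u0600', {guard: false});
} catch (error) {
    console.log(error.message); // Expected a code point, got `undefined`.
}
```

With the guard, a lone sign contributes 0 columns; with `{guard: false}` the same input reproduces the reported error message.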

### Option 2: Expand `isZeroWidthCluster()` to catch all Format characters (More thorough)

```javascript
// Add explicit check for Format-only segments before baseVisible():
const isFormatOnlyCluster = segment => /^[\p{Cf}]+$/u.test(segment);

// Then in the loop:
if (isControlCluster(segment) || isZeroWidthCluster(segment) || isFormatOnlyCluster(segment)) {
    continue;
}
```
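
A variant of Option 2 is to add `\p{Format}` to `zeroWidthClusterRegex` itself, so that both regexes cover the same categories. The sketch assumes the regex is used only as quoted above; because the pattern is anchored with `^...$`, only clusters made entirely of non-printing characters match, so a sign joined to a following digit is still measured by that digit.

```javascript
// Hypothetical variant: zeroWidthClusterRegex with \p{Format} added so that it covers
// the same categories as leadingNonPrintingRegex (u flag used instead of v)
const zeroWidthClusterRegex = /^(?:\p{Default_Ignorable_Code_Point}|\p{Control}|\p{Format}|\p{Mark}|\p{Surrogate})+$/u;
const isZeroWidthCluster = segment => zeroWidthClusterRegex.test(segment);

const samples = {
    'U+0600 alone': '\u0600',         // now caught: the loop skips it
    'U+110BD alone': '\u{110BD}',     // now caught (astral code point)
    'U+200B alone': '\u200B',         // already caught via Default_Ignorable_Code_Point
    'U+0600 + digit': '\u0600\u0661', // not caught: the digit is still measured
    'Latin letter': 'a',              // not caught
};

for (const [label, segment] of Object.entries(samples)) {
    console.log(label, isZeroWidthCluster(segment));
}
```

Like Option 1, this makes these signs count as 0 columns even though Unicode intends them to be visible; either choice avoids the crash.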

### Option 3: Fix in `get-east-asian-width` (Defense in depth)

The `validate()` function in `get-east-asian-width` could handle `undefined` gracefully:

```javascript
function validate(codePoint) {
    if (codePoint === undefined || codePoint === null) {
        return; // Allow undefined, caller will handle
    }
    if (!Number.isSafeInteger(codePoint)) {
        throw new TypeError(`Expected a code point, got \`${typeof codePoint}\`.`);
    }
}

export function eastAsianWidth(codePoint, {ambiguousAsWide = false} = {}) {
    validate(codePoint);
    if (codePoint === undefined || codePoint === null) {
        return 1; // Default to narrow width for invalid input
    }
    // ... rest of function
}
```

## Recommended Fix

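**Option 1** is the cleanest fix for `string-width`: it is minimal and targeted, and it protects `eastAsianWidth()` from any segment that becomes empty after `baseVisible()`, whichever category caused the stripping. Option 2 addresses the underlying regex mismatch more directly and can be combined with it.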

## Workaround

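Until this is fixed upstream, users can strip the affected Format characters before calling `string-width`:

```javascript
import stringWidth from 'string-width';

// Format (Cf) characters that are NOT Default_Ignorable_Code_Point: the ones that
// slip past the zero-width check. Bidi marks such as U+200E are Default_Ignorable
// and are already skipped. Set subtraction requires the `v` flag (Node.js 20+).
const PROBLEMATIC_FORMAT_CHARS = /[\p{Cf}--\p{Default_Ignorable_Code_Point}]/gv;

function safeStringWidth(string) {
    return stringWidth(string.replace(PROBLEMATIC_FORMAT_CHARS, ''));
}
```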

## Impact

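This bug affects any application using `string-width` 8.x that measures text containing one of the characters listed below, for example:
- Arabic-script text that uses number signs, footnote markers, or end-of-ayah marks (e.g. Quranic text)
- Syriac or Kaithi text that uses abbreviation or number signs
- Text copied from web pages or PDFs, or other user-generated content, whenever it includes one of these signs

Bidirectional marks (U+200E, U+200F, U+202A-U+202E, U+2066-U+2069) are `Default_Ignorable_Code_Point` characters and are already skipped by the zero-width check, so on their own they do not trigger this crash.

Downstream packages are affected when they resolve `string-width` 8.x, for example `ink` (React for CLI), which crashes during terminal rendering. Other CLI libraries that measure string widths (such as `cli-table3`, `boxen`, `ora`) and tools such as `prettier` or `eslint` may be affected depending on which `string-width` version they resolve; `npm ls string-width` shows the versions used in a project.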

## Related Issues

- Similar crash reported in badlogic/pi-mono#371

## Test Cases

```javascript
import stringWidth from 'string-width';
import { describe, it, expect } from 'your-test-framework';

describe('Format characters not in Default_Ignorable_Code_Point', () => {
    // These are the characters that ACTUALLY crash in 8.1.0
    it('should not crash on ARABIC NUMBER SIGN (U+0600)', () => {
        expect(() => stringWidth('\u0600')).not.toThrow();
    });

    it('should not crash on ARABIC SIGN SANAH (U+0601)', () => {
        expect(() => stringWidth('\u0601')).not.toThrow();
    });

    it('should not crash on ARABIC FOOTNOTE MARKER (U+0602)', () => {
        expect(() => stringWidth('\u0602')).not.toThrow();
    });

    it('should not crash on ARABIC SIGN SAFHA (U+0603)', () => {
        expect(() => stringWidth('\u0603')).not.toThrow();
    });

    it('should not crash on ARABIC SIGN SAMVAT (U+0604)', () => {
        expect(() => stringWidth('\u0604')).not.toThrow();
    });

    it('should not crash on ARABIC NUMBER MARK ABOVE (U+0605)', () => {
        expect(() => stringWidth('\u0605')).not.toThrow();
    });

    it('should not crash on ARABIC END OF AYAH (U+06DD)', () => {
        expect(() => stringWidth('\u06DD')).not.toThrow();
    });

    it('should not crash on SYRIAC ABBREVIATION MARK (U+070F)', () => {
        expect(() => stringWidth('\u070F')).not.toThrow();
    });

    it('should not crash on ARABIC POUND MARK ABOVE (U+0890)', () => {
        expect(() => stringWidth('\u0890')).not.toThrow();
    });

    it('should not crash on ARABIC PIASTRE MARK ABOVE (U+0891)', () => {
        expect(() => stringWidth('\u0891')).not.toThrow();
    });

    it('should not crash on ARABIC DISPUTED END OF AYAH (U+08E2)', () => {
        expect(() => stringWidth('\u08E2')).not.toThrow();
    });

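    it('should not crash on KAITHI NUMBER SIGN (U+110BD)', () => {
        expect(() => stringWidth('\u{110BD}')).not.toThrow();
    });

    it('should not crash on KAITHI NUMBER SIGN ABOVE (U+110CD)', () => {
        expect(() => stringWidth('\u{110CD}')).not.toThrow();
    });

    it('should handle Arabic text with number signs', () => {
        // U+0600 before Arabic-Indic digits (joins the following digit's cluster)
        expect(() => stringWidth('\u0600\u0661\u0662\u0663')).not.toThrow();
        // U+0600 at the end of the string (forms a cluster on its own)
        expect(() => stringWidth('\u0661\u0662\u0663\u0600')).not.toThrow();
    });
});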
```

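## Known Affected Characters

The following `\p{Format}` characters are not in `\p{Default_Ignorable_Code_Point}`; all of them have the Unicode `Prepended_Concatenation_Mark` property. The list may not be exhaustive: the exact set depends on the engine's Unicode version, and other Format characters outside `Default_Ignorable_Code_Point` (for example the interlinear annotation characters U+FFF9-U+FFFB) may also trigger the crash.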

| Code Point | Name | Script |
|------------|------|--------|
| U+0600 | ARABIC NUMBER SIGN | Arabic |
| U+0601 | ARABIC SIGN SANAH | Arabic |
| U+0602 | ARABIC FOOTNOTE MARKER | Arabic |
| U+0603 | ARABIC SIGN SAFHA | Arabic |
| U+0604 | ARABIC SIGN SAMVAT | Arabic |
| U+0605 | ARABIC NUMBER MARK ABOVE | Arabic |
| U+06DD | ARABIC END OF AYAH | Arabic |
| U+070F | SYRIAC ABBREVIATION MARK | Syriac |
| U+0890 | ARABIC POUND MARK ABOVE | Arabic |
| U+0891 | ARABIC PIASTRE MARK ABOVE | Arabic |
| U+08E2 | ARABIC DISPUTED END OF AYAH | Arabic |
| U+110BD | KAITHI NUMBER SIGN | Kaithi |
| U+110CD | KAITHI NUMBER SIGN ABOVE | Kaithi |
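
The exact set depends on the Unicode version built into the JavaScript engine, so it may be easier to generate it than to maintain the table by hand. A minimal, dependency-free sketch using Unicode property escapes (`formatButNotIgnorable` is an illustrative name):

```javascript
// Property tests for a single code point, using the engine's own Unicode data
const isFormat = /^\p{Cf}$/u;
const isDefaultIgnorable = /^\p{Default_Ignorable_Code_Point}$/u;

// Returns every code point that is Format (Cf) but not Default_Ignorable_Code_Point
function formatButNotIgnorable() {
    const result = [];
    for (let codePoint = 0; codePoint <= 0x10FFFF; codePoint++) {
        const char = String.fromCodePoint(codePoint);
        if (isFormat.test(char) && !isDefaultIgnorable.test(char)) {
            result.push(codePoint);
        }
    }
    return result;
}

const hex = codePoint => 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
console.log(formatButNotIgnorable().map(hex).join(', '));
```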

