parse`[that]`

Parser combinators for TypeScript and Rust.

Write your grammar in BBNF (Better Backus-Naur Form) .

Handles left recursion and left factoring. Performance-focused with an emphasis on readability.

Usage

TypeScript:

import { string, regex } from "@mkbabb/parse-that";

const heyy = regex(/hey+t/);
heyy.parse("heyyyyyyyyyt"); // => "heyyyyyyyyyt"

Rust:

use parse_that::string;

let heyy = string("heyyyyyyyyyt");
heyy.parse("heyyyyyyyyyt"); // => "heyyyyyyyyyt"

Domain parsers ship with the library:

import { jsonParser, csvParser } from "@mkbabb/parse-that";

jsonParser().parse('{"key": [1, 2, 3]}');   // combinator-based
csvParser().parse('a,b,c\n1,2,3');          // RFC 4180

Or with a grammar (via bbnf-lang):

import { generateParserFromBBNF } from "@mkbabb/bbnf-lang";

const grammar = `
    expr = term, { ("+" | "-"), term };
    term = factor, { ("*" | "/"), factor };
    factor = number | "(", expr, ")";
    number = /[0-9]+/;
`;

const [nonterminals, ast] = generateParserFromBBNF(grammar);
const expr = nonterminals.expr;
expr.parse("1 + 2 * 3"); // => [1, "+", [2, "*", 3]]

Structure

typescript/            TS library (@mkbabb/parse-that v0.7.0)
  src/parse/           Isomorphic module layout (see below)
  test/                Vitest tests + benchmark comparators
rust/                  Rust workspace (nightly, edition 2024)
  parse_that/          Core lib — isomorphic module layout (see below)
  src/                 CLI binary (parse_that_cli)
grammar/               Shared BBNF grammar files (16 .bbnf)
  tests/json/          Shared JSON test vectors
  tests/css/           CSS recovery test vectors
  tests/debug/         Shared diagnostic output vectors
docs/                  Perf chronicles, API reference

Both languages share a mirrored module structure:

Module	TypeScript (`src/parse/`)	Rust (`parse_that/src/`)
Core	`parser.ts` — Parser<T> class	`parse.rs` — Parser<'a, O> struct
Combinators	methods on Parser class (incl. recover)	`combinators.rs` — impl blocks (incl. recover)
Leaf parsers	`leaf.ts` — string, regex, dispatch	`leaf.rs` — string, regex, dispatch_byte
Lazy eval	`lazy.ts` — lazy(), getLazyParser	`lazy.rs` — LazyParser, lazy()
Span / zero-copy	`span.ts` — regexSpan, manySpan	`span_parser.rs` — SpanParser enum
State	`state.ts` — ParserState, Span	`state.rs` — ParserState, Span
Debug	`debug.ts` — diagnostics, ANSI output	`debug.rs` — diagnostics (feature-gated)
Domain parsers	`parsers/` — JSON, CSV, CSS	`parsers/` — JSON + scanners, CSV, CSS

Performance

All benchmarks run on Apple M-series (AArch64). JSON parsing with full DOM materialization and string escape decoding. Higher is better.

Rust

MB/s throughput. bencher crate with black_box on inputs and b.bytes set.

11-parser JSON matrix

Parser	data.json	apache	twitter	citm_catalog	canada	data-xl
sonic-rs	2,291	1,814	2,496	2,896	1,495	2,644
simd-json	1,459	1,477	1,498	1,292	488	1,637
jiter	1,249	1,127	1,022	1,016	566	1,319
serde_json_borrow	1,181	1,120	1,272	1,256	613	1,194
parse_that	926	920	867	701	358	1,006
nom	587	704	504	602	396	599
serde_json	566	539	557	857	554	606
winnow	539	610	531	583	394	579
parse_that (BBNF)	540	541	524	541	312	703
pest	256	275	240	257	156	261

parse_that uses SIMD string scanning (memchr2), integer fast path (madd + ucvtf), Vec objects (no HashMap), u32 keyword loads, Cow<str> zero-copy strings, and #[cold] escape decoding. 1.4–1.7x faster than nom/winnow (work-equivalent peers).

The BBNF-generated parser uses #[derive(Parser)] from a .bbnf grammar file—zero hand-written Rust. Hybrid codegen phases (number regex substitution, transparent alternation elimination, inline match dispatch, SpanParser dual methods, recursive SpanParser codegen) reach 55–70% of the hand-written parser.

See docs/perf-optimization-rust.md for the full optimization chronicle.

TypeScript

Relative to JSON.parse (native C++). Vitest bench, 5 iterations, 5s warmup.

Parser	data (35KB)	apache (124KB)	twitter (617KB)	citm (1.7MB)	canada (2.1MB)
JSON.parse (native)	1.00x	1.00x	1.00x	1.00x	1.00x
parse-that	4.94x	5.27x	6.47x	6.64x	2.78x
Chevrotain	6.17x	7.19x	9.06x	8.97x	4.65x
Peggy	22.5x	24.6x	24.6x	29.1x	8.87x
Parsimmon	25.9x	30.3x	36.0x	41.7x	20.0x
Nearley + moo	64.5x	91.6x	73.0x	85.2x	33.4x

parse-that is the fastest JS parser combinator library — 1.3–1.5x faster than Chevrotain, 4–12x faster than PEG/Earley parsers.

Key optimizations: mutable ParserState (zero-alloc), Tarjan's SCC for minimal lazy wrappers, FIRST-set dispatch tables (O(1) alternation), regex test()+substring() (no RegExpMatchArray alloc), inline wrap().

See docs/perf-optimization-ts.md for the full optimization chronicle.

CSS

Rust MB/s on normalize.css (6KB), bootstrap.css (281KB), tailwind-output.css (3.6MB):

Parser	normalize	bootstrap	tailwind	Level
parse_that (hand-rolled)	457	229	240	L1.75 — typed AST
BBNF-generated	159	61	—	L1 — opaque spans
lightningcss	221	104	—	L2 — semantic
cssparser	678	428	258	L0 — tokenizer only

parse_that (L1.75) builds a fully typed AST: selectors (compound/complex), values (dimension/color/function), typed media queries (conditions, features, range ops), typed @supports conditions, and specificity computation. Monolithic byte-level scanners with memchr SIMD. 2.2x faster than lightningcss on bootstrap, within 0.93x of cssparser (tokenizer-only) on tailwind.

TypeScript (relative to parse-that):

Parser	normalize (6KB)	bootstrap (274KB)
parse-that (L1.75)	1.00x	1.00x
postcss (L1)	1.63x slower	1.33x slower
css-tree (L1-L2)	2.34x slower	2.03x slower

Debugging

The debug combinator pretty-prints parser state during execution.

As output, you'll see:

Header: parsing status (Ok/Err/Done), current offset, node name, stringified parser
Body: surrounding lines of input with the cursor at the active column, line-numbered

Color-coded: BBNF nonterminals in blue, stringified parsers in yellow.

Diagnostics

Both implementations ship a structured diagnostics system — Rust behind a diagnostics Cargo feature, TypeScript via enableDiagnostics() / disableDiagnostics(). Zero overhead when off.

When enabled, parsers accumulate at the furthest offset:

Expected sets — leaf parsers pre-compute labels at construction time ("hello", /[0-9]+/, one of ['a'-'z']). Alternation chains merge them into expected X, Y, or Z (Oxford comma).
Suggestions — wrap() detects unclosed delimiters and emits help: unclosed '(' — insert matching ')'; EOF checks flag trailing content.
Secondary spans — point back to related source locations (unclosed '{' opened here) for multi-site error context.

Rich rendering: ANSI color output with TTY detection and NO_COLOR respect, center-truncation of long lines around the error column, ±4 lines of context with gutter line numbers. Shared test vectors in grammar/tests/debug/ ensure isomorphic output between TypeScript and Rust.

Error Recovery

The recover(sync, sentinel) combinator enables multi-error parsing — the standard technique used by production compilers (rustc, GCC, clang) to parse past failures and keep going.

const decl = declaration.trim().recover(declSync, "RECOVERED");
const block = decl.many(0).wrap("{", "}");

On failure, recover() snapshots the current diagnostic state into a Diagnostic object (expected sets, suggestions, secondary spans, source location), pushes it to a collection, then invokes the sync parser to skip forward to a known recovery point (;, }, etc.) and returns the sentinel. many() loops keep going — each failed element produces a diagnostic but doesn't halt the overall parse.

clearCollectedDiagnostics();
stylesheet.parse(cssWithErrors);
const diagnostics = getCollectedDiagnostics();
console.error(formatAllDiagnostics(diagnostics, css));

Both TypeScript and Rust expose the same API: collectDiagnostic() / push_diagnostic(), getCollectedDiagnostics() / get_collected_diagnostics(), formatDiagnostic() / format_diagnostic(). Rust uses thread-local storage; TypeScript uses module-level globals. See grammar/tests/css/complex-errors.css for a multi-error test vector.

BBNF and the Great Parser Generator

Better Backus-Naur Form: a readable, practical grammar notation. An extension of EBNF with skip/next operators, regex terminals, mapping functions, and an @import system.

The BBNF grammar for BBNF itself lives at bbnf.bbnf.

With your grammar in hand, call generateParserFromBBNF (TypeScript) or use #[derive(Parser)] (Rust):

const [nonterminals, ast] = generateParserFromBBNF(grammar);

#[derive(Parser)]
#[parser(path = "grammar/json.bbnf")]
pub struct Json;

let result = Json::value().parse(input);

Each nonterminal is a Parser object. The BBNF parser-generator is itself written in BBNF—self-hosting. The BBNF ecosystem (compiler, LSP, VS Code extension) lives in the separate bbnf-lang repo.

Operators

Syntax	Meaning
`A?` / `[ A ]`	Optional A
`A*` / `{ A }`	Repeated A (0 or more)
`A+`	Repeated A (1 or more)
`A \| B`	A or B (higher precedence than `,`)
`A, B`	A followed by B
`A - B`	A, but not B
`A >> B`	A then B, return B only
`A << B`	A then B, return A only
`( A )`	Grouping

Emojis supported. Epsilon has a special value: ε.

Left recursion & more

Direct and indirect left recursion are fully supported, as are highly ambiguous grammars:

expr = expr , "+" , expr
     | integer
     | string ;

Using BBNF

The BBNF compiler optimizes the grammar automatically:

Topological sort via Tarjan's SCC
Remove indirect left recursion
Remove direct left recursion
Left factorize

Combinator support

Left recursion via memoize and mergeMemos:

const expression = Parser.lazy(() =>
    all(expression, operators.then(expression).opt()).mergeMemos().or(number)
).memoize();

See memoize.test.ts for details.

Caveats

Left recursion works but isn't optimal. If it can be factored out via BBNF, performance will be fine; otherwise expect slowdowns, since JavaScript lacks proper tail call optimization.

API & examples

See api.md for API information.

See the TypeScript tests and Rust tests for working examples.

Sources, acknowledgements, &c.

Theory

Aho, A. V., Lam, M. S., Sethi, R., & Ullman, J. D. (2006). Compilers: Principles, Techniques, and Tools (2nd ed.). Addison-Wesley. — The Dragon Book. Left recursion elimination, left factoring, FIRST/FOLLOW sets.
Frost, R., Hafiz, R., & Callaghan, P. (2008). Parser combinators for ambiguous left-recursive grammars. PADL '08. — Memoized top-down parsing with left recursion support.
Frost, R. & Hafiz, R. (2006). A new top-down parsing algorithm to accommodate ambiguity and left recursion in polynomial time. ACM SIGPLAN Notices.
Moore, R. C. (2000). Removing left recursion from context-free grammars. NAACL '00.
Tarjan, R. E. (1972). Depth-first search and linear graph algorithms. SIAM Journal on Computing. — SCC detection for grammar cycle analysis and FIRST-set computation.
Power, J. (n.d.). Formal theory of parsing. NUI Maynooth lecture notes.

Notation

Extended Backus-Naur form — ISO 14977. BBNF's ancestor.
Wheeler, D. A. Don't Use ISO 14977 EBNF. — Motivation for BBNF's deviations from the standard.

Performance

Eisel, D. & Lemire, D. (2021). Number parsing at a gigabyte per second. — fast_float2 crate for Rust JSON float parsing.
memchr — SIMD-accelerated byte scanning. Used for memchr2-based JSON string scanning on AArch64 (NEON) and x86 (SSE2/AVX2).
rustc_ast/format.rs — Rust compiler source. Reference for u32 keyword matching pattern.

Parser libraries (competitors and influences)

Parsimmon — Monadic TS parser combinators. Benchmarked as baseline.
Chevrotain — TS parser toolkit with CST. Closest TS competitor.
nom — Rust parser combinators. The Rust ecosystem standard.
winnow — nom's successor with improved dispatch.
pest — PEG parser for Rust. Grammar-driven.
sonic-rs — ByteDance's SIMD JSON parser. The ceiling in our benchmarks.
simd-json — Rust port of simdjson's 3-phase architecture.
jiter — Pydantic's scalar JSON parser. Closest peer to our fast path.

Name		Name	Last commit message	Last commit date
Latest commit History 328 Commits
.github/workflows		.github/workflows
.vscode		.vscode
assets		assets
docs		docs
grammar/tests		grammar/tests
rust		rust
typescript		typescript
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
justfile		justfile

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

parse`[that]`

Usage

Table of Contents

Structure

Performance

Rust

11-parser JSON matrix

TypeScript

CSS

Debugging

Diagnostics

Error Recovery

BBNF and the Great Parser Generator

Operators

Left recursion & more

Using BBNF

Combinator support

Caveats

API & examples

Sources, acknowledgements, &c.

Theory

Notation

Performance

Parser libraries (competitors and influences)

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

parse[that]

Usage

Table of Contents

Structure

Performance

Rust

11-parser JSON matrix

TypeScript

CSS

Debugging

Diagnostics

Error Recovery

BBNF and the Great Parser Generator

Operators

Left recursion & more

Using BBNF

Combinator support

Caveats

API & examples

Sources, acknowledgements, &c.

Theory

Notation

Performance

Parser libraries (competitors and influences)

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

parse`[that]`

Packages