Parser combinators for TypeScript and Rust.
Write your grammar in BBNF (Better
Backus-Naur Form)
.
Handles left recursion and left factoring. Performance-focused with an emphasis on readability.
TypeScript:
import { string, regex } from "@mkbabb/parse-that";
const heyy = regex(/hey+t/);
heyy.parse("heyyyyyyyyyt"); // => "heyyyyyyyyyt"Rust:
use parse_that::string;
let heyy = string("heyyyyyyyyyt");
heyy.parse("heyyyyyyyyyt"); // => "heyyyyyyyyyt"Domain parsers ship with the library:
import { jsonParser, csvParser } from "@mkbabb/parse-that";
jsonParser().parse('{"key": [1, 2, 3]}'); // combinator-based
csvParser().parse('a,b,c\n1,2,3'); // RFC 4180Or with a grammar (via bbnf-lang):
import { generateParserFromBBNF } from "@mkbabb/bbnf-lang";
const grammar = `
expr = term, { ("+" | "-"), term };
term = factor, { ("*" | "/"), factor };
factor = number | "(", expr, ")";
number = /[0-9]+/;
`;
const [nonterminals, ast] = generateParserFromBBNF(grammar);
const expr = nonterminals.expr;
expr.parse("1 + 2 * 3"); // => [1, "+", [2, "*", 3]]- Structure
- Performance
- Debugging
- BBNF and the Great Parser Generator
- Left Recursion
- API & Examples
- Sources
typescript/ TS library (@mkbabb/parse-that v0.7.0)
src/parse/ Isomorphic module layout (see below)
test/ Vitest tests + benchmark comparators
rust/ Rust workspace (nightly, edition 2024)
parse_that/ Core lib — isomorphic module layout (see below)
src/ CLI binary (parse_that_cli)
grammar/ Shared BBNF grammar files (16 .bbnf)
tests/json/ Shared JSON test vectors
tests/css/ CSS recovery test vectors
tests/debug/ Shared diagnostic output vectors
docs/ Perf chronicles, API reference
Both languages share a mirrored module structure:
| Module | TypeScript (src/parse/) |
Rust (parse_that/src/) |
|---|---|---|
| Core | parser.ts — Parser<T> class |
parse.rs — Parser<'a, O> struct |
| Combinators | methods on Parser class (incl. recover) | combinators.rs — impl blocks (incl. recover) |
| Leaf parsers | leaf.ts — string, regex, dispatch |
leaf.rs — string, regex, dispatch_byte |
| Lazy eval | lazy.ts — lazy(), getLazyParser |
lazy.rs — LazyParser, lazy() |
| Span / zero-copy | span.ts — regexSpan, manySpan |
span_parser.rs — SpanParser enum |
| State | state.ts — ParserState, Span |
state.rs — ParserState, Span |
| Debug | debug.ts — diagnostics, ANSI output |
debug.rs — diagnostics (feature-gated) |
| Domain parsers | parsers/ — JSON, CSV, CSS |
parsers/ — JSON + scanners, CSV, CSS |
All benchmarks run on Apple M-series (AArch64). JSON parsing with full DOM materialization and string escape decoding. Higher is better.
MB/s throughput. bencher crate with black_box on inputs and b.bytes set.
| Parser | data.json | apache | citm_catalog | canada | data-xl | |
|---|---|---|---|---|---|---|
| sonic-rs | 2,291 | 1,814 | 2,496 | 2,896 | 1,495 | 2,644 |
| simd-json | 1,459 | 1,477 | 1,498 | 1,292 | 488 | 1,637 |
| jiter | 1,249 | 1,127 | 1,022 | 1,016 | 566 | 1,319 |
| serde_json_borrow | 1,181 | 1,120 | 1,272 | 1,256 | 613 | 1,194 |
| parse_that | 926 | 920 | 867 | 701 | 358 | 1,006 |
| nom | 587 | 704 | 504 | 602 | 396 | 599 |
| serde_json | 566 | 539 | 557 | 857 | 554 | 606 |
| winnow | 539 | 610 | 531 | 583 | 394 | 579 |
| parse_that (BBNF) | 540 | 541 | 524 | 541 | 312 | 703 |
| pest | 256 | 275 | 240 | 257 | 156 | 261 |
parse_that uses SIMD string scanning (memchr2), integer fast path (madd +
ucvtf), Vec objects (no HashMap), u32 keyword loads, Cow<str> zero-copy
strings, and #[cold] escape decoding. 1.4–1.7x faster than nom/winnow
(work-equivalent peers).
The BBNF-generated parser uses #[derive(Parser)] from a .bbnf grammar file—zero
hand-written Rust. Hybrid codegen phases (number regex substitution, transparent
alternation elimination, inline match dispatch, SpanParser dual methods, recursive
SpanParser codegen) reach 55–70% of the hand-written parser.
See docs/perf-optimization-rust.md for the full optimization chronicle.
Relative to JSON.parse (native C++). Vitest bench, 5 iterations, 5s warmup.
| Parser | data (35KB) | apache (124KB) | twitter (617KB) | citm (1.7MB) | canada (2.1MB) |
|---|---|---|---|---|---|
| JSON.parse (native) | 1.00x | 1.00x | 1.00x | 1.00x | 1.00x |
| parse-that | 4.94x | 5.27x | 6.47x | 6.64x | 2.78x |
| Chevrotain | 6.17x | 7.19x | 9.06x | 8.97x | 4.65x |
| Peggy | 22.5x | 24.6x | 24.6x | 29.1x | 8.87x |
| Parsimmon | 25.9x | 30.3x | 36.0x | 41.7x | 20.0x |
| Nearley + moo | 64.5x | 91.6x | 73.0x | 85.2x | 33.4x |
parse-that is the fastest JS parser combinator library — 1.3–1.5x faster than Chevrotain, 4–12x faster than PEG/Earley parsers.
Key optimizations: mutable ParserState (zero-alloc), Tarjan's SCC for minimal
lazy wrappers, FIRST-set dispatch tables (O(1) alternation), regex
test()+substring() (no RegExpMatchArray alloc), inline wrap().
See docs/perf-optimization-ts.md for the full optimization chronicle.
Rust MB/s on normalize.css (6KB), bootstrap.css (281KB), tailwind-output.css (3.6MB):
| Parser | normalize | bootstrap | tailwind | Level |
|---|---|---|---|---|
| parse_that (hand-rolled) | 457 | 229 | 240 | L1.75 — typed AST |
| BBNF-generated | 159 | 61 | — | L1 — opaque spans |
| lightningcss | 221 | 104 | — | L2 — semantic |
| cssparser | 678 | 428 | 258 | L0 — tokenizer only |
parse_that (L1.75) builds a fully typed AST: selectors (compound/complex), values (dimension/color/function), typed media queries (conditions, features, range ops), typed @supports conditions, and specificity computation. Monolithic byte-level scanners with memchr SIMD. 2.2x faster than lightningcss on bootstrap, within 0.93x of cssparser (tokenizer-only) on tailwind.
TypeScript (relative to parse-that):
| Parser | normalize (6KB) | bootstrap (274KB) |
|---|---|---|
| parse-that (L1.75) | 1.00x | 1.00x |
| postcss (L1) | 1.63x slower | 1.33x slower |
| css-tree (L1-L2) | 2.34x slower | 2.03x slower |
The debug combinator pretty-prints parser state during execution.
As output, you'll see:
- Header: parsing status (
Ok/Err/Done), current offset, node name, stringified parser - Body: surrounding lines of input with the cursor at the active column, line-numbered
Color-coded: BBNF nonterminals in blue, stringified parsers in yellow.
Both implementations ship a structured diagnostics system — Rust behind a
diagnostics Cargo feature, TypeScript via enableDiagnostics() /
disableDiagnostics(). Zero overhead when off.
When enabled, parsers accumulate at the furthest offset:
- Expected sets — leaf parsers pre-compute labels at construction time
(
"hello",/[0-9]+/,one of ['a'-'z']). Alternation chains merge them intoexpected X, Y, or Z(Oxford comma). - Suggestions —
wrap()detects unclosed delimiters and emitshelp: unclosed '(' — insert matching ')'; EOF checks flag trailing content. - Secondary spans — point back to related source locations
(
unclosed '{' opened here) for multi-site error context.
Rich rendering: ANSI color output with TTY detection and NO_COLOR respect,
center-truncation of long lines around the error column, ±4 lines of context
with gutter line numbers. Shared test vectors in grammar/tests/debug/ ensure
isomorphic output between TypeScript and Rust.
The recover(sync, sentinel) combinator enables multi-error parsing — the
standard technique used by production compilers (rustc, GCC, clang) to parse
past failures and keep going.
const decl = declaration.trim().recover(declSync, "RECOVERED");
const block = decl.many(0).wrap("{", "}");On failure, recover() snapshots the current diagnostic state into a
Diagnostic object (expected sets, suggestions, secondary spans, source
location), pushes it to a collection, then invokes the sync parser to skip
forward to a known recovery point (;, }, etc.) and returns the sentinel.
many() loops keep going — each failed element produces a diagnostic but
doesn't halt the overall parse.
clearCollectedDiagnostics();
stylesheet.parse(cssWithErrors);
const diagnostics = getCollectedDiagnostics();
console.error(formatAllDiagnostics(diagnostics, css));Both TypeScript and Rust expose the same API: collectDiagnostic() /
push_diagnostic(), getCollectedDiagnostics() / get_collected_diagnostics(),
formatDiagnostic() / format_diagnostic(). Rust uses thread-local storage;
TypeScript uses module-level globals. See
grammar/tests/css/complex-errors.css for a multi-error test vector.
Better Backus-Naur Form: a readable, practical grammar notation. An extension of
EBNF with skip/next
operators, regex terminals, mapping functions, and an @import system.
The BBNF grammar for BBNF itself lives at bbnf.bbnf.
With your grammar in hand, call generateParserFromBBNF (TypeScript) or use
#[derive(Parser)] (Rust):
const [nonterminals, ast] = generateParserFromBBNF(grammar);#[derive(Parser)]
#[parser(path = "grammar/json.bbnf")]
pub struct Json;
let result = Json::value().parse(input);Each nonterminal is a Parser object. The BBNF parser-generator is itself written in
BBNF—self-hosting. The BBNF ecosystem (compiler, LSP, VS Code extension) lives in the
separate bbnf-lang repo.
| Syntax | Meaning |
|---|---|
A? / [ A ] |
Optional A |
A* / { A } |
Repeated A (0 or more) |
A+ |
Repeated A (1 or more) |
A | B |
A or B (higher precedence than ,) |
A, B |
A followed by B |
A - B |
A, but not B |
A >> B |
A then B, return B only |
A << B |
A then B, return A only |
( A ) |
Grouping |
Emojis supported. Epsilon has a special value: ε.
Direct and indirect left recursion are fully supported, as are highly ambiguous grammars:
expr = expr , "+" , expr
| integer
| string ;
The BBNF compiler optimizes the grammar automatically:
- Topological sort via Tarjan's SCC
- Remove indirect left recursion
- Remove direct left recursion
- Left factorize
Left recursion via memoize and mergeMemos:
const expression = Parser.lazy(() =>
all(expression, operators.then(expression).opt()).mergeMemos().or(number)
).memoize();See memoize.test.ts for details.
Left recursion works but isn't optimal. If it can be factored out via BBNF, performance will be fine; otherwise expect slowdowns, since JavaScript lacks proper tail call optimization.
See api.md for API information.
See the TypeScript tests and Rust tests for working examples.
- Aho, A. V., Lam, M. S., Sethi, R., & Ullman, J. D. (2006). Compilers: Principles, Techniques, and Tools (2nd ed.). Addison-Wesley. — The Dragon Book. Left recursion elimination, left factoring, FIRST/FOLLOW sets.
- Frost, R., Hafiz, R., & Callaghan, P. (2008). Parser combinators for ambiguous left-recursive grammars. PADL '08. — Memoized top-down parsing with left recursion support.
- Frost, R. & Hafiz, R. (2006). A new top-down parsing algorithm to accommodate ambiguity and left recursion in polynomial time. ACM SIGPLAN Notices.
- Moore, R. C. (2000). Removing left recursion from context-free grammars. NAACL '00.
- Tarjan, R. E. (1972). Depth-first search and linear graph algorithms. SIAM Journal on Computing. — SCC detection for grammar cycle analysis and FIRST-set computation.
- Power, J. (n.d.). Formal theory of parsing. NUI Maynooth lecture notes.
- Extended Backus-Naur form — ISO 14977. BBNF's ancestor.
- Wheeler, D. A. Don't Use ISO 14977 EBNF. — Motivation for BBNF's deviations from the standard.
- Eisel, D. & Lemire, D. (2021). Number parsing at a gigabyte per second. —
fast_float2crate for Rust JSON float parsing. - memchr — SIMD-accelerated byte scanning. Used for
memchr2-based JSON string scanning on AArch64 (NEON) and x86 (SSE2/AVX2). rustc_ast/format.rs— Rust compiler source. Reference foru32keyword matching pattern.
- Parsimmon — Monadic TS parser combinators. Benchmarked as baseline.
- Chevrotain — TS parser toolkit with CST. Closest TS competitor.
- nom — Rust parser combinators. The Rust ecosystem standard.
- winnow — nom's successor with improved dispatch.
- pest — PEG parser for Rust. Grammar-driven.
- sonic-rs — ByteDance's SIMD JSON parser. The ceiling in our benchmarks.
- simd-json — Rust port of simdjson's 3-phase architecture.
- jiter — Pydantic's scalar JSON parser. Closest peer to our fast path.
