Skip to content

feat: tree-sitter integration — fn, class, smells commands + enhanced file/symbol/status#17

Draft
butttons wants to merge 30 commits intomainfrom
feat/v2.0.0
Draft

feat: tree-sitter integration — fn, class, smells commands + enhanced file/symbol/status#17
butttons wants to merge 30 commits intomainfrom
feat/v2.0.0

Conversation

@butttons
Copy link
Owner

@butttons butttons commented Mar 4, 2026

Summary

Adds tree-sitter powered static analysis to dora via web-tree-sitter. Introduces three new commands (dora fn, dora class, dora smells) and enhances existing commands with richer output from AST parsing.

Changes

Core infrastructure

  • grammar.ts: Grammar discovery — checks explicit config, local node_modules, then global bun packages
  • parser.ts: Core parse pipeline — wasm init, language cache, SCIP correlation, file metrics calculation
  • languages/: TypeScript/TSX and JavaScript/JSX S-expression queries with full capture parsers and language registry

New commands

  • dora fn: Lists functions in a file with parameters, return type, async flag, cyclomatic complexity, LOC, and JSDoc
  • dora class: Lists classes with extends, implements, decorators, abstract flag, methods, and property count
  • dora smells: Detects high complexity, long functions, too many params, and TODO/FIXME/HACK comments with configurable thresholds

Enhanced commands

  • dora file: Now includes tree-sitter metrics (LOC, SLOC, comment lines, function/class counts, avg/max complexity)
  • dora symbol: Function symbols now include signature (parameters + return type)
  • dora status: Reports grammar availability for installed tree-sitter grammars

Docs

  • Cookbook recipe for tree-sitter commands (dora cookbook show tree-sitter)

Tests

  • parseFunctionCaptures: All 5 capture types (declaration, export, arrow, export_arrow, method), deduplication, async detection, 4 parameter node variants, return type stripping, JSDoc detection, line number math, missing-capture skipping
  • parseClassCaptures: Extends, implements via heritage traversal, abstract via child/parent inspection, decorator sibling-chain walking, method extraction with complexity, property counting
  • calculateFileMetrics: Blank line detection, single-line and block comment state machine, loc = sloc + comment_lines + blank_lines invariant across all cases, complexity aggregation and rounding

butttons added 18 commits March 4, 2026 09:11
- src/tree-sitter/types.ts: ParameterInfo, FunctionInfo, MethodInfo, ClassInfo, FileMetrics
- src/schemas/treesitter.ts: Zod schemas + FnResult, SmellsResult, ClassResult
- src/schemas/index.ts: export treesitter schemas

Refs: dora-zp1e
- typescript.ts: function + class queries with full capture extraction
- javascript.ts: same for JS/JSX (no type annotations)
- registry.ts: maps extensions to grammar WASM names and query modules

Refs: dora-xe7i
- Fix fn/class/smells calls to use single object param in index.ts and handlers.ts
- Fix import order in index.ts
- Remove duplicate src/tree-sitter/types.ts, use schemas/treesitter.ts as source of truth
- Deduplicate parseFunctions/parseClasses into shared parseFile helper
- Make getDbConnection synchronous (getDb is sync)
- Rename inBlockComment to isInBlockComment per boolean naming convention
…rics

- parseFunctionCaptures: all 5 capture types, dedup, async, params, return type, jsdoc, line numbers
- parseClassCaptures: extends, implements, abstract, decorators, methods, properties
- calculateFileMetrics: blank/comment/sloc counting, block comment state machine, complexity aggregation
- export calculateFileMetrics from parser.ts
- extend biome:format script to include ./test
@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Mar 4, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
dora-docs 8bfce36 Mar 04 2026, 03:27 AM

butttons added 11 commits March 4, 2026 10:23
The nullish coalescing operator is a binary_expression node in
tree-sitter with operator.text === '??', not a standalone '??'
node type. The previous else-if branch was dead code and never
fired. Move ?? into the binary_expression operator check
alongside && and ||.

Refs: dora-gn1s
Parser.init() was called on every getLanguage() cache miss,
meaning loading a second grammar (e.g. tsx after typescript)
would re-initialize the WASM runtime. Extract initialization
into getInitializedModule() which caches the init promise and
resolves it only once regardless of concurrent callers.

Refs: dora-kudd
Bun 1.1+ uses bun.lock (text format) instead of bun.lockb
(binary). Repos using newer Bun only have bun.lock and were
not detected as Bun workspaces, resulting in wrong
scip-typescript indexer flags.

Refs: dora-otpb
Replace template literal path construction (`${root}/${path}`)
with resolveAbsolute() from utils/paths.ts in fn, class, smells,
file, and symbol commands. String interpolation is fragile on
Windows and can produce double slashes if root has a trailing
slash.

Refs: dora-n3ob
Previously tree_sitter status only probed the project's primary
language (e.g. 'typescript'), giving a misleading picture when
multiple grammars are installed. Now probes every language in
languageRegistry in parallel and reports each one individually.

Schema change: tree_sitter.grammars replaces
tree_sitter.{available, language, grammar_path}.

Refs: dora-2stw
The previous regex-based scanner only matched single-line // and
inline /* */ comments on a single line. TODO/FIXME/HACK markers
inside multi-line block comments (/* \n TODO ... \n */) were
silently skipped. Rewrite scanTodoComments to use the same
line-by-line state machine used in calculateFileMetrics,
tracking isInBlockComment across lines.

Refs: dora-t75k
MCP protocol delivers tool args as untyped JSON - no schema is
available at the boundary, justifying the use of any.

Refs: dora-zt88
Detect god_class (too many methods) and large_class (too many
properties) using the existing parseClasses path. Both checks
run in parallel with parseFunctions via Promise.all.

New options: --methods-threshold (default 20), --properties-threshold (default 15).
Exposed in CLI and MCP.

Refs: dora-vhbg
Previously an unrecognized sort value silently fell through to
the default. Now throws with a clear message listing valid values.

Refs: dora-d6s0
findGlobalNodeModulesPath() spawned 'bun pm ls -g' on every
grammar cache miss. Multiple tree-sitter commands in one session
would each pay that subprocess cost. Cache the promise at module
level so the subprocess runs at most once per process.

Refs: dora-ommx
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant