feat(agent): add DSL execution engine for LLM-scripted code analysis by buger · Pull Request #412 · probelabs/probe

buger · 2026-02-15T16:10:17Z

Summary

Adds a new DSL execution engine (execute_plan tool) that lets the AI agent write and run JavaScript-like scripts in a sandboxed environment. This replaces verbose multi-turn tool-calling loops with compact, deterministic scripts — enabling complex multi-file analysis, data pipelines, and structured output generation in a single tool call.

Why?

The current agent loop calls tools one at a time: search → extract → LLM → search → extract → ... Each round-trip costs latency and context window. With execute_plan, the agent writes a script that orchestrates all tool calls in one shot — with loops, variables, parallel batching, and direct output delivery.

Architecture

User query
  → Agent writes DSL script
    → Validator (AST whitelist via acorn)
    → Transformer (auto-inject await, loop guards)
    → SandboxJS execution with timeout
    → Self-healing retry (LLM fixes broken scripts, up to 2 retries)
  → Result returned to agent
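
The retry step in the pipeline above can be sketched as a small loop. This is a minimal illustration, not the PR's code: `executeWithSelfHealing`, `run`, and `fixWithLLM` are hypothetical names, and only the retry limit of 2 comes from the description.

```javascript
// Sketch of a self-healing retry loop (hypothetical names throughout).
// run(code)              -> resolves with the script result or rejects on failure
// fixWithLLM(code, err)  -> resolves with a repaired script (stand-in for the LLM call)
async function executeWithSelfHealing(code, run, fixWithLLM, maxRetries = 2) {
  let current = code;
  let lastMessage = '';
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const result = await run(current);
      return { ok: true, result, attempts: attempt + 1 };
    } catch (err) {
      lastMessage = err && err.message ? err.message : String(err);
      if (attempt === maxRetries) break; // retries exhausted, give up
      // Send the failing code and the error text to the fixer, then try again.
      current = await fixWithLLM(current, lastMessage);
    }
  }
  return { ok: false, error: lastMessage, attempts: maxRetries + 1 };
}
```

With `maxRetries = 2` a script runs at most three times: the original attempt plus two repaired versions.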

4 core modules:

- Validator (`dsl/validator.js`): Whitelist-based AST validation using acorn. Blocks dangerous constructs (`eval`, `Function`, `import`) while allowing standard JS control flow.
- Transformer (`dsl/transformer.js`): Auto-injects `await` on tool calls, wraps code in an async IIFE, and injects loop guards to prevent infinite loops.
- Environment (`dsl/environment.js`): Generates sandbox globals: tool wrappers, session store, output buffer, utility functions (`LLM`, `parseJSON`, `batch`, `map`, etc.).
- Runtime (`dsl/runtime.js`): Executes transformed code in SandboxJS with a configurable timeout (default 120s) and loop iteration limits.
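
To make the transformer's role concrete, here is a hand-written sketch of what a transformed script might look like. The guard name `__checkLoop`, the factory `createLoopGuard`, and the exact output shape are assumptions for illustration; the PR's transformer may emit something different.

```javascript
// Hypothetical loop guard: throws once a script exceeds its iteration budget.
function createLoopGuard(maxIterations = 5000) {
  let count = 0;
  return function __checkLoop() {
    count += 1;
    if (count > maxIterations) {
      throw new Error('Loop exceeded ' + maxIterations + ' iterations');
    }
  };
}

// What the agent might write (no await, no guards):
//   const results = [];
//   for (const f of files) { results.push(extract(f)); }
//   return results;
//
// Roughly what a transformer could produce: an async IIFE, `await` on the
// tool call, and a guard call at the top of the loop body.
function runTransformed(files, extract, __checkLoop) {
  return (async () => {
    const results = [];
    for (const f of files) {
      __checkLoop();
      results.push(await extract(f));
    }
    return results;
  })();
}
```

The guard counts iterations across the whole script rather than per loop, which is one simple way to bound nested loops as well.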

Key Features

- Error-safe tools: All tool wrappers catch errors internally and return "ERROR: message" strings instead of throwing. `parseJSON()` returns null on failure. This avoids SandboxJS try/catch parameter binding bugs entirely.
- Session store: Persistent key-value store across multiple `execute_plan` calls within one agent session (`storeSet`, `storeGet`, `storeAppend`, `storeKeys`, `storeGetAll`).
- Output buffer: `output()` function writes large data (tables, CSV, JSON) directly to the user response, bypassing the LLM context window and preventing lossy summarization.
- Self-healing: When a script fails, the error + code is sent to the LLM for automatic fix (up to 2 retries).
- Parallel batching: `batch(items, size)` + `map(items, fn)` for concurrent file processing.
- `enableExecutePlan` flag: Gates the tool (like `enableBash`/`enableDelegate`). When enabled, replaces `analyze_all`.
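
The error-return pattern from the first feature can be sketched as follows. `makeErrorSafe` is a hypothetical helper name, and this `parseJSON` is only an illustration of the described behaviour, not the PR's implementation.

```javascript
// Wrap a tool so that failures come back as "ERROR: ..." strings instead of exceptions.
function makeErrorSafe(toolFn) {
  return async function safeTool(...args) {
    try {
      return await toolFn(...args);
    } catch (err) {
      const message = err && err.message ? err.message : String(err);
      return 'ERROR: ' + message;
    }
  };
}

// Returns the parsed value, or null when the text is not valid JSON.
function parseJSON(text) {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

// A script can then branch on the result without try/catch.
function isToolError(result) {
  return typeof result === 'string' && result.startsWith('ERROR:');
}
```

Because errors are ordinary return values, a script never needs a `catch` clause, which is the construct the description says SandboxJS handles incorrectly.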

Files Changed (29 files, ~5900 lines added)

| Area | Files | Description |
| --- | --- | --- |
| DSL Core | `dsl/validator.js`, `transformer.js`, `environment.js`, `runtime.js` | Sandbox execution engine |
| Tool | `tools/executePlan.js`, `tools/common.js`, `tools/index.js` | execute_plan tool + schema |
| Agent Integration | `ProbeAgent.js`, `probeTool.js`, `tools.js`, `index.js` | Tool registration, output buffer, session store |
| Tests | `tests/unit/dsl-*.test.js` (3 files, ~1260 lines) | 110 unit tests covering validator, transformer, runtime |
| Manual Tests | `dsl/*-test.mjs` (7 files) | Agent tests, pipeline tests, trigger tests, diagnostics |
| Docs | `docs/llm-script.md` | Full DSL reference with patterns and examples |
| CI | `.github/workflows/visor.yml` | Update Visor model to glm-5 |

Test plan

- 110 DSL unit tests pass (`npm test --prefix npm -- --testPathPattern="dsl-"`)
- 1990 total npm tests pass, zero regressions
- All tests pass on both Node 20 (CI) and Node 22 (local)
- 6/6 agent tests pass (LLM writes its own scripts)
- Manual pipeline test: multi-file analysis with output buffer delivery

🤖 Generated with Claude Code

- Fix XML tag stripping: `stripCodeWrapping()` removes `<execute_plan><code>` tags LLMs wrap code in, preventing validation errors and wasted self-healing retries
- Add regex literal validation: Validator rejects `/pattern/` with clear error message, LLMs steered to String methods (`indexOf`, `includes`, `startsWith`)
- Fix async error crashes: Delay `unhandledRejection` handler removal by 500ms to catch late SandboxJS errors that escape promise chain
- Fix ambiguous test prompts: Test 2 now explicitly requests DSL code output instead of plain text answers
- Update llm-script.md docs: Add `output()` function, `parseJSON()` utility, regex limitation, and Pattern 6 (Direct Output for Large Data)
- All 6 agent tests now pass (up from 4/6)
- 110 DSL unit tests pass (+2 regex validation tests)
- 1974 total npm tests pass with zero regressions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
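
As an illustration of the tag-stripping fix, the sketch below removes wrapping tags using only String methods, in line with the DSL's no-regex rule. It reuses the name `stripCodeWrapping` from the commit message, but the body is an assumption and may differ from the PR's implementation.

```javascript
// Illustrative only: strip <execute_plan> and <code> wrappers that surround a script.
function stripCodeWrapping(text) {
  const wrappers = [
    ['<execute_plan>', '</execute_plan>'],
    ['<code>', '</code>'],
  ];
  let code = text.trim();
  let changed = true;
  // Keep peeling layers until no wrapper matches, so nested wrappers are handled.
  while (changed) {
    changed = false;
    for (const [open, close] of wrappers) {
      if (code.startsWith(open) && code.endsWith(close)) {
        code = code.slice(open.length, code.length - close.length).trim();
        changed = true;
      }
    }
  }
  return code;
}
```

Stripping before validation means a wrapped but otherwise valid script is not rejected by the parser, so it does not consume a self-healing retry.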

probelabs · 2026-02-15T16:17:56Z

PR Overview: DSL Execution Engine for LLM-Scripted Code Analysis

Summary

This PR introduces a DSL execution engine (execute_plan tool) that enables AI agents to write and execute JavaScript-like scripts in a sandboxed environment. This replaces verbose multi-turn tool-calling loops with compact, deterministic scripts for complex multi-file analysis, data pipelines, and structured output generation.

Files Changed (28 files, +5,894/-7)

| Area | Files | Description |
| --- | --- | --- |
| DSL Core | `validator.js`, `transformer.js`, `environment.js`, `runtime.js` | Sandbox execution engine (4 modules, ~932 lines) |
| Tool | `executePlan.js`, `common.js`, `index.js` | execute_plan tool + schema (761 lines) |
| Agent Integration | `ProbeAgent.js`, `probeTool.js`, `tools.js`, `index.js` | Tool registration, output buffer, session store |
| Tests | `dsl-*.test.js` (3 files) | 110 unit tests covering validator, transformer, runtime (~1,261 lines) |
| Manual Tests | `*-test.mjs` (7 files) | Agent tests, pipeline tests, trigger tests |
| Docs | `llm-script.md` | Full DSL reference with patterns and examples (512 lines) |
| Config | `package.json`, `CLAUDE.md` | Dependencies + tool development guide |

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        User Query                                │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Agent writes DSL script                      │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
│  │  Validator   │ → │ Transformer  │ → │   Runtime    │         │
│  │ (AST check)  │   │(await inject)│   │ (SandboxJS)  │         │
│  └──────────────┘   └──────────────┘   └──────────────┘         │
│         │                  │                  │                  │
│         ▼                  ▼                  ▼                  │
│    Whitelist         Auto-await +        Timeout +              │
│    (acorn)           loop guards         isolation              │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
              ┌───────────────────────────┐
              │   Self-healing retry      │
              │   (up to 2 retries)       │
              └───────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Result to Agent                             │
│  ┌─────────────────┐  ┌─────────────────┐                       │
│  │  Return value   │  │  Output buffer  │ (bypasses LLM)        │
│  │  (to context)   │  │  (direct to UI) │                       │
│  └─────────────────┘  └─────────────────┘                       │
└─────────────────────────────────────────────────────────────────┘
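
The validator stage in the diagram can be illustrated with a dependency-free sketch. In the PR the syntax tree comes from acorn; here the walker takes any ESTree-shaped object so the example runs on its own. The blocked-identifier set is the one listed under Key Technical Changes, while `ALLOWED_NODE_TYPES` and `validateNode` are hypothetical and far smaller than a real whitelist would be.

```javascript
// Blocked identifiers as listed in the overview.
const BLOCKED_IDENTIFIERS = new Set([
  'eval', 'require', 'process', 'globalThis',
  '__proto__', 'constructor', 'prototype', 'Function',
]);

// Illustrative whitelist; a real one would cover many more node types.
const ALLOWED_NODE_TYPES = new Set([
  'Program', 'ExpressionStatement', 'CallExpression', 'Identifier', 'Literal',
  'VariableDeclaration', 'VariableDeclarator', 'ReturnStatement',
]);

// Recursively walk an ESTree-shaped node and collect validation errors.
function validateNode(node, errors = []) {
  if (!node || typeof node.type !== 'string') return errors;
  if (!ALLOWED_NODE_TYPES.has(node.type)) {
    errors.push('Node type not allowed: ' + node.type);
  }
  if (node.type === 'Identifier' && BLOCKED_IDENTIFIERS.has(node.name)) {
    errors.push('Blocked identifier: ' + node.name);
  }
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) {
      child.forEach((c) => validateNode(c, errors));
    } else if (child && typeof child === 'object') {
      validateNode(child, errors);
    }
  }
  return errors;
}

// Hand-built tree for the script: eval("x")
const evalCall = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: { type: 'Identifier', name: 'eval' },
      arguments: [{ type: 'Literal', value: 'x' }],
    },
  }],
};
```

A whitelist rejects anything it does not recognise, so constructs such as `ImportDeclaration` fail without needing an explicit rule.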

Key Technical Changes

- Security Model: Multi-layer sandbox
  - AST validation via acorn (whitelist approach)
  - Blocked identifiers: `eval`, `require`, `process`, `globalThis`, `__proto__`, `constructor`, `prototype`, `Function`
  - Loop guards: 5,000 max iterations
  - Execution timeout: 2 minutes default
- Error-Safe Tools: All tool wrappers catch errors internally and return "ERROR: message" strings instead of throwing (avoids SandboxJS try/catch parameter binding bugs)
- Session Store: Persistent key-value store across `execute_plan` calls
  - `storeSet`, `storeGet`, `storeAppend`, `storeKeys`, `storeGetAll`
- Output Buffer: `output()` function writes large data directly to user response, bypassing LLM context window
- Self-Healing: On script failure, error + code sent to LLM for automatic fix (up to 2 retries)
- Parallel Processing: `map(array, fn)` with concurrency control (default 3)
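
A concurrency-limited `map` and a `batch` helper could look like the sketch below. The default of 3 is taken from the item above; passing the limit as a third argument is an assumption, and the PR's signatures may differ.

```javascript
// Split an array into chunks of at most `size` items.
function batch(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Apply an async function to every item with at most `concurrency` calls in flight.
// Results keep the order of the input array.
async function map(items, fn, concurrency = 3) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const index = next++; // claim an index before awaiting
      results[index] = await fn(items[index], index);
    }
  }
  const workers = [];
  for (let i = 0; i < Math.min(concurrency, items.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

Each worker pulls the next unclaimed index as soon as it finishes, so slow items do not hold up the rest of the queue the way fixed-size batches would.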

Affected Components

| Component | Impact |
| --- | --- |
| Agent orchestration | `execute_plan` supersedes `analyze_all` when `enableExecutePlan: true` |
| Tool system | All tools (search, query, extract, LLM, map) accessible from DSL |
| Session management | Cross-execution data persistence via session store |
| Output delivery | Direct content delivery via output buffer |
| Observability | OTEL tracing throughout pipeline |
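
The session store and output buffer rows can be pictured as a small per-session object that outlives individual `execute_plan` calls. The function names match those listed in the PR, but `createSession`, `flushOutput`, and the exact `storeAppend` semantics are assumptions for illustration.

```javascript
// Hypothetical per-session state shared by successive script executions.
function createSession() {
  const store = new Map();
  const outputBuffer = [];
  return {
    storeSet(key, value) { store.set(key, value); },
    storeGet(key) { return store.get(key); },
    // Assumed semantics: append to an array, creating it on first use.
    storeAppend(key, value) {
      const current = store.get(key);
      if (Array.isArray(current)) current.push(value);
      else store.set(key, current === undefined ? [value] : [current, value]);
    },
    storeKeys() { return Array.from(store.keys()); },
    storeGetAll() { return Object.fromEntries(store); },
    // output() collects text destined for the user, not for the LLM context.
    output(text) { outputBuffer.push(String(text)); },
    // Hypothetical helper: hand the buffered text to the response and reset.
    flushOutput() {
      const joined = outputBuffer.join('\n');
      outputBuffer.length = 0;
      return joined;
    },
  };
}
```

One script can call `storeAppend('findings', row)` while a later script in the same session reads the accumulated array with `storeGet('findings')`.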

New Dependencies

- `@nyariv/sandboxjs`: JavaScript sandbox execution
- `acorn` + `acorn-walk`: AST parsing and traversal
- `astring`: Code generation

Breaking Changes

None. The enableExecutePlan flag is opt-in and defaults to false.

Test Coverage

✅ 110 DSL unit tests pass
✅ 1,990 total npm tests pass, zero regressions
✅ 6/6 agent tests pass (LLM writes its own scripts)
✅ Manual pipeline test verified

Usage

const agent = new ProbeAgent({
  path: '/path/to/codebase',
  provider: 'google',
  enableExecutePlan: true  // Enable DSL orchestration
});

probe agent "Find all API endpoints" --enable-execute-plan

Scope Discovery & Related Files

The DSL modules are self-contained in npm/src/agent/dsl/. Key integration points:

- `npm/src/agent/ProbeAgent.js`: Tool registration and output buffer handling
- `npm/src/agent/tools.js`: Tool factory with `enableExecutePlan` flag
- `npm/src/tools/executePlan.js`: Vercel AI SDK tool wrapper
- `npm/tests/unit/dsl-*.test.js`: Comprehensive unit test coverage

Metadata

Review Effort: 4 / 5
Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-02-15T19:25:31.533Z | Triggered by: pr_updated | Commit: c80a0a7

💡 TIP: You can chat with Visor using /visor ask <your question>

probelabs · 2026-02-15T16:17:58Z

Security Issues (1)

| Severity | Location | Issue |
| --- | --- | --- |
| 🟠 Error | `system:0` | ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 2. No output generated. Check the stream for errors. |

Architecture Issues (1)

| Severity | Location | Issue |
| --- | --- | --- |
| 🟠 Error | `system:0` | ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors. |

Performance Issues (1)

| Severity | Location | Issue |
| --- | --- | --- |
| 🟠 Error | `system:0` | ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors. |

Quality Issues (1)

| Severity | Location | Issue |
| --- | --- | --- |
| 🟠 Error | `system:0` | ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors. |

Last updated: 2026-02-15T19:25:34.060Z | Triggered by: pr_updated | Commit: c80a0a7


SandboxJS doesn't bind catch clause parameters. The workaround injects `var e = __getLastError()` in catch bodies, but `var e` inside `catch (e)` conflicts on Node 20. Fix by renaming the catch parameter to `__catchParam` so the var declaration doesn't shadow it. Also set Visor code review max-parallelism to 1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Tool functions now catch errors internally and return "ERROR: ..." strings instead of throwing. This eliminates the fragile SandboxJS catch parameter workaround (`__getLastError`/`__setLastError`/`__catchParam`) which broke on Node 20.

- traceToolCall: catch + return error string instead of rethrowing
- parseJSON: returns null on failure instead of throwing
- Removed transformer passes: catch param rename, throw rewrite
- Removed `errorHolder`, `__getLastError`, `__setLastError` globals
- Updated tool definition and docs to document error-return pattern
- Net -147 lines of complexity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

buger force-pushed the buger/agent-dsl-orchestration branch from afbc45a to 059ff17 on February 15, 2026 16:15

buger and others added 3 commits February 15, 2026 19:33

chore: update Visor model to glm-5

2e88ea0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

buger changed the title ~~feat(dsl): agent test fixes and DSL improvements~~ feat(agent): add DSL execution engine for LLM-scripted code analysis Feb 15, 2026

Merge branch 'main' into buger/agent-dsl-orchestration

c80a0a7

buger merged commit 0bf79ce into main Feb 15, 2026
15 of 19 checks passed

buger deleted the buger/agent-dsl-orchestration branch February 15, 2026 19:42

buger merged 5 commits into main from buger/agent-dsl-orchestration
