feat(agent): add DSL execution engine for LLM-scripted code analysis#412

Merged: buger merged 5 commits into main from buger/agent-dsl-orchestration on Feb 15, 2026
@buger (Collaborator) commented Feb 15, 2026

Summary

Adds a new DSL execution engine (execute_plan tool) that lets the AI agent write and run JavaScript-like scripts in a sandboxed environment. This replaces verbose multi-turn tool-calling loops with compact, deterministic scripts — enabling complex multi-file analysis, data pipelines, and structured output generation in a single tool call.

Why?

The current agent loop calls tools one at a time: search → extract → LLM → search → extract → ... Each round-trip costs latency and context window. With execute_plan, the agent writes a script that orchestrates all tool calls in one shot — with loops, variables, parallel batching, and direct output delivery.
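
To make the contrast concrete, here is a rough sketch of what a one-shot plan might look like. The tool stubs (`search`, `extract`, `output`) and their signatures are assumptions made up for illustration, not the real tool APIs. The PR says the transformer auto-injects `await` in real DSL scripts; the explicit `await`s here are only so the sketch runs as plain Node.

```javascript
// Hypothetical stand-ins for the real tools; signatures are assumed, not taken from the PR.
const outputBuffer = [];

async function search(query) {
  // Pretend every query matches two files.
  return [`src/${query}/a.js`, `src/${query}/b.js`];
}

async function extract(file) {
  return `// contents of ${file}`;
}

function output(text) {
  outputBuffer.push(text);
}

// One "plan": loop over queries, fan out to extract, and emit a CSV table,
// without returning to the LLM between steps.
async function runPlan(queries) {
  const rows = [];
  for (const q of queries) {
    const files = await search(q);
    const bodies = await Promise.all(files.map((f) => extract(f)));
    files.forEach((f, i) => rows.push(`${q},${f},${bodies[i].length}`));
  }
  output(['query,file,chars', ...rows].join('\n'));
  return `${rows.length} files analyzed`;
}
```

Calling `runPlan(['auth', 'db'])` would perform four extractions and one `output()` call in a single pass, where the tool-at-a-time loop would need a model round-trip per step.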

Architecture

User query
  → Agent writes DSL script
    → Validator (AST whitelist via acorn)
    → Transformer (auto-inject await, loop guards)
    → SandboxJS execution with timeout
    → Self-healing retry (LLM fixes broken scripts, up to 2 retries)
  → Result returned to agent
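
The flow above can be read as a retry loop around three stages. The following is a minimal sketch under that reading, not the PR's actual code: `executePlan`, the `steps` object, and `askLlmToFix` are hypothetical names, and the stages are injected so the retry logic can be shown without acorn or SandboxJS.

```javascript
// Hypothetical pipeline driver. The real validator, transformer and runtime are
// replaced by injected functions so that only the retry logic is illustrated.
async function executePlan(code, steps, maxRetries = 2) {
  let current = code;
  for (let attempt = 0; ; attempt++) {
    try {
      steps.validate(current); // expected to throw on disallowed syntax
      const transformed = steps.transform(current);
      const result = await steps.execute(transformed);
      return { ok: true, attempts: attempt + 1, result };
    } catch (err) {
      if (attempt >= maxRetries) {
        return { ok: false, attempts: attempt + 1, error: String(err.message) };
      }
      // Self-healing: hand the failing code and the error back to the LLM.
      current = await steps.askLlmToFix(current, err.message);
    }
  }
}
```

With `maxRetries = 2`, a script gets at most three runs in total: the original attempt plus two repaired ones.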

4 core modules:

  • Validator (dsl/validator.js) — Whitelist-based AST validation using acorn. Blocks dangerous constructs (eval, Function, import) while allowing standard JS control flow.
  • Transformer (dsl/transformer.js) — Auto-injects await on tool calls, wraps code in async IIFE, injects loop guards to prevent infinite loops.
  • Environment (dsl/environment.js) — Generates sandbox globals: tool wrappers, session store, output buffer, utility functions (LLM, parseJSON, batch, map, etc).
  • Runtime (dsl/runtime.js) — Executes transformed code in SandboxJS with configurable timeout (default 120s) and loop iteration limits.
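
As an illustration of the two runtime limits, a loop guard and a timeout could be sketched as below. `makeLoopGuard` and `withTimeout` are hypothetical helper names, and the way the real transformer injects its guards is not shown in this PR description. The 5,000 default is the figure quoted in the review summary on this page.

```javascript
// Loop guard: a transformer might insert a call like guard() at the top of
// every loop body. The counter is shared across all loops in one script.
function makeLoopGuard(maxIterations = 5000) {
  let count = 0;
  return function guard() {
    count += 1;
    if (count > maxIterations) {
      throw new Error(`Loop exceeded ${maxIterations} iterations`);
    }
  };
}

// Timeout: race the script's promise against a timer (120 s mirrors the stated default).
function withTimeout(promise, ms = 120000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Execution timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A timer-based timeout cannot interrupt a synchronous busy loop, because the timer callback never gets a chance to run. That is one reason to pair it with an iteration guard.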

Key Features

  • Error-safe tools: All tool wrappers catch errors internally and return "ERROR: message" strings instead of throwing. parseJSON() returns null on failure. This avoids SandboxJS try/catch parameter binding bugs entirely.
  • Session store: Persistent key-value store across multiple execute_plan calls within one agent session (storeSet, storeGet, storeAppend, storeKeys, storeGetAll).
  • Output buffer: output() function writes large data (tables, CSV, JSON) directly to the user response, bypassing LLM context window and preventing lossy summarization.
  • Self-healing: When a script fails, the error + code is sent to the LLM for automatic fix (up to 2 retries).
  • Parallel batching: batch(items, size) + map(items, fn) for concurrent file processing.
  • enableExecutePlan flag: Gates the tool (like enableBash/enableDelegate). When enabled, replaces analyze_all.
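
The error-safe convention from the first bullet can be sketched as follows. This is an assumption-laden illustration rather than the PR's code: `makeErrorSafe` and `isError` are hypothetical helpers, and only the `"ERROR: message"` string format and the `null`-returning `parseJSON()` come from the description above.

```javascript
// Wrap a tool so failures come back as "ERROR: ..." strings instead of exceptions.
function makeErrorSafe(toolFn) {
  return async function safeTool(...args) {
    try {
      return await toolFn(...args);
    } catch (err) {
      return `ERROR: ${err && err.message ? err.message : String(err)}`;
    }
  };
}

// parseJSON variant that returns null rather than throwing on bad input.
function parseJSON(text) {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

// Script-side check (hypothetical helper), so no try/catch is needed in the sandbox.
function isError(value) {
  return typeof value === 'string' && value.startsWith('ERROR: ');
}
```

A script can then branch on `isError(result)` or `parsed === null` with ordinary `if` statements, which avoids `catch` clauses inside SandboxJS altogether.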

Files Changed (29 files, ~5900 lines added)

| Area | Files | Description |
| --- | --- | --- |
| DSL Core | dsl/validator.js, transformer.js, environment.js, runtime.js | Sandbox execution engine |
| Tool | tools/executePlan.js, tools/common.js, tools/index.js | execute_plan tool + schema |
| Agent Integration | ProbeAgent.js, probeTool.js, tools.js, index.js | Tool registration, output buffer, session store |
| Tests | tests/unit/dsl-*.test.js (3 files, ~1260 lines) | 110 unit tests covering validator, transformer, runtime |
| Manual Tests | dsl/*-test.mjs (7 files) | Agent tests, pipeline tests, trigger tests, diagnostics |
| Docs | docs/llm-script.md | Full DSL reference with patterns and examples |
| CI | .github/workflows/visor.yml | Update Visor model to glm-5 |

Test plan

  • 110 DSL unit tests pass (npm test --prefix npm -- --testPathPattern="dsl-")
  • 1990 total npm tests pass, zero regressions
  • All tests pass on both Node 20 (CI) and Node 22 (local)
  • 6/6 agent tests pass (LLM writes its own scripts)
  • Manual pipeline test: multi-file analysis with output buffer delivery

🤖 Generated with Claude Code

- Fix XML tag stripping: stripCodeWrapping() removes <execute_plan><code> tags LLMs wrap code in, preventing validation errors and wasted self-healing retries
- Add regex literal validation: Validator rejects /pattern/ with clear error message, LLMs steered to String methods (indexOf, includes, startsWith)
- Fix async error crashes: Delay unhandledRejection handler removal by 500ms to catch late SandboxJS errors that escape promise chain
- Fix ambiguous test prompts: Test 2 now explicitly requests DSL code output instead of plain text answers
- Update llm-script.md docs: Add output() function, parseJSON() utility, regex limitation, and Pattern 6 (Direct Output for Large Data)
- All 6 agent tests now pass (up from 4/6)
- 110 DSL unit tests pass (+2 regex validation tests)
- 1974 total npm tests pass with zero regressions
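
For illustration, the tag stripping described in the first bullet might look roughly like this. It is a hypothetical re-implementation that handles only the `<execute_plan>` and `<code>` wrappers named above; the PR's actual `stripCodeWrapping()` may cover more cases.

```javascript
// Hypothetical sketch: peel <execute_plan> / <code> wrappers off LLM output
// until nothing more can be removed, leaving only the script body.
function stripCodeWrapping(raw) {
  let code = raw.trim();
  let previous;
  do {
    previous = code;
    code = code
      .replace(/^<(execute_plan|code)>\s*/, '') // leading wrapper tag
      .replace(/\s*<\/(execute_plan|code)>$/, ''); // trailing wrapper tag
  } while (code !== previous);
  return code;
}
```

The regex literals are fine here because this runs on the host side. The regex restriction in the second bullet applies only to scripts inside the sandbox.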

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@buger force-pushed the buger/agent-dsl-orchestration branch from afbc45a to 059ff17 on February 15, 2026 16:15
probelabs bot (Contributor) commented Feb 15, 2026

PR Overview: DSL Execution Engine for LLM-Scripted Code Analysis

Summary

This PR introduces a DSL execution engine (execute_plan tool) that enables AI agents to write and execute JavaScript-like scripts in a sandboxed environment. This replaces verbose multi-turn tool-calling loops with compact, deterministic scripts for complex multi-file analysis, data pipelines, and structured output generation.


Files Changed (28 files, +5,894/-7)

| Area | Files | Description |
| --- | --- | --- |
| DSL Core | validator.js, transformer.js, environment.js, runtime.js | Sandbox execution engine (4 modules, ~932 lines) |
| Tool | executePlan.js, common.js, index.js | execute_plan tool + schema (761 lines) |
| Agent Integration | ProbeAgent.js, probeTool.js, tools.js, index.js | Tool registration, output buffer, session store |
| Tests | dsl-*.test.js (3 files) | 110 unit tests covering validator, transformer, runtime (~1,261 lines) |
| Manual Tests | *-test.mjs (7 files) | Agent tests, pipeline tests, trigger tests |
| Docs | llm-script.md | Full DSL reference with patterns and examples (512 lines) |
| Config | package.json, CLAUDE.md | Dependencies + tool development guide |

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        User Query                                │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Agent writes DSL script                      │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
│  │  Validator   │ → │ Transformer  │ → │   Runtime    │         │
│  │ (AST check)  │   │(await inject)│   │ (SandboxJS)  │         │
│  └──────────────┘   └──────────────┘   └──────────────┘         │
│         │                  │                  │                  │
│         ▼                  ▼                  ▼                  │
│    Whitelist         Auto-await +        Timeout +              │
│    (acorn)           loop guards         isolation              │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
              ┌───────────────────────────┐
              │   Self-healing retry      │
              │   (up to 2 retries)       │
              └───────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Result to Agent                             │
│  ┌─────────────────┐  ┌─────────────────┐                       │
│  │  Return value   │  │  Output buffer  │ (bypasses LLM)        │
│  │  (to context)   │  │  (direct to UI) │                       │
│  └─────────────────┘  └─────────────────┘                       │
└─────────────────────────────────────────────────────────────────┘
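
The two result channels at the bottom of the diagram could be modelled as below. This is a sketch under assumptions: `createOutputChannel`, `finalize`, and the `forAgent`/`forUser` field names are hypothetical, and only the `output()` function itself is named in the PR.

```javascript
// Hypothetical execution context with two result channels.
function createOutputChannel() {
  const buffer = [];
  return {
    // Exposed to the script: appends content destined for the user.
    output(content) {
      buffer.push(typeof content === 'string' ? content : JSON.stringify(content, null, 2));
    },
    // Called by the host after the script finishes.
    finalize(returnValue) {
      return {
        forAgent: returnValue, // small summary that enters the LLM context
        forUser: buffer.join('\n'), // large payload delivered verbatim
      };
    },
  };
}
```

Keeping the large payload out of `forAgent` is what would let a script emit a table with thousands of rows while the model sees only a one-line summary.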

Key Technical Changes

  1. Security Model: Multi-layer sandbox

    • AST validation via acorn (whitelist approach)
    • Blocked identifiers: eval, require, process, globalThis, __proto__, constructor, prototype, Function
    • Loop guards: 5,000 max iterations
    • Execution timeout: 2 minutes default
  2. Error-Safe Tools: All tool wrappers catch errors internally and return "ERROR: message" strings instead of throwing (avoids SandboxJS try/catch parameter binding bugs)

  3. Session Store: Persistent key-value store across execute_plan calls

    • storeSet, storeGet, storeAppend, storeKeys, storeGetAll
  4. Output Buffer: output() function writes large data directly to user response, bypassing LLM context window

  5. Self-Healing: On script failure, error + code sent to LLM for automatic fix (up to 2 retries)

  6. Parallel Processing: map(array, fn) with concurrency control (default 3)
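
A concurrency-limited `map` of the kind described in item 6 can be sketched with a small worker pool. This is an illustrative implementation, not the one in `dsl/environment.js`. The optional third parameter is an assumption; only the default of 3 comes from the summary above.

```javascript
// Run fn over items with at most `concurrency` calls in flight,
// returning results in the same order as the input.
async function map(items, fn, concurrency = 3) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }

  const workerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because JavaScript is single-threaded, `next++` needs no lock: each worker claims its index synchronously before yielding at the `await`.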


Affected Components

| Component | Impact |
| --- | --- |
| Agent orchestration | execute_plan supersedes analyze_all when enableExecutePlan: true |
| Tool system | All tools (search, query, extract, LLM, map) accessible from DSL |
| Session management | Cross-execution data persistence via session store |
| Output delivery | Direct content delivery via output buffer |
| Observability | OTEL tracing throughout pipeline |
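
The cross-execution persistence in the session management row could be backed by something as simple as a `Map` that outlives individual `execute_plan` calls. The five function names below come from this PR; their exact signatures and the append semantics are assumptions, and `createSessionStore` is a hypothetical factory.

```javascript
// Hypothetical per-session store; one instance would be created per agent
// session and its functions injected into every script's sandbox globals.
function createSessionStore() {
  const data = new Map();
  return {
    storeSet(key, value) {
      data.set(key, value);
    },
    storeGet(key) {
      return data.get(key);
    },
    // Assumed semantics: treat the value as a list and push onto it.
    storeAppend(key, value) {
      const existing = data.get(key);
      const list = Array.isArray(existing) ? existing : existing === undefined ? [] : [existing];
      list.push(value);
      data.set(key, list);
    },
    storeKeys() {
      return [...data.keys()];
    },
    storeGetAll() {
      return Object.fromEntries(data);
    },
  };
}
```

One script could then call `storeAppend('findings', item)` and a later script in the same session could read the accumulated list with `storeGet('findings')`.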

New Dependencies

  • @nyariv/sandboxjs - JavaScript sandbox execution
  • acorn + acorn-walk - AST parsing and traversal
  • astring - Code generation

Breaking Changes

None. The enableExecutePlan flag is opt-in and defaults to false.


Test Coverage

  • ✅ 110 DSL unit tests pass
  • ✅ 1,990 total npm tests pass, zero regressions
  • ✅ 6/6 agent tests pass (LLM writes its own scripts)
  • ✅ Manual pipeline test verified

Usage

const agent = new ProbeAgent({
  path: '/path/to/codebase',
  provider: 'google',
  enableExecutePlan: true  // Enable DSL orchestration
});

CLI:

probe agent "Find all API endpoints" --enable-execute-plan

Scope Discovery & Related Files

The DSL modules are self-contained in npm/src/agent/dsl/. Key integration points:

  • npm/src/agent/ProbeAgent.js - Tool registration and output buffer handling
  • npm/src/agent/tools.js - Tool factory with enableExecutePlan flag
  • npm/src/tools/executePlan.js - Vercel AI SDK tool wrapper
  • npm/tests/unit/dsl-*.test.js - Comprehensive unit test coverage

Metadata
  • Review Effort: 4 / 5
  • Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-02-15T19:25:31.533Z | Triggered by: pr_updated | Commit: c80a0a7

💡 TIP: You can chat with Visor using /visor ask <your question>

probelabs bot (Contributor) commented Feb 15, 2026

Security Issues (1)

| Severity | Location | Issue |
| --- | --- | --- |
| 🟠 Error | system:0 | ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 2. No output generated. Check the stream for errors. |

Architecture Issues (1)

| Severity | Location | Issue |
| --- | --- | --- |
| 🟠 Error | system:0 | ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors. |

Performance Issues (1)

| Severity | Location | Issue |
| --- | --- | --- |
| 🟠 Error | system:0 | ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors. |

Quality Issues (1)

| Severity | Location | Issue |
| --- | --- | --- |
| 🟠 Error | system:0 | ProbeAgent execution failed: Error: Failed to get response from AI model during iteration 1. No output generated. Check the stream for errors. |

Last updated: 2026-02-15T19:25:34.060Z | Triggered by: pr_updated | Commit: c80a0a7

buger and others added 3 commits February 15, 2026 19:33
SandboxJS doesn't bind catch clause parameters. The workaround injects
`var e = __getLastError()` in catch bodies, but `var e` inside
`catch (e)` conflicts on Node 20. Fix by renaming the catch parameter
to `__catchParam` so the var declaration doesn't shadow it.

Also set Visor code review max-parallelism to 1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Tool functions now catch errors internally and return "ERROR: ..."
strings instead of throwing. This eliminates the fragile SandboxJS
catch parameter workaround (__getLastError/__setLastError/__catchParam)
which broke on Node 20.

- traceToolCall: catch + return error string instead of rethrowing
- parseJSON: returns null on failure instead of throwing
- Removed transformer passes: catch param rename, throw rewrite
- Removed errorHolder, __getLastError, __setLastError globals
- Updated tool definition and docs to document error-return pattern
- Net -147 lines of complexity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@buger changed the title from "feat(dsl): agent test fixes and DSL improvements" to "feat(agent): add DSL execution engine for LLM-scripted code analysis" on Feb 15, 2026
@buger merged commit 0bf79ce into main on Feb 15, 2026
15 of 19 checks passed
@buger deleted the buger/agent-dsl-orchestration branch on February 15, 2026 19:42