diff --git a/.github/workflows/visor.yml b/.github/workflows/visor.yml
index 1fa2373c..3cebf2f8 100644
--- a/.github/workflows/visor.yml
+++ b/.github/workflows/visor.yml
@@ -25,6 +25,7 @@ jobs:
           app-id: ${{ secrets.APP_ID }}
           private-key: ${{ secrets.APP_PRIVATE_KEY }}
           installation-id: ${{ secrets.APP_INSTALLATION_ID }}
+          max-parallelism: '1'
           debug: 'true'
         env:
           # AI Provider API Keys (configure one of these in your repository secrets)
@@ -33,7 +34,7 @@ jobs:
           ANTHROPIC_API_URL: 'https://api.z.ai/api/anthropic/v1'
           # OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
           # Optional: Specify the AI model to use
-          MODEL_NAME: 'glm-4.6'
+          MODEL_NAME: 'glm-5'
 
       - name: Upload telemetry traces
         if: always()
diff --git a/CLAUDE.md b/CLAUDE.md
index ad290d07..3ae6db6b 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -122,6 +122,55 @@ make fix-all # Runs format + lint fixes + tests
 4. Register in `src/language/factory.rs`
 5. Update docs in `docs/supported-languages.md`
 
+### 6. Adding a New Agent Tool (NPM)
+
+When adding a new tool to the probe agent (like `search`, `execute_plan`, etc.), you must update **all** of the following files. Missing any of them will cause the tool to fail silently.
+
+**Schema & definition (`npm/src/tools/common.js`):**
+- Add Zod schema (e.g.
`export const myToolSchema = z.object({...})`) +- Add tool name to `DEFAULT_VALID_TOOLS` array +- Add schema to `getValidParamsForTool()` switch map +- Add XML tool definition string if using XML-based parsing + +**Implementation (`npm/src/tools/myTool.js`):** +- Create tool file with `execute` function +- Export the tool creator function and any definition generators + +**Vercel AI SDK wrapper (`npm/src/tools/vercel.js`) — if tool is a Vercel AI `tool()`:** +- Add tool wrapper using the `tool()` helper from `ai` package + +**Tool index (`npm/src/tools/index.js`):** +- Add exports for the tool creator, definition, and schema + +**SDK public API (`npm/src/index.js`):** +- Import the tool creator, schema, and definition +- Add all three to the `export {}` block + +**Agent tool creation (`npm/src/agent/tools.js`):** +- Import the tool creator and schema from `../index.js` +- Add tool instantiation in `createTools()` with `isToolAllowed()` check +- Add schema and definition to re-exports + +**Agent tool wrapping (`npm/src/agent/probeTool.js`):** +- Wrap the tool with `wrapToolWithEmitter()` for event emission + +**Agent registration (`npm/src/agent/ProbeAgent.js`) — 4 places:** +1. **toolImplementations** (~line 840): Register wrapped tool instance +2. **System prompt tool definitions** (~line 2420): Add tool definition to prompt +3. **Available tools list** (~line 2500): Add one-line description +4. **Valid tools list** (~line 3220): Add to `validTools` array + +**Tests:** +- Unit tests in `npm/tests/unit/` +- Verify with: `npm test --prefix npm -- --testPathPattern="myTool"` + +**Build & verify:** +```bash +npm run build --prefix npm # Rebuild agent bundle +npm test --prefix npm # Run all tests +node npm/bin/probe agent "test query" --path . 
--provider google # End-to-end test +``` + ## Common Commands ```bash diff --git a/docs/llm-script.md b/docs/llm-script.md new file mode 100644 index 00000000..b67ba2a6 --- /dev/null +++ b/docs/llm-script.md @@ -0,0 +1,512 @@ +# LLM Script + +LLM Script is Probe's programmable orchestration engine for running complex, multi-step code analysis tasks. Instead of relying on unpredictable multi-turn AI conversations, LLM Script lets you (or the AI) write short, deterministic programs that orchestrate search, extraction, and LLM calls in a sandboxed environment. + +Think of it as **stored procedures for code intelligence** — predictable, reproducible, and capable of processing entire codebases in a single execution. + +## Why LLM Script? + +Traditional AI agent workflows have a fundamental problem: each step is a separate LLM call that can drift, hallucinate, or lose context. When you ask "find all API endpoints and classify them by auth method," a typical agent might: + +1. Search once, get partial results +2. Lose track of what it already found +3. Produce inconsistent classifications across calls +4. 
Take dozens of expensive LLM round-trips + +LLM Script solves this by letting the AI write a **complete program** upfront that: + +- Searches systematically across the entire codebase +- Processes results in parallel with controlled concurrency +- Uses LLM calls only where needed (classification, summarization) +- Accumulates structured data in a persistent store +- Computes statistics with pure JavaScript — no LLM needed +- Returns formatted, predictable results + +## How It Works + +LLM Script programs look like simple JavaScript but run in a secure sandbox with special capabilities: + +```javascript +// Find all error handling patterns across the codebase +const results = search("error handling try catch") +const chunks = chunk(results) + +var patterns = [] +for (const c of chunks) { + const found = LLM( + "Extract error handling patterns as JSON: [{type, file, description}]. ONLY JSON.", + c + ) + try { + const parsed = JSON.parse(String(found)) + for (const item of parsed) { patterns.push(item) } + } catch (e) { log("Parse error, skipping chunk") } +} + +const byType = groupBy(patterns, "type") +var table = "| Pattern | Count |\n|---------|-------|\n" +for (const type of Object.keys(byType)) { + table = table + "| " + type + " | " + byType[type].length + " |\n" +} + +return table +``` + +**The execution pipeline:** + +1. **Validate** — AST-level whitelist ensures only safe constructs are used (no `eval`, `require`, `import`, `class`, `new`, etc.) +2. **Transform** — Automatically injects `await` before async tool calls and adds loop guards to prevent infinite loops +3. **Execute** — Runs in a SandboxJS environment with a configurable timeout (default 2 minutes) +4. **Self-heal** — If execution fails, the AI automatically gets the error and fixes the script (up to 2 retries) + +## Two Ways to Use LLM Script + +### 1. 
Through Prompting (AI-Generated Scripts) + +The most common way — you describe what you want in natural language, and the AI writes the script for you: + +``` +You: "Find all API endpoints in this codebase, classify each by HTTP method, + and produce a markdown table with counts per method." +``` + +The AI generates and executes a script like: + +```javascript +// Discover repo structure first +const files = listFiles("**/*.{js,ts,py,go,rs}") +const sample = search("API endpoint route handler") + +// Let LLM determine the best search strategy +const strategy = LLM( + "Based on this codebase structure, what search queries would find ALL API endpoints? Return as JSON array of strings.", + files.join("\n") + "\n\nSample results:\n" + sample +) + +const queries = JSON.parse(String(strategy)) +var allResults = "" +for (const q of queries) { + allResults = allResults + "\n" + search(q) +} + +// Process in chunks with LLM classification +const chunks = chunk(allResults) +const classified = map(chunks, (c) => LLM( + "Extract API endpoints as JSON: [{method, path, handler, file}]. ONLY JSON.", c +)) + +var endpoints = [] +for (const batch of classified) { + try { + const parsed = JSON.parse(String(batch)) + for (const ep of parsed) { endpoints.push(ep) } + } catch (e) { log("Parse error") } +} + +// Pure JS statistics — no LLM needed +endpoints = unique(endpoints) +const byMethod = groupBy(endpoints, "method") +var table = "| Method | Count | Example |\n|--------|-------|---------|\n" +for (const method of Object.keys(byMethod)) { + const examples = byMethod[method] + table = table + "| " + method + " | " + examples.length + " | " + examples[0].path + " |\n" +} + +return table + "\nTotal: " + endpoints.length + " endpoints" +``` + +### 2. 
User-Provided Scripts + +You can also write scripts directly — useful for repeatable analysis tasks, CI pipelines, or when you want precise control over the execution: + +```javascript +// Audit: find all TODO/FIXME comments with their context +const todos = search("TODO OR FIXME") +const chunks = chunk(todos) +const items = map(chunks, (c) => LLM( + "Extract TODO/FIXME items as JSON: [{text, file, priority, category}]. " + + "Priority: high/medium/low. Category: bug/feature/refactor/debt. ONLY JSON.", c +)) + +var all = [] +for (const batch of items) { + try { + const parsed = JSON.parse(String(batch)) + for (const item of parsed) { all.push(item) } + } catch (e) { log("Parse error") } +} + +const byPriority = groupBy(all, "priority") +var report = "# TODO Audit Report\n\n" +for (const priority of ["high", "medium", "low"]) { + const group = byPriority[priority] || [] + report = report + "## " + priority.toUpperCase() + " (" + group.length + ")\n\n" + for (const item of group) { + report = report + "- **" + item.file + "**: " + item.text + " [" + item.category + "]\n" + } + report = report + "\n" +} + +return report +``` + +## Available Functions + +### Search & Extraction (async, auto-awaited) + +| Function | Description | Returns | +|----------|-------------|---------| +| `search(query)` | Semantic code search with Elasticsearch-like syntax | `string` — code snippets with file paths | +| `query(pattern)` | AST-based structural code search (tree-sitter) | `string` — matching code elements | +| `extract(targets)` | Extract code by file path + line number | `string` — extracted code content | +| `listFiles(pattern)` | List files matching a glob pattern | `array` — array of file path strings | +| `bash(command)` | Execute a shell command | `string` — command output | + +### AI (async, auto-awaited) + +| Function | Description | Returns | +|----------|-------------|---------| +| `LLM(instruction, data)` | Make a focused LLM call to process/classify/summarize data | 
`string` — AI response |
+| `map(array, fn)` | Process items in parallel with concurrency control (default 3) | `array` — results |
+
+### Data Utilities (sync)
+
+| Function | Description | Returns |
+|----------|-------------|---------|
+| `chunk(data, tokens?)` | Split a large string into token-sized chunks (default 20,000 tokens) | `array` of strings |
+| `batch(array, size)` | Split array into sub-arrays of given size | `array` of arrays |
+| `groupBy(array, key)` | Group array items by a key or function | `object` |
+| `unique(array)` | Deduplicate array items | `array` |
+| `flatten(array)` | Flatten one level of nesting | `array` |
+| `range(start, end)` | Generate array of integers [start, end) | `array` |
+| `parseJSON(text)` | Parse JSON from LLM output (strips markdown fences). Returns `null` on parse failure. | `any\|null` |
+| `log(message)` | Log a message for debugging | `void` |
+
+### Direct Output (sync)
+
+| Function | Description | Returns |
+|----------|-------------|---------|
+| `output(content)` | Write content directly to the user's response, bypassing LLM rewriting. Use for large tables, JSON, or CSV that should be delivered verbatim. | `void` |
+
+When you use `output()`, the content is appended directly to the final response after the AI's summary — the AI never sees or rewrites it. This preserves data fidelity for large structured outputs like tables with 50+ rows.
+
+### Session Store (sync, persists across executions)
+
+The session store allows data to persist across multiple script executions within the same conversation. This enables multi-phase workflows where one script collects data and a later script processes it.
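+A minimal sketch of that handoff (the `"findings"` key and stored values are illustrative; the available store functions are listed in the table below):
+
+```javascript
+// First execution: accumulate findings under a named key
+storeAppend("findings", { file: "src/auth.js", note: "uses legacy hash" })
+storeSet("phase", "collected")
+
+// A later execution in the same conversation reads them back
+const items = storeGet("findings") || []
+log("have " + items.length + " findings, phase=" + String(storeGet("phase")))
+return storeKeys()
+```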
+
+| Function | Description | Returns |
+|----------|-------------|---------|
+| `storeSet(key, value)` | Store a value | `void` |
+| `storeGet(key)` | Retrieve a value (returns `undefined` if missing) | `any` |
+| `storeAppend(key, item)` | Append to an array (auto-creates if key doesn't exist) | `void` |
+| `storeKeys()` | List all stored keys | `array` of strings |
+| `storeGetAll()` | Return entire store as a plain object | `object` |
+
+## Patterns
+
+### Pattern 1: Discovery-First (Recommended)
+
+Start by exploring the repository structure, then let the LLM determine the optimal search strategy:
+
+```javascript
+// Phase 1: Discover
+const files = listFiles("**/*.{js,ts,py}")
+const sample = search("authentication")
+
+// Phase 2: Let LLM plan the strategy
+// (.join() is not available in the sandbox, so build the string with a loop)
+var fileList = ""
+for (const f of files) { fileList = fileList + f + "\n" }
+const strategy = LLM(
+  "Based on this repo structure, what are the best search queries to find ALL authentication code? Return as JSON array.",
+  fileList + "\nSample:\n" + sample
+)
+
+// Phase 3: Execute with discovered strategy
+const queries = JSON.parse(String(strategy))
+var allCode = ""
+for (const q of queries) {
+  allCode = allCode + "\n" + search(q)
+}
+
+// Phase 4: Analyze
+const analysis = LLM("Provide a comprehensive analysis of the authentication system.", allCode)
+return analysis
+```
+
+### Pattern 2: Data Pipeline with Session Store
+
+Process large datasets in phases — extract, accumulate, compute, format:
+
+```javascript
+// Phase 1: Collect and classify
+const results = search("API endpoints")
+const chunks = chunk(results)
+const extracted = map(chunks, (c) => LLM(
+  "Extract endpoints as JSON array: [{method, path, handler}]. ONLY JSON.", c
+))
+for (const batch of extracted) {
+  try {
+    const parsed = JSON.parse(String(batch))
+    for (const item of parsed) { storeAppend("endpoints", item) }
+  } catch (e) { log("Parse error, skipping") }
+}
+
+// Phase 2: Pure JS statistics (no LLM needed!)
+const all = storeGet("endpoints") +const byMethod = groupBy(all, "method") +var table = "| Method | Count |\n|--------|-------|\n" +for (const method of Object.keys(byMethod)) { + table = table + "| " + method + " | " + byMethod[method].length + " |\n" +} + +// Phase 3: Small LLM summary +const summary = LLM("Write a brief summary of this API surface.", table) +return table + "\n" + summary +``` + +### Pattern 3: Batch Processing with Parallel Execution + +Process many items efficiently using `map()` for controlled concurrency: + +```javascript +const files = listFiles("src/**/*.ts") +const batches = batch(files, 5) + +var allIssues = [] +for (const b of batches) { + const results = map(b, (file) => { + const code = extract(file) + return LLM("Find potential bugs. Return JSON: [{file, line, issue, severity}]. ONLY JSON.", code) + }) + for (const r of results) { + try { + const parsed = JSON.parse(String(r)) + for (const issue of parsed) { allIssues.push(issue) } + } catch (e) { log("Parse error") } + } +} + +const bySeverity = groupBy(allIssues, "severity") +var report = "# Bug Report\n\n" +for (const sev of ["high", "medium", "low"]) { + const items = bySeverity[sev] || [] + report = report + "## " + sev.toUpperCase() + " (" + items.length + ")\n" + for (const item of items) { + report = report + "- " + item.file + ":" + item.line + " — " + item.issue + "\n" + } + report = report + "\n" +} + +return report +``` + +### Pattern 4: Multi-Search Synthesis + +Combine results from multiple targeted searches: + +```javascript +const topics = ["authentication", "authorization", "session management", "CSRF", "XSS"] +var allFindings = "" + +for (const topic of topics) { + try { + const results = search(topic + " security") + allFindings = allFindings + "\n## " + topic + "\n" + results + } catch (e) { + log("Search failed for: " + topic) + } +} + +const chunks = chunk(allFindings) +const analyses = map(chunks, (c) => LLM( + "Analyze this code for security issues. 
Be specific about file and line.", c +)) + +var report = "# Security Audit\n\n" +for (const a of analyses) { + report = report + a + "\n\n" +} + +return report +``` + +### Pattern 5: Iterative Deepening + +Start broad, then drill into the most interesting results: + +```javascript +// Broad search +const overview = search("database connection pool") +const summary = LLM( + "Which files are most important for understanding connection pooling? Return as JSON array of file paths.", + overview +) + +// Deep dive +const importantFiles = JSON.parse(String(summary)) +var details = "" +for (const file of importantFiles) { + try { + details = details + "\n\n" + extract(file) + } catch (e) { log("Could not extract: " + file) } +} + +// Final analysis +const analysis = LLM( + "Provide a detailed analysis of the connection pooling implementation. Include architecture decisions and potential improvements.", + details +) +return analysis +``` + +### Pattern 6: Direct Output for Large Data + +When your script produces large structured data (tables, JSON, CSV), use `output()` to deliver it directly to the user without the AI rewriting or summarizing it: + +```javascript +const results = search("customer onboarding") +const chunks = chunk(results) + +const classified = map(chunks, (c) => LLM( + "Extract customers as JSON: [{name, industry, status}]. 
ONLY JSON.", c +)) + +var customers = [] +for (const batch of classified) { + try { + const parsed = parseJSON(String(batch)) + if (Array.isArray(parsed)) { + for (const item of parsed) { customers.push(item) } + } + } catch (e) { log("Parse error, skipping") } +} + +// Build a markdown table +var table = "| Customer | Industry | Status |\n|----------|----------|--------|\n" +for (const c of customers) { + table = table + "| " + (c.name || "Unknown") + " | " + (c.industry || "Unknown") + " | " + (c.status || "-") + " |\n" +} + +// output() sends the full table directly to the user — no summarization +output(table) + +// return value is what the AI sees — keep it short +return "Generated table with " + customers.length + " customers" +``` + +The AI will respond with something like "Here's the customer analysis..." and the full table will be appended verbatim below its response. + +## Writing Rules + +LLM Script uses a safe subset of JavaScript. Keep these rules in mind: + +**Do:** +- Use `var` for variables (or `const`/`let`) +- Use `for...of` loops for iteration +- Use plain objects and arrays +- Check for errors with `if (result.indexOf("ERROR:") === 0)` — tool functions never throw, they return `"ERROR: ..."` strings +- Use string concatenation with `+` (not template literals with `${}`) +- Use `parseJSON()` instead of `JSON.parse()` when parsing LLM output (handles markdown fences) +- Use `output()` for large structured data that should reach the user verbatim + +**Don't:** +- Use `async`/`await` (auto-injected by the transformer) +- Use `class`, `new`, `this` +- Use `eval`, `require`, `import` +- Use `process`, `globalThis`, `__proto__` +- Define helper functions that call tools (the transformer can't inject `await` inside user-defined functions) +- Use regex literals (`/pattern/`) — use `indexOf()`, `includes()`, `startsWith()` instead +- Use `.matchAll()` or `.join()` (SandboxJS limitations — use `for...of` loops instead) + +## Safety Model + +LLM Script runs 
in a multi-layer security sandbox:
+
+1. **AST Validation** — Before execution, the script's Abstract Syntax Tree is checked against a whitelist. Only safe constructs are allowed. No `eval`, `require`, `import`, `class`, `new`, `this`, `__proto__`, `constructor`, or `prototype` access.
+
+2. **SandboxJS Isolation** — Scripts execute in SandboxJS, a JavaScript sandbox that prevents access to Node.js globals, the filesystem, and the network. Only the explicitly provided tool functions are available.
+
+3. **Loop Guards** — Automatic loop iteration limits (default 5,000) prevent infinite loops. The transformer injects a `__checkLoop()` call into every loop body.
+
+4. **Execution Timeout** — A configurable timeout (default 2 minutes) kills scripts that take too long.
+
+5. **Self-Healing** — If a script fails, the error is sent to the LLM, which generates a fixed version. Up to 2 retries are attempted before returning an error.
+
+## Enabling LLM Script
+
+### ProbeAgent SDK
+
+```javascript
+import { ProbeAgent } from '@probelabs/probe';
+
+const agent = new ProbeAgent({
+  path: '/path/to/your/codebase',
+  provider: 'anthropic',
+  enableExecutePlan: true // Enable LLM Script
+});
+
+// The agent will now use LLM Script for complex analysis tasks
+const report = await agent.answer(
+  'Find all API endpoints and classify them by HTTP method'
+);
+```
+
+### CLI
+
+```bash
+probe agent "Find all API endpoints" \
+  --path /path/to/project \
+  --provider google \
+  --enable-execute-plan
+```
+
+### When Does LLM Script Trigger?
+ +The AI automatically chooses LLM Script (over simple search) for questions that require: + +- **Comprehensive coverage**: "Find **all** error handling patterns" +- **Complete inventories**: "Give me a **complete inventory** of API routes" +- **Multi-topic analysis**: "Compare authentication, authorization, and session handling" +- **Batch processing**: "Classify **every** TODO comment by priority" +- **Quantitative answers**: "How many functions in each module?" + +For simple, focused questions like "How does the login function work?", the AI uses direct search instead. + +## Real-World Examples + +### Codebase Health Report + +``` +You: "Generate a comprehensive health report for this codebase — + code complexity, test coverage gaps, dependency analysis, + and security concerns." +``` + +### API Documentation Generator + +``` +You: "Find every API endpoint, extract its parameters, authentication + requirements, and response types, then generate OpenAPI-style + documentation as markdown." +``` + +### Migration Planning + +``` +You: "We're migrating from Express to Fastify. Find all Express-specific + patterns (middleware, route handlers, error handlers) and produce + a migration checklist with effort estimates." +``` + +### Dependency Impact Analysis + +``` +You: "We need to upgrade the 'auth' library. Find every file that imports + from it, classify each usage pattern, and identify which ones will + break with the new API." 
+``` + +## Related Resources + +- [AI Integration Overview](/ai-integration) — Overview of all Probe AI features +- [Node.js SDK API Reference](/nodejs-sdk) — Programmatic access to Probe +- [ProbeAgent SDK](/ai-integration#probeagent-sdk) — Building AI-powered code analysis apps +- [AI Chat Mode](/ai-chat) — Interactive chat interface diff --git a/npm/package.json b/npm/package.json index 04491f5b..b4c4fb80 100644 --- a/npm/package.json +++ b/npm/package.json @@ -79,10 +79,14 @@ "@ai-sdk/openai": "^2.0.10", "@anthropic-ai/claude-agent-sdk": "^0.1.46", "@modelcontextprotocol/sdk": "^1.0.0", + "@nyariv/sandboxjs": "^0.8.32", "@probelabs/maid": "^0.0.24", + "acorn": "^8.15.0", + "acorn-walk": "^8.3.4", "adm-zip": "^0.5.16", "ai": "^5.0.0", "ajv": "^8.17.1", + "astring": "^1.9.0", "axios": "^1.8.3", "dotenv": "^16.4.7", "fs-extra": "^11.1.1", diff --git a/npm/src/agent/ProbeAgent.d.ts b/npm/src/agent/ProbeAgent.d.ts index 4458b830..faf14f85 100644 --- a/npm/src/agent/ProbeAgent.d.ts +++ b/npm/src/agent/ProbeAgent.d.ts @@ -41,6 +41,8 @@ export interface ProbeAgentOptions { enableDelegate?: boolean; /** Architecture context filename to embed from repo root (defaults to AGENTS.md with CLAUDE.md fallback; ARCHITECTURE.md is always included when present) */ architectureFileName?: string; + /** Enable the execute_plan DSL orchestration tool */ + enableExecutePlan?: boolean; /** Enable bash tool for command execution */ enableBash?: boolean; /** Bash tool configuration (allow/deny patterns) */ diff --git a/npm/src/agent/ProbeAgent.js b/npm/src/agent/ProbeAgent.js index 90d13173..d4e18a5e 100644 --- a/npm/src/agent/ProbeAgent.js +++ b/npm/src/agent/ProbeAgent.js @@ -48,6 +48,7 @@ import { extractToolDefinition, delegateToolDefinition, analyzeAllToolDefinition, + getExecutePlanToolDefinition, bashToolDefinition, listFilesToolDefinition, searchFilesToolDefinition, @@ -176,6 +177,7 @@ export class ProbeAgent { * @param {string} [options.promptType] - Predefined prompt type 
(code-explorer, code-searcher, architect, code-review, support) * @param {boolean} [options.allowEdit=false] - Allow the use of the 'implement' tool * @param {boolean} [options.enableDelegate=false] - Enable the delegate tool for task distribution to subagents + * @param {boolean} [options.enableExecutePlan=false] - Enable the execute_plan DSL orchestration tool * @param {string} [options.architectureFileName] - Architecture context filename to embed from repo root (defaults to AGENTS.md with CLAUDE.md fallback; ARCHITECTURE.md is always included when present) * @param {string} [options.path] - Search directory path * @param {string} [options.cwd] - Working directory for resolving relative paths (independent of allowedFolders) @@ -225,6 +227,7 @@ export class ProbeAgent { this.promptType = options.promptType || 'code-explorer'; this.allowEdit = !!options.allowEdit; this.enableDelegate = !!options.enableDelegate; + this.enableExecutePlan = !!options.enableExecutePlan; this.debug = options.debug || process.env.DEBUG === '1'; this.cancelled = false; this.tracer = options.tracer || null; @@ -809,6 +812,10 @@ export class ProbeAgent { initializeTools() { const isToolAllowed = (toolName) => this.allowedTools.isEnabled(toolName); + // Output buffer for DSL output() function — shared mutable object, + // reset at the start of each answer() call + this._outputBuffer = { items: [] }; + const configOptions = { sessionId: this.sessionId, debug: this.debug, @@ -820,6 +827,7 @@ export class ProbeAgent { searchDelegate: this.searchDelegate, allowEdit: this.allowEdit, enableDelegate: this.enableDelegate, + enableExecutePlan: this.enableExecutePlan, enableBash: this.enableBash, bashConfig: this.bashConfig, tracer: this.tracer, @@ -828,6 +836,7 @@ export class ProbeAgent { provider: this.clientApiProvider, model: this.clientApiModel, delegationManager: this.delegationManager, // Per-instance delegation limits + outputBuffer: this._outputBuffer, concurrencyLimiter: 
this.concurrencyLimiter, // Global AI concurrency limiter isToolAllowed }; @@ -853,7 +862,10 @@ export class ProbeAgent { if (this.enableDelegate && wrappedTools.delegateToolInstance && isToolAllowed('delegate')) { this.toolImplementations.delegate = wrappedTools.delegateToolInstance; } - if (wrappedTools.analyzeAllToolInstance && isToolAllowed('analyze_all')) { + if (this.enableExecutePlan && wrappedTools.executePlanToolInstance && isToolAllowed('execute_plan')) { + this.toolImplementations.execute_plan = wrappedTools.executePlanToolInstance; + } else if (wrappedTools.analyzeAllToolInstance && isToolAllowed('analyze_all')) { + // analyze_all is fallback when execute_plan is not enabled this.toolImplementations.analyze_all = wrappedTools.analyzeAllToolInstance; } @@ -2554,8 +2566,18 @@ ${extractGuidance} toolDefinitions += `${delegateToolDefinition}\n`; } - // Analyze All tool for bulk data processing - if (isToolAllowed('analyze_all')) { + // Execute Plan tool for DSL-based orchestration (requires enableExecutePlan flag, supersedes analyze_all) + if (this.enableExecutePlan && isToolAllowed('execute_plan')) { + // Build available function list based on what tools are registered + const dslFunctions = ['LLM', 'map', 'chunk', 'batch', 'log', 'range', 'flatten', 'unique', 'groupBy', 'parseJSON', 'storeSet', 'storeGet', 'storeAppend', 'storeKeys', 'storeGetAll', 'output']; + if (isToolAllowed('search')) dslFunctions.unshift('search'); + if (isToolAllowed('query')) dslFunctions.unshift('query'); + if (isToolAllowed('extract')) dslFunctions.unshift('extract'); + if (isToolAllowed('listFiles')) dslFunctions.push('listFiles'); + if (this.enableBash && isToolAllowed('bash')) dslFunctions.push('bash'); + toolDefinitions += `${getExecutePlanToolDefinition(dslFunctions)}\n`; + } else if (isToolAllowed('analyze_all')) { + // Fallback: only register analyze_all if execute_plan is not available toolDefinitions += `${analyzeAllToolDefinition}\n`; } @@ -2631,7 +2653,9 @@ The 
configuration is loaded from src/config.js lines 15-25 which contains the da if (this.enableDelegate && isToolAllowed('delegate')) { availableToolsList += '- delegate: Delegate big distinct tasks to specialized probe subagents.\n'; } - if (isToolAllowed('analyze_all')) { + if (this.enableExecutePlan && isToolAllowed('execute_plan')) { + availableToolsList += '- execute_plan: Execute a DSL program to orchestrate tool calls. ALWAYS use this for: questions containing "all"/"every"/"comprehensive"/"complete inventory", multi-topic analysis, open-ended discovery questions, or any task requiring full codebase coverage.\n'; + } else if (isToolAllowed('analyze_all')) { availableToolsList += '- analyze_all: Process ALL data matching a query using map-reduce (for aggregate questions needing 100% coverage).\n'; } if (this.enableBash && isToolAllowed('bash')) { @@ -2861,6 +2885,11 @@ Follow these instructions carefully: // Track initial history length for storage const oldHistoryLength = this.history.length; + // Reset output buffer for this answer() call + if (this._outputBuffer) { + this._outputBuffer.items = []; + } + // START CHECKPOINT: Initialize task management for this request if (this.enableTasks) { try { @@ -3368,8 +3397,10 @@ Follow these instructions carefully: if (this.enableDelegate && this.allowedTools.isEnabled('delegate')) { validTools.push('delegate'); } - // Analyze All tool (for bulk data processing with map-reduce) - if (this.allowedTools.isEnabled('analyze_all')) { + // Execute Plan tool (requires enableExecutePlan flag, supersedes analyze_all) + if (this.enableExecutePlan && this.allowedTools.isEnabled('execute_plan')) { + validTools.push('execute_plan'); + } else if (this.allowedTools.isEnabled('analyze_all')) { validTools.push('analyze_all'); } // Task tool (require both enableTasks flag AND allowedTools permission) @@ -4594,6 +4625,19 @@ Convert your previous response content into actual JSON data that follows this s } } + // Append DSL output buffer 
directly to response (bypasses LLM rewriting) + if (this._outputBuffer && this._outputBuffer.items.length > 0 && !options._schemaFormatted) { + const outputContent = this._outputBuffer.items.join('\n\n'); + finalResult = (finalResult || '') + '\n\n' + outputContent; + if (options.onStream) { + options.onStream('\n\n' + outputContent); + } + if (this.debug) { + console.log(`[DEBUG] Appended ${this._outputBuffer.items.length} output buffer items (${outputContent.length} chars) to final result`); + } + this._outputBuffer.items = []; + } + return finalResult; } catch (error) { @@ -4756,6 +4800,7 @@ Convert your previous response content into actual JSON data that follows this s promptType: this.promptType, allowEdit: this.allowEdit, enableDelegate: this.enableDelegate, + enableExecutePlan: this.enableExecutePlan, architectureFileName: this.architectureFileName, // Pass allowedFolders which will recompute workspaceRoot correctly allowedFolders: [...this.allowedFolders], diff --git a/npm/src/agent/dsl/agent-test.mjs b/npm/src/agent/dsl/agent-test.mjs new file mode 100644 index 00000000..0551ca82 --- /dev/null +++ b/npm/src/agent/dsl/agent-test.mjs @@ -0,0 +1,341 @@ +#!/usr/bin/env node +/** + * Agent-realistic test: the LLM writes DSL scripts itself. + * + * This simulates the real production flow: + * 1. We give the LLM a task + the tool definition (system prompt) + * 2. The LLM generates the DSL script + * 3. The runtime validates, transforms, and executes it + * 4. 
The result comes back + * + * Usage: + * node npm/src/agent/dsl/agent-test.mjs + */ + +import { createDSLRuntime } from './runtime.js'; +import { getExecutePlanToolDefinition } from '../../tools/executePlan.js'; +import { search } from '../../search.js'; +import { extract } from '../../extract.js'; +import { createGoogleGenerativeAI } from '@ai-sdk/google'; +import { generateText } from 'ai'; +import { config } from 'dotenv'; +import { resolve, dirname } from 'path'; +import { fileURLToPath } from 'url'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const projectRoot = resolve(__dirname, '../../../..'); + +config({ path: resolve(projectRoot, '.env') }); + +const apiKey = process.env.GOOGLE_GENERATIVE_AI_API_KEY || process.env.GOOGLE_API_KEY; +if (!apiKey) { + console.error('ERROR: No Google API key found.'); + process.exit(1); +} + +const google = createGoogleGenerativeAI({ apiKey }); + +async function llmCall(instruction, data, options = {}) { + const dataStr = data == null ? '' : (typeof data === 'string' ? 
data : JSON.stringify(data, null, 2)); + const prompt = (dataStr || '(empty)').substring(0, 100000); + const result = await generateText({ + model: google('gemini-2.5-flash'), + system: instruction, + prompt, + temperature: options.temperature || 0.3, + maxTokens: options.maxTokens || 4000, + }); + return result.text; +} + +// For generating DSL scripts (the "agent" role) +async function agentGenerate(systemPrompt, userTask) { + const result = await generateText({ + model: google('gemini-2.5-flash'), + system: systemPrompt, + prompt: userTask, + temperature: 0.3, + maxTokens: 4000, + }); + return result.text; +} + +const cwd = projectRoot; + +const toolImplementations = { + search: { + execute: async (params) => { + try { + return await search({ + query: params.query, + path: params.path || cwd, + cwd, + maxTokens: 20000, + timeout: 30, + exact: params.exact || false, + }); + } catch (e) { + return "Search error: " + e.message; + } + }, + }, + extract: { + execute: async (params) => { + try { + return await extract({ + targets: params.targets, + input_content: params.input_content, + cwd, + }); + } catch (e) { + return "Extract error: " + e.message; + } + }, + }, + listFiles: { + execute: async (params) => { + try { + return await search({ + query: params.pattern || '*', + path: cwd, + cwd, + filesOnly: true, + maxTokens: 10000, + }); + } catch (e) { + return "listFiles error: " + e.message; + } + }, + }, +}; + +const runtime = createDSLRuntime({ + toolImplementations, + llmCall, + mapConcurrency: 3, + timeoutMs: 60000, // 60s timeout per execution + maxLoopIterations: 5000, // loop guard +}); + +/** + * Strip markdown fences and XML tags that LLMs sometimes wrap code in. 
+ */ +function stripCodeWrapping(code) { + let s = String(code || ''); + s = s.replace(/^```(?:javascript|js)?\n?/gm, '').replace(/```$/gm, ''); + s = s.replace(/<\/?(?:execute_plan|code)>/g, ''); + return s.trim(); +} + +// The tool definition that goes into the agent's system prompt +const toolDef = getExecutePlanToolDefinition(['search', 'extract', 'LLM', 'map', 'chunk', 'listFiles', 'log', 'range', 'flatten', 'unique', 'groupBy']); + +const SYSTEM_PROMPT = `You are a coding assistant with access to the execute_plan tool. + +${toolDef} + +When the user asks a question that requires searching a codebase, batch processing, or handling large data, +write a DSL script to handle it. Return ONLY the JavaScript code — no markdown fences, no explanation, +no \`\`\` blocks. Just the raw code that goes into the execute_plan tool. + +CRITICAL RULES: +- Do NOT use async/await — the runtime handles it. +- Do NOT use template literals (backticks) — use string concatenation with +. +- Do NOT use shorthand properties like { key } — use { key: key }. +- search() returns a STRING, not an array. Use chunk() to split it into an array. +- map(items, fn) requires an ARRAY as first argument. Do NOT pass a string to map(). +- Do NOT use .map(), .forEach(), .filter(), .join() array methods. Use for..of loops or the global map() function. +- To join an array, use a for..of loop: var s = ""; for (const item of arr) { s = s + item + "\\n"; } +- Do NOT define helper functions that call tools. Write all logic inline or use for..of loops. +- Use String(value) to safely convert to string before calling .trim() or .split(). +- Do NOT use regex literals (/pattern/) — use String methods like indexOf, includes, startsWith instead. +- ONLY call functions listed in the tool definition. Do NOT invent or guess function names. +- ALWAYS write executable DSL code, never answer in plain text. 
+- Always return a value at the end.`; + +// ── Test runner ── +let testNum = 0; +let passed = 0; +let failed = 0; + +const MAX_RETRIES = 2; + +async function runAgentTest(taskDescription, check) { + testNum++; + console.log(`\n${'─'.repeat(70)}`); + console.log(`▶ Test ${testNum}: ${taskDescription}`); + + const start = Date.now(); + + try { + // Step 1: Agent generates the DSL script + console.log(' [1/4] Agent generating DSL script...'); + const generatedCode = await agentGenerate(SYSTEM_PROMPT, taskDescription); + let currentCode = stripCodeWrapping(generatedCode); + console.log(` Generated (${currentCode.split('\n').length} lines):`); + const preview = currentCode.split('\n').slice(0, 6).map(l => ' ' + l).join('\n'); + console.log(preview); + if (currentCode.split('\n').length > 6) console.log(' ...'); + + // Step 2: Execute with self-healing retries + let result; + let attempt = 0; + + while (attempt <= MAX_RETRIES) { + console.log(` [2/4] Executing DSL script${attempt > 0 ? ' (retry ' + attempt + ')' : ''}...`); + result = await runtime.execute(currentCode, taskDescription); + + if (result.status === 'success') break; + + // Execution failed — try self-healing + const logOutput = result.logs.length > 0 ? '\nLogs: ' + result.logs.join(' | ') : ''; + const errorMsg = result.error + logOutput; + console.log(` [!] Execution failed: ${errorMsg.substring(0, 150)}`); + + if (attempt >= MAX_RETRIES) break; + + console.log(` [3/4] Self-healing — asking LLM to fix (attempt ${attempt + 1})...`); + const fixPrompt = `The following DSL script failed with an error. Fix the script and return ONLY the corrected JavaScript code — no markdown, no explanation, no backtick fences. + +ORIGINAL SCRIPT: +${currentCode} + +ERROR: +${errorMsg} + +RULES REMINDER: +- search(), listFiles(), extract() all return STRINGS, not arrays. +- Use chunk(stringData) to split a string into an array of chunks. +- map(items, fn) requires an ARRAY as first argument. Do NOT pass strings to map(). 
+- Do NOT use .map(), .forEach(), .filter(), .join() — use for..of loops instead. +- Do NOT define helper functions that call tools — write logic inline. +- Do NOT use async/await, template literals, or shorthand properties. +- Do NOT use regex literals (/pattern/) — use String methods like indexOf, includes, startsWith instead. +- String concatenation with +, not template literals.`; + + const fixedCode = await llmCall(fixPrompt, '', { maxTokens: 4000, temperature: 0.2 }); + currentCode = stripCodeWrapping(fixedCode); + + if (!currentCode) { + console.log(' [!] Self-heal returned empty code'); + break; + } + + console.log(` Fixed code (${currentCode.split('\n').length} lines):`); + const fixPreview = currentCode.split('\n').slice(0, 4).map(l => ' ' + l).join('\n'); + console.log(fixPreview); + if (currentCode.split('\n').length > 4) console.log(' ...'); + + attempt++; + } + + const elapsed = Date.now() - start; + console.log(` [4/4] Checking result... (${elapsed}ms)`); + + if (result.status === 'error') { + console.log(` ✗ EXECUTION ERROR after ${attempt} retries (${elapsed}ms)`); + console.log(` Error: ${result.error.substring(0, 200)}`); + if (result.logs.length) console.log(` Logs: ${result.logs.join(' | ')}`); + failed++; + return; + } + + const checkResult = check(result); + if (checkResult === true || checkResult === undefined) { + const healNote = attempt > 0 ? ` (self-healed after ${attempt} ${attempt === 1 ? 'retry' : 'retries'})` : ''; + console.log(` ✓ PASSED${healNote} (${elapsed}ms)`); + const resultPreview = typeof result.result === 'string' + ? result.result.substring(0, 300) + : JSON.stringify(result.result, null, 2).substring(0, 300); + console.log(` Result: ${resultPreview}${resultPreview.length >= 300 ? '...' 
: ''}`); + if (result.logs && result.logs.filter(l => !l.startsWith('[runtime]')).length) { + console.log(` Logs: ${result.logs.filter(l => !l.startsWith('[runtime]')).join(' | ')}`); + } + passed++; + } else { + console.log(` ✗ CHECK FAILED (${elapsed}ms) — ${checkResult}`); + failed++; + } + } catch (e) { + console.log(` ✗ CRASHED — ${e.message}`); + failed++; + } +} + +// ── Agent tests ── +async function main() { + console.log('═'.repeat(70)); + console.log(' Agent-Realistic DSL Tests — LLM writes its own scripts'); + console.log('═'.repeat(70)); + + // Test 1: Simple search + summarize + await runAgentTest( + 'Search this codebase for how error handling is done and give me a brief summary.', + (r) => { + if (typeof r.result !== 'string') return 'Expected string result'; + if (r.result.length < 50) return 'Summary too short'; + return true; + } + ); + + // Test 2: Find and count patterns + await runAgentTest( + 'Write a DSL script to search this codebase for tool definitions (search, extract, query, etc.). Count how many unique tools are defined and return an object with the count and an array of tool names.', + (r) => { + if (!r.result) return 'No result'; + return true; + } + ); + + // Test 3: Multi-file analysis + await runAgentTest( + 'Look at the files in npm/src/agent/dsl/ directory — search for each one, and for each file give me a one-sentence description of what it does. Return as a list.', + (r) => { + if (!r.result) return 'No result'; + const s = typeof r.result === 'string' ? r.result : JSON.stringify(r.result); + if (s.length < 50) return 'Result too short'; + return true; + } + ); + + // Test 4: Code quality check + await runAgentTest( + 'Search for all TODO and FIXME comments in this codebase. 
Group them by urgency (TODO vs FIXME) and summarize what needs attention.', + (r) => { + if (!r.result) return 'No result'; + return true; + } + ); + + // Test 5: Complex analysis requiring chunking + await runAgentTest( + 'Analyze the test coverage of this project. Search for test files, see what modules they test, and identify any modules that might be missing tests. Give me a brief report.', + (r) => { + if (!r.result) return 'No result'; + const s = typeof r.result === 'string' ? r.result : JSON.stringify(r.result); + if (s.length < 50) return 'Report too short'; + return true; + } + ); + + // Test 6: Data extraction + classification + await runAgentTest( + 'Find all the Zod schemas defined in this codebase (search for "z.object"). For each schema, extract its name and list its fields. Return a structured summary.', + (r) => { + if (!r.result) return 'No result'; + return true; + } + ); + + // ── Summary ── + console.log(`\n${'═'.repeat(70)}`); + console.log(` Agent-Realistic Results: ${passed} passed, ${failed} failed, ${testNum} total`); + console.log('═'.repeat(70)); + + process.exit(failed > 0 ? 1 : 0); +} + +main().catch(e => { + console.error('Fatal error:', e); + process.exit(1); +}); diff --git a/npm/src/agent/dsl/analyze-test.mjs b/npm/src/agent/dsl/analyze-test.mjs new file mode 100644 index 00000000..8f864ac2 --- /dev/null +++ b/npm/src/agent/dsl/analyze-test.mjs @@ -0,0 +1,237 @@ +#!/usr/bin/env node +/** + * Real-world test of the analyze_all replacement pattern. + * + * Tests against the TykTechnologies/customer-insights repo (582 markdown files, 16MB) + * to verify the search → chunk → map(LLM) → synthesize pipeline works at scale. 
+ * + * Usage: + * node npm/src/agent/dsl/analyze-test.mjs + */ + +import { createDSLRuntime } from './runtime.js'; +import { search } from '../../search.js'; +import { extract } from '../../extract.js'; +import { createGoogleGenerativeAI } from '@ai-sdk/google'; +import { generateText } from 'ai'; +import { config } from 'dotenv'; +import { resolve, dirname } from 'path'; +import { fileURLToPath } from 'url'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const projectRoot = resolve(__dirname, '../../../..'); + +config({ path: resolve(projectRoot, '.env') }); + +const apiKey = process.env.GOOGLE_GENERATIVE_AI_API_KEY || process.env.GOOGLE_API_KEY; +if (!apiKey) { + console.error('ERROR: No Google API key found.'); + process.exit(1); +} + +const google = createGoogleGenerativeAI({ apiKey }); + +async function llmCall(instruction, data, options = {}) { + const dataStr = data == null ? '' : (typeof data === 'string' ? data : JSON.stringify(data, null, 2)); + const prompt = (dataStr || '(empty)').substring(0, 100000); + const result = await generateText({ + model: google('gemini-2.5-flash'), + system: instruction, + prompt, + temperature: options.temperature || 0.3, + maxTokens: options.maxTokens || 4000, + }); + return result.text; +} + +const TARGET_REPO = '/tmp/customer-insights'; + +const toolImplementations = { + search: { + execute: async (params) => { + try { + return await search({ + query: params.query, + path: params.path || TARGET_REPO, + cwd: TARGET_REPO, + maxTokens: 20000, + timeout: 30, + exact: params.exact || false, + }); + } catch (e) { + return "Search error: " + e.message; + } + }, + }, + extract: { + execute: async (params) => { + try { + return await extract({ + targets: params.targets, + input_content: params.input_content, + cwd: TARGET_REPO, + }); + } catch (e) { + return "Extract error: " + e.message; + } + }, + }, +}; + +const runtime = createDSLRuntime({ + toolImplementations, + llmCall, + mapConcurrency: 3, + timeoutMs: 
120000, + maxLoopIterations: 5000, +}); + +// ── Tests ── +let testNum = 0; +let passed = 0; +let failed = 0; + +async function runTest(name, code, check) { + testNum++; + console.log(`\n${'─'.repeat(70)}`); + console.log(`▶ Test ${testNum}: ${name}`); + console.log(` Code (${code.trim().split('\n').length} lines):`); + const preview = code.trim().split('\n').slice(0, 8).map(l => ' ' + l.trim()).join('\n'); + console.log(preview); + if (code.trim().split('\n').length > 8) console.log(' ...'); + + const start = Date.now(); + try { + const result = await runtime.execute(code, name); + const elapsed = Date.now() - start; + + if (result.status === 'error') { + console.log(` ✗ EXECUTION ERROR (${elapsed}ms)`); + console.log(` Error: ${result.error.substring(0, 300)}`); + if (result.logs.length) console.log(` Logs: ${result.logs.join(' | ')}`); + failed++; + return; + } + + const userLogs = result.logs.filter(l => !l.startsWith('[runtime]')); + if (userLogs.length) { + console.log(` Logs: ${userLogs.join(' | ')}`); + } + + const checkResult = check(result); + if (checkResult === true) { + console.log(` ✓ PASSED (${elapsed}ms)`); + const resultStr = typeof result.result === 'string' + ? result.result.substring(0, 500) + : JSON.stringify(result.result, null, 2).substring(0, 500); + console.log(` Result: ${resultStr}${resultStr.length >= 500 ? '...' 
: ''}`); + passed++; + } else { + console.log(` ✗ CHECK FAILED (${elapsed}ms) — ${checkResult}`); + failed++; + } + } catch (e) { + const elapsed = Date.now() - start; + console.log(` ✗ CRASHED (${elapsed}ms) — ${e.message}`); + failed++; + } +} + +async function main() { + console.log('═'.repeat(70)); + console.log(' analyze_all Replacement — Real-World Tests'); + console.log(' Target: TykTechnologies/customer-insights (582 .md files, 16MB)'); + console.log('═'.repeat(70)); + + // Test 1: Core analyze_all pattern — search → chunk → map(LLM) → synthesize + await runTest( + 'analyze_all pattern: "api governance"', + ` + const results = search("api governance"); + log("Search returned " + String(results).length + " chars"); + const chunks = chunk(results); + log("Split into " + chunks.length + " chunks"); + const extracted = map(chunks, (c) => LLM("List every mention of API governance — who uses it, what for, any specific policies or tools mentioned. Be brief and factual.", c)); + var combined = ""; + for (const e of extracted) { combined = combined + String(e) + "\\n---\\n"; } + return LLM("Synthesize into a comprehensive report about API governance across all customers. Group by: 1) Customers using API governance, 2) Governance tools/approaches, 3) Common patterns. Be thorough.", combined); + `, + (r) => { + if (typeof r.result !== 'string') return 'Expected string result'; + if (r.result.length < 100) return 'Result too short: ' + r.result.length; + return true; + } + ); + + // Test 2: Multi-topic search — governance + rate limiting + security + await runTest( + 'Multi-topic: governance, rate limiting, security policies', + ` + const topics = ["api governance", "rate limiting", "security policy"]; + const allFindings = []; + for (const topic of topics) { + const results = search(topic); + log(topic + ": " + String(results).length + " chars"); + const chunks = chunk(results); + const findings = map(chunks, (c) => LLM("Extract key findings about " + topic + ". 
Include customer names and specifics. Be brief.", c)); + for (const f of findings) { allFindings.push(topic + ": " + String(f)); } + } + var combined = ""; + for (const f of allFindings) { combined = combined + f + "\\n---\\n"; } + return LLM("Create a cross-topic analysis: How do customers approach API governance, rate limiting, and security together? What patterns emerge?", combined); + `, + (r) => { + if (typeof r.result !== 'string') return 'Expected string result'; + if (r.result.length < 100) return 'Result too short'; + return true; + } + ); + + // Test 3: Extract specific data points + await runTest( + 'Extract customer use cases for API management', + ` + const results = search("use case API management"); + log("Search: " + String(results).length + " chars"); + const chunks = chunk(results); + log("Chunks: " + chunks.length); + const extracted = map(chunks, (c) => LLM("Extract a JSON array of objects with fields: customer (string), use_case (string), outcome (string or null). Only include clearly stated use cases. Return valid JSON array only.", c)); + var allUseCases = []; + for (const e of extracted) { + try { + var text = String(e).trim(); + var jsonStart = text.indexOf("["); + var jsonEnd = text.lastIndexOf("]"); + if (jsonStart >= 0 && jsonEnd > jsonStart) { + text = text.substring(jsonStart, jsonEnd + 1); + } + var parsed = JSON.parse(text); + if (Array.isArray(parsed)) { + for (const item of parsed) { allUseCases.push(item); } + } + } catch (err) { + log("Parse failed for chunk, skipping"); + } + } + log("Total use cases found: " + allUseCases.length); + return allUseCases; + `, + (r) => { + if (!Array.isArray(r.result)) return 'Expected array result'; + if (r.result.length === 0) return 'No use cases extracted'; + return true; + } + ); + + // ── Summary ── + console.log(`\n${'═'.repeat(70)}`); + console.log(` Results: ${passed} passed, ${failed} failed, ${testNum} total`); + console.log('═'.repeat(70)); + + process.exit(failed > 0 ? 
1 : 0); +} + +main().catch(e => { + console.error('Fatal error:', e); + process.exit(1); +}); diff --git a/npm/src/agent/dsl/diag-test.mjs b/npm/src/agent/dsl/diag-test.mjs new file mode 100644 index 00000000..8a16783f --- /dev/null +++ b/npm/src/agent/dsl/diag-test.mjs @@ -0,0 +1,78 @@ +#!/usr/bin/env node +/** + * Diagnostic test — traces exactly what execute_plan returns through ProbeAgent. + */ + +import { ProbeAgent } from '../ProbeAgent.js'; +import { config } from 'dotenv'; +import { resolve, dirname } from 'path'; +import { fileURLToPath } from 'url'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const projectRoot = resolve(__dirname, '../../../..'); + +config({ path: resolve(projectRoot, '.env') }); + +const apiKey = process.env.GOOGLE_GENERATIVE_AI_API_KEY || process.env.GOOGLE_API_KEY; +if (!apiKey) { + console.error('ERROR: No Google API key found'); + process.exit(1); +} + +const agent = new ProbeAgent({ + path: '/tmp/customer-insights', + provider: 'google', + model: 'gemini-2.5-flash', + enableExecutePlan: true, + maxIterations: 15, +}); + +let callNum = 0; + +agent.events.on('toolCall', (event) => { + if (event.status === 'started') { + if (event.name === 'execute_plan') { + callNum++; + console.log(`\n>>> EXECUTE_PLAN #${callNum} START`); + console.log(`>>> CODE:\n${String(event.args?.code || '').substring(0, 1200)}`); + if (String(event.args?.code || '').length > 1200) console.log('>>> ... (truncated)'); + } + } + if (event.status === 'error') { + console.log(`>>> TOOL ERROR: ${event.name}: ${event.error}`); + } +}); + +await agent.initialize(); + +// Monkey-patch to see full results +const origExecute = agent.toolImplementations.execute_plan.execute; +agent.toolImplementations.execute_plan.execute = async (params) => { + const result = await origExecute(params); + const resultStr = typeof result === 'string' ? 
result : JSON.stringify(result); + console.log(`\n>>> EXECUTE_PLAN #${callNum} RETURNED (${resultStr.length} chars):`); + console.log(`>>> ${resultStr.substring(0, 500)}`); + if (resultStr.length > 500) console.log(`>>> ... (${resultStr.length - 500} more chars)`); + return result; +}; + +const query = 'Analyze ALL customer files in this repository. For every customer, classify them by industry. Produce a markdown table with columns: Customer, Industry, Use Case.'; + +console.log(`\nQUERY: ${query}\n`); + +try { + const result = await Promise.race([ + agent.answer(query), + new Promise((_, reject) => setTimeout(() => reject(new Error('Timeout 600s')), 600000)), + ]); + + console.log(`\n${'='.repeat(60)}`); + console.log(`FINAL RESULT (${String(result).length} chars):`); + console.log(String(result).substring(0, 2000)); + console.log(`${'='.repeat(60)}`); +} catch (e) { + console.log(`\nFAILED: ${e.message}`); +} + +try { await agent.close(); } catch (e) {} +process.exit(0); diff --git a/npm/src/agent/dsl/environment.js b/npm/src/agent/dsl/environment.js new file mode 100644 index 00000000..8799a39e --- /dev/null +++ b/npm/src/agent/dsl/environment.js @@ -0,0 +1,387 @@ +/** + * Tool Environment Generator + * + * Reads Zod schemas (native tools) and MCP tool schemas to generate: + * 1. Sandbox globals object (function bindings that bridge to real tools) + * 2. Set of async function names (for the AST transformer) + */ + +import { + searchSchema, + querySchema, + extractSchema, + bashSchema, +} from '../../tools/common.js'; + +// Map of native tool names to their Zod schemas +const NATIVE_TOOL_SCHEMAS = { + search: searchSchema, + query: querySchema, + extract: extractSchema, + bash: bashSchema, +}; + +// Tools that are inherently async (make network/LLM calls) +const ALWAYS_ASYNC = new Set([ + 'search', 'query', 'extract', 'listFiles', 'searchFiles', 'bash', + 'LLM', 'map', +]); + +/** + * Generate the set of async function names from native tools and MCP tools. 
+ *
+ * @param {Object} [mcpTools={}] - MCP tools keyed by name
+ * @returns {Set<string>} Names of all async functions available in the DSL
+ */
+export function getAsyncFunctionNames(mcpTools = {}) {
+  const names = new Set(ALWAYS_ASYNC);
+  // All MCP tools are async
+  for (const name of Object.keys(mcpTools)) {
+    names.add(name);
+  }
+  return names;
+}
+
+/**
+ * Wrap a tool function with OTEL tracing and error-safe return.
+ * On error, returns an "ERROR: ..." string instead of throwing — SandboxJS
+ * has unreliable try/catch for async errors, so tools never throw.
+ *
+ * @param {string} toolName - Name of the tool for the span
+ * @param {Function} fn - The async tool function to wrap
+ * @param {Object|null} tracer - SimpleAppTracer instance (or null)
+ * @param {Function} logFn - Function to write to execution logs
+ * @returns {Function} Wrapped function
+ */
+function traceToolCall(toolName, fn, tracer, logFn) {
+  if (!tracer) {
+    return async (...args) => {
+      try {
+        return await fn(...args);
+      } catch (e) {
+        const msg = 'ERROR: ' + (e.message || String(e));
+        logFn?.('[' + toolName + '] ' + msg);
+        return msg;
+      }
+    };
+  }
+
+  return async (...args) => {
+    const span = tracer.createToolSpan?.(`dsl.${toolName}`, {
+      'dsl.tool': toolName,
+      'dsl.params': JSON.stringify(args).substring(0, 500),
+    });
+
+    const startTime = Date.now();
+    try {
+      const result = await fn(...args);
+      const elapsed = Date.now() - startTime;
+
+      const resultStr = typeof result === 'string' ?
result : JSON.stringify(result); + span?.setAttributes?.({ + 'dsl.tool.duration_ms': elapsed, + 'dsl.tool.result_length': resultStr?.length || 0, + 'dsl.tool.success': true, + }); + span?.setStatus?.('OK'); + span?.end?.(); + + tracer.recordToolResult?.( + `dsl.${toolName}`, result, true, elapsed, + { 'dsl.context': 'execute_plan' } + ); + + return result; + } catch (e) { + const elapsed = Date.now() - startTime; + span?.setAttributes?.({ + 'dsl.tool.duration_ms': elapsed, + 'dsl.tool.success': false, + 'dsl.tool.error': e.message?.substring(0, 500), + }); + span?.setStatus?.('ERROR'); + span?.addEvent?.('exception', { + 'exception.message': e.message, + }); + span?.end?.(); + + tracer.recordToolResult?.( + `dsl.${toolName}`, e.message, false, elapsed, + { 'dsl.context': 'execute_plan' } + ); + + const msg = 'ERROR: ' + (e.message || String(e)); + logFn?.('[' + toolName + '] ' + msg); + return msg; + } + }; +} + +/** + * Generate sandbox globals that bridge DSL function calls to real tool implementations. + * + * @param {Object} options + * @param {Object} options.toolImplementations - Native tool execute functions keyed by name + * @param {Object} [options.mcpBridge] - MCP bridge with callTool method + * @param {Object} [options.mcpTools={}] - MCP tools metadata keyed by name + * @param {Function} options.llmCall - Function to make focused LLM calls: (instruction, data, options?) 
=> Promise + * @param {number} [options.mapConcurrency=3] - Max concurrent operations in map() + * @param {Object} [options.tracer=null] - SimpleAppTracer for OTEL tracing + * @returns {Object} Globals object to pass to SandboxJS + */ +export function generateSandboxGlobals(options) { + const { + toolImplementations = {}, + mcpBridge = null, + mcpTools = {}, + llmCall, + mapConcurrency = 3, + tracer = null, + sessionStore = {}, + outputBuffer = null, + } = options; + + const globals = {}; + + // Log function — writes to the execution logs array (set by runtime before each execute()) + const logFn = (msg) => { if (globals._logs) globals._logs.push(String(msg)); }; + + // Bridge native tools + for (const [name, schema] of Object.entries(NATIVE_TOOL_SCHEMAS)) { + if (!toolImplementations[name]) continue; + + const rawFn = async (...args) => { + // Support both (params) and (arg1, arg2) calling conventions + let params; + if (args.length === 1 && typeof args[0] === 'object' && args[0] !== null && !Array.isArray(args[0])) { + params = args[0]; + } else { + // Map positional args to schema keys + const keys = Object.keys(schema.shape); + params = {}; + args.forEach((arg, i) => { + if (i < keys.length) params[keys[i]] = arg; + }); + } + + const validated = schema.safeParse(params); + if (!validated.success) { + throw new Error(`Invalid parameters for ${name}: ${validated.error.message}`); + } + return toolImplementations[name].execute(validated.data); + }; + + globals[name] = traceToolCall(name, rawFn, tracer, logFn); + } + + // Bridge listFiles and searchFiles (no Zod schema, simpler interface) + if (toolImplementations.listFiles) { + const rawListFiles = async (pattern) => { + return toolImplementations.listFiles.execute({ pattern }); + }; + globals.listFiles = traceToolCall('listFiles', rawListFiles, tracer, logFn); + } + if (toolImplementations.searchFiles) { + const rawSearchFiles = async (query) => { + return toolImplementations.searchFiles.execute({ query }); + }; 
+ globals.searchFiles = traceToolCall('searchFiles', rawSearchFiles, tracer, logFn); + } + + // Bridge MCP tools + if (mcpBridge) { + for (const [name, tool] of Object.entries(mcpTools)) { + const rawMcpFn = async (params = {}) => { + return mcpBridge.callTool(name, params); + }; + globals[name] = traceToolCall(name, rawMcpFn, tracer, logFn); + } + } + + // LLM() built-in — delegate already has its own OTEL, but we add a DSL-level span + if (llmCall) { + const rawLLM = async (instruction, data, opts = {}) => { + return llmCall(instruction, data, opts); + }; + globals.LLM = traceToolCall('LLM', rawLLM, tracer, logFn); + } + + // map() with concurrency control + const rawMap = async (items, fn) => { + if (!Array.isArray(items)) { + throw new Error('map() first argument must be an array'); + } + const results = []; + const executing = new Set(); + + for (const item of items) { + const p = Promise.resolve(fn(item)).then(result => { + executing.delete(p); + return result; + }); + executing.add(p); + results.push(p); + + if (executing.size >= mapConcurrency) { + await Promise.race(executing); + } + } + + return Promise.all(results); + }; + globals.map = traceToolCall('map', rawMap, tracer, logFn); + + // chunk() - split data into token-sized chunks + globals.chunk = (data, tokens = 20000) => { + const CHARS_PER_TOKEN = 4; + const chunkSizeChars = tokens * CHARS_PER_TOKEN; + const text = typeof data === 'string' ? 
data : JSON.stringify(data); + + // Split by file blocks (``` markers) to avoid breaking mid-block + const fileBlocks = text.split(/(?=^```)/m); + const chunks = []; + let current = ''; + + for (const block of fileBlocks) { + const blockSize = block.length; + + // If a single block exceeds chunk size and we have accumulated content, flush first + if (blockSize > chunkSizeChars && current.length > 0) { + chunks.push(current.trim()); + current = ''; + } + + // If a single block exceeds chunk size, split it by character boundary + if (blockSize > chunkSizeChars) { + for (let i = 0; i < blockSize; i += chunkSizeChars) { + const slice = block.slice(i, i + chunkSizeChars); + if (slice.trim().length > 0) { + chunks.push(slice.trim()); + } + } + continue; + } + + // If adding this block exceeds chunk size, flush + if (current.length + blockSize > chunkSizeChars && current.length > 0) { + chunks.push(current.trim()); + current = ''; + } + + current += block; + } + + if (current.trim().length > 0) { + chunks.push(current.trim()); + } + + return chunks; + }; + + // Utility functions (pure, no async) + globals.log = (message) => { + // Collected by the runtime for the execution log + if (globals._logs) globals._logs.push(String(message)); + }; + + globals.range = (start, end) => { + const result = []; + for (let i = start; i < end; i++) result.push(i); + return result; + }; + + globals.flatten = (arr) => { + if (!Array.isArray(arr)) return arr; + return arr.flat(1); + }; + + globals.unique = (arr) => { + if (!Array.isArray(arr)) return arr; + const seen = new Set(); + return arr.filter(item => { + const key = JSON.stringify(item); + if (seen.has(key)) return false; + seen.add(key); + return true; + }); + }; + + globals.batch = (arr, size) => { + if (!Array.isArray(arr)) return [arr]; + if (!size || size < 1) size = 10; + const batches = []; + for (let i = 0; i < arr.length; i += size) { + batches.push(arr.slice(i, i + size)); + } + return batches; + }; + + // parseJSON — 
safely parse JSON from LLM responses that may be wrapped in markdown fences + // Returns null on parse failure instead of throwing (SandboxJS try/catch is unreliable) + globals.parseJSON = (text) => { + try { + let s = String(text || '').trim(); + // Strip markdown code fences (```json ... ``` or ``` ... ```) + s = s.replace(/^```(?:json|javascript|js)?\s*\n?/i, '').replace(/\n?```\s*$/i, '').trim(); + // Try to find JSON array or object within the text + const arrayStart = s.indexOf('['); + const objectStart = s.indexOf('{'); + if (arrayStart >= 0 && (objectStart < 0 || arrayStart < objectStart)) { + const end = s.lastIndexOf(']'); + if (end > arrayStart) s = s.substring(arrayStart, end + 1); + } else if (objectStart >= 0) { + const end = s.lastIndexOf('}'); + if (end > objectStart) s = s.substring(objectStart, end + 1); + } + return JSON.parse(s); + } catch (e) { + logFn('[parseJSON] ERROR: ' + e.message); + return null; + } + }; + + globals.groupBy = (arr, key) => { + if (!Array.isArray(arr)) return {}; + const groups = {}; + for (const item of arr) { + const k = typeof key === 'function' ? 
key(item) : item[key]; + const groupKey = String(k); + if (!groups[groupKey]) groups[groupKey] = []; + groups[groupKey].push(item); + } + return groups; + }; + + // Session-scoped store — persists across execute_plan calls within the same agent session + globals.storeSet = (key, value) => { + if (typeof key !== 'string') throw new Error('storeSet: key must be a string'); + sessionStore[key] = value; + }; + + globals.storeGet = (key) => { + if (typeof key !== 'string') throw new Error('storeGet: key must be a string'); + return sessionStore[key]; + }; + + globals.storeAppend = (key, item) => { + if (typeof key !== 'string') throw new Error('storeAppend: key must be a string'); + if (!Array.isArray(sessionStore[key])) sessionStore[key] = []; + sessionStore[key].push(item); + }; + + globals.storeKeys = () => Object.keys(sessionStore); + + globals.storeGetAll = () => ({ ...sessionStore }); + + // output() — write content directly to user's response, bypassing LLM rewriting + if (outputBuffer) { + globals.output = (content) => { + if (content === undefined || content === null) return; + const str = typeof content === 'string' ? content : JSON.stringify(content, null, 2); + outputBuffer.items.push(str); + if (globals._logs) globals._logs.push('[output] ' + str.length + ' chars written to output buffer'); + }; + } + + return globals; +} diff --git a/npm/src/agent/dsl/manual-test.mjs b/npm/src/agent/dsl/manual-test.mjs new file mode 100644 index 00000000..a32d2595 --- /dev/null +++ b/npm/src/agent/dsl/manual-test.mjs @@ -0,0 +1,662 @@ +#!/usr/bin/env node +/** + * Manual test script for the DSL runtime with real tools. 
+ * + * Usage: + * node npm/src/agent/dsl/manual-test.mjs + * + * Requires: GOOGLE_API_KEY or GOOGLE_GENERATIVE_AI_API_KEY in .env or env + */ + +import { createDSLRuntime } from './runtime.js'; +import { search } from '../../search.js'; +import { extract } from '../../extract.js'; +import { createGoogleGenerativeAI } from '@ai-sdk/google'; +import { generateText } from 'ai'; +import { config } from 'dotenv'; +import { resolve, dirname } from 'path'; +import { fileURLToPath } from 'url'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const projectRoot = resolve(__dirname, '../../../..'); + +// Load .env from project root +config({ path: resolve(projectRoot, '.env') }); + +const apiKey = process.env.GOOGLE_GENERATIVE_AI_API_KEY || process.env.GOOGLE_API_KEY; +if (!apiKey) { + console.error('ERROR: No Google API key found. Set GOOGLE_API_KEY or GOOGLE_GENERATIVE_AI_API_KEY'); + process.exit(1); +} + +console.log('API key found, initializing...\n'); + +// Create Google provider +const google = createGoogleGenerativeAI({ apiKey }); + +// Create real LLM call function +async function llmCall(instruction, data, options = {}) { + const prompt = typeof data === 'string' ? 
data : JSON.stringify(data, null, 2); + const result = await generateText({ + model: google('gemini-2.5-flash'), + system: instruction, + prompt: prompt.substring(0, 100000), + temperature: options.temperature || 0.3, + maxTokens: options.maxTokens || 4000, + }); + return result.text; +} + +// The cwd for search operations +const cwd = projectRoot; + +// Create real tool implementations +const toolImplementations = { + search: { + execute: async (params) => { + try { + return await search({ + query: params.query, + path: params.path || cwd, + cwd, + maxTokens: 20000, + timeout: 30, + exact: params.exact || false, + }); + } catch (e) { + return `Search error: ${e.message}`; + } + }, + }, + extract: { + execute: async (params) => { + try { + return await extract({ + targets: params.targets, + input_content: params.input_content, + cwd, + }); + } catch (e) { + return `Extract error: ${e.message}`; + } + }, + }, + listFiles: { + execute: async (params) => { + try { + return await search({ + query: params.pattern || '*', + path: cwd, + cwd, + filesOnly: true, + maxTokens: 10000, + }); + } catch (e) { + return `listFiles error: ${e.message}`; + } + }, + }, +}; + +// Create the DSL runtime +const runtime = createDSLRuntime({ + toolImplementations, + llmCall, + mapConcurrency: 3, +}); + +// ── Test helpers ── +let testNum = 0; +let passed = 0; +let failed = 0; + +async function runTest(name, code, check) { + testNum++; + const label = `Test ${testNum}: ${name}`; + console.log(`\n${'─'.repeat(70)}`); + console.log(`▶ ${label}`); + const codePreview = code.trim().split('\n').map(l => l.trim()).filter(Boolean).join(' ').substring(0, 140); + console.log(` Code: ${codePreview}...`); + + const start = Date.now(); + try { + const result = await runtime.execute(code, name); + const elapsed = Date.now() - start; + + const checkResult = check(result); + if (checkResult === true || checkResult === undefined) { + console.log(` ✓ PASSED (${elapsed}ms)`); + if (result.status === 
'error') { + console.log(` (Expected error: ${result.error.substring(0, 120)})`); + } else { + const preview = typeof result.result === 'string' + ? result.result.substring(0, 300) + : JSON.stringify(result.result, null, 2).substring(0, 300); + console.log(` Result preview: ${preview}${preview.length >= 300 ? '...' : ''}`); + } + if (result.logs && result.logs.filter(l => !l.startsWith('[runtime]')).length) { + console.log(` Logs: ${result.logs.filter(l => !l.startsWith('[runtime]')).join(' | ')}`); + } + passed++; + } else { + console.log(` ✗ FAILED (${elapsed}ms) — ${checkResult}`); + if (result.logs && result.logs.length) { + console.log(` Logs: ${result.logs.join(' | ')}`); + } + failed++; + } + } catch (e) { + console.log(` ✗ CRASHED — ${e.message}`); + console.log(` Stack: ${e.stack?.split('\n').slice(0, 3).join(' ')}`); + failed++; + } +} + +// ── Tests ── +async function main() { + console.log('═'.repeat(70)); + console.log(' DSL Runtime — Complex Manual Tests'); + console.log('═'.repeat(70)); + + // ──────────────────────────────────────────────── + // SECTION 1: Basic sanity + // ──────────────────────────────────────────────── + + await runTest( + 'Pure computation', + 'const x = [1,2,3,4,5]; return x.filter(n => n > 2).length;', + (r) => r.result === 3 || `Expected 3, got ${r.result}` + ); + + await runTest( + 'Validation: rejects eval()', + 'eval("console.log(1)");', + (r) => r.status === 'error' ? 
true : `Expected error, got success` + ); + + // ──────────────────────────────────────────────── + // SECTION 2: While loops & pagination simulation + // ──────────────────────────────────────────────── + + await runTest( + 'While loop: accumulate until condition', + ` + const pages = []; + let page = 0; + while (page < 5) { + pages.push({ page: page, items: range(page * 10, page * 10 + 10) }); + page = page + 1; + } + log("Collected " + pages.length + " pages"); + return pages.length; + `, + (r) => r.result === 5 || `Expected 5, got ${r.result}` + ); + + await runTest( + 'While loop with break: simulated pagination', + ` + const allItems = []; + let page = 1; + while (true) { + // Simulate a paginated API that returns 3 pages of data + const pageData = range((page - 1) * 5, page * 5); + const hasMore = page < 3; + for (const item of pageData) { + allItems.push(item); + } + log("Page " + page + ": " + pageData.length + " items, hasMore=" + hasMore); + if (!hasMore) break; + page = page + 1; + } + return allItems; + `, + (r) => { + if (!Array.isArray(r.result)) return `Expected array, got ${typeof r.result}`; + if (r.result.length !== 15) return `Expected 15 items, got ${r.result.length}`; + return true; + } + ); + + // ──────────────────────────────────────────────── + // SECTION 3: Try/catch error handling + // ──────────────────────────────────────────────── + + await runTest( + 'Try/catch: graceful error recovery', + ` + const results = []; + const queries = ["validateDSL", "thisQueryWillProbablyReturnNothing12345xyz"]; + for (const q of queries) { + try { + const r = search(q); + results.push({ query: q, found: true, length: r.length }); + } catch (e) { + results.push({ query: q, found: false, error: "failed" }); + } + } + return results; + `, + (r) => { + if (!Array.isArray(r.result)) return `Expected array, got ${typeof r.result}`; + if (r.result.length !== 2) return `Expected 2 results, got ${r.result.length}`; + return true; + } + ); + + // 
──────────────────────────────────────────────── + // SECTION 4: Multi-search & data aggregation + // ──────────────────────────────────────────────── + + await runTest( + 'Multi-search: combine results from multiple queries', + ` + const queries = ["error handling", "validation", "timeout"]; + const searchResults = map(queries, (q) => { + const r = search(q); + return { query: q, resultLength: r.length }; + }); + log("Searched " + searchResults.length + " queries"); + const totalChars = searchResults.reduce((sum, r) => sum + r.resultLength, 0); + log("Total result chars: " + totalChars); + return { queries: searchResults, totalChars: totalChars }; + `, + (r) => { + if (!r.result.queries) return `Expected queries array`; + if (r.result.queries.length !== 3) return `Expected 3 query results`; + if (r.result.totalChars < 100) return `Expected substantial results`; + return true; + } + ); + + await runTest( + 'Search + extract: find code then extract specific files', + ` + const searchResult = search("transformDSL"); + // Extract the transformer file specifically + const code = extract({ targets: "npm/src/agent/dsl/transformer.js" }); + const summary = LLM( + "How many functions are exported from this file? List their names. 
Be very concise.", + code + ); + return summary; + `, + (r) => { + if (typeof r.result !== 'string') return `Expected string, got ${typeof r.result}`; + if (r.result.length < 10) return `Summary too short: ${r.result}`; + return true; + } + ); + + // ──────────────────────────────────────────────── + // SECTION 5: Complex data transformation + // ──────────────────────────────────────────────── + + await runTest( + 'Complex data pipeline: group, transform, aggregate', + ` + // Simulate analyzing a batch of items with different categories + const items = []; + for (let i = 0; i < 20; i = i + 1) { + const categories = ["bug", "feature", "docs", "refactor"]; + const priorities = ["high", "medium", "low"]; + items.push({ + id: i, + category: categories[i % 4], + priority: priorities[i % 3], + title: "Item " + i + }); + } + + // Group by category + const byCategory = groupBy(items, "category"); + + // Count per category + const categoryNames = ["bug", "feature", "docs", "refactor"]; + const counts = []; + for (const cat of categoryNames) { + const count = byCategory[cat] ? byCategory[cat].length : 0; + const highCount = byCategory[cat] + ? 
byCategory[cat].filter((item) => item.priority === "high").length + : 0; + counts.push({ category: cat, total: count, high: highCount }); + log(cat + ": " + count + " total, " + highCount + " high priority"); + } + + return { counts: counts, totalItems: items.length }; + `, + (r) => { + if (r.status === 'error') return `Execution error: ${r.error}`; + if (!r.result) return `Result is falsy: ${JSON.stringify(r)}`; + // Debug: show what we got + if (r.result.totalItems !== 20) return `Expected 20 total items, got type=${typeof r.result} value=${JSON.stringify(r.result).substring(0, 300)}`; + if (!Array.isArray(r.result.counts)) return `Expected counts array, got ${JSON.stringify(r.result).substring(0, 300)}`; + const bugs = r.result.counts.find((c) => c.category === 'bug'); + if (!bugs || bugs.total !== 5) return `Expected 5 bugs`; + return true; + } + ); + + // ──────────────────────────────────────────────── + // SECTION 6: Nested map() and LLM chaining + // ──────────────────────────────────────────────── + + await runTest( + 'Nested processing: search multiple topics, classify each result', + ` + const topics = ["error handling", "caching"]; + + // For each topic: search, then have LLM extract key patterns + const analysis = map(topics, (topic) => { + const results = search(topic); + const patterns = LLM( + "From this code, extract exactly 3 key patterns related to '" + topic + "'. 
" + + "Return a brief bullet list, one pattern per line.", + results + ); + return { topic: topic, patterns: patterns }; + }); + + log("Analyzed " + analysis.length + " topics"); + return analysis; + `, + (r) => { + if (r.status === 'error') return `Execution error: ${r.error}`; + if (!Array.isArray(r.result)) return `Expected array, got ${typeof r.result}`; + if (r.result.length !== 2) return `Expected 2 topics analyzed`; + // patterns is a string from LLM, not parsed + if (typeof r.result[0].topic !== 'string') return `Missing topic`; + if (typeof r.result[0].patterns !== 'string') return `Expected patterns to be string, got ${typeof r.result[0].patterns}`; + return true; + } + ); + + // ──────────────────────────────────────────────── + // SECTION 7: Real-world scenario — code review pipeline + // ──────────────────────────────────────────────── + + await runTest( + 'Code review pipeline: find, chunk, analyze, synthesize', + ` + // Step 1: Search for the validator module + const code = search("validateDSL ALLOWED_NODE_TYPES BLOCKED_IDENTIFIERS"); + + // Step 2: Chunk if needed + const codeChunks = chunk(code, 8000); + log("Code split into " + codeChunks.length + " chunks"); + + // Step 3: Analyze each chunk for issues + const reviews = map(codeChunks, (c) => LLM( + "You are a senior code reviewer. Analyze this code for potential issues: " + + "security concerns, edge cases, performance problems. " + + "Return a JSON object with: { issues: [{ severity: 'high'|'medium'|'low', description: string }] }. " + + "Return ONLY JSON.", + c + )); + + // Step 4: Synthesize + const synthesis = LLM( + "Combine these code review findings into a prioritized summary. " + + "Group by severity (high, medium, low). 
Be concise — max 5 bullet points total.", + reviews.join("\\n---\\n") + ); + + return synthesis; + `, + (r) => { + if (typeof r.result !== 'string') return `Expected string`; + if (r.result.length < 50) return `Review too short`; + return true; + } + ); + + // ──────────────────────────────────────────────── + // SECTION 8: Real-world — dependency analysis + // ──────────────────────────────────────────────── + + await runTest( + 'Dependency analysis: find imports across multiple files', + ` + // Search for all imports in the DSL module files + const files = ["validator.js", "transformer.js", "environment.js", "runtime.js"]; + const imports = map(files, (file) => { + const code = extract({ targets: "npm/src/agent/dsl/" + file }); + const analysis = LLM( + "List all import statements from this file. Return a JSON object: " + + "{ file: string, imports: [{ from: string, names: string[] }] }. Return ONLY JSON.", + code + ); + return analysis; + }); + + log("Analyzed " + imports.length + " files"); + + // Have LLM create a dependency graph summary + const summary = LLM( + "Given these import analyses for DSL module files, create a brief dependency summary: " + + "which files depend on what external packages and internal modules. " + + "Format as a simple list. Be concise.", + imports.join("\\n") + ); + + return summary; + `, + (r) => { + if (typeof r.result !== 'string') return `Expected string`; + if (r.result.length < 30) return `Summary too short`; + return true; + } + ); + + // ──────────────────────────────────────────────── + // SECTION 9: Stress test — many parallel LLM calls + // ──────────────────────────────────────────────── + + await runTest( + 'Stress: 10 parallel LLM calls via map()', + ` + const items = range(1, 11); + const results = map(items, (n) => { + const answer = LLM( + "Return ONLY a single number: the square of " + n + ". 
Nothing else, just the number.", + "Calculate " + n + " * " + n + ); + return { n: n, squared: String(answer).trim() }; + }); + log("Completed " + results.length + " parallel LLM calls"); + return results; + `, + (r) => { + if (r.status === 'error') return `Execution error: ${r.error}`; + if (!Array.isArray(r.result)) return `Expected array, got ${typeof r.result}`; + if (r.result.length !== 10) return `Expected 10 results, got ${r.result.length}`; + const first = r.result[0]; + if (first.n === undefined || first.squared === undefined) return `Missing fields: ${JSON.stringify(first)}`; + return true; + } + ); + + // ──────────────────────────────────────────────── + // SECTION 10: Complex conditional logic + // ──────────────────────────────────────────────── + + await runTest( + 'Conditional routing: different processing based on search results', + ` + const queries = ["BLOCKED_IDENTIFIERS", "nonexistent_symbol_xyz_12345"]; + const results = []; + + for (const q of queries) { + const searchResult = search(q); + + if (searchResult.length > 500) { + // Rich results — summarize + const summary = LLM("Summarize this code in one sentence.", searchResult); + results.push({ query: q, status: "found", summary }); + } else if (searchResult.length > 100) { + // Some results — note them + results.push({ query: q, status: "partial", chars: searchResult.length }); + } else { + // No meaningful results + results.push({ query: q, status: "not_found" }); + } + log(q + " -> " + results[results.length - 1].status); + } + + return results; + `, + (r) => { + if (!Array.isArray(r.result)) return `Expected array`; + if (r.result.length !== 2) return `Expected 2 results`; + if (r.result[0].status !== 'found') return `First query should be 'found'`; + return true; + } + ); + + // ──────────────────────────────────────────────── + // SECTION 11: While + search iteration (paginated search simulation) + // ──────────────────────────────────────────────── + + await runTest( + 'Iterative 
deepening: search, then search within results', + ` + // First broad search + const broad = search("sandbox"); + const broadSummary = LLM( + "From these search results, identify the 2 most important function names " + + "related to sandboxing. Return ONLY the function names separated by comma.", + broad + ); + log("Broad search found key functions: " + broadSummary); + + // Now search specifically for each function + const parts = broadSummary.split(","); + const functions = []; + for (const p of parts) { + const trimmed = p.trim(); + if (trimmed.length > 0) functions.push(trimmed); + } + log("Will search for " + functions.length + " functions"); + + const details = map(functions.slice(0, 2), (fn) => { + const detail = search(fn); + const analysis = LLM( + "Explain what the function '" + fn + "' does in 1-2 sentences based on this code.", + detail + ); + return { name: fn, description: analysis }; + }); + + return details; + `, + (r) => { + if (r.status === 'error') return `Execution error: ${r.error}`; + if (!Array.isArray(r.result)) return `Expected array, got ${typeof r.result}: ${JSON.stringify(r.result).substring(0, 200)}`; + if (r.result.length < 1) return `Expected at least 1 function analyzed`; + if (!r.result[0].description) return `Missing description`; + return true; + } + ); + + // ──────────────────────────────────────────────── + // SECTION 12: Full analyze_all replacement pattern + // ──────────────────────────────────────────────── + + await runTest( + 'analyze_all replacement: comprehensive codebase question', + ` + // Question: "What testing patterns are used in the DSL module?" + + // Phase 1: Search for test-related code + const testResults = search("test DSL validator transformer runtime"); + + // Phase 2: Chunk and extract patterns + const chunks = chunk(testResults, 6000); + log("Processing " + chunks.length + " test chunks"); + + const patterns = map(chunks, (c) => LLM( + "Extract testing patterns from this code. 
For each pattern found, note: " + + "1) Pattern name (e.g., 'mock functions', 'assertion style', 'test structure') " + + "2) Brief description " + + "Return as a bullet list. Be concise.", + c + )); + + // Phase 3: Synthesize + const answer = LLM( + "You are answering the question: 'What testing patterns are used in the DSL module?' " + + "Based on the analysis below, provide a comprehensive but concise answer. " + + "Organize by pattern type. Use bullet points. Max 10 bullet points.", + patterns.join("\\n---\\n") + ); + + return answer; + `, + (r) => { + if (typeof r.result !== 'string') return `Expected string`; + if (r.result.length < 100) return `Answer too short`; + return true; + } + ); + + // ──────────────────────────────────────────────── + // SECTION 13: Discovery-first pattern + // ──────────────────────────────────────────────── + + await runTest( + 'Discovery-first: explore repo then plan search strategy', + ` + // Phase 1: Discover repo structure + const files = listFiles("**/*"); + const sample = search("error handling"); + log("Files length: " + String(files).length + ", sample length: " + String(sample).length); + + // Phase 2: Ask LLM to determine optimal search strategy + const plan = LLM( + "Based on this repository structure and sample search results, determine the best search strategy " + + "to answer: 'What are all the validation approaches in this codebase?' " + + "Return a JSON object with: keywords (array of 2-3 search queries that will find relevant data), " + + "extractionFocus (what to extract from each result), " + + "and aggregation (summarize or list_unique). " + + "IMPORTANT: Only suggest keywords likely to match actual content you see. 
Return ONLY valid JSON.", + "Repository files:\\n" + String(files).substring(0, 3000) + "\\nSample results:\\n" + String(sample).substring(0, 3000) + ); + const strategy = JSON.parse(String(plan)); + log("Strategy keywords: " + strategy.keywords.length + ", focus: " + strategy.extractionFocus); + + // Phase 3: Execute with discovered strategy + const allFindings = []; + for (const kw of strategy.keywords) { + const results = search(kw); + if (String(results).length > 500) { + const chunks = chunk(results); + const findings = map(chunks, (c) => LLM(strategy.extractionFocus, c)); + for (const f of findings) { allFindings.push(String(f)); } + log("Keyword '" + kw + "': " + chunks.length + " chunks processed"); + } else { + log("Keyword '" + kw + "': skipped (too few results)"); + } + } + var combined = ""; + for (const f of allFindings) { combined = combined + f + "\\n---\\n"; } + return LLM("Synthesize all findings about validation approaches into a comprehensive answer.", combined); + `, + (r) => { + if (typeof r.result !== 'string') return `Expected string`; + if (r.result.length < 100) return `Answer too short: ${r.result.length} chars`; + return true; + } + ); + + // ── Summary ── + console.log(`\n${'═'.repeat(70)}`); + console.log(` Results: ${passed} passed, ${failed} failed, ${testNum} total`); + console.log('═'.repeat(70)); + + process.exit(failed > 0 ? 1 : 0); +} + +main().catch(e => { + console.error('Fatal error:', e); + process.exit(1); +}); diff --git a/npm/src/agent/dsl/output-buffer-test.mjs b/npm/src/agent/dsl/output-buffer-test.mjs new file mode 100644 index 00000000..c0cf6faa --- /dev/null +++ b/npm/src/agent/dsl/output-buffer-test.mjs @@ -0,0 +1,124 @@ +#!/usr/bin/env node +/** + * Quick E2E test of the output buffer feature. 
+ */ + +import { createDSLRuntime } from './runtime.js'; + +const outputBuffer = { items: [] }; +const runtime = createDSLRuntime({ + toolImplementations: { + search: { execute: async (p) => 'Result for: ' + p.query + '\nLine 1\nLine 2\nLine 3' }, + }, + llmCall: async (inst, data) => 'LLM processed: ' + String(data).substring(0, 50), + outputBuffer, +}); + +let passed = 0; +let failed = 0; + +function check(name, condition) { + if (condition) { + console.log(' ✓ ' + name); + passed++; + } else { + console.log(' ✗ ' + name); + failed++; + } +} + +// Test 1: output() writes to buffer, return value separate +console.log('\nTest 1: output() + return'); +outputBuffer.items = []; +const r1 = await runtime.execute(` + const data = search("test query"); + output("## Full Results"); + output(data); + return "Summary: found results"; +`, 'test 1'); + +check('status is success', r1.status === 'success'); +check('return value correct', r1.result === 'Summary: found results'); +check('buffer has 2 items', outputBuffer.items.length === 2); +check('buffer[0] is header', outputBuffer.items[0] === '## Full Results'); +check('buffer[1] has search data', outputBuffer.items[1].includes('Result for: test query')); +check('logs include [output]', r1.logs.some(l => l.startsWith('[output]'))); + +// Test 2: output() with JSON object +console.log('\nTest 2: output() with JSON'); +outputBuffer.items = []; +const r2 = await runtime.execute(` + output({ customers: ["Acme", "BigCo"], count: 2 }); + return "Found 2 customers"; +`, 'test 2'); + +check('status is success', r2.status === 'success'); +check('return is summary', r2.result === 'Found 2 customers'); +check('buffer has 1 item', outputBuffer.items.length === 1); +const parsed = JSON.parse(outputBuffer.items[0]); +check('parsed JSON correct', parsed.count === 2 && parsed.customers[0] === 'Acme'); + +// Test 3: output() persists across calls (accumulates) +console.log('\nTest 3: Accumulation across calls'); +outputBuffer.items = []; 
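The accumulation tests that follow rely on a simple contract worth spelling out: the runtime only ever pushes strings into `outputBuffer.items` and never clears it, so the caller owns reset/drain between runs. A minimal sketch of that drain pattern (the `drainOutputBuffer` helper is illustrative, not part of the runtime API):

```javascript
// Illustrative drain helper for the { items: [] } buffer shape used above.
// Assumption: only strings are ever pushed, so a plain join is safe.
function drainOutputBuffer(buffer) {
  const combined = buffer.items.join('\n\n'); // merge accumulated chunks
  buffer.items.length = 0;                    // reset in place
  return combined;
}

const buf = { items: [] };
buf.items.push('## Full Results');        // what output("## Full Results") does
buf.items.push('Result for: test query'); // what output(data) does
const drained = drainOutputBuffer(buf);
console.log(drained);          // the two chunks separated by a blank line
console.log(buf.items.length); // 0
```

Either `items.length = 0` or reassigning `items` (as these tests do) works, since the runtime holds a reference to the buffer object rather than to the array itself.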
+await runtime.execute(`output("first call")`, 'call 1'); +await runtime.execute(`output("second call")`, 'call 2'); +check('buffer has 2 items from 2 calls', outputBuffer.items.length === 2); +check('items correct', outputBuffer.items[0] === 'first call' && outputBuffer.items[1] === 'second call'); + +// Test 4: output() ignores null/undefined +console.log('\nTest 4: Ignores null/undefined'); +outputBuffer.items = []; +const r4 = await runtime.execute(` + output(null); + output(undefined); + output("real content"); + return "done"; +`, 'test 4'); +check('buffer has only 1 item', outputBuffer.items.length === 1); +check('only real content', outputBuffer.items[0] === 'real content'); + +// Test 5: Large table simulation +console.log('\nTest 5: Large table'); +outputBuffer.items = []; +const r5 = await runtime.execute(` + var rows = []; + for (var i = 0; i < 100; i++) { + rows.push("| Customer " + i + " | Tech | Active |"); + } + var header = "| Customer | Industry | Status |\\n| --- | --- | --- |\\n"; + var table = header; + for (const row of rows) { + table = table + row + "\\n"; + } + output(table); + return "Generated table with 100 customers"; +`, 'test 5'); + +check('status is success', r5.status === 'success'); +check('return is summary', r5.result === 'Generated table with 100 customers'); +check('buffer has table', outputBuffer.items[0].includes('Customer 99')); +check('table is large', outputBuffer.items[0].length > 2000); + +// Test 6: No outputBuffer = no output() function +console.log('\nTest 6: No outputBuffer'); +const runtimeNoBuffer = createDSLRuntime({ + toolImplementations: { + search: { execute: async (p) => 'ok' }, + }, + llmCall: async () => 'ok', +}); + +const r6 = await runtimeNoBuffer.execute(` + if (typeof output === "undefined") { + return "output not available"; + } + return "output available"; +`, 'test 6'); +check('output not available without buffer', r6.result === 'output not available'); + +// Summary +console.log('\n' + 
'═'.repeat(50)); +console.log(` Output Buffer E2E: ${passed} passed, ${failed} failed`); +console.log('═'.repeat(50)); +process.exit(failed > 0 ? 1 : 0); diff --git a/npm/src/agent/dsl/pipeline-direct-test.mjs b/npm/src/agent/dsl/pipeline-direct-test.mjs new file mode 100644 index 00000000..3c1ac45d --- /dev/null +++ b/npm/src/agent/dsl/pipeline-direct-test.mjs @@ -0,0 +1,147 @@ +#!/usr/bin/env node +/** + * Direct DSL runtime test against customer-insights repo. + * Bypasses ProbeAgent — runs scripts directly against the runtime. + */ + +import { createDSLRuntime } from './runtime.js'; +import { search } from '../../search.js'; +import { extract } from '../../extract.js'; +import { createGoogleGenerativeAI } from '@ai-sdk/google'; +import { generateText } from 'ai'; +import { config } from 'dotenv'; +import { resolve, dirname } from 'path'; +import { fileURLToPath } from 'url'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const projectRoot = resolve(__dirname, '../../../..'); +config({ path: resolve(projectRoot, '.env') }); + +const apiKey = process.env.GOOGLE_GENERATIVE_AI_API_KEY || process.env.GOOGLE_API_KEY; +if (!apiKey) { console.error('No API key'); process.exit(1); } + +const google = createGoogleGenerativeAI({ apiKey }); + +async function llmCall(instruction, data, options = {}) { + const dataStr = data == null ? '' : (typeof data === 'string' ? 
data : JSON.stringify(data, null, 2)); + const prompt = (dataStr || '(empty)').substring(0, 100000); + const result = await generateText({ + model: google('gemini-2.5-flash'), + system: instruction, + prompt, + temperature: options.temperature || 0.3, + maxTokens: options.maxTokens || 4000, + }); + return result.text; +} + +const TARGET = '/tmp/customer-insights'; + +const runtime = createDSLRuntime({ + toolImplementations: { + search: { execute: async (params) => { + try { + return await search({ query: params.query, path: TARGET, maxTokens: 20000, timeout: 60 }); + } catch(e) { return 'Search error: ' + e.message; } + }}, + extract: { execute: async (params) => { + try { + return await extract({ targets: params.targets, cwd: TARGET }); + } catch(e) { return 'Extract error: ' + e.message; } + }}, + listFiles: { execute: async (params) => { + try { + return await search({ query: params.pattern || 'customer', path: TARGET, filesOnly: true, maxTokens: 10000, timeout: 60 }); + } catch(e) { return 'listFiles error: ' + e.message; } + }}, + }, + llmCall, + mapConcurrency: 3, + timeoutMs: 300000, + maxLoopIterations: 5000, +}); + +console.log('═'.repeat(70)); +console.log(' Direct DSL Pipeline Test — customer-insights repo'); +console.log('═'.repeat(70)); + +const start = Date.now(); +const result = await runtime.execute(` +// Step 1: Broad search for customer data +const results = search("customer onboarding playbook"); +log("Search returned " + String(results).length + " chars"); + +// Step 2: Split into chunks and extract customer info using LLM +const chunks = chunk(results); +log("Split into " + chunks.length + " chunks"); + +const classified = map(chunks, (c) => LLM( + "Extract customer names and their industry from this text. " + + "Return a JSON array: [{customer: string, industry: string, notes: string}]. 
" + + "Return ONLY valid JSON array, no other text.", + c +)); + +// Step 3: Accumulate parsed results +var allCustomers = []; +for (const batch of classified) { + try { + var text = String(batch).trim(); + var jsonStart = text.indexOf("["); + var jsonEnd = text.lastIndexOf("]"); + if (jsonStart >= 0 && jsonEnd > jsonStart) { + text = text.substring(jsonStart, jsonEnd + 1); + } + var parsed = JSON.parse(text); + if (Array.isArray(parsed)) { + for (const item of parsed) { allCustomers.push(item); } + } + } catch (e) { + log("Parse error, skipping chunk"); + } +} + +log("Total customers extracted: " + allCustomers.length); + +// Step 4: Deduplicate +var seen = {}; +var uniqueCustomers = []; +for (const c of allCustomers) { + var key = String(c.customer || "").trim().toLowerCase(); + if (key.length > 0 && !seen[key]) { + seen[key] = true; + uniqueCustomers.push(c); + } +} + +log("Unique customers: " + uniqueCustomers.length); + +// Step 5: Build markdown table +var table = "| Customer | Industry | Notes |\\n|---|---|---|\\n"; +for (const c of uniqueCustomers) { + table = table + "| " + (c.customer || "Unknown") + " | " + (c.industry || "Unknown") + " | " + (c.notes || "-") + " |\\n"; +} + +// Step 6: Small LLM summary +const summary = LLM( + "Based on this customer table, write a brief 2-3 sentence summary of the customer base — what industries are represented, any patterns.", + table +); + +return table + "\\n" + summary; +`, 'Customer classification pipeline'); + +const elapsed = Math.round((Date.now() - start) / 1000); + +console.log('\n' + '─'.repeat(70)); +console.log(`Status: ${result.status} (${elapsed}s)`); +console.log(`Logs: ${result.logs.join(' | ')}`); + +if (result.status === 'error') { + console.log(`Error: ${result.error}`); +} else { + console.log('─'.repeat(70)); + console.log(result.result); +} + +process.exit(result.status === 'error' ? 
1 : 0);
diff --git a/npm/src/agent/dsl/pipeline-test.mjs b/npm/src/agent/dsl/pipeline-test.mjs
new file mode 100644
index 00000000..4f7fbf82
--- /dev/null
+++ b/npm/src/agent/dsl/pipeline-test.mjs
@@ -0,0 +1,223 @@
+#!/usr/bin/env node
+/**
+ * Data pipeline end-to-end test using ProbeAgent with enableExecutePlan.
+ *
+ * Tests against the TykTechnologies/customer-insights repo (/tmp/customer-insights)
+ * to verify the full data pipeline flow:
+ *   1. Agent picks execute_plan for comprehensive/inventory questions
+ *   2. LLM generates DSL scripts with search → chunk → LLM classify → accumulate
+ *   3. Session store persists data across multi-step execution
+ *   4. Returns structured results (tables, JSON, reports)
+ *
+ * Usage:
+ *   node npm/src/agent/dsl/pipeline-test.mjs
+ *
+ * Requires:
+ *   - GOOGLE_API_KEY or GOOGLE_GENERATIVE_AI_API_KEY in .env
+ *   - /tmp/customer-insights repo cloned
+ */
+
+import { ProbeAgent } from '../ProbeAgent.js';
+import { config } from 'dotenv';
+import { resolve, dirname } from 'path';
+import { fileURLToPath } from 'url';
+import { existsSync } from 'fs';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const projectRoot = resolve(__dirname, '../../../..');
+
+config({ path: resolve(projectRoot, '.env') });
+
+const apiKey = process.env.GOOGLE_GENERATIVE_AI_API_KEY || process.env.GOOGLE_API_KEY;
+if (!apiKey) {
+  console.error('ERROR: No Google API key found. Set GOOGLE_API_KEY or GOOGLE_GENERATIVE_AI_API_KEY');
+  process.exit(1);
+}
+
+const TARGET_REPO = '/tmp/customer-insights';
+if (!existsSync(TARGET_REPO)) {
+  console.error('ERROR: customer-insights repo not found at ' + TARGET_REPO);
+  console.error('Clone it first: git clone <customer-insights repo URL> /tmp/customer-insights');
+  process.exit(1);
+}
+
+// ── Test definitions ──
+const tests = [
+  {
+    name: 'Customer classification — categorize all customers by industry/type',
+    query: 'Analyze ALL customer files in this repository. 
For every customer, classify them by industry (finance, tech, healthcare, government, etc.) and determine their use case type (API management, security, integration, etc.). Produce a comprehensive markdown table with columns: Customer, Industry, Use Case Type, and a brief note. Give me complete inventory.', + maxIterations: 50, + timeoutMs: 300000, + check: (result, toolCalls) => { + // Should have triggered execute_plan + const usedExecutePlan = toolCalls.some(t => t === 'execute_plan'); + if (!usedExecutePlan) return 'Did not trigger execute_plan — used: ' + toolCalls.join(', '); + // Result should be substantial + if (!result || result.length < 200) return 'Result too short: ' + (result?.length || 0); + return true; + }, + }, + { + name: 'Sentiment & pain points extraction — data pipeline pattern', + query: 'Go through every customer document in this repo. For each customer, extract their main pain points and sentiment (positive, neutral, negative) about Tyk. Produce a structured report with: 1) A summary table of sentiment distribution, 2) Top 5 most common pain points with customer counts, 3) Customers with negative sentiment and why. Be comprehensive — cover ALL customers.', + maxIterations: 50, + timeoutMs: 300000, + check: (result, toolCalls) => { + const usedExecutePlan = toolCalls.some(t => t === 'execute_plan'); + if (!usedExecutePlan) return 'Did not trigger execute_plan'; + if (!result || result.length < 200) return 'Result too short: ' + (result?.length || 0); + return true; + }, + }, + { + name: 'Feature adoption matrix — multi-search data pipeline', + query: 'Create a complete feature adoption matrix for this customer base. Search for mentions of: API gateway, dashboard, developer portal, analytics, rate limiting, authentication, policies, and GraphQL. For each feature, list which customers use it. 
Return a markdown table where rows are features and columns show customer count + list of customer names.', + maxIterations: 50, + timeoutMs: 300000, + check: (result, toolCalls) => { + const usedExecutePlan = toolCalls.some(t => t === 'execute_plan'); + if (!usedExecutePlan) return 'Did not trigger execute_plan'; + if (!result || result.length < 100) return 'Result too short: ' + (result?.length || 0); + return true; + }, + }, +]; + +// ── Test runner ── +let testNum = 0; +let passed = 0; +let failed = 0; + +async function runPipelineTest(test) { + testNum++; + console.log(`\n${'═'.repeat(70)}`); + console.log(`▶ Test ${testNum}/${tests.length}: ${test.name}`); + console.log(` Query: "${test.query.substring(0, 120)}..."`); + console.log('─'.repeat(70)); + + const toolCalls = []; + const toolDetails = []; + + const agent = new ProbeAgent({ + path: TARGET_REPO, + provider: 'google', + model: 'gemini-2.5-flash', + enableExecutePlan: true, + maxIterations: test.maxIterations || 50, + }); + + // Listen for tool call events + agent.events.on('toolCall', (event) => { + if (event.status === 'started') { + toolCalls.push(event.name); + const desc = event.description ? 
` — ${event.description.substring(0, 80)}` : ''; + console.log(` [tool:start] ${event.name}${desc}`); + } + if (event.status === 'completed') { + const preview = event.resultPreview || ''; + console.log(` [tool:done] ${event.name} (${String(preview).length} chars preview)`); + } + if (event.status === 'error') { + console.log(` [tool:error] ${event.name}: ${event.error?.substring(0, 100)}`); + } + }); + + await agent.initialize(); + + const start = Date.now(); + let result; + try { + result = await Promise.race([ + agent.answer(test.query), + new Promise((_, reject) => + setTimeout(() => reject(new Error('Test timeout')), test.timeoutMs || 180000) + ), + ]); + } catch (e) { + const elapsed = Math.round((Date.now() - start) / 1000); + console.log(`\n [warn] Agent finished with: ${e.message?.substring(0, 150)} (${elapsed}s)`); + // Still check what we got — agent may have partial result + result = e.message; + } + + const elapsed = Math.round((Date.now() - start) / 1000); + + console.log('─'.repeat(70)); + console.log(` Duration: ${elapsed}s`); + console.log(` Tool calls: [${toolCalls.join(', ')}]`); + console.log(` execute_plan used: ${toolCalls.includes('execute_plan') ? 'YES' : 'NO'}`); + + const resultStr = typeof result === 'string' ? result : JSON.stringify(result); + console.log(` Result length: ${resultStr?.length || 0} chars`); + + // Show result preview + if (resultStr) { + console.log('─'.repeat(70)); + console.log(' Result preview:'); + const lines = resultStr.split('\n').slice(0, 25); + for (const line of lines) { + console.log(' │ ' + line.substring(0, 100)); + } + if (resultStr.split('\n').length > 25) { + console.log(' │ ... 
(' + (resultStr.split('\n').length - 25) + ' more lines)'); + } + } + + // Run check + const checkResult = test.check(resultStr, toolCalls); + if (checkResult === true) { + console.log(`\n ✓ PASSED (${elapsed}s)`); + passed++; + } else { + console.log(`\n ✗ FAILED — ${checkResult} (${elapsed}s)`); + failed++; + } + + // Token usage + try { + const usage = agent.getTokenUsage(); + if (usage) { + console.log(` Tokens: input=${usage.inputTokens || 0} output=${usage.outputTokens || 0} total=${usage.totalTokens || 0}`); + } + } catch (e) { + // ignore + } + + try { + await agent.close(); + } catch (e) { + // ignore cleanup errors + } +} + +// ── Main ── +async function main() { + console.log('═'.repeat(70)); + console.log(' Data Pipeline E2E Tests — ProbeAgent + execute_plan'); + console.log(' Target: TykTechnologies/customer-insights'); + console.log(' Config: enableExecutePlan=true, provider=google, model=gemini-2.5-flash'); + console.log('═'.repeat(70)); + + // Allow running a specific test by number + const testIndex = process.argv[2] ? parseInt(process.argv[2], 10) - 1 : null; + + if (testIndex !== null && testIndex >= 0 && testIndex < tests.length) { + console.log(`\nRunning test ${testIndex + 1} only: "${tests[testIndex].name}"`); + await runPipelineTest(tests[testIndex]); + } else { + for (const test of tests) { + await runPipelineTest(test); + } + } + + console.log(`\n${'═'.repeat(70)}`); + console.log(` Results: ${passed} passed, ${failed} failed, ${testNum} total`); + console.log('═'.repeat(70)); + + process.exit(failed > 0 ? 1 : 0); +} + +main().catch(e => { + console.error('Fatal error:', e); + process.exit(1); +}); diff --git a/npm/src/agent/dsl/runtime.js b/npm/src/agent/dsl/runtime.js new file mode 100644 index 00000000..d55b0199 --- /dev/null +++ b/npm/src/agent/dsl/runtime.js @@ -0,0 +1,206 @@ +/** + * DSL Runtime - SandboxJS execution engine. + * + * Orchestrates the full pipeline: + * 1. Validate (AST whitelist) + * 2. 
Transform (inject await, wrap in async IIFE) + * 3. Execute in SandboxJS with tool globals + timeout + * + * Returns the result or a structured error. + */ + +import SandboxModule from '@nyariv/sandboxjs'; +import { validateDSL } from './validator.js'; +import { transformDSL } from './transformer.js'; +import { generateSandboxGlobals, getAsyncFunctionNames } from './environment.js'; + +const Sandbox = SandboxModule.default || SandboxModule; + +/** + * Create a DSL runtime instance. + * + * @param {Object} options + * @param {Object} options.toolImplementations - Native tool execute functions + * @param {Object} [options.mcpBridge] - MCP bridge for calling MCP tools + * @param {Object} [options.mcpTools={}] - MCP tool metadata + * @param {Function} options.llmCall - Function for LLM() calls: (instruction, data, options?) => Promise + * @param {number} [options.mapConcurrency=3] - Concurrency limit for map() + * @param {number} [options.timeoutMs=120000] - Execution timeout in milliseconds (default 2 min) + * @param {number} [options.maxLoopIterations=5000] - Max iterations for while/for loops + * @param {Object} [options.tracer=null] - SimpleAppTracer instance for OTEL telemetry + * @returns {Object} Runtime with execute() method + */ +export function createDSLRuntime(options) { + const { + toolImplementations = {}, + mcpBridge = null, + mcpTools = {}, + llmCall, + mapConcurrency = 3, + timeoutMs = 120000, + maxLoopIterations = 5000, + tracer = null, + sessionStore = {}, + outputBuffer = null, + } = options; + + // Generate the globals and async function names, passing tracer for per-call tracing + const toolGlobals = generateSandboxGlobals({ + toolImplementations, + mcpBridge, + mcpTools, + llmCall, + mapConcurrency, + tracer, + sessionStore, + outputBuffer, + }); + + const asyncFunctionNames = getAsyncFunctionNames(mcpTools); + + /** + * Execute DSL code. 
+ * + * @param {string} code - The LLM-generated DSL code (sync-looking) + * @param {string} [description] - Human-readable description for logging + * @returns {Promise<{ status: 'success'|'error', result?: any, error?: string, logs: string[] }>} + */ + async function execute(code, description) { + const logs = []; + const startTime = Date.now(); + + // Step 1: Validate + tracer?.addEvent?.('dsl.phase.validate_start', { + 'dsl.code_length': code.length, + }); + + const validation = validateDSL(code); + if (!validation.valid) { + tracer?.addEvent?.('dsl.phase.validate_failed', { + 'dsl.error_count': validation.errors.length, + 'dsl.errors': validation.errors.join('; ').substring(0, 500), + }); + return { + status: 'error', + error: `Validation failed:\n${validation.errors.join('\n')}`, + logs, + }; + } + + tracer?.addEvent?.('dsl.phase.validate_complete'); + + // Step 2: Transform (inject await, wrap in async IIFE) + let transformedCode; + try { + tracer?.addEvent?.('dsl.phase.transform_start'); + transformedCode = transformDSL(code, asyncFunctionNames); + tracer?.addEvent?.('dsl.phase.transform_complete', { + 'dsl.transformed_length': transformedCode.length, + }); + } catch (e) { + tracer?.addEvent?.('dsl.phase.transform_failed', { + 'dsl.error': e.message, + }); + return { + status: 'error', + error: `Transform failed: ${e.message}`, + logs, + }; + } + + // Step 3: Execute in SandboxJS with timeout + tracer?.addEvent?.('dsl.phase.execute_start', { + 'dsl.timeout_ms': timeoutMs, + 'dsl.max_loop_iterations': maxLoopIterations, + }); + + try { + // Set up log collector + toolGlobals._logs = logs; + + // Loop iteration counter for infinite loop protection + let loopIterations = 0; + toolGlobals.__checkLoop = () => { + loopIterations++; + if (loopIterations > maxLoopIterations) { + throw new Error(`Loop exceeded maximum of ${maxLoopIterations} iterations. 
Use break to exit loops earlier or process fewer items.`); + } + }; + + const sandbox = new Sandbox({ + globals: { + ...Sandbox.SAFE_GLOBALS, + ...toolGlobals, + // Override: remove dangerous globals that SAFE_GLOBALS might include + Function: undefined, + eval: undefined, + }, + prototypeWhitelist: Sandbox.SAFE_PROTOTYPES, + }); + + const exec = sandbox.compileAsync(transformedCode); + + // Catch unhandled rejections from SandboxJS async error propagation + let escapedError = null; + const rejectionHandler = (reason) => { + escapedError = reason; + }; + process.on('unhandledRejection', rejectionHandler); + + // Race execution against timeout + let timeoutHandle; + const executionPromise = exec().run(); + const timeoutPromise = new Promise((_, reject) => { + timeoutHandle = setTimeout(() => { + reject(new Error(`Execution timed out after ${Math.round(timeoutMs / 1000)}s. Script took too long — reduce the amount of work (fewer items, smaller data) or increase timeout.`)); + }, timeoutMs); + }); + + let result; + try { + result = await Promise.race([executionPromise, timeoutPromise]); + } finally { + clearTimeout(timeoutHandle); + // Delay handler removal — SandboxJS can throw async errors after execution completes + setTimeout(() => { + process.removeListener('unhandledRejection', rejectionHandler); + }, 500); + } + + // Check for escaped async errors + if (escapedError) { + throw escapedError; + } + + const elapsed = Date.now() - startTime; + logs.push(`[runtime] Completed in ${elapsed}ms`); + + tracer?.addEvent?.('dsl.phase.execute_complete', { + 'dsl.duration_ms': elapsed, + 'dsl.loop_iterations': loopIterations, + }); + + return { + status: 'success', + result, + logs, + }; + } catch (e) { + const elapsed = Date.now() - startTime; + logs.push(`[runtime] Failed after ${elapsed}ms`); + + tracer?.addEvent?.('dsl.phase.execute_failed', { + 'dsl.duration_ms': elapsed, + 'dsl.error': e.message?.substring(0, 500), + }); + + return { + status: 'error', + error: 
`Execution failed: ${e.message}`, + logs, + }; + } + } + + return { execute }; +} diff --git a/npm/src/agent/dsl/sandbox-experiment.mjs b/npm/src/agent/dsl/sandbox-experiment.mjs new file mode 100644 index 00000000..8b5eb854 --- /dev/null +++ b/npm/src/agent/dsl/sandbox-experiment.mjs @@ -0,0 +1,309 @@ +/** + * Quick experiment to verify SandboxJS capabilities for our DSL runtime. + * + * Tests: + * 1. compileAsync() with host async functions as globals + * 2. Error propagation from sandbox to host + * 3. Tick limits + * 4. Sandbox.audit() for introspection + * 5. map() concurrency pattern + * 6. Nested async calls (callback inside map that calls async) + */ + +import SandboxModule from '@nyariv/sandboxjs'; +const Sandbox = SandboxModule.default || SandboxModule; + +async function test(name, fn) { + try { + const result = await fn(); + console.log(`PASS: ${name}`, result !== undefined ? `→ ${JSON.stringify(result)}` : ''); + } catch (e) { + console.log(`FAIL: ${name} → ${e.message}`); + } +} + +// Test 1: Basic async function as global +await test('Host async function as global', async () => { + const s = new Sandbox({ + globals: { + ...Sandbox.SAFE_GLOBALS, + fetchData: async (query) => { + return { results: [`result for: ${query}`] }; + } + } + }); + const exec = s.compileAsync(` + const data = await fetchData("test query"); + return data.results[0]; + `); + const result = await exec().run(); + if (result !== 'result for: test query') throw new Error(`Expected 'result for: test query', got '${result}'`); + return result; +}); + +// Test 2: Multiple sequential async calls +await test('Multiple sequential async calls', async () => { + const callLog = []; + const s = new Sandbox({ + globals: { + ...Sandbox.SAFE_GLOBALS, + step: async (n) => { + callLog.push(n); + return `done-${n}`; + } + } + }); + const exec = s.compileAsync(` + const a = await step(1); + const b = await step(2); + const c = await step(3); + return a + "," + b + "," + c; + `); + const result = 
await exec().run(); + if (callLog.join(',') !== '1,2,3') throw new Error(`Wrong call order: ${callLog}`); + return result; +}); + +// Test 3: Error propagation +await test('Error propagation from async global', async () => { + const s = new Sandbox({ + globals: { + ...Sandbox.SAFE_GLOBALS, + failingTool: async () => { + throw new Error('Tool failed!'); + } + } + }); + const exec = s.compileAsync(` + const result = await failingTool(); + return result; + `); + try { + await exec().run(); + throw new Error('Should have thrown'); + } catch (e) { + if (!e.message.includes('Tool failed')) throw new Error(`Wrong error: ${e.message}`); + return 'Error correctly propagated'; + } +}); + +// Test 4: Sandbox.audit() +await test('Sandbox.audit() reports accessed globals', async () => { + const audit = Sandbox.audit(` + const x = myFunc("test"); + const y = otherFunc(x); + return y; + `); + return audit; +}); + +// Test 5: Code without await (what LLM would write, before our transform) +await test('Async global called WITHOUT await', async () => { + const s = new Sandbox({ + globals: { + ...Sandbox.SAFE_GLOBALS, + fetchData: async (query) => { + return { results: [`result for: ${query}`] }; + } + } + }); + // Without await - should return a Promise object, not resolved value + const exec = s.compileAsync(` + const data = fetchData("test"); + return data; + `); + const result = await exec().run(); + // Check if it's a Promise (unresolved) or the actual value + const isPromise = result && typeof result.then === 'function'; + return { isPromise, type: typeof result, value: isPromise ? 
'Promise (unresolved)' : result }; +}); + +// Test 6: Custom throw for pause-like mechanism (just to verify throws work) +await test('Custom throw propagation', async () => { + class PauseSignal { + constructor(value) { this.value = value; this.isPause = true; } + } + const s = new Sandbox({ + globals: { + ...Sandbox.SAFE_GLOBALS, + pause: (value) => { throw new PauseSignal(value); } + } + }); + const exec = s.compileAsync(` + const x = 42; + pause({ result: x }); + return "should not reach here"; + `); + try { + await exec().run(); + throw new Error('Should have thrown'); + } catch (e) { + if (e.isPause) { + return { paused: true, value: e.value }; + } + return { paused: false, error: e.message }; + } +}); + +// Test 7: for...of loop with async +await test('for...of with async calls', async () => { + const s = new Sandbox({ + globals: { + ...Sandbox.SAFE_GLOBALS, + process: async (item) => item * 2 // Note: 'process' as a name - might conflict + } + }); + const exec = s.compileAsync(` + const items = [1, 2, 3, 4, 5]; + const results = []; + for (const item of items) { + results.push(await process(item)); + } + return results; + `); + const result = await exec().run(); + const expected = [2, 4, 6, 8, 10]; + if (JSON.stringify(result) !== JSON.stringify(expected)) throw new Error(`Got ${JSON.stringify(result)}`); + return result; +}); + +// Test 8: Passing scope variables +await test('Scope variables accessible in sandbox', async () => { + const s = new Sandbox({ + globals: { + ...Sandbox.SAFE_GLOBALS, + transform: async (x) => x.toUpperCase() + } + }); + const exec = s.compileAsync(` + const result = await transform(inputData); + return result; + `); + const result = await exec({ inputData: 'hello world' }).run(); + if (result !== 'HELLO WORLD') throw new Error(`Got '${result}'`); + return result; +}); + +// Test 9: Arrow function callback with async inside +await test('Arrow function with async call inside', async () => { + const s = new Sandbox({ + globals: { + 
...Sandbox.SAFE_GLOBALS, + processItem: async (item) => item * 10 + } + }); + const exec = s.compileAsync(` + const items = [1, 2, 3]; + const fn = async (item) => { + const result = await processItem(item); + return result; + }; + const results = []; + for (const item of items) { + results.push(await fn(item)); + } + return results; + `); + const result = await exec().run(); + if (JSON.stringify(result) !== '[10,20,30]') throw new Error(`Got ${JSON.stringify(result)}`); + return result; +}); + +// Test 10: map() as a custom global with concurrency +await test('Custom map() with concurrency control', async () => { + let concurrent = 0; + let maxConcurrent = 0; + + const s = new Sandbox({ + globals: { + ...Sandbox.SAFE_GLOBALS, + processItem: async (item) => { + concurrent++; + maxConcurrent = Math.max(maxConcurrent, concurrent); + await new Promise(r => setTimeout(r, 50)); // simulate work + concurrent--; + return item * 2; + }, + map: async (items, fn) => { + const concurrency = 3; + const results = []; + const executing = new Set(); + for (const item of items) { + const p = fn(item).then(result => { + executing.delete(p); + return result; + }); + executing.add(p); + results.push(p); + if (executing.size >= concurrency) { + await Promise.race(executing); + } + } + return Promise.all(results); + } + } + }); + const exec = s.compileAsync(` + const items = [1, 2, 3, 4, 5, 6, 7, 8]; + const results = await map(items, async (item) => { + return await processItem(item); + }); + return results; + `); + const result = await exec().run(); + const expected = [2, 4, 6, 8, 10, 12, 14, 16]; + if (JSON.stringify(result) !== JSON.stringify(expected)) throw new Error(`Got ${JSON.stringify(result)}`); + return { result, maxConcurrent }; +}); + +// Test 11: map() called WITHOUT async/await in the callback (what LLM would write) +await test('map() where LLM writes sync-looking callback', async () => { + const s = new Sandbox({ + globals: { + ...Sandbox.SAFE_GLOBALS, + processItem: 
async (item) => item * 2, + map: async (items, fn) => { + // fn might return a promise even if not declared async + const results = []; + for (const item of items) { + const result = await fn(item); + results.push(result); + } + return results; + } + } + }); + // LLM writes this - no async, no await in callback + const exec = s.compileAsync(` + const items = [1, 2, 3]; + const results = await map(items, (item) => { + return processItem(item); + }); + return results; + `); + const result = await exec().run(); + if (JSON.stringify(result) !== '[2,4,6]') throw new Error(`Got ${JSON.stringify(result)}`); + return result; +}); + +// Test 12: Verify blocked globals are truly inaccessible +await test('Blocked globals not accessible', async () => { + const s = new Sandbox({ + globals: { + ...Sandbox.SAFE_GLOBALS, + // Deliberately NOT including: require, process, setTimeout, fetch + } + }); + const exec = s.compileAsync(` + try { + const x = setTimeout; + return "FAIL: setTimeout accessible"; + } catch(e) { + return "PASS: setTimeout blocked"; + } + `); + const result = await exec().run(); + return result; +}); + +console.log('\n--- Experiment complete ---'); diff --git a/npm/src/agent/dsl/transformer.js b/npm/src/agent/dsl/transformer.js new file mode 100644 index 00000000..2ae7aa92 --- /dev/null +++ b/npm/src/agent/dsl/transformer.js @@ -0,0 +1,156 @@ +/** + * AST Transformer - Auto-injects await before async tool calls. + * + * The LLM writes synchronous-looking code. This transformer: + * 1. Parses the code into an AST + * 2. Finds all CallExpressions where the callee is a known async tool function + * 3. Inserts `await` before those calls in the source + * 4. Marks arrow functions containing async calls as `async` + * 5. Wraps the whole program in an async IIFE + * + * Uses offset-based string insertion (not AST regeneration) to preserve + * the original code structure as much as possible. 
+ */
+
+import * as acorn from 'acorn';
+import * as walk from 'acorn-walk';
+
+/**
+ * Transform DSL code by injecting await and async wrappers.
+ *
+ * @param {string} code - The sync-looking DSL code
+ * @param {Set} asyncFunctionNames - Names of functions that are async (tool functions)
+ * @returns {string} Transformed code with await injected, wrapped in async IIFE
+ */
+export function transformDSL(code, asyncFunctionNames) {
+ let ast;
+ try {
+ ast = acorn.parse(code, {
+ ecmaVersion: 2022,
+ sourceType: 'script',
+ allowReturnOutsideFunction: true,
+ });
+ } catch (e) {
+ throw new Error(`Transform parse error: ${e.message}`);
+ }
+
+ // Collect insertions: { offset, text } pairs
+ // We insert from end to start so offsets don't shift
+ const insertions = [];
+
+ // Track which arrow/function expressions need to be marked async
+ const functionsNeedingAsync = new Set();
+
+ // First pass: collect all function scopes with their ranges
+ const functionScopes = [];
+ walk.full(ast, (node) => {
+ if (node.type === 'ArrowFunctionExpression' || node.type === 'FunctionExpression') {
+ functionScopes.push(node);
+ }
+ });
+
+ // Second pass: find async calls and determine what needs transformation
+ walk.full(ast, (node) => {
+ if (node.type !== 'CallExpression') return;
+
+ const calleeName = getCalleeName(node);
+ if (!calleeName || !asyncFunctionNames.has(calleeName)) return;
+
+ // This call needs await. Check if it's already awaited.
+ // (It shouldn't be since we block AwaitExpression in the validator,
+ // but be defensive.)
+ + // Insert 'await ' before the call expression + insertions.push({ offset: node.start, text: 'await ' }); + + // Find the enclosing function (if any) and mark it as needing async + for (const fn of functionScopes) { + if (fn.body.start <= node.start && fn.body.end >= node.end) { + functionsNeedingAsync.add(fn); + } + } + }); + + // Also check: if 'map' is called with a callback that contains async calls, + // mark that callback as async. The callback is typically the second argument. + walk.full(ast, (node) => { + if (node.type !== 'CallExpression') return; + const calleeName = getCalleeName(node); + if (calleeName !== 'map' || node.arguments.length < 2) return; + + const callback = node.arguments[1]; + if (callback.type === 'ArrowFunctionExpression' || callback.type === 'FunctionExpression') { + // Check if this callback contains any async tool calls + let hasAsyncCall = false; + walk.full(callback, (inner) => { + if (inner.type === 'CallExpression') { + const innerName = getCalleeName(inner); + if (innerName && asyncFunctionNames.has(innerName)) { + hasAsyncCall = true; + } + } + }); + if (hasAsyncCall) { + functionsNeedingAsync.add(callback); + } + } + }); + + // Third pass: inject loop guards (__checkLoop()) into while/for loops + walk.full(ast, (node) => { + if (node.type === 'WhileStatement' || node.type === 'ForStatement' || node.type === 'ForOfStatement' || node.type === 'ForInStatement') { + // Insert __checkLoop(); at the start of the loop body + const body = node.body; + if (body.type === 'BlockStatement' && body.body.length > 0) { + // Insert after the opening brace + insertions.push({ offset: body.start + 1, text: ' __checkLoop();' }); + } + } + }); + + // Build insertions for async markers on functions + for (const fn of functionsNeedingAsync) { + // Insert 'async ' before the function + // For arrow functions: `(x) => ...` → `async (x) => ...` + // For function expressions: `function(x) { ... }` → `async function(x) { ... 
}`
+ insertions.push({ offset: fn.start, text: 'async ' });
+ }
+
+ // Sort insertions by offset descending (apply from end to preserve offsets)
+ insertions.sort((a, b) => b.offset - a.offset);
+
+ // Apply insertions to the source code
+ let transformed = code;
+ for (const ins of insertions) {
+ transformed = transformed.slice(0, ins.offset) + ins.text + transformed.slice(ins.offset);
+ }
+
+ // Wrap in async IIFE with return so SandboxJS awaits the result
+ transformed = `return (async () => {\n${transformed}\n})()`;
+
+ return transformed;
+}
+
+/**
+ * Extract the function name from a CallExpression callee.
+ * Handles bare identifiers: `foo()` → 'foo'. Member expressions like
+ * `obj.foo()` return null, since our tools use flat identifier names.
+ *
+ * @param {import('acorn').Node} callExpr
+ * @returns {string|null}
+ */
+function getCalleeName(callExpr) {
+ const callee = callExpr.callee;
+ if (callee.type === 'Identifier') {
+ return callee.name;
+ }
+ // Member expressions like mcp_server.tool() are not resolved:
+ // our tools use flat names like mcp_github_create_issue, so Identifier callees are sufficient
+ return null;
+}
diff --git a/npm/src/agent/dsl/trigger-test.mjs b/npm/src/agent/dsl/trigger-test.mjs
new file mode 100644
index 00000000..20a7965f
--- /dev/null
+++ b/npm/src/agent/dsl/trigger-test.mjs
@@ -0,0 +1,159 @@
+#!/usr/bin/env node
+/**
+ * Trigger test: verifies that the agent picks execute_plan for the right queries.
+ *
+ * Runs the real ProbeAgent with enableExecutePlan=true and observes which tools
+ * get called for different types of questions. This tests the tool-selection
+ * logic end-to-end — the system prompt, tool descriptions, and LLM decision-making.
+ * + * Usage: + * node npm/src/agent/dsl/trigger-test.mjs + * + * Requires: GOOGLE_API_KEY or GOOGLE_GENERATIVE_AI_API_KEY in .env + */ + +import { ProbeAgent } from '../ProbeAgent.js'; +import { config } from 'dotenv'; +import { resolve, dirname } from 'path'; +import { fileURLToPath } from 'url'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const projectRoot = resolve(__dirname, '../../../..'); + +config({ path: resolve(projectRoot, '.env') }); + +// Check for API key +const apiKey = process.env.GOOGLE_GENERATIVE_AI_API_KEY || process.env.GOOGLE_API_KEY; +if (!apiKey) { + console.error('ERROR: No Google API key found. Set GOOGLE_API_KEY or GOOGLE_GENERATIVE_AI_API_KEY'); + process.exit(1); +} + +// ── Test definitions ── +// Each test has a query and an expected tool choice +const tests = [ + // ── Should trigger execute_plan ── + { + name: 'Aggregate question (all patterns)', + query: 'Find ALL error handling patterns across the entire codebase and give me a comprehensive summary covering every module.', + expectTool: 'execute_plan', + reason: 'Aggregate question needing full data coverage + "ALL" + "comprehensive" + "every module"', + }, + { + name: 'Multi-topic bulk scan', + query: 'Search for authentication, authorization, and session management patterns. Analyze each topic across the full codebase and produce a security report.', + expectTool: 'execute_plan', + reason: 'Multiple topics + full codebase scan + synthesis', + }, + { + name: 'Open-ended discovery', + query: 'What are all the different testing approaches used in this codebase? 
Give me a complete inventory.', + expectTool: 'execute_plan', + reason: 'Open-ended, needs discovery + comprehensive scan', + }, + + // ── Should NOT trigger execute_plan ── + { + name: 'Simple search (specific function)', + query: 'How does the validateDSL function work?', + expectTool: 'search', + reason: 'Specific function lookup, 1-2 tool calls', + }, + { + name: 'Simple search (single concept)', + query: 'What is the timeout configuration for the DSL runtime?', + expectTool: 'search', + reason: 'Narrow question, single concept', + }, +]; + +// ── Test runner ── +let testNum = 0; +let passed = 0; +let failed = 0; + +async function runTriggerTest(test) { + testNum++; + console.log(`\n${'─'.repeat(70)}`); + console.log(`▶ Test ${testNum}: ${test.name}`); + console.log(` Query: "${test.query.substring(0, 100)}${test.query.length > 100 ? '...' : ''}"`); + console.log(` Expected tool: ${test.expectTool}`); + + const toolCalls = []; + + const agent = new ProbeAgent({ + path: projectRoot, + provider: 'google', + model: 'gemini-2.5-flash', + enableExecutePlan: true, + maxIterations: 3, // Only need first few iterations to see what tool gets picked + }); + + // Listen for tool call events + agent.events.on('toolCall', (event) => { + if (event.status === 'started') { + toolCalls.push(event.name); + console.log(` [tool] ${event.name}`); + } + }); + + await agent.initialize(); + + const start = Date.now(); + try { + await agent.answer(test.query); + } catch (e) { + // May hit maxIterations limit — that's fine, we just want tool selection + if (!e.message?.includes('iteration') && !e.message?.includes('cancelled')) { + console.log(` [warn] Agent error: ${e.message?.substring(0, 150)}`); + } + } + + const elapsed = Date.now() - start; + const firstMeaningfulTool = toolCalls.find(t => + t === 'execute_plan' || t === 'analyze_all' || t === 'search' || t === 'query' + ); + + console.log(` All tool calls: [${toolCalls.join(', ')}]`); + console.log(` First meaningful tool: 
${firstMeaningfulTool || '(none)'}`); + + const toolMatch = firstMeaningfulTool === test.expectTool; + + if (toolMatch) { + console.log(` ✓ PASSED — picked ${firstMeaningfulTool} as expected (${elapsed}ms)`); + passed++; + } else { + console.log(` ✗ FAILED — expected ${test.expectTool}, got ${firstMeaningfulTool || '(none)'} (${elapsed}ms)`); + console.log(` Reason it should use ${test.expectTool}: ${test.reason}`); + failed++; + } + + try { + await agent.close(); + } catch (e) { + // ignore cleanup errors + } +} + +async function main() { + console.log('═'.repeat(70)); + console.log(' Execute Plan Trigger Tests — Tool Selection Verification'); + console.log('═'.repeat(70)); + console.log(`\nRunning with: enableExecutePlan=true, provider=google, model=gemini-2.5-flash`); + console.log(`Project root: ${projectRoot}`); + + for (const test of tests) { + await runTriggerTest(test); + } + + console.log(`\n${'═'.repeat(70)}`); + console.log(` Results: ${passed} passed, ${failed} failed, ${testNum} total`); + console.log('═'.repeat(70)); + + process.exit(failed > 0 ? 1 : 0); +} + +main().catch(e => { + console.error('Fatal error:', e); + process.exit(1); +}); diff --git a/npm/src/agent/dsl/validator.js b/npm/src/agent/dsl/validator.js new file mode 100644 index 00000000..1ec4902c --- /dev/null +++ b/npm/src/agent/dsl/validator.js @@ -0,0 +1,183 @@ +/** + * DSL Validator - AST whitelist validation for LLM-generated code. + * + * Parses code with Acorn and walks the AST, rejecting any node type + * not in the whitelist. This is an allow-list approach — unknown syntax + * is rejected by default. 
+ */ + +import * as acorn from 'acorn'; +import * as walk from 'acorn-walk'; + +// Node types the LLM is allowed to generate +const ALLOWED_NODE_TYPES = new Set([ + 'Program', + 'ExpressionStatement', + 'BlockStatement', + 'VariableDeclaration', + 'VariableDeclarator', + 'ArrowFunctionExpression', + 'FunctionExpression', + 'CallExpression', + 'MemberExpression', + 'Identifier', + 'Literal', + 'TemplateLiteral', + 'TemplateElement', + 'ArrayExpression', + 'ObjectExpression', + 'SpreadElement', + 'IfStatement', + 'ConditionalExpression', + 'ForOfStatement', + 'ForInStatement', + 'ForStatement', + 'WhileStatement', + 'TryStatement', + 'CatchClause', + 'ThrowStatement', + 'ReturnStatement', + 'BreakStatement', + 'ContinueStatement', + 'AssignmentExpression', + 'UpdateExpression', + 'BinaryExpression', + 'LogicalExpression', + 'UnaryExpression', + 'Property', + 'SequenceExpression', + 'ChainExpression', +]); + +// Identifiers that are never allowed +const BLOCKED_IDENTIFIERS = new Set([ + 'eval', + 'Function', + 'require', + 'process', + 'globalThis', + '__proto__', + 'constructor', + 'prototype', + 'import', + 'exports', + 'setTimeout', + 'setInterval', + 'setImmediate', + 'queueMicrotask', + 'Proxy', + 'Reflect', + 'Symbol', +]); + +// Property names that are never allowed on member expressions +const BLOCKED_PROPERTIES = new Set([ + '__proto__', + 'constructor', + 'prototype', + '__defineGetter__', + '__defineSetter__', + '__lookupGetter__', + '__lookupSetter__', +]); + +/** + * Validate DSL code against the whitelist. 
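+ *
+ * @example
+ * // illustrative: violations are collected rather than short-circuited,
+ * // so one call reports both the non-whitelisted ThisExpression node
+ * // and the blocked 'require' identifier
+ * validateDSL("this.x = require('fs')")  // valid: false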
+ * + * @param {string} code - The LLM-generated code to validate + * @returns {{ valid: boolean, errors: string[] }} + */ +export function validateDSL(code) { + const errors = []; + + // Step 1: Parse with Acorn + let ast; + try { + ast = acorn.parse(code, { + ecmaVersion: 2022, + sourceType: 'script', + allowReturnOutsideFunction: true, + }); + } catch (e) { + return { valid: false, errors: [`Syntax error: ${e.message}`] }; + } + + // Step 2: Walk every node and validate + walk.full(ast, (node) => { + // Check node type against whitelist + if (!ALLOWED_NODE_TYPES.has(node.type)) { + errors.push(`Blocked node type: ${node.type} at position ${node.start}`); + return; + } + + // Block async functions (LLM should not write async/await) + if ( + (node.type === 'ArrowFunctionExpression' || + node.type === 'FunctionExpression') && + node.async + ) { + errors.push(`Async functions are not allowed at position ${node.start}. Write synchronous code — the runtime handles async.`); + } + + // Block generator functions + if ( + (node.type === 'FunctionExpression') && + node.generator + ) { + errors.push(`Generator functions are not allowed at position ${node.start}`); + } + + // Block regex literals — SandboxJS doesn't support them + if (node.type === 'Literal' && node.regex) { + errors.push(`Regex literals are not supported at position ${node.start}. 
Use String methods like indexOf(), includes(), startsWith() instead.`); + } + + // Check identifiers against blocklist + if (node.type === 'Identifier' && BLOCKED_IDENTIFIERS.has(node.name)) { + errors.push(`Blocked identifier: '${node.name}' at position ${node.start}`); + } + + // Check member expressions for blocked properties + if (node.type === 'MemberExpression' && !node.computed) { + if (node.property.type === 'Identifier' && BLOCKED_PROPERTIES.has(node.property.name)) { + errors.push(`Blocked property access: '.${node.property.name}' at position ${node.property.start}`); + } + } + + // Block computed member expressions with blocked string literals + if (node.type === 'MemberExpression' && node.computed) { + if (node.property.type === 'Literal' && typeof node.property.value === 'string') { + if (BLOCKED_PROPERTIES.has(node.property.value) || BLOCKED_IDENTIFIERS.has(node.property.value)) { + errors.push(`Blocked computed property access: '["${node.property.value}"]' at position ${node.property.start}`); + } + } + } + + // Block variable declarations named with blocked identifiers + if (node.type === 'VariableDeclarator' && node.id.type === 'Identifier') { + if (BLOCKED_IDENTIFIERS.has(node.id.name)) { + errors.push(`Cannot declare variable with blocked name: '${node.id.name}' at position ${node.id.start}`); + } + } + }); + + return { + valid: errors.length === 0, + errors, + }; +} + +/** + * Parse DSL code into an AST. + * Exported for use by the transformer. 
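+ *
+ * @example
+ * // illustrative: returns a raw Acorn AST; throws a SyntaxError on
+ * // unparseable input
+ * parseDSL("const x = 1;").type  // → "Program"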
+ * + * @param {string} code + * @returns {import('acorn').Node} + */ +export function parseDSL(code) { + return acorn.parse(code, { + ecmaVersion: 2022, + sourceType: 'script', + allowReturnOutsideFunction: true, + }); +} diff --git a/npm/src/agent/index.js b/npm/src/agent/index.js index 1986e289..b3261056 100644 --- a/npm/src/agent/index.js +++ b/npm/src/agent/index.js @@ -142,6 +142,8 @@ function parseArgs() { skillDirs: null, // Comma-separated list of repo-relative skill directories // Task management enableTasks: false, // Enable task tracking for progress management + // Execute plan DSL tool + enableExecutePlan: false, // Bash tool configuration enableBash: false, bashAllow: null, @@ -218,6 +220,8 @@ function parseArgs() { config.skillDirs = args[++i].split(',').map(dir => dir.trim()).filter(Boolean); } else if (arg === '--allow-tasks') { config.enableTasks = true; + } else if (arg === '--enable-execute-plan') { + config.enableExecutePlan = true; } else if (arg === '--enable-bash') { config.enableBash = true; } else if (arg === '--bash-allow' && i + 1 < args.length) { @@ -295,6 +299,9 @@ Options: --no-mermaid-validation Disable automatic mermaid diagram validation and fixing --help, -h Show this help message +DSL Orchestration: + --enable-execute-plan Enable execute_plan DSL tool for programmatic orchestration + Bash Tool Options: --enable-bash Enable bash command execution for system exploration --bash-allow Additional bash command patterns to allow (comma-separated) @@ -843,6 +850,7 @@ async function main() { disableTools: config.disableTools, allowSkills: config.allowSkills, skillDirs: config.skillDirs, + enableExecutePlan: config.enableExecutePlan, enableBash: config.enableBash, bashConfig: bashConfig, enableTasks: config.enableTasks diff --git a/npm/src/agent/probeTool.js b/npm/src/agent/probeTool.js index e3c53189..a1c2689f 100644 --- a/npm/src/agent/probeTool.js +++ b/npm/src/agent/probeTool.js @@ -211,6 +211,15 @@ export function 
createWrappedTools(baseTools) { ); } + // Wrap execute_plan tool + if (baseTools.executePlanTool) { + wrappedTools.executePlanToolInstance = wrapToolWithEmitter( + baseTools.executePlanTool, + 'execute_plan', + baseTools.executePlanTool.execute + ); + } + // Wrap bash tool if (baseTools.bashTool) { wrappedTools.bashToolInstance = wrapToolWithEmitter( diff --git a/npm/src/agent/tools.js b/npm/src/agent/tools.js index 58b70a56..74fb5f09 100644 --- a/npm/src/agent/tools.js +++ b/npm/src/agent/tools.js @@ -5,6 +5,7 @@ import { extractTool, delegateTool, analyzeAllTool, + createExecutePlanTool, bashTool, editTool, createTool, @@ -16,6 +17,7 @@ import { extractSchema, delegateSchema, analyzeAllSchema, + executePlanSchema, bashSchema, editSchema, createSchema, @@ -24,6 +26,7 @@ import { extractToolDefinition, delegateToolDefinition, analyzeAllToolDefinition, + getExecutePlanToolDefinition, bashToolDefinition, editToolDefinition, createToolDefinition, @@ -58,7 +61,10 @@ export function createTools(configOptions) { if (configOptions.enableDelegate && isToolAllowed('delegate')) { tools.delegateTool = delegateTool(configOptions); } - if (isToolAllowed('analyze_all')) { + if (configOptions.enableExecutePlan && isToolAllowed('execute_plan')) { + tools.executePlanTool = createExecutePlanTool(configOptions); + } else if (isToolAllowed('analyze_all')) { + // analyze_all is fallback when execute_plan is not enabled tools.analyzeAllTool = analyzeAllTool(configOptions); } @@ -97,6 +103,7 @@ export { extractSchema, delegateSchema, analyzeAllSchema, + executePlanSchema, bashSchema, editSchema, createSchema, @@ -106,6 +113,7 @@ export { extractToolDefinition, delegateToolDefinition, analyzeAllToolDefinition, + getExecutePlanToolDefinition, bashToolDefinition, editToolDefinition, createToolDefinition, diff --git a/npm/src/index.js b/npm/src/index.js index 8a460384..a8d39a0b 100644 --- a/npm/src/index.js +++ b/npm/src/index.js @@ -26,6 +26,7 @@ import { extractSchema, delegateSchema, 
analyzeAllSchema, + executePlanSchema, attemptCompletionSchema, bashSchema, searchToolDefinition, @@ -46,6 +47,7 @@ import { createToolDefinition } from './tools/edit.js'; import { searchTool, queryTool, extractTool, delegateTool, analyzeAllTool } from './tools/vercel.js'; +import { createExecutePlanTool, getExecutePlanToolDefinition } from './tools/executePlan.js'; import { bashTool } from './tools/bash.js'; import { editTool, createTool } from './tools/edit.js'; import { ProbeAgent } from './agent/ProbeAgent.js'; @@ -90,6 +92,7 @@ export { extractTool, delegateTool, analyzeAllTool, + createExecutePlanTool, bashTool, editTool, createTool, @@ -102,6 +105,7 @@ export { extractSchema, delegateSchema, analyzeAllSchema, + executePlanSchema, attemptCompletionSchema, bashSchema, editSchema, @@ -112,6 +116,7 @@ export { extractToolDefinition, delegateToolDefinition, analyzeAllToolDefinition, + getExecutePlanToolDefinition, attemptCompletionToolDefinition, bashToolDefinition, editToolDefinition, diff --git a/npm/src/tools/common.js b/npm/src/tools/common.js index 52cdb3ab..34f51b37 100644 --- a/npm/src/tools/common.js +++ b/npm/src/tools/common.js @@ -54,6 +54,11 @@ export const analyzeAllSchema = z.object({ path: z.string().optional().default('.').describe('Directory path to search in') }); +export const executePlanSchema = z.object({ + code: z.string().min(1).describe('JavaScript DSL code to execute. All function calls look synchronous — do NOT use async/await. Use map(items, fn) for batch operations. 
Use LLM(instruction, data) for AI processing.'), + description: z.string().optional().describe('Human-readable description of what this plan does, for logging.') +}); + // Schema for the attempt_completion tool - flexible validation for direct XML response export const attemptCompletionSchema = { // Custom validation that requires result parameter but allows direct XML response @@ -425,6 +430,7 @@ export const DEFAULT_VALID_TOOLS = [ 'extract', 'delegate', 'analyze_all', + 'execute_plan', 'listSkills', 'useSkill', 'listFiles', @@ -463,6 +469,7 @@ function getValidParamsForTool(toolName) { extract: extractSchema, delegate: delegateSchema, analyze_all: analyzeAllSchema, + execute_plan: executePlanSchema, listSkills: listSkillsSchema, useSkill: useSkillSchema, bash: bashSchema, diff --git a/npm/src/tools/executePlan.js b/npm/src/tools/executePlan.js new file mode 100644 index 00000000..3f9e14a4 --- /dev/null +++ b/npm/src/tools/executePlan.js @@ -0,0 +1,761 @@ +/** + * execute_plan tool - DSL-based programmatic orchestration. + * + * Allows the LLM to write small JavaScript programs that orchestrate + * tool calls, keeping intermediate data out of the agent's context window. + */ + +import { tool } from 'ai'; +import { executePlanSchema, parseAndResolvePaths } from './common.js'; +import { createDSLRuntime } from '../agent/dsl/runtime.js'; +import { search } from '../search.js'; +import { query } from '../query.js'; +import { extract } from '../extract.js'; +import { delegate } from '../delegate.js'; +import { glob } from 'glob'; + +export { executePlanSchema }; + +/** + * Strip markdown fences and XML tags that LLMs sometimes wrap code in. 
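+ *
+ * @example
+ * // illustrative:
+ * stripCodeWrapping("```js\nreturn 1;\n```")    // → "return 1;"
+ * stripCodeWrapping("<code>return 1;</code>")  // → "return 1;"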
+ */
+function stripCodeWrapping(code) {
+  let s = String(code || '');
+  // Strip markdown code fences
+  s = s.replace(/^```(?:javascript|js)?\n?/gm, '').replace(/```$/gm, '');
+  // Strip XML-style tags: <execute_plan>, </execute_plan>, <code>, </code>
+  s = s.replace(/<\/?(?:execute_plan|code)>/g, '');
+  return s.trim();
+}
+
+/**
+ * Build DSL-compatible tool implementations from the agent's configOptions.
+ *
+ * @param {Object} configOptions - Agent config (sessionId, cwd, provider, model, etc.)
+ * @returns {Object} toolImplementations for createDSLRuntime
+ */
+function buildToolImplementations(configOptions) {
+  const { sessionId, cwd } = configOptions;
+  const tools = {};
+
+  tools.search = {
+    execute: async (params) => {
+      try {
+        let searchPaths;
+        if (params.path) {
+          searchPaths = parseAndResolvePaths(params.path, cwd);
+        }
+        if (!searchPaths || searchPaths.length === 0) {
+          searchPaths = [cwd || '.'];
+        }
+        return await search({
+          query: params.query,
+          path: searchPaths.join(' '),
+          cwd,
+          allowTests: true,
+          exact: params.exact || false,
+          json: false,
+          maxTokens: 20000,
+          session: sessionId,
+          timeout: 60,
+        });
+      } catch (e) {
+        return `Search error: ${e.message}`;
+      }
+    },
+  };
+
+  tools.query = {
+    execute: async (params) => {
+      try {
+        let queryPath = cwd || '.';
+        if (params.path) {
+          const resolved = parseAndResolvePaths(params.path, cwd);
+          if (resolved.length > 0) queryPath = resolved[0];
+        }
+        return await query({
+          pattern: params.pattern,
+          path: queryPath,
+          cwd,
+          language: params.language || 'rust',
+          allowTests: params.allow_tests ?? true,
+        });
+      } catch (e) {
+        return `Query error: ${e.message}`;
+      }
+    },
+  };
+
+  tools.extract = {
+    execute: async (params) => {
+      try {
+        if (!params.targets && !params.input_content) {
+          return 'Extract error: no file path provided. Usage: extract("path/to/file.md")';
+        }
+        return await extract({
+          files: params.targets ? [params.targets] : undefined,
+          content: params.input_content || undefined,
+          cwd,
+          allowTests: params.allow_tests ??
true, + }); + } catch (e) { + return `Extract error: ${e.message}`; + } + }, + }; + + tools.listFiles = { + execute: async (params) => { + try { + const files = await glob(params.pattern || '**/*', { + cwd: cwd || '.', + ignore: ['node_modules/**', '.git/**'], + nodir: true, + }); + files.sort(); + return files; + } catch (e) { + return `listFiles error: ${e.message}`; + } + }, + }; + + return tools; +} + +/** + * Build an llmCall function using delegate with disableTools. + * + * Uses the full delegate infrastructure (OTEL, retries, fallbacks, schema support) + * but with tools disabled and maxIterations: 1 since LLM() is pure text processing. + * + * @param {Object} configOptions - Agent config + * @returns {Function} llmCall(instruction, data, options?) => Promise + */ +function buildLLMCall(configOptions) { + const { provider, model, debug, tracer, sessionId } = configOptions; + + return async (instruction, data, options = {}) => { + const dataStr = data == null ? '' : (typeof data === 'string' ? data : JSON.stringify(data, null, 2)); + const task = `${instruction}\n\n---\n\n${dataStr || '(empty)'}`; + + return delegate({ + task, + disableTools: true, + maxIterations: 1, + provider, + model, + debug, + tracer, + parentSessionId: sessionId, + schema: options.schema || null, + timeout: options.timeout || 120, + }); + }; +} + +/** + * Create the execute_plan tool for the Vercel AI SDK. + * + * Accepts EITHER: + * - Agent configOptions (sessionId, cwd, provider, model, etc.) — auto-builds tools + LLM via delegate + * - Direct DSL options (toolImplementations, llmCall, etc.) 
— used as-is (tests, manual scripts) + * + * @param {Object} options + * @returns {Object} Vercel AI SDK tool + */ +export function createExecutePlanTool(options) { + let runtimeOptions; + let llmCallFn; + const tracer = options.tracer || null; + + // Session-scoped store persists across execute_plan calls within the same agent session + const sessionStore = options.sessionStore || {}; + + // Output buffer for direct-to-user content (bypasses LLM context window) + const outputBuffer = options.outputBuffer || null; + + if (options.toolImplementations) { + // Direct DSL options — used by tests and manual scripts + runtimeOptions = { ...options, tracer, sessionStore, outputBuffer }; + llmCallFn = options.llmCall; + } else { + // Agent configOptions — build everything from the agent's config + llmCallFn = buildLLMCall(options); + runtimeOptions = { + toolImplementations: buildToolImplementations(options), + llmCall: llmCallFn, + mcpBridge: options.mcpBridge || null, + mcpTools: options.mcpTools || {}, + mapConcurrency: options.mapConcurrency || 5, + timeoutMs: options.timeoutMs || 300000, + maxLoopIterations: options.maxLoopIterations || 5000, + tracer, + sessionStore, + outputBuffer, + }; + } + + const runtime = createDSLRuntime(runtimeOptions); + const maxRetries = options.maxRetries ?? 2; + + return tool({ + description: 'Execute a JavaScript DSL program to orchestrate tool calls. ' + + 'Use for batch processing, paginated APIs, multi-step workflows where intermediate data is large. 
' + + 'Write simple synchronous-looking code — do NOT use async/await.', + parameters: executePlanSchema, + execute: async ({ code, description }) => { + // Create top-level OTEL span for the entire execute_plan invocation + const planSpan = tracer?.createToolSpan?.('execute_plan', { + 'dsl.description': description || '', + 'dsl.code_length': code.length, + 'dsl.code': code, + 'dsl.max_retries': maxRetries, + }) || null; + + // Strip XML tags and markdown fences LLMs sometimes wrap code in + let currentCode = stripCodeWrapping(code); + let lastError = null; + let finalOutput; + + try { + for (let attempt = 0; attempt <= maxRetries; attempt++) { + // On retry, ask the LLM to fix the code + if (attempt > 0 && llmCallFn && lastError) { + planSpan?.addEvent?.('dsl.self_heal_start', { + 'dsl.attempt': attempt, + 'dsl.error': lastError.substring(0, 1000), + }); + + try { + const fixPrompt = `The following DSL script failed with an error. Fix the script and return ONLY the corrected JavaScript code — no markdown, no explanation, no backtick fences. + +ORIGINAL SCRIPT: +${currentCode} + +ERROR: +${lastError} + +RULES REMINDER: +- search(query) is KEYWORD SEARCH — pass a search query, NOT a filename. Use extract(filepath) to read file contents. +- search(), query(), extract(), listFiles(), bash() all return STRINGS, not arrays. +- Use chunk(stringData) to split a string into an array of chunks. +- Use map(array, fn) only with arrays. Do NOT pass strings to map(). +- Do NOT use .map(), .forEach(), .filter(), .join() — use for..of loops instead. +- Do NOT define helper functions that call tools — write logic inline. +- Do NOT use async/await, template literals, or shorthand properties. +- Do NOT use regex literals (/pattern/) — use String methods like indexOf, includes, startsWith instead. 
+- String concatenation with +, not template literals.`; + + const fixedCode = await llmCallFn(fixPrompt, '', { maxTokens: 4000, temperature: 0.2 }); + // Strip markdown fences and XML tags the LLM might add + currentCode = stripCodeWrapping(fixedCode); + + planSpan?.addEvent?.('dsl.self_heal_complete', { + 'dsl.attempt': attempt, + 'dsl.fixed_code_length': currentCode.length, + }); + + if (!currentCode) { + finalOutput = `Plan execution failed after ${attempt} retries: LLM returned empty fix.\n\nLast error: ${lastError}`; + planSpan?.setAttributes?.({ 'dsl.result': 'empty_fix', 'dsl.attempts': attempt }); + planSpan?.setStatus?.('ERROR'); + planSpan?.end?.(); + return finalOutput; + } + } catch (fixError) { + finalOutput = `Plan execution failed and self-heal failed: ${fixError.message}\n\nOriginal error: ${lastError}`; + planSpan?.setAttributes?.({ 'dsl.result': 'self_heal_error', 'dsl.attempts': attempt }); + planSpan?.setStatus?.('ERROR'); + planSpan?.end?.(); + return finalOutput; + } + } + + const result = await runtime.execute(currentCode, description); + + if (result.status === 'success') { + finalOutput = formatSuccess(result, description, attempt, outputBuffer); + planSpan?.setAttributes?.({ + 'dsl.result': 'success', + 'dsl.attempts': attempt, + 'dsl.self_healed': attempt > 0, + 'dsl.result_length': finalOutput.length, + 'dsl.log_count': result.logs.length, + }); + planSpan?.setStatus?.('OK'); + planSpan?.end?.(); + return finalOutput; + } + + // Execution failed — prepare for retry + const logOutput = result.logs.length > 0 ? 
`\nLogs: ${result.logs.join(' | ')}` : ''; + lastError = `${result.error}${logOutput}`; + + planSpan?.addEvent?.('dsl.execution_failed', { + 'dsl.attempt': attempt, + 'dsl.error': lastError.substring(0, 1000), + }); + } + + // All retries exhausted + finalOutput = `Plan execution failed after ${maxRetries} retries.\n\nLast error: ${lastError}`; + planSpan?.setAttributes?.({ + 'dsl.result': 'all_retries_exhausted', + 'dsl.attempts': maxRetries, + 'dsl.last_error': lastError?.substring(0, 1000), + }); + planSpan?.setStatus?.('ERROR'); + planSpan?.end?.(); + return finalOutput; + } catch (e) { + planSpan?.setStatus?.('ERROR'); + planSpan?.addEvent?.('exception', { + 'exception.message': e.message, + 'exception.stack': e.stack, + }); + planSpan?.end?.(); + throw e; + } + }, + }); +} + +function formatSuccess(result, description, attempt, outputBuffer) { + let output = ''; + + if (description) { + output += `Plan: ${description}\n\n`; + } + + if (attempt > 0) { + output += `(Self-healed after ${attempt} ${attempt === 1 ? 
'retry' : 'retries'})\n\n`; + } + + if (result.logs.length > 0) { + const userLogs = result.logs.filter(l => !l.startsWith('[runtime]') && !l.startsWith('[output]')); + if (userLogs.length > 0) { + output += `Logs:\n${userLogs.join('\n')}\n\n`; + } + } + + // Format the result value + const resultValue = result.result; + if (resultValue === undefined || resultValue === null) { + output += 'Plan completed (no return value).'; + } else if (typeof resultValue === 'string') { + output += `Result:\n${resultValue}`; + } else { + try { + output += `Result:\n${JSON.stringify(resultValue, null, 2)}`; + } catch { + output += `Result: ${String(resultValue)}`; + } + } + + // If output buffer has content, tell the LLM the data was written to direct output + if (outputBuffer && outputBuffer.items && outputBuffer.items.length > 0) { + const totalChars = outputBuffer.items.reduce((sum, item) => sum + item.length, 0); + output += `\n\n[Output buffer: ${totalChars} chars written via output(). This content will be appended directly to your response. Do NOT repeat or summarize it.]`; + } + + return output; +} + +/** + * XML tool definition for the system prompt. + * + * @param {string[]} availableFunctions - List of available DSL function names + * @returns {string} Tool definition text + */ +export function getExecutePlanToolDefinition(availableFunctions = []) { + const funcList = availableFunctions.length > 0 + ? availableFunctions.join(', ') + : 'search, query, extract, LLM, map, chunk, batch, listFiles, bash, log, range, flatten, unique, groupBy, parseJSON, storeSet, storeGet, storeAppend, storeKeys, storeGetAll, output'; + + return `## execute_plan +Description: Execute a JavaScript DSL program to orchestrate tool calls. Use for batch processing, large data analysis, and multi-step workflows where intermediate data is large. 
+ +ALWAYS use this tool when: +- The question asks about "all", "every", "comprehensive", "complete", or "inventory" of something +- The question covers multiple topics or requires scanning across the full codebase +- Open-ended discovery questions where you don't know the right search keywords (use the discovery-first pattern) +- Processing large search results that exceed context limits +- Iterating over paginated APIs or many files +- Batch operations with the same logic applied to many items +- Chaining multiple tool calls where intermediate data is large + +Do NOT use this tool for: +- Simple single searches or extractions (1-2 tool calls) +- Questions about a specific function, class, or file +- Tasks where you need to see and reason about every detail of results + +Parameters: +- code: (required) JavaScript DSL code to execute. Write synchronous-looking code — do NOT use async/await. +- description: (optional) Human-readable description of what this plan does. + + + +Discovery-first analysis (RECOMMENDED for open-ended questions — explore before searching): + + +const files = listFiles("**/*"); +const sample = search("initial keyword"); +const plan = LLM( + "Based on this repo structure and sample results, suggest the best search strategy. " + + "Return JSON: {keywords: [2-4 queries], extractionFocus: string, aggregation: string}. 
ONLY valid JSON.", + "Files:\\n" + String(files).substring(0, 3000) + "\\nSample:\\n" + String(sample).substring(0, 3000) +); +const strategy = parseJSON(plan); +log("Strategy: " + strategy.keywords.length + " keywords"); +const allFindings = []; +for (const kw of strategy.keywords) { + const results = search(kw); + if (String(results).length > 500) { + const chunks = chunk(results); + const findings = map(chunks, (c) => LLM(strategy.extractionFocus, c)); + for (const f of findings) { allFindings.push(String(f)); } + } +} +var combined = ""; +for (const f of allFindings) { combined = combined + f + "\\n---\\n"; } +return LLM("Synthesize all findings into a comprehensive answer.", combined); + +Discover optimal search strategy, then analyze + + +Analyze large search results: + + +const results = search("error handling"); +const chunks = chunk(results); +log("Processing " + chunks.length + " chunks"); +const extracted = map(chunks, (c) => LLM("List error handling patterns found. Be brief.", c)); +var combined = ""; +for (const e of extracted) { combined = combined + String(e) + "\\n---\\n"; } +return LLM("Combine into a summary.", combined); + +Analyze error handling patterns across the codebase + + +Multi-topic analysis: + + +const topics = ["authentication", "authorization"]; +const allFindings = []; +for (const topic of topics) { + const results = search(topic); + const chunks = chunk(results); + const findings = map(chunks, (c) => LLM("Extract key findings about " + topic + ". 
Be brief.", c));
+  for (const f of findings) { allFindings.push(String(f)); }
+}
+var combined = "";
+for (const f of allFindings) { combined = combined + f + "\\n---\\n"; }
+return LLM("Synthesize all findings into a report.", combined);
+
+Cross-topic analysis of auth patterns
+
+
+Process each file individually (use extract to read files, NOT search):
+
+
+const files = listFiles("**/*.md");
+log("Found " + files.length + " files");
+const batches = batch(files, 5);
+const results = [];
+for (const b of batches) {
+  const batchResults = map(b, (filepath) => {
+    try {
+      const content = extract(filepath);
+      if (String(content).length > 100) {
+        const info = LLM("Extract: customer name, industry, key use case. Return JSON: {customer, industry, useCase}. ONLY JSON.", content);
+        try { return parseJSON(info); } catch (e) { return null; }
+      }
+    } catch (e) { return null; }
+    return null;
+  });
+  for (const r of batchResults) { if (r) { results.push(r); } }
+  log("Batch done, total: " + results.length);
+}
+var table = "| Customer | Industry | Use Case |";
+for (const r of results) {
+  table = table + "\\n| " + r.customer + " | " + r.industry + " | " + r.useCase + " |";
+}
+return table;
+
+Read each file with extract() and classify with LLM
+
+
+
+
+### Rules
+- Write simple, synchronous-looking JavaScript. Do NOT use async/await — the runtime injects it automatically.
+- Do NOT use: class, new, import, require, eval, this, Promise, async, await, setTimeout.
+- Do NOT use these as variable names: eval, Function, require, process, globalThis, constructor, prototype, exports, Proxy, Reflect, Symbol.
+- Use \`map(items, fn)\` for **parallel** batch processing. Use \`for..of\` only for **sequential** logic where order matters.
+- **CRITICAL: When processing multiple files**, use \`batch(files, 5)\` + \`map(batch, fn)\` for parallel processing. NEVER use a sequential for..of loop with LLM() or extract() calls on many files — it will timeout.
+- Do NOT use Array.prototype.map (.map()) — use the global \`map()\` function instead. +- Use \`LLM(instruction, data)\` for AI processing — returns a string. +- Use \`log(message)\` for debugging — messages appear in the output. +- Use \`parseJSON(text)\` instead of \`JSON.parse()\` when parsing LLM output — LLM responses often have markdown fences. +- Tool functions never throw — on error they return an \`"ERROR: ..."\` string. Check with \`if (result.indexOf("ERROR:") === 0)\` to handle errors. +- Always use explicit property assignment: \`{ key: value }\` not shorthand \`{ key }\`. +- String concatenation with \`+\`, no template literals with backticks. +- Use \`String(value)\` before calling \`.trim()\`, \`.split()\`, or \`.length\` on tool results. +- Use \`for (const item of array)\` loops instead of \`.forEach()\`, \`.map()\`, \`.filter()\`, or \`.join()\` array methods. +- Do NOT define helper functions that call tools. Write all logic inline or use for..of loops. +- Do NOT use regex literals (/pattern/) — use String methods like indexOf, includes, startsWith instead. +- ONLY use functions listed below. Do NOT call functions that are not listed. + +### Available functions + +**Tools (async, auto-awaited):** +${funcList} + +**Return types — IMPORTANT:** +- \`search(query)\` → **keyword search** — pass a search query (e.g. "error handling"), NOT a filename. Returns a **string** (matching code snippets). To process parts, use \`chunk()\` to split it. +- \`query(pattern)\` → **AST search** — pass a tree-sitter pattern. Returns a **string** (matching code elements). +- \`extract(targets)\` → **read file contents** — pass a file path like "src/main.js" or "src/main.js:42". Use this to read specific files found by listFiles(). Returns a **string**. +- \`listFiles(pattern)\` → **list files** — pass a glob pattern like "**/*.md". Returns an **array** of file path strings. Use directly with \`for (const f of listFiles("**/*.md"))\`. 
+- \`LLM(instruction, data)\` → returns a **string** (AI response)
+- \`map(array, fn)\` → returns an **array** of results. First argument MUST be an array.
+- \`bash(command)\` → returns a **string** (command output)
+
+**COMMON MISTAKE:** Do NOT use \`search(filename)\` to read a file's contents — search() is for keyword queries. Use \`extract(filepath)\` to read file contents.
+
+**Parallel processing:**
+- \`map(array, fn)\` — process array items **in parallel** (concurrency=5). Use this for batch operations, NOT for..of loops.
+
+**Utilities (sync):**
+- \`chunk(data, tokens)\` — split a string into token-sized array of chunks (default 20000 tokens). Returns an **array of strings**.
+- \`batch(array, size)\` — split an array into sub-arrays of \`size\` (default 10). Returns an **array of arrays**.
+- \`log(message)\` — log a message (collected in output)
+- \`range(start, end)\` — generate array of integers [start, end)
+- \`flatten(arr)\` — flatten one level of nesting
+- \`unique(arr)\` — deduplicate array
+- \`groupBy(arr, key)\` — group array of objects by key or function
+- \`parseJSON(text)\` — **safely parse JSON from LLM responses**. Strips markdown fences and extracts JSON. ALWAYS use \`parseJSON()\` instead of \`JSON.parse()\` when parsing LLM output.
+
+**Direct output (sync):**
+- \`output(content)\` — **write content directly to the user's response**, bypassing LLM rewriting. Use for large tables, JSON, or CSV that should be delivered verbatim. Can be called multiple times; all content is appended to the final response. The \`return\` value still goes to the tool result for you to see.
+ +**Session store (sync, persists across execute_plan calls):** +- \`storeSet(key, value)\` — store a value that persists across execute_plan calls in this session +- \`storeGet(key)\` — retrieve a stored value (returns undefined if not found) +- \`storeAppend(key, item)\` — append item to an array in the store (auto-creates array if key doesn't exist) +- \`storeKeys()\` — list all keys in the store +- \`storeGetAll()\` — return entire store as a plain object + +### Patterns + +**Pattern 1: Discovery-first (RECOMMENDED for open-ended questions)** +When you don't know the right keywords, explore the repo first, then use LLM to determine the best search strategy: +\`\`\` +// Phase 1: Discover repo structure and test queries +const files = listFiles("**/*"); +const sample = search("initial keyword guess"); +log("Files overview length: " + String(files).length + ", sample length: " + String(sample).length); + +// Phase 2: Ask LLM to determine optimal strategy based on what exists +const plan = LLM( + "Based on this repository structure and sample search results, determine the best search strategy. " + + "Return a JSON object with: keywords (array of 2-4 search queries that will find relevant data), " + + "extractionFocus (what to extract from each result), " + + "and aggregation (summarize, list_unique, count, or group_by). " + + "IMPORTANT: Only suggest keywords likely to match actual content you see. 
Return ONLY valid JSON.", + "Repository files:\\n" + String(files).substring(0, 3000) + "\\nSample results:\\n" + String(sample).substring(0, 3000) +); +const strategy = parseJSON(plan); +log("Strategy: " + strategy.keywords.length + " keywords, focus: " + strategy.extractionFocus); + +// Phase 3: Execute with discovered strategy +const allFindings = []; +for (const kw of strategy.keywords) { + const results = search(kw); + if (String(results).length > 500) { + const chunks = chunk(results); + const findings = map(chunks, (c) => LLM(strategy.extractionFocus, c)); + for (const f of findings) { allFindings.push(String(f)); } + log("Keyword '" + kw + "': " + chunks.length + " chunks processed"); + } else { + log("Keyword '" + kw + "': skipped (too few results)"); + } +} +var combined = ""; +for (const f of allFindings) { combined = combined + f + "\\n---\\n"; } +return LLM("Synthesize all findings into a comprehensive answer.", combined); +\`\`\` + +**Pattern 2: Large result analysis** +search() returns a big string. Split into 20K-token chunks, process in parallel, synthesize: +\`\`\` +const results = search("error handling"); +const chunks = chunk(results); +log("Processing " + chunks.length + " chunks"); +const extracted = map(chunks, (c) => LLM("List error handling patterns found. Be brief.", c)); +var combined = ""; +for (const e of extracted) { combined = combined + String(e) + "\\n---\\n"; } +return LLM("Combine into a summary. 
Max 5 bullet points.", combined); +\`\`\` + +**Pattern 3: Paginated API with while loop** +For APIs that return pages of results: +\`\`\` +const allItems = []; +let page = 1; +while (true) { + const result = mcp_api_list_items({ page: page, per_page: 50 }); + for (const item of result.items) { + allItems.push(item); + } + log("Page " + page + ": " + result.items.length + " items"); + if (!result.has_more) break; + page = page + 1; +} +return allItems; +\`\`\` + +**Pattern 4: Batch classify/process with map** +For processing many items in parallel: +\`\`\` +const items = mcp_api_get_tickets({ status: "open" }); +const classified = map(items, (item) => { + const sentiment = LLM("Classify as positive, negative, or neutral. Return ONLY the word.", item.description); + return { id: item.id, title: item.title, sentiment: String(sentiment).trim() }; +}); +return groupBy(classified, "sentiment"); +\`\`\` + +**Pattern 5: Multi-search with error handling** +For searching multiple topics and combining results: +\`\`\` +const queries = ["authentication", "authorization", "session management"]; +const results = []; +for (const q of queries) { + try { + const r = search(q); + if (r.length > 500) { + const summary = LLM("Summarize the key patterns found. Be concise.", r); + results.push({ query: q, summary: summary }); + } else { + results.push({ query: q, summary: "No significant results" }); + } + } catch (e) { + results.push({ query: q, summary: "Search failed" }); + } +} +return LLM("Combine these findings into a security overview.", results); +\`\`\` + +**Pattern 6: Iterative deepening** +Search broadly, then drill into specific findings: +\`\`\` +const broad = search("database"); +const keyFunctions = LLM("List the 3 most important function names. 
Return comma-separated, nothing else.", broad); +const names = []; +const parts = keyFunctions.split(","); +for (const p of parts) { + const trimmed = p.trim(); + if (trimmed.length > 0) names.push(trimmed); +} +const details = map(names, (fn) => { + const code = search(fn); + return { name: fn, analysis: LLM("Explain what " + fn + " does in 2 sentences.", code) }; +}); +return details; +\`\`\` + +**Pattern 7: Multi-topic analysis with chunking** +Search multiple topics, chunk each result, process in parallel: +\`\`\` +const topics = ["authentication", "authorization", "session"]; +const allFindings = []; +for (const topic of topics) { + const results = search(topic); + const chunks = chunk(results); + const findings = map(chunks, (c) => LLM("Extract key patterns for " + topic + ". Be brief.", c)); + for (const f of findings) { allFindings.push(String(f)); } + log("Processed " + topic + ": " + chunks.length + " chunks"); +} +var combined = ""; +for (const f of allFindings) { combined = combined + f + "\\n---\\n"; } +return LLM("Synthesize all findings into a security report.", combined); +\`\`\` + +**Pattern 8: Batched file processing** +Process many files in parallel batches: +\`\`\` +const files = listFiles("*.js"); +log("Found " + files.length + " files"); +const batches = batch(files, 5); +const allResults = []; +for (const b of batches) { + const batchResults = map(b, (file) => { + const content = extract(file); + return LLM("Summarize this file in one sentence.", content); + }); + for (const r of batchResults) { allResults.push(r); } + log("Processed batch, total: " + allResults.length); +} +return allResults; +\`\`\` + +**Pattern 9: Data pipeline with session store** +Extract structured data, accumulate, compute statistics with pure JS, format as table: +\`\`\` +// Phase 1: Extract structured data from search results +const results = search("API endpoints"); +const chunks = chunk(results); +const extracted = map(chunks, (c) => LLM( + "Extract API endpoints 
as JSON array: [{method, path, description}]. Return ONLY valid JSON.", + c +)); +for (const e of extracted) { + const parsed = parseJSON(String(e)); + if (parsed) { + for (const item of parsed) { storeAppend("endpoints", item); } + } else { log("Parse error, skipping chunk"); } +} + +// Phase 2: Compute statistics with pure JS (no LLM needed) +const all = storeGet("endpoints"); +log("Total endpoints: " + all.length); +const byMethod = groupBy(all, "method"); +var table = "| Method | Count | % |\\n|--------|-------|---|\\n"; +const methods = Object.keys(byMethod); +for (const m of methods) { + const count = byMethod[m].length; + const pct = Math.round(count / all.length * 100); + table = table + "| " + m + " | " + count + " | " + pct + "% |\\n"; +} + +// Phase 3: Small LLM summary of the statistics +const summary = LLM("Write a 2-sentence executive summary of this API surface analysis.", table); +return table + "\\n" + summary; +\`\`\` + +**Pattern 10: Direct output for large structured data** +Use \`output()\` to deliver tables/JSON directly to the user without LLM rewriting. The \`return\` value is what you (the AI) see as the tool result: +\`\`\` +const files = listFiles("**/*.md"); +const batches = batch(files, 5); +const results = []; +for (const b of batches) { + const batchResults = map(b, (f) => { + try { + const content = extract(f); + return LLM("Extract: name, category. Return JSON: {name, category}. 
ONLY JSON.", content);
+    } catch (e) { return null; }
+  });
+  for (const r of batchResults) {
+    const parsed = r ? parseJSON(r) : null;
+    if (parsed) results.push(parsed); // parseJSON returns null on bad input; never push null
+  }
+}
+var table = "| Name | Category |\\n|------|----------|\\n";
+for (const r of results) {
+  table = table + "| " + (r.name || "?") + " | " + (r.category || "?") + " |\\n";
+}
+output(table);
+return "Generated table with " + results.length + " items.";
+\`\`\``;
+}
diff --git a/npm/src/tools/index.js b/npm/src/tools/index.js
index 3c681632..1143d0b5 100644
--- a/npm/src/tools/index.js
+++ b/npm/src/tools/index.js
@@ -11,6 +11,9 @@ export { editTool, createTool } from './edit.js';
 // Export LangChain tools
 export { createSearchTool, createQueryTool, createExtractTool } from './langchain.js';
 
+// Export execute_plan tool
+export { createExecutePlanTool, getExecutePlanToolDefinition } from './executePlan.js';
+
 // Export common schemas and utilities
 export {
   searchSchema,
@@ -18,6 +21,7 @@ export {
   extractSchema,
   delegateSchema,
   bashSchema,
+  executePlanSchema,
   delegateDescription,
   delegateToolDefinition,
   bashDescription,
diff --git a/npm/tests/unit/dsl-runtime.test.js b/npm/tests/unit/dsl-runtime.test.js
new file mode 100644
index 00000000..1b3b0d6a
--- /dev/null
+++ b/npm/tests/unit/dsl-runtime.test.js
@@ -0,0 +1,880 @@
+import { createDSLRuntime } from '../../src/agent/dsl/runtime.js';
+
+// Mock tool implementations
+function createMockTools() {
+  return {
+    search: {
+      execute: async (params) => `search results for: ${params.query}`,
+    },
+    extract: {
+      execute: async (params) => `extracted: ${params.targets}`,
+    },
+    listFiles: {
+      execute: async (params) => ['file1.js', 'file2.js', 'file3.js'],
+    },
+  };
+}
+
+function createMockLLM() {
+  return async (instruction, data, options = {}) => {
+    return `LLM processed: ${instruction} with ${typeof data === 'string' ?
data.substring(0, 50) : JSON.stringify(data).substring(0, 50)}`; + }; +} + +describe('DSL Runtime', () => { + let runtime; + + beforeEach(() => { + runtime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + mapConcurrency: 2, + }); + }); + + describe('basic execution', () => { + test('executes simple return', async () => { + const result = await runtime.execute('return 42;'); + expect(result.status).toBe('success'); + expect(result.result).toBe(42); + }); + + test('executes variable declarations', async () => { + const result = await runtime.execute('const x = 1; const y = 2; return x + y;'); + expect(result.status).toBe('success'); + expect(result.result).toBe(3); + }); + + test('executes string operations', async () => { + const result = await runtime.execute('const s = "hello"; return s.toUpperCase();'); + expect(result.status).toBe('success'); + expect(result.result).toBe('HELLO'); + }); + + test('executes array operations', async () => { + const result = await runtime.execute(` + const arr = [3, 1, 2]; + return arr.sort(); + `); + expect(result.status).toBe('success'); + expect(result.result).toEqual([1, 2, 3]); + }); + }); + + describe('tool calls', () => { + test('calls search tool', async () => { + const result = await runtime.execute('const r = search("test query"); return r;'); + expect(result.status).toBe('success'); + expect(result.result).toContain('search results for: test query'); + }); + + test('calls listFiles tool', async () => { + const result = await runtime.execute('const files = listFiles("*.js"); return files;'); + expect(result.status).toBe('success'); + expect(result.result).toEqual(['file1.js', 'file2.js', 'file3.js']); + }); + + test('calls LLM', async () => { + const result = await runtime.execute('const r = LLM("summarize", "some data"); return r;'); + expect(result.status).toBe('success'); + expect(result.result).toContain('LLM processed: summarize'); + }); + + test('chains tool calls', async 
() => { + const result = await runtime.execute(` + const searchResult = search("functions"); + const summary = LLM("summarize these results", searchResult); + return summary; + `); + expect(result.status).toBe('success'); + expect(result.result).toContain('LLM processed: summarize'); + }); + }); + + describe('map() with concurrency', () => { + test('processes items with map()', async () => { + const result = await runtime.execute(` + const items = [1, 2, 3]; + const results = map(items, (item) => LLM("process", item)); + return results; + `); + expect(result.status).toBe('success'); + expect(result.result).toHaveLength(3); + expect(result.result[0]).toContain('LLM processed'); + }); + + test('respects concurrency limit', async () => { + let concurrent = 0; + let maxConcurrent = 0; + + const slowRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: async (instruction, data) => { + concurrent++; + maxConcurrent = Math.max(maxConcurrent, concurrent); + await new Promise(r => setTimeout(r, 50)); + concurrent--; + return `done: ${data}`; + }, + mapConcurrency: 2, + }); + + const result = await slowRuntime.execute(` + const items = [1, 2, 3, 4, 5]; + const results = map(items, (item) => LLM("process", item)); + return results; + `); + expect(result.status).toBe('success'); + expect(result.result).toHaveLength(5); + expect(maxConcurrent).toBeLessThanOrEqual(2); + }, 10000); + }); + + describe('utility functions', () => { + test('chunk() splits data', async () => { + const result = await runtime.execute(` + const data = "a".repeat(100000); + const chunks = chunk(data, 5000); + return chunks.length; + `); + expect(result.status).toBe('success'); + expect(result.result).toBeGreaterThan(1); + }); + + test('range() generates array', async () => { + const result = await runtime.execute('return range(0, 5);'); + expect(result.status).toBe('success'); + expect(result.result).toEqual([0, 1, 2, 3, 4]); + }); + + test('flatten() flattens arrays', async () 
=> { + const result = await runtime.execute('return flatten([[1,2],[3,4]]);'); + expect(result.status).toBe('success'); + expect(result.result).toEqual([1, 2, 3, 4]); + }); + + test('unique() deduplicates', async () => { + const result = await runtime.execute('return unique([1,2,2,3,3,3]);'); + expect(result.status).toBe('success'); + expect(result.result).toEqual([1, 2, 3]); + }); + + test('batch() splits array into sub-arrays', async () => { + const result = await runtime.execute('return batch([1,2,3,4,5,6,7], 3);'); + expect(result.status).toBe('success'); + expect(result.result).toEqual([[1,2,3],[4,5,6],[7]]); + }); + + test('groupBy() groups array', async () => { + const result = await runtime.execute(` + const items = [{type:"a", v:1}, {type:"b", v:2}, {type:"a", v:3}]; + return groupBy(items, "type"); + `); + expect(result.status).toBe('success'); + expect(result.result.a).toHaveLength(2); + expect(result.result.b).toHaveLength(1); + }); + + test('parseJSON() strips markdown fences from LLM output', async () => { + const result = await runtime.execute(` + var raw = ' \\\`\\\`\\\`json\\n[{"name":"a"},{"name":"b"}]\\n\\\`\\\`\\\` '; + return parseJSON(raw); + `); + expect(result.status).toBe('success'); + expect(result.result).toEqual([{ name: 'a' }, { name: 'b' }]); + }); + + test('parseJSON() handles clean JSON without fences', async () => { + const result = await runtime.execute(` + return parseJSON('{"key": "value"}'); + `); + expect(result.status).toBe('success'); + expect(result.result).toEqual({ key: 'value' }); + }); + + test('parseJSON() extracts JSON from surrounding text', async () => { + const result = await runtime.execute(` + return parseJSON('Here is the result: [{"a":1}] end'); + `); + expect(result.status).toBe('success'); + expect(result.result).toEqual([{ a: 1 }]); + }); + + test('log() collects messages', async () => { + const result = await runtime.execute(` + log("step 1"); + log("step 2"); + return "done"; + `); + 
expect(result.status).toBe('success'); + expect(result.logs).toContain('step 1'); + expect(result.logs).toContain('step 2'); + }); + }); + + describe('validation errors', () => { + test('rejects async keyword', async () => { + const result = await runtime.execute('const fn = async () => 1;'); + expect(result.status).toBe('error'); + expect(result.error).toContain('Validation failed'); + }); + + test('rejects eval', async () => { + const result = await runtime.execute('eval("1+1");'); + expect(result.status).toBe('error'); + expect(result.error).toContain('Validation failed'); + }); + + test('rejects require', async () => { + const result = await runtime.execute('const fs = require("fs");'); + expect(result.status).toBe('error'); + expect(result.error).toContain('Validation failed'); + }); + + test('rejects class', async () => { + const result = await runtime.execute('class Foo {}'); + expect(result.status).toBe('error'); + expect(result.error).toContain('Validation failed'); + }); + }); + + describe('timeout and loop guards', () => { + test('times out long-running execution', async () => { + const slowRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: async (instruction, data) => { + await new Promise(r => setTimeout(r, 5000)); + return 'done'; + }, + timeoutMs: 500, + }); + + const result = await slowRuntime.execute(` + const r = LLM("slow", "data"); + return r; + `); + expect(result.status).toBe('error'); + expect(result.error).toContain('timed out'); + }, 10000); + + test('stops infinite while loops', async () => { + const guardedRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + maxLoopIterations: 10, + }); + + const result = await guardedRuntime.execute(` + let i = 0; + while (true) { + i = i + 1; + } + return i; + `); + expect(result.status).toBe('error'); + expect(result.error).toContain('Loop exceeded maximum'); + }); + + test('stops runaway for loops', async () => { + const 
guardedRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + maxLoopIterations: 5, + }); + + const result = await guardedRuntime.execute(` + let sum = 0; + for (let i = 0; i < 100; i = i + 1) { + sum = sum + i; + } + return sum; + `); + expect(result.status).toBe('error'); + expect(result.error).toContain('Loop exceeded maximum'); + }); + + test('allows loops within limit', async () => { + const guardedRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + maxLoopIterations: 100, + }); + + const result = await guardedRuntime.execute(` + let sum = 0; + for (let i = 0; i < 10; i = i + 1) { + sum = sum + i; + } + return sum; + `); + expect(result.status).toBe('success'); + expect(result.result).toBe(45); + }); + + test('counts iterations across multiple loops', async () => { + const guardedRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + maxLoopIterations: 15, + }); + + const result = await guardedRuntime.execute(` + let a = 0; + for (let i = 0; i < 10; i = i + 1) { + a = a + 1; + } + let b = 0; + for (let j = 0; j < 10; j = j + 1) { + b = b + 1; + } + return a + b; + `); + expect(result.status).toBe('error'); + expect(result.error).toContain('Loop exceeded maximum'); + }); + }); + + // NOTE: SandboxJS has a known issue with async error propagation. + // Errors thrown inside async globals escape the promise chain as unhandled rejections + // instead of being caught by our try/catch around exec().run(). + // This will be addressed in a future iteration — options include: + // 1. Wrapping tool globals to return error objects instead of throwing + // 2. Using process.on('unhandledRejection') to capture escaping errors + // 3. Using SandboxJS sync mode (compile instead of compileAsync) with a different approach + // For now, tool implementations should not throw — they should return error values. 
+ + describe('OTEL tracing', () => { + function createMockTracer() { + const spans = []; + const events = []; + return { + spans, + events, + createToolSpan: (name, attrs = {}) => { + const span = { + name, + attributes: { ...attrs }, + events: [], + status: null, + ended: false, + setAttributes: (a) => Object.assign(span.attributes, a), + setStatus: (s) => { span.status = s; }, + addEvent: (n, a = {}) => { span.events.push({ name: n, ...a }); }, + end: () => { span.ended = true; }, + }; + spans.push(span); + return span; + }, + addEvent: (name, attrs = {}) => { + events.push({ name, ...attrs }); + }, + recordToolResult: (toolName, result, success, durationMs, metadata = {}) => { + events.push({ name: 'tool.result', toolName, success, durationMs, ...metadata }); + }, + }; + } + + test('traces individual tool calls (search, LLM)', async () => { + const mockTracer = createMockTracer(); + const tracedRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + tracer: mockTracer, + }); + + const result = await tracedRuntime.execute(` + const r = search("test"); + const summary = LLM("summarize", r); + return summary; + `); + + expect(result.status).toBe('success'); + + // Should have spans for dsl.search and dsl.LLM + const toolSpanNames = mockTracer.spans.map(s => s.name); + expect(toolSpanNames).toContain('dsl.search'); + expect(toolSpanNames).toContain('dsl.LLM'); + + // All spans should be ended + for (const span of mockTracer.spans) { + expect(span.ended).toBe(true); + expect(span.status).toBe('OK'); + } + + // Should have tool.result events + const resultEvents = mockTracer.events.filter(e => e.name === 'tool.result'); + expect(resultEvents.length).toBeGreaterThanOrEqual(2); + }); + + test('traces map() calls', async () => { + const mockTracer = createMockTracer(); + const tracedRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + tracer: mockTracer, + }); + + const result = 
await tracedRuntime.execute(` + const items = [1, 2, 3]; + const results = map(items, (item) => LLM("process", item)); + return results; + `); + + expect(result.status).toBe('success'); + + // Should have a dsl.map span + 3 dsl.LLM spans + const mapSpans = mockTracer.spans.filter(s => s.name === 'dsl.map'); + const llmSpans = mockTracer.spans.filter(s => s.name === 'dsl.LLM'); + expect(mapSpans.length).toBe(1); + expect(llmSpans.length).toBe(3); + }); + + test('records runtime phase events', async () => { + const mockTracer = createMockTracer(); + const tracedRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + tracer: mockTracer, + }); + + await tracedRuntime.execute('return 42;'); + + // Should have validate, transform, execute phase events + const eventNames = mockTracer.events.map(e => e.name); + expect(eventNames).toContain('dsl.phase.validate_start'); + expect(eventNames).toContain('dsl.phase.validate_complete'); + expect(eventNames).toContain('dsl.phase.transform_start'); + expect(eventNames).toContain('dsl.phase.transform_complete'); + expect(eventNames).toContain('dsl.phase.execute_start'); + expect(eventNames).toContain('dsl.phase.execute_complete'); + }); + + test('records failure events on error', async () => { + const mockTracer = createMockTracer(); + const tracedRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + tracer: mockTracer, + }); + + await tracedRuntime.execute('eval("bad")'); + + const eventNames = mockTracer.events.map(e => e.name); + expect(eventNames).toContain('dsl.phase.validate_failed'); + }); + }); + + describe('session store', () => { + test('storeSet and storeGet within a single execution', async () => { + const result = await runtime.execute(` + storeSet("key1", "value1"); + return storeGet("key1"); + `); + expect(result.status).toBe('success'); + expect(result.result).toBe('value1'); + }); + + test('store persists across multiple 
execute() calls', async () => { + const r1 = await runtime.execute('storeSet("counter", 42); return "stored";'); + expect(r1.status).toBe('success'); + + const r2 = await runtime.execute('return storeGet("counter");'); + expect(r2.status).toBe('success'); + expect(r2.result).toBe(42); + }); + + test('storeAppend creates array and accumulates items', async () => { + const storeRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + }); + + await storeRuntime.execute('storeAppend("items", "a");'); + await storeRuntime.execute('storeAppend("items", "b");'); + await storeRuntime.execute('storeAppend("items", "c");'); + + const result = await storeRuntime.execute('return storeGet("items");'); + expect(result.status).toBe('success'); + expect(result.result).toEqual(['a', 'b', 'c']); + }); + + test('storeKeys returns all stored keys', async () => { + const storeRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + }); + + await storeRuntime.execute('storeSet("x", 1); storeSet("y", 2);'); + const result = await storeRuntime.execute('return storeKeys();'); + expect(result.status).toBe('success'); + expect(result.result.sort()).toEqual(['x', 'y']); + }); + + test('storeGetAll returns copy of entire store', async () => { + const storeRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + }); + + await storeRuntime.execute('storeSet("a", 1); storeSet("b", 2);'); + const result = await storeRuntime.execute('return storeGetAll();'); + expect(result.status).toBe('success'); + expect(result.result).toEqual({ a: 1, b: 2 }); + }); + + test('storeGet returns undefined for missing keys', async () => { + const result = await runtime.execute('return storeGet("nonexistent");'); + expect(result.status).toBe('success'); + expect(result.result).toBeUndefined(); + }); + + test('storeSet overwrites existing values', async () => { + const storeRuntime = 
createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + }); + + await storeRuntime.execute('storeSet("key", "old");'); + await storeRuntime.execute('storeSet("key", "new");'); + const result = await storeRuntime.execute('return storeGet("key");'); + expect(result.status).toBe('success'); + expect(result.result).toBe('new'); + }); + + test('storeAppend works with objects', async () => { + const storeRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + }); + + await storeRuntime.execute('storeAppend("items", {name: "a", value: 1});'); + await storeRuntime.execute('storeAppend("items", {name: "b", value: 2});'); + const result = await storeRuntime.execute('return storeGet("items");'); + expect(result.status).toBe('success'); + expect(result.result).toEqual([ + { name: 'a', value: 1 }, + { name: 'b', value: 2 }, + ]); + }); + + test('different runtime instances have separate stores', async () => { + const runtime1 = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + }); + const runtime2 = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + }); + + await runtime1.execute('storeSet("session", "runtime1");'); + await runtime2.execute('storeSet("session", "runtime2");'); + + const r1 = await runtime1.execute('return storeGet("session");'); + const r2 = await runtime2.execute('return storeGet("session");'); + + expect(r1.result).toBe('runtime1'); + expect(r2.result).toBe('runtime2'); + }); + + test('data pipeline pattern with store', async () => { + const pipelineRuntime = createDSLRuntime({ + toolImplementations: { + search: { + execute: async () => 'GET /users\nPOST /users\nDELETE /users/:id', + }, + }, + llmCall: async (instruction, data) => { + if (instruction.includes('Extract')) { + return JSON.stringify([ + { method: 'GET', path: '/users' }, + { method: 'POST', path: '/users' }, + { method: 'DELETE', path: 
'/users/:id' }, + ]); + } + return 'Summary: 3 endpoints found'; + }, + }); + + const result = await pipelineRuntime.execute(` + const results = search("endpoints"); + const chunks = chunk(results); + for (const c of chunks) { + var parsed = JSON.parse(String(LLM("Extract endpoints as JSON", c))); + for (const item of parsed) { storeAppend("endpoints", item); } + } + var all = storeGet("endpoints"); + log("Total: " + all.length); + var byMethod = groupBy(all, "method"); + return { total: all.length, methods: Object.keys(byMethod) }; + `); + expect(result.status).toBe('success'); + expect(result.result.total).toBe(3); + expect(result.result.methods.sort()).toEqual(['DELETE', 'GET', 'POST']); + }); + }); + + describe('error-safe tool returns', () => { + test('tool error returns ERROR: string instead of throwing', async () => { + const errorRuntime = createDSLRuntime({ + toolImplementations: { + search: { + execute: async () => { throw new Error('search failed'); }, + }, + }, + llmCall: createMockLLM(), + }); + + const result = await errorRuntime.execute(` + const r = search("test"); + return r; + `); + expect(result.status).toBe('success'); + expect(result.result).toBe('ERROR: search failed'); + }); + + test('parseJSON returns null on invalid input', async () => { + const result = await runtime.execute(` + const r = parseJSON("not valid json"); + return r; + `); + expect(result.status).toBe('success'); + expect(result.result).toBeNull(); + }); + + test('error-resilient loop via string check', async () => { + const errorRuntime = createDSLRuntime({ + toolImplementations: { + extract: { + execute: async (params) => { + if (params.targets === 'bad.js') throw new Error('file not found'); + return 'content of ' + params.targets; + }, + }, + }, + llmCall: createMockLLM(), + }); + + const result = await errorRuntime.execute(` + const files = ["good.js", "bad.js", "other.js"]; + const results = []; + for (const f of files) { + const content = extract(f); + if (typeof content 
=== "string" && content.indexOf("ERROR:") === 0) { + results.push("err: " + f); + } else { + results.push("ok: " + f); + } + } + return results; + `); + expect(result.status).toBe('success'); + expect(result.result).toEqual([ + 'ok: good.js', + 'err: bad.js', + 'ok: other.js', + ]); + }); + + test('LLM error returns ERROR: string', async () => { + const errorRuntime = createDSLRuntime({ + toolImplementations: {}, + llmCall: async () => { throw new Error('API rate limited'); }, + }); + + const result = await errorRuntime.execute(` + const r = LLM("test", "data"); + return r; + `); + expect(result.status).toBe('success'); + expect(result.result).toBe('ERROR: API rate limited'); + }); + + test('errors are logged', async () => { + const errorRuntime = createDSLRuntime({ + toolImplementations: { + search: { + execute: async () => { throw new Error('timeout'); }, + }, + }, + llmCall: createMockLLM(), + }); + + const result = await errorRuntime.execute(` + search("test"); + return "done"; + `); + expect(result.status).toBe('success'); + expect(result.logs.some(l => l.includes('[search]') && l.includes('ERROR: timeout'))).toBe(true); + }); + }); + + describe('end-to-end scenario', () => { + test('analyze_all replacement pattern', async () => { + const analyzeRuntime = createDSLRuntime({ + toolImplementations: { + search: { + execute: async (params) => { + return `File1:\nfunction getUser() {}\n\nFile2:\nfunction deleteUser() {}\n\nFile3:\nfunction updateUser() {}`; + }, + }, + }, + llmCall: async (instruction, data) => { + if (instruction.includes('Extract')) { + return ['getUser', 'deleteUser', 'updateUser']; + } + if (instruction.includes('Organize')) { + return { crud: { read: ['getUser'], delete: ['deleteUser'], update: ['updateUser'] } }; + } + return data; + }, + mapConcurrency: 3, + }); + + const result = await analyzeRuntime.execute(` + const results = search("user functions", "./src"); + const chunks = chunk(results); + const extracted = map(chunks, (c) => 
LLM("Extract function names", c)); + return LLM("Organize by CRUD operation", flatten(extracted)); + `); + + expect(result.status).toBe('success'); + expect(result.result).toHaveProperty('crud'); + expect(result.result.crud.read).toContain('getUser'); + }, 10000); + }); + + describe('output buffer', () => { + test('output() writes to buffer', async () => { + const outputBuffer = { items: [] }; + const bufRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + outputBuffer, + }); + + const result = await bufRuntime.execute(` + output("hello world"); + return "done"; + `); + expect(result.status).toBe('success'); + expect(result.result).toBe('done'); + expect(outputBuffer.items).toEqual(['hello world']); + }); + + test('output() can be called multiple times', async () => { + const outputBuffer = { items: [] }; + const bufRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + outputBuffer, + }); + + const result = await bufRuntime.execute(` + output("line 1"); + output("line 2"); + output("line 3"); + return "done"; + `); + expect(result.status).toBe('success'); + expect(outputBuffer.items).toEqual(['line 1', 'line 2', 'line 3']); + }); + + test('output() stringifies non-string content', async () => { + const outputBuffer = { items: [] }; + const bufRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + outputBuffer, + }); + + const result = await bufRuntime.execute(` + output({name: "test", value: 42}); + return "done"; + `); + expect(result.status).toBe('success'); + expect(outputBuffer.items).toHaveLength(1); + const parsed = JSON.parse(outputBuffer.items[0]); + expect(parsed).toEqual({ name: 'test', value: 42 }); + }); + + test('output() ignores null and undefined', async () => { + const outputBuffer = { items: [] }; + const bufRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + 
outputBuffer, + }); + + const result = await bufRuntime.execute(` + output(null); + output(undefined); + output("valid"); + return "done"; + `); + expect(result.status).toBe('success'); + expect(outputBuffer.items).toEqual(['valid']); + }); + + test('output buffer persists across multiple execute() calls', async () => { + const outputBuffer = { items: [] }; + const bufRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + outputBuffer, + }); + + await bufRuntime.execute('output("from call 1");'); + await bufRuntime.execute('output("from call 2");'); + + expect(outputBuffer.items).toEqual(['from call 1', 'from call 2']); + }); + + test('output() and return are independent', async () => { + const outputBuffer = { items: [] }; + const bufRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + outputBuffer, + }); + + const result = await bufRuntime.execute(` + output("| Col1 | Col2 |\\n| a | b |"); + return "Table with 1 row generated"; + `); + expect(result.status).toBe('success'); + expect(result.result).toBe('Table with 1 row generated'); + expect(outputBuffer.items[0]).toContain('| Col1 | Col2 |'); + }); + + test('output() not available when no outputBuffer provided', async () => { + const result = await runtime.execute(` + if (typeof output === "undefined") { + return "output not available"; + } + output("test"); + return "output available"; + `); + expect(result.status).toBe('success'); + expect(result.result).toBe('output not available'); + }); + + test('output() logs buffer write notification', async () => { + const outputBuffer = { items: [] }; + const bufRuntime = createDSLRuntime({ + toolImplementations: createMockTools(), + llmCall: createMockLLM(), + outputBuffer, + }); + + const result = await bufRuntime.execute(` + output("some content"); + return "done"; + `); + expect(result.status).toBe('success'); + const outputLog = result.logs.find(l => 
l.startsWith('[output]')); + expect(outputLog).toBeDefined(); + expect(outputLog).toContain('chars written to output buffer'); + }); + }); +}); diff --git a/npm/tests/unit/dsl-transformer.test.js b/npm/tests/unit/dsl-transformer.test.js new file mode 100644 index 00000000..300e53aa --- /dev/null +++ b/npm/tests/unit/dsl-transformer.test.js @@ -0,0 +1,144 @@ +import { transformDSL } from '../../src/agent/dsl/transformer.js'; + +const ASYNC_FUNCS = new Set(['search', 'query', 'extract', 'LLM', 'map', 'listFiles', 'searchFiles', 'bash', 'mcp_github_create_issue']); + +describe('DSL Transformer', () => { + describe('await injection', () => { + test('injects await before async function calls', () => { + const code = 'const r = search("query");'; + const result = transformDSL(code, ASYNC_FUNCS); + expect(result).toContain('await search("query")'); + }); + + test('injects await before multiple calls', () => { + const code = ` + const a = search("foo"); + const b = LLM("summarize", a); + return b; + `; + const result = transformDSL(code, ASYNC_FUNCS); + expect(result).toContain('await search("foo")'); + expect(result).toContain('await LLM("summarize", a)'); + }); + + test('does not inject await before non-async functions', () => { + const code = 'const x = chunk(data, 20000); return x;'; + const result = transformDSL(code, ASYNC_FUNCS); + expect(result).not.toContain('await chunk'); + expect(result).toContain('chunk(data, 20000)'); + }); + + test('injects await before MCP tool calls', () => { + const code = 'const issue = mcp_github_create_issue({ title: "test" });'; + const result = transformDSL(code, ASYNC_FUNCS); + expect(result).toContain('await mcp_github_create_issue'); + }); + + test('handles map() with callback containing async calls', () => { + const code = ` + const results = map(items, (item) => LLM("process", item)); + `; + const result = transformDSL(code, ASYNC_FUNCS); + // map itself should be awaited + expect(result).toContain('await map'); + // The 
callback should be marked async + expect(result).toContain('async (item)'); + // LLM inside callback should be awaited + expect(result).toContain('await LLM("process", item)'); + }); + }); + + describe('async IIFE wrapping', () => { + test('wraps code in async IIFE with return', () => { + const code = 'return 42;'; + const result = transformDSL(code, ASYNC_FUNCS); + expect(result).toMatch(/^return \(async \(\) => \{/); + expect(result).toMatch(/\}\)\(\)$/); + }); + }); + + describe('complex programs', () => { + test('typical analyze_all replacement', () => { + const code = ` +const results = search("API endpoints", "./src"); +const chunks = chunk(results, 20000); +const extracted = map(chunks, (c) => LLM("Extract endpoints", c)); +return LLM("Organize", extracted); + `.trim(); + const result = transformDSL(code, ASYNC_FUNCS); + + // search should be awaited + expect(result).toContain('await search("API endpoints"'); + // chunk should NOT be awaited (it's sync) + expect(result).not.toContain('await chunk('); + // map should be awaited + expect(result).toContain('await map('); + // LLM in callback should be awaited + expect(result).toContain('await LLM("Extract endpoints"'); + // Final LLM should be awaited + expect(result).toContain('await LLM("Organize"'); + }); + + test('preserves code structure', () => { + const code = ` +const items = [1, 2, 3]; +if (items.length > 0) { + const result = search("test"); + return result; +} +return null; + `.trim(); + const result = transformDSL(code, ASYNC_FUNCS); + // Structure should be preserved + expect(result).toContain('if (items.length > 0)'); + expect(result).toContain('await search("test")'); + expect(result).toContain('return null'); + }); + + test('handles nested function calls', () => { + const code = 'return LLM("process", search("query"));'; + const result = transformDSL(code, ASYNC_FUNCS); + expect(result).toContain('await LLM'); + expect(result).toContain('await search'); + }); + }); + + describe('try/catch 
passthrough', () => { + test('preserves try/catch structure unchanged', () => { + const code = ` + try { + search("test"); + } catch (e) { + log("error: " + e); + } + `; + const result = transformDSL(code, ASYNC_FUNCS); + // try/catch passes through — tools return error strings, no catch param rewriting needed + expect(result).toContain('catch (e)'); + expect(result).not.toContain('__getLastError'); + expect(result).not.toContain('__setLastError'); + }); + }); + + describe('edge cases', () => { + test('handles empty code', () => { + expect(() => transformDSL('', ASYNC_FUNCS)).not.toThrow(); + }); + + test('handles code with no async calls', () => { + const code = 'const x = 1 + 2; return x;'; + const result = transformDSL(code, ASYNC_FUNCS); + // Should still wrap in async IIFE + expect(result).toContain('async'); + // But no await should be inserted + expect(result).not.toContain('await'); + }); + + test('handles code with only utility calls', () => { + const code = 'const r = range(0, 10); return flatten([r, r]);'; + const result = transformDSL(code, ASYNC_FUNCS); + expect(result).not.toContain('await range'); + expect(result).not.toContain('await flatten'); + }); + }); +}); diff --git a/npm/tests/unit/dsl-validator.test.js b/npm/tests/unit/dsl-validator.test.js new file mode 100644 index 00000000..49ed7d5b --- /dev/null +++ b/npm/tests/unit/dsl-validator.test.js @@ -0,0 +1,237 @@ +import { validateDSL } from '../../src/agent/dsl/validator.js'; + +describe('DSL Validator', () => { + describe('valid programs', () => { + test('simple variable assignment and return', () => { + const result = validateDSL('const x = 42; return x;'); + expect(result.valid).toBe(true); + expect(result.errors).toHaveLength(0); + }); + + test('function calls', () => { + const result = validateDSL('const r = search("query"); return r;'); + expect(result.valid).toBe(true); + }); + + test('arrow function callback', () => { + const result = validateDSL('const fn = (x) => x * 2; return 
fn(21);'); + expect(result.valid).toBe(true); + }); + + test('for...of loop', () => { + const result = validateDSL(` + const items = [1, 2, 3]; + const results = []; + for (const item of items) { + results.push(item * 2); + } + return results; + `); + expect(result.valid).toBe(true); + }); + + test('for loop with index', () => { + const result = validateDSL(` + const items = [1, 2, 3]; + for (let i = 0; i < items.length; i++) { + items[i] = items[i] * 2; + } + return items; + `); + expect(result.valid).toBe(true); + }); + + test('if/else', () => { + const result = validateDSL(` + const x = 10; + if (x > 5) { + return "big"; + } else { + return "small"; + } + `); + expect(result.valid).toBe(true); + }); + + test('template literals', () => { + const result = validateDSL('const name = "world"; return `hello ${name}`;'); + expect(result.valid).toBe(true); + }); + + test('object and array literals', () => { + const result = validateDSL(` + const obj = { a: 1, b: [2, 3] }; + return obj; + `); + expect(result.valid).toBe(true); + }); + + test('member expression access', () => { + const result = validateDSL('const arr = [1,2,3]; return arr.length;'); + expect(result.valid).toBe(true); + }); + + test('spread element', () => { + const result = validateDSL('const a = [1,2]; const b = [...a, 3]; return b;'); + expect(result.valid).toBe(true); + }); + + test('ternary expression', () => { + const result = validateDSL('const x = 10; return x > 5 ? 
"big" : "small";'); + expect(result.valid).toBe(true); + }); + + test('for...in loop', () => { + const result = validateDSL(` + const obj = { a: 1, b: 2, c: 3 }; + const keys = []; + for (const key in obj) { + keys.push(key); + } + return keys; + `); + expect(result.valid).toBe(true); + }); + + test('typical DSL program', () => { + const result = validateDSL(` + const results = search("API endpoints", "./src"); + const chunks = chunk(results, 20000); + const extracted = map(chunks, (c) => LLM("Extract endpoints", c)); + return LLM("Organize by resource", extracted); + `); + expect(result.valid).toBe(true); + }); + }); + + describe('blocked constructs', () => { + test('rejects async function', () => { + const result = validateDSL('const fn = async (x) => x;'); + expect(result.valid).toBe(false); + expect(result.errors[0]).toContain('Async functions are not allowed'); + }); + + test('rejects await expression', () => { + const result = validateDSL('const x = await fetch("url");'); + // This should fail at parse time since await outside async is invalid in script mode + expect(result.valid).toBe(false); + }); + + test('rejects class declaration', () => { + const result = validateDSL('class Foo {}'); + expect(result.valid).toBe(false); + expect(result.errors.some(e => e.includes('ClassDeclaration') || e.includes('ClassBody'))).toBe(true); + }); + + test('rejects new expression', () => { + const result = validateDSL('const d = new Date();'); + expect(result.valid).toBe(false); + expect(result.errors[0]).toContain('Blocked node type: NewExpression'); + }); + + test('rejects this', () => { + const result = validateDSL('const x = this;'); + expect(result.valid).toBe(false); + expect(result.errors.some(e => e.includes('ThisExpression'))).toBe(true); + }); + + test('rejects eval', () => { + const result = validateDSL('eval("alert(1)");'); + expect(result.valid).toBe(false); + expect(result.errors.some(e => e.includes("'eval'"))).toBe(true); + }); + + test('rejects require', 
() => { + const result = validateDSL('const fs = require("fs");'); + expect(result.valid).toBe(false); + expect(result.errors.some(e => e.includes("'require'"))).toBe(true); + }); + + test('rejects process access', () => { + const result = validateDSL('const env = process.env;'); + expect(result.valid).toBe(false); + expect(result.errors.some(e => e.includes("'process'"))).toBe(true); + }); + + test('rejects __proto__ access', () => { + const result = validateDSL('const x = obj.__proto__;'); + expect(result.valid).toBe(false); + expect(result.errors.some(e => e.includes('__proto__'))).toBe(true); + }); + + test('rejects constructor access', () => { + const result = validateDSL('const x = "".constructor;'); + expect(result.valid).toBe(false); + expect(result.errors.some(e => e.includes('constructor'))).toBe(true); + }); + + test('rejects prototype access', () => { + const result = validateDSL('const x = Array.prototype;'); + expect(result.valid).toBe(false); + expect(result.errors.some(e => e.includes('prototype'))).toBe(true); + }); + + test('rejects computed __proto__ access', () => { + const result = validateDSL('const x = obj["__proto__"];'); + expect(result.valid).toBe(false); + expect(result.errors.some(e => e.includes("__proto__"))).toBe(true); + }); + + test('rejects import expression', () => { + // Dynamic import + const result = validateDSL('const m = import("fs");'); + expect(result.valid).toBe(false); + }); + + test('rejects with statement', () => { + // 'with' is a SyntaxError in strict mode but parses in sloppy script mode, + // so the validator must reject the WithStatement node itself + const result = validateDSL('with (obj) { return x; }'); + // May fail at parse time or at validation time depending on parser mode + expect(result.valid).toBe(false); + }); + + test('rejects globalThis', () => { + const result = validateDSL('const x = globalThis;'); + expect(result.valid).toBe(false); + expect(result.errors.some(e => e.includes('globalThis'))).toBe(true); + }); + + test('rejects
Function constructor', () => { + const result = validateDSL('const fn = Function("return 1");'); + expect(result.valid).toBe(false); + expect(result.errors.some(e => e.includes("'Function'"))).toBe(true); + }); + + test('rejects generator function', () => { + const result = validateDSL('const gen = function* () { yield 1; };'); + expect(result.valid).toBe(false); + }); + + test('rejects regex literals', () => { + const result = validateDSL('const x = /pattern/.test("hello");'); + expect(result.valid).toBe(false); + expect(result.errors[0]).toContain('Regex literals are not supported'); + }); + + test('rejects regex in replace', () => { + const result = validateDSL('const s = "hello".replace(/h/, "H");'); + expect(result.valid).toBe(false); + expect(result.errors[0]).toContain('Regex literals are not supported'); + }); + }); + + describe('syntax errors', () => { + test('reports syntax errors', () => { + const result = validateDSL('const x = ;'); + expect(result.valid).toBe(false); + expect(result.errors[0]).toContain('Syntax error'); + }); + + test('reports unclosed brackets', () => { + const result = validateDSL('const x = [1, 2, 3'); + expect(result.valid).toBe(false); + expect(result.errors[0]).toContain('Syntax error'); + }); + }); +});