@bigboateng bigboateng commented Jul 25, 2025

  • Add custom PlaywrightTool with extract_url method
  • Implement multiple search strategies for URL extraction
  • Create unified tool architecture with base types
  • Update system prompt to guide LLM on tool usage
  • Add comprehensive examples and documentation
  • Organize examples in dedicated folder
  • Update README with fork-specific features section
  • Add example-beam-benefits.ts to gitignore for security

Features:

  • Text-based URL extraction using visible page elements
  • Smart fallback strategies (exact text, partial text, anchors, selectors)
  • Computer Use optimized with natural language prompts
  • Structured output support with Zod schemas
  • Comprehensive error handling and logging

Description

Please provide an explanation of the changes you've made:

Testing

  • All tests pass locally
  • Linted
  • Added tests for new functionality (if applicable)

Docs

  • Link to a PR in our docs repo documenting your change (if applicable)

Visual Proof

Please provide a screenshot or video demonstrating that your changes work locally:

Related Issue

Fixes [Github issue link]

Additional Notes


TL;DR

Added a robust URL extraction tool using Playwright, enhancing the agent's ability to programmatically interact with and extract information from web pages.

Why we made these changes

To enable the agent to reliably extract specific URLs from web pages using natural language prompts, improving its "Computer Use" capabilities and providing structured, validated output for web interactions.

What changed?

  • New Feature: Introduced PlaywrightTool for multi-strategy URL extraction (exact text, partial, selectors) and added an example (tools/playwright.ts, examples/example-url-extraction.ts).
  • Core Integration: Updated loop.ts to integrate PlaywrightTool, refined ToolUseInput, and updated the system prompt for LLM guidance. ToolCollection and ComputerTool were refactored to support the new generic tool types (tools/collection.ts, tools/computer.ts).
  • Tooling Architecture: Established a new foundational type system for tools in tools/types/base.ts, unifying tool definitions and error handling.
  • Release Automation: Added a GitHub Actions workflow (.github/workflows/release.yml) to automate package releases, including version bumping, changelog generation, and GitHub Releases.
  • Documentation: Updated README.md to detail the new URL extraction tool's functionality and best practices.
  • Configuration: Updated .gitignore to exclude sensitive example files, and updated the repository URL in package.json.
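The unified tool architecture described in the list above could look roughly like the following. This is a minimal sketch, not the contents of tools/types/base.ts, which are not shown in this conversation. `ToolCollection` and `ToolResult` are names that appear in the PR, but their shapes here are assumed; `BaseTool`, `ToolError`, the `run` method, and `EchoTool` are hypothetical names used only for illustration.

```typescript
// Hypothetical sketch of a shared tool type system. The real definitions
// in tools/types/base.ts are not shown in this PR conversation.

// Assumed result shape: either output text or an error message.
interface ToolResult {
  output?: string;
  error?: string;
}

// Assumed error type that tools throw for invalid input.
class ToolError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ToolError';
    // Keep `instanceof ToolError` working when compiled to older targets.
    Object.setPrototypeOf(this, ToolError.prototype);
  }
}

// Assumed common interface: every tool has a name and an async call().
interface BaseTool<P = unknown> {
  readonly name: string;
  call(params: P): Promise<ToolResult>;
}

// Registry that dispatches a call by tool name and converts ToolError
// into a ToolResult, so the caller gets a value instead of an exception.
class ToolCollection {
  private readonly tools = new Map<string, BaseTool<any>>();

  constructor(...tools: BaseTool<any>[]) {
    for (const tool of tools) this.tools.set(tool.name, tool);
  }

  async run(name: string, params: unknown): Promise<ToolResult> {
    const tool = this.tools.get(name);
    if (!tool) return { error: `Unknown tool: ${name}` };
    try {
      return await tool.call(params);
    } catch (err) {
      if (err instanceof ToolError) return { error: err.message };
      throw err; // unexpected errors still propagate
    }
  }
}

// Toy tool used only to exercise the collection.
class EchoTool implements BaseTool<{ text?: string }> {
  readonly name = 'echo';

  async call(params: { text?: string }): Promise<ToolResult> {
    if (!params.text) throw new ToolError('text is required');
    return { output: params.text };
  }
}
```

With this shape, `new ToolCollection(new EchoTool()).run('echo', { text: 'hi' })` resolves to a result whose `output` is `'hi'`, and a missing `text` resolves to a result carrying the `ToolError` message rather than rejecting.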

Validation

  • All local tests pass.
  • Code is linted.
  • New functionality includes dedicated tests.
  • Visual proof provided (as per PR body).
  • URL extraction example (example-url-extraction.ts) demonstrates functionality.

- Add manual release workflow with semantic versioning
- Support patch/minor/major version bumps
- Generate automatic changelog from commits
- Create GitHub releases with build artifacts
- Update repository URL in package.json
- Uses built-in GITHUB_TOKEN (no PAT required)
- Add verification step to check build output
- Handle missing dist folder gracefully in archive creation
- Force-add dist/ folder to git (overriding .gitignore)
- Include built files in release commits for GitHub dependencies
@bigboateng bigboateng closed this Jul 25, 2025

@mesa-dot-dev mesa-dot-dev bot left a comment

What Changed

This pull request introduces a powerful new PlaywrightTool for browser automation, centered around an extract_url method. This new tool can find and extract URLs from a webpage using a series of fallback strategies, including exact text matching, partial text matching, and CSS selectors. To support this and future tools, a more generic and extensible tool architecture was created with base types defined in tools/types/base.ts. The core agent logic in loop.ts has been updated to integrate this new tool, and the system prompt has been enhanced to guide the LLM on its usage. The PR also adds a new GitHub Actions workflow to automate package releases, including versioning and changelog generation.
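The fallback behaviour described above (try exact text, then partial text, then other lookups) can be sketched as an ordered list of strategies where the first hit wins. This is an illustration under assumptions, not the code in tools/playwright.ts: `Strategy`, `extractWithFallback`, `strategiesFor`, and the in-memory `fakeLinks` table are hypothetical stand-ins, and a real implementation would query a Playwright page instead of an array.

```typescript
// A strategy tries one way of locating a URL and resolves to null on a miss.
type Strategy = {
  name: string;
  run: () => Promise<string | null>;
};

interface ExtractionResult {
  url: string;
  strategy: string; // which strategy produced the URL
}

// Try each strategy in order and return the first URL found.
// A strategy that throws is treated as a miss so later ones still run.
async function extractWithFallback(
  strategies: Strategy[],
): Promise<ExtractionResult | null> {
  for (const { name, run } of strategies) {
    try {
      const url = await run();
      if (url) return { url, strategy: name };
    } catch {
      // Ignore the failure and fall through to the next strategy.
    }
  }
  return null;
}

// Hypothetical stand-in for the links on a page.
const fakeLinks = [
  { text: 'View benefits', href: 'https://example.com/benefits' },
  { text: 'Contact us', href: 'https://example.com/contact' },
];

// Build the ordered strategy list for one query: exact text first,
// then a case-insensitive substring match.
function strategiesFor(query: string): Strategy[] {
  const needle = query.toLowerCase();
  return [
    {
      name: 'exact-text',
      run: async () => fakeLinks.find((l) => l.text === query)?.href ?? null,
    },
    {
      name: 'partial-text',
      run: async () =>
        fakeLinks.find((l) => l.text.toLowerCase().includes(needle))?.href ?? null,
    },
  ];
}
```

Returning the strategy name alongside the URL makes it possible to log which fallback actually matched, which helps when a partial match picks up an unintended link.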

Risks / Concerns

This PR introduces a significant security vulnerability. In tools/playwright.ts:70, the regex pattern /.*${selector}.*/i uses user-provided input directly, making it susceptible to Regular Expression Denial of Service (ReDoS) attacks. The selector must be properly escaped before being used in the regex. Additionally, a minor style issue was noted regarding a redundant type assertion in the same file at line 201.

12 files reviewed | 2 comments

}

async call(params: PlaywrightActionParams): Promise<ToolResult> {
const { method, args } = params as PlaywrightActionParams;

The type assertion params as PlaywrightActionParams is redundant since the parameter is already typed as PlaywrightActionParams. You can simplify this to just destructure directly:

Suggested change
- const { method, args } = params as PlaywrightActionParams;
+ async call(params: PlaywrightActionParams): Promise<ToolResult> {
+ const { method, args } = params;

Type: Style | Severity: Low


// Strategy 1: Find element by exact or partial text match (prioritized since Computer Use sees text)
const textElement = await this.page.locator(`text="${selector}"`).first();
const partialTextElement = await this.page.locator(`text=/.*${selector}.*/i`).first();

The regex pattern /.*${selector}.*/i could be vulnerable to ReDoS (Regular Expression Denial of Service) attacks if the selector contains special regex characters. Consider escaping the selector:

Suggested change
- const partialTextElement = await this.page.locator(`text=/.*${selector}.*/i`).first();
+ const partialTextElement = await this.page.locator(`text=/${selector.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')}/i`).first();

Type: Security | Severity: High
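The escaping in the suggestion above can be pulled into a small helper, which also makes the effect easy to check in isolation. The sketch below uses the same character class as the suggested change; `escapeRegExp` and `containsPattern` are hypothetical helper names, not functions from the PR. It only covers JavaScript regex metacharacters; whether Playwright's `text=/.../` selector syntax needs further handling (for example, of `/` in the input) is not addressed here.

```typescript
// Escape every character that has special meaning in a JavaScript regex,
// so that the input is matched literally.
function escapeRegExp(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive "contains" pattern from untrusted text.
// No leading or trailing ".*" is needed: an unanchored regex already
// matches anywhere in the string.
function containsPattern(text: string): RegExp {
  return new RegExp(escapeRegExp(text), 'i');
}

const label = 'Plans (2025)';

// Unescaped: the parentheses become a capture group, so the pattern
// looks for "Plans 2025" and misses the literal label.
const raw = new RegExp(label, 'i');

// Escaped: the parentheses are matched literally.
const safe = containsPattern(label);

console.log(raw.test('See Plans (2025)'));  // false
console.log(safe.test('See Plans (2025)')); // true
```

An alternative that avoids building a selector string at all is to pass the resulting `RegExp` object to Playwright's `page.getByText`, which accepts either a string or a regular expression.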

CURSOR_POSITION = 'cursor_position',
SCROLL = 'scroll',
WAIT = 'wait',
EXTRACT_URL = 'extract_url',

Bug: Missing Action Implementation Causes Runtime Errors

The EXTRACT_URL action was added to the Action enum and ComputerTool.supportedActions array, but its implementation is missing in ComputerTool.call(). This allows the ComputerTool to accept the action as valid, leading to runtime failures upon execution.
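One way to catch this class of bug at compile time is an exhaustive `switch` with a `never` check in the default branch. The sketch below is illustrative only: the `Action` enum is trimmed to three members from the snippet above, and `dispatch`, `assertNever`, and the returned strings are hypothetical, not the actual `ComputerTool.call()` logic.

```typescript
// Trimmed copy of the enum for illustration.
enum Action {
  SCROLL = 'scroll',
  WAIT = 'wait',
  EXTRACT_URL = 'extract_url',
}

// If an Action member has no case below, `action` is not narrowed to
// `never` in the default branch and the call to assertNever fails to
// type-check. At runtime it rejects values that slipped past the types.
function assertNever(action: never): never {
  throw new Error(`Unhandled action: ${String(action)}`);
}

function dispatch(action: Action): string {
  switch (action) {
    case Action.SCROLL:
      return 'scrolled';
    case Action.WAIT:
      return 'waited';
    case Action.EXTRACT_URL:
      // Hypothetical handling: delegate to the dedicated tool.
      return 'delegated to PlaywrightTool';
    default:
      return assertNever(action);
  }
}
```

Deleting the `Action.EXTRACT_URL` case from `dispatch` turns the missing implementation into a type error on the `assertNever(action)` line instead of a runtime failure.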
