@filip-michalsky filip-michalsky commented Jan 15, 2026

Description

Adds BrowserEnv - a unified browser automation integration for the verifiers library supporting two operational modes:

DOM Mode (mode="dom")

  • Uses the Stagehand Python SDK for natural language browser control
  • Tools: navigate, observe, act, extract - Stagehand's AI-driven primitives
  • Ideal for tasks that benefit from semantic understanding of page elements

CUA Mode (mode="cua")

  • Vision-based primitives for Computer Use Agent workflows
  • Tools: click, double_click, type_text, keypress, scroll, goto, back, forward, wait, screenshot
  • Automatic sandbox deployment - CUA server is deployed automatically to sandbox containers
  • Three execution modes (ordered from fastest sandbox startup to most flexible):
    1. Pre-built image (default): Uses deepdream19/cua-server:latest for ~5-10s startup
    2. Binary upload: Builds and uploads custom server version (~30-60s startup)
    3. Manual server: Connect to locally running CUA server for development

Both modes support local browser execution or Browserbase cloud infrastructure.
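
The split between the two modes can be read as a lookup from mode name to tool names. The sketch below is illustrative only: `TOOLS_BY_MODE` and `tools_for_mode` are hypothetical names, not part of the verifiers API. Only the mode strings and tool names are taken from the description above.

```python
# Hypothetical helper for illustration; this is not the BrowserEnv implementation.
# Mode strings and tool names are copied from the PR description.
TOOLS_BY_MODE = {
    "dom": ("navigate", "observe", "act", "extract"),
    "cua": (
        "click", "double_click", "type_text", "keypress", "scroll",
        "goto", "back", "forward", "wait", "screenshot",
    ),
}


def tools_for_mode(mode: str) -> tuple:
    """Return the tool names listed for a mode, rejecting unknown modes."""
    try:
        return TOOLS_BY_MODE[mode]
    except KeyError:
        expected = sorted(TOOLS_BY_MODE)
        raise ValueError(f"unknown mode {mode!r}; expected one of {expected}") from None
```

Rejecting unknown modes up front, rather than falling back to a default, is a design choice of this sketch and may differ from what the integration does.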

What's included:

  • verifiers/envs/integrations/browser_env/ - Core integration (BrowserEnv, DOMMode, CUAMode, CUASandboxMode)
  • assets/templates/browserbase/cua/ - TypeScript CUA server with Docker build/runtime configs
  • environments/browser_dom_example/ - Minimal DOM mode example
  • environments/browser_cua_example/ - Minimal CUA mode example
  • New [browser] extra: uv add 'verifiers[browser]'

Benchmarks (GAIA, WebVoyager, Mind2Web) have been pushed to Prime Hub under the browserbase/ namespace.

Type of Change

  • New feature (non-breaking change which adds functionality)

Testing

```bash
# DOM mode
prime eval run browser-dom-example -m openai/gpt-4o-mini

# CUA mode (pre-built image - default, fastest)
prime eval run browser-cua-example -m openai/gpt-4o-mini

# CUA mode (binary upload - custom server)
prime eval run browser-cua-example -m openai/gpt-4o-mini -a '{"use_prebuilt_image": false}'

# CUA mode (manual server for development)
cd assets/templates/browserbase/cua && pnpm dev  # In separate terminal
prime eval run browser-cua-example -m openai/gpt-4o-mini -a '{"use_sandbox": false}'
```

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes
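
The `-a` flag in the commands above carries environment arguments as a JSON object. When scripting several runs, the command string can be assembled as in the following sketch. `build_eval_command` is a hypothetical helper, not part of the `prime` CLI or verifiers. It only uses the `-m` and `-a` flags that appear in this PR.

```python
import json
import shlex
from typing import Optional


def build_eval_command(env_id: str, model: str, env_args: Optional[dict] = None) -> str:
    """Assemble a `prime eval run` command line (hypothetical helper).

    env_args is JSON-encoded and shell-quoted so it survives as one argument.
    """
    parts = ["prime", "eval", "run", env_id, "-m", model]
    if env_args:
        parts += ["-a", json.dumps(env_args)]
    return shlex.join(parts)


# Reproduces the binary-upload command from the Testing section.
command = build_eval_command(
    "browser-cua-example", "openai/gpt-4o-mini", {"use_prebuilt_image": False}
)
```

Using `json.dumps` avoids hand-writing the lowercase `false` that JSON expects, and `shlex.join` adds the single quotes around the JSON payload.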

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

CUA Server Deployment Options:

| Mode | Flag | Startup | Use Case |
| --- | --- | --- | --- |
| Pre-built image (default) | None | ~5-10s | Production |
| Binary upload | `use_prebuilt_image=false` | ~30-60s | Custom server |
| Manual server | `use_sandbox=false` | Instant | Development |
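
One way to read the deployment options is as a decision over the two flags. The sketch below is an assumption-laden illustration, not the code in this PR: `resolve_cua_deployment` is a hypothetical name, both flags are assumed to default to true (consistent with the pre-built image needing no flag), and `use_sandbox=false` is assumed to take precedence when both flags are false.

```python
def resolve_cua_deployment(use_sandbox: bool = True, use_prebuilt_image: bool = True) -> str:
    """Map the two flags to a deployment option (hypothetical sketch)."""
    if not use_sandbox:
        # Assumed precedence: without a sandbox, the image flag is irrelevant.
        return "manual server"
    if use_prebuilt_image:
        return "pre-built image"
    return "binary upload"
```

With no arguments this returns `"pre-built image"`, matching the default described above.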

Future work:

  • Additional benchmark environments available on Prime Hub under browserbase/ org

---

## Bugbot Summary (updated)

```markdown

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Introduces a unified browser automation environment with two modes and a packaged CUA server.
> 
> - Adds `BrowserEnv` with **DOM** (Stagehand SDK: `navigate/observe/act/extract`) and **CUA** (vision primitives: `click/type/scroll/goto/...`) modes
> - CUA mode supports automatic sandbox deployment with options: pre-built Docker image (default), binary upload, or manual local server
> - Implements TypeScript Fastify CUA server (`assets/templates/browserbase/cua/`) with REST endpoints, Dockerfiles, build/publish scripts, and SEA binary tooling
> - Provides example environments (`browser_dom_example`, `browser_cua_example`), docs updates, and a new `[browser]` extra dependency set
> - Adds tests for validation, modes, prompt defaults, response formatting, and example datasets; updates exports to expose `BrowserEnv`
> 
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 38d8ae167778f2e87b708c3f57dec329d010b337. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
```

@CLAassistant
CLAassistant commented Jan 15, 2026

CLA assistant check
All committers have signed the CLA.

@filip-michalsky filip-michalsky changed the title from "ruff precommit" to "Add Browser Env Integration" Jan 15, 2026
@filip-michalsky filip-michalsky marked this pull request as ready for review January 16, 2026 16:02
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

filip-michalsky and others added 7 commits January 16, 2026 16:31
* use remote sandbox env for cua mode

* update tests

* remote cua in sandbox

* Fm/browser add binary (#4)

* binary

* update

* fix non binary execution

* fix ruff

* update default flags

2 participants