Browser Testing Agent

An autonomous browser testing agent powered by AI. It uses vision models and LLMs to interact with web applications and complete testing scenarios without manual scripting.

TypeScript Node.js pnpm License: MIT

What It Does

Instead of writing brittle selectors and waiting for elements, you just tell the agent what you want to test. It figures out how to do it by:

  • Looking at screenshots to understand the page
  • Finding buttons, inputs, and links automatically
  • Planning actions based on your goal
  • Executing them and checking if it worked

Features

  • Vision-based element detection - Uses screenshots to find interactive elements, no selectors needed
  • Multi-action planning - Fills entire forms in one go instead of field-by-field
  • Smart evaluation - Distinguishes a goal that was actually achieved from a mere visual change on the page
  • Multi-step workflows - Handles complex flows like signup processes from start to finish
  • Batch operations - Faster execution by grouping related actions
  • Screenshot debugging - Captures screenshots at each step so you can see what happened

Installation

You'll need:

  • Node.js 18+
  • pnpm 8+
  • A Google Gemini API key (available from Google AI Studio)

Then clone the repo, install dependencies, and build:

git clone https://github.com/kareem2002-k/browser-testing-agent.git
cd browser-testing-agent
pnpm install
pnpm build

Create a .env file:

GOOGLE_API_KEY=your-api-key-here

Usage

Basic usage:

pnpm agent --url "https://example.com" --goal "click the login button"

Headed mode (see the browser):

pnpm agent --url "https://example.com" --goal "create an account" --headed

Verbose logging:

pnpm agent --url "https://example.com" --goal "fill the contact form" --verbose

Examples

Simple button click:

pnpm agent --url "https://example.com" --goal "click the sign up button"

Form filling:

pnpm agent --url "https://app.example.com" --goal "create an account by filling all required fields and completing signup"

The agent fills all form fields in one batch, clicks submit, and then completes any additional steps.
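
To make the batching concrete, here is a hedged sketch of what a single planner iteration for a signup form might produce. The plan schema is internal to agent-core, so the field names below are illustrative only:

// Hypothetical plan for one iteration on a signup form.
// The real plan schema lives in agent-core; these field names are illustrative.
const plan = {
  reasoning: "All required signup fields are visible; fill them in one batch, then submit.",
  actions: [
    { type: "type", target: "full name input", value: "Jane Tester" },
    { type: "type", target: "email input", value: "jane@example.com" },
    { type: "type", target: "password input", value: "S3cure-pass!" },
    { type: "click", target: "sign up button" },
  ],
};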

Complex workflow:

pnpm agent --url "https://app.example.com" --goal "navigate to sign up, fill all the data needed, and complete the full signup process"

Architecture

This project uses Hexagonal Architecture (ports and adapters). The core domain logic is completely separate from external dependencies like Playwright or LLM APIs.

Domain (agent-core)
   ↑       ↑
Ports   Ports
   ↓       ↓
Adapters (browser-mcp, vision, llm)
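
As a rough sketch of the idea, a browser port might look like the interface below. The names are placeholders, not the actual declarations in agent-core:

// Illustrative port: the domain depends only on this interface, never on Playwright.
// The browser-mcp package would supply the concrete implementation.
export interface BrowserPort {
  navigate(url: string): Promise<void>;
  click(elementDescription: string): Promise<void>;
  type(elementDescription: string, text: string): Promise<void>;
  screenshot(): Promise<Buffer>;
}

// The domain receives adapters from the outside (dependency injection) rather
// than importing them, which keeps agent-core free of external dependencies.
export class TestAgent {
  constructor(private readonly browser: BrowserPort) {}
}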

Project Structure

browser-testing-agent/
├── apps/
│   └── agent-runner/          # CLI entry point
├── packages/
│   ├── agent-core/            # Domain logic (no external deps)
│   ├── browser-mcp/           # Playwright adapter
│   ├── vision/                # Vision analysis
│   └── llm/                   # LLM integration
└── scripts/                   # Utilities

How It Works

The agent runs a MAPE-K loop:

  1. Monitor - Takes a screenshot
  2. Analyze - Vision model extracts facts (buttons, inputs, links)
  3. Plan - Planner LLM decides what to do next
  4. Execute - Runs the browser action
  5. Evaluate - Checks if the goal was achieved

This repeats until the goal is complete or it hits the step limit.
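
Sketched in TypeScript, one run of the loop looks roughly like this; the interfaces and names are placeholders for the real ports in agent-core:

// Simplified MAPE-K loop; illustrative only, not the actual agent-core API.
interface Deps {
  screenshot(): Promise<Buffer>;                                          // browser adapter
  extractFacts(image: Buffer): Promise<unknown>;                          // vision adapter
  plan(goal: string, facts: unknown): Promise<{ actions: string[] }>;     // planner LLM
  execute(action: string): Promise<void>;                                 // browser adapter
  evaluate(goal: string, image: Buffer): Promise<{ achieved: boolean }>;  // evaluator LLM
}

async function runLoop(deps: Deps, goal: string, maxSteps = 20): Promise<boolean> {
  for (let step = 1; step <= maxSteps; step++) {
    const before = await deps.screenshot();                   // 1. Monitor
    const facts = await deps.extractFacts(before);            // 2. Analyze
    const { actions } = await deps.plan(goal, facts);         // 3. Plan
    for (const action of actions) await deps.execute(action); // 4. Execute
    const after = await deps.screenshot();
    const { achieved } = await deps.evaluate(goal, after);    // 5. Evaluate
    if (achieved) return true;
  }
  return false; // hit the step limit without achieving the goal
}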

Design Patterns

  • Hexagonal Architecture - Domain isolated from adapters
  • Command Pattern - Actions are commands (NavigateCommand, ClickCommand, TypeCommand); see the sketch after this list
  • State Pattern - Test lifecycle with explicit states
  • Strategy Pattern - Swappable LLM providers
  • Observer Pattern - Event system for observability
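
A minimal sketch of the command idea, assuming a browser-port-style interface; the actual NavigateCommand and ClickCommand in the repo may be shaped differently:

// Command-pattern sketch; constructors and dependencies here are assumptions.
interface BrowserLike {
  goto(url: string): Promise<void>;
  click(target: string): Promise<void>;
}

interface BrowserCommand {
  execute(browser: BrowserLike): Promise<void>;
}

class NavigateCommand implements BrowserCommand {
  constructor(private readonly url: string) {}
  execute(browser: BrowserLike): Promise<void> {
    return browser.goto(this.url);
  }
}

class ClickCommand implements BrowserCommand {
  constructor(private readonly target: string) {}
  execute(browser: BrowserLike): Promise<void> {
    return browser.click(this.target);
  }
}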

Development

Build:

pnpm build

Lint:

pnpm lint

Clean:

pnpm clean

Configuration

Screenshots

Screenshots are saved to ./screenshots by default. Each step creates:

  • step-N-before.png - Before the action
  • step-N-after.png - After the action

Models

Default models:

  • Planner: gemini-2.0-flash-lite (falls back to gemini-2.5-flash if needed)
  • Evaluator: gemini-2.0-flash-lite
  • Vision: gemini-2.0-flash-lite (falls back to full model)

You can configure these in the LLM adapter.
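
As a rough illustration, overriding the defaults might look something like the object below; the actual option names live in packages/llm and may differ:

// Hypothetical model configuration; option names are placeholders, not the real API.
const modelConfig = {
  planner: "gemini-2.0-flash-lite",
  plannerFallback: "gemini-2.5-flash",
  evaluator: "gemini-2.0-flash-lite",
  vision: "gemini-2.0-flash-lite",
};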

Performance

The agent is optimized for speed:

  • Batch form filling (all fields at once)
  • Heuristic-first evaluation (fewer LLM calls)
  • Tiered image comparison (pixel → lite vision → full vision); see the sketch after this list
  • Parallel screenshot and DOM extraction
  • Smart page ready detection
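
The tiered comparison can be pictured like this; pixelDiff, liteVisionCompare, and fullVisionCompare are assumed helpers for illustration, not the repo's actual functions:

// Cheap checks first, the full vision model only as a last resort.
// All three helpers below are assumptions, declared only so the sketch type-checks.
declare function pixelDiff(a: Buffer, b: Buffer): number; // fraction of changed pixels, 0..1
declare function liteVisionCompare(a: Buffer, b: Buffer): Promise<{ confident: boolean; changed: boolean }>;
declare function fullVisionCompare(a: Buffer, b: Buffer): Promise<{ changed: boolean }>;

async function pageChanged(before: Buffer, after: Buffer): Promise<boolean> {
  const delta = pixelDiff(before, after);
  if (delta === 0) return false;  // identical screenshots, no LLM call needed
  if (delta > 0.2) return true;   // obviously different, no LLM call needed

  const lite = await liteVisionCompare(before, after); // cheap vision model
  if (lite.confident) return lite.changed;

  return (await fullVisionCompare(before, after)).changed; // full vision model
}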

Contributing

Pull requests welcome!

  1. Fork the repo
  2. Create a feature branch
  3. Make your changes
  4. Submit a PR

Please:

  • Follow TypeScript best practices
  • Keep the hexagonal architecture
  • Add tests for new features
  • Update docs as needed

License

MIT

Credits

Documentation

Known Issues

  • Heavy JavaScript apps might need longer wait times
  • Vision accuracy depends on screenshot quality
  • Complex SPAs might need retries for element detection

Roadmap

  • More browser actions (scroll, hover, drag-and-drop)
  • Test result reporting (JSON, HTML, JUnit)
  • CI/CD examples
  • Docker support
  • Multiple LLM providers (OpenAI, Anthropic)
  • Visual regression testing
  • Test recording/replay
  • Parallel execution

Support

Found a bug? Have a question?

  1. Check existing issues
  2. Open a new issue with details
  3. Include screenshots and logs if possible
