REVERSE ENGINEER ANY WEB APP! ⚡️
You are in the right place if you ...
- want your AI agent to take real actions on the web
- never want to pay for an API (except for OpenAI... shouldn't piss them off...)
- are tired of complicated, endless API integrations
- dealing with closed APIs
Welcome to Vectorly's Web Hacker... No API? No Problem!
- Launch Chrome in debug mode (enable DevTools protocol on
127.0.0.1:9222). - Run the browser monitor and manually perform the target actions to capture browser state.
- Specify your task and run the routine discovery script; the agent reverse‑engineers the API flow.
- Review and run/test the generated routine JSON (locally).
- Go to console.vectorly.app and productionize your routines!
A Routine is a portable automation recipe that captures how to perform a specific task in any web app.
Define once. Reuse everywhere. Automate anything you can do in a browser.
Each Routine includes:
- name — a human-readable identifier
- description — what the Routine does
- parameters — input values the Routine needs to run (e.g. URLs, credentials, text)
- operations — the ordered browser actions that perform the automation
Example:
Navigate to a dashboard, search based on keywords, and return results — all as a reusable Routine.
2025.11_webhacker_quickstart.mp4
- Defined as typed inputs (see
Parameterclass). - Each parameter has required
nameanddescriptionfields. Optional fields includetype(defaults tostring),required(defaults totrue),default, andexamples. - Parameters are referenced inside
operationsusing placeholder tokens like"{{paramName}}"or\"{{paramName}}\"(see Placeholder Interpolation below). - Parameter Types: Supported types include
string,integer,number,boolean,date,datetime,email,url, andenum. - Parameter Validation: Parameters support validation constraints such as
min_length,max_length,min_value,max_value,pattern(regex),enum_values, andformat. - Reserved Prefixes: Parameter names cannot start with reserved prefixes:
sessionStorage,localStorage,cookie,meta,uuid,epoch_milliseconds.
Operations define the executable steps of a Routine. They are represented as a typed list (see RoutineOperationUnion) and are executed sequentially by a browser.
Each operation specifies a type and its parameters:
- navigate — open a URL in the browser.
{ "type": "navigate", "url": "https://example.com", "sleep_after_navigation_seconds": 3.0 } - sleep — pause execution for a given duration (in seconds).
{ "type": "sleep", "timeout_seconds": 1.5 } - wait_for_url — wait for the current URL to match a regex pattern.
{ "type": "wait_for_url", "url_regex": ".*dashboard.*", "timeout_ms": 20000 }
- fetch — perform an HTTP request defined by an
endpointobject (method, URL, headers, body, credentials). Optionally, store the response under asession_storage_key.{ "type": "fetch", "endpoint": { "method": "GET", "url": "https://api.example.com", "headers": {}, "body": {}, "credentials": "same-origin" }, "session_storage_key": "userData" } - download — download a file and return it as base64-encoded content.
{ "type": "download", "endpoint": { "method": "GET", "url": "https://example.com/report.pdf", "headers": {}, "body": {} }, "filename": "report.pdf" } - get_cookies — retrieve all cookies (including HttpOnly) via CDP and store them in session storage.
{ "type": "get_cookies", "session_storage_key": "allCookies", "domain_filter": "*" }
- click — click on an element by CSS selector. Automatically validates visibility to avoid honeypot traps.
{ "type": "click", "selector": "#submit-button", "button": "left", "ensure_visible": true } - input_text — type text into an input element. Validates visibility before typing.
{ "type": "input_text", "selector": "#username", "text": "\"{{username}}\"", "clear": false } - press — press a keyboard key (enter, tab, escape, etc.).
{ "type": "press", "key": "enter" } - scroll — scroll the page or a specific element.
{ "type": "scroll", "selector": "#content", "delta_y": 500, "behavior": "auto" }
- js_evaluate — evaluate custom JavaScript code in the browser context. Must be wrapped in an IIFE format.
{ "type": "js_evaluate", "js": "(function() { return document.title; })()", "timeout_seconds": 5.0, "session_storage_key": "pageTitle" }
- return — return the value previously stored under a
session_storage_key.{ "type": "return", "session_storage_key": "userData" } - return_html — return HTML content from the page or a specific element.
{ "type": "return_html", "scope": "page" }
Example sequence:
[
{ "type": "navigate", "url": "https://example.com/login" },
{ "type": "sleep", "timeout_seconds": 1 },
{
"type": "fetch",
"endpoint": {
"method": "POST",
"url": "/auth",
"body": { "username": "\"{{user}}\"", "password": "\"{{pass}}\"" }
},
"session_storage_key": "token"
},
{ "type": "return", "session_storage_key": "token" }
]This defines a deterministic flow: open → wait → authenticate → return a session token.
Placeholders inside operation fields are resolved at runtime:
- Parameter placeholders:
"{{paramName}}"or\"{{paramName}}\"→ substituted from routine parameters - Storage placeholders (read values from the current session):
{{sessionStorage:myKey.path.to.value}}— access nested values in sessionStorage{{localStorage:myKey}}— access localStorage values{{cookie:CookieName}}— read cookie values{{meta:name}}— read meta tag content (e.g.,<meta name="csrf-token">)
Important: Currently, sessionStorage, localStorage, cookie, and meta placeholder resolution is supported only inside fetch headers and body. Future versions will support interpolation anywhere in operations.
Interpolation occurs before an operation executes. For example, a fetch endpoint might be:
{
"type": "fetch",
"endpoint": {
"method": "GET",
"url": "https://api.example.com/search?paramName1=\"{{paramName1}}\"¶mName2=\"{{paramName1}}\"",
"headers": {
"Authorization": "Bearer {{cookie:auth_token}}"
},
"body": {}
},
"session_storage_key": "result_key"
}This substitutes parameter values and injects auth_token from cookies. The JSON response is stored under sessionStorage['result_key'] and can be returned by a final return operation using the matching session_storage_key.
- Python 3.12+
- Google Chrome (stable)
- uv (Python package manager) (optional, for development)
- macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh - Windows (PowerShell):
iwr https://astral.sh/uv/install.ps1 -UseBasicParsing | iex
- macOS/Linux:
- OpenAI API key
Note: We recommend using a virtual environment to avoid dependency conflicts.
# Create and activate a virtual environment
# Option 1: Using uv (recommended - handles Python version automatically)
uv venv web-hacker-env
source web-hacker-env/bin/activate # On Windows: web-hacker-env\Scripts\activate
uv pip install web-hacker
# Option 2: Using python3 (if Python 3.12+ is your default)
python3 -m venv web-hacker-env
source web-hacker-env/bin/activate # On Windows: web-hacker-env\Scripts\activate
pip install web-hacker
# Option 3: Using pyenv (if you need a specific Python version)
pyenv install 3.12.3 # if not already installed
pyenv local 3.12.3
python -m venv web-hacker-env
source web-hacker-env/bin/activate # On Windows: web-hacker-env\Scripts\activate
pip install web-hacker
# Troubleshooting: If pip is not found, recreate the venv or use:
python -m ensurepip --upgrade # Install pip in the venv
pip install web-hackerFor development or if you want the latest code:
# Clone the repository
git clone https://github.com/VectorlyApp/web-hacker.git
cd web-hacker
# Create and activate virtual environment
python3 -m venv web-hacker-env
source web-hacker-env/bin/activate # On Windows: web-hacker-env\Scripts\activate
# Install in editable mode
pip install -e .
# Or using uv (faster)
uv venv web-hacker-env
source web-hacker-env/bin/activate
uv pip install -e .The fastest way to get started is using the quickstart script, which automates the entire workflow:
# Make sure web-hacker is installed
pip install web-hacker
# Set your OpenAI API key
export OPENAI_API_KEY="sk-..."
# Run the quickstart script
python quickstart.pyThe quickstart script will:
- ✅ Automatically launch Chrome in debug mode
- 📊 Start browser monitoring (you perform actions)
- 🤖 Discover routines from captured data
- 📝 Show you how to execute the discovered routine
Note: The quickstart script is included in the repository. If you installed from PyPI, you can download it from the GitHub repository.
💡 Tip: The quickstart script automatically launches Chrome for you. You only need these manual instructions if you're not using the quickstart script.
# Create temporary Chrome user directory
mkdir -p $HOME/tmp/chrome
# Launch Chrome in debug mode
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-address=127.0.0.1 \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/tmp/chrome" \
--remote-allow-origins='*' \
--no-first-run \
--no-default-browser-check
# Verify Chrome is running
curl http://127.0.0.1:9222/json/version# Create temporary Chrome user directory
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\tmp\chrome" | Out-Null
# Locate Chrome
$chrome = "C:\Program Files\Google\Chrome\Application\chrome.exe"
if (!(Test-Path $chrome)) {
$chrome = "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"
}
# Launch Chrome in debug mode
& $chrome `
--remote-debugging-address=127.0.0.1 `
--remote-debugging-port=9222 `
--user-data-dir="$env:USERPROFILE\tmp\chrome" `
--remote-allow-origins=* `
--no-first-run `
--no-default-browser-check
# Verify Chrome is running
(Invoke-WebRequest http://127.0.0.1:9222/json/version).Content# Create temporary Chrome user directory
mkdir -p $HOME/tmp/chrome
# Launch Chrome in debug mode (adjust path if needed)
google-chrome \
--remote-debugging-address=127.0.0.1 \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/tmp/chrome" \
--remote-allow-origins='*' \
--no-first-run \
--no-default-browser-check
# Verify Chrome is running
curl http://127.0.0.1:9222/json/versionThe reverse engineering process follows a simple three-step workflow:
- Monitor — Capture network traffic, storage events, and interactions while you manually perform the target task in Chrome
- Discover — Let the AI agent analyze the captured data and generate a reusable Routine
- Execute — Run the discovered Routine with different parameters to automate the task
Easiest way: Use the quickstart script which automates the entire workflow.
Each step is detailed below. Start by ensuring Chrome is running in debug mode (see Launch Chrome in Debug Mode above).
Reverse-engineering and automating a website can violate terms of service. Store captures securely and scrub any sensitive fields before sharing.
Use the CDP browser monitor to block trackers and capture network, storage, and interaction data while you manually perform the task in Chrome.
Run this command to start monitoring:
web-hacker-monitor --host 127.0.0.1 --port 9222 --output-dir ./cdp_captures --url about:blank --incognitoThe script will open a new tab (starting at about:blank). Navigate to your target website, then manually perform the actions you want to automate (e.g., search, login, export report). Keep Chrome focused during this process. Press Ctrl+C and the script will consolidate transactions and produce a HAR automatically.
Output structure (under --output-dir, default ./cdp_captures):
cdp_captures/
├── session_summary.json
├── network/
│ ├── consolidated_transactions.json
│ ├── network.har
│ └── transactions/
│ └── <timestamp_url_id>/
│ ├── request.json
│ ├── response.json
│ └── response_body.[ext]
└── storage/
└── events.jsonl
Tip: Keep Chrome focused while monitoring and perform the target flow (search, checkout, etc.). Press Ctrl+C to stop; the script will consolidate transactions and produce a HTTP Archive (HAR) automatically.
Use the routine-discovery pipeline to analyze captured data and synthesize a reusable Routine (navigate → fetch → return).
Prerequisites: You’ve already captured a session with the browser monitor (./cdp_captures exists).
Run the discovery agent:
⚠️ Important: You must specify your own--taskparameter. The example below is just for demonstration—replace it with a description of what you want to automate.
Linux/macOS (bash):
web-hacker-discover \
--task "Recover API endpoints for searching for trains and their prices" \
--cdp-captures-dir ./cdp_captures \
--output-dir ./routine_discovery_output \
--llm-model gpt-5.1Windows (PowerShell):
# Simple task (no quotes inside):
web-hacker-discover --task "Recover the API endpoints for searching for trains and their prices" --cdp-captures-dir ./cdp_captures --output-dir ./routine_discovery_output --llm-model gpt-5.1Example tasks:
"recover the api endpoints for searching for trains and their prices"(shown above)"discover how to search for flights and get pricing""find the API endpoint for user authentication""extract the endpoint for submitting a job application"
Arguments:
- --task: A clear description of what you want to automate. This guides the AI agent to identify which network requests to extract and convert into a Routine. Examples: searching for products, booking appointments, submitting forms, etc.
- --cdp-captures-dir: Root of prior CDP capture output (default:
./cdp_captures) - --output-dir: Directory to write results (default:
./routine_discovery_output) - --llm-model: LLM to use for reasoning/parsing (default:
gpt-5.1)
Outputs (under --output-dir):
routine_discovery_output/
├── identified_transactions.json # Chosen transaction id/url
├── routine_transactions.json # Slimmed request/response samples given to LLM
├── resolved_variables.json # Resolution hints for cookies/tokens (if any)
└── routine.json # Final Routine model (name, parameters, operations)
127.0.0.1:9222.
"{{PARAM}}" when they should be "\"{{PARAM}}\"" to ensure proper JSON string escaping.
Example: If you see:
"field": "{{paramName}}"And paramName is a string parameter, manually change it to:
"field": "\"{{paramName}}\""This ensures the parameter value is properly quoted as a JSON string when substituted.
Run the example routine:
# Using a parameters file:
web-hacker-execute \
--routine-path example_routines/amtrak_one_way_train_search_routine.json \
--parameters-path example_routines/amtrak_one_way_train_search_input.json
# Or pass parameters inline (JSON string):
web-hacker-execute \
--routine-path example_routines/amtrak_one_way_train_search_routine.json \
--parameters-dict '{"origin": "BOS", "destination": "NYP", "departureDate": "2026-03-22"}'Run a discovered routine:
web-hacker-execute \
--routine-path routine_discovery_output/routine.json \
--parameters-path routine_discovery_output/test_parameters.jsonNote: Routines execute in a new incognito tab by default (controlled by the routine's incognito field). This ensures clean sessions for each execution.
Alternative: Deploy your routine to console.vectorly.app to expose it as an API endpoint or MCP tool for use in production environments.
-
Chrome not detected / cannot connect to DevTools
- Ensure Chrome is launched in debug mode and
http://127.0.0.1:9222/json/versionreturns JSON. - Check
--host/--portflags match your Chrome launch args.
- Ensure Chrome is launched in debug mode and
-
OPENAI_API_KEYnot set- Export the key in your shell or create a
.envfile and run viauv run(dotenv is loaded).
- Export the key in your shell or create a
-
No such file or directory: './cdp_captures/network/transactions/N/A'or similar transaction path errors-
The agent cannot find any network transactions relevant to your task. This usually means:
- The
--taskdescription doesn't match what you actually performed during monitoring - The relevant network requests weren't captured (they may have been blocked or filtered)
- The task description is too vague or too specific
- The
-
Fix: Reword your
--taskparameter to more accurately describe what you did during the monitoring step, or re-run the browser monitor and ensure you perform the exact actions you want to automate.
-
For programmatic control, use the Python SDK instead of CLI commands:
from web_hacker.sdk import WebHacker
from web_hacker.data_models.routine.routine import Routine
# Initialize (uses OPENAI_API_KEY from environment)
hacker = WebHacker()
# Load and execute an existing routine
routine = Routine.model_validate_json(open("routine.json").read())
result = hacker.execute_routine(
routine=routine,
parameters={"origin": "NYC", "destination": "LAX", "date": "2026-03-15"}
)
if result.ok:
print(result.data) # API response dataimport json
from web_hacker.sdk import WebHacker, BrowserMonitor
hacker = WebHacker()
# Step 1: Monitor browser activity
monitor = BrowserMonitor(output_dir="./captures")
monitor.start()
# ... user performs actions in browser ...
input("Press Enter when done")
monitor.stop()
# Step 2: Discover routine from captures
routine = hacker.discover_routine(
task="Search for flights and get prices",
cdp_captures_dir="./captures",
output_dir="./output"
)
# Step 3: Test with generated test parameters
test_params = json.load(open("./output/test_parameters.json"))
result = hacker.execute_routine(routine=routine, parameters=test_params)
# Step 4: Execute with new parameters
result = hacker.execute_routine(
routine=routine,
parameters={"origin": "SFO", "destination": "JFK", "date": "2026-04-01"}
)| Class | Description |
|---|---|
WebHacker |
Main client for the full workflow |
BrowserMonitor |
Capture browser network/storage activity |
RoutineDiscovery |
Discover routines from captured data |
RoutineExecutor |
Execute routines programmatically |
See quickstart.py for a complete interactive example.
- Integration of routine testing into the agentic pipeline
- The agent will execute discovered routines, detect failures, and automatically suggest/fix issues to make routines more robust and efficient.
- Checkpointing progress and resumability
- Avoid re-running the entire discovery pipeline after exceptions; the agent will checkpoint progress and resume from the last successful stage.
- Parameter resolution visibility
- During execution, show which placeholders (e.g.,
{{sessionStorage:...}},{{cookie:...}},{{localStorage:...}}) resolved successfully and which failed.
- During execution, show which placeholders (e.g.,
- wait_for_title — wait for the page title to match a regex pattern
- network_sniffing (background operation) — intercept and capture network requests matching a URL pattern in the background while other operations execute. Useful for capturing API calls triggered by UI interactions.
- Supports different capture modes:
list(all matching requests),first(only first match),last(only last match) - Can capture request, response, or body data
- Supports different capture modes:
- hover — move mouse over an element to trigger hover states
- wait_for_selector — wait for an element to reach a specific state (visible, hidden, attached, detached)
- set_files — set file paths for file input elements (for file uploads)
- return_screenshot — capture and return a screenshot of the page as base64
We welcome contributions! Here's how to get started:
- Report bugs or request features — Open an issue
- Submit code — Fork the repo and open a pull request
- Test your code — Add unit tests and make sure all tests pass:
pytest tests/ -vPlease follow existing code style and include tests for new features.