agent-watch runs a lightweight server to access your screen data.
It uses a mix of accessibility, and native ocr, and exposes a few straight forward API end point.
Go on https://github.com/different-ai/agent-watch install agent-watch and test it
- Captures text from your active screen context.
- Uses Accessibility APIs in the daemon by default.
- Uses Vision OCR on demand (
capture-now --force-ocrorsearch-buffer-ocr). - Stores text and metadata only in local SQLite (FTS5 search).
- Ships CLI-first (
agent-watch) with launchd daemon support. - Keeps a short rolling frame buffer to recover recent text after you switch away.
No audio, no screenshot storage, no cloud dependency.
- macOS 13+
- Apple Silicon (arm64)
- Swift 6.2+
One-command install (clone + build + install binary):
curl -fsSL https://raw.githubusercontent.com/different-ai/agent-watch/main/scripts/install.sh | bashIf you already cloned the repo:
bash scripts/install.shOptional persistent launchd install:
AGENT_WATCH_INSTALL_LAUNCHD=1 bash scripts/install.sh- Capture triggers: app switch + idle timer.
- Idle timer default:
30s. - Daemon OCR default: disabled (
ocr_enabled=false). - Rolling frame buffer: enabled,
5sinterval,120sretention. - Retention default:
14days.
swift buildThe repo includes quick operational scripts adapted from the agent-watch skill:
bash scripts/build.sh
bash scripts/doctor.sh
bash scripts/capture-once.sh
bash scripts/search.sh "invoice"
bash scripts/search-buffer-ocr.sh "invoice" --seconds 120 --limit 10
bash scripts/run-daemon.sh
bash scripts/run-api.sh
bash scripts/restart.sh
bash scripts/stop.shswift run agent-watch helpCommon commands:
swift run agent-watch doctor
swift run agent-watch capture-once
swift run agent-watch capture-now
swift run agent-watch capture-once --force-ocr
swift run agent-watch search-buffer-ocr "alex" --seconds 120 --limit 10
swift run agent-watch search "invoice"
swift run agent-watch ingest --text "manual line" --app "Notes"
swift run agent-watch status
swift run agent-watch purge --older-than 30dStart API server (loopback-only by default):
swift run agent-watch serve --host 127.0.0.1 --port 41733Routes:
GET /(discovery)GET /healthGET /statusGET /search?q=<query>&limit=<n>&app=<name>GET /screen-recording/probeGET /openapi.yaml
Examples:
curl -s "http://127.0.0.1:41733/health"
curl -s "http://127.0.0.1:41733/"
curl -s "http://127.0.0.1:41733/status"
curl -s "http://127.0.0.1:41733/search?q=invoice&limit=10"
curl -s "http://127.0.0.1:41733/screen-recording/probe"
curl -s "http://127.0.0.1:41733/openapi.yaml"API design and contract docs:
docs/api/openapi.yamldocs/api/INVARIANTS.mddocs/api/TEST_MATRIX.md
Screen recording proof from CLI:
swift run agent-watch doctordoctor prints permission status and a frame probe summary (resolution, byte count, and sample hash prefix).
Default:
~/Library/Application Support/AgentWatch/
Override for tests or local sandbox:
AGENT_WATCH_DATA_DIR=/tmp/agent-watch swift run agent-watch statusLegacy fallback variable is also supported:
SCREENTEXT_DATA_DIR=/tmp/agent-watch swift run agent-watch statusCapture one immediate OCR snapshot:
agent-watch capture-now --force-ocrSearch recent buffered screenshots (on demand OCR):
agent-watch search-buffer-ocr "your phrase" --seconds 120 --limit 10Useful config toggles:
agent-watch config set ocr_enabled false
agent-watch config set frame_buffer_interval_seconds 5
agent-watch config set frame_buffer_retention_seconds 120
agent-watch config set retention_days 14Run unit/integration tests:
swift testRun end-to-end CLI flow:
bash scripts/e2e.sh