Skip to content

different-ai/agent-watch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

agent-watch

agent-watch runs a lightweight server to access your screen data.

It uses a mix of accessibility, and native ocr, and exposes a few straight forward API end point.

Install prompt

Go on https://github.com/different-ai/agent-watch install agent-watch and test it

What it does

  • Captures text from your active screen context.
  • Uses Accessibility APIs in the daemon by default.
  • Uses Vision OCR on demand (capture-now --force-ocr or search-buffer-ocr).
  • Stores text and metadata only in local SQLite (FTS5 search).
  • Ships CLI-first (agent-watch) with launchd daemon support.
  • Keeps a short rolling frame buffer to recover recent text after you switch away.

No audio, no screenshot storage, no cloud dependency.

Requirements

  • macOS 13+
  • Apple Silicon (arm64)
  • Swift 6.2+

Easy install

One-command install (clone + build + install binary):

curl -fsSL https://raw.githubusercontent.com/different-ai/agent-watch/main/scripts/install.sh | bash

If you already cloned the repo:

bash scripts/install.sh

Optional persistent launchd install:

AGENT_WATCH_INSTALL_LAUNCHD=1 bash scripts/install.sh

Low-CPU defaults (simple mode)

  • Capture triggers: app switch + idle timer.
  • Idle timer default: 30s.
  • Daemon OCR default: disabled (ocr_enabled=false).
  • Rolling frame buffer: enabled, 5s interval, 120s retention.
  • Retention default: 14 days.

Build

swift build

Skill-style helper scripts

The repo includes quick operational scripts adapted from the agent-watch skill:

bash scripts/build.sh
bash scripts/doctor.sh
bash scripts/capture-once.sh
bash scripts/search.sh "invoice"
bash scripts/search-buffer-ocr.sh "invoice" --seconds 120 --limit 10
bash scripts/run-daemon.sh
bash scripts/run-api.sh
bash scripts/restart.sh
bash scripts/stop.sh

CLI usage

swift run agent-watch help

Common commands:

swift run agent-watch doctor
swift run agent-watch capture-once
swift run agent-watch capture-now
swift run agent-watch capture-once --force-ocr
swift run agent-watch search-buffer-ocr "alex" --seconds 120 --limit 10
swift run agent-watch search "invoice"
swift run agent-watch ingest --text "manual line" --app "Notes"
swift run agent-watch status
swift run agent-watch purge --older-than 30d

Local HTTP API

Start API server (loopback-only by default):

swift run agent-watch serve --host 127.0.0.1 --port 41733

Routes:

  • GET / (discovery)
  • GET /health
  • GET /status
  • GET /search?q=<query>&limit=<n>&app=<name>
  • GET /screen-recording/probe
  • GET /openapi.yaml

Examples:

curl -s "http://127.0.0.1:41733/health"
curl -s "http://127.0.0.1:41733/"
curl -s "http://127.0.0.1:41733/status"
curl -s "http://127.0.0.1:41733/search?q=invoice&limit=10"
curl -s "http://127.0.0.1:41733/screen-recording/probe"
curl -s "http://127.0.0.1:41733/openapi.yaml"

API design and contract docs:

  • docs/api/openapi.yaml
  • docs/api/INVARIANTS.md
  • docs/api/TEST_MATRIX.md

Screen recording proof from CLI:

swift run agent-watch doctor

doctor prints permission status and a frame probe summary (resolution, byte count, and sample hash prefix).

Data directory

Default:

~/Library/Application Support/AgentWatch/

Override for tests or local sandbox:

AGENT_WATCH_DATA_DIR=/tmp/agent-watch swift run agent-watch status

Legacy fallback variable is also supported:

SCREENTEXT_DATA_DIR=/tmp/agent-watch swift run agent-watch status

Practical OCR commands

Capture one immediate OCR snapshot:

agent-watch capture-now --force-ocr

Search recent buffered screenshots (on demand OCR):

agent-watch search-buffer-ocr "your phrase" --seconds 120 --limit 10

Useful config toggles:

agent-watch config set ocr_enabled false
agent-watch config set frame_buffer_interval_seconds 5
agent-watch config set frame_buffer_retention_seconds 120
agent-watch config set retention_days 14

Tests

Run unit/integration tests:

swift test

Run end-to-end CLI flow:

bash scripts/e2e.sh

About

A native macOS API for your screen

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages