Geisterhand

LLM-powered desktop automation for macOS, Linux, and Windows. Control any app with an HTTP API: click buttons, type text, navigate menus, read accessibility trees, and capture screenshots.

Geisterhand (German for "ghostly hand") lets LLMs like Claude autonomously interact with native desktop applications. It works in the background without stealing focus, so you can automate apps while continuing your own work.

How It Works

LLM / Script                    Geisterhand                     Desktop App
     |                               |                               |
     |   geisterhand run Calculator  |                               |
     | ----------------------------> |  Launch app in background     |
     |   {"port":49152, "pid":1234}  | ----------------------------> |
     |                               |                               |
     |   POST /click/element         |                               |
     |   {"title":"7","role":"Button"}                                |
     | ----------------------------> |  Click button via AX APIs     |
     |                               | ----------------------------> |
     |                               |                               |
     |   GET /screenshot             |                               |
     | ----------------------------> |  Capture window               |
     |         <image data>          | <---------------------------- |
     | <---------------------------- |                               |

geisterhand run launches an app and starts a scoped HTTP server. Every request is automatically targeted at that app — no need to specify PIDs or window titles in each call.
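
As a rough illustration of the flow above, a script could read the JSON line that geisterhand run prints and derive the server's base URL from it. This is a minimal sketch, not part of Geisterhand: the helper names parse_run_output and launch are hypothetical, and it assumes the command stays in the foreground and prints its JSON description as the first line of stdout.

```python
import json
import subprocess


def parse_run_output(line: str) -> str:
    """Turn the JSON line printed by `geisterhand run` into a base URL."""
    info = json.loads(line)
    # "host" appears in the Quick Start output but not in the diagram,
    # so fall back to the loopback address when it is missing.
    host = info.get("host", "127.0.0.1")
    return f"http://{host}:{info['port']}"


def launch(app: str) -> tuple[subprocess.Popen, str]:
    """Start `geisterhand run <app>` and return (process, base_url).

    Assumption: the JSON description is the first line written to stdout.
    """
    proc = subprocess.Popen(
        ["geisterhand", "run", app],
        stdout=subprocess.PIPE,
        text=True,
    )
    return proc, parse_run_output(proc.stdout.readline())
```

Keeping the parsing separate from the process handling makes it easy to check against a sample output line without launching anything.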

Repositories

| Repo | Description | Language |
|------|-------------|----------|
| macos | macOS automation via ScreenCaptureKit & Accessibility APIs | Swift |
| windows | Windows automation via UI Automation & Win32 APIs | C# (.NET) |
| linux | Linux automation via AT-SPI2, XTest & xdg-desktop-portal | Rust |
| mcp | MCP server for Claude Code & Claude Desktop | TypeScript |
| landing | Website (geisterhand.dev) | Astro |
| homebrew-tap | Homebrew tap for macOS install | Ruby |

Quick Start

macOS

brew install --cask geisterhand-io/tap/geisterhand
geisterhand run Calculator
# {"port":49152,"pid":12345,"app":"Calculator","host":"127.0.0.1"}

# Quote the URL so the shell (zsh is the macOS default) does not treat "?" as a glob
curl "http://127.0.0.1:49152/accessibility/tree?format=compact"
curl -X POST http://127.0.0.1:49152/click/element \
  -H "Content-Type: application/json" \
  -d '{"title": "7", "role": "AXButton"}'
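
The two curl calls above could be issued from a Python script roughly as follows. This is a sketch using only the standard library. The function names are hypothetical, the port is the example value from the output above, and the format of the server's responses is not documented here, so send simply returns the raw bytes.

```python
import json
import urllib.request


def tree_request(base_url: str) -> urllib.request.Request:
    """GET /accessibility/tree in the compact format."""
    return urllib.request.Request(f"{base_url}/accessibility/tree?format=compact")


def click_element_request(base_url: str, title: str, role: str) -> urllib.request.Request:
    """POST /click/element with the same JSON body as the curl example."""
    body = json.dumps({"title": title, "role": role}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/click/element",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage, with a server started by `geisterhand run Calculator`:
#   base = "http://127.0.0.1:49152"
#   send(click_element_request(base, "7", "AXButton"))
```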

Windows

Download from GitHub Releases, then:

geisterhand run Calculator

Linux

cargo install geisterhand
geisterhand run gnome-calculator

Using with Claude

Add Geisterhand as an MCP server so Claude can control desktop apps directly:

claude mcp add-json geisterhand \
  '{"type":"stdio","command":"npx","args":["geisterhand-mcp"]}' \
  --scope user
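
If the registration command is generated by a setup script, building the JSON with a serializer avoids hand-quoting mistakes. The sketch below reproduces the command above. The helper name mcp_add_command is hypothetical, and shlex.quote targets POSIX shells, so the result is not meant for cmd.exe or PowerShell.

```python
import json
import shlex


def mcp_add_command(name: str = "geisterhand", scope: str = "user") -> str:
    """Build the `claude mcp add-json` command line shown above."""
    # Server definition copied from the command in this section.
    config = {"type": "stdio", "command": "npx", "args": ["geisterhand-mcp"]}
    compact_json = json.dumps(config, separators=(",", ":"))
    return " ".join(
        [
            "claude", "mcp", "add-json",
            shlex.quote(name),
            shlex.quote(compact_json),  # wrapped in single quotes for the shell
            "--scope", shlex.quote(scope),
        ]
    )


print(mcp_add_command())
```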

API

All platforms expose the same HTTP API:

| Method | Path | Description |
|--------|------|-------------|
| GET | /status | System info and permissions |
| GET | /screenshot | Capture screen or app window |
| POST | /click | Click at coordinates |
| POST | /click/element | Click element by title, role, or label |
| POST | /type | Type text |
| POST | /key | Press key with modifiers |
| POST | /scroll | Scroll at position |
| POST | /wait | Wait for element state |
| GET | /accessibility/tree | UI element hierarchy |
| GET | /accessibility/elements | Find elements by query |
| POST | /accessibility/action | Perform element action |
| GET | /menu | Get app menu structure |
| POST | /menu | Trigger menu item |
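
A client script can use the table above as a lookup to catch a mistyped path or the wrong HTTP method before anything is sent. The following is a minimal sketch: the ENDPOINTS set is copied from the table, while the validate helper and its error messages are hypothetical and not part of Geisterhand.

```python
# (method, path) pairs copied from the API table.
ENDPOINTS = {
    ("GET", "/status"),
    ("GET", "/screenshot"),
    ("POST", "/click"),
    ("POST", "/click/element"),
    ("POST", "/type"),
    ("POST", "/key"),
    ("POST", "/scroll"),
    ("POST", "/wait"),
    ("GET", "/accessibility/tree"),
    ("GET", "/accessibility/elements"),
    ("POST", "/accessibility/action"),
    ("GET", "/menu"),
    ("POST", "/menu"),
}


def validate(method: str, path: str) -> None:
    """Raise ValueError unless (method, path) appears in the API table."""
    method = method.upper()
    if (method, path) in ENDPOINTS:
        return
    # Collect the methods the table does list for this path, if any.
    allowed = sorted(m for m, p in ENDPOINTS if p == path)
    if allowed:
        raise ValueError(f"{path} accepts {', '.join(allowed)}, not {method}")
    raise ValueError(f"unknown path: {path}")
```

For example, validate("get", "/menu") passes silently, while validate("POST", "/status") raises an error that names GET as the listed method.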

Key Features

  • Background automation: automated apps stay in the background and do not steal focus. Screenshots work even when the target window is behind other windows.
  • Cross-platform: the same API on macOS, Windows, and Linux.
  • Scoped servers: geisterhand run creates a per-app server, so there is no PID juggling.
  • Accessibility-first: read and interact with UI elements by role, title, and label.
  • Local only: binds to 127.0.0.1 by default, so the API is not reachable from other machines.

License

MIT — Skelpo GmbH
