fix: use NSApplication.shared for proper WindowServer connection #2
Korkyzer wants to merge 3 commits into different-ai:main
CGDisplayCreateImage returns the desktop wallpaper instead of the actual screen content when running as a background process. This is because RunLoop.main.run() does not establish a WindowServer connection, so macOS TCC silently denies real screen capture.
Fix: Initialize NSApplication.shared with the .accessory activation policy before setting up capture timers, and use app.run() instead of RunLoop.main.run().
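A minimal sketch of the startup order described above, not the PR's actual code. The names `makeAccessoryApp`, `captureFrame`, and `runCaptureLoop` are hypothetical, and the capture body is left as a placeholder; only `NSApplication.shared`, `setActivationPolicy(.accessory)`, and `app.run()` come from the description.

```swift
import AppKit

/// Connects to the WindowServer as an accessory app (no Dock icon, no menu bar).
/// Hypothetical helper name; the PR may inline this in its entry point.
@MainActor
func makeAccessoryApp() -> NSApplication {
    let app = NSApplication.shared
    _ = app.setActivationPolicy(.accessory)
    return app
}

/// Placeholder for the project's real capture routine (assumption).
func captureFrame() {
    print("capture tick")
}

/// Sets up the app first, then the capture timer, then runs the AppKit event loop.
@MainActor
func runCaptureLoop(every interval: TimeInterval) {
    let app = makeAccessoryApp()
    Timer.scheduledTimer(withTimeInterval: interval, repeats: true) { _ in
        captureFrame()
    }
    app.run()   // takes the place of RunLoop.main.run(); does not return
}
```

A daemon entry point would call something like `runCaptureLoop(every: 5)` as its last statement, since `app.run()` blocks. The 5-second interval is an arbitrary illustration.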
Thanks. But why the app icon though? Is it necessary for this to be successfully registered?
lemme know @Korkyzer
@benjaminshafii Honestly it's not strictly necessary, just added it to make the bundle feel more complete/clean. Happy to remove it if you'd prefer to keep the PR focused on the fix only.
- maxDimension 1280→2560, jpegQuality 0.45→0.85
- Run both accessibility + OCR, keep longer result
- Color-invert frames before OCR for dark UIs
- minimumTextHeight 0.005→0.002
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
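TCC permissions breaking after codesign (fix)
Root cause: The LaunchAgent runs a bash script (launch) as an intermediary, so the TCC grant for AgentWatch.app does not match the process that actually runs.
Fix: Split into two LaunchAgents that launch the agent-watch binary directly:
<!-- com.differentai.agentwatch.plist -->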
<key>ProgramArguments</key>
<array>
<string>/path/to/AgentWatch.app/Contents/MacOS/agent-watch</string>
<string>daemon</string>
</array>
<!-- com.differentai.agentwatch.serve.plist -->
<key>ProgramArguments</key>
<array>
<string>/path/to/AgentWatch.app/Contents/MacOS/agent-watch</string>
<string>serve</string>
<string>--host</string>
<string>127.0.0.1</string>
<string>--port</string>
<string>41733</string>
</array>
After any
Also pushed OCR improvements to my fork.
Improve OCR text extraction quality (dark UIs, resolution, extraction logic)
Problem
The current OCR pipeline misses most text on dark-themed applications (WhatsApp, Slack, Discord, etc.). On a WhatsApp conversation with dozens of visible messages, agent-watch only captured ~80 characters (menu bar text: "File Edit Chat Call View Window Help").
Root causes identified
1. NativeTextExtractor short-circuits OCR: The accessibility extractor runs first. If it returns ≥ minimumAccessibilityChars (even just menu bar items), OCR is skipped entirely. For WhatsApp, accessibility returns ~100 chars of sidebar/menu text, satisfying the minimum, so the Vision framework OCR never runs on the actual message content.
2. Frame buffer resolution too low: FrameBufferStore downscales captures to maxDimension = 1280, which halves Retina resolution (2560 → 1280). Text becomes too small for reliable OCR, especially in dense UIs.
3. Apple Vision framework struggles with dark themes: VNRecognizeTextRequest performs poorly on light-on-dark text. The Vision framework was designed primarily for document scanning (dark text on light backgrounds).
Changes
1. NativeTextExtractor.swift: Always run both extractors, keep the best
Before: Accessibility runs first; if it returns enough chars, OCR is skipped.
After: Both accessibility AND OCR always run. The result with more text wins.
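The "run everything, keep the longest" rule can be sketched as below. This is an illustration under assumptions, not the code in NativeTextExtractor.swift: `ExtractionResult` and `bestExtraction` are hypothetical names, and each extractor is modeled as a plain closure.

```swift
/// Hypothetical result type; the project's real type may differ.
struct ExtractionResult {
    let source: String   // e.g. "accessibility" or "ocr"
    let text: String
}

/// Runs every extractor (no short-circuit) and returns the result
/// with the most characters, or nil if none produced anything.
func bestExtraction(from extractors: [() -> ExtractionResult?]) -> ExtractionResult? {
    extractors
        .compactMap { $0() }
        .max { $0.text.count < $1.text.count }
}

// Stand-in extractors: accessibility sees only the menu bar,
// OCR sees the (longer) message content.
let best = bestExtraction(from: [
    { ExtractionResult(source: "accessibility", text: "File Edit Chat Call View Window Help") },
    { ExtractionResult(source: "ocr", text: String(repeating: "message ", count: 40)) },
])
```

Character count is a crude quality signal: it favors whichever extractor returns more text, even if some of it is noise. A production version might also merge or deduplicate the two results.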
2. FrameBufferStore.swift: Increase resolution to full Retina
maxDimension goes from 1280 to 2560 and jpegQuality from 0.45 to 0.85.
Disk impact: frames go from ~250KB to ~400-800KB. With the existing retention/pruning policy this remains well under control.
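To see why the old limit cost resolution, here is a small sketch of a longest-side clamp. It assumes maxDimension caps the longer edge and that captures are never upscaled; `fittedSize` is a hypothetical helper, and FrameBufferStore's real scaling code may differ.

```swift
/// Scales (width, height) so the longer side is at most maxDimension.
/// Sizes already within the limit are returned unchanged (no upscaling).
func fittedSize(width: Int, height: Int, maxDimension: Int) -> (width: Int, height: Int) {
    let longest = max(width, height)
    guard longest > maxDimension else { return (width, height) }
    let scale = Double(maxDimension) / Double(longest)
    return (Int((Double(width) * scale).rounded()),
            Int((Double(height) * scale).rounded()))
}

// A 2560x1600 Retina capture under the old and new limits.
let oldLimit = fittedSize(width: 2560, height: 1600, maxDimension: 1280)  // 1280 x 800
let newLimit = fittedSize(width: 2560, height: 1600, maxDimension: 2560)  // 2560 x 1600
```

Under the old limit each axis is halved, so a glyph covers a quarter of the pixels it did in the original capture, which is consistent with the small-text OCR misses described above.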
3. OCRTextExtractor.swift: Color inversion for dark themes
Runs OCR twice: once on the original image, once on a color-inverted version (using CoreImage CIColorInvert). Keeps whichever result contains more text. Also lowered minimumTextHeight from 0.005 to 0.002 to catch smaller text.
Results
Environment
4. LaunchAgent: Fix TCC accessibility permission persistence
Problem: accessibilityGranted always showed false after reboot, even with AgentWatch toggled ON in System Settings.
Root cause: The LaunchAgent ran a bash script (launch) as an intermediary. macOS TCC grants permissions per process, based on code signing identity. The launch bash script had its own identity (launch-3c34cc36...), so when the user granted accessibility to "AgentWatch.app", the TCC entry didn't match the actual process calling AXIsProcessTrusted().
Fix: Split into two separate LaunchAgents that launch the agent-watch binary directly:
- com.differentai.agentwatch.plist → agent-watch daemon
- com.differentai.agentwatch.serve.plist → agent-watch serve --host 127.0.0.1 --port 41733
Both have RunAtLoad: true. The TCC grant now correctly matches the process identity and persists across reboots.
Results
Environment
Related