fix: detect and dismiss Windows lock screen before each task #117

Merged
abrichr merged 4 commits into main from fix/lock-screen-dismiss
Mar 8, 2026

abrichr commented Mar 8, 2026

Summary

  • Adds _dismiss_lock_screen() to run_dc_eval.py; it detects the LogonUI.exe process and, if the screen is locked, types the password to unlock it
  • Called from ensure_waa_ready() after each successful probe, so the check runs before every task
  • Prevents eval failures when the Windows VM has been idle and the lock screen has engaged
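
A minimal sketch of the detect-then-unlock flow summarized above, not the code in run_dc_eval.py. It assumes the check parses the output of the standard Windows `tasklist` command; `run_command`, `type_text`, and `press_key` are hypothetical callables standing in for however the eval harness reaches the VM, and pressing Enter after the password is also an assumption.

```python
from typing import Callable

# Standard Windows command: /FI filters by image name, /NH drops the header row.
TASKLIST_CMD = 'tasklist /FI "IMAGENAME eq LogonUI.exe" /NH'


def lock_screen_active(tasklist_output: str) -> bool:
    """Return True if LogonUI.exe appears in the tasklist output."""
    return "logonui.exe" in tasklist_output.lower()


def dismiss_lock_screen_sketch(
    run_command: Callable[[str], str],   # hypothetical: runs a command on the VM
    type_text: Callable[[str], None],    # hypothetical: types text into the VM
    press_key: Callable[[str], None],    # hypothetical: presses a named key
    password: str,
) -> bool:
    """Type the password if the lock screen is up; return True if we tried to unlock."""
    if not lock_screen_active(run_command(TASKLIST_CMD)):
        return False  # Desktop is already visible, nothing to do.
    type_text(password)
    press_key("enter")  # Assumption: Enter submits the password field.
    return True
```

Injecting the three callables keeps the detection logic testable without a running VM.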

Context

Windows lock screen was likely a contributing factor to infrastructure failures in previous eval trials. When the VM sits idle between tasks or sessions, Windows can lock the screen. The agent then sees a login UI instead of the desktop, causing focus checks to fail and tasks to abort.

Test plan

  • Verified LogonUI.exe detection returns False on an unlocked VM
  • /evaluate endpoint confirmed working after QEMU reset (reverse proxy to container evaluate_server on port 5050)
  • Non-blocking: errors in the lock-screen check don't prevent task execution
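
The non-blocking behavior in the last item can be illustrated as follows. This is a hedged sketch of the call site, not the actual ensure_waa_ready(); `probe` and `dismiss_lock_screen` are hypothetical zero-argument callables, and logging a warning on failure is an assumption.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def ensure_ready_sketch(
    probe: Callable[[], bool],                # hypothetical readiness probe
    dismiss_lock_screen: Callable[[], bool],  # hypothetical lock-screen check
) -> bool:
    """Return True when the probe succeeds, even if the lock-screen check fails."""
    if not probe():
        return False  # Server is not ready; skip the lock-screen check entirely.
    try:
        dismiss_lock_screen()
    except Exception as exc:
        # Best effort only: a failed check must not abort the task.
        logger.warning("Lock-screen check failed, continuing: %s", exc)
    return True
```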

🤖 Generated with Claude Code

abrichr and others added 4 commits March 8, 2026 12:32
Implements the correction flywheel MVP:

- correction_store.py: JSON-file-based correction library with
  save/find (fuzzy string matching via SequenceMatcher)/load_all
- correction_capture.py: Human correction capture using openadapt-capture
  Recorder (primary) with PIL screenshot fallback
- correction_parser.py: VLM call to parse before/after screenshots
  into PlanStep dict (think/action/expect)
- demo_controller.py: Added correction_store and enable_correction_capture
  params. On retry exhaustion: check correction store -> inject match,
  or capture human correction -> parse -> store -> advance
- cli.py: Added --correction-library and --enable-correction-capture flags
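
For illustration, a JSON-file store with SequenceMatcher lookup like the one listed for correction_store.py could be sketched as below. The class name, the single-file layout, the entry keys, and the 0.8 threshold are all assumptions; only save/find/load_all and the use of difflib.SequenceMatcher come from the description.

```python
import json
from difflib import SequenceMatcher
from pathlib import Path
from typing import Dict, List, Optional


class CorrectionStoreSketch:
    """Hypothetical store: one JSON file holding a list of corrections."""

    def __init__(self, path: str, threshold: float = 0.8) -> None:
        self.path = Path(path)
        self.threshold = threshold  # Assumed minimum similarity for a match.

    def load_all(self) -> List[Dict]:
        """Return every stored entry, or an empty list if the file is missing."""
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, failed_step: str, corrected_step: Dict) -> None:
        """Append a correction keyed by the description of the failed step."""
        entries = self.load_all()
        entries.append({"failed_step": failed_step, "corrected_step": corrected_step})
        self.path.write_text(json.dumps(entries, indent=2), encoding="utf-8")

    def find(self, failed_step: str) -> Optional[Dict]:
        """Return the best fuzzy match at or above the threshold, else None."""
        best_entry, best_score = None, 0.0
        for entry in self.load_all():
            score = SequenceMatcher(
                None, failed_step.lower(), entry["failed_step"].lower()
            ).ratio()
            if score > best_score:
                best_entry, best_score = entry, score
        if best_entry is None or best_score < self.threshold:
            return None
        return best_entry["corrected_step"]
```

Lowercasing both strings before comparison is a design choice of this sketch so that case differences alone do not defeat a match.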

The loop: agent fails at step N -> correction store checked -> if match,
inject corrected step -> if no match and capture enabled, human completes
step -> Recorder captures -> VLM parses -> correction stored -> next run
retrieves it.
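
The loop above can be sketched as a single decision function. This is not demo_controller.py; the function name, the injected callables, and the returned outcome labels ("inject", "advance", "give_up") are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


def resolve_failed_step(
    step_description: str,
    find_correction: Callable[[str], Optional[Dict]],
    save_correction: Callable[[str, Dict], None],
    capture_and_parse: Optional[Callable[[str], Optional[Dict]]] = None,
) -> Tuple[str, Optional[Dict]]:
    """Decide what to do once retries for a step are exhausted."""
    stored = find_correction(step_description)
    if stored is not None:
        return "inject", stored  # Reuse a correction from an earlier run.
    if capture_and_parse is None:
        return "give_up", None  # Capture disabled and nothing stored.
    # A human completes the step; the recording is parsed into a step dict.
    parsed = capture_and_parse(step_description)
    if parsed is None:
        return "give_up", None
    save_correction(step_description, parsed)  # Available to the next run.
    return "advance", parsed
```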

17 tests added, all passing. 54 existing demo_controller tests unaffected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The test was calling the real Recorder which may not have
wait_for_ready in the installed version. Mock it to use
the simple fallback path since this is a unit test.
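
A rough sketch of that testing approach using unittest.mock. The capture function and the `capture()` method on the recorder are hypothetical stand-ins, not the real openadapt-capture API; only the wait_for_ready name and the fallback idea come from the commit message.

```python
from unittest import mock


def capture_correction_sketch(recorder_factory, fallback_capture):
    """Hypothetical capture: try the recorder first, else use the fallback."""
    try:
        recorder = recorder_factory()
        recorder.wait_for_ready()  # May be absent in older installed versions.
        return recorder.capture()  # Hypothetical method name.
    except Exception:
        return fallback_capture()


def demo_fallback_path():
    """Force the fallback by replacing the recorder factory with a failing mock."""
    broken_recorder = mock.Mock(side_effect=RuntimeError("Recorder unavailable"))
    fallback = mock.Mock(return_value="screenshot-fallback")
    result = capture_correction_sketch(broken_recorder, fallback)
    fallback.assert_called_once_with()  # The fallback ran exactly once.
    return result
```

Mocking the factory keeps the unit test independent of whichever Recorder version is installed.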

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add _dismiss_lock_screen() to run_dc_eval.py that checks for LogonUI.exe
process and types the password to unlock if the screen is locked. Called
from ensure_waa_ready() after each successful probe.

This prevents eval failures when the Windows VM has been idle and the
lock screen has engaged between tasks or between sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
abrichr force-pushed the fix/lock-screen-dismiss branch from 03e2fe7 to 8d2d11d on March 8, 2026 16:33
abrichr merged commit 4a28653 into main on Mar 8, 2026
1 check passed