fix: default strict_setup_readiness to False to avoid false infra failures #113

Merged
abrichr merged 1 commit into main from fix/focus-check-non-fatal
Mar 6, 2026

@abrichr abrichr commented Mar 6, 2026

Summary

  • Changes strict_setup_readiness default from True to False
  • The post-setup focus check (added in PR #107, "fix(waa-live): gate app readiness and classify infra setup failures") uses a11y window enumeration to verify the target app is in the foreground
  • In practice, LibreOffice takes longer to render window titles than the check timeout allows
  • This caused ALL LibreOffice tasks (3 of 4 Core4 tasks) to fail as infra failures
  • With the default set to False, the focus check still runs and logs warnings, but a failed check no longer aborts the task
  • Prior Core4 trials (7 successful runs) worked without this check
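
The difference between the strict and non-strict behavior described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `FocusResult`, `InfraSetupError`, and `handle_focus_result` are hypothetical names, and only the `strict_setup_readiness` setting and its new default of `False` come from this PR.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("setup_readiness")


class InfraSetupError(RuntimeError):
    """Hypothetical error type for a task aborted as an infra failure."""


@dataclass
class FocusResult:
    """Hypothetical outcome of the post-setup focus check."""
    ok: bool
    detail: str


def handle_focus_result(result: FocusResult, strict_setup_readiness: bool = False) -> bool:
    """Return True if the task should proceed to agent execution.

    The check result is always logged when it fails; it only becomes
    fatal when strict_setup_readiness is True.
    """
    if result.ok:
        return True
    logger.warning("post-setup focus check failed: %s", result.detail)
    if strict_setup_readiness:
        # Strict mode: classify the task as an infrastructure failure.
        raise InfraSetupError(result.detail)
    # Non-strict mode (the new default): warn and let the agent proceed.
    return True
```

With the default, a slow-to-render window title produces a warning and the task continues; passing `strict_setup_readiness=True` restores the fatal behavior.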

Test plan

  • Re-run Core4 trial — LibreOffice tasks should proceed to agent execution instead of failing on setup

🤖 Generated with Claude Code

…lures

The post-setup focus check (PR #107) defaults to strict mode, which
marks tasks as infrastructure failures when the a11y window enumeration
can't find the expected app title. In practice, LibreOffice windows
take longer to render titles than the check allows, causing ALL
LibreOffice tasks to fail as infra — even though the app IS open.

Changing default to False: focus check still runs and logs warnings,
but doesn't abort the task. The agent can recover from focus issues
on its own (it did in all prior trials without this check).

Use --strict-setup-readiness to opt into the fatal behavior when
the a11y detection is more reliable.
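
An opt-in flag of this kind is commonly wired up with `argparse` as shown below. This is a sketch under assumptions: the `build_parser` function and the parser itself are hypothetical, and only the flag name `--strict-setup-readiness` and its `False` default are taken from this commit.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical runner CLI exposing the opt-in strict flag."""
    parser = argparse.ArgumentParser(description="hypothetical runner CLI")
    parser.add_argument(
        "--strict-setup-readiness",
        action="store_true",  # flag absent -> False, flag present -> True
        default=False,
        help="treat a failed post-setup focus check as a fatal infra failure",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.strict_setup_readiness)
```

Using `store_true` keeps the non-fatal behavior as the default, so existing invocations need no change.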

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@abrichr abrichr merged commit 73111d3 into main Mar 6, 2026
1 check passed