Skip to content

[FIX] Fix SyncEnvClient loop affinity issues and restore test/runtime compatibility#396

Open
burtenshaw wants to merge 7 commits intomainfrom
fix-async-on-websockets
Open

[FIX] Fix SyncEnvClient loop affinity issues and restore test/runtime compatibility#396
burtenshaw wants to merge 7 commits intomainfrom
fix-async-on-websockets

Conversation

@burtenshaw
Copy link
Collaborator

@burtenshaw burtenshaw commented Feb 17, 2026

Summary

This PR fixes cross-event-loop failures in synchronous client usage and resolves related test
collection/execution regressions across optional env modules and script utilities.

Problem

SyncEnvClient previously executed each async call independently, which can bind connection state (especially websockets) to one loop and then use it from another. This caused runtime errors like Future attached to a different loop in sync workflows.

In parallel, several tests failed due to:

  • optional dependencies imported at module import time,
  • websocket tests using async clients with sync with blocks,
  • helper signature/behavior drift in scripts/manage_hf_collection.py.

Changes

Core sync client fix

  • src/openenv/core/sync_client.py
    • Added a dedicated background event loop thread per SyncEnvClient.
    • Added _run() using asyncio.run_coroutine_threadsafe(...) so all calls share one loop.
    • Added _stop_loop() cleanup on close() and best-effort in del.
    • Added getattr delegation to wrap coroutine methods (e.g. call_tool, list_tools) as
      sync calls.

Websocket integration tests

  • tests/envs/test_websockets.py
    • Updated sync usage to with (...).sync() as client:.

Optional dependency resilience

  • tests/envs/test_python_codeact_reset.py
  • envs/textarena_env/server/environment.py
    • Made nltk import lazy inside _ensure_nltk_data() with explicit runtime error messaging.

HF collection script compatibility

  • scripts/manage_hf_collection.py
    • Enforced explicit HF_TOKEN requirement in setup_api() (matches test expectations).
    • Restored backward-compatible defaults for:
      • get_collection_spaces(...),
      • discover_openenv_spaces(...),
      • add_spaces_to_collection(...).
    • Hardened find_collection_by_title() for non-iterable mocked responses.
    • Updated main() call-site argument style for compatibility with existing tests.
    • Removed trailing whitespace.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Feb 17, 2026
@burtenshaw burtenshaw changed the title [FIX] use a stateful asyn runner [FIX] Fix SyncEnvClient loop affinity issues and restore test/runtime compatibility Feb 17, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 17, 2026

Greptile Summary

Replaced the run_async_safely() pattern with a stateful, per-instance event loop thread to fix cross-event-loop affinity issues in SyncEnvClient. This prevents "Future attached to a different loop" errors when WebSocket connections and other async state is created in one loop and used in another.

Key changes:

  • Added dedicated background event loop thread per SyncEnvClient instance with lazy initialization
  • Implemented _run() using asyncio.run_coroutine_threadsafe() to ensure all async calls share the same loop
  • Added proper cleanup via _stop_loop() in close() and best-effort cleanup in __del__
  • Added __getattr__ delegation to automatically wrap coroutine methods as sync calls
  • Removed dependency on run_async_safely() utility function

Issues identified:

  • Race condition in _ensure_loop() when multiple threads initialize concurrently
  • Outdated docstring still references removed run_async_safely()
  • Performance consideration: __getattr__ creates new wrapper functions on every access

Confidence Score: 3/5

  • This PR fixes a real issue but introduces a race condition that needs addressing before merge
  • The approach correctly solves the cross-event-loop affinity problem by maintaining a dedicated loop per client instance. However, there's a critical race condition in _ensure_loop() where concurrent initialization from multiple threads could start multiple loop threads. Additionally, the docstring is outdated. These are fixable issues, but the race condition could cause subtle bugs in multi-threaded usage.
  • src/openenv/core/sync_client.py needs race condition fix in _ensure_loop() and docstring update

Important Files Changed

Filename Overview
src/openenv/core/sync_client.py Replaced run_async_safely with dedicated event loop per SyncEnvClient instance to fix cross-loop affinity issues; added getattr delegation and proper cleanup

Sequence Diagram

sequenceDiagram
    participant User
    participant SyncEnvClient
    participant LoopThread as Background Loop Thread
    participant AsyncClient as EnvClient (async)
    participant Server

    User->>SyncEnvClient: __init__(async_client)
    Note over SyncEnvClient: _loop = None<br/>_loop_thread = None

    User->>SyncEnvClient: connect() / __enter__()
    SyncEnvClient->>SyncEnvClient: _ensure_loop()
    alt Loop not started
        SyncEnvClient->>LoopThread: Start thread (daemon=True)
        LoopThread->>LoopThread: Create new event loop
        LoopThread->>LoopThread: loop.run_forever()
        LoopThread-->>SyncEnvClient: _loop_ready.set()
    end
    
    SyncEnvClient->>SyncEnvClient: _run(async_client.connect())
    SyncEnvClient->>LoopThread: asyncio.run_coroutine_threadsafe(coro, loop)
    LoopThread->>AsyncClient: connect()
    AsyncClient->>Server: WebSocket connection
    Server-->>AsyncClient: Connected
    AsyncClient-->>LoopThread: Result
    LoopThread-->>SyncEnvClient: future.result()

    User->>SyncEnvClient: step(action)
    SyncEnvClient->>SyncEnvClient: _run(async_client.step(action))
    SyncEnvClient->>LoopThread: run_coroutine_threadsafe
    LoopThread->>AsyncClient: step(action)
    AsyncClient->>Server: Send action
    Server-->>AsyncClient: StepResult
    AsyncClient-->>LoopThread: Result
    LoopThread-->>SyncEnvClient: future.result()
    SyncEnvClient-->>User: StepResult

    User->>SyncEnvClient: close() / __exit__()
    SyncEnvClient->>SyncEnvClient: _run(async_client.close())
    SyncEnvClient->>LoopThread: run_coroutine_threadsafe
    LoopThread->>AsyncClient: close()
    AsyncClient->>Server: Disconnect
    Server-->>AsyncClient: Closed
    AsyncClient-->>LoopThread: Done
    LoopThread-->>SyncEnvClient: Complete
    
    SyncEnvClient->>SyncEnvClient: _stop_loop()
    SyncEnvClient->>LoopThread: loop.call_soon_threadsafe(loop.stop)
    LoopThread->>LoopThread: loop.stop()
    SyncEnvClient->>LoopThread: thread.join(timeout=5)
    Note over SyncEnvClient,LoopThread: Cleanup complete
Loading

Last reviewed commit: a28ba18

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 17, 2026

Additional Comments (1)

src/openenv/core/sync_client.py
Outdated docstring references run_async_safely() which has been removed. Update to reflect the new dedicated event loop approach.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/openenv/core/sync_client.py
Line: 48:52

Comment:
Outdated docstring references `run_async_safely()` which has been removed. Update to reflect the new dedicated event loop approach.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant