[2/3] Add HarnessEnvironment with multi-turn support (RFC 005)#390

Merged
Darktex merged 3 commits into feature/issue-385-harness-types from feature/issue-385-harness-environment on Feb 19, 2026

@Darktex Darktex commented Feb 17, 2026

Summary

Stacked on #389 (foundation types).

Implements HarnessEnvironment, which wraps an external agentic harness with OpenEnv's Gym-style API.

Key semantics

  • reset(): Stops any running harness, injects MCP tools, starts a fresh process and conversation
  • step(HarnessAction): Sends one conversational turn to the harness. The harness does its ReAct loop (potentially many LLM calls and tool invocations) and returns when it has a response
  • Multi-turn: The harness maintains conversation context across step() calls. Multiple steps form a conversation within one episode
  • done signal: Propagated from HarnessResponse.done to Observation.done
  • Trajectory: Events accumulated across all turns, accessible via .trajectory property
  • close(): Cleans up the harness process
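The lifecycle above can be illustrated with a toy model. This is a minimal sketch, not the code in environment.py: `ToyAdapter`, `ToyResponse`, and `ToyHarnessEnv` are hypothetical stand-ins, observations are plain dicts, and MCP tool injection is omitted. It only shows how conversation context, the done signal, and the trajectory behave across several step() calls in one episode.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResponse:
    """Hypothetical stand-in for HarnessResponse."""
    text: str
    events: list = field(default_factory=list)
    done: bool = False


class ToyAdapter:
    """Hypothetical adapter that finishes on its second turn."""

    def __init__(self):
        self.alive = False
        self.history = []  # conversation context kept across turns

    def start(self):
        self.alive = True
        self.history = []

    def stop(self):
        self.alive = False

    def send_message(self, message):
        self.history.append(message)
        turn = len(self.history)
        # Pretend the harness made one LLM call and one tool call this turn.
        events = [f"turn{turn}:llm_call", f"turn{turn}:tool_call"]
        return ToyResponse(text=f"reply {turn}", events=events, done=turn >= 2)


class ToyHarnessEnv:
    """Toy environment mirroring the reset/step/close semantics listed above."""

    def __init__(self, adapter):
        self.adapter = adapter
        self.trajectory = []

    def reset(self):
        if self.adapter.alive:  # stop any running harness first
            self.adapter.stop()
        self.trajectory = []
        self.adapter.start()  # fresh process and conversation
        return {"response": "", "done": False}

    def step(self, message):
        resp = self.adapter.send_message(message)  # one conversational turn
        self.trajectory.extend(resp.events)  # accumulate across turns
        return {"response": resp.text, "turn_events": resp.events, "done": resp.done}

    def close(self):
        if self.adapter.alive:
            self.adapter.stop()


env = ToyHarnessEnv(ToyAdapter())
obs = env.reset()
for msg in ["Fix the bug", "Tests still failing"]:
    obs = env.step(msg)
    if obs["done"]:
        break
env.close()
```

In this toy run the episode ends on the second step because the adapter reports done, and the trajectory holds the events from both turns.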

Files

  • src/openenv/core/harnesses/environment.py - HarnessEnvironment implementation
  • tests/core/test_harnesses/test_harness_environment.py - 26 tests with MockHarnessAdapter

Test plan

  • 26 tests covering reset, multi-turn step, trajectory, state, close, MCP injection
  • 69 total harness tests pass (types + environment)
  • No regressions in MCP tests
  • Lint clean

Part of #385

@meta-cla meta-cla bot added the CLA Signed label Feb 17, 2026
@Darktex Darktex mentioned this pull request Feb 17, 2026
4 tasks

greptile-apps bot commented Feb 17, 2026

Greptile Summary

This PR implements HarnessEnvironment, a new Environment subclass that wraps external agentic harnesses (OpenClaw, Claude Code, etc.) with OpenEnv's Gym-style API, as specified in RFC 005. Each step() is one conversational turn, and the harness maintains conversation context across turns within an episode.

  • Core reset()/step()/close() lifecycle is well-implemented with proper trajectory accumulation, state management, and done-signal propagation from the harness
  • Bug in _get_mcp_tool_definitions: The method splits fastmcp.Client's async context manager (__aenter__, list_tools, __aexit__) across three separate run_async_safely() calls, each of which creates a new event loop — the client state won't persist across these boundaries. The rest of the codebase uses a single async with block (see MCPEnvironment._async_list_tools). This needs to be fixed before merge
  • Silent except Exception: return [] in _get_mcp_tool_definitions will hide failures during MCP tool injection, making debugging difficult
  • Test suite is comprehensive (26 tests) with a well-designed MockHarnessAdapter
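The two issues flagged for _get_mcp_tool_definitions can be sketched as follows. This is an illustrative pattern under stated assumptions, not the merged fix: `FakeMCPClient` is a hypothetical stand-in for the real MCP client, and `asyncio.run` stands in for the project's run_async_safely helper. The point is that entering the context manager, listing tools, and exiting all happen inside one coroutine, so they share a single event loop, and that a failure is logged rather than silently turned into an empty list.

```python
import asyncio
import logging

logger = logging.getLogger("harness_sketch")


class FakeMCPClient:
    """Hypothetical stand-in for an MCP client used as an async context manager."""

    def __init__(self):
        self.connected = False

    async def __aenter__(self):
        self.connected = True
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.connected = False
        return False

    async def list_tools(self):
        if not self.connected:
            raise RuntimeError("client is not connected")
        return ["read_file", "run_tests"]


async def _list_tools(client):
    # Enter, call, and exit inside ONE coroutine so all three share one event loop.
    async with client as connected:
        return await connected.list_tools()


def get_tool_definitions(client):
    try:
        # asyncio.run is an assumption here; the PR uses run_async_safely.
        return asyncio.run(_list_tools(client))
    except Exception as exc:
        # Log instead of silently returning [] so failures stay visible.
        logger.warning("MCP tool extraction failed: %s", exc)
        return []


tools = get_tool_definitions(FakeMCPClient())
```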

ALIGNMENT FLAG: RFC 005 specifies HarnessEnvironment(MCPEnvironment) but implementation uses HarnessEnvironment(Environment)

  • Principle at stake: RFC adherence — key decisions are documented in RFCs and should not be changed without discussion
  • The concern: The base class change may be intentional (since harness handles MCP differently than direct tool serving), but the deviation from RFC 005's design section should be explicitly acknowledged
  • Suggested reviewer: @Darktex

Confidence Score: 3/5

  • The core harness lifecycle is sound, but the MCP tool injection method has a bug that will cause runtime failures when MCP tools are present.
  • Score of 3 reflects that the main environment logic (reset/step/close/trajectory) is correct and well-tested, but _get_mcp_tool_definitions has a concrete bug (async context manager split across event loops) that will break MCP tool injection at runtime. The bug is in a non-default code path (only triggered when mcp is provided), so the core functionality works. Alignment question about base class deviation from RFC is also unresolved.
  • src/openenv/core/harnesses/environment.py — specifically the _get_mcp_tool_definitions method (lines 154-167)

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/openenv/core/harnesses/__init__.py | Adds HarnessEnvironment to the package exports. Clean and straightforward change. |
| src/openenv/core/harnesses/environment.py | New HarnessEnvironment implementation. _get_mcp_tool_definitions has a bug: it splits an async context manager (__aenter__/list_tools/__aexit__) across separate event loops via run_async_safely, which will fail at runtime. Silent exception swallowing hides errors. The rest of the implementation (reset, step, close, trajectory) is clean and well-structured. |
| tests/core/test_harnesses/test_harness_environment.py | Comprehensive test suite with 26 tests covering reset, step, multi-turn, trajectory, state, close, and MCP integration. Good use of MockHarnessAdapter. The MCP injection test (test_mcp_tools_injected_on_reset) only asserts is not None rather than checking actual tool contents, which is weak but non-blocking. |

Sequence Diagram

sequenceDiagram
    participant TL as Training Loop
    participant HE as HarnessEnvironment
    participant HA as HarnessAdapter
    participant H as Harness (ReAct Loop)

    TL->>HE: reset(episode_id)
    HE->>HA: is_alive()
    HA-->>HE: true/false
    opt if alive
        HE->>HA: stop()
    end
    opt if MCP server present
        HE->>HE: _get_mcp_tool_definitions()
        HE->>HA: inject_tools(tools)
    end
    HE->>HA: start(working_directory)
    HE-->>TL: Observation(done=false)

    TL->>HE: step(HarnessAction("Fix the bug"))
    HE->>HA: send_message("Fix the bug")
    HA->>H: message
    H->>H: LLM calls + tool invocations
    H-->>HA: HarnessResponse(events, done=false)
    HA-->>HE: HarnessResponse
    HE->>HE: accumulate trajectory
    HE-->>TL: Observation(response, turn_events, done=false)

    TL->>HE: step(HarnessAction("Tests still failing"))
    HE->>HA: send_message("Tests still failing")
    HA->>H: message
    H->>H: LLM calls + tool invocations
    H-->>HA: HarnessResponse(events, done=true)
    HA-->>HE: HarnessResponse
    HE->>HE: accumulate trajectory
    HE-->>TL: Observation(response, turn_events, done=true)

    TL->>HE: close()
    HE->>HA: is_alive()
    HA-->>HE: true
    HE->>HA: stop()

Last reviewed commit: bf89368

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 2 comments

@Darktex Darktex force-pushed the feature/issue-385-harness-environment branch 2 times, most recently from 37a0876 to d8b1362, February 17, 2026 21:50
@Darktex Darktex changed the title from "Add HarnessEnvironment with multi-turn support (RFC 005)" to "[3/4] Add HarnessEnvironment with multi-turn support (RFC 005)" Feb 17, 2026
Implements HarnessEnvironment, which wraps an external agentic harness
with OpenEnv's Gym-style API:

- reset() stops any running harness, injects MCP tools, starts fresh
- step(HarnessAction) sends one conversational turn to the harness
- Harness maintains conversation context across step() calls
- done signal propagated from harness to observation
- Trajectory accumulated across turns, accessible via .trajectory
- close() cleans up harness process

26 tests covering reset, multi-turn step, trajectory accumulation,
state management, close behavior, and MCP tool injection.

Part of #385
- Fix broken async context manager in _get_mcp_tool_definitions:
  use single async function with 'async with' instead of separate
  run_async_safely calls for __aenter__/__aexit__
- Add warning log on MCP tool extraction failure instead of
  silently swallowing exceptions
@Darktex Darktex changed the title from "[3/4] Add HarnessEnvironment with multi-turn support (RFC 005)" to "[2/3] Add HarnessEnvironment with multi-turn support (RFC 005)" Feb 18, 2026
@Darktex Darktex force-pushed the feature/issue-385-harness-types branch from a993c0c to 2408704, February 18, 2026 07:06
@Darktex Darktex force-pushed the feature/issue-385-harness-environment branch from d8b1362 to f9f77c6, February 18, 2026 07:06
* Add OpenClaw adapter implementation (RFC 005)

Concrete HarnessAdapter for the OpenClaw agentic platform:

- Process lifecycle: start/stop via asyncio subprocess
- MCP tool injection: writes mcpServers config to openclaw.json,
  merges with existing config entries
- Communication: JSON-line protocol over stdin/stdout
- Event extraction: parses tool_calls from responses into HarnessEvents
- Streaming: yields events from turn response
- Error handling: timeout detection, plain-text fallback
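The JSON-line communication with a plain-text fallback listed above might look roughly like this. It is a sketch only: the function name `parse_response_line` and the "response" field are assumptions for illustration, and the actual OpenClaw message schema is not shown in this PR. The "tool_calls" key follows the commit message's wording.

```python
import json


def parse_response_line(line):
    """Parse one line of harness output; fall back to plain text if not JSON."""
    line = line.strip()
    try:
        payload = json.loads(line)
    except json.JSONDecodeError:
        # Not JSON at all: treat the whole line as the response text.
        return {"response": line, "tool_calls": []}
    if not isinstance(payload, dict):
        # Valid JSON but not an object (e.g. a bare number): same fallback.
        return {"response": line, "tool_calls": []}
    return {
        "response": payload.get("response", ""),  # hypothetical field name
        "tool_calls": payload.get("tool_calls", []),
    }
```

Each entry in "tool_calls" could then be turned into one event for the turn; how the adapter maps them to HarnessEvents is not specified here.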

18 tests covering imports, config injection, process lifecycle,
message sending with JSON/plain-text responses, tool call extraction,
streaming, and HarnessEnvironment integration.

Part of #385

* Address review feedback: move import os to top of file

* Fix env variable bug and add missing test coverage

- Fix critical bug: env vars now merge with parent env (os.environ)
  instead of replacing it, preserving PATH, PYTHONPATH, etc.
- When no env_vars or api_key configured, pass None to inherit parent
- Add test for parent env inheritance with custom env vars
- Add test for None env when no overrides configured
- Add test for kill path when terminate times out
- Add test for send_message timeout handling
- Add test for corrupted config file handling
- Add subprocess pipe parameter assertions to start test
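The environment-variable fix described above can be sketched as a small helper. This is an assumption-laden illustration, not the adapter's code: `build_subprocess_env` and the `HARNESS_API_KEY` variable name are hypothetical. It shows the two behaviors from the commit message: overrides are merged on top of the parent environment so PATH and similar variables survive, and None is returned when nothing is configured so the child process inherits the parent environment unchanged.

```python
import os


def build_subprocess_env(env_vars=None, api_key=None, api_key_name="HARNESS_API_KEY"):
    """Return an env mapping for a subprocess, or None to inherit the parent's."""
    if not env_vars and not api_key:
        return None  # subprocess APIs treat env=None as "inherit parent env"
    merged = dict(os.environ)  # start from the parent env, do not replace it
    merged.update(env_vars or {})
    if api_key:
        merged[api_key_name] = api_key  # hypothetical variable name
    return merged
```

The returned value can be passed as the `env` argument of `asyncio.create_subprocess_exec` or `subprocess.Popen`, both of which inherit the current process environment when `env` is None.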
@Darktex Darktex merged commit 49e9fb6 into feature/issue-385-harness-types Feb 19, 2026