feat: add ROCK + iFlow CLI environment for agentic bash RL training#1

Open
shamanez wants to merge 3 commits into initial-agent from agent-roll-iflow

Conversation

@shamanez

Summary

  • New RockBashNativeEnv — an RL environment that connects ROLL's agentic pipeline with iFlow CLI running inside ROCK sandboxes, enabling RL training on autonomous bash/CLI tasks.
  • Bash task dataset (data/bash_tasks.jsonl) — 20 graded coding tasks (file creation, scripting, pytest, HTTP servers) with automated test-based reward signals.
  • Production config + launch script for L4×4 GPU setups using Qwen2.5-0.5B-Instruct with Megatron training + vLLM inference.
  • How-to guides and quick tests for onboarding.
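
For reference, a single record in `data/bash_tasks.jsonl` might look like the following. This is an illustrative sketch: the `test_command` field is described in this PR, but the other field names and the exact schema are assumptions, not the actual file format.

```json
{"task_id": "bash_001", "prompt": "Create hello.txt containing the line 'hello world'.", "test_command": "grep -q 'hello world' hello.txt"}
```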

How ROLL connects to iFlow inside ROCK

The core integration lives in roll/pipeline/agentic/env/rock_bash/env.py (RockBashNativeEnv).

The architecture is:

ROLL AgenticPipeline
  └─ RolloutScheduler
       └─ AgentNativeStepEnvManager
            └─ RockBashNativeEnv
                 └─ ROCK Sandbox (python:3.11 container)
                      ├─ iFlow CLI (autonomous coding agent)
                      └─ ModelProxyService (file-based IPC)

On first reset(), the environment:

  1. Provisions a ROCK sandbox container via rock.sdk
  2. Installs Node.js, iFlow CLI, and ModelProxyService inside the sandbox
  3. Uploads iFlow settings (tool config, model endpoint, timeouts)

Sandboxes are reused across episodes; only the iFlow CLI process is restarted per task.

Each episode:

  1. _prepare_new_episode() kills the old iFlow process, clears the IPC log, restarts ModelProxyService, and launches iflow -p "<task prompt>" --yolo
  2. iFlow CLI autonomously works on the task, making LLM calls whenever it needs to think/plan
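
The lifecycle above can be sketched as follows. This is a hedged illustration of the control flow, not the actual `env.py` code: all names except `RockBashNativeEnv` and the `iflow -p ... --yolo` command line are hypothetical, and the real environment drives `rock.sdk` instead of these stubs.

```python
# Illustrative sketch of the sandbox-reuse and per-episode lifecycle.
# The real env provisions containers via rock.sdk; these stubs only
# model the described control flow.
import shlex


class RockBashNativeEnvSketch:
    def __init__(self):
        self.sandbox = None        # provisioned once, reused across episodes
        self.provision_count = 0   # how often the install cost is paid

    def _provision_sandbox(self):
        # Real env: start a python:3.11 container via rock.sdk, install
        # Node.js, iFlow CLI, and ModelProxyService, then upload the
        # iFlow settings (tool config, model endpoint, timeouts).
        self.provision_count += 1
        return {"image": "python:3.11"}

    def _prepare_new_episode(self, task_prompt):
        # Real env: kill the old iFlow process, clear the IPC log,
        # restart ModelProxyService, then launch iFlow on the new task.
        return f"iflow -p {shlex.quote(task_prompt)} --yolo"

    def reset(self, task_prompt):
        if self.sandbox is None:   # first reset only
            self.sandbox = self._provision_sandbox()
        return self._prepare_new_episode(task_prompt)


env = RockBashNativeEnvSketch()
cmd = env.reset("create hello.txt")
env.reset("add a pytest for utils.py")
assert env.provision_count == 1    # sandbox reused, no reinstall
```

The key property this models is the no-reinstall guarantee: the expensive provisioning path runs once, while every subsequent `reset()` only restarts the iFlow process.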

How iFlow talks to the vLLM policy model during training

This is the key RL loop: iFlow CLI does not call an external API. Instead, its LLM requests are intercepted and served by ROLL's own policy model (vLLM), ensuring zero distribution shift between the model being trained and the model generating actions.

The mechanism is file-based IPC via /data/logs/LLMService.log:

iFlow CLI ──writes──▶ LLM_REQUEST_START{openai_json}LLM_REQUEST_END{meta}
                                          │
                               RockBashNativeEnv._poll_for_request()
                               reads the log file, parses the request
                                          │
                                          ▼
                          ROLL policy (vLLM actor_infer cluster)
                          generates tokens via RequestScheduler
                                          │
                               RockBashNativeEnv._write_response()
                               appends response to the same log file
                                          │
                                          ▼
iFlow CLI ──reads───▶ LLM_RESPONSE_START{response_json}LLM_RESPONSE_END{meta}
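
The request side of this framing can be sketched as below, assuming each request is a single `LLM_REQUEST_START{json}LLM_REQUEST_END` record in the log; the actual parsing in `env.py` may differ, and the function names are assumptions modeled on `_poll_for_request()` / `_write_response()`.

```python
# Hedged sketch of the file-IPC framing: parse the latest request
# record out of the shared log text and append a mirrored response.
import json
import re

REQUEST_RE = re.compile(r"LLM_REQUEST_START(\{.*?\})LLM_REQUEST_END", re.DOTALL)


def poll_for_request(log_text):
    """Return the latest OpenAI-style request payload found in the log."""
    matches = REQUEST_RE.findall(log_text)
    return json.loads(matches[-1]) if matches else None


def append_response(log_text, response):
    """Mirror the request framing when writing the policy model's reply."""
    return log_text + "LLM_RESPONSE_START" + json.dumps(response) + "LLM_RESPONSE_END\n"


log = ('LLM_REQUEST_START{"model": "qwen", "messages": '
       '[{"role": "user", "content": "plan next step"}]}'
       'LLM_REQUEST_END{"id": 1}\n')
request = poll_for_request(log)
assert request["model"] == "qwen"
log = append_response(log, {"choices": [{"message": {"role": "assistant", "content": "ls"}}]})
```

Because both sides append to the same file, the START/END markers are what delimit one JSON payload from the next; anything outside a marker pair is ignored.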

The ModelProxyService (installed via rl_rock[model-service]) handles the file IPC on the sandbox side. iFlow CLI is configured with baseUrl: "http://127.0.0.1:8080/v1/" and apiKey: "training" — it thinks it's talking to a local OpenAI-compatible API, but the requests are actually intercepted by the file IPC layer and routed to ROLL's vLLM workers.
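
Based on the values quoted above, the relevant part of the uploaded iFlow settings would look roughly like this; any keys beyond these two are omitted here because the actual schema is not shown in this PR:

```json
{
  "baseUrl": "http://127.0.0.1:8080/v1/",
  "apiKey": "training"
}
```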

After each episode completes (iFlow writes SESSION_END), the environment runs the task's test_command inside the sandbox and returns a binary reward (1.0 = pass, 0.0 = fail).
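
The binary reward computation described above amounts to mapping the test command's exit status to 1.0 or 0.0. A minimal local sketch (the real env runs `test_command` inside the ROCK sandbox, and the function name and timeout are assumptions):

```python
# Hedged sketch of end-of-episode evaluation: run the task's test
# command and convert its exit status into a binary reward.
import subprocess


def evaluate_episode(test_command, timeout_s=60):
    """Return 1.0 if the task's test command exits 0, else 0.0."""
    try:
        completed = subprocess.run(
            test_command, shell=True, timeout=timeout_s, capture_output=True
        )
    except subprocess.TimeoutExpired:
        return 0.0  # hung tests count as failures
    return 1.0 if completed.returncode == 0 else 0.0


assert evaluate_episode("exit 0") == 1.0   # passing test -> reward 1.0
assert evaluate_episode("exit 1") == 0.0   # failing test -> reward 0.0
```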

Files changed

| File | Description |
| --- | --- |
| `roll/pipeline/agentic/env/rock_bash/env.py` | Core environment: sandbox lifecycle, file IPC, task evaluation |
| `roll/pipeline/agentic/env/rock_bash/__init__.py` | Module export |
| `roll/pipeline/agentic/env/__init__.py` | Registers `rock_bash_native_env` in GEM |
| `examples/agentic_demo/agent_bash_iflow_prod.yaml` | Production config (Megatron + vLLM, L4×4) |
| `examples/agentic_demo/run_agentic_pipeline_bash_iflow.sh` | Launch script |
| `examples/start_agentic_pipeline.py` | Minor whitespace changes |
| `data/bash_tasks.jsonl` | 20 bash tasks with test-based rewards |
| `agent_how_to_guide/` | Setup guides, install script, quick tests |
| `tests/agentic/env_manager/test_traj_env_manager.py` | Adds Ray debugger helpers |

Test plan

  • Verify ROCK sandbox provisioning works (curl http://localhost:8080 returns {"message":"hello, ROCK!"})
  • Run with small config: python examples/start_agentic_pipeline.py --config_path agentic_demo --config_name agent_bash_iflow_prod
  • Verify iFlow CLI installs correctly inside sandbox
  • Verify file IPC loop: request → vLLM generation → response → next request
  • Verify task evaluation: test_command runs and returns correct reward
  • Check sandbox reuse across episodes (no reinstall overhead after first episode)

🤖 Generated with Claude Code

@shamanez shamanez changed the base branch from main to initial-agent March 13, 2026 04:10