feat: add ROCK + iFlow CLI environment for agentic bash RL training#1

Open
shamanez wants to merge 3 commits into initial-agent from agent-roll-iflow

Conversation

@shamanez

Summary

  • New RockBashNativeEnv — an RL environment that connects ROLL's agentic pipeline with iFlow CLI running inside ROCK sandboxes, enabling RL training on autonomous bash/CLI tasks.
  • Bash task dataset (data/bash_tasks.jsonl) — 20 graded coding tasks (file creation, scripting, pytest, HTTP servers) with automated test-based reward signals.
  • Production config + launch script for L4×4 GPU setups using Qwen2.5-0.5B-Instruct with Megatron training + vLLM inference.
  • How-to guides and quick tests for onboarding.
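
For reference, a single record in `data/bash_tasks.jsonl` might look like the following. This is an illustrative sketch: the `test_command` field is described in this PR, but the other field names and the exact schema are assumptions, not the actual file format.

```json
{"task_id": "bash_001", "prompt": "Create hello.txt containing the line 'hello world'.", "test_command": "grep -q 'hello world' hello.txt"}
```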

How ROLL connects to iFlow inside ROCK

The core integration lives in roll/pipeline/agentic/env/rock_bash/env.py (RockBashNativeEnv).

The architecture is:

ROLL AgenticPipeline
  └─ RolloutScheduler
       └─ AgentNativeStepEnvManager
            └─ RockBashNativeEnv
                 └─ ROCK Sandbox (python:3.11 container)
                      ├─ iFlow CLI (autonomous coding agent)
                      └─ ModelProxyService (file-based IPC)

On first reset(), the environment:

  1. Provisions a ROCK sandbox container via rock.sdk
  2. Installs Node.js, iFlow CLI, and ModelProxyService inside the sandbox
  3. Uploads iFlow settings (tool config, model endpoint, timeouts)

Sandboxes are reused across episodes; only the iFlow CLI process is restarted per task.

Each episode:

  1. _prepare_new_episode() kills the old iFlow process, clears the IPC log, restarts ModelProxyService, and launches iflow -p "<task prompt>" --yolo
  2. iFlow CLI autonomously works on the task, making LLM calls whenever it needs to think/plan
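
The lifecycle above can be sketched as follows. This is a hedged illustration of the control flow, not the actual `env.py` code: all names except `RockBashNativeEnv` and the `iflow -p ... --yolo` command line are hypothetical, and the real environment drives `rock.sdk` instead of these stubs.

```python
# Illustrative sketch of the sandbox-reuse and per-episode lifecycle.
# The real env provisions containers via rock.sdk; these stubs only
# model the described control flow.
import shlex


class RockBashNativeEnvSketch:
    def __init__(self):
        self.sandbox = None        # provisioned once, reused across episodes
        self.provision_count = 0   # how often the install cost is paid

    def _provision_sandbox(self):
        # Real env: start a python:3.11 container via rock.sdk, install
        # Node.js, iFlow CLI, and ModelProxyService, then upload the
        # iFlow settings (tool config, model endpoint, timeouts).
        self.provision_count += 1
        return {"image": "python:3.11"}

    def _prepare_new_episode(self, task_prompt):
        # Real env: kill the old iFlow process, clear the IPC log,
        # restart ModelProxyService, then launch iFlow on the new task.
        return f"iflow -p {shlex.quote(task_prompt)} --yolo"

    def reset(self, task_prompt):
        if self.sandbox is None:   # first reset only
            self.sandbox = self._provision_sandbox()
        return self._prepare_new_episode(task_prompt)


env = RockBashNativeEnvSketch()
cmd = env.reset("create hello.txt")
env.reset("add a pytest for utils.py")
assert env.provision_count == 1    # sandbox reused, no reinstall
```

The key property this models is the no-reinstall guarantee: the expensive provisioning path runs once, while every subsequent `reset()` only restarts the iFlow process.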

How iFlow talks to the vLLM policy model during training

This is the key RL loop: iFlow CLI does not call an external API. Instead, its LLM requests are intercepted and served by ROLL's own policy model (vLLM), ensuring zero distribution shift between the model being trained and the model generating actions.

The mechanism is file-based IPC via /data/logs/LLMService.log:

iFlow CLI ──writes──▶ LLM_REQUEST_START{openai_json}LLM_REQUEST_END{meta}
                                          │
                               RockBashNativeEnv._poll_for_request()
                               reads the log file, parses the request
                                          │
                                          ▼
                          ROLL policy (vLLM actor_infer cluster)
                          generates tokens via RequestScheduler
                                          │
                               RockBashNativeEnv._write_response()
                               appends response to the same log file
                                          │
                                          ▼
iFlow CLI ──reads───▶ LLM_RESPONSE_START{response_json}LLM_RESPONSE_END{meta}
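
The request side of this framing can be sketched as below, assuming each request is a single `LLM_REQUEST_START{json}LLM_REQUEST_END` record in the log; the actual parsing in `env.py` may differ, and the function names are assumptions modeled on `_poll_for_request()` / `_write_response()`.

```python
# Hedged sketch of the file-IPC framing: parse the latest request
# record out of the shared log text and append a mirrored response.
import json
import re

REQUEST_RE = re.compile(r"LLM_REQUEST_START(\{.*?\})LLM_REQUEST_END", re.DOTALL)


def poll_for_request(log_text):
    """Return the latest OpenAI-style request payload found in the log."""
    matches = REQUEST_RE.findall(log_text)
    return json.loads(matches[-1]) if matches else None


def append_response(log_text, response):
    """Mirror the request framing when writing the policy model's reply."""
    return log_text + "LLM_RESPONSE_START" + json.dumps(response) + "LLM_RESPONSE_END\n"


log = ('LLM_REQUEST_START{"model": "qwen", "messages": '
       '[{"role": "user", "content": "plan next step"}]}'
       'LLM_REQUEST_END{"id": 1}\n')
request = poll_for_request(log)
assert request["model"] == "qwen"
log = append_response(log, {"choices": [{"message": {"role": "assistant", "content": "ls"}}]})
```

Because both sides append to the same file, the START/END markers are what delimit one JSON payload from the next; anything outside a marker pair is ignored.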

The ModelProxyService (installed via rl_rock[model-service]) handles the file IPC on the sandbox side. iFlow CLI is configured with baseUrl: "http://127.0.0.1:8080/v1/" and apiKey: "training" — it thinks it's talking to a local OpenAI-compatible API, but the requests are actually intercepted by the file IPC layer and routed to ROLL's vLLM workers.
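
Based on the values quoted above, the relevant part of the uploaded iFlow settings would look roughly like this; any keys beyond these two are omitted here because the actual schema is not shown in this PR:

```json
{
  "baseUrl": "http://127.0.0.1:8080/v1/",
  "apiKey": "training"
}
```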

After each episode completes (iFlow writes SESSION_END), the environment runs the task's test_command inside the sandbox and returns a binary reward (1.0 = pass, 0.0 = fail).
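
The binary reward computation described above amounts to mapping the test command's exit status to 1.0 or 0.0. A minimal local sketch (the real env runs `test_command` inside the ROCK sandbox, and the function name and timeout are assumptions):

```python
# Hedged sketch of end-of-episode evaluation: run the task's test
# command and convert its exit status into a binary reward.
import subprocess


def evaluate_episode(test_command, timeout_s=60):
    """Return 1.0 if the task's test command exits 0, else 0.0."""
    try:
        completed = subprocess.run(
            test_command, shell=True, timeout=timeout_s, capture_output=True
        )
    except subprocess.TimeoutExpired:
        return 0.0  # hung tests count as failures
    return 1.0 if completed.returncode == 0 else 0.0


assert evaluate_episode("exit 0") == 1.0   # passing test -> reward 1.0
assert evaluate_episode("exit 1") == 0.0   # failing test -> reward 0.0
```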

Files changed

| File | Description |
| --- | --- |
| `roll/pipeline/agentic/env/rock_bash/env.py` | Core environment: sandbox lifecycle, file IPC, task evaluation |
| `roll/pipeline/agentic/env/rock_bash/__init__.py` | Module export |
| `roll/pipeline/agentic/env/__init__.py` | Registers `rock_bash_native_env` in GEM |
| `examples/agentic_demo/agent_bash_iflow_prod.yaml` | Production config (Megatron + vLLM, L4×4) |
| `examples/agentic_demo/run_agentic_pipeline_bash_iflow.sh` | Launch script |
| `examples/start_agentic_pipeline.py` | Minor whitespace changes |
| `data/bash_tasks.jsonl` | 20 bash tasks with test-based rewards |
| `agent_how_to_guide/` | Setup guides, install script, quick tests |
| `tests/agentic/env_manager/test_traj_env_manager.py` | Adds Ray debugger helpers |

Test plan

  • Verify ROCK sandbox provisioning works (curl http://localhost:8080 returns {"message":"hello, ROCK!"})
  • Run with small config: python examples/start_agentic_pipeline.py --config_path agentic_demo --config_name agent_bash_iflow_prod
  • Verify iFlow CLI installs correctly inside sandbox
  • Verify file IPC loop: request → vLLM generation → response → next request
  • Verify task evaluation: test_command runs and returns correct reward
  • Check sandbox reuse across episodes (no reinstall overhead after first episode)

🤖 Generated with Claude Code

@shamanez shamanez changed the base branch from main to initial-agent March 13, 2026 04:10