(WIP) Add Generic OpenEnv GRPO Training Framework #733

wukaixingxp · 2026-01-27T18:52:38Z

Description

Add Generic OpenEnv GRPO Training Framework

A centralized framework for training language models on any OpenEnv task using GRPO/DAPO without
requiring environment-specific packages locally.

Key Features

GenericEnvClient: Works with ANY OpenEnv Docker image - no local environment packages needed
GenericAction: Simple dict wrapper that maps to environment-specific actions at runtime
Single Main Script: One main_generic.py handles all OpenEnv tasks
Circuit Breaker Pattern: Automatic detection and restart of unhealthy Docker containers
Episode Dropout: Configurable filtering of low-quality training batches (variance + truncation)
Demand-Driven Rollouts: Only generates episodes when replay buffer needs them
Backpressure Detection: Prevents buffer starvation by checking health before weight updates
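
As a rough illustration of the episode dropout idea (variance + truncation), the filter could look something like the sketch below. The `Episode` fields, the function name, and both thresholds are assumptions made for this example and are not taken from main_generic.py. The variance check reflects the fact that GRPO-style advantages are computed relative to the group mean, so a group whose rewards are all identical contributes no learning signal.

```python
from dataclasses import dataclass
from statistics import pvariance


@dataclass
class Episode:
    # Hypothetical minimal episode record, for illustration only.
    reward: float
    truncated: bool


def should_drop_group(
    episodes: list[Episode],
    min_reward_variance: float = 1e-6,    # assumed threshold
    max_truncated_fraction: float = 0.5,  # assumed threshold
) -> bool:
    """Return True if a group of episodes for one prompt should be filtered out."""
    if not episodes:
        return True
    rewards = [e.reward for e in episodes]
    # Identical rewards give zero group-relative advantage for every episode.
    if pvariance(rewards) < min_reward_variance:
        return True
    # Too many responses cut off at the token limit suggests a low-quality group.
    truncated_fraction = sum(e.truncated for e in episodes) / len(episodes)
    return truncated_fraction > max_truncated_fraction
```

With these assumed defaults, a group where every response earns the same reward is dropped, while a group with mixed rewards and few truncations is kept.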

Files Added (2,748 lines)

main_generic.py: Core training orchestrator with dual async loops (rollouts + training)
julia_utils_generic.py: Julia task utilities with syntax validation
python_utils_generic.py: Python coding task utilities
llama3_8b_julia_generic.yaml: Julia training config
llama3_8b_coding_generic.yaml: Python training config
README.md: Comprehensive documentation with examples

Architecture

GenericRewardActor
├── env_actor_0 (container pool + circuit breaker)
├── env_actor_1 (container pool + circuit breaker)
└── ...

Each env_actor is isolated with its own circuit breaker for failure recovery.
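
A per-actor circuit breaker of the kind described above might be sketched as follows. This is a generic version of the pattern under stated assumptions, not the code in this PR: the class name, the threshold, the cooldown, and the injectable `clock` are all hypothetical. The caller would be expected to restart the container while the breaker is open.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Generic circuit breaker sketch; names and defaults are assumptions."""

    def __init__(
        self,
        failure_threshold: int = 3,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None  # None means the breaker is closed

    def allow_request(self) -> bool:
        # Closed: always allow. Open: allow a trial request once the cooldown elapsed.
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        # Any success closes the breaker and resets the failure count.
        self.consecutive_failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        # Reaching the threshold opens the breaker; further failures restart the cooldown.
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

Giving each env_actor its own breaker instance means one unhealthy container only pauses requests to that actor, which matches the isolation described above.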

Adding New Languages

To add support for a new language (e.g., Rust):

1. Create rust_utils_generic.py with 3 functions: build_rust_action, evaluate_rust_response, transform_rust_sample
2. Create YAML config referencing these via !function tags
3. Run with python -m apps.openenv.main_generic --config <config.yaml>
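
To make the first step more concrete, a hypothetical rust_utils_generic.py could be shaped roughly like this. Only the three function names come from this PR; every signature, dict key (`code`, `test_code`, `tests_passed`, `tests_total`, `prompt`, `test`, `request`, `target`), and the `extract_code` helper are assumptions for illustration, and the real interface expected by main_generic.py may differ.

```python
import re
from typing import Any

# Build the fence marker programmatically so this file contains no literal code fence.
_FENCE = "`" * 3
_CODE_BLOCK = re.compile(_FENCE + r"(?:rust)?\s*\n(.*?)" + _FENCE, re.DOTALL)


def extract_code(response: str) -> str:
    """Hypothetical helper: pull the first fenced code block out of a model response."""
    match = _CODE_BLOCK.search(response)
    return match.group(1).strip() if match else response.strip()


def build_rust_action(response: str, test_code: str) -> dict[str, Any]:
    """Build a plain dict payload (assumed keys) to be wrapped as a generic action."""
    return {"code": extract_code(response), "test_code": test_code}


def evaluate_rust_response(result: dict[str, Any]) -> float:
    """Map an environment result (assumed keys) to a reward in [0, 1]."""
    total = result.get("tests_total", 0)
    if total <= 0:
        return 0.0
    return result.get("tests_passed", 0) / total


def transform_rust_sample(sample: dict[str, Any]) -> dict[str, str]:
    """Turn a dataset row (assumed keys) into a prompt and its test target."""
    return {
        "request": "Write a Rust function for this task:\n" + sample["prompt"],
        "target": sample["test"],
    }
```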

Test Plan

Run Julia training: python -m apps.openenv.main_generic --config apps/openenv/llama3_8b_julia_generic.yaml
Run Python training: python -m apps.openenv.main_generic --config apps/openenv/llama3_8b_coding_generic.yaml
Verify circuit breaker triggers on container failures
Verify episode dropout filters low-quality batches
Verify W&B metrics are logged correctly
Test buffer starvation recovery with backpressure

Resolve merge conflict in provisioner.py:
- Keep shutdown_context import with try/except fallback for compatibility
- Remove unused monarch.tools.commands import
- Use main's job.kill() shutdown logic with None check for shutdown_context

This commit adds three layers of defense against the buffer starvation deadlock that can occur when training advances faster than rollouts:

1. Generator weight update timeout (generator.py):
   - Add 60s timeout to wait_for pending requests to complete
   - Prevents indefinite blocking if requests are stuck
   - Configurable via FORGE_WEIGHT_UPDATE_TIMEOUT_S env var

2. Buffer health check endpoint (replay_buffer.py):
   - Add health_check() method to predict buffer state after eviction
   - Returns surviving episode count, required batch size, and health status
   - Enables proactive backpressure decisions

3. Training loop improvements (main_generic.py):
   - Add buffer starvation detection with progressive warnings (1s, 10s, 30s)
   - Fail fast with helpful error after 60s of empty buffer
   - Add backpressure: wait for buffer health before weight updates
   - Configurable via FORGE_MAX_EMPTY_BUFFER_WAIT_S env var

Also includes related improvements:
- Refactor dapo_loss to make_dapo_loss factory with configurable params
- Add KL divergence monitoring and threshold warnings
- Add TYPE_CHECKING imports to avoid openenv runtime import conflicts
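
The buffer health check described in this commit could be approximated by the sketch below. It assumes that episodes are evicted once they are more than `max_policy_age` policy versions old; that eviction rule, the `BufferHealth` type, and all parameter names are assumptions for illustration and not the actual replay_buffer.py implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferHealth:
    surviving_episodes: int  # episodes expected to remain after eviction
    required_episodes: int   # episodes needed to form one training batch
    healthy: bool            # True if the next batch can still be sampled


def health_check(
    episode_versions: list[int],  # policy version that generated each buffered episode
    next_policy_version: int,     # version the policy will have after the weight update
    max_policy_age: int,          # assumed eviction rule: drop episodes older than this
    batch_size: int,
) -> BufferHealth:
    """Predict whether the buffer can still serve a batch after the next eviction."""
    surviving = sum(
        1 for version in episode_versions
        if next_policy_version - version <= max_policy_age
    )
    return BufferHealth(surviving, batch_size, surviving >= batch_size)
```

A training loop could call this before publishing new weights and wait for rollouts to catch up whenever `healthy` is False, which is the backpressure behavior described above.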

This commit adds additional protections against rollout failures:

1. Rollout loop improvements (main_generic.py):
   - Add timeout protection to ALL actor calls (dataloader, generator, ref_model)
   - Add circuit breaker pattern for evaluation errors
   - Track consecutive errors and fail fast after 50 consecutive failures
   - Wrap entire rollout iteration in try/except to prevent thread crash
   - Configurable via FORGE_MAX_ROLLOUT_ERRORS and FORGE_ROLLOUT_TIMEOUT_S

2. Environment actor resilience (generic_openenv_client.py):
   - Add detection for WebSocket connection errors (keepalive ping timeout)
   - Increase retry attempts from 2 to 3
   - Add health_check() endpoint to verify environment responsiveness
   - Add metrics for reconnect success/failure tracking
   - Better error handling with automatic container recreation

These changes address the root cause discovered in log analysis:
- The Julia environment container's WebSocket connection died
- 3702 connection errors occurred but were not properly recovered
- Rollouts blocked silently without adding new episodes to buffer
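
The combination of per-call timeouts and a consecutive-error limit can be sketched with plain asyncio as shown below. The function and exception names are hypothetical, and `step` stands in for one full rollout iteration (dataloader, generator, and ref_model calls); the actual loop in main_generic.py is not reproduced here. The default limit of 50 mirrors the number in the commit message, while the timeout default is an assumption.

```python
import asyncio
from typing import Awaitable, Callable


class TooManyRolloutErrors(RuntimeError):
    """Raised when too many rollout iterations fail in a row (hypothetical name)."""


async def run_rollouts(
    step: Callable[[], Awaitable[None]],
    iterations: int,
    timeout_s: float = 1.0,            # assumed default
    max_consecutive_errors: int = 50,  # limit mentioned in the commit message
) -> int:
    """Run `step` repeatedly with a timeout; return the number of successful iterations."""
    consecutive_errors = 0
    completed = 0
    for _ in range(iterations):
        try:
            # wait_for cancels the step and raises a timeout error if it hangs.
            await asyncio.wait_for(step(), timeout=timeout_s)
        except Exception as exc:
            # One bad iteration must not kill the loop, but a long streak should fail fast.
            consecutive_errors += 1
            if consecutive_errors >= max_consecutive_errors:
                raise TooManyRolloutErrors(
                    f"{consecutive_errors} consecutive rollout failures"
                ) from exc
            continue
        consecutive_errors = 0
        completed += 1
    return completed
```

Resetting the counter on every success means only an unbroken streak of failures, such as a dead WebSocket connection, trips the limit, rather than occasional transient errors.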

Resolved conflict in provisioner.py:
- Kept shutdown_context import from openenv branch (used for graceful shutdown)

# Conflicts:
#   src/forge/actors/vllm/v0/generator.py

wukaixingxp and others added 30 commits November 7, 2025 16:39

first try

9a696a6

important fix pad_id

2f13a21

1500 steps working

2693f1f

add DAPO

17cd39b

add general openenv recipe

3a5d829

temp-save,julia working

51f6138

third checkpoint

536e0c2

Merge branch 'meta-pytorch:main' into openenv

9eec90d

working on julia

daa57c5

Merge branch 'meta-pytorch:main' into openenv

9a9b79f

temp save

fb05a5d

revert src/forge/observability/metrics.py

81d8e3f

Merge remote-tracking branch 'origin/main' into openenv

d05b29b

Merge branch 'meta-pytorch:main' into openenv

e3bdedc

change for autoenv

09c9d10

Merge branch 'meta-pytorch:main' into openenv

d1ed89a

ready to merge

655135f

Merge branch 'meta-pytorch:main' into openenv

31f3104

clean up with general actions

398231c

generic working

75614e9

Merge main into openenv branch

2bc3af9

temp save

bc08f75

add rdma

bb20417

Merge main into openenv branch

9435a25

add openenv folder

45aacc8

revert grpo

9e97ab3

refactor

86889b4

Merge branch 'meta-pytorch:main' into openenv

7755e28

wukaixingxp added 3 commits January 26, 2026 15:54

rebased non-rdma works

45e223b

improved docs

d99a9d2

revert provisioner.py

860f8e4

meta-cla bot added the CLA Signed label Jan 27, 2026

wukaixingxp changed the title ~~Add Generic OpenEnv GRPO Training Framework~~ (WIP) Add Generic OpenEnv GRPO Training Framework Jan 27, 2026

wukaixingxp and others added 11 commits January 27, 2026 10:57

generate when needed

56bbeb8

Merge branch 'meta-pytorch:main' into openenv

d2d777f

refactored

ac2a94a

Merge branch 'main' into openenv

b9afec7

renamed

4edbc2f

refactor working

659b0c5

only pull from dataset when needed

2aaa243

Merge branch 'meta-pytorch:main' into openenv

ee18e37

change to dapo

3917665

rdma retry added

aabbcfc

lint

26a70ca
