
@wukaixingxp
Contributor

Description

Add Generic OpenEnv GRPO Training Framework

A centralized framework for training language models on any OpenEnv task with GRPO/DAPO, without
requiring environment-specific packages to be installed locally.

Key Features

  • GenericEnvClient: Works with ANY OpenEnv Docker image; no local environment packages are needed
  • GenericAction: Simple dict wrapper that maps to environment-specific actions at runtime
  • Single Main Script: One main_generic.py handles all OpenEnv tasks
  • Circuit Breaker Pattern: Automatic detection and restart of unhealthy Docker containers
  • Episode Dropout: Configurable filtering of low-quality training batches (variance + truncation)
  • Demand-Driven Rollouts: Only generates episodes when the replay buffer needs them
  • Backpressure Detection: Prevents buffer starvation by checking buffer health before weight updates
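
Episode dropout is only summarized above. As a rough illustration, the sketch below assumes dropout is applied per prompt group: a group is dropped when its rewards have (near-)zero variance, because group-normalized GRPO advantages are then all zero and the group contributes no gradient, or when too many of its completions were truncated. `Episode`, `keep_group`, `filter_groups` and both thresholds are hypothetical names and values, not the PR's API; the PR may filter at a different granularity.

```python
import statistics
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Episode:
    # Hypothetical minimal episode record: scalar reward and whether the
    # completion hit the max-token limit.
    reward: float
    truncated: bool


def keep_group(
    group: List[Episode],
    min_reward_std: float = 1e-6,     # assumed threshold
    max_truncated_frac: float = 0.5,  # assumed threshold
) -> bool:
    """Return True if one prompt's group of rollouts is worth training on."""
    rewards = [ep.reward for ep in group]
    # Identical rewards give every episode a zero group-normalized advantage.
    if len(rewards) < 2 or statistics.pstdev(rewards) < min_reward_std:
        return False
    # Drop groups dominated by truncated completions, whose rewards are unreliable.
    truncated_frac = sum(ep.truncated for ep in group) / len(group)
    return truncated_frac <= max_truncated_frac


def filter_groups(groups: List[List[Episode]], **kwargs) -> Tuple[List[List[Episode]], int]:
    """Return the kept groups and how many were dropped (useful as a metric)."""
    kept = [g for g in groups if keep_group(g, **kwargs)]
    return kept, len(groups) - len(kept)
```

Reporting the dropped count alongside the kept groups makes it easy to log how aggressive the filter is.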

Files Added (2,748 lines)

  • main_generic.py: Core training orchestrator with dual async loops (rollouts + training)
  • julia_utils_generic.py: Julia task utilities with syntax validation
  • python_utils_generic.py: Python coding task utilities
  • llama3_8b_julia_generic.yaml: Julia training config
  • llama3_8b_coding_generic.yaml: Python training config
  • README.md: Comprehensive documentation with examples

Architecture

GenericRewardActor
├── env_actor_0 (container pool + circuit breaker)
├── env_actor_1 (container pool + circuit breaker)
└── ...

Each env_actor is isolated with its own circuit breaker for failure recovery.
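
A minimal sketch of a per-actor circuit breaker, assuming the common closed/open/half-open design; `CircuitBreaker`, `EnvActorPool` and their parameters are hypothetical. Where the PR restarts the unhealthy container, this sketch only stops routing requests to that actor until its cooldown expires.

```python
import time
from typing import Callable, List, Optional


class CircuitBreaker:
    """Closed until `failure_threshold` consecutive failures, then open until
    `cooldown_s` has passed, then half-open: requests are let through again
    and the next failure re-opens it."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_s:
            return "half_open"
        return "open"

    def allow_request(self) -> bool:
        return self.state != "open"

    def record_success(self) -> None:
        self.consecutive_failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            # In half-open state the count is still >= threshold, so a single
            # further failure re-opens the breaker with a fresh cooldown.
            self.opened_at = self.clock()


class EnvActorPool:
    """Hypothetical pool: one breaker per env actor, so failures stay isolated."""

    def __init__(self, n_actors: int, **breaker_kwargs) -> None:
        self.breakers: List[CircuitBreaker] = [
            CircuitBreaker(**breaker_kwargs) for _ in range(n_actors)
        ]

    def pick(self) -> Optional[int]:
        # Route to the first actor whose breaker is not open.
        for i, breaker in enumerate(self.breakers):
            if breaker.allow_request():
                return i
        return None
```

Injecting `clock` makes the cooldown testable without sleeping.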

Adding New Languages

To add support for a new language (e.g., Rust):

  1. Create rust_utils_generic.py with three functions: build_rust_action, evaluate_rust_response,
    and transform_rust_sample
  2. Create a YAML config that references these functions via !function tags
  3. Run with python -m apps.openenv.main_generic --config <config.yaml>
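
A skeleton for the hypothetical rust_utils_generic.py, assuming the three hooks receive a raw dataset row, the model response, and the environment observation respectively. The signatures, dataset columns (`question`, `tests`), payload keys (`code`, `test_code`) and observation fields are all assumptions; `GenericAction` here is a stand-in dataclass, not the PR's class, and the YAML !function wiring is not shown.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict

FENCE = "`" * 3  # built at runtime so this file can be embedded in Markdown
_RUST_BLOCK = re.compile(FENCE + r"rust\s*\n(.*?)" + FENCE, re.DOTALL)


@dataclass
class GenericAction:
    # Stand-in for the PR's GenericAction: a plain dict payload that the
    # environment server interprets at runtime.
    payload: Dict[str, Any] = field(default_factory=dict)


def transform_rust_sample(sample: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw dataset row to prompt + tests (hypothetical column names)."""
    prompt = (
        "Write a Rust solution for the task below. Reply with a single "
        + FENCE + "rust code block.\n\n" + sample["question"]
    )
    return {"prompt": prompt, "tests": sample.get("tests", "")}


def build_rust_action(response: str, sample: Dict[str, Any]) -> GenericAction:
    """Extract the fenced Rust code (or fall back to the raw text)."""
    match = _RUST_BLOCK.search(response)
    code = match.group(1).strip() if match else response.strip()
    return GenericAction({"code": code, "test_code": sample["tests"]})


def evaluate_rust_response(observation: Dict[str, Any]) -> float:
    """Turn the env observation into a scalar reward (hypothetical fields)."""
    if not observation.get("compiled", False):
        return -1.0
    passed = observation.get("tests_passed", 0)
    total = observation.get("tests_total", 0)
    return passed / total if total else 0.0
```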

Test Plan

  • Run Julia training: python -m apps.openenv.main_generic --config
    apps/openenv/llama3_8b_julia_generic.yaml
  • Run Python training: python -m apps.openenv.main_generic --config
    apps/openenv/llama3_8b_coding_generic.yaml
  • Verify circuit breaker triggers on container failures
  • Verify episode dropout filters low-quality batches
  • Verify W&B metrics are logged correctly
  • Test buffer starvation recovery with backpressure

wukaixingxp and others added 30 commits November 7, 2025 16:39
Resolve merge conflict in provisioner.py:
- Keep shutdown_context import with try/except fallback for compatibility
- Remove unused monarch.tools.commands import
- Use main's job.kill() shutdown logic with None check for shutdown_context

This commit adds three layers of defense against the buffer starvation
deadlock that can occur when training advances faster than rollouts:

1. Generator weight update timeout (generator.py):
   - Add a 60s timeout to the wait for pending requests to complete
   - Prevents indefinite blocking if requests are stuck
   - Configurable via FORGE_WEIGHT_UPDATE_TIMEOUT_S env var

2. Buffer health check endpoint (replay_buffer.py):
   - Add health_check() method to predict buffer state after eviction
   - Returns surviving episode count, required batch size, and health status
   - Enables proactive backpressure decisions

3. Training loop improvements (main_generic.py):
   - Add buffer starvation detection with progressive warnings (1s, 10s, 30s)
   - Fail fast with helpful error after 60s of empty buffer
   - Add backpressure: wait for buffer health before weight updates
   - Configurable via FORGE_MAX_EMPTY_BUFFER_WAIT_S env var
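
Layers 2 and 3 can be sketched as follows, assuming each buffered episode records the policy version that produced it and is evicted once it is more than `max_policy_age` versions old. `ToyReplayBuffer`, `BufferHealth` and `wait_for_buffer_health` are hypothetical; only the env var name and the 1/10/30/60 s schedule come from the commit message.

```python
import asyncio
import logging
import os
from dataclasses import dataclass
from typing import List

log = logging.getLogger("backpressure_sketch")

# Env var name from the commit message; default mirrors its 60 s fail-fast.
MAX_EMPTY_BUFFER_WAIT_S = float(os.environ.get("FORGE_MAX_EMPTY_BUFFER_WAIT_S", "60"))
WARN_AFTER_S = (1.0, 10.0, 30.0)  # progressive warning thresholds


@dataclass
class BufferHealth:
    surviving: int  # episodes predicted to survive eviction
    required: int   # batch size the trainer needs

    @property
    def healthy(self) -> bool:
        return self.surviving >= self.required


class ToyReplayBuffer:
    """Stand-in buffer that only tracks each episode's policy version."""

    def __init__(self, batch_size: int, max_policy_age: int) -> None:
        self.batch_size = batch_size
        self.max_policy_age = max_policy_age
        self.versions: List[int] = []

    def add(self, policy_version: int) -> None:
        self.versions.append(policy_version)

    def health_check(self, next_policy_version: int) -> BufferHealth:
        # Predict the state *after* eviction at the next policy version.
        surviving = sum(
            1 for v in self.versions if next_policy_version - v <= self.max_policy_age
        )
        return BufferHealth(surviving, self.batch_size)


async def wait_for_buffer_health(buffer: ToyReplayBuffer, next_version: int,
                                 poll_s: float = 0.1,
                                 max_wait_s: float = MAX_EMPTY_BUFFER_WAIT_S) -> BufferHealth:
    """Backpressure: hold the weight update until rollouts refill the buffer."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    warned = set()
    while True:
        health = buffer.health_check(next_version)
        if health.healthy:
            return health
        waited = loop.time() - start
        for t in WARN_AFTER_S:
            if waited >= t and t not in warned:
                warned.add(t)
                log.warning("buffer starving for %.0fs: %d/%d episodes would survive",
                            waited, health.surviving, health.required)
        if waited >= max_wait_s:
            raise RuntimeError(
                f"Replay buffer starved for {waited:.0f}s; rollouts may be stuck "
                "(see FORGE_MAX_EMPTY_BUFFER_WAIT_S)"
            )
        await asyncio.sleep(poll_s)
```

Checking health against the *next* policy version is what lets the trainer pause before an update would evict the episodes it still needs.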

Also includes related improvements:
   - Refactor dapo_loss into a make_dapo_loss factory with configurable parameters
   - Add KL divergence monitoring and threshold warnings
   - Add TYPE_CHECKING imports to avoid openenv runtime import conflicts
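
A minimal sketch of a `make_dapo_loss`-style factory, written with plain floats instead of tensors so it runs without dependencies. It assumes DAPO's clip-higher (separate `eps_low`/`eps_high`) and token-level averaging, plus a k3 KL estimate against the reference policy for monitoring; the parameter names, defaults and `kl_warn_threshold` are illustrative, not the PR's actual signature.

```python
import logging
import math
from typing import Callable, Dict, Sequence, Tuple

log = logging.getLogger("dapo_sketch")

LossFn = Callable[..., Tuple[float, Dict[str, float]]]


def make_dapo_loss(
    eps_low: float = 0.2,
    eps_high: float = 0.28,
    kl_coef: float = 0.0,
    kl_warn_threshold: float = 0.1,
) -> LossFn:
    """Build a DAPO-style loss with clip-higher and token-level averaging."""

    def loss_fn(
        logp_new: Sequence[Sequence[float]],
        logp_old: Sequence[Sequence[float]],
        logp_ref: Sequence[Sequence[float]],
        advantages: Sequence[float],
    ) -> Tuple[float, Dict[str, float]]:
        # Per-token log-probs for each sequence; one advantage per sequence.
        pg_sum, kl_sum, n_tokens = 0.0, 0.0, 0
        for new, old, ref, adv in zip(logp_new, logp_old, logp_ref, advantages):
            for ln, lo, lr in zip(new, old, ref):
                ratio = math.exp(ln - lo)
                # Clip-higher: the upper bound (1 + eps_high) is looser than the lower.
                clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
                # Pessimistic PPO-style surrogate, negated because we minimise.
                pg_sum += -min(ratio * adv, clipped * adv)
                # k3 estimator of KL(policy || reference); always >= 0.
                kl_sum += math.exp(lr - ln) - (lr - ln) - 1.0
                n_tokens += 1
        n = max(n_tokens, 1)
        # Token-level mean: every token counts equally, regardless of sequence length.
        mean_kl = kl_sum / n
        if mean_kl > kl_warn_threshold:
            log.warning("mean KL %.4f exceeds threshold %.4f", mean_kl, kl_warn_threshold)
        return pg_sum / n + kl_coef * mean_kl, {"kl": mean_kl}

    return loss_fn
```

Returning a closure keeps the hyperparameters fixed for a run while the training loop calls `loss_fn` with only batch data.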

This commit adds additional protections against rollout failures:

1. Rollout loop improvements (main_generic.py):
   - Add timeout protection to ALL actor calls (dataloader, generator, ref_model)
   - Add circuit breaker pattern for evaluation errors
   - Track consecutive errors and fail fast after 50 consecutive failures
   - Wrap each rollout iteration in try/except so a single failure cannot crash the rollout thread
   - Configurable via FORGE_MAX_ROLLOUT_ERRORS and FORGE_ROLLOUT_TIMEOUT_S

2. Environment actor resilience (generic_openenv_client.py):
   - Add detection for WebSocket connection errors (keepalive ping timeout)
   - Increase retry attempts from 2 to 3
   - Add health_check() endpoint to verify environment responsiveness
   - Add metrics for reconnect success/failure tracking
   - Better error handling with automatic container recreation
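
The rollout-loop protections can be sketched like this. The env var names and the 50-error limit come from the commit message; the 120 s timeout default, `call_with_timeout` and `rollout_loop` are assumptions for illustration.

```python
import asyncio
import logging
import os
from typing import Awaitable, Callable

log = logging.getLogger("rollout_sketch")

MAX_ROLLOUT_ERRORS = int(os.environ.get("FORGE_MAX_ROLLOUT_ERRORS", "50"))
# Assumed default; the commit message does not state one.
ROLLOUT_TIMEOUT_S = float(os.environ.get("FORGE_ROLLOUT_TIMEOUT_S", "120"))


async def call_with_timeout(coro: Awaitable, what: str,
                            timeout_s: float = ROLLOUT_TIMEOUT_S):
    """Await one actor call, turning a hang into a named TimeoutError."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError as exc:
        raise TimeoutError(f"{what} did not respond within {timeout_s}s") from exc


async def rollout_loop(do_one_rollout: Callable[[], Awaitable[None]],
                       stop: asyncio.Event,
                       max_errors: int = MAX_ROLLOUT_ERRORS) -> None:
    """Run rollouts until `stop` is set; fail fast after too many errors in a row."""
    consecutive = 0
    while not stop.is_set():
        try:
            await do_one_rollout()
            consecutive = 0  # any success resets the breaker
        except Exception as exc:  # keep the loop alive on per-iteration errors
            consecutive += 1
            log.warning("rollout failed (%d/%d in a row): %r", consecutive, max_errors, exc)
            if consecutive >= max_errors:
                raise RuntimeError(f"{consecutive} consecutive rollout failures") from exc
```

Inside the hypothetical `do_one_rollout`, each actor call (dataloader, generator, ref_model) would be wrapped, e.g. `await call_with_timeout(generator.generate(prompt), "generator")`, where `generator.generate` is a placeholder.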

These changes address the root cause discovered in log analysis:
- The Julia environment container's WebSocket connection died
- 3702 connection errors occurred but were not properly recovered
- Rollouts blocked silently without adding new episodes to buffer
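
The recovery path the log analysis calls for might look like the sketch below: classify the exception, recreate the container only for connection-level failures, and count outcomes for metrics. The marker strings, the `get_client`/`recreate_client` hooks and the metric names are assumptions; only the 3-attempt limit matches the commit message.

```python
import asyncio
from collections import Counter
from typing import Any, Awaitable, Callable

# Substrings treated as "the socket is dead"; chosen for illustration.
_CONN_ERROR_MARKERS = ("keepalive ping timeout", "connection closed", "connection refused")
metrics: Counter = Counter()


def is_connection_error(exc: BaseException) -> bool:
    if isinstance(exc, (ConnectionError, asyncio.TimeoutError)):
        return True
    msg = str(exc).lower()
    return any(marker in msg for marker in _CONN_ERROR_MARKERS)


async def step_with_recovery(get_client: Callable[[], Any],
                             recreate_client: Callable[[], Awaitable[None]],
                             action: Any, max_attempts: int = 3) -> Any:
    """Call client.step(action); on connection errors recreate the container and retry.

    `get_client` returns the current env client; `recreate_client` is a
    hypothetical hook that tears down and restarts its Docker container.
    """
    last_exc = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = await get_client().step(action)
            if attempt > 1:
                metrics["reconnect_success"] += 1
            return result
        except Exception as exc:
            if not is_connection_error(exc):
                raise  # not an infrastructure failure: surface it unchanged
            last_exc = exc
            metrics["connection_errors"] += 1
            if attempt < max_attempts:
                await recreate_client()
    metrics["reconnect_failure"] += 1
    raise RuntimeError(f"env unreachable after {max_attempts} attempts") from last_exc
```

Raising after the last attempt, instead of returning a default observation, is what keeps a dead container from silently stalling rollouts.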

Resolved conflict in provisioner.py:
- Kept shutdown_context import from openenv branch (used for graceful shutdown)
meta-cla bot added the CLA Signed label Jan 27, 2026
wukaixingxp changed the title Add Generic OpenEnv GRPO Training Framework (WIP) Add Generic OpenEnv GRPO Training Framework Jan 27, 2026