(WIP) Add Generic OpenEnv GRPO Training Framework #733
Draft
wukaixingxp wants to merge 44 commits into meta-pytorch:main from wukaixingxp:openenv
Resolve merge conflict in provisioner.py:
- Keep shutdown_context import with try/except fallback for compatibility
- Remove unused monarch.tools.commands import
- Use main's job.kill() shutdown logic with None check for shutdown_context
This commit adds three layers of defense against the buffer starvation deadlock that can occur when training advances faster than rollouts:

1. Generator weight update timeout (generator.py):
   - Add 60s timeout to wait_for pending requests to complete
   - Prevents indefinite blocking if requests are stuck
   - Configurable via FORGE_WEIGHT_UPDATE_TIMEOUT_S env var
2. Buffer health check endpoint (replay_buffer.py):
   - Add health_check() method to predict buffer state after eviction
   - Returns surviving episode count, required batch size, and health status
   - Enables proactive backpressure decisions
3. Training loop improvements (main_generic.py):
   - Add buffer starvation detection with progressive warnings (1s, 10s, 30s)
   - Fail fast with helpful error after 60s of empty buffer
   - Add backpressure: wait for buffer health before weight updates
   - Configurable via FORGE_MAX_EMPTY_BUFFER_WAIT_S env var

Also includes related improvements:
- Refactor dapo_loss to make_dapo_loss factory with configurable params
- Add KL divergence monitoring and threshold warnings
- Add TYPE_CHECKING imports to avoid openenv runtime import conflicts
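The timeout and starvation-detection behavior described in this commit could look roughly like the sketch below. It is not the code from generator.py or main_generic.py: the helper names (`env_seconds`, `wait_with_timeout`, `StarvationMonitor`) are hypothetical, and only the environment variable names, the 60s defaults, and the 1s/10s/30s warning thresholds come from the commit message.

```python
import asyncio
import os

# Warning thresholds (seconds) taken from the commit message.
WARN_THRESHOLDS_S = (1.0, 10.0, 30.0)


def env_seconds(name: str, default: float = 60.0) -> float:
    """Read a timeout in seconds from an environment variable."""
    return float(os.environ.get(name, default))


async def wait_with_timeout(awaitable, timeout_s: float) -> bool:
    """Return True if the awaitable finished in time, False if it timed out."""
    try:
        await asyncio.wait_for(awaitable, timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        return False


class StarvationMonitor:
    """Hypothetical helper: warn progressively, then fail fast on an empty buffer."""

    def __init__(self, max_wait_s: float) -> None:
        self.max_wait_s = max_wait_s
        self._warned = set()

    def check(self, empty_for_s: float) -> list:
        """Return thresholds newly crossed; raise once max_wait_s is reached."""
        if empty_for_s >= self.max_wait_s:
            raise RuntimeError(
                f"replay buffer empty for {empty_for_s:.0f}s; "
                "rollouts may be stalled"
            )
        crossed = [
            t for t in WARN_THRESHOLDS_S
            if empty_for_s >= t and t not in self._warned
        ]
        self._warned.update(crossed)
        return crossed

    def reset(self) -> None:
        """Call when the buffer yields a batch again."""
        self._warned.clear()
```

A training loop would call `check()` with the time elapsed since the last successful sample and log each returned threshold once; `env_seconds("FORGE_MAX_EMPTY_BUFFER_WAIT_S")` would supply `max_wait_s`.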
This commit adds additional protections against rollout failures:

1. Rollout loop improvements (main_generic.py):
   - Add timeout protection to ALL actor calls (dataloader, generator, ref_model)
   - Add circuit breaker pattern for evaluation errors
   - Track consecutive errors and fail fast after 50 consecutive failures
   - Wrap entire rollout iteration in try/except to prevent thread crash
   - Configurable via FORGE_MAX_ROLLOUT_ERRORS and FORGE_ROLLOUT_TIMEOUT_S
2. Environment actor resilience (generic_openenv_client.py):
   - Add detection for WebSocket connection errors (keepalive ping timeout)
   - Increase retry attempts from 2 to 3
   - Add health_check() endpoint to verify environment responsiveness
   - Add metrics for reconnect success/failure tracking
   - Better error handling with automatic container recreation

These changes address the root cause discovered in log analysis:
- The Julia environment container's WebSocket connection died
- 3702 connection errors occurred but were not properly recovered
- Rollouts blocked silently without adding new episodes to buffer
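As an illustration of the rollout-loop protections listed above, here is a minimal sketch of a consecutive-error breaker combined with a per-call timeout. The class and function names are hypothetical and are not taken from main_generic.py; only the FORGE_MAX_ROLLOUT_ERRORS variable and the limit of 50 consecutive failures come from the commit message.

```python
import asyncio
import os


def max_rollout_errors(default: int = 50) -> int:
    """Read the consecutive-error limit from FORGE_MAX_ROLLOUT_ERRORS."""
    return int(os.environ.get("FORGE_MAX_ROLLOUT_ERRORS", default))


class RolloutCircuitBreaker:
    """Hypothetical helper: trips after too many consecutive failures."""

    def __init__(self, max_consecutive_errors: int) -> None:
        self.max_consecutive_errors = max_consecutive_errors
        self.consecutive_errors = 0

    def record_success(self) -> None:
        # Any success resets the streak.
        self.consecutive_errors = 0

    def record_failure(self, exc: BaseException) -> None:
        self.consecutive_errors += 1
        if self.consecutive_errors >= self.max_consecutive_errors:
            raise RuntimeError(
                f"{self.consecutive_errors} consecutive rollout failures"
            ) from exc


async def guarded_call(make_call, breaker: RolloutCircuitBreaker, timeout_s: float):
    """Run one actor call with a timeout; return None on a tolerated failure."""
    try:
        result = await asyncio.wait_for(make_call(), timeout=timeout_s)
    except Exception as exc:  # also covers asyncio.TimeoutError
        breaker.record_failure(exc)  # re-raises as RuntimeError once tripped
        return None
    breaker.record_success()
    return result
```

A rollout iteration would skip the current sample when `guarded_call` returns `None`, and let the `RuntimeError` end the loop once the breaker trips.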
Resolved conflict in provisioner.py:
- Kept shutdown_context import from openenv branch (used for graceful shutdown)
# Conflicts:
#	src/forge/actors/vllm/v0/generator.py
Description
Add Generic OpenEnv GRPO Training Framework
A centralized framework for training language models on any OpenEnv task using GRPO/DAPO without
requiring environment-specific packages locally.
Key Features
Files Added (2,748 lines)
File: main_generic.py
Description: Core training orchestrator with dual async loops (rollouts + training)
────────────────────────────────────────
File: julia_utils_generic.py
Description: Julia task utilities with syntax validation
────────────────────────────────────────
File: python_utils_generic.py
Description: Python coding task utilities
────────────────────────────────────────
File: llama3_8b_julia_generic.yaml
Description: Julia training config
────────────────────────────────────────
File: llama3_8b_coding_generic.yaml
Description: Python training config
────────────────────────────────────────
File: README.md
Description: Comprehensive documentation with examples
Architecture
GenericRewardActor
├── env_actor_0 (container pool + circuit breaker)
├── env_actor_1 (container pool + circuit breaker)
└── ...
Each env_actor is isolated with its own circuit breaker for failure recovery.
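One way to picture this isolation is a router that keeps separate breaker state per environment actor and skips any actor whose breaker is open. The sketch below is an assumption-heavy illustration, not the PR's GenericRewardActor: `EnvSlot`, `RewardRouter`, the failure threshold, and the cooldown are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EnvSlot:
    """Hypothetical stand-in for one env_actor and its private breaker state."""
    name: str
    failure_threshold: int = 3   # assumed value, not from the PR
    cooldown_s: float = 30.0     # assumed value, not from the PR
    failures: int = 0
    open_until: float = 0.0

    def available(self, now: float) -> bool:
        return now >= self.open_until

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Trip this slot's breaker only; other slots are unaffected.
            self.open_until = now + self.cooldown_s
            self.failures = 0


class RewardRouter:
    """Round-robin over slots, skipping those with an open breaker."""

    def __init__(self, slots: List[EnvSlot],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.slots = slots
        self.clock = clock
        self._next = 0

    def pick(self) -> Optional[EnvSlot]:
        now = self.clock()
        for offset in range(len(self.slots)):
            index = (self._next + offset) % len(self.slots)
            if self.slots[index].available(now):
                self._next = (index + 1) % len(self.slots)
                return self.slots[index]
        return None  # every breaker is open
```

Because each slot tracks its own failures, a dead container behind `env_actor_0` only removes that slot from rotation until its cooldown elapses; requests continue to flow to the remaining actors.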
Adding New Languages
To add support for a new language (e.g., Rust):
transform_rust_sample
Test Plan
apps/openenv/llama3_8b_julia_generic.yaml
apps/openenv/llama3_8b_coding_generic.yaml