Add train-inference mismatch correction (add feature train-infer-mismatch), more complete and more comprehensive #288
Open
millioniron wants to merge 8 commits into alibaba:main from
new file: examples/qwen2.5-infer_correction/agentic_webshop_infer_correction.yaml
new file: examples/qwen2.5-infer_correction/rlvr_infer_correction_config.yaml
new file: examples/qwen2.5-infer_correction/run_agentic_pipeline_webshop.sh
new file: examples/qwen2.5-infer_correction/run_rlvr_pipeline.sh
modified: roll/configs/base_config.py
modified: roll/configs/generating_args.py
modified: roll/distributed/scheduler/generate_scheduler.py
modified: roll/pipeline/agentic/env_manager/step_env_manager.py
modified: roll/pipeline/agentic/env_manager/traj_env_manager.py
modified: roll/pipeline/base_worker.py
modified: roll/pipeline/rlvr/actor_pg_worker.py
modified: roll/pipeline/rlvr/actor_worker.py
Author
done
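I have merged and revised the changes. Specifically, for the part of the new version that obtains infer-log-prob, I followed the official implementation. For the train-inference mismatch correction itself, I use my own revision.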
✨ What's Changed
What does this PR do?
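1. Core component refactoring
- InferCorrectionHandler class (roll/utils/infer_correction.py): handles only IS correction and sample rejection, replacing the logic previously mixed into loss_func.

2. Three-level rejection strategy
- infer_token_mask_threshold_{min,max}
- enable_seq_reject, infer_seq_mask_threshold_{min,max}
- infer_catastrophic_threshold

3. Intelligent importance sampling
- token: traditional token-level IS (default)
- sequence: total sequence log-ratio (stabilizes long-sequence training)
- geometric: geometric-mean ratio (balances extreme values)
- none: IS disabled (for baseline testing)

4. Industrial-grade diagnostics
StatsCollector centrally manages metrics, grouped into three categories:
- token_ratio_mean/std/min/max
- token_reject_frac, seq_reject_frac, catastrophic_seq_frac
- inferkl (raw KL), inferkl_reject (KL after rejection)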
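To make the four IS modes and the three rejection levels concrete, here is a minimal framework-free sketch for a single sequence. It is not the code in roll/utils/infer_correction.py: the function names `importance_weights` and `keep_mask`, the default thresholds, the use of the geometric-mean ratio for the sequence-level check, and the rule that a single token below the catastrophic threshold drops the whole sequence are all assumptions made for illustration. The ratio is taken as exp(train log-prob minus inference log-prob) per token.

```python
import math


def importance_weights(train_logp, infer_logp, mode="token"):
    """Per-token IS weights for one sequence (hypothetical helper).

    train_logp / infer_logp: per-token log-probs of the sampled tokens
    under the training engine and the inference engine.
    """
    log_ratio = [t - i for t, i in zip(train_logp, infer_logp)]
    n = len(log_ratio)
    if n == 0:
        return []
    if mode == "token":      # one ratio per token
        return [math.exp(r) for r in log_ratio]
    if mode == "sequence":   # total log-ratio, shared by every token
        return [math.exp(sum(log_ratio))] * n
    if mode == "geometric":  # length-normalised (geometric-mean) ratio
        return [math.exp(sum(log_ratio) / n)] * n
    if mode == "none":       # IS disabled
        return [1.0] * n
    raise ValueError(f"unknown IS mode: {mode}")


def keep_mask(train_logp, infer_logp,
              token_min=0.5, token_max=2.0,        # assumed defaults
              enable_seq_reject=True,
              seq_min=0.5, seq_max=2.0,            # assumed defaults
              catastrophic=1e-4):                  # assumed default
    """True where a token still contributes to the loss (assumed semantics)."""
    ratios = importance_weights(train_logp, infer_logp, "token")
    if not ratios:
        return []
    # Level 1: drop individual tokens whose ratio leaves [token_min, token_max].
    token_ok = [token_min <= r <= token_max for r in ratios]
    # Level 2: optionally drop the whole sequence on its geometric-mean ratio.
    seq_ratio = importance_weights(train_logp, infer_logp, "geometric")[0]
    seq_ok = (not enable_seq_reject) or (seq_min <= seq_ratio <= seq_max)
    # Level 3: drop the whole sequence if any token ratio is catastrophically small.
    catastrophic_hit = min(ratios) < catastrophic
    return [ok and seq_ok and not catastrophic_hit for ok in token_ok]


if __name__ == "__main__":
    train = [-1.0, -2.0, -0.5]
    infer = [-1.0, -1.0, -0.5]
    for mode in ("token", "sequence", "geometric", "none"):
        print(mode, importance_weights(train, infer, mode))
    mask = keep_mask(train, infer)
    print("keep mask:", mask, "rejected fraction:", mask.count(False) / len(mask))
```

In a real training loop the same logic would run on batched tensors, and the mask would be multiplied into the loss mask before the policy-gradient loss is reduced. The fractions of masked tokens and sequences are the kind of quantity the diagnostics listed above would report.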