The process was inspected with pystack; the relevant part of the stack trace is as follows:
file "/root/miniconda3/env/roll/python3.11/site-packages/vllm/excutor/uniproc_excutor.py", line 56, in collective_rpc
answer = run_method(self.driver_worker, method, args, kwargs)
File "/root/miniconda3/env/roll/python3.11/site-packages/vllm/utils/__init__.py", line 2985, in run_method
return func(*args, **kwargs)
File "/home/ROLL/roll/third_party/vllm/worker_helper.py", in line 80, in setup_collective_group
collective.allreduce(torch.zeros(1).to(current_platform.device_type), group_name=group_name)
File "/home/ROLL/roll/third_party/utils/collective/collective.py", in line 84, in allreduce
dist.all_reduce(tensor, op=op, group=_group_mgr.get_group_by_name(group_name))
File "/root/miniconda3/env/roll/python3.11/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
return func(*args, **kwargs)
File "/root/miniconda3/env/roll/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 2180, in all_reuduce
work = group.allreduce([tensor], opts)
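For context on what that last frame means: `dist.all_reduce` is a collective call, so the warm-up all-reduce issued in `setup_collective_group` can only complete once every rank registered under the same `group_name` has made the matching call; if any peer never reaches that point, the call blocks, which would leave the process in exactly this frame. Below is a minimal, self-contained sketch of that blocking behaviour (plain `torch.distributed` with the gloo backend so it runs without GPUs; this is illustrative code, not ROLL's):

```python
# Sketch (not ROLL code) of why a warm-up all_reduce can block: the collective
# only completes once every rank in the group has issued the matching call.
import os
import time
from datetime import timedelta

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

WORLD_SIZE = 2  # made-up size, just for illustration

def run_rank(rank: int) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29511"
    # gloo so the sketch runs without GPUs; ROLL/vLLM use nccl in practice.
    dist.init_process_group("gloo", rank=rank, world_size=WORLD_SIZE,
                            timeout=timedelta(seconds=10))
    if rank == 1:
        # Simulate a rank that never reaches the warm-up all_reduce
        # (e.g. it is stuck before its own setup_collective_group call).
        time.sleep(30)
        return
    # Rank 0 issues the warm-up all_reduce and blocks waiting for rank 1;
    # with gloo it raises a timeout error after ~10 s, with nccl the call
    # may simply hang, depending on the async-error-handling settings.
    dist.all_reduce(torch.zeros(1))

if __name__ == "__main__":
    mp.spawn(run_rank, nprocs=WORLD_SIZE)
```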
Here is the training config used:
defaults:
  - ../config/deepspeed_zero@_here_
  - ../config/deepspeed_zero2@_here_
  - ../config/deepspeed_zero3@_here_
  - ../config/deepspeed_zero3_cpuoffload@_here_

hydra:
  run:
    dir: .
  output_subdir: null

pg_variant: topr # topr, vanilla, tis, cispo, kimi15, ppo

exp_name: Qwen3-8B-RLVR-${pg_variant}
seed: 42
logging_dir: ./output/logs
output_dir: ./output
system_envs:
  USE_MODELSCOPE: '1'

checkpoint_config:
  type: file_system
  output_dir: /data/cpfs_0/rl_examples/models/${exp_name}

num_gpus_per_node: 8

max_steps: 500
save_steps: 100
logging_steps: 1
eval_steps: 10
resume_from_checkpoint: false

rollout_batch_size: 128 # prompt
prompt_length: 2048
response_length: 8192

num_return_sequences_in_group: 8
ppo_epochs: 1
adv_estimator: "reinforce"

# clip
value_clip: 0.5
reward_clip: 10
advantage_clip: 2.0
dual_clip_loss: true
# normalize
norm_mean_type: batch
norm_std_type: batch
# data mask
max_len_mask: true
difficulty_mask: true
difficulty_low_threshold: 0.1
difficulty_high_threshold: 0.95
error_max_len_clip: false
# data weight
difficulty_loss_weight: false
length_loss_weight: false
# reward
add_token_level_kl: false
# advantage
whiten_advantages: true

pretrain: Qwen/Qwen3-8B
reward_pretrain: Qwen/Qwen3-8B

validation:
  data_args:
    template: qwen3
    file_name:
      - data/math_benchmarks.jsonl
  generating_args:
    top_p: 0.6
    top_k: 50
    num_beams: 1
    temperature: 0.6
    num_return_sequences: 1
  eval_steps: 10

actor_train:
  worker_cls: roll.pipeline.rlvr.actor_pg_worker.ActorPGWorker
  pg_variant: topr # topr, vanilla, tis, cispo, kimi15, ppo
  model_args:
    flash_attn: fa2
    disable_gradient_checkpointing: false
    dtype: bf16
    model_type: ~
  training_args:
    learning_rate: 1.0e-6
    weight_decay: 0
    per_device_train_batch_size: 1
    gradient_accumulation_steps: 64
    warmup_steps: 20
    num_train_epochs: 50
  data_args:
    template: qwen2_5
    file_name:
      - data/math_deepmath_deal.jsonl
    domain_interleave_probs:
      math_rule: 1
    dataset_dir: data
    messages: messages
    interleave_probs: "1.0"
    preprocessing_num_workers: 16
  strategy_args:
    strategy_name: deepspeed_train
    strategy_config: ${deepspeed_zero3}
  device_mapping: list(range(0,4))
  infer_batch_size: 4

actor_infer:
  model_args:
    flash_attn: fa2
    disable_gradient_checkpointing: true
    dtype: bf16
  generating_args:
    max_new_tokens: ${response_length}
    top_p: 0.99
    top_k: 100
    num_beams: 1
    temperature: 0.99
    num_return_sequences: ${num_return_sequences_in_group}
  data_args:
    template: qwen3
  strategy_args:
    strategy_name: vllm
    strategy_config:
      gpu_memory_utilization: 0.6
      block_size: 16
      max_model_len: 8000
  device_mapping: list(range(4,6))
  infer_batch_size: 1

reference:
  model_args:
    flash_attn: fa2
    disable_gradient_checkpointing: true
    dtype: bf16
    model_type: ~
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: hf_infer
    strategy_config: ~
  device_mapping: list(range(6,8))
  infer_batch_size: 8

rewards:
  math_rule:
    worker_cls: roll.pipeline.rlvr.rewards.math_rule_reward_worker.MathRuleRewardWorker
    model_args:
      model_name_or_path: ${reward_pretrain}
    data_args:
      template: qwen2_5
    tag_included: [deepmath_103k, 'MATH-500', 'OlympiadBench', 'minervamath', 'aime2025', 'gsm8k', 'aime', 'amc23', 'math_rule']
    world_size: 8
    infer_batch_size: 1
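For reference, the `device_mapping` entries above split the eight GPUs per node between the three roles: actor_train on GPUs 0-3 (DeepSpeed ZeRO-3 training), actor_infer on GPUs 4-5 (vLLM rollout), and reference on GPUs 6-7 (hf_infer). A small sketch (plain Python, only restating values already present in the config) to check that the mappings are disjoint and stay within `num_gpus_per_node`:

```python
# Sanity-check sketch (not part of ROLL): evaluate the device_mapping
# expressions from the config above and verify they are disjoint and fit
# within num_gpus_per_node. All values are copied from the config.
num_gpus_per_node = 8
device_mappings = {
    "actor_train": list(range(0, 4)),   # DeepSpeed ZeRO-3 training workers
    "actor_infer": list(range(4, 6)),   # vLLM rollout workers
    "reference":   list(range(6, 8)),   # hf_infer reference workers
}

used = [gpu for gpus in device_mappings.values() for gpu in gpus]
assert len(used) == len(set(used)), "device mappings overlap"
assert max(used) < num_gpus_per_node, "mapping exceeds num_gpus_per_node"

for role, gpus in device_mappings.items():
    print(f"{role:12s} -> GPUs {gpus}")
```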