AttributeError: 'NoneType' object has no attribute 'rename_key_' #316

Description

@bastianxux

I am trying to run the pipeline end-to-end on a single GPU and hit an error I don't know how to resolve; the configuration is modified from example_grpo. Also, which strategy is recommended for single-GPU runs? Thanks in advance for the authors' reply.
Error log:
```
(ActorWorker(reference-0) pid=136685) [2026-01-11 21:25:34] [context_managers.py (41)] [INFO] [reference-0 0 / 1][PID 136685] reference/compute_log_probs_start_offload, memory allocated (GB): 0.0, memory reserved (GB): 0.0, memory max reserved (GB): 0.0, rss (GB): 0.9964942932128906 memory device used (GB): 4.25604248046875
(ActorWorker(reference-0) pid=136685) /root/miniconda3/envs/roll/lib/python3.10/site-packages/torch/cuda/memory.py:491: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
(ActorWorker(reference-0) pid=136685)   warnings.warn(
(ActorWorker(reference-0) pid=136685) /root/miniconda3/envs/roll/lib/python3.10/site-packages/torch/cuda/memory.py:517: FutureWarning: torch.cuda.reset_max_memory_cached now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
(ActorWorker(reference-0) pid=136685)   warnings.warn(
(ActorWorker(reference-0) pid=136685) [2026-01-11 21:25:35] [context_managers.py (41)] [INFO] [reference-0 0 / 1][PID 136685] reference/compute_log_probs_start_onload, memory allocated (GB): 0.0, memory reserved (GB): 0.0, memory max reserved (GB): 0.0, rss (GB): 0.9965057373046875 memory device used (GB): 17.13885498046875
(ActorWorker(reference-0) pid=136685) [2026-01-11 21:25:35] [context_managers.py (41)] [INFO] [reference-0 0 / 1][PID 136685] reference/compute_log_probs_end_onload, memory allocated (GB): 0.0028085708618164062, memory reserved (GB): 0.00390625, memory max reserved (GB): 0.00390625, rss (GB): 1.0738525390625 memory device used (GB): 17.52490234375
(ActorWorker(reference-0) pid=136685) [2026-01-11 21:25:35] [context_managers.py (41)] [INFO] [reference-0 0 / 1][PID 136685] reference/compute_log_probs_end_offload, memory allocated (GB): 0.0028085708618164062, memory reserved (GB): 0.00390625, memory max reserved (GB): 0.00390625, rss (GB): 1.0738525390625 memory device used (GB): 4.64208984375
Traceback (most recent call last):
  File "/root/ROLL/examples/start_rlvr_pipeline.py", line 36, in <module>
    main()
  File "/root/ROLL/examples/start_rlvr_pipeline.py", line 32, in main
    pipeline.run()
  File "/root/miniconda3/envs/roll/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/root/ROLL/roll/pipeline/rlvr/rlvr_pipeline.py", line 568, in run
    ref_log_probs.rename(old_keys="log_probs", new_keys="ref_log_probs")
  File "/root/ROLL/roll/distributed/scheduler/protocol.py", line 542, in rename
    self.batch.rename_key_(tuple(old_keys), tuple(new_keys))
AttributeError: 'NoneType' object has no attribute 'rename_key_'
(MathRuleRewardWorker(reward-math_rule-1) pid=137269) [2026-01-11 21:25:33] [worker.py (154)] [WARNING] [reward-math_rule-1 1 / 8][PID 137269] worker has not strategy [repeated 8x across cluster]
(MathRuleRewardWorker(reward-math_rule-4) pid=137272) [2026-01-11 21:25:34] [platform.py (89)] [WARNING] [reward-math_rule-4 4 / 8][PID 137272] Current platform cpu does not have 'empty_cache' attribute. [repeated 8x across cluster]
(ActorWorker(reference-0) pid=136685) Elapsed time: 0.8141 seconds [repeated 4x across cluster]
(ActorWorker(reference-0) pid=136685) (EngineCore_DP0 pid=140470) INFO 01-11 21:25:35 [block_pool.py:292] Successfully reset prefix cache [repeated 2x across cluster]
(ActorWorker(reference-0) pid=136685) (EngineCore_DP0 pid=140470) INFO 01-11 21:25:35 [gpu_worker.py:116] Sleep mode freed 12.88 GiB memory, 4.02 GiB memory is still in use.
```
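For readability, here is a minimal sketch (simplified, hypothetical classes, not the actual ROLL implementation) of what the last two traceback frames amount to: `rename()` in `protocol.py` forwards to `self.batch`, so the error means the object returned by the reference worker's `compute_log_probs` came back with `batch` set to `None`.

```python
# Simplified sketch of the failing call chain (hypothetical classes, not ROLL code).
class FakeTensorDict:
    def rename_key_(self, old_keys, new_keys):
        print(f"renamed {old_keys} -> {new_keys}")

class FakeDataProto:
    def __init__(self, batch=None):
        self.batch = batch  # batch=None reproduces the reported failure

    def rename(self, old_keys, new_keys):
        # equivalent of roll/distributed/scheduler/protocol.py:542
        self.batch.rename_key_((old_keys,), (new_keys,))

FakeDataProto(batch=FakeTensorDict()).rename("log_probs", "ref_log_probs")  # works
FakeDataProto(batch=None).rename("log_probs", "ref_log_probs")              # AttributeError: 'NoneType' ...
```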
Configuration:
```yaml
defaults:
  - ../config/deepspeed_zero@_here_
  - ../config/deepspeed_zero2@_here_
  - ../config/deepspeed_zero3@_here_
  - ../config/deepspeed_zero3_cpuoffload@_here_

hydra:
  run:
    dir: .
  output_subdir: null

exp_name: "test"
seed: 42
logging_dir: ./output/logs
output_dir: ./output
system_envs:
  USE_MODELSCOPE: '1'

checkpoint_config:
  type: file_system
  output_dir: /data/cpfs_0/rl_examples/models/${exp_name}

track_with: tensorboard
tracker_kwargs:
  log_dir: /data/oss_bucket_0/rl_examples/llm/tensorboard/roll_exp/rlvr

num_gpus_per_node: 1

max_steps: 100
save_steps: 100
logging_steps: 1
eval_steps: 10
resume_from_checkpoint: false

# --------------------------
# important tips:
# NOTE: Configurations prefixed with "example_" are for documentation purposes only;
#       no guarantee on training performance. For actual usage,
#       please refer to configurations without the "example_" prefix.

# grpo related
rollout_batch_size: 4  # prompt
adv_estimator: "grpo"
num_return_sequences_in_group: 4

prompt_length: 2048
response_length: 2048

ppo_epochs: 1
use_kl_loss: true
kl_loss_coef: 0.001
loss_agg_mode: "seq-mean-token-mean"

# ppo related
# advantage
whiten_advantages: true
advantage_clip: 2.0
dual_clip_loss: true
# clip
reward_clip: 10
# normalize
reward_norm: null
reward_shift: false
reward_scale: false
# reward
add_token_level_kl: false
# --------------------------

# Below are detailed configurations for each role.
pretrain: Qwen/Qwen2.5-0.5B-Instruct
reward_pretrain: Qwen/Qwen2.5-0.5B-Instruct

validation:
  data_args:
    template: qwen2_5
    file_name:
      - data/math_benchmarks.jsonl
  generating_args:
    max_new_tokens: ${response_length}
    top_p: 0.6
    top_k: 50
    num_beams: 1
    temperature: 0.6
    num_return_sequences: 1

actor_train:
  model_args:
    # attn_implementation: fa2
    disable_gradient_checkpointing: false
    dtype: bf16
    model_type: ~
  training_args:
    learning_rate: 1.0e-6
    weight_decay: 0
    per_device_train_batch_size: 1
    gradient_accumulation_steps: 32
    warmup_steps: 20
    num_train_epochs: 50
  data_args:
    template: qwen2_5
    file_name:
      - data/math_deepmath_deal.jsonl
    domain_interleave_probs:
      math_rule: 1
    dataset_dir: data
    messages: messages
    interleave_probs: "1.0"
    preprocessing_num_workers: 16
  strategy_args:
    strategy_name: deepspeed_train
    strategy_config: ${deepspeed_zero3}
    # strategy_config:
    #   tensor_model_parallel_size: 1
    #   pipeline_model_parallel_size: 1
    #   expert_model_parallel_size: 1
    #   use_distributed_optimizer: true
    #   recompute_granularity: full
  device_mapping: list(range(0,1))
  infer_batch_size: 1

actor_infer:
  model_args:
    disable_gradient_checkpointing: true
    dtype: bf16
  generating_args:
    max_new_tokens: ${response_length}
    top_p: 0.99
    top_k: 100
    num_beams: 1
    temperature: 0.99
    num_return_sequences: ${num_return_sequences_in_group}
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: vllm
    strategy_config:
      gpu_memory_utilization: 0.5
      block_size: 16
      max_model_len: 2048
  device_mapping: list(range(0,1))
  infer_batch_size: 1

reference:
  model_args:
    disable_gradient_checkpointing: true
    dtype: bf16
    model_type: ~
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: vllm
    strategy_config:
      gpu_memory_utilization: 0.5
      block_size: 16
      max_model_len: 2048
    # strategy_config:
    #   tensor_model_parallel_size: 1
    #   pipeline_model_parallel_size: 1
    #   expert_model_parallel_size: 1
  device_mapping: list(range(0,1))
  infer_batch_size: 2

rewards:
  math_rule:
    worker_cls: roll.pipeline.rlvr.rewards.math_rule_reward_worker.MathRuleRewardWorker
    model_args:
      model_name_or_path: ${reward_pretrain}
    data_args:
      template: qwen2_5
    tag_included: [deepmath_103k, aime]
    world_size: 8
    infer_batch_size: 1
```
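For the single-GPU run, every GPU-bound role above is pinned to the same card via `device_mapping: list(range(0,1))`. Below is a small sanity check I used (assuming these mapping strings are evaluated as Python expressions, which is how they are written in the YAML); it also spells out the per-step rollout arithmetic implied by the GRPO settings.

```python
# Sanity check for the single-GPU layout above.
# Assumption: the device_mapping strings are evaluated as Python expressions,
# which is how they are written in the YAML.
num_gpus_per_node = 1
role_mappings = {
    "actor_train": "list(range(0,1))",
    "actor_infer": "list(range(0,1))",
    "reference":   "list(range(0,1))",
}
for role, expr in role_mappings.items():
    devices = eval(expr)  # -> [0]
    assert all(d < num_gpus_per_node for d in devices), f"{role} maps to a missing GPU: {devices}"
    print(role, "->", devices)

# Per rollout step, actor_infer generates
# rollout_batch_size * num_return_sequences_in_group = 4 * 4 = 16 sequences.
print("sequences per step:", 4 * 4)
```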
