AssertionError in _log_rollout_data when training qwen3-vl-8B with true_on_policy_mode #1109

@caikun-pjlab

Description

Hello! First of all, thank you for the great work on this framework.

I encountered an assertion error while running the true_on_policy_vlm example. Details are below.

Environment & Steps to Reproduce

Codebase Version: slime @ 0934a0e
Command Executed:

SLIME_SCRIPT_MODEL_NAME=Qwen3-VL-8B-Instruct SLIME_SCRIPT_NUM_GPUS=8 python examples/true_on_policy_vlm/run_simple.py

Error Description

When running the command above, training fails with an AssertionError in _log_rollout_data. With true_on_policy_mode enabled, the check compares log_probs and rollout_log_probs with `==`, and the two logged values differ by roughly 7e-4.

Traceback (most recent call last):
  File "/root/slime/train.py", line 106, in <module>
    train(args)
  File "/root/slime/train.py", line 79, in train
    ray.get(actor_model.async_train(rollout_id, rollout_data_ref))
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2972, in get
    values, debugger_breakpoint = worker.get_objects(
                                  ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1031, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AssertionError): ray::FSDPTrainRayActor.train() (pid=858541, ip=10.102.98.154, actor_id=61e27bdd7141b62f9b9a4da302000000, repr=<slime.backends.fsdp_utils.actor.FSDPTrainRayActor object at 0x7fd9e7159430>)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/slime/slime/backends/fsdp_utils/actor.py", line 486, in train
    self._train_core(rollout_id=rollout_id, rollout_data=rollout_data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/slime/slime/backends/fsdp_utils/actor.py", line 550, in _train_core
    self._log_rollout_data(rollout_id, rollout_data, packed_batches)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/slime/slime/backends/fsdp_utils/actor.py", line 525, in _log_rollout_data
    assert log_dict["rollout/log_probs"] == log_dict["rollout/rollout_log_probs"], (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: CI check failed: true_on_policy_mode is enabled, but log_probs (-0.2051895260810852) != rollout_log_probs (-0.20450712740421295)
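
To put a number on the mismatch, here is a minimal sketch that uses only the two values printed in the assertion message. The helper name `describe_gap` is hypothetical and is not part of slime. It only shows how far apart the two logged means are. It does not reproduce how the framework computes them.

```python
# Values copied verbatim from the assertion message in the traceback.
log_probs = -0.2051895260810852
rollout_log_probs = -0.20450712740421295


def describe_gap(a: float, b: float) -> dict:
    """Hypothetical helper: absolute and relative gap between two scalar means."""
    abs_diff = abs(a - b)
    scale = max(abs(a), abs(b))
    # Guard against division by zero when both values are exactly 0.0.
    rel_diff = abs_diff / scale if scale > 0.0 else 0.0
    return {"abs_diff": abs_diff, "rel_diff": rel_diff, "exact_match": a == b}


gap = describe_gap(log_probs, rollout_log_probs)
print(f"abs diff: {gap['abs_diff']:.3e}")
print(f"rel diff: {gap['rel_diff']:.3e}")
print(f"exact match: {gap['exact_match']}")
```

The absolute gap is on the order of 1e-4 and the relative gap is on the order of 1e-3, so the two values are close but not bitwise identical, which is what the `==` check requires.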

Additional Context

Notably, under the same configuration, the Qwen3-VL-2B-Instruct and Qwen3-VL-4B-Instruct models run successfully without this error. The issue appears to be specific to the Qwen3-VL-8B-Instruct model.

Are there any known solutions or directions for troubleshooting? Please let me know if you need more logs or environment details.

Thank you for your attention and help!