Description
Hello! First of all, thank you for the great work on this framework.
I encountered an assertion error while running the true_on_policy_vlm example. Details are below.
Environment & Steps to Reproduce
Codebase Version: slime @ 0934a0e
Command Executed:
SLIME_SCRIPT_MODEL_NAME=Qwen3-VL-8B-Instruct SLIME_SCRIPT_NUM_GPUS=8 python examples/true_on_policy_vlm/run_simple.py
Error Description
Running the command above raises an AssertionError because the values of log_probs and rollout_log_probs do not match exactly.
Traceback (most recent call last):
File "/root/slime/train.py", line 106, in <module>
train(args)
File "/root/slime/train.py", line 79, in train
ray.get(actor_model.async_train(rollout_id, rollout_data_ref))
File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2972, in get
values, debugger_breakpoint = worker.get_objects(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 1031, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AssertionError): ray::FSDPTrainRayActor.train() (pid=858541, ip=10.102.98.154, actor_id=61e27bdd7141b62f9b9a4da302000000, repr=<slime.backends.fsdp_utils.actor.FSDPTrainRayActor object at 0x7fd9e7159430>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/slime/slime/backends/fsdp_utils/actor.py", line 486, in train
self._train_core(rollout_id=rollout_id, rollout_data=rollout_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/slime/slime/backends/fsdp_utils/actor.py", line 550, in _train_core
self._log_rollout_data(rollout_id, rollout_data, packed_batches)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/slime/slime/backends/fsdp_utils/actor.py", line 525, in _log_rollout_data
assert log_dict["rollout/log_probs"] == log_dict["rollout/rollout_log_probs"], (
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: CI check failed: true_on_policy_mode is enabled, but log_probs (-0.2051895260810852) != rollout_log_probs (-0.20450712740421295)
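To put a number on the mismatch, here is a minimal standalone sketch that computes the absolute and relative gap between the two values from the assertion message. The helper name `logprob_gap` is hypothetical and is not part of slime; the only inputs are the two numbers printed in the error.

```python
import math


def logprob_gap(train_lp: float, rollout_lp: float) -> tuple[float, float]:
    """Return (absolute, relative) difference between two mean log-probs.

    Hypothetical helper for illustration only; not a slime API.
    """
    abs_diff = abs(train_lp - rollout_lp)
    # Normalize by the larger magnitude so the result is symmetric in its inputs.
    rel_diff = abs_diff / max(abs(train_lp), abs(rollout_lp))
    return abs_diff, rel_diff


# Values copied from the AssertionError message.
log_probs = -0.2051895260810852
rollout_log_probs = -0.20450712740421295

abs_diff, rel_diff = logprob_gap(log_probs, rollout_log_probs)
print(f"abs diff: {abs_diff:.3e}, rel diff: {rel_diff:.3e}")

# The check in the traceback uses exact equality, so any gap at all fails it.
print("exactly equal:", log_probs == rollout_log_probs)
print("close at rel_tol=1e-2:", math.isclose(log_probs, rollout_log_probs, rel_tol=1e-2))
print("close at rel_tol=1e-3:", math.isclose(log_probs, rollout_log_probs, rel_tol=1e-3))
```

The two values agree to about two significant digits (absolute gap of roughly 7e-4), so they are close but not identical, which is what the exact `==` comparison requires.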
Additional Context
Notably, under the same configuration, the Qwen3-VL-2B-Instruct and Qwen3-VL-4B-Instruct models run successfully without this error. The issue appears to be specific to the Qwen3-VL-8B-Instruct model.
Are there any known solutions or directions for troubleshooting? Please let me know if you need more logs or environment details.
Thank you for your attention and help!