Description
Wonderful work! Thank you for your contributions, though I have a few questions I hope you can help clarify.
I noticed that you mentioned:
> We observed that directly applying RL training on Qwen3-VL fails to yield improvements, likely because Qwen3-VL has undergone large-scale multi-task RL training that includes VTG data, preventing the model from generating rollouts with sufficient diversity on the VTG task during our continual RL. Therefore, we first perform a small SFT stage to, in a sense, revert the model back to the “base model” state before RL. This is merely a workaround specific to Qwen3-VL, a model that has already acquired strong VTG capabilities through an RL stage similar to that proposed in this paper. In the common scenario, our recipes are designed to enhance the VTG capabilities of a “base MLLM”, where this trick is not required.
on page 16 of your paper.
My questions:
- Is this strategy necessary when evaluating Qwen3-VL?
- Could you briefly introduce the SFT configuration/settings you used for Qwen3-VL? (See my rough sketch below for the kind of setup I have in mind.)
- What do you mean by “common scenario”? Should I understand it as meaning that this step may not be required for other tasks (e.g., VQA), but it is necessary when fine-tuning Qwen3-VL on our own VTG dataset?
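For concreteness, here is a rough sketch of the kind of small SFT stage I have in mind, so that we are talking about the same thing. Everything below is my own guess for discussion: the data format, the prompt wording, the placeholder model identifier, and the hyperparameters are assumptions on my part, not taken from your paper or code.

```python
# Hypothetical sketch of a small "reset" SFT stage before continual RL on Qwen3-VL.
# All names and numbers here are my own assumptions for discussion, not from the paper.

def build_vtg_sft_example(video_path: str, query: str, start_s: float, end_s: float) -> dict:
    """Format one video temporal grounding (VTG) sample as a chat-style SFT record."""
    return {
        "messages": [
            {
                "role": "user",
                "content": f"<video>{video_path}</video> "
                           f"Locate the moment described by: \"{query}\". "
                           "Answer with the start and end time in seconds.",
            },
            {
                "role": "assistant",
                "content": f"{start_s:.1f} - {end_s:.1f}",
            },
        ]
    }

# Rough hyperparameters I would try for a "small" SFT stage (pure guesses):
sft_config = {
    "base_model": "Qwen/Qwen3-VL",   # placeholder identifier, not an exact checkpoint name
    "num_samples": 2_000,            # small subset of the VTG training data
    "epochs": 1,
    "learning_rate": 1e-5,
    "max_frames": 32,
    "freeze_vision_tower": True,
}

if __name__ == "__main__":
    example = build_vtg_sft_example("demo.mp4", "the person opens the fridge", 3.2, 7.8)
    print(example)
    print(sft_config)
```

If your actual setup differs substantially from this sketch (e.g., in data scale, output format, or which modules are trained), that is exactly the kind of detail I am hoping to learn.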
These questions are intended purely for discussion; I will also run my own experiments to validate the points above. I look forward to a pleasant, in-depth exchange, and thank you again for your contribution.
Wishing you all the best.