Question about paper on Qwen3-VL #1

@zrli03

Description

Wonderful work! Thank you for your contributions, though I have a few questions.
I noticed the following statement on page 16 of your paper:

We observed that directly applying RL training on Qwen3-VL fails to yield improvements, likely because Qwen3-VL has undergone large-scale multi-task RL training that includes VTG data, preventing the model from generating rollouts with sufficient diversity on VTG task during our continual RL. Therefore, we first perform a small SFT stage to, in a sense, revert the model back to the “base model” state before RL. This is merely a workaround specific to Qwen3-VL, a model that has already acquired strong VTG capabilities through an RL stage similar to that proposed in this paper. In the common scenario, our recipes are designed to enhance the VTG capabilities of a “base MLLM”, where this trick is not required.

My questions:

  1. Is this strategy necessary when evaluating Qwen3-VL?
  2. Could you briefly describe the SFT configuration/settings for Qwen3-VL? (I include a hypothetical sketch of how I picture this stage after the list.)
  3. What do you mean by “common scenario”? Can I understand it as: this step may not be required for other tasks (e.g., VQA), but it is necessary when fine-tuning Qwen3-VL on our own VTG dataset?
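
To make question 2 concrete, here is a minimal sketch of how I currently picture the "small SFT stage" before continual RL, written against TRL's SFTTrainer. To be clear, everything in it is my own assumption rather than anything from the paper: the checkpoint id, the vtg_sft.jsonl data file, and all hyperparameters are placeholders, and the choice of TRL itself is mine.

```python
# Hypothetical sketch only: the checkpoint id, data file, and all
# hyperparameters are placeholders, and the multimodal (video) collation
# that real VTG SFT would need is glossed over here.
from datasets import load_dataset
from transformers import AutoModelForImageTextToText, AutoProcessor
from trl import SFTConfig, SFTTrainer

model_id = "Qwen/Qwen3-VL-8B-Instruct"  # placeholder checkpoint id
model = AutoModelForImageTextToText.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder VTG SFT data: prompt/completion pairs whose completions
# contain the timestamped answers.
dataset = load_dataset("json", data_files="vtg_sft.jsonl", split="train")

args = SFTConfig(
    output_dir="qwen3-vl-vtg-sft",
    num_train_epochs=1,              # "small" stage: one short pass
    learning_rate=1e-5,              # assumed low LR: the goal is to restore
                                     # rollout diversity, not re-teach VTG
    per_device_train_batch_size=2,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=processor,      # tokenizer/processor for the trainer
)
trainer.train()

# The continual RL stage would then initialize from this SFT checkpoint
# instead of the original Qwen3-VL weights.
```

Is this roughly the shape of it, or does your SFT stage differ in data mixture or scale?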

These questions are intended purely for discussion. I will also run my own experiments to validate the points above. I hope we can have a pleasant, in-depth discussion, and thank you again for your contribution.
Wishing you all the best.
