Question about GRPO training setup for Qwen-Image-Edit-2509 or 2511 #5

@Weistrass

Description

Hi, thanks for sharing this great repository.

I noticed that your repo supports GRPO training for Qwen-Image-Edit-2509, and that it allows choosing between LoRA and full-parameter training modes. I have a few questions regarding training setup and customization:

Multi-reference image training

(1) I would like to train the model to generate a target image conditioned on 4–5 reference images simultaneously. In this case, do you think a configuration like 8 × 140 is sufficient for either LoRA or full training?

(2) Are there any practical differences in feasibility or stability between LoRA and full training for this multi-reference setting?

Configuration considerations

For the above setup, which configuration aspects should I pay special attention to? For example:

(1) Image resolution / sequence length

(2) GRPO-specific hyperparameters (e.g., rollout length, reward normalization)

(3) Any model- or data-related constraints specific to Qwen-Image-Edit-2509 or 2511
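For context on point (2), my current understanding of GRPO-style reward normalization is that rewards are normalized within each prompt's group of rollouts to produce group-relative advantages. A minimal sketch of what I assume the normalization looks like (my own illustration, not code from this repo):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """Group-relative advantage normalization (GRPO-style).

    rewards: array of shape (num_prompts, group_size), one scalar
             reward per rollout.
    Returns advantages of the same shape:
             (r - group mean) / (group std + eps).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Example: two prompts, four rollouts each.
adv = group_relative_advantages([[1.0, 2.0, 3.0, 4.0],
                                 [0.0, 0.0, 1.0, 1.0]])
```

If the repo's implementation deviates from this (e.g., no std division, or normalization across the whole batch), I'd appreciate a pointer to the relevant code.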

Adding custom reward functions

(1) If I want to add custom reward functions (e.g., for reference consistency or visual alignment), which part of the codebase should I modify or extend?

(2) Is there a recommended interface or example for registering new reward functions in the GRPO pipeline?
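To make question (2) concrete, this is the kind of interface I have in mind. It is purely a sketch: the names `REWARD_REGISTRY` and `register_reward` are my own invention, not taken from this repository.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a config string to a reward function.
# Naming is illustrative only, not from the actual codebase.
REWARD_REGISTRY: Dict[str, Callable] = {}

def register_reward(name: str):
    """Decorator that registers a reward function under a string key."""
    def wrap(fn: Callable) -> Callable:
        REWARD_REGISTRY[name] = fn
        return fn
    return wrap

@register_reward("reference_consistency")
def reference_consistency(generated, references) -> float:
    # Placeholder: a real reward might embed the generated image and
    # each reference image, then average their cosine similarities.
    return 0.0

# The rollout loop could then resolve rewards by name from the config:
fn = REWARD_REGISTRY["reference_consistency"]
score = fn(generated=None, references=[])
```

If the pipeline already exposes something equivalent (a registry, a base class, or a list of callables in the trainer config), I'm happy to follow that convention instead.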

Thanks a lot for your time and for open-sourcing this work. Any guidance would be greatly appreciated.
