Description
Hi, thanks for sharing this great repository.
I noticed that your repo supports GRPO training for Qwen-Image-Edit-2509, and that it allows choosing between LoRA and full-parameter training modes. I have a few questions regarding training setup and customization:
Multi-reference image training
(1) I would like to train the model to generate a target image conditioned on 4–5 reference images simultaneously. In this case, do you think a configuration like 8 × 140 is sufficient for either LoRA or full training?
(2) Are there any practical differences in feasibility or stability between LoRA and full training for this multi-reference setting?
Configuration considerations
For the above setup, which configuration aspects should I pay special attention to? For example:
(1) Image resolution / sequence length
(2) GRPO-specific hyperparameters (e.g., rollout length, reward normalization)
(3) Any model- or data-related constraints specific to Qwen-Image-Edit-2509 or 2511
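To make the question concrete, here is roughly the kind of setup I have in mind, written as a plain Python dict. All keys and values are hypothetical illustrations of my intent, not your repo's actual config schema:

```python
# Hypothetical training-config sketch for multi-reference GRPO.
# Every key here is illustrative — it does not reflect the repo's schema.
config = {
    "model": "Qwen-Image-Edit-2509",
    "training_mode": "lora",          # or "full" for full-parameter training
    "num_reference_images": 5,        # 4-5 references conditioning one target
    "resolution": 1024,               # per-image side; drives sequence length
    "grpo": {
        "group_size": 8,              # rollouts sampled per prompt
        "reward_normalization": "group_mean_std",
        "kl_coef": 0.01,              # KL penalty against the reference policy
    },
}

# With 5 reference images the conditioning sequence length grows roughly
# linearly in the number of references, which is why I ask about memory.
seq_scale = config["num_reference_images"] + 1  # references + target
```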
Adding custom reward functions
(1) If I want to add custom reward functions (e.g., for reference consistency or visual alignment), which part of the codebase should I modify or extend?
(2) Is there a recommended interface or example for registering new reward functions in the GRPO pipeline?
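For context, the kind of interface I am imagining is a simple registry that the GRPO rollout loop looks up rewards from. This is only a sketch of what I would expect, with all names (`REWARD_REGISTRY`, `register_reward`, the embedding-based toy metric) being my own assumptions rather than your actual API:

```python
# Hypothetical reward-registration sketch — names and signatures are
# assumptions, not the repo's real interface.
import math
from typing import Callable, Dict, List

REWARD_REGISTRY: Dict[str, Callable] = {}

def register_reward(name: str):
    """Decorator that records a reward function under a string key,
    so a config file could select it by name."""
    def decorator(fn: Callable) -> Callable:
        REWARD_REGISTRY[name] = fn
        return fn
    return decorator

@register_reward("reference_consistency")
def reference_consistency(generated_embs: List[List[float]],
                          reference_embs: List[List[float]]) -> float:
    """Toy reward: mean cosine similarity between generated-image and
    reference-image embeddings (a placeholder for a real perceptual
    metric such as CLIP or DINO similarity)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    sims = [cosine(g, r) for g in generated_embs for r in reference_embs]
    return sum(sims) / len(sims) if sims else 0.0
```

If the actual pipeline exposes something along these lines (or a different hook point entirely), a pointer to the relevant module would be very helpful.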
Thanks a lot for your time and for open-sourcing this work. Any guidance would be greatly appreciated.