Replies: 33 comments
-
Hi! Thanks for checking out this repo!

Multi-reference image training

That should be sufficient, I think. I haven't tried that many condition images yet, but for LoRA tuning it should be enough. Full fine-tuning may require a lower resolution for the condition images.

I suggest training a LoRA with an easy reward first and checking that the reward rises as expected, then moving on to full fine-tuning. There are no known major differences in feasibility or stability between the two.

Configuration considerations

I suggest setting ...

No. They share the exact same architecture and simply have different trained weights. If your task is not about, say, human faces or realism, using either one is fine.

Adding custom reward functions:

Please check here for more information. Implement your own reward-computation logic in this file and set the corresponding config (see below).

An example configuration file

Here is an example config file. Sorry, I don't have GPU resources in the next few days, so I cannot verify it myself; please report any feedback or issues. Thanks!

```yaml
# Environment Configuration
launcher: "accelerate"   # Options: accelerate
config_file: config/accelerate_configs/fsdp2.yaml  # FSDP2 shards the model as well; switch to config/deepspeed/deepspeed_zero2.yaml if you have enough GPU memory
num_processes: 8         # Number of processes to launch (overrides config file)
main_process_port: 29500
mixed_precision: "bf16"  # Options: no, fp16, bf16
run_name: null           # Run name (auto: {model_type}_{finetune_type}_{timestamp})
project: "Flow-Factory"  # Project name for logging
logging_backend: "wandb" # Options: wandb, swanlab, none

# Data Configuration
data:
  dataset_dir: "dataset/sharegpt4o_image_mini"  # Path to dataset folder
  preprocessing_batch_size: 8                   # Batch size for preprocessing
  dataloader_num_workers: 16                    # Number of workers for DataLoader
  enable_preprocess: true                       # Enable dataset preprocessing
  force_reprocess: true                         # Force reprocessing of the dataset
  cache_dir: "~/.cache/flow_factory/datasets"   # Cache directory for preprocessed datasets
  max_dataset_size: 1000                        # Limit the maximum number of samples in the dataset

# Model Configuration
model:
  finetune_type: 'lora'      # Options: full, lora
  lora_rank: 64
  lora_alpha: 128
  target_modules: "default"  # Try "default" first; if OOM, try ["to_k", "to_q", "to_v", "to_out.0"] for attention layers only
  model_name_or_path: "Qwen/Qwen-Image-Edit-2509"  # Qwen/Qwen-Image-Edit-2509 or Qwen/Qwen-Image-Edit-2511
  model_type: "qwen-image-edit-plus"
  resume_path: null             # Path to load a previous checkpoint/LoRA adapter
  resume_training_state: false  # Resume training state; only effective when resume_path is a directory with a full checkpoint

log:
  save_dir: "~/Flow-Factory"  # Directory to save model checkpoints and logs
  save_freq: 20               # Save frequency in epochs (0 to disable)
  save_model_only: true       # Save only the model weights (not optimizer, scheduler, etc.)

# Training Configuration
train:
  # Training settings
  trainer_type: 'grpo'
  enable_gradient_checkpointing: true  # Saves memory at the cost of extra compute; keep it true if you hit OOM
  # Image settings
  resolution: 512                   # 512 is worth trying first; if OOM, try 384 and then 256
  condition_image_size: [512, 512]  # Keep at 512, or set the same as `resolution` above
  # Batch and sampling
  per_device_batch_size: 1  # Qwen-Image-Edit-Plus accepts a varying number of condition images, so batch_size always falls back to 1
  group_size: 16            # Group size for GRPO sampling
  global_std: false         # Use global std for advantage normalization
  unique_sample_num_per_epoch: 48  # Unique samples per epoch
  gradient_step_per_epoch: 2       # Gradient steps per epoch
  # Clipping
  clip_range: 1.0e-4   # PPO/GRPO clipping range
  adv_clip_range: 5.0  # Advantage clipping range
  max_grad_norm: 1.0   # Max gradient norm for clipping
  # KL divergence
  kl_type: 'v-based'   # Options: 'x-based', 'v-based'
  kl_beta: 0           # Set to 0 at first to disable KL and save memory; if the reward grows as expected, try values like 0.04
  ref_param_device: 'same_as_model'  # Options: cpu, same_as_model
  # Denoising process
  num_inference_steps: 10  # 10 is good enough for many tasks; increase to 20 if your task requires higher quality
  guidance_scale: 4        # Follows the recommendation in the official Qwen-Image-Edit model card
  # Optimization
  seed: 42                   # Random seed
  learning_rate: 3.0e-4      # 3.0e-4 for LoRA, 1.0e-5 for full fine-tuning
  adam_weight_decay: 1.0e-4  # AdamW weight decay
  adam_betas: [0.9, 0.999]   # AdamW betas
  adam_epsilon: 1.0e-8       # AdamW epsilon
  # EMA
  ema_decay: 0.9           # EMA decay rate (0 to disable)
  ema_update_interval: 4   # EMA update interval (in epochs)

# Scheduler Configuration
scheduler:
  dynamics_type: "Flow-SDE"  # Options: Flow-SDE, Dance-SDE, CPS, ODE
  noise_level: 1.0           # Noise level for sampling
  num_train_steps: 1         # Number of noise steps
  train_steps: [1, 2, 3]     # Custom noise window; noise steps are randomly selected from this list during training
  seed: 42                   # Scheduler seed (for noise-step selection)

# Evaluation settings
eval:
  resolution: 512                   # Evaluation resolution
  condition_image_size: [512, 512]  # Max condition image resolution, int or [height, width]
  auto_resize: true                 # Auto-resize to fit the condition images' aspect ratio during inference
  guidance_scale: 4                 # Guidance scale for sampling
  num_inference_steps: 50           # 50 eval timesteps are recommended for better quality per the official Qwen-Image-Edit model card
  per_device_batch_size: 1          # Eval batch size
  seed: 42                          # Eval seed
  eval_freq: 20                     # Eval frequency in epochs (0 to disable)

# Reward Model Configuration
rewards:
  - name: "visual_consistency"
    reward_model: "flow_factory.rewards.my_reward.VisualConsistencyRewardModel"  # Import path to your custom reward model
    batch_size: 16  # Batch size for reward model inference
    device: "cuda"
    dtype: bfloat16

# Optional Evaluation Reward Models
# eval_rewards:
#   - name: "text_alignment"
#     reward_model: "CLIP"
#     batch_size: 16
#     dtype: bfloat16
#     device: "cuda"
```

If you run into any issues, feel free to post them here; I'm happy to help 🤗
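As a sketch of what the custom reward model referenced by `reward_model: "flow_factory.rewards.my_reward.VisualConsistencyRewardModel"` might look like: the class name and module path come from the config above, but the constructor arguments and the `__call__(images, ref_images)` signature are assumptions rather than Flow-Factory's actual interface, and the channel-mean "embedding" is a toy stand-in for a real feature extractor such as CLIP or DINO.

```python
import numpy as np

class VisualConsistencyRewardModel:
    """Illustrative reward model: scores each generated image by cosine
    similarity between its embedding and a reference embedding.
    The real Flow-Factory base class and method names may differ; this
    only shows the general shape of the reward-computation logic."""

    def __init__(self, device: str = "cpu", dtype: str = "float32"):
        self.device = device  # a real model would move its weights here
        self.dtype = dtype

    def _embed(self, image: np.ndarray) -> np.ndarray:
        # Stand-in for a feature extractor: channel-wise mean of an HxWxC image.
        return image.reshape(-1, image.shape[-1]).mean(axis=0)

    def __call__(self, images, ref_images):
        # One scalar reward per sample; higher means more consistent.
        rewards = []
        for img, ref in zip(images, ref_images):
            a, b = self._embed(img), self._embed(ref)
            cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
            rewards.append(cos)
        return rewards
```

Registering it is then just a matter of pointing the `reward_model` field in the YAML at the import path.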
-
Thank you for the detailed reply. I'd like to quickly verify this framework's feasibility for qwen-edit-2509 GRPO LoRA at rank 64. For data I plan to start with the dataset/sharegpt4o_image_mini you provide, and for the reward simply use PickScore first. Do you think this setup can reproduce the rising reward-mean curve mentioned in your project?
-
I've already verified this combination; I'm pasting the experiment curves here:

There are also some evaluation examples; you can clearly see the images becoming more and more aligned with PickScore's aesthetics.

Pasting the config here as well:

```yaml
# Environment Configuration
launcher: "accelerate"   # Options: accelerate
config_file: config/deepspeed/deepspeed_zero2.yaml  # DeepSpeed ZeRO-2; switch to config/accelerate_configs/fsdp2.yaml to shard the model as well if GPU memory is tight
num_processes: 8         # Number of processes to launch (overrides config file)
main_process_port: 29500
mixed_precision: "bf16"  # Options: no, fp16, bf16
run_name: null           # Run name (auto: {model_type}_{finetune_type}_{timestamp})
project: "Flow-Factory"  # Project name for logging
logging_backend: "wandb" # Options: wandb, swanlab, none

# Data Configuration
data:
  dataset_dir: "dataset/sharegpt4o_image_mini"     # Path to dataset folder
  preprocessing_batch_size: 8                      # Batch size for preprocessing
  dataloader_num_workers: 16                       # Number of workers for DataLoader
  enable_preprocess: true                          # Enable dataset preprocessing
  force_reprocess: true                            # Force reprocessing of the dataset
  cache_dir: "~/jcy/.cache/flow_factory/datasets"  # Cache directory for preprocessed datasets
  max_dataset_size: 1000                           # Limit the maximum number of samples in the dataset

# Model Configuration
model:
  finetune_type: 'lora'      # Options: full, lora
  lora_rank: 64
  lora_alpha: 128
  target_modules: "default"  # Options: all, default, or a list of module names like ["to_k", "to_q", "to_v", "to_out.0"]
  model_name_or_path: "Qwen/Qwen-Image-Edit-2509"  # Qwen/Qwen-Image-Edit-2509 or Qwen/Qwen-Image-Edit-2511
  model_type: "qwen-image-edit-plus"
  resume_path: null             # Path to load a previous checkpoint/LoRA adapter
  resume_training_state: false  # Resume training state; only effective when resume_path is a directory with a full checkpoint

log:
  save_dir: "~/jcy/Flow-Factory"  # Directory to save model checkpoints and logs
  save_freq: 20                   # Save frequency in epochs (0 to disable)
  save_model_only: true           # Save only the model weights (not optimizer, scheduler, etc.)

# Training Configuration
train:
  # Training settings
  trainer_type: 'grpo'
  enable_gradient_checkpointing: false  # Enable gradient checkpointing to save memory at the cost of extra compute
  # Image settings
  resolution: 384                   # Can be int or [height, width]
  auto_resize: true                 # Auto-resize to fit the condition images' aspect ratio during inference
  condition_image_size: [512, 512]  # Max condition image resolution, int or [height, width]
  # Batch and sampling
  per_device_batch_size: 1  # Qwen-Image-Edit-Plus accepts a varying number of condition images, so batch_size always falls back to 1
  group_size: 16            # Group size for GRPO sampling
  global_std: false         # Use global std for advantage normalization
  unique_sample_num_per_epoch: 48  # Unique samples per epoch
  gradient_step_per_epoch: 2       # Gradient steps per epoch
  # Clipping
  clip_range: 1.0e-4   # PPO/GRPO clipping range
  adv_clip_range: 5.0  # Advantage clipping range
  max_grad_norm: 1.0   # Max gradient norm for clipping
  # KL divergence
  kl_type: 'v-based'   # Options: 'x-based', 'v-based'
  kl_beta: 0.04        # KL divergence beta
  ref_param_device: 'same_as_model'  # Options: cpu, same_as_model
  # Denoising process
  num_inference_steps: 10  # Number of timesteps
  guidance_scale: 4        # Guidance scale for sampling
  # Optimization
  seed: 42                   # Random seed
  learning_rate: 3.0e-4      # Initial learning rate
  adam_weight_decay: 1.0e-4  # AdamW weight decay
  adam_betas: [0.9, 0.999]   # AdamW betas
  adam_epsilon: 1.0e-8       # AdamW epsilon
  # EMA
  ema_decay: 0.9           # EMA decay rate (0 to disable)
  ema_update_interval: 4   # EMA update interval (in epochs)

# Scheduler Configuration
scheduler:
  dynamics_type: "Flow-SDE"  # Options: Flow-SDE, Dance-SDE, CPS, ODE
  noise_level: 1.0           # Noise level for sampling
  num_train_steps: 1         # Number of noise steps
  train_steps: [1, 2, 3]     # Custom noise window; noise steps are randomly selected from this list during training
  seed: 42                   # Scheduler seed (for noise-step selection)

# Evaluation settings
eval:
  resolution: 512                   # Evaluation resolution
  condition_image_size: [512, 512]  # Max condition image resolution, int or [height, width]
  auto_resize: true                 # Auto-resize to fit the condition images' aspect ratio during inference
  guidance_scale: 4                 # Guidance scale for sampling
  num_inference_steps: 40           # Number of eval timesteps
  per_device_batch_size: 1          # Eval batch size
  seed: 42                          # Eval seed
  eval_freq: 20                     # Eval frequency in epochs (0 to disable)

# Reward Model Configuration
rewards:
  - name: "pick_score"
    reward_model: "PickScore"
    batch_size: 16
    device: "cuda"
    dtype: bfloat16
```

At the time, to save time, I set the training resolution to 384 and eval to 512, slightly different from the current example in the repo. Because the resolution is low, you can see that the images generated at step 0 are somewhat distorted; this seems to be an inherent issue of Qwen-Image-Edit at low resolutions, but RL at low resolution can fix it, and the later training images look much better. This is also discussed in X-GenGroup/PaCo-RL#2. You could also try lowering the resolution to around 512 and running that first; it is much faster and lets you quickly verify whether the reward goes up.
-
Thank you so much for sharing this valuable experience. Regarding resolution: the qwen-edit-2509 model seems to overfit the default 1024x1024 resolution used in the official inference code; at inference time, 1024x1024 gives much better results than 512x512. I'm quite interested in the discussion you mentioned, namely that training at a low resolution such as 512 can greatly alleviate this overfitting (i.e. the model's behavior at non-1024 resolutions). Is your suggestion that the sampling and training resolutions should match, and that the inference resolution should be less than or equal to the training resolution?
-
It should be: during RL training, use a low resolution (e.g. 512) for sampling and optimization, then use a normal high resolution (e.g. 1024) at inference. For instance, the example I gave above trains at 384 and evaluates at 512; you can see performance at 384 improving over training, and the eval reward at 512 also rises, so I'm confident it improves at 1024 as well. The PaCo-RL paper identified this mechanism: the gains from low-resolution RL transfer to high resolutions too. It is a very simple and effective trick that significantly speeds up training. The one possible issue is that too low a resolution loses image detail and can hurt the reward model's scoring, so pick a moderately low resolution; there is definitely no need to train at 1024.
-
Got it, thanks. I'll try it quickly following your advice, and I hope we can keep exchanging notes.
-
I just saw that FLUX2-klein has been released, in 4B and 9B versions, with a unified inference pipeline and support for text-to-image as well as (multi-)image-to-image. Very exciting; I'll add support as soon as possible. For multi-image tasks we then won't need to train the two heavyweights, Qwen-Image-Edit-Plus and FLUX2-dev 😃
-
That's great news, and thank you for your contributions to this field; I also hope a small model can let me verify algorithms quickly. I previously surveyed models that support multi-reference-image training: ominigen2 actually has positional encodings designed specifically for multiple reference images, and it is only 7B (though its LoRA fine-tuning and reinforcement-learning ecosystem is less mature than qwen-edit-2509's). Other multi-image models, such as mosaic and dreamomniv2, only support this because the authors modified the positional encodings and retrained, which is cumbersome, and none of them open-source their training code, so they are hard to adapt. I chose qwen-edit-2509 because the base model performs better (I hope to match or beat closed-source models such as nano pro in some aspect), and because its training ecosystem, e.g. support for LoRA and flow-grpo, is more complete, letting me get set up quickly instead of sinking into an engineering quagmire. So your project is very meaningful to me.
-
On my side, with 8*140G, a run takes at most 18 hours, so I don't know how far it could go beyond that either. In practice, by 200-plus steps you can already see image quality degrading: the model starts overfitting PickScore's preferred style, and at that point you can basically stop manually. In the config file you can use two different reward models for training and evaluation, e.g. two different aesthetic models. At first both will probably rise; once the evaluation reward starts dropping, the checkpoint at the peak should be the best one.
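The early-stopping heuristic described above (keep the checkpoint where the held-out eval reward peaks, then stop once it starts dropping) can be sketched as follows. `eval_rewards` here is a hypothetical list of per-eval-epoch scores, not a Flow-Factory API.

```python
def best_checkpoint(eval_rewards, patience=3):
    """Return (best_epoch, best_reward), stopping the scan once the eval
    reward has failed to improve for `patience` consecutive evals,
    mimicking manual early stopping at the peak."""
    best_epoch, best_reward, since_best = 0, float("-inf"), 0
    for epoch, reward in enumerate(eval_rewards):
        if reward > best_reward:
            best_epoch, best_reward, since_best = epoch, reward, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # eval reward is past its peak; stop here
    return best_epoch, best_reward
```

With `save_freq` and `eval_freq` aligned as in the example configs, the returned epoch maps directly to a saved checkpoint.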
-
The level at which this framework encapsulates the models, and ...
-
OK, thanks for the suggestion.
-
I don't have much experience building large projects, but I think you could first gauge the workload by supporting only the most popular models that diffusers does not yet support, as a trial (while trying not to break the existing extensibility). From my own research experience, many things are hard to think through completely at the start.
-
Hello, while training the model to support more reference images, I found that with both training and test resolution at 512 and group size 16, I can train with at most 4 reference images; lowering group size to 8 allows up to 6. How much would reducing group size from 16 to 8 hurt performance? I'd appreciate your experience here. I want the model to keep good performance with more reference images, and I'm not sure whether a model trained with at most 4 reference images will generalize well to 7-8 at test time. So I face a trade-off between the number of reference images and group size. I can't really lower the resolution right now, since I worry a model trained at 384 would perform poorly when tested at 1024; the gap is a bit large.
-
I don't quite understand why (resolution + number of reference images) would conflict with group_size. Resolution and the number of reference images affect GPU memory usage, while group_size, however large, does not affect peak memory; a larger group_size just needs more sampling time and lowers training efficiency, and resolution plus reference-image count affect training efficiency too. So I guess the bottleneck you mean is actually the 72h training-time limit. If it is training time, don't worry too much: train for 72h, then start a new run that loads the previous checkpoint and continues training.
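The resume-and-continue workflow above maps onto the `resume_path` / `resume_training_state` fields already present in the example configs in this thread; the checkpoint path below is a placeholder, not a real artifact from this discussion:

```yaml
model:
  finetune_type: 'lora'
  resume_path: "~/Flow-Factory/checkpoints/epoch-200"  # placeholder: directory of the previous run's checkpoint
  resume_training_state: true  # also restore optimizer/scheduler state (requires a full checkpoint directory)
```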
-
I tried fsdp2.yaml and it indeed saves a lot of memory; I can now run with group size 16 and 6 reference images. Thanks for the suggestion. I've also run into another situation: I'm training the model to preserve both style and subject consistency given multiple reference images, but in my training data not every sample has a style reference, while every sample does have a subject reference. For the style reward, I currently plan to compute it only for training samples that include a style reference image, and exclude samples without one from that reward computation. I'm not sure whether this has hidden pitfalls, and I'd like to hear your advice.
-
That probably depends on how exactly you weight the rewards, because @Weistrass ... Besides, I have just shipped FLUX2-klein support. From a quick trial, training is very fast and uses very little memory: the 4B model takes under 24G, and the model quality looks very good to the eye. I'd guess that with 140G*8 for LoRA training, a full 10 reference images is no problem at all; you could even try full-parameter training. I did a quick verification:

The drop in the middle may be instability caused by a too-large learning rate or too much noise, but the overall rise is very fast. The parameters are all from the general configuration I recommend based on experience; specific tasks may still need task-specific tuning.
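One concrete pitfall with computing the style reward only for samples that have a style reference: if the missing component is filled with zero before group normalization, those samples get a systematically lower total reward than their group mates, purely because of the fill value. A toy sketch under an assumed GRPO-style per-group normalization (the reward values are hypothetical, and this is not Flow-Factory's actual aggregation code):

```python
def group_advantages(rewards):
    """GRPO-style advantage: normalize rewards within one sampling group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 if var > 0 else 1.0
    return [(r - mean) / std for r in rewards]

# Hypothetical per-sample rewards for one group of four rollouts:
# total reward = subject consistency + style consistency, where the style
# score is None for samples that have no style reference image.
subject = [0.8, 0.7, 0.9, 0.6]
style = [0.5, None, 0.4, None]

# Option A: zero-fill the missing style component. Samples without a style
# reference are dragged toward negative advantage purely by the fill value.
zero_fill = [s + (st if st is not None else 0.0) for s, st in zip(subject, style)]

# Option B: fill with the group's mean observed style reward, which keeps
# missing-style samples neutral with respect to the style component.
observed = [st for st in style if st is not None]
style_mean = sum(observed) / len(observed)
mean_fill = [s + (st if st is not None else style_mean) for s, st in zip(subject, style)]

adv_zero = group_advantages(zero_fill)
adv_mean = group_advantages(mean_fill)
```

With zero-fill, the gap between samples with and without a style reference is inflated before normalization, so the no-style samples are penalized for missing data rather than for bad outputs; mean-filling (or normalizing each reward component over only the samples where it is defined) avoids that bias.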
-
Got it, thanks; I'll try flux.2 klein. And the question you raised about samples excluded from reward computation, i.e. whether to fill with the mean or with zero, is indeed a problem I hadn't considered. So I've now made all training data contain both a subject and a style reference, with everything participating in reward computation, which avoids that bias. But now during training, I first load the LoRA weights of an SFT version I trained for 4 epochs on qwen-edit-2511, and continue GRPO LoRA training on top of it. Strangely, the initial eval results are poor, even though the model performed decently after 4 epochs of fine-tuning; I only wanted GRPO to squeeze out further gains, yet it starts out performing badly, as if it isn't learning on top of the SFT version at all. My SFT LoRA used the code at https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/qwen_image/model_training/lora/Qwen-Image-Edit-2511.sh, where LoRA only requires setting lora_rank and does not expose the lora_alpha parameter you use in the YAML, so I'm not sure whether that is the cause.
I hope you can shed some light on this. My motivation was to see whether SFT cold-start followed by GRPO works better, but so far it doesn't.
-
In principle, SFT cold-start followed by RL is the standard recipe, and it should work better. Did you run an eval after the 4 SFT epochs? Does it match the eval result at RL step 0? If it matches, it is probably not a loading problem, and the SFT may simply not have trained well. If it does not match, something in model loading may be wrong. The LoRA loading here follows
Flow-Factory/src/flow_factory/models/abc.py, lines 1056 to 1082 at 59eb12a
Flow-Factory/src/flow_factory/models/abc.py, lines 1349 to 1356 at 59eb12a
Regarding ... DiffSynth-Studio uses ... Also, for eval you need to look at ...
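Since `lora_alpha` came up: in PEFT, the learned LoRA update is applied scaled by `alpha / rank`, so a training script that only exposes `lora_rank` is implicitly fixing some default alpha, and loading the checkpoint under a different alpha rescales every learned delta. The "alpha equals rank" default below is a hypothetical convention for illustration, not a verified DiffSynth-Studio setting; the arithmetic itself is standard LoRA.

```python
# LoRA applies its learned update scaled by alpha / rank:
#   W_effective = W_base + (alpha / rank) * (B @ A)
# so a checkpoint trained under one alpha but loaded under another has
# every learned delta rescaled by the ratio of the two scale factors.

def lora_scale(alpha: float, rank: int) -> float:
    return alpha / rank

rank = 64
scale_alpha_eq_rank = lora_scale(alpha=64, rank=rank)   # hypothetical SFT default (alpha = rank)
scale_example_config = lora_scale(alpha=128, rank=rank) # lora_alpha: 128 from the YAML above
mismatch = scale_example_config / scale_alpha_eq_rank   # learned update would be doubled
```

If the two frameworks disagree on alpha, the adapter is effectively applied at the wrong strength even when every weight key matches.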
-
This actually indicates that the weights did not match up when the LoRA was loaded. The base_model.model prefix here is ...
-
@Weistrass I did a rough analysis: the format in which DiffSynth-Studio saves LoRA weights differs from the PEFT format. DiffSynth keys look like transformer.<layer>.lora_A/B.<adapter name>.weight, while the corresponding PEFT keys look like base_model.model.<layer>.lora_A/B.weight. Below is a script that converts between the two formats; I found a few examples and tested it myself, so feel free to use it as a reference. Overall the PEFT format preserves more information, and I will keep using it in Flow-Factory. I still need to study the DiffSynth-Studio and diffusers source code further to make their saved checkpoints compatible as well. You can try the script below:

```python
import os
import json
import re

import torch
from safetensors.torch import save_file, load_file


def peft_to_diffusers(peft_model_path, output_file, prefix="transformer"):
    """
    Convert PEFT format to Diffusers format.
    prefix: FLUX models usually use 'transformer'; SDXL uses 'unet' or 'text_encoder'.
    """
    peft_state_dict = load_file(os.path.join(peft_model_path, "adapter_model.safetensors"))
    new_state_dict = {}
    for k, v in peft_state_dict.items():
        # 1. Strip PEFT's fixed prefix base_model.model.
        new_key = k.replace("base_model.model.", "")
        # 2. Add the component prefix the Diffusers loader expects,
        #    unless the key already has it.
        if prefix and not new_key.startswith(f"{prefix}."):
            new_key = f"{prefix}.{new_key}"
        new_state_dict[new_key] = v
    save_file(new_state_dict, output_file)
    print(f"Successfully converted PEFT to Diffusers with prefix '{prefix}': {output_file}")


def diffusers_to_peft(diffusers_file, output_dir, prefix="transformer", target_modules=None, r=None):
    """
    Convert a single Diffusers file into a PEFT folder.
    """
    os.makedirs(output_dir, exist_ok=True)
    diffusers_state_dict = load_file(diffusers_file)
    peft_state_dict = {}
    detected_r = 0
    for k, v in diffusers_state_dict.items():
        # 1. Strip the Diffusers component prefix (e.g. 'transformer.')
        new_key = k
        if prefix and k.startswith(f"{prefix}."):
            new_key = k.replace(f"{prefix}.", "")
        # 2. Add PEFT's fixed prefix
        peft_key = f"base_model.model.{new_key}"
        peft_state_dict[peft_key] = v
        # 3. Auto-detect the rank (inferred from the shape of lora_A)
        if r is None and "lora_A.weight" in k:
            detected_r = v.shape[0]
    final_r = r if r is not None else (detected_r if detected_r > 0 else 64)
    save_file(peft_state_dict, os.path.join(output_dir, "adapter_model.safetensors"))
    # Write the config
    config = {
        "peft_type": "LORA",
        "r": final_r,
        "lora_alpha": final_r * 2,  # a common alpha choice
        "target_modules": target_modules or [],
        "lora_dropout": 0.0,
        "bias": "none",
        "inference_mode": True,
        "base_model_name_or_path": None
    }
    with open(os.path.join(output_dir, "adapter_config.json"), "w") as f:
        json.dump(config, f, indent=4)
    print(f"Successfully converted Diffusers to PEFT (Rank: {final_r})")


def diffusers_to_peft_auto(diffusers_file, output_dir, prefix=None):
    os.makedirs(output_dir, exist_ok=True)
    diffusers_state_dict = load_file(diffusers_file)
    first_key = list(diffusers_state_dict.keys())[0]
    # Try to infer the prefix automatically
    if prefix is None:
        if first_key.startswith("transformer."):
            prefix = "transformer"
        elif first_key.startswith("unet."):
            prefix = "unet"
        else:
            prefix = ""
    peft_state_dict = {}
    full_module_paths = set()  # store full module paths
    detected_r = None
    # Matches keys like transformer.xxx.lora_A.weight
    pattern = re.compile(r"^(?:" + prefix + r"\.)?(.*)\.lora_[AB](?:\.[^.]+)?\.weight$")
    for k, v in diffusers_state_dict.items():
        match = pattern.match(k)
        if match:
            module_full_path = match.group(1)  # e.g. single_blocks.0.attn.to_out.0
            full_module_paths.add(module_full_path)
            if ".lora_A" in k and detected_r is None:
                detected_r = v.shape[0]
        # Convert to the PEFT key name
        new_key = k.replace(f"{prefix}.", "") if prefix and k.startswith(f"{prefix}.") else k
        if ".lora_A.default.weight" in new_key:
            new_key = new_key.replace(".lora_A.default.weight", ".lora_A.weight")
        elif ".lora_B.default.weight" in new_key:
            new_key = new_key.replace(".lora_B.default.weight", ".lora_B.weight")
        peft_state_dict[f"base_model.model.{new_key}"] = v
    save_file(peft_state_dict, os.path.join(output_dir, "adapter_model.safetensors"))
    rank = detected_r if detected_r else 64
    config = {
        "peft_type": "LORA",
        "r": rank,
        "lora_alpha": rank,  # or rank * 2
        # Key point: use full paths to avoid fuzzy matches on container modules
        "target_modules": sorted(list(full_module_paths)),
        "lora_dropout": 0.0,
        "bias": "none",
        "inference_mode": True,
        "base_model_name_or_path": None,
        "init_lora_weights": True
    }
    with open(os.path.join(output_dir, "adapter_config.json"), "w") as f:
        json.dump(config, f, indent=4)
    print("✅ Conversion succeeded!")
    print(f"Matched {len(config['target_modules'])} linear-layer paths exactly")
    return config


def test_p2d(peft_lora_path, diffusers_output):
    # PEFT -> Diffusers
    peft_to_diffusers(peft_lora_path, diffusers_output)
    pipeline.load_lora_weights(diffusers_output)
    print([k for k in pipeline.transformer.state_dict().keys() if 'lora' in k][:10])


def test_d2p(pipeline, diffusers_output, peft_output):
    # # Diffusers -> PEFT with manually supplied target_modules
    # # (lora_rank is auto-detected)
    # target_modules = [......]
    # diffusers_to_peft(
    #     diffusers_file=diffusers_output,
    #     output_dir=peft_output,
    #     target_modules=target_modules,
    # )
    # Auto-detect target modules & LoRA rank
    diffusers_to_peft_auto(
        diffusers_file=diffusers_output,
        output_dir=peft_output,
    )
    # Then load with PEFT
    from peft import PeftModel
    transformer = PeftModel.from_pretrained(pipeline.transformer, peft_output)
    print([k for k in transformer.state_dict().keys() if 'lora' in k][:10])


from diffusers import Flux2KleinPipeline

model = 'black-forest-labs/FLUX.2-klein-base-4B'
pipeline = Flux2KleinPipeline.from_pretrained(model, torch_dtype=torch.bfloat16)
peft_lora_path = 'flux2/checkpoint-0/'
diffusers_output = 'flux2/diff_checkpoint.safetensors'
peft_output = 'flux2/temp_peft_checkpoint'  # Should be the same as peft_lora_path

def test1():
    test_p2d(peft_lora_path=peft_lora_path, diffusers_output=diffusers_output)

def test2():
    test_d2p(pipeline=pipeline, diffusers_output=diffusers_output, peft_output=peft_output)


from diffusers import QwenImageEditPlusPipeline

model = 'Qwen/Qwen-Image-Edit-2509'
pipeline = QwenImageEditPlusPipeline.from_pretrained(model, torch_dtype=torch.bfloat16)
diffusers_output = 'qwen/epoch-4.safetensors'
peft_output = 'qwen/temp_peft'
diffusers_output_2 = 'qwen/epoch-4_new.safetensors'
diff_weight = load_file(diffusers_output)
# print(list(diff_weight.keys())[:10])

def test3():
    test_d2p(pipeline=pipeline, diffusers_output=diffusers_output, peft_output=peft_output)

def test4():
    test_p2d(peft_lora_path=peft_lora_path, diffusers_output=diffusers_output_2)
```
-
Thank you very much for the suggestions; I'll try this right away. Wishing you all the best!
-
The conversion code you provided works; thank you very much. After converting the diffusers-format .safetensors to the PEFT version, the model loaded before GRPO training retains the performance from SFT.
-
I wrote my code against diffusers' QwenImageEditPlusPipeline, which does not expose that parameter; I checked, and DiffSynth-Studio does have it. It seems to control consistency of some kind and does some special handling of the timesteps. You can look at DiffSynth-Studio's inference code for reference and modify this part ...
-
OK, thanks. However, I recently found that even after successfully converting the SFT LoRA weights trained with DiffSynth to the PEFT version using your conversion code, the results still differ when loaded in flow-factory (there is no warning during weight merging, and flow-factory reports the LoRA weights as successfully loaded). This may be because you use diffusers' default inference pipeline, while DiffSynth builds its own: https://github.com/modelscope/DiffSynth-Studio/blob/main/diffsynth/pipelines/qwen_image.py. So even though weights trained in DiffSynth convert and load fine, the differing inference pipelines may cause the outputs to diverge.
-
The latest code supports loading a single safetensors file; you can try directly loading the post-SFT safetensors and specifying ...
That is a slight inference difference between the diffusers pipeline and DiffSynth-Studio, for example the new parameter introduced by qwen-image-edit-2511 that you ran into.
If the weights are loaded correctly, the results should not differ much. Could you post a comparison with the same seed and the same inputs here, if the data is shareable?
If the base model already has some of the needed capability, you can go straight to RL; SFT's main role is to unlock capabilities the base model lacks. As for flow-factory's inference code, I'll upload a version for everyone's reference.
-
OK, thanks for the advice. I'll update my code to load the single post-SFT safetensors file directly, and compare results with the same seed and the same inputs (per your comment, the two frameworks' inference shouldn't differ much). If problems remain, I'll share the test comparisons with you. Thanks for the quick replies! 🥰
-
Hello, I tested on my side and some results still differ. I sent you an email via Google; the document contains the corresponding cases for reference. If needed, I can also send you the inference code I wrote under flow-factory. I'd like to know whether this kind of difference is normal 🤔. Thanks!
-
Hi, thanks for sharing this great repository.
I noticed that your repo supports GRPO training for Qwen-Image-Edit-2509, and that it allows choosing between LoRA and full-parameter training modes. I have a few questions regarding training setup and customization:
Multi-reference image training
(1) I would like to train the model to generate a target image conditioned on 4–5 reference images simultaneously. In this case, do you think a configuration like 8 × 140G is sufficient for either LoRA or full training?
(2) Are there any practical differences in feasibility or stability between LoRA and full training for this multi-reference setting?
Configuration considerations
For the above setup, which configuration aspects should I pay special attention to? For example:
(1) Image resolution / sequence length
(2) GRPO-specific hyperparameters (e.g., rollout length, reward normalization)
(3) Any model- or data-related constraints specific to Qwen-Image-Edit-2509 or 2511
Adding custom reward functions:
(1) If I want to add custom reward functions (e.g., for reference consistency or visual alignment), which part of the codebase should I modify or extend?
(2) Is there a recommended interface or example for registering new reward functions in the GRPO pipeline?
Thanks a lot for your time and for open-sourcing this work. Any guidance would be greatly appreciated.