Open
Description
Could you consider supporting the strategy of filtering out middle-scoring samples, introduced in "Enhancing Spatial Understanding in Image Generation via Reward Modeling"? In the paper, the authors say this stabilizes training and speeds up convergence.
It would be a great feature if implemented in both GroupwiseRewardModel and PointwiseRewardModel.
For GroupwiseRewardModel, we should filter out the middle samples of each prompt (condition) over one entire epoch. For PointwiseRewardModel, we could filter out the middle samples within one batch or over one entire epoch, using the pointwise scores.
Such a strategy might work for all kinds of tasks and trainer types.
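To make the request concrete, here is a minimal sketch of what I mean by filtering the middle samples. It assumes "middle" means everything except the k lowest and k highest scores; the paper's exact selection rule may differ. The function names `keep_extreme_indices` and `filter_per_prompt` and the parameter `k` are hypothetical and not part of this repo's API.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def keep_extreme_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k lowest and k highest scores, in original order.

    Assumption: the "middle" samples are everything outside these two tails.
    """
    n = len(scores)
    if k <= 0:
        return []
    if 2 * k >= n:
        # Too few samples to have a middle; keep everything.
        return list(range(n))
    order = sorted(range(n), key=lambda i: scores[i])  # ascending by score
    kept = set(order[:k]) | set(order[-k:])
    return [i for i in range(n) if i in kept]


def filter_per_prompt(
    prompt_ids: Sequence[Hashable], scores: Sequence[float], k: int
) -> List[int]:
    """Groupwise variant: drop the middle samples separately for each prompt."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, pid in enumerate(prompt_ids):
        groups[pid].append(idx)

    kept: List[int] = []
    for members in groups.values():
        local = keep_extreme_indices([scores[i] for i in members], k)
        kept.extend(members[j] for j in local)
    return sorted(kept)


# Pointwise case: filter a single batch (or a whole epoch) of scores directly.
batch_scores = [0.9, 0.1, 0.5, 0.4, 0.8, 0.3]
batch_kept = keep_extreme_indices(batch_scores, k=2)

# Groupwise case: filter within each prompt's samples collected over an epoch.
epoch_prompts = ["a", "b", "a", "b", "a", "b", "a", "b"]
epoch_scores = [0.2, 0.7, 0.9, 0.1, 0.5, 0.4, 0.6, 0.8]
epoch_kept = filter_per_prompt(epoch_prompts, epoch_scores, k=1)
```

The returned indices could then be used to mask samples before the loss or advantage computation. Whether to filter per batch or per epoch for the pointwise case would ideally be a config option.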