Support for "Top-k filtering strategy"? #53

@MengHao666

Could you consider supporting the top-k filtering strategy introduced in "Enhancing Spatial Understanding in Image Generation via Reward Modeling"? In the paper, the authors say it can stabilize training and improve convergence.

It would be a great feature if it were implemented in both GroupwiseRewardModel and PointwiseRewardModel.
For GroupwiseRewardModel, we should filter out the middle samples of each prompt (condition) over one entire epoch. For PointwiseRewardModel, we could filter out the middle samples within one batch or over one entire epoch, using the pointwise scores.

Such a strategy might work for all kinds of tasks and trainer types.
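
To make the request concrete, here is a minimal sketch of the filtering step. It assumes that "filtering the middle samples" means keeping only the k highest-scoring and k lowest-scoring samples per prompt and dropping the rest. The function name `top_k_filter_mask` and its parameters are hypothetical; this is not the repository's API, and it may not match the paper's exact procedure.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def top_k_filter_mask(
    scores: Sequence[float],
    groups: Sequence[Hashable],
    k: int,
) -> List[bool]:
    """Return a keep-mask retaining the k highest and k lowest scores per group.

    Hypothetical helper: `groups` holds one prompt (condition) id per sample.
    Assumption: "middle" samples are those outside the top-k and bottom-k.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if len(scores) != len(groups):
        raise ValueError("scores and groups must have the same length")

    # Collect sample indices per prompt so ranking happens within a group.
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, group_id in enumerate(groups):
        by_group[group_id].append(idx)

    keep = [False] * len(scores)
    for indices in by_group.values():
        ranked = sorted(indices, key=lambda i: scores[i])  # ascending by score
        if len(ranked) <= 2 * k:
            selected = ranked  # group too small to have a middle; keep all
        else:
            selected = ranked[:k] + ranked[-k:]  # bottom-k plus top-k
        for i in selected:
            keep[i] = True
    return keep


# Groupwise case: rank within each prompt.
scores = [0.1, 0.9, 0.5, 0.4, 0.8, 0.2]
prompts = ["a", "a", "a", "a", "a", "a"]
mask = top_k_filter_mask(scores, prompts, k=2)
```

For the pointwise case, the same helper could be called with a single shared group id for every sample in a batch (or in an epoch), so that ranking happens over the whole set rather than per prompt.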
