Support for "Top-k filtering strategy"? #53

Could you consider supporting the strategy introduced in [Enhancing Spatial Understanding in Image Generation via Reward Modeling](https://github.com/DAGroup-PKU/SpatialReward)? In the paper, they say this can stabilize training and speed up convergence.

It would be a great feature if implemented in both `GroupwiseRewardModel` and `PointwiseRewardModel`.
For `GroupwiseRewardModel`, we should filter out the middle-scoring samples for each prompt (condition) over one entire epoch. For `PointwiseRewardModel`, we could filter out the middle-scoring samples within one batch or over one entire epoch, using the pointwise scores.
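
To make the request concrete, here is a minimal sketch of what I have in mind. It assumes that "filtering the middle samples" means keeping only the `k` highest-scoring and `k` lowest-scoring samples and dropping the rest; that reading, along with the function names `keep_extremes` and `keep_extremes_per_group`, is my own assumption and not taken from the paper or from this repo's API.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def keep_extremes(scores: Sequence[float], k: int) -> list[int]:
    """Return sorted indices of the k lowest- and k highest-scoring samples.

    Hypothetical helper: could serve the pointwise case, applied to the
    scores of one batch or of one whole epoch.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    n = len(scores)
    if 2 * k >= n:
        # Too few samples to have a "middle": keep everything.
        return list(range(n))
    order = sorted(range(n), key=lambda i: scores[i])  # ascending by score
    return sorted(order[:k] + order[-k:])


def keep_extremes_per_group(
    scores: Sequence[float], group_ids: Sequence[Hashable], k: int
) -> list[int]:
    """Apply keep_extremes separately to the samples of each prompt.

    Hypothetical helper: group_ids[i] identifies the prompt (condition)
    that sample i was generated from.
    """
    groups: dict[Hashable, list[int]] = defaultdict(list)
    for idx, gid in enumerate(group_ids):
        groups[gid].append(idx)
    kept: list[int] = []
    for members in groups.values():
        local = keep_extremes([scores[i] for i in members], k)
        kept.extend(members[j] for j in local)
    return sorted(kept)
```

The returned indices could then be used to select which samples contribute to the loss or advantage computation. Where exactly that hook belongs in the trainers is something I have not looked into.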

Such a strategy might work for all kinds of tasks and trainer types.

