The training performance of the basic MLLM RLVR configuration is slow #310

I ran your basic MLLM RLVR configuration `examples/qwen2.5-vl-7B-rlvr/rlvr_megatron.yaml`, but the training metrics stayed almost flat. SwanLab URL:
https://swanlab.cn/@canghongjian/web_public/runs/n5yko5po0dgktwbl9zs0d/chart


Test performance also shows no meaningful change after training:
|                      | RefCOCO_test | CountBenchQA | RefCOCO_g_test | RefCOCO_plus_test | MathVista_MINI | MathVerse_MINI | CountQA_test |
|----------------------|--------------|--------------|----------------|-------------------|----------------|----------------|--------------|
| Qwen2_5_VL_7B_Instruct | 0.8992       | 0.8789       | 0.8624         | 0.8113            | 0.6850         | 0.4508         | 0.2075       |
| vlm_roll_rlvr_ckpt_20 | 0.8976       | 0.8727       | 0.8610         | 0.8089            | 0.6950         | 0.4114         | 0.2055       |
| vlm_roll_rlvr_ckpt_100 | 0.8970       | 0.8645       | 0.8606         | 0.8147            | 0.6940         | 0.4363         | 0.2016       |
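
For reference, here is a minimal sketch of how the per-benchmark change against the base model can be computed from the table above. The scores are copied from the table; the names `BENCHMARKS`, `SCORES`, and `deltas_vs_base` are only illustrative and are not part of the repository.

```python
# Scores copied from the table above; all names here are illustrative only.
BENCHMARKS = [
    "RefCOCO_test", "CountBenchQA", "RefCOCO_g_test", "RefCOCO_plus_test",
    "MathVista_MINI", "MathVerse_MINI", "CountQA_test",
]

SCORES = {
    "Qwen2_5_VL_7B_Instruct": [0.8992, 0.8789, 0.8624, 0.8113, 0.6850, 0.4508, 0.2075],
    "vlm_roll_rlvr_ckpt_20":  [0.8976, 0.8727, 0.8610, 0.8089, 0.6950, 0.4114, 0.2055],
    "vlm_roll_rlvr_ckpt_100": [0.8970, 0.8645, 0.8606, 0.8147, 0.6940, 0.4363, 0.2016],
}


def deltas_vs_base(scores, base="Qwen2_5_VL_7B_Instruct"):
    """Return {checkpoint: [score - base_score, ...]} rounded to 4 decimals."""
    base_row = scores[base]
    return {
        name: [round(value - ref, 4) for value, ref in zip(row, base_row)]
        for name, row in scores.items()
        if name != base
    }


if __name__ == "__main__":
    for name, row in deltas_vs_base(SCORES).items():
        print(name)
        for bench, delta in zip(BENCHMARKS, row):
            print(f"  {bench:<18} {delta:+.4f}")
```

At checkpoint 100 every benchmark is within about 0.015 of the base model, and most of the changes are slightly negative.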

