Questions about reproducing training results with default configuration #13

@li-tianqi

Reproduction Environment

  • GPU: 4× H20
  • Configuration: Codebase default settings
  • Dataset: First 15k SAT samples (as per default config)

My Results

Figure 1. My training curve
Image

Figure 2. My test curve
Image

  • My reproduction (base+GRPO): 58.4 (at step 1000)
  • Qwen2-VL instruct model: 61.6

Results from the report

Figure 3. Test curve from the report
Image

Key Questions

  1. What explains the performance gap (58.4 vs ~59.5) between my reproduction and the reported results?
  2. Why is the Qwen2-VL instruct model's performance inconsistent (61.6 locally vs ~56 in the report)?
  3. The reproduced SFT and GRPO curves (Figure 2) show an abnormal trend. Is this expected?
  4. Why does the default config use only the first 15k SAT samples instead of the full dataset?
  5. In your experience, what are the possible reasons for these abnormal reproduction results?
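
To make the gaps in questions 1 and 2 explicit, here is a minimal sketch of the arithmetic. The report values are approximate readings (~59.5 and ~56), so the differences are approximate as well; the variable and function names are only illustrative.

```python
# Scores quoted in this issue. Report values are approximate ("~") readings.
local = {"base+GRPO": 58.4, "Qwen2-VL instruct": 61.6}
report = {"base+GRPO": 59.5, "Qwen2-VL instruct": 56.0}


def gaps(local_scores, report_scores):
    """Return (local - report) per model, in points, rounded to one decimal."""
    return {
        name: round(local_scores[name] - report_scores[name], 1)
        for name in local_scores
    }


result = gaps(local, report)
for name, diff in result.items():
    # Positive means the local run scored higher than the report.
    print(f"{name}: {diff:+.1f} points vs report")
```

The two gaps have opposite signs: my GRPO run is about 1.1 points below the report, while the instruct baseline is about 5.6 points above it.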
