
Evaluation for LLaDA and LLaDA-1.5

This repository provides an unofficial evaluation implementation for LLaDA and LLaDA-1.5, based on the lm-evaluation-harness.

⚠️ Disclaimer: Since an official lm-eval-based evaluation is not yet available, the results presented below come from independent testing on my own equipment and may not fully represent the models' official performance.

⚙️ Environment

  • Hardware: NVIDIA A100 GPU
  • Software:
    • torch == 2.5.1
    • transformers == 4.57.1
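
A minimal setup sketch, assuming a pip-based environment. Only the two pinned versions above come from this README; the Python version, the conda workflow, and the lm_eval package (the lm-evaluation-harness this repo builds on) are assumptions — check the repository for the authoritative requirements.

```bash
# Assumed setup -- the pinned versions come from the Environment list above;
# the Python version and the extra package are illustrative assumptions.
conda create -n llada python=3.10 -y
conda activate llada
pip install torch==2.5.1 transformers==4.57.1
pip install lm_eval  # lm-evaluation-harness, which this repo builds on
```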

🚀 Quick Start

1. Sanity Check

First, run the test script to ensure the environment is set up correctly and the model can generate samples:

python chat.py

2. Run Evaluation

Execute the corresponding shell script to start the evaluation. For LLaDA-Instruct:

bash eval_LLaDA.sh

For LLaDA-1.5:

bash eval_LLaDA1p5.sh
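
For reference, here is a hedged sketch of the kind of lm-eval invocation such a script typically wraps. The model wrapper, checkpoint id, task, and batch size are illustrative assumptions (LLaDA's diffusion-style decoding may require the repo's custom model class); the checked-in eval_LLaDA.sh and eval_LLaDA1p5.sh remain authoritative.

```bash
# Illustrative sketch only -- see the actual scripts in this repo.
# --log_samples is required: the final metrics are computed by Python
# post-processing over the per-sample .jsonl logs (next section).
lm_eval \
  --model hf \
  --model_args pretrained=GSAI-ML/LLaDA-8B-Instruct \
  --tasks gsm8k \
  --batch_size 8 \
  --output_path results/LLaDA-Instruct \
  --log_samples
```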

⚠️ Post-processing & Logs

  • Log Samples: You must enable the log_samples option, because the final metrics rely on Python post-processing of these per-sample logs.

  • Data Management: The post-processing script computes the average accuracy over ALL .jsonl files found in the current result directory, as sketched below.

    • Recommendation: Before starting a new run, delete the old .jsonl files or specify a new output directory to avoid mixing results from different experiments.
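
A minimal sketch of the averaging step described above, assuming each line of a .jsonl log is a per-sample JSON record with a 0/1 correctness field. The field name `acc`, the directory layout, and the file-matching pattern are assumptions; the repo's own post-processing script defines the real schema.

```python
import glob
import json

RESULT_DIR = "results/LLaDA-Instruct"  # assumed output directory

# Collect per-sample records from ALL .jsonl logs under the result
# directory -- this is why stale files from earlier runs must be removed:
# every matching file contributes to the average.
records = []
for path in glob.glob(f"{RESULT_DIR}/**/*.jsonl", recursive=True):
    with open(path) as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))

# "acc" is an assumed per-sample 0/1 field; lm-eval's actual log schema
# varies by task (e.g. exact_match for GSM8K).
accs = [float(r["acc"]) for r in records if "acc" in r]
if accs:
    print(f"samples: {len(accs)}  average accuracy: {sum(accs) / len(accs):.4f}")
else:
    print("no samples with an 'acc' field found")
```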

📊 Evaluation Results

| Model | Len | HumanEval (Acc) | MBPP (Acc) | GSM8K (Acc) | MATH500 (Acc) |
|---|---|---|---|---|---|
| LLaDA-Instruct | 256 | 38.7 | 36.9 | 77.4 | 33.8 |
| LLaDA-Instruct | 512 | 43.9 | 38.2 | 81.3 | 37.7 |
| LLaDA-Instruct | 1024 | 44.6 | 37.4 | 82.3 | 39.4 |
| LLaDA-1.5 | 256 | 38.4 | 38.6 | 79.2 | 33.4 |
| LLaDA-1.5 | 512 | 45.1 | 37.6 | 82.9 | 38.6 |
| LLaDA-1.5 | 1024 | 45.7 | 37.4 | 82.5 | 39.6 |

🙌 Acknowledgements

This project is built upon the open-source repository daedal. Special thanks to the author for their contributions.
