Hi! I tried running the LiveCodeBench evaluation scripts in instruct/code_eval/lcb/, and they yield odd results: the base instruct model and the coder model score nearly the same.
```shell
# Result: 0.37888198757763975
CUDA_VISIBLE_DEVICES=0 python run_lcb.py \
  --n 1 \
  --difficulty easy \
  --model Dream-org/Dream-v0-Instruct-7B \
  --use_instruct_prompt \
  --diffusion_steps 512 \
  --max_new_tokens 1024 \
  --evaluate \
  --diffusion_remask_alg maskgit_plus \
  --temperature 0.1 \
  --use_cache
```
```shell
# Result: 0.3695652173913043
CUDA_VISIBLE_DEVICES=0 python run_lcb.py \
  --n 1 \
  --difficulty easy \
  --model Dream-org/Dream-Coder-v0-Instruct-7B \
  --use_instruct_prompt \
  --diffusion_steps 512 \
  --max_new_tokens 1024 \
  --evaluate \
  --diffusion_remask_alg maskgit_plus \
  --temperature 0.1 \
  --use_cache
```

Is it expected behavior that the base instruct model and the coder model yield nearly identical scores, or is there an issue with the evaluation scripts or the released checkpoint?
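To see whether the two checkpoints behave differently at all (rather than just comparing aggregate scores), it may help to diff their per-problem pass/fail outcomes. The sketch below is hypothetical: it assumes each run can be exported as a `{problem_id: passed}` mapping, which is not necessarily the format `run_lcb.py` actually writes, so the loading step would need to be adapted.

```python
# Hypothetical diagnostic: compare per-problem outcomes of two eval runs.
# The {problem_id: passed} dicts below are stand-ins; in practice they would
# be loaded from whatever result files run_lcb.py produces.

def compare_runs(base_results, coder_results):
    """Return (agreements, disagreements) over problems present in both runs."""
    shared = sorted(set(base_results) & set(coder_results))
    agree = [pid for pid in shared if base_results[pid] == coder_results[pid]]
    disagree = [pid for pid in shared if base_results[pid] != coder_results[pid]]
    return agree, disagree

# Toy data for illustration only.
base = {"p1": True, "p2": False, "p3": True}
coder = {"p1": True, "p2": True, "p3": True}

agree, disagree = compare_runs(base, coder)
print(f"{len(agree)} agree, {len(disagree)} differ: {disagree}")
# prints "2 agree, 1 differ: ['p2']"
```

If the two models disagree on many individual problems but land on similar aggregate scores, the small gap is plausibly sampling noise; if the generations are literally identical, that would point at a script or checkpoint-loading issue.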