
Performance for LiveCodeBench #13

@leonardodalinky


Hi! I tried to run the LiveCodeBench evaluation scripts in instruct/code_eval/lcb/ and they yield odd results.

# Result: 0.37888198757763975
CUDA_VISIBLE_DEVICES=0 python run_lcb.py \
    --n 1 \
    --difficulty easy \
    --model Dream-org/Dream-v0-Instruct-7B \
    --use_instruct_prompt \
    --diffusion_steps 512 \
    --max_new_tokens 1024 \
    --evaluate \
    --diffusion_remask_alg maskgit_plus \
    --temperature 0.1 \
    --use_cache


# Result: 0.3695652173913043
CUDA_VISIBLE_DEVICES=0 python run_lcb.py \
    --n 1 \
    --difficulty easy \
    --model Dream-org/Dream-Coder-v0-Instruct-7B \
    --use_instruct_prompt \
    --diffusion_steps 512 \
    --max_new_tokens 1024 \
    --evaluate \
    --diffusion_remask_alg maskgit_plus \
    --temperature 0.1 \
    --use_cache

The general instruct model and the code-specialized Coder model yield nearly identical scores (with the Coder model even scoring slightly lower). Is this expected behavior, or is there an issue with the evaluation scripts or the released checkpoint?
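As a sanity check on my side, here is a minimal sketch I could run to confirm the two released checkpoints actually differ in weights. It assumes both repos load via transformers' AutoModel with trust_remote_code=True, which may not match the exact loading path used in run_lcb.py:

# Sanity check: do the two released checkpoints actually differ in weights?
# Assumption: both load via AutoModel.from_pretrained with trust_remote_code=True.
import torch
from transformers import AutoModel

instruct = AutoModel.from_pretrained(
    "Dream-org/Dream-v0-Instruct-7B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
coder = AutoModel.from_pretrained(
    "Dream-org/Dream-Coder-v0-Instruct-7B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

sd_a, sd_b = instruct.state_dict(), coder.state_dict()
shared = sorted(set(sd_a) & set(sd_b))
identical = sum(torch.equal(sd_a[k], sd_b[k]) for k in shared)
print(f"{identical}/{len(shared)} shared tensors are bit-identical")

If most shared tensors turn out bit-identical, that would point at the released checkpoint rather than the evaluation scripts.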
