Missing thinking tokens flag for no budget forcing eval #100
kothasuhas wants to merge 2 commits into simplescaling:main
Thanks a lot for the PR!!
Line 31 in 73dc02f
So I guess to get the thinking token to still be appended, but without budget forcing, you could do something like
though I haven't tested this
Makes sense. I tested locally, and making the length longer works as well, so I committed that change.
The no-budget-forcing evaluation is missing a "max_thinking_tokens" flag. The model released on Hugging Face does not naturally emit a thinking token at the start of its response, so without the flag performance degrades (around 20% lower than the original paper on aime24_nofigures). Adding max_thinking_tokens=auto restores the performance reported in the original paper for AIME 24.
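For illustration, here is a minimal sketch of how the generation arguments might differ before and after the change. It assumes the eval passes its settings as a comma-separated `key=value` string. The helper `build_gen_kwargs` and the `max_gen_toks=4096` value are hypothetical and not taken from the repo. Only `max_thinking_tokens=auto` comes from the description above.

```python
def build_gen_kwargs(**kwargs):
    """Join keyword arguments into a comma-separated key=value string."""
    return ",".join(f"{key}={value}" for key, value in kwargs.items())


# Hypothetical generation length; the real value in the eval command may differ.
without_flag = build_gen_kwargs(max_gen_toks=4096)

# Same settings plus the flag this PR adds (name as written in the description).
with_flag = build_gen_kwargs(max_gen_toks=4096, max_thinking_tokens="auto")

print(without_flag)
print(with_flag)
```

The second string is what would need to reach the eval for the thinking token to be appended. The exact flag name and its accepted values should be checked against the repo's eval code.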