Hi, @Md-Ashraful-Pramanik @rizwan09
As I am very interested in this work, I am currently trying to replicate the experiments. I have run into two problems and would appreciate your guidance.
1. When reproducing the experiment with the CodeSim method, the model sometimes outputs prompt text in its answers, which leads to a relatively low accuracy. The model I used was a locally deployed Qwen2.5-7B-Instruct, and I used vLLM to accelerate inference. Below is a screenshot of Result.jsonl:

2. **Whether or not vLLM is used, CodeSim always produces a large amount of repetitive output. Is this normal?**
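In case it helps diagnose problem 1: as a workaround, I currently strip leaked prompt text by keeping only the fenced code block from each completion. This is just a minimal sketch of my own post-processing, not something taken from this repo:

```python
import re

def extract_code(completion: str) -> str:
    """Return the first fenced code block in a model completion,
    falling back to the raw text if no fence is found."""
    match = re.search(r"```(?:python)?\n(.*?)```", completion, re.DOTALL)
    if match:
        return match.group(1).strip()
    # No fence found: assume the whole completion is code.
    return completion.strip()

# Example: prompt chatter around the answer is discarded.
print(extract_code("Sure, here is the solution:\n```python\nprint('hi')\n```\nHope this helps."))
```

Even with this filtering, the accuracy stays low, which is why I suspect the prompt leakage above.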
