Skip to content

[PZ COMPETITION] yuanboyang final#167

Open
JokerYang0703 wants to merge 2 commits intoFlagAI-Open:mainfrom
JokerYang0703:add-final-code
Open

[PZ COMPETITION] yuanboyang final#167
JokerYang0703 wants to merge 2 commits intoFlagAI-Open:mainfrom
JokerYang0703:add-final-code

Conversation

@JokerYang0703
Copy link

No description provided.

Signed-off-by: yuanboyang <951569335@qq.com>
Signed-off-by: yuanboyang <951569335@qq.com>
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @JokerYang0703, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

此拉取请求旨在为竞赛准备最终代码,主要侧重于更新数据处理、模型加载配置以及verl框架内的特定奖励逻辑。这些更改可能旨在优化或调整大型语言模型(LLM)以更好地完成数学推理任务。

Highlights

  • 数据处理流程增强: 引入了新的download.py脚本,用于将ModelScope上的"Big-Math-RL-Verified-Processed"数据集处理并转换为verl兼容的Parquet格式。
  • GSM8k数据集集成更新: 修改了gsm8k.py数据预处理脚本,将数据源从openai/gsm8k (Hugging Face datasets) 切换到modelscope/gsm8k (ModelScope MsDataset)。
  • 提示模板标准化: 更新了数学推理任务的指令提示,强制使用\boxed{}格式输出最终答案。
  • trust_remote_code配置简化: 在verl的多个组件(模型合并器、训练器、FSDP worker和配置文件)中将trust_remote_code硬编码为True,以简化模型加载流程。
  • 奖励函数调整: 修改了geo3k.py中奖励函数的正则表达式模式,移除了对.*的匹配要求,简化了匹配逻辑。
  • VLLM版本降级: 在安装脚本中将vllm依赖版本从0.8.5.post1降级到0.8.2。
  • 增加Rollout响应数量: 将rollout.yaml中的n参数从1更改为8,增加了Rollout过程中生成的响应数量。
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

这个拉取请求似乎是“yuanboyang final”竞赛提交的最终代码。它包括了运行完整代码库所需的所有文件:修改后的 verl 源码、数据下载脚本、环境依赖文件以及详细的 README.md 说明文档。主要改动包括:

  1. 修改数据处理脚本以从 ModelScope 加载数据集,并调整了 prompt 格式。
  2. 在多个文件中硬编码了 trust_remote_code=True,这是一个严重的安全隐患。
  3. README.md 文件提供了详细的设置和运行指南,但包含了一些硬编码的路径和不推荐的操作(如直接修改 site-packages 中的文件),这会影响代码的可移植性和可维护性。

我的审查意见主要集中在提高代码的安全性、可维护性和可复现性方面。

Comment on lines +295 to +297
model = auto_model_class.from_config(
self.model_config, torch_dtype=torch.bfloat16, trust_remote_code=True
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

在此处硬编码 trust_remote_code=True 存在严重的安全风险。它允许从 Hugging Face Hub 执行任意代码,这可能导致恶意代码的执行。此选项应作为可配置参数,并默认设置为 False,仅在完全信任模型来源时才由用户显式启用。

Suggested change
model = auto_model_class.from_config(
self.model_config, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = auto_model_class.from_config(
self.model_config, torch_dtype=torch.bfloat16, trust_remote_code=self.config.trust_remote_code
)

# Instantiate the tokenizer and processor.
from verl.utils import hf_processor, hf_tokenizer

trust_remote_code = True

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

在此处硬编码 trust_remote_code = True 存在严重的安全风险,因为它允许从模型中心执行任意代码。这应该是一个可配置的选项,并且默认情况下应为 False,以防止潜在的恶意代码攻击。

Suggested change
trust_remote_code = True
trust_remote_code = config.data.get("trust_remote_code", False)

Comment on lines +342 to +347
actor_module = actor_module_class.from_pretrained(
pretrained_model_name_or_path=local_path,
torch_dtype=torch_dtype,
config=actor_model_config,
trust_remote_code=True,
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

在此处硬编码 trust_remote_code=True 是一个严重的安全漏洞。它允许模型仓库中的任意代码在您的机器上执行。强烈建议将此值作为配置选项,并默认禁用,仅在您完全信任代码来源时才启用。

Suggested change
actor_module = actor_module_class.from_pretrained(
pretrained_model_name_or_path=local_path,
torch_dtype=torch_dtype,
config=actor_model_config,
trust_remote_code=True,
)
actor_module = actor_module_class.from_pretrained(
pretrained_model_name_or_path=local_path,
torch_dtype=torch_dtype,
config=actor_model_config,
trust_remote_code=trust_remote_code,
)

Comment on lines +95 to +102
### 基于 [transformers](https://github.com/huggingface/transformers) 源码的修改
- 修改文件:
- `/root/miniconda3/envs/verl/lib/python3.10/site-packages/transformers/configuration_utils.py`
- 修改内容:
- 将第 917 行改为:
```python
json.dumps(config_dict, indent=2, sort_keys=False) + "\n"
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

直接修改 site-packages 中的库文件是一种不良实践。这会使环境变得脆弱且难以复现。如果其他人尝试设置此项目,他们可能会忘记手动应用此补丁,导致行为不一致或出错。更好的方法是 fork transformers 仓库,应用您的修改,然后从您的 fork 安装,或者使用 .patch 文件和脚本来应用更改。

Comment on lines +114 to +139
nohup env PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
algorithm.adv_estimator=grpo \
data.train_files=/usr/train3.parquet \ # 需要自己修改位置
data.train_batch_size=264 \
data.max_prompt_length=2048 \
data.max_response_length=512 \
actor_rollout_ref.model.path=/root/.cache/modelscope/hub/models/BAAI/OpenSeek-Small-v1-SFT \ # 需要自己修改位置
actor_rollout_ref.actor.optim.lr=1e-5 \
actor_rollout_ref.actor.ppo_mini_batch_size=72 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
actor_rollout_ref.rollout.name=vllm \
+actor_rollout_ref.actor.fsdp_config.model_dtype=bf16 \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 \
actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
trainer.logger=tensorboard \
trainer.val_before_train=True \
trainer.n_gpus_per_node=6 \
trainer.nnodes=1 \
trainer.save_freq=200 \
trainer.test_freq=10 \
trainer.total_epochs=15 \
data.val_files=$HOME/data/gsm8k/test.parquet \
actor_rollout_ref.rollout.n=6 \
> train.log 2>&1 &

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

运行指令中包含了多个硬编码的路径(例如 /usr/train3.parquet, /root/.cache/...)。这使得脚本在不同环境中难以直接运行。建议使用环境变量或命令行参数来传递这些路径,以提高脚本的可移植性。

例如:

# 在您的脚本或环境中设置环境变量
export TRAIN_FILES=/path/to/your/train3.parquet
export MODEL_PATH=/path/to/your/model

# 然后在命令中使用它们
nohup env PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
  data.train_files=$TRAIN_FILES \
  actor_rollout_ref.model.path=$MODEL_PATH \
  ...

dataset_name = 'open-r1/Big-Math-RL-Verified-Processed'
subset_name = 'all'
split = 'train'
data_source_name = "Big-Math" # 用于在数据中标记来源

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

变量 data_source_name 已被定义但从未在代码中使用。建议移除未使用的变量以保持代码整洁。


# 组装成最终的数据结构
processed_data = {
"data_source": 'hiyouga/geometry3k',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

这里的 data_source 硬编码为 'hiyouga/geometry3k',但加载的数据集是 'open-r1/Big-Math-RL-Verified-Processed'。这似乎不一致,并且可能是一个错误,正如 README.md 中所提到的。为了清晰和避免潜在的错误,建议将此值作为参数或常量进行管理,并确保其与数据处理逻辑一致。

Suggested change
"data_source": 'hiyouga/geometry3k',
"data_source": data_source_name,

train_dataset = MsDataset.load('modelscope/gsm8k', subset_name='main', split='train',trust_remote_code=True)
test_dataset = MsDataset.load('modelscope/gsm8k', subset_name='main', split='test',trust_remote_code=True)

instruction_following = instruction = r'Please reason step by step,and must put your final answer within \boxed{}.Question:'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

在这一行中,instruction_followinginstruction 被赋予了相同的值,但 instruction_following 在后续代码中并未使用。这会造成代码冗余。建议只保留 instruction 的赋值。

Suggested change
instruction_following = instruction = r'Please reason step by step,and must put your final answer within \boxed{}.Question:'
instruction = r'Please reason step by step,and must put your final answer within \boxed{}.Question:'

@ftgreat ftgreat changed the title yuanboyang final [PZ COMPETITION] yuanboyang final Sep 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant