[PZ COMPETITION] yuanboyang final #167
JokerYang0703 wants to merge 2 commits into FlagAI-Open:main from
Conversation
Signed-off-by: yuanboyang <951569335@qq.com>
Summary of Changes
Hello @JokerYang0703, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request prepares the final code for the competition, focusing mainly on updates to data processing, model-loading configuration, and specific reward logic within the verl framework. These changes appear intended to optimize or tune a large language model (LLM) for math reasoning tasks.
Highlights
- Enhanced data processing pipeline: a new download.py script processes the "Big-Math-RL-Verified-Processed" dataset from ModelScope and converts it into verl-compatible Parquet format.
- Updated GSM8k dataset integration: the gsm8k.py preprocessing script now loads data from modelscope/gsm8k (ModelScope MsDataset) instead of openai/gsm8k (Hugging Face datasets).
- Standardized prompt template: the instruction prompt for math reasoning tasks now requires the final answer to be given in \boxed{} format.
- Simplified trust_remote_code configuration: trust_remote_code is hardcoded to True in several verl components (model merger, trainer, FSDP worker, and config files) to streamline model loading.
- Adjusted reward function: the regular-expression pattern in the geo3k.py reward function drops the .* matching requirement, simplifying the matching logic.
- VLLM version downgrade: the install script pins the vllm dependency to 0.8.2 instead of 0.8.5.post1.
- More rollout responses: the n parameter in rollout.yaml changes from 1 to 8, increasing the number of responses generated during rollout.
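The Parquet conversion described in the first highlight can be sketched roughly as follows. This is a minimal illustration, not the actual download.py: the helper `to_verl_record` is hypothetical, and the field names (`data_source`, `prompt`, `ability`, `reward_model`, `extra_info`) follow a common verl Parquet layout that the real script may deviate from.

```python
# Hypothetical sketch: map one raw (question, answer) pair into a
# verl-style training record before writing all records to Parquet.
INSTRUCTION = r'Please reason step by step, and put your final answer within \boxed{}. Question: '

def to_verl_record(question: str, answer: str, idx: int) -> dict:
    """Assemble a single verl-compatible row (field names are assumptions)."""
    return {
        "data_source": "Big-Math",  # tag used by the reward lookup
        "prompt": [{"role": "user", "content": INSTRUCTION + question}],
        "ability": "math",
        "reward_model": {"style": "rule", "ground_truth": answer},
        "extra_info": {"split": "train", "index": idx},
    }

record = to_verl_record("What is 2 + 3?", "5", 0)
print(record["prompt"][0]["content"].endswith("What is 2 + 3?"))  # → True
```

A list of such records can then be written out with `pandas.DataFrame(records).to_parquet(...)` or an equivalent pyarrow call.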
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request appears to be the final code submission for the "yuanboyang final" competition entry. It includes everything needed to run the full codebase: the modified verl sources, a data download script, environment dependency files, and a detailed README.md. The main changes are:
- The data preprocessing scripts now load datasets from ModelScope, and the prompt format was adjusted.
- trust_remote_code=True is hardcoded in several files, which is a serious security risk.
- The README.md gives detailed setup and run instructions, but contains hardcoded paths and discouraged practices (such as directly editing files in site-packages), which hurts portability and maintainability.
My review comments focus on improving the security, maintainability, and reproducibility of the code.
```python
model = auto_model_class.from_config(
    self.model_config, torch_dtype=torch.bfloat16, trust_remote_code=True
)
```
Hardcoding trust_remote_code=True here is a serious security risk: it allows arbitrary code from the Hugging Face Hub to be executed, which could lead to running malicious code. This option should be a configurable parameter that defaults to False and is enabled explicitly by the user only when the model source is fully trusted.
Suggested change:

```python
model = auto_model_class.from_config(
    self.model_config, torch_dtype=torch.bfloat16, trust_remote_code=self.config.trust_remote_code
)
```
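The reviewer's suggestion can be sketched as follows. `ModelConfig` here is a hypothetical stand-in for verl's actual model config class; the point is only that the flag defaults to False and is threaded through to the `from_config` call site instead of being hardcoded.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    path: str
    trust_remote_code: bool = False  # safe default: never execute hub code silently

def load_kwargs(cfg: ModelConfig) -> dict:
    # Pass the flag through to from_config / from_pretrained call sites
    # rather than hardcoding True at each one.
    return {"trust_remote_code": cfg.trust_remote_code}

print(load_kwargs(ModelConfig(path="/models/demo")))
# → {'trust_remote_code': False}
print(load_kwargs(ModelConfig(path="/models/demo", trust_remote_code=True)))
# → {'trust_remote_code': True}
```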
```python
# Instantiate the tokenizer and processor.
from verl.utils import hf_processor, hf_tokenizer

trust_remote_code = True
```
```python
actor_module = actor_module_class.from_pretrained(
    pretrained_model_name_or_path=local_path,
    torch_dtype=torch_dtype,
    config=actor_model_config,
    trust_remote_code=True,
)
```
Hardcoding trust_remote_code=True here is a serious security vulnerability: it lets arbitrary code in a model repository run on your machine. It is strongly recommended to expose this value as a configuration option, disabled by default, and enable it only when you fully trust the source of the code.
Suggested change:

```python
actor_module = actor_module_class.from_pretrained(
    pretrained_model_name_or_path=local_path,
    torch_dtype=torch_dtype,
    config=actor_model_config,
    trust_remote_code=trust_remote_code,
)
```
### Modifications to the [transformers](https://github.com/huggingface/transformers) source
- Modified file:
  - `/root/miniconda3/envs/verl/lib/python3.10/site-packages/transformers/configuration_utils.py`
- Change:
  - Replace line 917 with:
    ```python
    json.dumps(config_dict, indent=2, sort_keys=False) + "\n"
    ```
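For context, the only effect of that patch is the `sort_keys=False` argument (presumably replacing an upstream `sort_keys=True`): `json.dumps` then emits the config keys in insertion order instead of alphabetically. A minimal standalone demonstration, with a made-up `config_dict`:

```python
import json

# Hypothetical config dict; real Transformers configs have many more keys.
config_dict = {"model_type": "llama", "hidden_size": 4096, "architectures": ["LlamaForCausalLM"]}

unsorted = json.dumps(config_dict, indent=2, sort_keys=False) + "\n"  # the patched line
sorted_out = json.dumps(config_dict, indent=2, sort_keys=True) + "\n"  # alphabetical ordering

# With sort_keys=False, "model_type" stays first (insertion order);
# with sort_keys=True, "architectures" sorts to the front.
print(unsorted.splitlines()[1])
print(sorted_out.splitlines()[1])
```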
```shell
# Adjust data.train_files and actor_rollout_ref.model.path to your own locations.
nohup env PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=/usr/train3.parquet \
    data.train_batch_size=264 \
    data.max_prompt_length=2048 \
    data.max_response_length=512 \
    actor_rollout_ref.model.path=/root/.cache/modelscope/hub/models/BAAI/OpenSeek-Small-v1-SFT \
    actor_rollout_ref.actor.optim.lr=1e-5 \
    actor_rollout_ref.actor.ppo_mini_batch_size=72 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.rollout.name=vllm \
    +actor_rollout_ref.actor.fsdp_config.model_dtype=bf16 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
    trainer.logger=tensorboard \
    trainer.val_before_train=True \
    trainer.n_gpus_per_node=6 \
    trainer.nnodes=1 \
    trainer.save_freq=200 \
    trainer.test_freq=10 \
    trainer.total_epochs=15 \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    actor_rollout_ref.rollout.n=6 \
    > train.log 2>&1 &
```
The run command contains several hardcoded paths (e.g. /usr/train3.parquet, /root/.cache/...), which makes the script hard to run in other environments. Consider passing these paths via environment variables or command-line arguments to improve portability.

For example:

```shell
# Set environment variables in your script or shell
export TRAIN_FILES=/path/to/your/train3.parquet
export MODEL_PATH=/path/to/your/model

# Then use them in the command
nohup env PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
    data.train_files=$TRAIN_FILES \
    actor_rollout_ref.model.path=$MODEL_PATH \
    ...
```

```python
dataset_name = 'open-r1/Big-Math-RL-Verified-Processed'
subset_name = 'all'
split = 'train'
data_source_name = "Big-Math"  # used to tag the data source in the output
```
```python
# Assemble the final data structure
processed_data = {
    "data_source": 'hiyouga/geometry3k',
```
```python
train_dataset = MsDataset.load('modelscope/gsm8k', subset_name='main', split='train', trust_remote_code=True)
test_dataset = MsDataset.load('modelscope/gsm8k', subset_name='main', split='test', trust_remote_code=True)
```

```python
instruction_following = instruction = r'Please reason step by step,and must put your final answer within \boxed{}.Question:'
```
On this line, instruction_following and instruction are assigned the same value, but instruction_following is never used afterwards, which leaves redundant code. It is recommended to keep only the instruction assignment.
Suggested change:

```python
instruction = r'Please reason step by step,and must put your final answer within \boxed{}.Question:'
```
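As a small illustration of why the extra name is redundant (using a shortened hypothetical string): chained assignment simply binds both names to the same object, so a left-hand name that is never read is dead code.

```python
# Chained assignment: both names end up bound to the very same object.
instruction_following = instruction = "Please reason step by step."

# Nothing is gained by the alias if instruction_following is never read.
print(instruction_following is instruction)  # → True
```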
No description provided.