[PZ COMPETITION] yuanboyang final by JokerYang0703 · Pull Request #167 · FlagAI-Open/OpenSeek

JokerYang0703 · 2025-09-12T09:27:37Z

No description provided.

Signed-off-by: yuanboyang <951569335@qq.com>

gemini-code-assist

Summary of Changes

Hello @JokerYang0703, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

此拉取请求旨在为竞赛准备最终代码，主要侧重于更新数据处理、模型加载配置以及verl框架内的特定奖励逻辑。这些更改可能旨在优化或调整大型语言模型（LLM）以更好地完成数学推理任务。

Highlights

数据处理流程增强: 引入了新的download.py脚本，用于将ModelScope上的"Big-Math-RL-Verified-Processed"数据集处理并转换为verl兼容的Parquet格式。
GSM8k数据集集成更新: 修改了gsm8k.py数据预处理脚本，将数据源从openai/gsm8k (Hugging Face datasets) 切换到modelscope/gsm8k (ModelScope MsDataset)。
提示模板标准化: 更新了数学推理任务的指令提示，强制使用\boxed{}格式输出最终答案。
trust_remote_code配置简化: 在verl的多个组件（模型合并器、训练器、FSDP worker和配置文件）中将trust_remote_code硬编码为True，以简化模型加载流程。
奖励函数调整: 修改了geo3k.py中奖励函数的正则表达式模式，移除了对.*的匹配要求，简化了匹配逻辑。
VLLM版本降级: 在安装脚本中将vllm依赖版本从0.8.5.post1降级到0.8.2。
增加Rollout响应数量: 将rollout.yaml中的n参数从1更改为8，增加了Rollout过程中生成的响应数量。

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

这个拉取请求似乎是“yuanboyang final”竞赛提交的最终代码。它包括了运行完整代码库所需的所有文件：修改后的 verl 源码、数据下载脚本、环境依赖文件以及详细的 README.md 说明文档。主要改动包括：

修改数据处理脚本以从 ModelScope 加载数据集，并调整了 prompt 格式。
在多个文件中硬编码了 trust_remote_code=True，这是一个严重的安全隐患。
README.md 文件提供了详细的设置和运行指南，但包含了一些硬编码的路径和不推荐的操作（如直接修改 site-packages 中的文件），这会影响代码的可移植性和可维护性。

我的审查意见主要集中在提高代码的安全性、可维护性和可复现性方面。

gemini-code-assist · 2025-09-12T09:29:23Z

openseek/competition/pz/yuanboyang/verl/verl/model_merger/base_model_merger.py

+            model = auto_model_class.from_config(
+                self.model_config, torch_dtype=torch.bfloat16, trust_remote_code=True
+            )


在此处硬编码 trust_remote_code=True 存在严重的安全风险。它允许从 Hugging Face Hub 执行任意代码，这可能导致恶意代码的执行。此选项应作为可配置参数，并默认设置为 False，仅在完全信任模型来源时才由用户显式启用。

Suggested change

model = auto_model_class.from_config(

self.model_config, torch_dtype=torch.bfloat16, trust_remote_code=True

)

model = auto_model_class.from_config(

self.model_config, torch_dtype=torch.bfloat16, trust_remote_code=self.config.trust_remote_code

)

gemini-code-assist · 2025-09-12T09:29:23Z

openseek/competition/pz/yuanboyang/verl/verl/trainer/main_ppo.py

+        # Instantiate the tokenizer and processor.
+        from verl.utils import hf_processor, hf_tokenizer
+
+        trust_remote_code = True


在此处硬编码 trust_remote_code = True 存在严重的安全风险，因为它允许从模型中心执行任意代码。这应该是一个可配置的选项，并且默认情况下应为 False，以防止潜在的恶意代码攻击。

Suggested change

trust_remote_code = True

trust_remote_code = config.data.get("trust_remote_code", False)

gemini-code-assist · 2025-09-12T09:29:23Z

openseek/competition/pz/yuanboyang/verl/verl/workers/fsdp_workers.py

+            actor_module = actor_module_class.from_pretrained(
+                pretrained_model_name_or_path=local_path,
+                torch_dtype=torch_dtype,
+                config=actor_model_config,
+                trust_remote_code=True,
+            )


在此处硬编码 trust_remote_code=True 是一个严重的安全漏洞。它允许模型仓库中的任意代码在您的机器上执行。强烈建议将此值作为配置选项，并默认禁用，仅在您完全信任代码来源时才启用。

Suggested change

actor_module = actor_module_class.from_pretrained(

pretrained_model_name_or_path=local_path,

torch_dtype=torch_dtype,

config=actor_model_config,

trust_remote_code=True,

)

actor_module = actor_module_class.from_pretrained(

pretrained_model_name_or_path=local_path,

torch_dtype=torch_dtype,

config=actor_model_config,

trust_remote_code=trust_remote_code,

)

gemini-code-assist · 2025-09-12T09:29:23Z

openseek/competition/pz/yuanboyang/README.md

+### 基于 [transformers](https://github.com/huggingface/transformers) 源码的修改
+- 修改文件：
+  - `/root/miniconda3/envs/verl/lib/python3.10/site-packages/transformers/configuration_utils.py`
+- 修改内容：
+  - 将第 917 行改为：
+    ```python
+    json.dumps(config_dict, indent=2, sort_keys=False) + "\n"
+    ```


直接修改 site-packages 中的库文件是一种不良实践。这会使环境变得脆弱且难以复现。如果其他人尝试设置此项目，他们可能会忘记手动应用此补丁，导致行为不一致或出错。更好的方法是 fork transformers 仓库，应用您的修改，然后从您的 fork 安装，或者使用 .patch 文件和脚本来应用更改。

gemini-code-assist · 2025-09-12T09:29:23Z

openseek/competition/pz/yuanboyang/README.md

+nohup env PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
+  algorithm.adv_estimator=grpo \
+  data.train_files=/usr/train3.parquet \   # 需要自己修改位置
+  data.train_batch_size=264 \
+  data.max_prompt_length=2048 \
+  data.max_response_length=512 \
+  actor_rollout_ref.model.path=/root/.cache/modelscope/hub/models/BAAI/OpenSeek-Small-v1-SFT \   # 需要自己修改位置
+  actor_rollout_ref.actor.optim.lr=1e-5 \
+  actor_rollout_ref.actor.ppo_mini_batch_size=72 \
+  actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
+  actor_rollout_ref.rollout.name=vllm \
+  +actor_rollout_ref.actor.fsdp_config.model_dtype=bf16 \
+  actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 \
+  actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
+  actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
+  actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
+  trainer.logger=tensorboard \
+  trainer.val_before_train=True \
+  trainer.n_gpus_per_node=6 \
+  trainer.nnodes=1 \
+  trainer.save_freq=200 \
+  trainer.test_freq=10 \
+  trainer.total_epochs=15 \
+  data.val_files=$HOME/data/gsm8k/test.parquet \
+  actor_rollout_ref.rollout.n=6 \
+  > train.log 2>&1 &


运行指令中包含了多个硬编码的路径（例如 /usr/train3.parquet, /root/.cache/...）。这使得脚本在不同环境中难以直接运行。建议使用环境变量或命令行参数来传递这些路径，以提高脚本的可移植性。

例如:

# 在您的脚本或环境中设置环境变量 export TRAIN_FILES=/path/to/your/train3.parquet export MODEL_PATH=/path/to/your/model # 然后在命令中使用它们 nohup env PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \ data.train_files=$TRAIN_FILES \ actor_rollout_ref.model.path=$MODEL_PATH \ ...

gemini-code-assist · 2025-09-12T09:29:24Z

openseek/competition/pz/yuanboyang/download.py

+    dataset_name = 'open-r1/Big-Math-RL-Verified-Processed'
+    subset_name = 'all'
+    split = 'train'
+    data_source_name = "Big-Math" # 用于在数据中标记来源


变量 data_source_name 已被定义但从未在代码中使用。建议移除未使用的变量以保持代码整洁。

gemini-code-assist · 2025-09-12T09:29:24Z

openseek/competition/pz/yuanboyang/download.py

+
+        # 组装成最终的数据结构
+        processed_data = {
+            "data_source": 'hiyouga/geometry3k',


这里的 data_source 硬编码为 'hiyouga/geometry3k'，但加载的数据集是 'open-r1/Big-Math-RL-Verified-Processed'。这似乎不一致，并且可能是一个错误，正如 README.md 中所提到的。为了清晰和避免潜在的错误，建议将此值作为参数或常量进行管理，并确保其与数据处理逻辑一致。

Suggested change

"data_source": 'hiyouga/geometry3k',

"data_source": data_source_name,

gemini-code-assist · 2025-09-12T09:29:24Z

openseek/competition/pz/yuanboyang/verl/examples/data_preprocess/gsm8k.py

+    train_dataset =  MsDataset.load('modelscope/gsm8k', subset_name='main', split='train',trust_remote_code=True)
+    test_dataset =  MsDataset.load('modelscope/gsm8k', subset_name='main', split='test',trust_remote_code=True)
+
+    instruction_following = instruction = r'Please reason step by step,and must put your final answer within \boxed{}.Question:'


在这一行中，instruction_following 和 instruction 被赋予了相同的值，但 instruction_following 在后续代码中并未使用。这会造成代码冗余。建议只保留 instruction 的赋值。

Suggested change

instruction_following = instruction = r'Please reason step by step,and must put your final answer within \boxed{}.Question:'

instruction = r'Please reason step by step,and must put your final answer within \boxed{}.Question:'

JokerYang0703 added 2 commits September 12, 2025 15:25

team yuanboyang提交决赛代码与说明文档

bc6786f

Signed-off-by: yuanboyang <951569335@qq.com>

team yuanboyang提交决赛代码与说明文档

0e10457

Signed-off-by: yuanboyang <951569335@qq.com>

gemini-code-assist bot reviewed Sep 12, 2025

View reviewed changes

ftgreat changed the title ~~yuanboyang final~~ [PZ COMPETITION] yuanboyang final Sep 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PZ COMPETITION] yuanboyang final#167

[PZ COMPETITION] yuanboyang final#167
JokerYang0703 wants to merge 2 commits intoFlagAI-Open:mainfrom
JokerYang0703:add-final-code

JokerYang0703 commented Sep 12, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Sep 12, 2025

Uh oh!

gemini-code-assist bot Sep 12, 2025

Uh oh!

gemini-code-assist bot Sep 12, 2025

Uh oh!

gemini-code-assist bot Sep 12, 2025

Uh oh!

gemini-code-assist bot Sep 12, 2025

Uh oh!

gemini-code-assist bot Sep 12, 2025

Uh oh!

gemini-code-assist bot Sep 12, 2025

Uh oh!

gemini-code-assist bot Sep 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	trust_remote_code = True
	trust_remote_code = config.data.get("trust_remote_code", False)

	"data_source": 'hiyouga/geometry3k',
	"data_source": data_source_name,

	instruction_following = instruction = r'Please reason step by step,and must put your final answer within \boxed{}.Question:'
	instruction = r'Please reason step by step,and must put your final answer within \boxed{}.Question:'

Conversation

JokerYang0703 commented Sep 12, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant