Description
When finetuning starting from model_zoo/UniPixel-7B, the config.json written into the produced checkpoint records base_model_path as model_zoo/Qwen2.5-VL-7B-Instruct, which is misleading.
Steps to Reproduce
- Run:
#!/bin/bash
set -e
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export ASCEND_RT_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES
export PYTHONPATH="./:$PYTHONPATH"
export NCCL_TIMEOUT=600
torchrun --nproc_per_node 8 unipixel/train/train.py \
--deepspeed scripts/zero2.json \
--model_name_or_path model_zoo/UniPixel-7B \
--base_model qwen2_5_vl \
--conv_type chatml \
--sam2_config configs/sam2.1_hiera_b+ \
--sam2_checkpoint model_zoo/sam2.1/sam2.1_hiera_base_plus.pt \
--sam2_image_size 768 \
--sam2_apply_postprocessing False \
--sam2_inference_mode False \
--sam2_hidden_tokens 2 \
--sam2_batch_mode False \
--sam2_enable_decoder True \
--sam2_lr 5e-6 \
--lora_enable True \
--lora_type qkvo_all \
--lora_r 128 \
--lora_alpha 256 \
--lora_dropout 0.1 \
--lora_bias none \
--tuning_modules embedding,ref,ref_enc,msk,seg,sam2 \
--datasets <your-datasets> \
--sample_frames 8 \
--sample_type random \
--sample_objects 5 \
--num_threads 1 \
--max_conv_turns 3 \
--max_video_frames 500 \
--max_video_len 300 \
--max_num_words 200 \
--max_num_tokens 40960 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 4 \
--output_dir work_dirs/8p_stage4_1e \
--save_full_model True \
--save_strategy steps \
--save_steps 1000 \
--save_total_limit 500 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type cosine \
--logging_steps 1 \
--gradient_checkpointing True \
--dataloader_num_workers 2 \
--tf32 True \
--bf16 True \
--report_to wandb
- Open outputs/.../checkpoint-XX/config.json and inspect base_model_path.
{
"architectures": [
"PixelQwen2_5_VLForConditionalGeneration"
],
"attention_dropout": 0.0,
"base_model": "qwen2_5_vl",
"base_model_path": "model_zoo/Qwen2.5-VL-7B-Instruct",
"bos_token_id": 151643,
...
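To spot the mismatch quickly, here is a minimal standalone check; the helper name and the placeholder checkpoint path are mine, not part of the repo:

import json
from pathlib import Path

# Hypothetical helper (not in the repo): flag a checkpoint whose recorded
# base_model_path differs from the model the run actually started from.
def check_base_model_path(checkpoint_dir: str, expected: str) -> None:
    config = json.loads((Path(checkpoint_dir) / "config.json").read_text())
    recorded = config.get("base_model_path")
    status = "OK" if recorded == expected else "MISMATCH"
    print(f"{status}: recorded={recorded!r}, expected={expected!r}")

# Substitute your actual checkpoint directory for the placeholder.
check_base_model_path("outputs/.../checkpoint-XX", "model_zoo/UniPixel-7B")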
Log during inference on this model:
Loading base model from model_zoo/Qwen2.5-VL-7B-Instruct...
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00, 4.15it/s]
Some weights of PixelQwen2_5_VLForConditionalGeneration were not initialized from the model checkpoint at model_zoo/Qwen2.5-VL-7B-Instruct and are newly initialized: ['msk_proj.0.bias', 'msk_proj.0.weight', 'msk_proj.2.bias', 'msk_proj.2.weight', 'ref_encoder.mask_downs...
...
'tem_proj.bias', 'tem_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading adapter from /mnt/pfs/rmgf7p/tanzl/model_space/UniPixel/outputs/overnight_80k/lora_qkvo_r64_a128_d005/2025-11-06-00-36-17/checkpoint-6000...
Loading state dict from /mnt/pfs/rmgf7p/tanzl/model_space/UniPixel/outputs/overnight_80k/lora_qkvo_r64_a128_d005/2025-11-06-00-36-17/checkpoint-6000/pytorch_model.safetensors...
Merging adapter and unloading...
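For context, the log above implies a loading path roughly like the sketch below; the import path, class usage, and overall structure are assumptions on my part, not the repo's actual inference code:

import json
from pathlib import Path

from peft import PeftModel

# Assumed import path for the project's model class; adjust to the repo layout.
from unipixel.model import PixelQwen2_5_VLForConditionalGeneration

CKPT = "outputs/.../checkpoint-XX"  # placeholder checkpoint directory

# The loader reads base_model_path from the checkpoint's config.json...
base_model_path = json.loads((Path(CKPT) / "config.json").read_text())["base_model_path"]
print(f"Loading base model from {base_model_path}...")

# ...instantiates the base model from that path (the plain Qwen checkpoint
# lacks UniPixel-only modules such as msk_proj/ref_encoder/tem_proj, which
# triggers the "newly initialized" warning above)...
base = PixelQwen2_5_VLForConditionalGeneration.from_pretrained(base_model_path)

# ...then loads the LoRA adapter from the checkpoint and merges it.
model = PeftModel.from_pretrained(base, CKPT)
model = model.merge_and_unload()

The key point is that whatever path the checkpoint's config.json records is exactly what this loading path consumes.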
Expected
base_model_path should match the starting model: model_zoo/UniPixel-7B.
Actual
base_model_path remains model_zoo/Qwen2.5-VL-7B-Instruct while model_name_or_path is model_zoo/UniPixel-7B.
Suspected Cause
In unipixel/train/train.py, base_model_path is assigned only in the non-UniPixel branch. When a UniPixel checkpoint (model_type = pixel_qwen2_5_vl) is loaded via build_model(...), the config instead inherits the stale Qwen path from the source checkpoint's config.
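A sketch of the suspected control flow; all names other than build_model are illustrative, not the actual code:

# Suspected branching in unipixel/train/train.py (illustrative; only
# build_model appears in the real code, the rest is assumed).
def load_model(model_args, training_args):
    if model_args.model_type == "pixel_qwen2_5_vl":
        # UniPixel branch: build_model(...) restores the saved config, which
        # still carries the base_model_path recorded when UniPixel-7B itself
        # was trained (model_zoo/Qwen2.5-VL-7B-Instruct); nothing updates it.
        model, processor = build_model(model_args, training_args)
    else:
        # Non-UniPixel branch: base_model_path is assigned explicitly,
        # so runs starting from a plain Qwen model record the right value.
        model, processor = build_from_base(model_args.model_name_or_path)
        model.config.base_model_path = model_args.model_name_or_path
    return model, processor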
Proposed Fix
Assign base_model_path after build_model(...) as well:
# unipixel/train/train.py
model, processor = build_model(...)
model.config.base_model_path = model_args.model_name_or_path

After applying this fix, the misleading log output disappeared; the checkpoint now loads as a full model:
Loading full model from /mnt/pfs/rmgf7p/tanzl/model_space/UniPixel/outputs/debug/2025-11-10-11-54-53...
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 5.17it/s]
This ensures checkpoints reflect the actual starting model (model_zoo/UniPixel-7B).
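As a guard against regressions, a hypothetical pytest check over a run's checkpoints could look like this; the test name, output directory, and expected path follow the repro above but are otherwise mine:

import json
from pathlib import Path

import pytest

# Hypothetical regression test (not in the repo): every checkpoint written by
# a run started from UniPixel-7B must record that path, not the Qwen one.
CHECKPOINTS = sorted(Path("work_dirs/8p_stage4_1e").glob("checkpoint-*"))

@pytest.mark.parametrize("ckpt", CHECKPOINTS, ids=lambda p: p.name)
def test_base_model_path_matches_starting_model(ckpt):
    config = json.loads((ckpt / "config.json").read_text())
    assert config["base_model_path"] == "model_zoo/UniPixel-7B"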