Checkpoints record base_model_path as Qwen instead of UniPixel when finetuning from UniPixel-7B #9

@Kenn3o3

Description

When finetuning starting from model_zoo/UniPixel-7B, the produced checkpoint config.json records base_model_path as model_zoo/Qwen2.5-VL-7B-Instruct, which is misleading.

Steps to Reproduce

  1. Run:
#!/bin/bash

set -e

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export ASCEND_RT_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES
export PYTHONPATH="./:$PYTHONPATH"
export NCCL_TIMEOUT=600

torchrun --nproc_per_node 8 unipixel/train/train.py \
    --deepspeed scripts/zero2.json \
    --model_name_or_path model_zoo/UniPixel-7B \
    --base_model qwen2_5_vl \
    --conv_type chatml \
    --sam2_config configs/sam2.1_hiera_b+ \
    --sam2_checkpoint model_zoo/sam2.1/sam2.1_hiera_base_plus.pt \
    --sam2_image_size 768 \
    --sam2_apply_postprocessing False \
    --sam2_inference_mode False \
    --sam2_hidden_tokens 2 \
    --sam2_batch_mode False \
    --sam2_enable_decoder True \
    --sam2_lr 5e-6 \
    --lora_enable True \
    --lora_type qkvo_all \
    --lora_r 128 \
    --lora_alpha 256 \
    --lora_dropout 0.1 \
    --lora_bias none \
    --tuning_modules embedding,ref,ref_enc,msk,seg,sam2 \
    --datasets <your-datasets> \
    --sample_frames 8 \
    --sample_type random \
    --sample_objects 5 \
    --num_threads 1 \
    --max_conv_turns 3 \
    --max_video_frames 500 \
    --max_video_len 300 \
    --max_num_words 200 \
    --max_num_tokens 40960 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --output_dir work_dirs/8p_stage4_1e \
    --save_full_model True \
    --save_strategy steps \
    --save_steps 1000 \
    --save_total_limit 500 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --gradient_checkpointing True \
    --dataloader_num_workers 2 \
    --tf32 True \
    --bf16 True \
    --report_to wandb
  2. Open outputs/.../checkpoint-XX/config.json and inspect base_model_path.
{
  "architectures": [
    "PixelQwen2_5_VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "base_model": "qwen2_5_vl",
  "base_model_path": "model_zoo/Qwen2.5-VL-7B-Instruct",
  "bos_token_id": 151643,
  ...
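
To make step 2 repeatable across many checkpoints, the check can be scripted. The following is only a minimal sketch: the helper names read_base_model_path and base_model_path_matches are hypothetical and not part of UniPixel, and it assumes nothing beyond a config.json file with a top-level base_model_path key, as in the excerpt above.

```python
import json
from pathlib import Path


def read_base_model_path(checkpoint_dir):
    """Return the base_model_path recorded in <checkpoint_dir>/config.json.

    Returns None if the key is absent. (Hypothetical helper, not UniPixel code.)
    """
    config_file = Path(checkpoint_dir) / "config.json"
    with config_file.open(encoding="utf-8") as f:
        config = json.load(f)
    return config.get("base_model_path")


def base_model_path_matches(checkpoint_dir, expected):
    """Return True if the recorded base_model_path equals the expected starting model."""
    return read_base_model_path(checkpoint_dir) == expected
```

For the case reported here, base_model_path_matches(<checkpoint-dir>, "model_zoo/UniPixel-7B") would return False before the fix, because the recorded value is model_zoo/Qwen2.5-VL-7B-Instruct.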

Inference log when loading the resulting checkpoint:

Loading base model from model_zoo/Qwen2.5-VL-7B-Instruct...
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00,  4.15it/s]
Some weights of PixelQwen2_5_VLForConditionalGeneration were not initialized from the model checkpoint at model_zoo/Qwen2.5-VL-7B-Instruct and are newly initialized: ['msk_proj.0.bias', 'msk_proj.0.weight', 'msk_proj.2.bias', 'msk_proj.2.weight', 'ref_encoder.mask_downs...
...
'tem_proj.bias', 'tem_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading adapter from /mnt/pfs/rmgf7p/tanzl/model_space/UniPixel/outputs/overnight_80k/lora_qkvo_r64_a128_d005/2025-11-06-00-36-17/checkpoint-6000...
Loading state dict from /mnt/pfs/rmgf7p/tanzl/model_space/UniPixel/outputs/overnight_80k/lora_qkvo_r64_a128_d005/2025-11-06-00-36-17/checkpoint-6000/pytorch_model.safetensors...
Merging adapter and unloading...

Expected

base_model_path should match the starting model: model_zoo/UniPixel-7B.

Actual

base_model_path remains model_zoo/Qwen2.5-VL-7B-Instruct while model_name_or_path is model_zoo/UniPixel-7B.

Suspected Cause

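In unipixel/train/train.py, base_model_path is assigned only in the non-UniPixel branch. When the starting model is already a UniPixel checkpoint (model_type = pixel_qwen2_5_vl), the model is loaded via build_model(...) and base_model_path is inherited unchanged from the source config, so it still points at the Qwen model.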

Proposed Fix

Assign base_model_path after build_model(...) as well:

# unipixel/train/train.py
model, processor = build_model(...)
model.config.base_model_path = model_args.model_name_or_path

After this fix, the warning about newly initialized weights no longer appears in the inference log:

Loading full model from /mnt/pfs/rmgf7p/tanzl/model_space/UniPixel/outputs/debug/2025-11-10-11-54-53...
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00,  5.17it/s]

This ensures checkpoints reflect the actual starting model (model_zoo/UniPixel-7B).
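
The suspected mechanism and the effect of the proposed assignment can be illustrated with a toy model of the config handling. This is a sketch under stated assumptions, not the code in unipixel/train/train.py: SOURCE_CONFIG and toy_finetune_config are hypothetical stand-ins, and plain dictionaries replace the real config objects.

```python
import copy

# Toy stand-in for the config.json shipped with the starting UniPixel checkpoint.
# Its base_model_path still points at the Qwen model it was originally built from.
SOURCE_CONFIG = {
    "base_model": "qwen2_5_vl",
    "base_model_path": "model_zoo/Qwen2.5-VL-7B-Instruct",
}


def toy_finetune_config(source_config, model_name_or_path, apply_fix):
    """Mimic how a finetuned checkpoint's config is derived (hypothetical helper).

    Without the fix, every field is inherited from the source config.
    With the fix, base_model_path is overwritten with the actual starting model.
    """
    config = copy.deepcopy(source_config)
    if apply_fix:
        config["base_model_path"] = model_name_or_path
    return config


before = toy_finetune_config(SOURCE_CONFIG, "model_zoo/UniPixel-7B", apply_fix=False)
after = toy_finetune_config(SOURCE_CONFIG, "model_zoo/UniPixel-7B", apply_fix=True)
print(before["base_model_path"])  # model_zoo/Qwen2.5-VL-7B-Instruct
print(after["base_model_path"])   # model_zoo/UniPixel-7B
```

In this toy version the stale value survives only because nothing overwrites it, which matches the behavior described under Suspected Cause.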
