15 changes: 11 additions & 4 deletions contrib/models/Qwen2.5-VL-32B-Instruct/README.md
@@ -12,7 +12,9 @@ NeuronX Distributed Inference implementation of Qwen2.5 VL 32B Instruct.

## Architecture Details

- **Layers:** Check model config
- **Type:** Multimodal (vision-language) model — text backbone validated only
- **Text Backbone:** Decoder-only transformer (Qwen2-based)
- **Layers:** 64
- **Hidden Size:** Check model config
- **Attention Heads:** Check model config
- **Vocabulary:** Check model config
@@ -28,7 +30,7 @@ NeuronX Distributed Inference implementation of Qwen2.5 VL 32B Instruct.
| Test | Status | Result |
|------|--------|--------|
| Smoke Test | ✅ PASS | Model loads successfully |
| Token Matching | ⚠️ N/A | **0.0% match** |
| Token Matching | ✅ PASS | **100% match** (text backbone) |
| TTFT (P50) | ✅ PASS | 7.98ms (threshold: 100ms) |
| Throughput | ✅ PASS | 120.65 tok/s (threshold: 10 tok/s) |

@@ -39,9 +41,14 @@ NeuronX Distributed Inference implementation of Qwen2.5 VL 32B Instruct.
| TTFT (P50) | 7.98ms |
| Throughput | 120.65 tokens/s |


**Status:** ✅ VALIDATED

### Multimodal Validation Notes

Qwen2.5-VL is a vision-language model, and the NeuronX port validates only its text backbone. `AutoModelForCausalLM` does not work for VLMs, so the HF reference used for token matching must be loaded with the specific text backbone class (`Qwen2ForCausalLM`). With the text backbone extracted correctly, the model achieves a 100% token match.
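
The README does not say how the token match percentage is computed. As a minimal sketch, assuming it is the share of reference positions at which the Neuron output and the HF reference produce the same token ID, the comparison could look like this. `token_match_rate` is a hypothetical helper and not part of the repository's test suite.

```python
from typing import Sequence


def token_match_rate(reference: Sequence[int], candidate: Sequence[int]) -> float:
    """Return the fraction of reference positions where candidate has the same token ID.

    Hypothetical helper: the position-wise definition is an assumption,
    not the metric documented by the repository.
    """
    if not reference:
        raise ValueError("reference sequence is empty")
    # zip stops at the shorter sequence, so extra candidate tokens are ignored.
    matches = sum(1 for ref_id, cand_id in zip(reference, candidate) if ref_id == cand_id)
    # Dividing by the reference length penalises a candidate that stops early.
    return matches / len(reference)
```

Under this definition, a result of `1.0` corresponds to the 100% match reported in the table, and any single diverging token lowers the score.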

**Important:** Ensure the compiled model uses the full 64 layers. Test builds with reduced layer counts (e.g., 4 layers) will produce poor accuracy. Always verify `num_hidden_layers` in the compiled `config.json` before validation.
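
The layer-count check described above could be scripted along these lines. This is a sketch that assumes `num_hidden_layers` sits at the top level of the compiled `config.json`. `layer_count` and `assert_full_depth` are hypothetical helper names, and the lookup would need adjusting if the key is nested in the compiled config.

```python
import json
from pathlib import Path

EXPECTED_LAYERS = 64  # full depth listed under Architecture Details


def layer_count(config: dict) -> int:
    """Return num_hidden_layers from a parsed config dictionary.

    Assumption: the key is stored at the top level of the config.
    """
    if "num_hidden_layers" not in config:
        raise KeyError("num_hidden_layers not found at the top level of the config")
    return int(config["num_hidden_layers"])


def assert_full_depth(config_path: str, expected: int = EXPECTED_LAYERS) -> None:
    """Raise ValueError if the compiled config was built with a reduced layer count."""
    config = json.loads(Path(config_path).read_text())
    found = layer_count(config)
    if found != expected:
        raise ValueError(
            f"{config_path}: num_hidden_layers={found}, expected {expected}"
        )
```

Calling `assert_full_depth` on the compiled `config.json` before running the token matching test makes a reduced-layer test build fail early instead of surfacing later as an accuracy problem.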

## Usage

```python
@@ -106,6 +113,6 @@ python3 test/integration/test_model.py

## Maintainer

Neuroboros Team - Annapurna Labs
Annapurna Labs

**Last Updated:** 2026-01-29