
Update README and integration test for Qwen2-7B-Instruct #49

Open
sdeeptan-aws wants to merge 2 commits into aws-neuron:main from sdeeptan-aws:qwen

Conversation

@sdeeptan-aws (Contributor)

Description

Updated Qwen2-7B-Instruct contrib model README and integration test. The modeling code is unchanged — Qwen2 is architecturally close to LLaMA (reuses NeuronLlamaMLP, standard RoPE, pre-norm decoder layers) with QKV bias enabled and tied embeddings. Validation achieves 100% token match.

Model Information

Model Name: Qwen2-7B-Instruct
Model Architecture: Decoder-only transformer (28 layers, hidden_size=3584, GQA with 28 query heads and 4 KV heads, QKV bias, tied embeddings)
Purpose: Text generation / instruction following
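For reference, the shape parameters above map onto a Hugging Face-style config roughly as follows. This is a sketch: values not stated in this PR (`intermediate_size`, `num_key_value_heads`, and the derived head dimension) are taken from the public Qwen2-7B-Instruct config and should be double-checked against the checkpoint.

```python
# Sketch of the Qwen2-7B-Instruct shape parameters referenced above.
# Values not stated in this PR (intermediate_size, num_key_value_heads)
# follow the public Hugging Face config and are assumptions here.
QWEN2_7B_CONFIG = {
    "num_hidden_layers": 28,
    "hidden_size": 3584,
    "num_attention_heads": 28,    # query heads
    "num_key_value_heads": 4,     # GQA: 7 query heads share each KV head
    "intermediate_size": 18944,   # SwiGLU MLP width
    "tie_word_embeddings": True,  # lm_head shares embed_tokens weights
    "qkv_bias": True,             # bias on Q/K/V projections (unlike LLaMA)
}

# Per-head dimension follows from hidden_size / num_attention_heads.
head_dim = QWEN2_7B_CONFIG["hidden_size"] // QWEN2_7B_CONFIG["num_attention_heads"]
print(head_dim)  # 128
```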

Checklist

Required Components

  • Accuracy Test (test/integration/test_model.py)
    • Token match accuracy validation
    • Test can compile and run the model on Neuron
  • README.md with the following sections:
    • Usage Example: Clear code example showing how to use the model
    • Compatibility Matrix: Table showing tested Neuron SDK versions and instance types
    • Example Checkpoints: Links to compatible model checkpoints
    • Testing Instructions: Command to run the test suite for the model
  • Source Code (src/)
    • Modeling code following NxD Inference patterns (unchanged in this PR)

Optional Components

  • Unit Tests (CPU or Neuron-based)

Folder Structure

```
/contrib/models/Qwen2-7B-Instruct/
  README.md
  /src
    modeling_qwen2.py
  /test
    /integration
      test_model.py
```

Testing

Model was compiled and tested with TP=2, batch_size=1, seq_len=128, bfloat16.

Test Results:

| Test | Status | Result |
| --- | --- | --- |
| Smoke Test | ✅ PASS | Model loads successfully |
| Token Matching | ✅ PASS | 100% match |

Test Prompt: "def fibonacci(n):"
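The token-matching check above boils down to comparing the token IDs generated on Neuron against a reference (CPU/HF) run over the same prompt. A minimal sketch of that comparison, where the function name and token IDs are illustrative and not the actual test code:

```python
def token_match_rate(reference_ids, neuron_ids):
    """Fraction of positions where the Neuron-generated token ID equals
    the reference token ID, over the shorter of the two sequences."""
    n = min(len(reference_ids), len(neuron_ids))
    if n == 0:
        return 0.0
    matches = sum(1 for r, o in zip(reference_ids, neuron_ids) if r == o)
    return matches / n

# Illustrative IDs only -- a 100% match means every generated token agrees.
ref = [755, 5899, 268, 40917, 1445]
out = [755, 5899, 268, 40917, 1445]
assert token_match_rate(ref, out) == 1.0
```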

Compatibility

Tested with:

  • Instance Type(s): Trn1
  • Configuration: TP=2, batch_size=1, seq_len=128, bfloat16

Additional Information

  • LLaMA-like architecture: Qwen2 reuses NeuronLlamaMLP directly. Standard RoPE, pre-norm decoder layers, SwiGLU activation.
  • QKV bias: qkv_bias=True, o_bias=False — unlike LLaMA which has no bias on any attention projections.
  • Tied embeddings: embed_tokens.weight cloned to lm_head.weight via update_state_dict_for_tied_weights.
  • Fused QKV support: Optional fused QKV path concatenates Q/K/V weights into single Wqkv tensor when neuron_config.fused_qkv=True.
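The last two points (tied embeddings and fused QKV) can be sketched in plain Python, with weights shown as row-lists instead of tensors. The real code operates on torch tensors inside the NxD Inference load path; the function names here mirror the description, not the actual source.

```python
# Sketch of the two state-dict transforms described above, using nested
# lists in place of torch tensors. Names are illustrative.

def fuse_qkv(w_q, w_k, w_v):
    """Concatenate Q, K, V projection weights row-wise into a single Wqkv,
    so one matmul produces all three projections."""
    return w_q + w_k + w_v  # row-wise concat (dim=0 for tensors)

def tie_embeddings(state_dict):
    """Copy embed_tokens.weight into lm_head.weight (tied embeddings)."""
    state_dict["lm_head.weight"] = [row[:] for row in state_dict["embed_tokens.weight"]]
    return state_dict

# Toy example: each projection has 2 output rows of width 2.
w_q = [[1, 0], [0, 1]]
w_k = [[2, 0], [0, 2]]
w_v = [[3, 0], [0, 3]]
w_qkv = fuse_qkv(w_q, w_k, w_v)
assert len(w_qkv) == 6  # 3 projections * 2 output rows stacked

sd = tie_embeddings({"embed_tokens.weight": [[0.1, 0.2]]})
assert sd["lm_head.weight"] == sd["embed_tokens.weight"]
```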

Related Issues

N/A

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included


@aws-yishanm aws-yishanm left a comment


Approved because Readme and test were present.

@petesraj-aws petesraj-aws self-requested a review February 23, 2026 21:08
