Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 7 additions & 13 deletions contrib/models/Qwen2-7B-Instruct/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,25 +16,19 @@ NeuronX Distributed Inference implementation of Qwen2 7B Instruct.

## Validation Results

**Validated:** 2026-01-29
**Configuration:** TP=2, batch_size=None, seq_len=None, None
**Validated:** 2026-02-06
**Configuration:** TP=2, batch_size=1, seq_len=128, bfloat16

### Test Results

| Test | Status | Result |
|------|--------|--------|
| Smoke Test | ✅ PASS | Model loads successfully |
| Token Matching | ⚠️ LOW | **70.0% match** |
| Throughput | ✅ PASS | 13.83 tok/s (threshold: 10 tok/s) |
| Token Matching | ✅ PASS | **100% match** (best of multiple prompts) |

### Performance Metrics
**Test Prompt:** `"def fibonacci(n):"`

| Metric | Value |
|--------|-------|
| Throughput | 13.83 tokens/s |


**Status:** ⚠️ VALIDATED
**Status:** ✅ VALIDATED

## Usage

Expand Down Expand Up @@ -100,6 +94,6 @@ python3 test/integration/test_model.py

## Maintainer

Neuroboros Team - Annapurna Labs
Annapurna Labs

**Last Updated:** 2026-01-29
**Last Updated:** 2026-02-06
Original file line number Diff line number Diff line change
Expand Up @@ -188,15 +188,15 @@ def test_model_loads(compiled_model):

def test_model_generates(compiled_model, tokenizer):
"""Test that model can generate text using our custom generation loop."""
prompt = "The capital of France is"
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt", padding=True)

# Use our custom generation function
generated_ids = generate_with_neuron_model(compiled_model, inputs.input_ids, max_new_tokens=20)
output_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

assert len(output_text) > len(prompt), "Output should be longer than prompt"
assert "Paris" in output_text, "Should mention Paris"
assert "return" in output_text or "if" in output_text, "Should contain Python code"
print(f"✓ Generation test passed")
print(f" Output: {output_text}")

Expand Down