Conversation
@seyeong-han seyeong-han commented Jan 26, 2026

Summary

This PR adds support for running the Gemma-3-1B-IT text-only model on ExecuTorch with the CPU (XNNPack) backend. The new gemma3_text_runner provides a lightweight alternative to the existing multimodal gemma3_e2e_runner, without requiring any image-processing dependencies.

Dependencies

⚠️ Required: This PR depends on huggingface/optimum-executorch#206 which adds proper EOS token handling for Gemma models.

The optimum-executorch PR modifies utils.py to include the <end_of_turn> token (ID 106) in get_eos_ids for Gemma models. Without this change, the text runner does not stop generation at <end_of_turn> and keeps generating until max_new_tokens is reached.
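The fix amounts to treating a *set* of token IDs as stop tokens instead of a single one. A minimal Python sketch of the idea (the `should_stop` helper is illustrative, not the optimum-executorch API; ID 106 for `<end_of_turn>` comes from this PR, and ID 1 for `<eos>` is the usual Gemma tokenizer value — verify both against tokenizer.json):

```python
# Stop-token handling as described above: Gemma must treat
# <end_of_turn> (ID 106) as an EOS in addition to <eos> (ID 1).
# Helper name and structure are illustrative only.
GEMMA_EOS_IDS = {1, 106}  # 1 = <eos>, 106 = <end_of_turn>

def should_stop(token_id: int, eos_ids: set = GEMMA_EOS_IDS) -> bool:
    """Return True when generation should stop at this token."""
    return token_id in eos_ids

# Without ID 106 in the set, <end_of_turn> would not terminate generation:
assert should_stop(106)
assert should_stop(1)
assert not should_stop(42)
```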

Changes

New Files

  • examples/models/gemma3/text_runner.cpp - Text-only inference runner with:
    • gflags command-line interface (model_path, tokenizer_path, prompt, temperature, max_new_tokens, cpu_threads, warmup)
    • Gemma3 chat template formatting (<start_of_turn>user\n...<end_of_turn>\n<start_of_turn>model\n)
    • Integration with TextLLMRunner for text generation
    • Threadpool configuration support
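The chat-template formatting in the second sub-bullet can be sketched in a few lines of Python (the function name is hypothetical; the template string is taken verbatim from the PR description):

```python
def format_gemma3_prompt(user_text: str) -> str:
    # Gemma3 chat template as described above:
    # <start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n
    return (
        "<start_of_turn>user\n"
        f"{user_text}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(format_gemma3_prompt("What is the capital of France?"))
```

The formatted string matches the prompt echoed back in the Result log below; the model's reply then ends with its own `<end_of_turn>`, which is why the EOS fix in optimum-executorch#206 is required.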

Modified Files

  • examples/models/gemma3/CMakeLists.txt - Added gemma3_text_runner executable target
  • examples/models/gemma3/CMakePresets.json - Added gemma3-text-cpu build and workflow presets
  • Makefile - Added gemma3-text-cpu target
  • examples/models/gemma3/README.md - Comprehensive documentation for both models

Test Plan

# Build
make gemma3-text-cpu

# Export (requires optimum-executorch with PR #206 merged)
optimum-cli export executorch \
  --model "google/gemma-3-1b-it" \
  --task "text-generation" \
  --recipe "xnnpack" \
  --use_custom_sdpa \
  --use_custom_kv_cache \
  --output_dir="gemma-3/gemma-3-1b-it"

# Download tokenizer
curl -L https://huggingface.co/google/gemma-3-1b-it/resolve/main/tokenizer.json -o gemma-3/tokenizer.json

# Run
./cmake-out/examples/models/gemma3/gemma3_text_runner \
  --model_path=gemma-3/gemma-3-1b-it/model.pte \
  --tokenizer_path=gemma-3/tokenizer.json \
  --prompt="What is the capital of France?" \
  --max_new_tokens=50

Result

./cmake-out/examples/models/gemma3/gemma3_text_runner \
    --model_path=/Users/younghan/project/executorch/gemma-3/gemma-3-1b-it/model.pte \
    --tokenizer_path=/Users/younghan/project/executorch/gemma-3/tokenizer.json \
    --prompt="What is the capital of France?" \
    --max_new_tokens=50 --warmup

I tokenizers:regex.cpp:27] Registering override fallback regex
I tokenizers:hf_tokenizer.cpp:142] Setting up normalizer...
I tokenizers:hf_tokenizer.cpp:146] Normalizer set up
I tokenizers:hf_tokenizer.cpp:160] Setting up pretokenizer...
I tokenizers:hf_tokenizer.cpp:164] Pretokenizer set up
I tokenizers:hf_tokenizer.cpp:180] Loading BPE merges...
I tokenizers:hf_tokenizer.cpp:240] Loaded 513511 BPE merge rules
I tokenizers:hf_tokenizer.cpp:252] Built merge ranks map with 236249 entries
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1769466105.443906 16535932 re2.cc:804] DFA out of memory: pattern length 96883, program size 17069, list count 10654, bytemap range 45
[... the identical re2 "DFA out of memory" warning repeats throughout tokenization, for both the warmup and the main run; elided for brevity ...]
I tokenizers:hf_tokenizer.cpp:415] normalized input: 'user' -> 'user'
I tokenizers:hf_tokenizer.cpp:415] normalized input: 'What is the capital of France?' -> 'What▁is▁the▁capital▁of▁France?'
I tokenizers:hf_tokenizer.cpp:415] normalized input: 'model' -> 'model'
<start_of_turn>user
What is the capital of France?<end_of_turn>
<start_of_turn>model
The capital of France is **Paris**.

<end_of_turn>

PyTorchObserver {"prompt_tokens":15,"generated_tokens":9,"model_load_start_ms":0,"model_load_end_ms":0,"inference_start_ms":1769466106615,"inference_end_ms":1769466107233,"prompt_eval_end_ms":1769466106746,"first_token_ms":1769466106746,"aggregate_sampling_time_ms":4,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
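The PyTorchObserver line is a single JSON object after the `PyTorchObserver ` prefix, so latency and throughput can be derived from it directly. A sketch using the field names from the log above (the decode-rate convention here — counting generated tokens against the time after the first token — is one common choice, not something the runner itself reports):

```python
import json

# Abbreviated copy of the observer line from the run above.
line = ('PyTorchObserver {"prompt_tokens":15,"generated_tokens":9,'
        '"inference_start_ms":1769466106615,"inference_end_ms":1769466107233,'
        '"prompt_eval_end_ms":1769466106746,"first_token_ms":1769466106746}')

stats = json.loads(line.removeprefix("PyTorchObserver "))
ttft_ms = stats["first_token_ms"] - stats["inference_start_ms"]
decode_ms = stats["inference_end_ms"] - stats["first_token_ms"]
tok_per_s = stats["generated_tokens"] / (decode_ms / 1000)
print(f"TTFT: {ttft_ms} ms, decode: {tok_per_s:.1f} tok/s")
# -> TTFT: 131 ms, decode: 18.5 tok/s
```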

pytorch-bot bot commented Jan 26, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16885

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 690d2dd with merge base ecc7dd0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 26, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
