@seyeong-han
Summary

When Gemma models are exported with --task "text-generation", the C++ text runner stops only at the <eos> token (ID 1) and not at the <end_of_turn> token (ID 106). As a result, generation continues past the end of the model's turn.

This diff modifies save_config_to_constant_methods() to export multiple EOS token IDs via get_eos_ids, which the C++ runner's kEosIds method already supports.

Changes

  • EOS list handling: Build a list of EOS token IDs that handles both single int and list cases from config.eos_token_id
  • Gemma detection: Detect Gemma models via config.model_type containing "gemma" and automatically add token 106 (<end_of_turn>) to the EOS list
  • Dual export: Export both get_eos_ids (full list) for C++ runner compatibility and get_eos_id (first element) for backward compatibility
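
The three changes above can be sketched roughly as follows. This is a minimal illustration, not the code in this diff: `build_eos_ids` and `eos_metadata` are hypothetical helper names, the config is stood in for by any object with `eos_token_id` and `model_type` attributes, and the returned dict only mimics the shape of the exported metadata.

```python
from types import SimpleNamespace

# Gemma's <end_of_turn> token ID, as stated in the summary above.
END_OF_TURN_ID = 106


def build_eos_ids(config):
    """Return a list of EOS token IDs (hypothetical helper for illustration)."""
    eos = getattr(config, "eos_token_id", None)
    if eos is None:
        eos_ids = []
    elif isinstance(eos, (list, tuple)):
        # config.eos_token_id is already a list of IDs.
        eos_ids = [int(token_id) for token_id in eos]
    else:
        # config.eos_token_id is a single int.
        eos_ids = [int(eos)]

    # Gemma detection: model_type contains "gemma".
    model_type = getattr(config, "model_type", "") or ""
    if "gemma" in model_type.lower() and END_OF_TURN_ID not in eos_ids:
        eos_ids.append(END_OF_TURN_ID)
    return eos_ids


def eos_metadata(config):
    """Build both metadata entries: the full list and the first element."""
    eos_ids = build_eos_ids(config)
    metadata = {}
    if eos_ids:
        metadata["get_eos_ids"] = eos_ids    # full list for the C++ runner
        metadata["get_eos_id"] = eos_ids[0]  # first ID for backward compatibility
    return metadata


# Usage: a stand-in config resembling a Gemma text model.
gemma_like = SimpleNamespace(eos_token_id=1, model_type="gemma3_text")
print(eos_metadata(gemma_like))
```

Checking `END_OF_TURN_ID not in eos_ids` before appending avoids a duplicate entry when a config already lists 106.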

Compatibility

  • Backward compatible: Python modeling.py already checks for both get_eos_id and get_eos_ids
  • C++ runner ready: The C++ runner's get_eos_ids() function already supports reading a list of EOS tokens via the kEosIds method
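
To illustrate why exporting both entries stays backward compatible, here is a sketch of how a consumer might prefer the list and fall back to the single ID. It is an assumption-laden example, not the actual logic in modeling.py or the C++ runner: `resolve_eos_ids` and `truncate_at_eos` are hypothetical names, and the metadata is modeled as a plain dict.

```python
def resolve_eos_ids(metadata):
    """Prefer get_eos_ids (list); fall back to get_eos_id (single int)."""
    if "get_eos_ids" in metadata:
        return set(metadata["get_eos_ids"])
    if "get_eos_id" in metadata:
        return {metadata["get_eos_id"]}
    return set()


def truncate_at_eos(token_ids, eos_ids):
    """Keep tokens up to, but not including, the first EOS token."""
    kept = []
    for token_id in token_ids:
        if token_id in eos_ids:
            break
        kept.append(token_id)
    return kept


# Old-style metadata stops only at <eos> (1); new-style also stops at 106.
old_style = resolve_eos_ids({"get_eos_id": 1})
new_style = resolve_eos_ids({"get_eos_ids": [1, 106], "get_eos_id": 1})
generated = [5, 9, 106, 7, 1, 3]
print(truncate_at_eos(generated, old_style))
print(truncate_at_eos(generated, new_style))
```

With the single-ID metadata, the sample sequence runs through token 106 and stops only at 1. With the list, it stops at 106, which is the behavior this change targets.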

Test Plan

  • Export Gemma-3-1b-it with --task "text-generation" and verify metadata contains get_eos_ids: [1, 106]
  • Run the C++ text runner and verify generation stops at <end_of_turn>
  • Run existing tests: pytest tests/models/test_modeling_gemma3.py -v

…tion

Add support for exporting multiple EOS token IDs (`get_eos_ids`) in model
metadata to enable proper generation stopping for Gemma models.

- Handle cases where config.eos_token_id is already a list vs single int
- Detect Gemma models via config.model_type and add <end_of_turn> token (106)
- Export get_eos_ids (list) for C++ runner compatibility
- Maintain get_eos_id (first ID) for backward compatibility