
Add OLMo2/3 models support in fairseq2#1410

Open
YunchaoYang wants to merge 25 commits into main from yy/add-olmo2-model

Conversation


@YunchaoYang commented Nov 5, 2025

What does this PR do? Please describe:

  • Add OLMo2 model architecture (1B, 7B, 13B) support in fairseq2

  • The key architecture changes include:
    OLMO2 is similar to the LLaMA architecture with the following differences:

    • Olmo2RMSNorm: In OLMO2 the order of operations for RMSNorm is normalize -> multiply by weight -> cast to original dtype.
    • OLMO2TransformerLMDecoderLayer: OLMO2 uses Post-Norm in the decoder layer: Attention/FFN -> Norm -> Add Residual, which differs from the existing Pre-Norm and Post-Norm implementations.
    • OLMO2MultiheadAttention:
      • OLMO2 adds Q/K Norm in attention layers; the Q/K path differs slightly in the order of normalization and reshape: Project → Normalize → Reshape → RoPE.
      • OLMO2 uses MHA instead of GQA; only the OLMO2-32B model uses GQA.
    • OLMO2RotaryEmbedding: The RoPE module reuses the existing ReferenceRotaryEncoder module.
  • An integration test is added to ensure the output is consistent with HF Transformers. The integration test has passed for the 1B model.
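Two of the differences above can be sketched in PyTorch. This is a minimal illustration, not the PR's actual code; `OLMo2StyleRMSNorm` and `post_norm_block` are hypothetical names chosen for this sketch.

```python
import torch
from torch import nn


class OLMo2StyleRMSNorm(nn.Module):
    """RMSNorm variant where the learned weight is applied in float32
    *before* casting back to the input dtype (LLaMA-style RMSNorm casts
    back first and then multiplies by the weight)."""

    def __init__(self, dim: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        input_dtype = x.dtype
        x = x.float()
        # normalize -> multiply by weight -> cast to original dtype
        x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return (self.weight * x).to(input_dtype)


def post_norm_block(x, sublayer, norm):
    """Post-Norm residual order described above:
    sublayer -> norm -> add residual."""
    return x + norm(sublayer(x))
```

With a half-precision input, the normalization and weight multiply run in float32 and only the final result is cast back, which is the dtype behavior the bullet above describes.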

Note:
OLMO2MultiheadAttention inherits from StandardMultiheadAttention (marked @final)
because the only difference is the order of normalization in _project_q() and _project_kv().
Reimplementing the entire class would duplicate ~150 lines of boilerplate code. For now, the type-checker warning about subclassing a @final class is suppressed.
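The ordering difference the override captures can be shown in a standalone sketch (class and helper names here are illustrative, not fairseq2's actual `_project_q()` signature): the norm is applied to the flat projection output before it is reshaped into heads, and RoPE would only be applied after that.

```python
import torch
from torch import nn


def rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Unweighted RMS normalization over the last dimension.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)


class QNormProjection(nn.Module):
    """Hypothetical query path illustrating the OLMO2 ordering:
    Project -> Normalize -> Reshape (-> RoPE). The norm sees the flat
    (model_dim) projection output, not the per-head view."""

    def __init__(self, model_dim: int, num_heads: int) -> None:
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = model_dim // num_heads
        self.q_proj = nn.Linear(model_dim, model_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q_proj(x)        # 1. Project
        q = rms_norm(q)           # 2. Normalize over the full model_dim
        b, s, _ = q.shape
        q = q.view(b, s, self.num_heads, self.head_dim)  # 3. Reshape
        return q                  # 4. RoPE would be applied here
```

Because only this ordering changes, overriding the projection hooks keeps the rest of the attention implementation shared with the base class.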

Fixes #1402

Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.

Check list:

  • Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
  • Did you read the contributor guideline?
  • Did you make sure that your PR does only one thing instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 5, 2025
@YunchaoYang YunchaoYang changed the title Add OLMo2 models (1B, 7B, 13B) support to fairseq2 Add OLMo2/3 models support in fairseq2 Jan 24, 2026
yunchaoyang1 and others added 24 commits on February 23, 2026
- Rename olmo2.yaml to olmo.yaml and update model_family to 'olmo'
- Fix imports in composition/models.py and composition/tokenizers.py
- Fix hub.py import (fairseq2.hub.model -> fairseq2.models.hub)
- Add tokenizer exports and OLMO3 aliases to __init__.py
- Fix OLMORMSNorm with proper __init__ method in normalization.py
- Implement per-layer RoPE support in factory.py for OLMO3 sliding window
- Fix yarn_rope.py init order (params before super().__init__)
- Fix yarn_rope.py import (unsqueeze from fairseq2.ops)
- Update OLMO3 configs: vocab_size=100278, num_layers=64 for 32B
- Update test_olmo2.py to use local model paths
- Add test_olmo2_all.py and test_olmo3_all.py for comprehensive testing

Fix all E501 line-length violations (>88 chars) in the OLMO module:
- Reformat long docstrings and comments across all files
- Split long dictionary mappings in interop.py for better readability
- Wrap long error messages and function arguments
- All tests still passing after formatting changes
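The per-layer RoPE support mentioned in the commits above implies that each decoder layer picks between full attention and sliding-window attention. The helper below is purely illustrative: the every-fourth-layer-is-full-attention pattern and the function name are assumptions for the sketch, not the PR's actual configuration.

```python
def layer_uses_sliding_window(layer_idx: int, pattern_period: int = 4) -> bool:
    """Illustrative per-layer attention selector (assumed pattern):
    every `pattern_period`-th layer uses full attention, the rest use
    sliding-window attention with its own RoPE configuration."""
    return (layer_idx + 1) % pattern_period != 0
```

A factory could consult such a predicate when building each decoder layer to decide which rotary encoder and attention window to instantiate.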
