
Add OLMo2/3 models support in fairseq2#1410

Open
YunchaoYang wants to merge 25 commits into main from yy/add-olmo2-model

Conversation


@YunchaoYang commented Nov 5, 2025

What does this PR do? Please describe:

  • Add OLMo2 model architecture (1B, 7B, 13B) support in fairseq2

  • The key architecture changes include:
    OLMO2 is similar to the LLaMA architecture with the following differences:

    • Olmo2RMSNorm: In OLMO2 the order of operations for RMSNorm is normalize -> multiply by weight -> cast to original dtype.
    • OLMO2TransformerLMDecoderLayer: OLMO2 uses Post-Norm in the decoder layer: Attention/FFN -> Norm -> Add Residual, which differs from the existing Pre-Norm and Post-Norm implementations.
    • OLMO2MultiheadAttention:
      • OLMO2 adds Q/K Norm in attention layers; the Q/K path differs slightly in the order of normalization and reshape: Project → Normalize → Reshape → RoPE.
      • OLMO2 uses MHA instead of GQA; only the OLMO2-32B model uses GQA.
    • OLMO2RotaryEmbedding: The RoPE module reuses the existing ReferenceRotaryEncoder module.
  • An integration test is added to ensure the output is consistent with HF Transformers. The integration test has passed for the 1B model.
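Two of the differences above can be sketched in PyTorch. This is a minimal illustration, not the PR's actual code; `OLMo2StyleRMSNorm` and `post_norm_block` are hypothetical names chosen for this sketch.

```python
import torch
from torch import nn


class OLMo2StyleRMSNorm(nn.Module):
    """RMSNorm variant where the learned weight is applied in float32
    *before* casting back to the input dtype (LLaMA-style RMSNorm casts
    back first and then multiplies by the weight)."""

    def __init__(self, dim: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        input_dtype = x.dtype
        x = x.float()
        # normalize -> multiply by weight -> cast to original dtype
        x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return (self.weight * x).to(input_dtype)


def post_norm_block(x, sublayer, norm):
    """Post-Norm residual order described above:
    sublayer -> norm -> add residual."""
    return x + norm(sublayer(x))
```

With a half-precision input, the normalization and weight multiply run in float32 and only the final result is cast back, which is the dtype behavior the bullet above describes.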

Note:
OLMO2MultiheadAttention inherits from StandardMultiheadAttention (marked @final)
because the only difference is the order of normalization in _project_q() and _project_kv().
Reimplementing the entire class would duplicate ~150 lines of boilerplate code. For now, the type-checker warning about subclassing a @final class is suppressed.
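The ordering difference the override captures can be shown in a standalone sketch (class and helper names here are illustrative, not fairseq2's actual `_project_q()` signature): the norm is applied to the flat projection output before it is reshaped into heads, and RoPE would only be applied after that.

```python
import torch
from torch import nn


def rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Unweighted RMS normalization over the last dimension.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)


class QNormProjection(nn.Module):
    """Hypothetical query path illustrating the OLMO2 ordering:
    Project -> Normalize -> Reshape (-> RoPE). The norm sees the flat
    (model_dim) projection output, not the per-head view."""

    def __init__(self, model_dim: int, num_heads: int) -> None:
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = model_dim // num_heads
        self.q_proj = nn.Linear(model_dim, model_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q_proj(x)        # 1. Project
        q = rms_norm(q)           # 2. Normalize over the full model_dim
        b, s, _ = q.shape
        q = q.view(b, s, self.num_heads, self.head_dim)  # 3. Reshape
        return q                  # 4. RoPE would be applied here
```

Because only this ordering changes, overriding the projection hooks keeps the rest of the attention implementation shared with the base class.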

Fixes #1402

Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.

Check list:

  • Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
  • Did you read the contributor guideline?
  • Did you make sure that your PR does only one thing instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 5, 2025
@YunchaoYang YunchaoYang changed the title Add OLMo2 models (1B, 7B, 13B) support to fairseq2 Add OLMo2/3 models support in fairseq2 Jan 24, 2026
yunchaoyang1 and others added 24 commits on February 23, 2026
- Rename olmo2.yaml to olmo.yaml and update model_family to 'olmo'
- Fix imports in composition/models.py and composition/tokenizers.py
- Fix hub.py import (fairseq2.hub.model -> fairseq2.models.hub)
- Add tokenizer exports and OLMO3 aliases to __init__.py
- Fix OLMORMSNorm with proper __init__ method in normalization.py
- Implement per-layer RoPE support in factory.py for OLMO3 sliding window
- Fix yarn_rope.py init order (params before super().__init__)
- Fix yarn_rope.py import (unsqueeze from fairseq2.ops)
- Update OLMO3 configs: vocab_size=100278, num_layers=64 for 32B
- Update test_olmo2.py to use local model paths
- Add test_olmo2_all.py and test_olmo3_all.py for comprehensive testing

Fix all E501 line-length violations (>88 chars) in the OLMO module:
- Reformat long docstrings and comments across all files
- Split long dictionary mappings in interop.py for better readability
- Wrap long error messages and function arguments
- All tests still passing after formatting changes
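The per-layer RoPE support mentioned in the commits above implies that each decoder layer picks between full attention and sliding-window attention. The helper below is purely illustrative: the every-fourth-layer-is-full-attention pattern and the function name are assumptions for the sketch, not the PR's actual configuration.

```python
def layer_uses_sliding_window(layer_idx: int, pattern_period: int = 4) -> bool:
    """Illustrative per-layer attention selector (assumed pattern):
    every `pattern_period`-th layer uses full attention, the rest use
    sliding-window attention with its own RoPE configuration."""
    return (layer_idx + 1) % pattern_period != 0
```

A factory could consult such a predicate when building each decoder layer to decide which rotary encoder and attention window to instantiate.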
