Add Modernbert sequence packing #57

galopyz · 2025-10-13T23:19:45Z

This PR adds sequence unpadding and repadding, using flash attention 2 varlen for attention and the flash attention rotary embedding (RoPE). Inside the model, attention_mask is created and sequences are unpadded before being passed to the decoder layers. After all the decoder layers and the final norm, the unpadded sequences are padded again before being passed to lm_head. We repad for the loss calculation. In the future, we can drop the repadding and compute the loss ourselves, since _unpad_modernbert_input also returns unpadded labels.
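To make the unpad/repad round trip concrete, here is a minimal pure-Python sketch. It is not the code in this PR: `unpad` and `repad` are hypothetical helper names, and the real `_unpad_modernbert_input` works on tensors rather than lists. The sketch only shows the bookkeeping: the flat indices of the real tokens, the cumulative sequence lengths, and the longest sequence length, which is the kind of metadata variable-length attention kernels typically take.

```python
def unpad(input_ids, attention_mask):
    """Flatten a padded batch into one token stream plus varlen metadata.

    Hypothetical helper for illustration only.
    """
    batch_size = len(input_ids)
    seq_len = len(input_ids[0])
    tokens = []        # all real (non-padding) tokens, concatenated
    indices = []       # flat position (row * seq_len + col) of each real token
    cu_seqlens = [0]   # cumulative sequence lengths: one boundary per sequence
    max_seqlen = 0     # length of the longest sequence in the batch
    for row in range(batch_size):
        length = 0
        for col in range(seq_len):
            if attention_mask[row][col]:
                tokens.append(input_ids[row][col])
                indices.append(row * seq_len + col)
                length += 1
        cu_seqlens.append(cu_seqlens[-1] + length)
        max_seqlen = max(max_seqlen, length)
    return tokens, indices, cu_seqlens, max_seqlen


def repad(values, indices, batch_size, seq_len, pad_value=0):
    """Scatter unpadded values back into a padded [batch_size, seq_len] layout."""
    flat = [pad_value] * (batch_size * seq_len)
    for value, index in zip(values, indices):
        flat[index] = value
    return [flat[r * seq_len:(r + 1) * seq_len] for r in range(batch_size)]


# Two sequences of length 3 and 2, right-padded to length 4 with token id 0.
input_ids = [[11, 12, 13, 0], [21, 22, 0, 0]]
attention_mask = [[1, 1, 1, 0], [1, 1, 0, 0]]

tokens, indices, cu_seqlens, max_seqlen = unpad(input_ids, attention_mask)
restored = repad(tokens, indices, batch_size=2, seq_len=4)
```

In this toy batch, 5 of the 8 positions hold real tokens, so the decoder layers would process 5 tokens instead of 8; the saved `indices` are what allow the output to be repadded for the loss.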

Results for sdpa (green) vs. FA2 (red) are on wandb: https://wandb.ai/local-research-group/smollm2-135m-training-avelina/workspace?nw=nwusergalopyz. The grey run uses FA2 with a batch size of 8 instead of 4. Because there are no padding tokens, I could fit a batch size of 8 on an RTX 3090 (24 GB VRAM).

This PR also adds ModernBERT sequence packing. With sequence packing, we pack sequences together into a packed batch of size [int(batch_size / micro_batch_size), int(micro_batch_size * max_seq_len)]. This ensures we feed a fixed number of tokens in every microbatch, unlike the unpadding-only approach, where microbatches can contain a varying number of tokens depending on the lengths of the sequences.
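The shape arithmetic and the idea of filling fixed-size token budgets can be sketched as follows. This is an illustration under stated assumptions, not the packing algorithm in this PR: `pack_first_fit` is a hypothetical name, it uses a plain first-fit-decreasing heuristic in pure Python (the PR's implementation relies on numba), and the batch sizes and sequence lengths are made-up values.

```python
def pack_first_fit(seq_lengths, capacity):
    """Group sequence indices into packs holding at most `capacity` tokens.

    Hypothetical first-fit-decreasing heuristic, for illustration only.
    """
    packs = []      # packs[p] is the list of sequence indices placed in pack p
    remaining = []  # remaining[p] is the number of free token slots in pack p
    # Place long sequences first; they are the hardest to fit.
    order = sorted(range(len(seq_lengths)),
                   key=lambda i: seq_lengths[i], reverse=True)
    for i in order:
        n = seq_lengths[i]
        if n > capacity:
            raise ValueError(f"sequence {i} has {n} tokens, capacity is {capacity}")
        for p, free in enumerate(remaining):
            if n <= free:
                packs[p].append(i)
                remaining[p] -= n
                break
        else:
            # No existing pack has room, so open a new one.
            packs.append([i])
            remaining.append(capacity - n)
    return packs, remaining


# Assumed example values, not taken from the PR's configs.
batch_size, micro_batch_size, max_seq_len = 8, 4, 16

# Shape from the description: [batch_size / micro_batch_size, micro_batch_size * max_seq_len]
packed_shape = (batch_size // micro_batch_size, micro_batch_size * max_seq_len)  # (2, 64)

seq_lengths = [16, 9, 12, 3, 7, 16, 5, 10]
packs, remaining = pack_first_fit(seq_lengths, capacity=packed_shape[1])
```

Each pack corresponds to one row of the packed batch, so every microbatch sees the same token budget (`micro_batch_size * max_seq_len`), and `remaining` shows how many slots in each row would still need padding.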

There are also some minor changes:

Used variables for micro batch size and batch size in the yaml configs.
Added numba, which sequence packing requires.
Disabled aim logger.
Added full fine tuning option to custom_model_training.py.

vishalbakshi

@galopyz and I have both run multiple successful training runs (locally and on Modal) using this branch (modernbert_sequence_packing).

galopyz added 20 commits August 23, 2025 18:08

Add FA rotary and unpadding

4a2f96f

Base model works with sdpa/FA2 without peft

7177fcc

Fix linter error

1bf9d89

Remove extra rotary_emb argument and turn off varlen deterministic

2d3f09b

Fix wandb prompt logging problem.

2d925b4

Add step_sz to text generation callback

cfed198

Update yaml

7fb47d0

Move callbacks to llmfoundry/callbacks

b74822e

Fix lint errors

c8a4ac4

Sequence packing works with token levels

77ed9e6

Make batch_inspection callback work with evaluations.

395f454

Add PackingEfficiency callback

bb2bbca

Add license

616cae0

Update yaml scripts with sequence packing

91575f8

Add variable

8e640d3

Add numba

578a523

Disable aim logger

cae0525

Add variable

a4da87f

Add Full finetuning support

ac3573c

Fix lint error

b79882d

vishalbakshi approved these changes Oct 14, 2025

vishalbakshi merged commit 8d45a5c into main Oct 14, 2025
1 check passed

vishalbakshi deleted the modernbert_sequence_packing branch October 14, 2025 02:08
