Unpadding sequences #56

galopyz · 2025-08-25T22:40:39Z

This PR adds sequence unpadding and repadding, using the Flash Attention 2 varlen kernel for attention and the Flash Attention rotary embedding (RoPE). Inside the model, the attention_mask is created and the sequences are unpadded before they are passed to the decoder layers. After the last decoder layer and the final norm, the unpadded sequences are padded again before they are passed to lm_head. The repadding is there for the loss calculation. In the future, we can drop the repadding and compute the loss ourselves, since _unpad_modernbert_input also returns unpadded labels.
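The unpad/repad round trip described above can be sketched without any framework. This is a minimal pure-Python illustration, not the code from this PR: `unpad` and `repad` are hypothetical names, and the real implementation works on tensors. It shows the bookkeeping that varlen attention kernels typically consume: the flat indices of the real tokens, the cumulative sequence lengths (`cu_seqlens`), and the longest sequence length.

```python
from itertools import accumulate


def unpad(input_ids, attention_mask):
    """Flatten a padded batch, keeping only positions where the mask is 1."""
    batch, seqlen = len(input_ids), len(input_ids[0])
    # Flat position (row * seqlen + col) of every real (non-padding) token.
    indices = [
        b * seqlen + t
        for b in range(batch)
        for t in range(seqlen)
        if attention_mask[b][t]
    ]
    flat = [tok for row in input_ids for tok in row]
    tokens = [flat[i] for i in indices]
    # Number of real tokens per sequence, then their running total.
    seqlens = [sum(row) for row in attention_mask]
    cu_seqlens = [0] + list(accumulate(seqlens))
    return tokens, indices, cu_seqlens, max(seqlens)


def repad(values, indices, batch, seqlen, pad_value=0):
    """Scatter unpadded values back into a (batch, seqlen) padded layout."""
    flat = [pad_value] * (batch * seqlen)
    for i, v in zip(indices, values):
        flat[i] = v
    return [flat[b * seqlen:(b + 1) * seqlen] for b in range(batch)]


# Two sequences of length 3 and 2, padded to length 4 with 0.
input_ids = [[5, 6, 7, 0], [8, 9, 0, 0]]
attention_mask = [[1, 1, 1, 0], [1, 1, 0, 0]]

tokens, indices, cu_seqlens, max_seqlen = unpad(input_ids, attention_mask)
restored = repad(tokens, indices, batch=2, seqlen=4)
```

In the unpadded form, sequence `i` occupies `tokens[cu_seqlens[i]:cu_seqlens[i + 1]]`, which is how a varlen kernel can keep attention from crossing sequence boundaries without a padded mask.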

Results for sdpa (green) vs. FA2 (red) on wandb: https://wandb.ai/local-research-group/smollm2-135m-training-avelina/workspace?nw=nwusergalopyz. The grey run uses FA2 with a batch size of 8 instead of 4. Because there are no padding tokens, I could fit a batch size of 8 on an RTX 3090 (24 GB of VRAM).
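To give a rough sense of why removing padding frees memory, the sketch below counts how many token positions a padded batch allocates versus how many are real. The sequence lengths are made up for illustration and are not taken from the runs above; `padding_stats` is a hypothetical helper, and it assumes each batch is padded to its longest sequence.

```python
def padding_stats(seq_lengths):
    """Return (padded_positions, real_tokens, wasted_fraction) for one batch."""
    # A padded batch allocates max-length slots for every sequence.
    padded = len(seq_lengths) * max(seq_lengths)
    real = sum(seq_lengths)
    return padded, real, 1 - real / padded


# Hypothetical batch of 4 sequences with uneven lengths.
lengths = [512, 130, 75, 301]
padded, real, wasted = padding_stats(lengths)
print(f"padded positions: {padded}, real tokens: {real}, wasted: {wasted:.1%}")
```

With lengths this uneven, about half of the padded positions carry no information, so an unpadded batch of the same sequences needs far fewer activations in the decoder layers.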

galopyz · 2025-08-26T00:50:01Z

To switch between Flash Attention 2 and sdpa, change

_attn_implementation = "flash_attention_2",

inside LlamaConfig to either "flash_attention_2" or "sdpa".
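As a small illustration of that switch, the sketch below uses a stand-in config class rather than the real LlamaConfig. `SketchConfig` and `SUPPORTED_ATTN` are hypothetical names; only the `_attn_implementation` field and its two values come from the comment above.

```python
from dataclasses import dataclass

# The two values mentioned in this PR.
SUPPORTED_ATTN = ("flash_attention_2", "sdpa")


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the relevant field of LlamaConfig."""

    _attn_implementation: str = "flash_attention_2"

    def __post_init__(self):
        # Fail early on a typo instead of silently picking a code path.
        if self._attn_implementation not in SUPPORTED_ATTN:
            raise ValueError(
                f"_attn_implementation must be one of {SUPPORTED_ATTN}, "
                f"got {self._attn_implementation!r}"
            )


fa2_cfg = SketchConfig()  # default: flash_attention_2
sdpa_cfg = SketchConfig(_attn_implementation="sdpa")
```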

galopyz added 2 commits August 23, 2025 18:08

Add FA rotary and unpadding

4a2f96f

Base model works with sdpa/FA2 without peft

7177fcc

galopyz marked this pull request as draft August 25, 2025 22:41

galopyz added 2 commits August 25, 2025 18:10

Fix linter error

1bf9d89

Remove extra rotary_emb argument and turn off varlen deterministic

2d3f09b

galopyz marked this pull request as ready for review August 26, 2025 00:21

galopyz changed the title ~~Sequence packing~~ Unpadding Aug 26, 2025

galopyz changed the title ~~Unpadding~~ Unpadding sequences Aug 26, 2025

galopyz added 5 commits October 2, 2025 17:17

Fix wandb prompt logging problem.

2d925b4

Add step_sz to text generation callback

cfed198

Update yaml

7fb47d0

Move callbacks to llmfoundry/callbacks

b74822e

Fix lint errors

c8a4ac4

galopyz closed this Oct 13, 2025
