Add VRAM efficient Gumbel sampling by sean-lamont · Pull Request #124 · ML-GSAI/LLaDA

sean-lamont · 2026-02-06T05:09:24Z

The original generate.py casts the entire (b, s, v) logit tensor to 64-bit floating point before adding Gumbel noise. This PR adds a version that does this sequentially over the batch, saving up to a factor of batch_size in VRAM with minimal time overhead. The new script is generate_optimized.py; test_efficient_gumbel.py tests the approach for validity and measures the VRAM savings and the time overhead. The output of the test file is included in gumbel_test_output.txt. The PR also identifies and fixes a possible bug when setting the eot/eos confidences to -inf.
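The idea of trading one large float64 temporary for a loop over the batch can be sketched as follows. This is a minimal illustration, not the code in generate_optimized.py: the function names `gumbel_argmax_full` and `gumbel_argmax_sequential` are hypothetical, and the noise uses the textbook additive Gumbel-max form, which may differ from the formulation in the repository.

```python
import torch


def gumbel_argmax_full(logits, temperature, generator=None):
    """Baseline sketch: upcast the whole (b, s, v) tensor to float64 at once."""
    if temperature == 0:
        return logits.argmax(dim=-1)  # greedy decoding, no noise needed
    x = logits.to(torch.float64)  # float64 copy of every logit
    # The generator, if given, must live on the same device as the logits.
    u = torch.rand(x.shape, dtype=torch.float64, device=x.device, generator=generator)
    g = -torch.log(-torch.log(u))  # standard Gumbel(0, 1) noise
    return (x / temperature + g).argmax(dim=-1)


def gumbel_argmax_sequential(logits, temperature, generator=None):
    """Same sampling, one batch element at a time.

    Only a (1, s, v) float64 temporary is alive per iteration, so the
    float64 working set shrinks by roughly a factor of the batch size.
    """
    if temperature == 0:
        return logits.argmax(dim=-1)
    out = torch.empty(logits.shape[:-1], dtype=torch.long, device=logits.device)
    for i in range(logits.shape[0]):
        # logits[i:i + 1] is a view, so no extra copy of the input is made.
        out[i] = gumbel_argmax_full(logits[i:i + 1], temperature, generator)[0]
    return out


if __name__ == "__main__":
    logits = torch.randn(4, 8, 32)
    tokens = gumbel_argmax_sequential(logits, temperature=1.0)
    print(tokens.shape)
```

Because each batch element draws its own independent noise, the sequential version samples from the same distribution as the full-batch one, although the two will generally not return identical tokens for the same seed.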


sean-lamont added 4 commits February 6, 2026 15:10

Add optimized gumbel sampling, test script and output test confirming…

1c79a20

… correctness and VRAM savings

Add timing comparison in tests.

6063e08

Add direct logit comparison in CPU testing.

e9476d0

Add direct logit comparison in CPU testing.

43b2e92

sean-lamont wants to merge 4 commits into ML-GSAI:main from sean-lamont:main
