
Add VRAM efficient Gumbel sampling#124

Open
sean-lamont wants to merge 4 commits into ML-GSAI:main from sean-lamont:main

@sean-lamont sean-lamont commented Feb 6, 2026

The original generate.py casts the entire (b, s, v) logit tensor to 64-bit precision before adding Gumbel noise. This PR adds a version that processes the batch sequentially, one element at a time, which reduces the VRAM needed for this step by up to a factor of batch_size with minimal time overhead. The new script is generate_optimized.py. test_efficient_gumbel.py checks the approach for validity and measures the VRAM savings and time overhead; its output is included in gumbel_test_output.txt. The PR also identifies and fixes a possible bug when setting the eot/eos confidences to -inf.
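For readers who want the idea without opening the diff, here is a minimal sketch of per-batch-element sampling. It is not the code in generate_optimized.py: the function name `gumbel_argmax_sequential` is hypothetical, and the noise formula (`exp(logits) / (-log u) ** temperature`) is an assumption about how the noise is applied. The point is only that the float64 copy exists for one (s, v) slice at a time instead of the full (b, s, v) tensor.

```python
import torch


def gumbel_argmax_sequential(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """Sample token ids from (b, s, v) logits, one batch element at a time.

    Hypothetical helper for illustration. Only a single (s, v) slice is
    promoted to float64 at any moment, so peak extra memory for the noisy
    logits is roughly 1/b of the all-at-once approach.
    """
    if temperature == 0:
        # No noise requested: plain greedy argmax, no float64 copy needed.
        return logits.argmax(dim=-1)

    b, s, _ = logits.shape
    out = torch.empty((b, s), dtype=torch.long, device=logits.device)
    for i in range(b):
        # Promote just this slice to float64 (assumed noise formulation).
        row64 = logits[i].to(torch.float64)
        noise = torch.rand_like(row64)
        gumbel = (-torch.log(noise)) ** temperature
        # Keep only the (s,) argmax indices; the float64 temporaries
        # are released before the next iteration.
        out[i] = (row64.exp() / gumbel).argmax(dim=-1)
    return out
```

The loop returns only the sampled indices, so nothing of shape (b, s, v) in float64 is ever materialized. If the caller also needs the noisy logits themselves, the saving would not apply in this form.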
