Conversation

@larryliu0820 (Collaborator)

For the CUDA backend we want to keep most, if not all, computation on device. Assuming most ASR applications use greedy argmax sampling, we export and lower the sampling step as a method of the ExecuTorch model. This way the runner can choose to run it.

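As a minimal sketch of the idea (module name, shapes, and vocab size are illustrative, not the actual optimum-executorch code): the greedy sampler is just an argmax over logits wrapped in a module and traced with torch.export, so it can be lowered as an extra method of the same ExecuTorch program.

```python
import torch


class GreedySampler(torch.nn.Module):
    """Greedy (argmax) sampling over decoder logits, kept on device."""

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # logits: (batch, seq, vocab) -> next-token ids: (batch, seq)
        return torch.argmax(logits, dim=-1)


# Example inputs; the vocab size here is illustrative.
example_logits = (torch.randn(1, 1, 51865),)
sampler_ep = torch.export.export(GreedySampler(), example_logits)
# sampler_ep can then be lowered alongside the "encoder" and
# "text_decoder" programs into a single multi-method ExecuTorch model.
```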
@larryliu0820 larryliu0820 merged commit f8aa919 into main Jan 27, 2026
69 of 83 checks passed
@larryliu0820 larryliu0820 deleted the export_sampler branch January 27, 2026 00:25
larryliu0820 added a commit to pytorch/executorch that referenced this pull request Jan 27, 2026
With this PR (huggingface/optimum-executorch#207) we add a new method, "sampler", to ASR models, alongside "encoder" and "text_decoder". The flow becomes: if the temperature is 0 and the sampler method is available, run that method; otherwise, fall back to the old path. This should significantly improve performance on CUDA, since logits no longer have to be copied from device to CPU for sampling.

Benchmark result on RTX 5080:

```

======================================================================
BENCHMARK SUMMARY
======================================================================
Total runs: 30
Generated tokens per run: 104

THROUGHPUT (tokens/sec):
  Min:    793.89 t/s
  Max:    845.53 t/s
  Mean:   820.35 t/s
  Stdev:  11.86 t/s

MODEL LOAD TIME (ms):
  Min:    620 ms
  Max:    2170 ms
  Mean:   700 ms
  Stdev:  279 ms

ENCODE TIME (ms, inference_start to prompt_eval_end):
  Min:    36 ms
  Max:    38 ms
  Mean:   37 ms
  Stdev:  1 ms

DECODE TIME (ms, prompt_eval_end to inference_end):
  Min:    123 ms
  Max:    131 ms
  Mean:   127 ms
  Stdev:  2 ms

======================================================================
```
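For context, a rough sketch of the runner-side dispatch described above (the real runner is C++; the module handle, `run_method` call, and `has_sampler_method` flag are assumptions used here purely for illustration):

```python
import torch


def select_next_token(model, logits, temperature, has_sampler_method):
    """Pick the next token, preferring the on-device "sampler" method."""
    if temperature == 0.0 and has_sampler_method:
        # New path: argmax runs on device; only one token id crosses to host.
        token = model.run_method("sampler", (logits,))[0]
        return int(token.item())
    # Old path: copy the full logits tensor to CPU and sample there.
    probs = torch.softmax(logits.float().cpu() / max(temperature, 1e-6), dim=-1)
    return int(torch.multinomial(probs[0, -1], num_samples=1).item())
```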