Add Apple Silicon (MPS) inference support [claude] #358

Open
mparrett wants to merge 4 commits into yl4579:main from mparrett:mps-support

Conversation


@mparrett mparrett commented Feb 26, 2026

Summary

  • Plumb use_fp16 through Decoder → Generator → SourceModuleHnNSF to support fp16 inference on MPS (fixes dtype mismatch when decoder runs in half precision)
  • Add Demo/inference_mps.py — clean MPS inference script with -p flag for one-shot benchmarking
  • Fix `ref_texts` → `texts` variable name bug in LibriTTS demo notebook
  • Add pyproject.toml + lockfile for reproducible uv sync setup
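The dtype fix in the first bullet can be sketched roughly like this. This is a simplified stand-in for the real `SourceModuleHnNSF` (the constructor signature and `tanh` merge here are illustrative, not the exact StyleTTS 2 code):

```python
import torch

class SourceModuleHnNSF(torch.nn.Module):
    """Simplified stand-in for the harmonic-plus-noise source module.

    The real module generates fp32 sine waves and merges them with a
    linear layer; when the decoder runs in half precision the layer's
    weights are fp16, so the sines must be cast before the matmul.
    """

    def __init__(self, harmonic_num=8, use_fp16=False):
        super().__init__()
        self.dtype = torch.float16 if use_fp16 else torch.float32
        self.l_linear = torch.nn.Linear(harmonic_num + 1, 1).to(self.dtype)

    def forward(self, sine_wavs):
        # the fix: cast the fp32 sine waves to the configured dtype
        # so the linear layer never sees mismatched operand dtypes
        return torch.tanh(self.l_linear(sine_wavs.to(self.dtype)))
```

With `use_fp16=False` (the default) the cast is a no-op, which is why the CUDA path is unchanged.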

MPS notes

TextEncoder must stay on CPU because MPS doesn't support pack_padded_sequence; all other modules run on MPS. After text encoding on CPU, tensors are transferred to the MPS device.

Scope is inference only: the training path has a hardcoded .to('cuda') in the Decoder forward method, which is a separate fix.
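The CPU/MPS split described above boils down to this pattern (the GRU and linear layers are hypothetical stand-ins for TextEncoder and the downstream modules; the device selection falls back to CPU when MPS is unavailable):

```python
import torch

# TextEncoder stays on CPU because MPS has no pack_padded_sequence kernel;
# everything downstream runs on the MPS device when available.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

text_encoder = torch.nn.GRU(64, 64, batch_first=True)  # stand-in, kept on CPU
decoder = torch.nn.Linear(64, 80).to(device)           # stand-in for the MPS-side modules

tokens = torch.randn(1, 32, 64)     # inputs start on CPU
encoded, _ = text_encoder(tokens)   # text encoding runs on CPU
mel = decoder(encoded.to(device))   # transfer once, then stay on-device
```

The single `.to(device)` after encoding keeps the CPU↔GPU traffic to one transfer per utterance.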

Usage

```shell
uv sync
USE_MPS=1 uv run python Demo/inference_mps.py -p "Hello world"

# With fp16 decoder (faster on some hardware)
USE_MPS=1 USE_FP16=1 uv run python Demo/inference_mps.py -p "Hello world"
```

Benchmarks (Apple MacBook Air M2, ~60 word passage → 17.7s audio)

| Config     | Inference | RTF  | vs CPU      |
| ---------- | --------- | ---- | ----------- |
| CPU        | 6.1s      | 0.36 | baseline    |
| MPS        | 4.4s      | 0.25 | 1.4x faster |
| MPS + FP16 | 2.9s      | 0.16 | 2.1x faster |

RTF = real-time factor (lower is better). MPS + FP16 sustains roughly 6x real-time synthesis (17.7s of audio in 2.9s).
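For reference, the RTF column is just synthesis wall time divided by generated audio duration (the helper function name here is illustrative):

```python
def rtf(inference_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: wall-clock synthesis time / audio duration.

    Values below 1.0 mean faster than real time; lower is better.
    """
    return inference_seconds / audio_seconds

# figures from the table above (17.7 s of generated audio)
print(round(rtf(2.9, 17.7), 2))  # MPS + FP16 -> 0.16
print(round(rtf(4.4, 17.7), 2))  # MPS        -> 0.25
```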

If the supply of fruit is greater than the family needs, it may be made a source of income by sending the fresh fruit to the market if there is one
near enough, or by preserving, canning, and making jelly for sale. To make such an enterprise a success the fruit and work must be first class.


Test plan

  • Synthesized audio on Apple Silicon M1 with USE_MPS=1
  • Verified fp16 path with USE_MPS=1 USE_FP16=1
  • Verified CUDA path still works (no regressions — default use_fp16=False leaves all paths unchanged)
  • Verified istftnet decoder path is unaffected (no changes to that module)

🤖 Generated with Claude Code

mparrett and others added 4 commits February 25, 2026 22:07
Plumb use_fp16 parameter through Decoder → Generator → SourceModuleHnNSF
to support fp16 inference on MPS. Cast sine_wavs to the configured dtype
before the linear layer to prevent dtype mismatch when the decoder runs
in half precision.

Also remove unused f0_buf allocation in SineGen.forward.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Clean script for running StyleTTS 2 inference on Apple Silicon.
TextEncoder stays on CPU (MPS lacks pack_padded_sequence support), all
other modules run on MPS. Supports optional fp16 decoder via USE_FP16
env var.

Features:
- -p/--prompt flag for one-shot synthesis (useful for benchmarking)
- -r/--reference flag to specify reference audio
- Interactive text input loop when no prompt given
- RTF (real-time factor) timing on each synthesis

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Rename ref_texts to texts in the Style Transfer section to be consistent
with the variable name used in every other section of the notebook.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Mirrors the existing requirements.txt with minimum version pins for
torch, torchaudio, and transformers. Adds phonemizer and scipy which
were missing from requirements.txt but needed at import time.

Enables reproducible setup via: uv sync

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mparrett mparrett changed the title Add Apple Silicon (MPS) inference support Add Apple Silicon (MPS) inference support [claude] Feb 26, 2026
@mparrett mparrett mentioned this pull request Feb 26, 2026