Add Apple Silicon (MPS) inference support [claude] #358
Open
mparrett wants to merge 4 commits into yl4579:main from
Conversation
Plumb use_fp16 parameter through Decoder → Generator → SourceModuleHnNSF to support fp16 inference on MPS. Cast sine_wavs to the configured dtype before the linear layer to prevent dtype mismatch when the decoder runs in half precision. Also remove unused f0_buf allocation in SineGen.forward. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
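The dtype fix described above can be sketched as follows. This is a hypothetical, heavily simplified module for illustration only; the real StyleTTS 2 `SourceModuleHnNSF` takes more constructor arguments and produces its sine source internally:

```python
import torch
import torch.nn as nn

class SourceModuleHnNSF(nn.Module):
    """Simplified sketch of the fp16 plumbing, not the actual StyleTTS 2 class."""

    def __init__(self, harmonic_num=8, use_fp16=False):
        super().__init__()
        # use_fp16 is plumbed down from the Decoder so this module knows
        # which dtype its weights (and inputs) should use.
        self.dtype_ = torch.float16 if use_fp16 else torch.float32
        self.l_linear = nn.Linear(harmonic_num + 1, 1).to(self.dtype_)

    def forward(self, sine_wavs):
        # Cast the sine source to the configured dtype before the linear layer;
        # otherwise a float32 sine tensor meets float16 weights and the
        # half-precision decoder raises a dtype mismatch on MPS.
        sine_merge = self.l_linear(sine_wavs.to(self.dtype_))
        return torch.tanh(sine_merge)
```

The key point is the explicit `.to(self.dtype_)` cast on the input, mirroring the commit's "cast sine_wavs to the configured dtype before the linear layer".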
Clean script for running StyleTTS 2 inference on Apple Silicon. TextEncoder stays on CPU (MPS lacks pack_padded_sequence support); all other modules run on MPS. Supports optional fp16 decoder via USE_FP16 env var.

Features:
- -p/--prompt flag for one-shot synthesis (useful for benchmarking)
- -r/--reference flag to specify reference audio
- Interactive text input loop when no prompt given
- RTF (real-time factor) timing on each synthesis

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
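The hybrid CPU/MPS placement the script describes can be sketched like this. The helper names (`place_modules`, `encode_then_transfer`) and the `"text_encoder"` key are illustrative assumptions, not the script's actual API:

```python
import torch

# Fall back to CPU when MPS is unavailable, so the same code runs anywhere.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

def place_modules(model):
    """Put the text encoder on CPU and everything else on the MPS device.

    `model` is assumed to be a dict mapping module names to nn.Module instances.
    """
    for name, module in model.items():
        if name == "text_encoder":
            # MPS lacks pack_padded_sequence, so the LSTM-based encoder stays on CPU.
            module.to("cpu")
        else:
            module.to(device)

def encode_then_transfer(text_encoder, tokens):
    # Run text encoding on CPU, then move the result to MPS for the rest
    # of the pipeline (duration/prosody prediction and the decoder).
    return text_encoder(tokens.cpu()).to(device)
```

On machines without MPS the `device` falls back to CPU and the placement becomes a no-op, which keeps the script usable for debugging off Apple hardware.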
Rename ref_texts to texts in the Style Transfer section to be consistent with the variable name used in every other section of the notebook. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mirrors the existing requirements.txt with minimum version pins for torch, torchaudio, and transformers. Adds phonemizer and scipy which were missing from requirements.txt but needed at import time. Enables reproducible setup via: uv sync Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
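A minimal sketch of what such a `pyproject.toml` might look like; the project name, version, and exact pins here are assumptions, not necessarily what the PR ships:

```toml
# Hedged sketch: mirrors requirements.txt with minimum version pins,
# plus phonemizer and scipy, which were missing but needed at import time.
[project]
name = "styletts2"
version = "0.1.0"
requires-python = ">=3.9"
dependencies = [
    "torch>=2.0",
    "torchaudio>=2.0",
    "transformers>=4.30",
    "phonemizer",
    "scipy",
]
```

With this in place, `uv sync` resolves and installs the pinned set into a local virtual environment, and the committed lockfile makes the resolution reproducible.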
Summary
- Plumb `use_fp16` through Decoder → Generator → SourceModuleHnNSF to support fp16 inference on MPS (fixes dtype mismatch when decoder runs in half precision)
- `Demo/inference_mps.py`: clean MPS inference script with `-p` flag for one-shot benchmarking
- Fix `ref_texts` → `texts` variable name bug in LibriTTS demo notebook
- `pyproject.toml` + lockfile for reproducible `uv sync` setup

MPS notes
TextEncoder must stay on CPU because MPS doesn't support `pack_padded_sequence`. All other modules run on MPS. After text encoding on CPU, tensors are transferred to the GPU device.

Scope is inference only; the training path has a hardcoded `.to('cuda')` in the Decoder forward method, which is a separate fix.

Usage
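A usage sketch based on the flags described in this PR (`-p`/`--prompt`, `-r`/`--reference`, `USE_MPS`, `USE_FP16`); the reference audio path shown is illustrative, and these commands assume the repo and model checkpoints are set up:

```bash
# One-shot synthesis on MPS (TextEncoder stays on CPU automatically)
USE_MPS=1 python Demo/inference_mps.py -p "Hello from Apple Silicon." -r path/to/reference.wav

# Half-precision decoder for faster synthesis
USE_MPS=1 USE_FP16=1 python Demo/inference_mps.py -p "Hello from Apple Silicon."

# Interactive loop (no -p flag): type lines of text, get audio plus RTF timing back
USE_MPS=1 python Demo/inference_mps.py
```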
Benchmarks (Apple MacBook Air M2, ~60-word passage → 17.7s audio)
RTF = real-time factor (lower is better). MPS+FP16 sustains ~5x real-time synthesis.
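The RTF figure the script prints on each synthesis can be computed as wall-clock synthesis time divided by output audio duration. A small sketch, assuming a hypothetical `synthesize` callable that returns a waveform sequence (StyleTTS 2 outputs 24 kHz audio, hence the default sample rate):

```python
import time

def synthesize_with_rtf(synthesize, text, sample_rate=24000):
    """Time a synthesis call and return (waveform, real-time factor).

    RTF < 1.0 means faster than real time; e.g. RTF 0.2 is ~5x real-time,
    the figure quoted above for MPS+FP16.
    """
    start = time.perf_counter()
    wav = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(wav) / sample_rate
    return wav, elapsed / audio_seconds
```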
Test plan
- `USE_MPS=1`
- `USE_MPS=1 USE_FP16=1`
- Default path (`use_fp16=False` leaves all paths unchanged)

🤖 Generated with Claude Code