A Python tool for aligning speech segments with Dynamic Time Warping (DTW). It assumes the input audio has already been processed with voice activity detection (VAD).
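- Subsequence DTW alignment of a query speech segment against a reference segment, locating the part of the reference that best matches the query
- Supports MFCC and/or fundamental frequency (F0) features
- Outputs the clipped reference audio (the matched region) and, optionally, diagnostic plots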
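The alignment step can be sketched as subsequence DTW: the query may start and end anywhere inside the reference, so the first row of the accumulated cost matrix is not summed along the reference axis (free start), and the match ends at the cheapest cell of the last row (free end). This is a minimal pure-Python sketch, assuming Euclidean distance between feature frames and the basic diagonal/vertical/horizontal step pattern; the function name `subsequence_dtw` is hypothetical and not taken from `main.py`.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector, e.g. one MFCC frame


def subsequence_dtw(query: List[Frame], reference: List[Frame]) -> Tuple[int, int, float]:
    """Return (start, end, cost): the inclusive reference frame range that
    best matches the whole query, and the accumulated path cost.

    Both inputs must be non-empty and all frames must have the same length.
    """
    n, m = len(query), len(reference)
    # Local cost: Euclidean distance between every query/reference frame pair.
    dist = [[math.dist(q, r) for r in reference] for q in query]

    # Accumulated cost. Row 0 is NOT accumulated along the reference axis,
    # so the match may begin at any reference frame (free start).
    acc = [[math.inf] * m for _ in range(n)]
    acc[0] = dist[0][:]
    for i in range(1, n):
        acc[i][0] = acc[i - 1][0] + dist[i][0]
        for j in range(1, m):
            acc[i][j] = dist[i][j] + min(
                acc[i - 1][j - 1],  # diagonal: both sequences advance
                acc[i - 1][j],      # query advances, reference waits
                acc[i][j - 1],      # reference advances, query waits
            )

    # Free end: the best match ends at the cheapest cell of the last query row.
    end = min(range(m), key=lambda j: acc[n - 1][j])

    # Backtrack from (n - 1, end) until the first query frame is reached.
    i, j = n - 1, end
    while i > 0:
        if j == 0:
            i -= 1  # only a vertical step is possible at the reference edge
            continue
        steps = {
            (i - 1, j - 1): acc[i - 1][j - 1],
            (i - 1, j): acc[i - 1][j],
            (i, j - 1): acc[i][j - 1],
        }
        i, j = min(steps, key=steps.get)  # ties prefer the diagonal step
    return j, end, acc[n - 1][end]
```

For example, `subsequence_dtw([[1], [2], [3]], [[0], [0], [1], [2], [3], [0], [0]])` finds the query inside the reference at frames 2 to 4 with zero cost. The sketch runs in O(N·M) time and memory; because the cost is summed along the path, implementations often normalize it by path length before comparing candidate matches.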
python main.py \
--query_path path/to/query_audio \
--reference_path path/to/reference_audio \
[--feat_types mfcc f0] \
[--save_plot]

- query_path: path to the query audio (already processed by VAD).
- reference_path: path to the reference audio (already processed by VAD).
- feat_types: features to use: mfcc, f0, or both (default: mfcc).
- save_plot: if set, save the DTW and spectrogram plots.
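The options above could be declared with `argparse` roughly as follows. This is a hypothetical reconstruction from the documented flags, not the actual `main.py`. The `combine_features` helper is likewise an assumption: it shows one plausible way to use `mfcc` and `f0` together, by z-normalizing every feature dimension over time (so F0 in Hz does not dominate the MFCCs) and concatenating the selected features frame by frame. It assumes F0 has already been filled in over unvoiced frames.

```python
import argparse
import statistics
from typing import Dict, List, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Subsequence DTW alignment of a query segment against a reference segment."
    )
    parser.add_argument("--query_path", required=True,
                        help="query audio, already processed by VAD")
    parser.add_argument("--reference_path", required=True,
                        help="reference audio, already processed by VAD")
    parser.add_argument("--feat_types", nargs="+", choices=["mfcc", "f0"],
                        default=["mfcc"], help="features to use (default: mfcc)")
    parser.add_argument("--save_plot", action="store_true",
                        help="save DTW and spectrogram plots")
    return parser


def znorm(track: Sequence[float]) -> List[float]:
    """Scale one feature dimension to zero mean and unit variance over time."""
    mean = statistics.fmean(track)
    std = statistics.pstdev(track) or 1.0  # guard against constant tracks
    return [(x - mean) / std for x in track]


def combine_features(feats: Dict[str, List[List[float]]],
                     feat_types: Sequence[str]) -> List[List[float]]:
    """Hypothetical helper: z-normalize each dimension of the selected
    features, then concatenate them frame by frame.

    `feats` maps a feature name ("mfcc", "f0") to a list of frames, one
    vector per frame; all selected features must have the same frame count.
    """
    columns: List[List[float]] = []
    for name in feat_types:
        frames = feats[name]
        for d in range(len(frames[0])):
            columns.append(znorm([frame[d] for frame in frames]))
    # Transpose the normalized columns back into per-frame vectors.
    return [list(frame) for frame in zip(*columns)]
```

With `--feat_types mfcc f0`, `args.feat_types` is `["mfcc", "f0"]`, and passing it to `combine_features` yields one stacked vector per frame that a DTW routine can compare with a Euclidean distance.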
Examples (TTS query aligned against a human reference recording):

| Language | Query (TTS) | Reference (human) | Clipped segment (DTW) |
|---|---|---|---|
| Chinese | query_chinese.wav | reference_chinese.wav | clip_chinese.wav |
| English | query_english.wav | reference_english.wav | clip_english.wav |
| Taiwanese | query_taiwanese.wav | reference_taiwanese.mp3 | clip_taiwanese.wav |