SXKA/dtw-speech-aligner


dtw-speech-aligner

A Python tool for speech segment alignment using Dynamic Time Warping (DTW). Assumes input audio has been pre‑processed by VAD (voice activity detection).

Features

  • Subsequence DTW alignment of two speech segments (query vs. reference)
  • Supports MFCC and/or fundamental‑frequency (F0) features
  • Outputs clipped reference audio and optional diagnostic plots
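
As an illustration of the first feature, here is a minimal pure-Python sketch of subsequence DTW. It is not the repository's implementation: the function names `subsequence_dtw` and `frame_span_to_samples`, the Euclidean frame distance, and the `hop_length` value are all assumptions made for the example. The query must be matched in full, while the match may start and end anywhere in the reference.

```python
import math


def subsequence_dtw(query, reference):
    """Find the span of `reference` that best matches all of `query`.

    Both inputs are sequences of equal-length feature vectors, one per frame.
    Returns (cost, start, end), where start and end are inclusive frame
    indices into `reference`. Hypothetical helper, not the repo's API.
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    def dist(a, b):
        # Euclidean distance between two frames (an assumed local cost).
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # acc[i][j]: minimal cost of aligning query[:i+1] to a span ending at j.
    # origin[i][j]: reference frame where that best path begins.
    acc = [[0.0] * m for _ in range(n)]
    origin = [[0] * m for _ in range(n)]

    # First row: the match may start at any reference frame at no extra cost.
    for j in range(m):
        acc[0][j] = dist(query[0], reference[j])
        origin[0][j] = j

    for i in range(1, n):
        acc[i][0] = acc[i - 1][0] + dist(query[i], reference[0])
        origin[i][0] = 0
        for j in range(1, m):
            best_cost, best_origin = min(
                (acc[i - 1][j - 1], origin[i - 1][j - 1]),  # diagonal step
                (acc[i - 1][j], origin[i - 1][j]),          # advance query only
                (acc[i][j - 1], origin[i][j - 1]),          # advance reference only
            )
            acc[i][j] = best_cost + dist(query[i], reference[j])
            origin[i][j] = best_origin

    # Last row: the match may end at any reference frame.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end], origin[n - 1][end], end


def frame_span_to_samples(start, end, hop_length=512):
    """Convert an inclusive frame span to a half-open sample range.

    hop_length=512 is an assumed value; use whatever the feature
    extraction actually used.
    """
    return start * hop_length, (end + 1) * hop_length


query = [[1.0], [2.0], [3.0]]
reference = [[9.0], [9.0], [1.0], [2.0], [2.0], [3.0], [9.0]]
cost, start, end = subsequence_dtw(query, reference)
clip_range = frame_span_to_samples(start, end)
```

In practice the frames would be MFCC and/or F0 vectors rather than the toy one-dimensional values used here, and the resulting sample range would be used to cut the clipped segment out of the reference audio.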

Usage

python main.py \
  --query_path   path/to/query_audio \
  --reference_path path/to/reference_audio \
  [--feat_types mfcc f0] \
  [--save_plot]
  • query_path: path to the query audio (already processed by VAD).
  • reference_path: path to the reference audio (already processed by VAD).
  • feat_types: features to use, one or both of mfcc and f0 (default: mfcc).
  • save_plot: if set, save the DTW and spectrogram plots.
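
For readers who want to see how these options fit together, the following is a sketch of an `argparse` parser that accepts the flags listed above. The flag names and the `mfcc` default come from this README; everything else (the `build_parser` name, the help strings, marking both paths as required, restricting `--feat_types` to the two listed choices) is an assumption and may differ from what `main.py` actually does.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Align a query speech segment to a reference using DTW."
    )
    # Assumed to be required, since the usage line shows them without brackets.
    parser.add_argument("--query_path", required=True,
                        help="path to the query audio")
    parser.add_argument("--reference_path", required=True,
                        help="path to the reference audio")
    # One or more feature types; defaults to MFCC only, as documented.
    parser.add_argument("--feat_types", nargs="+", choices=["mfcc", "f0"],
                        default=["mfcc"], help="features to use")
    # Boolean switch: present means plots are saved.
    parser.add_argument("--save_plot", action="store_true",
                        help="save DTW and spectrogram plots")
    return parser


args = build_parser().parse_args([
    "--query_path", "query.wav",
    "--reference_path", "reference.wav",
    "--feat_types", "mfcc", "f0",
    "--save_plot",
])
```

Passing an explicit argument list to `parse_args` keeps the example runnable outside a shell; a real entry point would call `parse_args()` with no arguments so that it reads the command line.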

Examples

Chinese

Audio

Query (TTS): query_chinese.wav

Reference (Human): reference_chinese.wav

Clipped segment (DTW): clip_chinese.wav

Visualization

  • Alignment: alignment_chinese
  • Mel-spectrogram & f0: f0_mel_spec_query_chinese, f0_mel_spec_reference_chinese
  • MFCC DTW paths: mfcc_dtw_chinese, mfcc_delta_dtw_chinese, mfcc_delta_delta_dtw_chinese

English

Audio

Query (TTS): query_english.wav

Reference (Human): reference_english.wav

Clipped segment (DTW): clip_english.wav

Visualization

  • Alignment: alignment_english
  • Mel-spectrogram & f0: f0_mel_spec_query_english, f0_mel_spec_reference_english
  • MFCC DTW paths: mfcc_dtw_english, mfcc_delta_dtw_english, mfcc_delta_delta_dtw_english

Taiwanese

Audio

Query (TTS): query_taiwanese.wav

Reference (Human): reference_taiwanese.mp3

Clipped segment (DTW): clip_taiwanese.wav

Visualization

  • Alignment: alignment_taiwanese
  • Mel-spectrogram & f0: f0_mel_spec_query_taiwanese, f0_mel_spec_reference_taiwanese
  • MFCC DTW paths: mfcc_dtw_taiwanese, mfcc_delta_dtw_taiwanese, mfcc_delta_delta_dtw_taiwanese
