BLT is a toolkit for lyrics and singing voice. The toolkit contains three modular components that can be used independently or combined through pre-defined pipelines.
```python
from blt.translators import SoramimiTranslationAgent

# Soramimi translation (phonetic matching)
agent = SoramimiTranslationAgent()
result = agent.translate(["Your lyrics here"])
print(result.soramimi_lines)  # Phonetically matched translation
```

IPA-based lyrics translation tools with music constraints:
| Tool | Description |
|---|---|
| `LyricsTranslationAgent` | Main translator with syllable/rhyme preservation |
| `SoramimiTranslationAgent` | Soramimi (そらみみ / 空耳, "misheard lyrics") translator: creates text that sounds like the original |
Music Constraints Extracted:

- `syllable_counts`: `list[int]` (e.g. `[4, 3]`)
  - Chinese: character-based
  - Other languages: IPA vowel nuclei
- `syllable_patterns`: `list[list[int]]` (e.g. `[[1, 1, 2], [1, 2]]`)
  - With audio (WIP): alignment problem (timing sync with vocals)
  - Without audio: word segmentation problem
    - Chinese: HanLP tokenizer
    - English: space splitting
    - Other languages: LLM-based
- `rhyme_scheme`: `str` (e.g. `AB`)
  - Chinese: Pinyin finals
  - Other languages: IPA phonemes
- `ipa_similarity`: `float` (e.g. `0.5`)
  - Phonetic similarity threshold for soramimi translation
  - Measured using IPA phoneme matching between source and target
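The three kinds of constraint above can be illustrated with a small, self-contained sketch. This is not BLT's implementation: the vowel set is a hypothetical, heavily simplified IPA inventory, rhyme keys (e.g. Pinyin finals) are assumed to be already extracted, and `ipa_similarity` here is a plain normalized Levenshtein distance over phoneme lists, swapped in for whatever phoneme-matching measure BLT actually uses. All function names are hypothetical.

```python
# Hypothetical sketch of the constraint types; not BLT's actual code.

# Simplified IPA vowel inventory (assumption); real tools such as
# Phonemizer/Panphon cover far more symbols and diacritics.
IPA_VOWELS = set("aeiouyæɑɒɐəɛɜɪʊʌɔøœɨʉɯɤ")
LENGTH_MARKS = set("ːˑ")


def count_vowel_nuclei(ipa: str) -> int:
    """Count syllables as runs of adjacent vowels, so a diphthong counts once."""
    count = 0
    in_vowel = False
    for ch in ipa:
        if ch in IPA_VOWELS:
            if not in_vowel:
                count += 1  # a new vowel run starts a new nucleus
            in_vowel = True
        elif ch not in LENGTH_MARKS:  # length marks extend the current vowel
            in_vowel = False
    return count


def rhyme_scheme(rhyme_keys: list[str]) -> str:
    """Label each line's rhyme key with A, B, C, ... in order of first appearance."""
    labels: dict[str, str] = {}
    for key in rhyme_keys:
        labels.setdefault(key, chr(ord("A") + len(labels)))
    return "".join(labels[key] for key in rhyme_keys)


def ipa_similarity(src: list[str], tgt: list[str]) -> float:
    """Return 1 - (Levenshtein distance / longer length) over two phoneme lists."""
    if not src and not tgt:
        return 1.0
    prev = list(range(len(tgt) + 1))  # distances from the empty prefix of src
    for i, a in enumerate(src, start=1):
        cur = [i]
        for j, b in enumerate(tgt, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (a != b)))  # substitution (0 if equal)
        prev = cur
    return 1.0 - prev[-1] / max(len(src), len(tgt))
```

For instance, `count_vowel_nuclei("ˈhɛloʊ")` gives 2 and `rhyme_scheme(["ang", "ao"])` gives `"AB"`, in the shape of the example values listed above. Merging adjacent vowels keeps diphthongs as one nucleus but undercounts vowels in hiatus, one of several reasons a real implementation needs proper syllabification.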
Translation Flow
```mermaid
flowchart TD
    A[Source Lyrics] --> B[LyricsAnalyzer]
    B --> |Extract Constraints| C{TranslationAgent}
    C --> |Generate Translation| D[Validator]
    D --> |Check Constraints| E{Valid or Max Retries}
    E --> |No| C
    E --> |Yes| F[Target Lyrics]
    style B fill:#64b5f6,stroke:#1976d2,stroke-width:2px,color:#fff
    style C fill:#1976d2,stroke:#0d47a1,stroke-width:2px,color:#fff
    style D fill:#42a5f5,stroke:#1976d2,stroke-width:2px,color:#fff
```
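The generate/validate loop in the flowchart can be sketched in plain Python. The callables `analyze`, `generate` and `validate` are hypothetical stand-ins for `LyricsAnalyzer`, the translation agent and the validator (the real components are built on LangGraph/LangChain and their interfaces may differ), and `max_retries` and `Constraints` are assumed names. In this sketch, validation returns a list of human-readable problems that is fed back into the next attempt; an empty list means the candidate is valid.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Constraints:
    """Hypothetical container for extracted constraints (subset shown)."""
    syllable_counts: list[int] = field(default_factory=list)
    rhyme_scheme: str = ""


Analyzer = Callable[[list[str]], Constraints]
Generator = Callable[[list[str], Constraints, list[str]], list[str]]
Validator = Callable[[list[str], Constraints], list[str]]


def translate_with_retries(
    source: list[str],
    analyze: Analyzer,
    generate: Generator,
    validate: Validator,
    max_retries: int = 3,
) -> list[str]:
    """Analyze once, then generate/validate until valid or retries run out."""
    constraints = analyze(source)
    feedback: list[str] = []
    candidate: list[str] = []
    for _ in range(max_retries + 1):  # first attempt plus up to max_retries retries
        candidate = generate(source, constraints, feedback)
        feedback = validate(candidate, constraints)  # [] means all constraints hold
        if not feedback:
            break
    # Like the flowchart: stop on a valid result or when retries are exhausted,
    # returning the last candidate either way.
    return candidate
```

Passing the validator's messages back into `generate` lets the model target a specific failure (such as a syllable-count mismatch) rather than retrying blindly; whether BLT does exactly this is not stated here.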
| Tool | Description |
|---|---|
| `VocalSeparator` | Vocal / instrumental separation |
| `VoiceConverter` | Voice conversion (RVC) |
| `LyricsAligner` | Timing alignment |
| `AudioMixer` | Audio mixing with automatic resampling |
| `VideoGenerator` | Video generation (KTV, lip-synced) |
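Mixing two tracks at different sample rates (for example, converted vocals and a separated instrumental) first requires bringing them to a common rate. The sketch below shows the idea with linear interpolation on plain Python lists. It does not reproduce `AudioMixer`'s API; the function names, zero-padding and clipping behavior are assumptions, and production code would use a band-limited resampler (e.g. `torchaudio.functional.resample` or `librosa.resample`) rather than linear interpolation.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out: list[float] = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional index into the source signal
        j = int(pos)
        if j >= len(samples) - 1:
            out.append(float(samples[-1]))  # hold the last sample at the edge
        else:
            frac = pos - j
            out.append(samples[j] * (1 - frac) + samples[j + 1] * frac)
    return out


def mix_tracks(
    a: list[float], rate_a: int,
    b: list[float], rate_b: int,
    gain_a: float = 1.0, gain_b: float = 1.0,
) -> tuple[list[float], int]:
    """Resample b to a's rate, zero-pad to equal length, sum with gains, clip to [-1, 1]."""
    b = resample_linear(b, rate_b, rate_a)
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    mixed = [max(-1.0, min(1.0, gain_a * x + gain_b * y)) for x, y in zip(a, b)]
    return mixed, rate_a
```

Resampling the second track to the first track's rate is an arbitrary choice in this sketch; a mixer could equally target a fixed output rate.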
| Pipeline | Description |
|---|---|
| `RVCKTVPipeline` | RVC voice conversion + KTV video with subtitles |
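A pre-defined pipeline chains the independent tools, with later stages consuming earlier outputs. The sketch below is a hypothetical illustration of that composition pattern, not `RVCKTVPipeline`'s implementation: the stage order (separate, convert, mix, align, render) is an assumption inferred from the tool descriptions, and string placeholders stand in for real audio and video data.

```python
from typing import Any, Callable

Context = dict[str, Any]
Stage = Callable[[Context], Context]


def run_pipeline(stages: list[tuple[str, Stage]], ctx: Context) -> tuple[Context, list[str]]:
    """Run stages in order; each returns only the keys it produces, merged into the context."""
    trace: list[str] = []
    for name, stage in stages:
        ctx = {**ctx, **stage(ctx)}  # later stages see everything produced so far
        trace.append(name)
    return ctx, trace


# Hypothetical stand-ins for VocalSeparator, VoiceConverter, AudioMixer,
# LyricsAligner and VideoGenerator; strings replace real audio/video data.
KTV_STAGES: list[tuple[str, Stage]] = [
    ("separate", lambda c: {"vocals": f"vocals({c['song']})", "instrumental": f"inst({c['song']})"}),
    ("convert", lambda c: {"converted": f"rvc({c['vocals']})"}),
    ("mix", lambda c: {"mixed": f"mix({c['converted']}, {c['instrumental']})"}),
    ("align", lambda c: {"timings": f"align({c['lyrics']}, {c['vocals']})"}),
    ("render", lambda c: {"video": f"ktv({c['mixed']}, {c['timings']})"}),
]

if __name__ == "__main__":
    result, trace = run_pipeline(KTV_STAGES, {"song": "song.wav", "lyrics": "lyrics.txt"})
    print(" -> ".join(trace))
    print(result["video"])
```

Keeping each stage a function of a shared context is what lets the same tools run standalone or be reordered into another pipeline.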
- Python 3.11+
- espeak-ng (for IPA analysis)
- Ollama + Qwen3: `ollama pull qwen3:30b-a3b-instruct-2507-q4_K_M`
- (Optional) LangSmith API key for tracing/monitoring
- (Optional) RVC_ZERO for voice conversion
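Before installing, a quick check like the one below can confirm the system-level requirements. It is a hypothetical helper, not part of BLT: it only checks the interpreter version and whether the `espeak-ng` and `ollama` executables are on `PATH`, so confirming that the Qwen3 model has actually been pulled still requires `ollama list`.

```python
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report which system requirements appear to be satisfied (hypothetical helper)."""
    return {
        "Python 3.11+": sys.version_info >= (3, 11),
        "espeak-ng": shutil.which("espeak-ng") is not None,
        "ollama": shutil.which("ollama") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK' if ok else 'MISSING':8}{name}")
```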
```bash
uv venv --python 3.11
source .venv/bin/activate
uv sync
```

Download and place these model files in `assets/`:
- Wav2Lip model (for lip-sync): `assets/wav2lip_gan.pth`
- RVC model (for voice conversion): `assets/model.pth` and `assets/model.index`
  - Download: https://huggingface.co/spaces/r3gm/rvc_zero or train your own
Built with: LangGraph, LangChain, Ollama, PyTorch, Demucs, HanLP, Phonemizer, Panphon, RVC, Wav2Lip, Whisper, Qwen3
This project is intended for research and educational purposes only. All demo content is used for demonstration purposes. If you believe any content infringes on your rights, please contact us and we will remove it promptly.