I am an AI researcher focused on pushing the boundaries of generative audio and speech synthesis. My work centers on building high-performance, efficient architectures for TTS and neural audio restoration.
I specialize in developing novel architectures for speech synthesis and fast inference. Some of my key work includes:
- LavaSR: A novel architecture for Bandwidth Extension (BWE) and speech restoration, designed to be the fastest and most flexible model in its class (submitted to Interspeech 2026).
- LuxTTS: A high-quality, rapid voice cloning model reaching speeds of 150x realtime through advanced distillation techniques.
- NovaSR: A lightning-fast audio upsampler utilizing a novel architecture for high-fidelity BWE.
- MiraTTS: Spark-TTS fine-tuned for emotional expression and integrated with a custom-built upsampler for expressive, high-resolution speech.
- LinaCodec: A highly compressive neural audio codec (compressing 60x more than previous codecs) optimized for speech models.
I am actively looking to collaborate on writing and publishing research papers in the deep learning and audio DSP space. If you're working on novel speech architectures or efficient transformer scaling, I would be happy to connect.
