This repository contains my submission for the Voice AI Startup Assignment.
The project is built in Google Colab and analyzes sales call recordings to extract useful insights.
- ✅ Talk-time ratio (percentage each person spoke)
- ✅ Number of questions asked
- ✅ Longest monologue duration
- ✅ Call sentiment (positive / negative / neutral)
- ✅ One actionable insight for improvement
- 🎯 Bonus: Speaker diarization (identify Sales Rep vs Customer)
- Python
- Google Colab
- OpenAI Whisper – Speech-to-text
- HuggingFace Transformers – Sentiment analysis
- Pyannote / WhisperX – Speaker diarization
- yt-dlp – Extract audio from YouTube
My approach combines speech-to-text with text analysis.
I first extract the call audio and transcribe it using Whisper, which is fairly robust to noisy, low-quality audio.
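As a rough illustration of the transcription step (a sketch, not the notebook's exact code), the snippet below calls Whisper and reduces its output to `(start, end, text)` tuples. The model size `"base"`, the helper names `transcribe` and `to_segments`, and the stand-in `fake_result` are assumptions made for this example; running `transcribe` itself needs the `openai-whisper` package and ffmpeg.

```python
def transcribe(audio_path, model_name="base"):
    """Run Whisper on an audio file and return its raw result dict."""
    # Imported lazily so the rest of this sketch runs without Whisper installed.
    import whisper
    model = whisper.load_model(model_name)  # "base" is an assumed model size
    return model.transcribe(audio_path)


def to_segments(result):
    """Reduce a Whisper result dict to a list of (start_sec, end_sec, text)."""
    return [
        (float(seg["start"]), float(seg["end"]), seg["text"].strip())
        for seg in result.get("segments", [])
    ]


# Stand-in for a real Whisper result, so the example needs no audio file.
fake_result = {
    "text": " Hello. How are you?",
    "segments": [
        {"start": 0.0, "end": 1.2, "text": " Hello."},
        {"start": 1.2, "end": 2.5, "text": " How are you?"},
    ],
}
segments = to_segments(fake_result)
print(segments)
```

With a real recording, `to_segments(transcribe("call.wav"))` would give the same shape, where `call.wav` is a placeholder path.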
Using the segment timestamps, I calculate the talk-time ratio and the longest monologue. Questions are counted by detecting question marks (?) and interrogative words. Overall call sentiment is classified with a HuggingFace Transformers model. Finally, I generate one actionable insight for improving the sales interaction.
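The metrics described above can be sketched in plain Python. This is an illustrative version under stated assumptions, not the notebook's code: the `(speaker, start, end, text)` segment layout, the `INTERROGATIVES` word list, the `neutral_band` threshold, and all function names are hypothetical. The sentiment helper expects predictions shaped like the output of a Transformers `pipeline("sentiment-analysis")` call, that is, dicts with `label` and `score`.

```python
import re
from collections import defaultdict

# Assumed layout: (speaker, start_sec, end_sec, text). Sample data is made up.
SEGMENTS = [
    ("SPEAKER_00", 0.0, 6.0, "Hi, thanks for joining. How are you today?"),
    ("SPEAKER_01", 6.0, 8.0, "Good, thanks."),
    ("SPEAKER_00", 8.0, 14.0, "Great. Let me walk you through the pricing."),
    ("SPEAKER_00", 14.0, 20.0, "What budget did you have in mind"),
    ("SPEAKER_01", 20.0, 24.0, "Around five thousand. Is there a discount?"),
]

# Assumed word list; sentence-initial matches count as questions.
INTERROGATIVES = {
    "who", "what", "when", "where", "why", "how", "which",
    "do", "does", "did", "is", "are", "can", "could", "would", "will",
}


def talk_time_ratio(segments):
    """Percentage of total speaking time per speaker."""
    totals = defaultdict(float)
    for speaker, start, end, _ in segments:
        totals[speaker] += max(0.0, end - start)
    grand = sum(totals.values())
    if not grand:
        return {}
    return {s: round(100 * t / grand, 1) for s, t in totals.items()}


def longest_monologue(segments):
    """Longest uninterrupted run by one speaker, pauses inside the run included."""
    best_speaker, best = None, 0.0
    run_speaker, run_start = None, 0.0
    for speaker, start, end, _ in segments:
        if speaker != run_speaker:  # speaker changed: start a new run
            run_speaker, run_start = speaker, start
        if end - run_start > best:
            best_speaker, best = speaker, end - run_start
    return best_speaker, best


def count_questions(text):
    """Count sentences that end in '?' or open with an interrogative word."""
    count = 0
    for sentence in re.findall(r"[^.!?]+[.!?]?", text):
        sentence = sentence.strip()
        if not sentence:
            continue
        first = re.match(r"[A-Za-z']+", sentence)
        starts_like_question = bool(first) and first.group(0).lower() in INTERROGATIVES
        if sentence.endswith("?") or starts_like_question:
            count += 1
    return count


def overall_sentiment(predictions, neutral_band=0.25):
    """Fold per-segment {'label', 'score'} predictions into one call-level label."""
    if not predictions:
        return "neutral"
    signed = []
    for p in predictions:
        label = p["label"].upper()
        sign = 1 if label.startswith("POS") else -1 if label.startswith("NEG") else 0
        signed.append(sign * p["score"])
    mean = sum(signed) / len(signed)
    if mean > neutral_band:
        return "positive"
    if mean < -neutral_band:
        return "negative"
    return "neutral"


ratios = talk_time_ratio(SEGMENTS)
monologue = longest_monologue(SEGMENTS)
questions = sum(count_questions(text) for _, _, _, text in SEGMENTS)
print(ratios, monologue, questions)
```

The interrogative check is a heuristic: it catches questions that the transcript left unpunctuated, but it can also over-count statements or imperatives that happen to start with words like "do" or "is".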
For the bonus task, I used speaker diarization with Pyannote/WhisperX to differentiate between the sales rep and the customer.
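One common way to combine diarization with a transcript is to give each transcript segment the speaker whose diarization turns overlap it the most. The sketch below shows that idea only; it is an assumption about how the two outputs could be merged, not a description of the notebook or of WhisperX internals. The function names, the `UNKNOWN` fallback, and the sample data are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each (start, end, text) segment with its best-overlapping speaker.

    `turns` is a list of (start, end, speaker) tuples. With pyannote these
    could be collected from `diarization.itertracks(yield_label=True)`, using
    `turn.start` and `turn.end` (not executed in this sketch).
    """
    labeled = []
    for start, end, text in segments:
        totals = {}
        for t_start, t_end, speaker in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, start, end, text))
    return labeled


# Made-up diarization turns and transcript segments for illustration.
turns = [
    (0.0, 5.5, "SPEAKER_00"),
    (5.5, 8.0, "SPEAKER_01"),
    (8.0, 20.0, "SPEAKER_00"),
]
segments = [
    (0.0, 6.0, "Hi, thanks for joining."),
    (6.0, 8.0, "Good, thanks."),
    (8.0, 14.0, "Let me walk you through the pricing."),
    (30.0, 31.0, "(background noise)"),
]
labeled = assign_speakers(segments, turns)
print([speaker for speaker, *_ in labeled])
```

The labels that come out are anonymous (`SPEAKER_00`, `SPEAKER_01`); mapping them to "Sales Rep" and "Customer" needs a separate rule, which this sketch leaves open.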
The system runs in under 30 seconds on the free Colab tier.
📦 Call_Quality_Analyzer
 ┣ 📜 Call_Quality_Analyzer.ipynb   # Main Colab notebook
 ┣ 📜 README.md                     # Project documentation
- Open the notebook in Google Colab
- Run all cells in order (install → import → download audio → transcription → analysis)
- Results will be printed at the end
- Assignment test file: YouTube Call Recording
Vimal Anand