KalkiDh/YtChatBot


🎙️ YouTube Transcript Chatbot

A Python-based chatbot that fetches transcripts from YouTube videos, embeds them using Hugging Face models, stores them in ChromaDB, and allows users to ask context-aware questions about the content. Powered by LangChain, Transformers, and Azure AI for LLM-based answering.

🚀 Features

🎥 Transcript Extraction: Automatically extracts English transcripts from YouTube videos.

🧠 Embedding with Transformers: Converts transcript chunks into vector embeddings using sentence-transformers/all-MiniLM-L6-v2.

🧬 ChromaDB Integration: Stores transcript embeddings in an in-memory ChromaDB for fast similarity search.

🗣️ Conversational Q&A: Ask natural language questions about the video content.

🌐 FastAPI Backend: Offers RESTful endpoints for frontend integration or other services.

☁️ Azure OpenAI Integration: Uses an Azure-hosted GPT-4o model to answer questions based on transcript context.

🖼️ Sample Interface

🛠️ Project Structure

    📁 your-project/
    ├── chatbot_transcript.py   # Transcript processing and embedding chain
    ├── chatbot_query.py        # Querying and conversation logic
    ├── api_server.py           # FastAPI app to expose endpoints
    ├── .env                    # Stores Hugging Face and GitHub credentials
    └── requirements.txt        # All Python dependencies

🧰 Requirements

Python 3.8+

Hugging Face Transformers

LangChain

ChromaDB

YouTube Transcript API

Azure AI SDK

FastAPI

🔐 Environment Variables

Create a .env file in the root directory with the following:

    HUGGINGFACE_API_KEY=your_huggingface_api_key
    GITHUB_TOKEN=your_github_access_token_for_azure_models

📦 Installation

Clone the repo

    git clone https://github.com/your-username/yt-transcript-chatbot.git
    cd yt-transcript-chatbot

Set up virtual environment

    python -m venv venv
    source venv/bin/activate   # or venv\Scripts\activate on Windows

Install dependencies

    pip install -r requirements.txt

🧪 Running the CLI Version

Run the terminal chatbot interface:

    python chatbot_query.py

Paste a YouTube URL when prompted.

Then ask questions like:

"What is the video about?"

"Summarize the first section."

"What are the main points discussed?"

Type exit to end the session.
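The session behaviour described above (ask questions until the user types exit) can be sketched as a small loop. This is only an illustration: `chat_loop` and its `answer`, `read`, and `write` parameters are hypothetical names, and the real chatbot_query.py may be organised differently. The `answer` callable stands in for the retrieval and LLM step.

```python
from typing import Callable


def chat_loop(
    answer: Callable[[str], str],
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> int:
    """Answer questions until the user types 'exit'; return how many were answered."""
    asked = 0
    while True:
        try:
            question = read("You: ").strip()
        except EOFError:
            # Treat end of input (for example Ctrl-D) like 'exit'.
            break
        if question.lower() == "exit":
            break
        if not question:
            # Ignore empty lines instead of sending them to the model.
            continue
        write("Bot: " + answer(question))
        asked += 1
    return asked
```

Passing `read` and `write` as parameters keeps the loop easy to exercise without a terminal; in interactive use the defaults (`input` and `print`) apply.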

🌐 Running the API Server To launch the FastAPI backend:

bash Copy Edit uvicorn api_server:app --reload Endpoints POST /upload-video-url Uploads and processes a YouTube video.

Request Body:

json Copy Edit { "url": "https://www.youtube.com/watch?v=abc123xyz" } POST /query Ask a question based on the uploaded transcript.

Request Body:

json Copy Edit { "query": "What does the speaker say about machine learning?" } GET /response Returns the latest model response.
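As a rough illustration of calling these endpoints from Python, the sketch below uses only the standard library. It assumes the server is reachable at http://127.0.0.1:8000 (uvicorn's default host and port). The helper names `build_post`, `send`, and `run_demo` are hypothetical, and because the README does not document the response format, responses are returned as raw text.

```python
import json
from urllib.request import Request, urlopen

# Assumption: uvicorn's default bind address; adjust if the server runs elsewhere.
BASE_URL = "http://127.0.0.1:8000"


def build_post(path: str, payload: dict) -> Request:
    """Build a JSON POST request for one of the API endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: Request) -> str:
    """Send the request and return the response body as text."""
    with urlopen(request, timeout=60) as resp:
        return resp.read().decode("utf-8")


def run_demo() -> None:
    """Upload a video, ask a question, then fetch the latest response."""
    print(send(build_post("/upload-video-url",
                          {"url": "https://www.youtube.com/watch?v=abc123xyz"})))
    print(send(build_post("/query",
                          {"query": "What does the speaker say about machine learning?"})))
    print(send(Request(BASE_URL + "/response", method="GET")))
```

Call `run_demo()` only while the API server is running; `build_post` on its own performs no network access.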

🧠 How It Works

Extract Video ID – From a full YouTube URL.

Fetch Transcript – Using youtube_transcript_api.

Split Transcript – Into overlapping chunks using a custom recursive splitter.

Embed Text – Convert text chunks into embeddings using Hugging Face Transformers.

Store in ChromaDB – Enables fast vector search for later queries.

Query Handling – User queries are embedded and matched to similar chunks.

LLM Answering – GPT-4o (via Azure) responds using the retrieved transcript chunks.
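Three of the steps above (extracting the video ID, splitting into overlapping chunks, and matching a query to similar chunks) can be sketched in plain Python. This is a simplified stand-in, not the project's code: the function names are hypothetical, the splitter is a fixed-size character window and not the custom recursive splitter, and the similarity search uses hand-written cosine similarity on toy vectors in place of ChromaDB and real embeddings.

```python
import math
import re
from typing import List, Optional, Sequence
from urllib.parse import parse_qs, urlparse

# Assumption: only these URL shapes are handled in this sketch.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch URL or a youtu.be short link, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS and parsed.path == "/watch":
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    return candidate if re.fullmatch(r"[A-Za-z0-9_-]+", candidate) else None


def split_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into character windows that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            # The current window already reaches the end of the text.
            break
        start += step
    return chunks


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]], k: int = 2) -> List[int]:
    """Return indices of the k chunk vectors most similar to the query vector."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

In the real pipeline the vectors would come from sentence-transformers/all-MiniLM-L6-v2 and the nearest-neighbour lookup would be delegated to ChromaDB; the chunks returned by the search are then passed to the LLM as context.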

📈 Use Cases

Educational video summarization

Customer service video analysis

Podcast and lecture question answering

Content review for accessibility

📋 Future Improvements

Persistent storage (ChromaDB with file-backed DB)

Multi-language transcript support

Frontend UI integration (React/Next.js)

Upload support for custom audio/video files
