Transform YouTube videos into language learning materials by extracting transcripts and adapting them to different proficiency levels
- 📝 Automatic Transcript Extraction - Fetch transcripts from YouTube videos
- 🌏 Multi-language Support - Specialized for Cantonese (粵語) transcripts
- 🧠 AI-Powered Processing - Transform content to match different language proficiency levels
- ⚡ Smart Text Chunking - Intelligently split content based on token limits
- 📊 Token Counting - Precise token management using tiktoken
- 💾 File Output - Save processed results to text files
- 🎧 Podcast Integration - Compatible with ElevenReader to convert YouTube videos into English podcasts
from main import process_youtube
# Process a YouTube video
video_url = "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
results = process_youtube(video_url, level="b1", max_tokens=4000)
# Results are automatically saved to text files
python main.py
- 🔗 URL Parsing - Extracts video ID from YouTube URLs
- 📜 Transcript Retrieval - Fetches Cantonese transcripts using YouTube Transcript API
- ✂️ Smart Chunking - Splits text into manageable chunks while preserving sentence integrity
- 🤖 AI Processing - Sends chunks to AI model for language level adaptation
- 💾 File Export - Saves processed content to organized text files
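The URL parsing step listed above can be illustrated with a minimal sketch. This is not taken from `main.py`: the function name `extract_video_id` and the set of supported URL shapes (watch, `youtu.be`, embed, Shorts) are assumptions chosen for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hostnames treated as YouTube in this sketch (an assumption, not the project's list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None

    if host in YOUTUBE_HOSTS:
        # Standard watch links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        # Embed and Shorts links: /embed/<id>, /shorts/<id>
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("embed", "shorts"):
            return parts[1]

    return None
```

Returning `None` for unrecognized URLs lets the caller stop early instead of requesting a transcript with a malformed ID.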
- a1 - Beginner
- a2 - Elementary
- b1 - Intermediate
- b2 - Upper Intermediate
- c1 - Advanced
- c2 - Proficient
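The level codes above match the six CEFR levels. As a hedged sketch of how a level code might be validated and turned into an instruction for the AI model, consider the following. The names `CEFR_LEVELS` and `build_prompt`, and the prompt wording itself, are hypothetical; the actual prompt used by `robot.py` is not documented here.

```python
# Level codes and labels as listed in this README.
CEFR_LEVELS = {
    "a1": "Beginner",
    "a2": "Elementary",
    "b1": "Intermediate",
    "b2": "Upper Intermediate",
    "c1": "Advanced",
    "c2": "Proficient",
}


def build_prompt(chunk: str, level: str) -> str:
    """Build an illustrative adaptation prompt for one transcript chunk."""
    key = level.strip().lower()
    if key not in CEFR_LEVELS:
        valid = ", ".join(sorted(CEFR_LEVELS))
        raise ValueError(f"Unknown level {level!r}; expected one of: {valid}")
    # The wording below is an assumption made for this example only.
    return (
        f"Rewrite the following transcript in English at level "
        f"{key.upper()} ({CEFR_LEVELS[key]}), keeping the original meaning.\n\n"
        f"{chunk}"
    )
```

Validating the level before calling the model avoids spending tokens on a request with a mistyped level such as `"b3"`.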
- Default: 4000 tokens per chunk
- Adjustable based on your AI model's context window
- Automatically handles sentences that exceed token limits
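One way to read the token settings above is as a greedy, sentence-aware splitter. The sketch below is an assumption about the approach, not the code in `main.py`: the function names are hypothetical, oversized sentences are hard-split by character (the README does not say how they are handled), and `rough_token_count` is a stand-in so the example runs without extra dependencies. A tiktoken-based counter can be passed in instead.

```python
import re
from typing import Callable, List


def rough_token_count(text: str) -> int:
    """Stand-in counter: one token per non-whitespace character."""
    return len("".join(text.split()))

# With tiktoken installed, a real counter could be passed in instead, e.g.:
#   enc = tiktoken.get_encoding("cl100k_base")
#   count_tokens = lambda s: len(enc.encode(s))


def split_sentences(text: str) -> List[str]:
    """Split after Chinese or Western sentence-ending punctuation."""
    return [s for s in re.split(r"(?<=[。！？.!?])", text) if s]


def hard_split(sentence: str, max_tokens: int,
               count_tokens: Callable[[str], int]) -> List[str]:
    """Break one oversized sentence into pieces that fit the limit."""
    pieces, current = [], ""
    for ch in sentence:
        if current and count_tokens(current + ch) > max_tokens:
            if current.strip():
                pieces.append(current.strip())
            current = ch
        else:
            current += ch
    if current.strip():
        pieces.append(current.strip())
    return pieces


def chunk_text(text: str, max_tokens: int = 4000,
               count_tokens: Callable[[str], int] = rough_token_count) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens."""
    chunks, current = [], ""
    for sentence in split_sentences(text):
        if count_tokens(sentence) > max_tokens:
            # Flush what we have, then split the long sentence on its own.
            if current.strip():
                chunks.append(current.strip())
            current = ""
            chunks.extend(hard_split(sentence, max_tokens, count_tokens))
        elif current and count_tokens(current + sentence) > max_tokens:
            chunks.append(current.strip())
            current = sentence
        else:
            current += sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Keeping the counter as a parameter means the same splitting logic works with whichever tokenizer matches the AI model in use.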
youtube-transcript-processor/
├── main.py # Main processing script
├── robot.py # AI model interface
├── text/ # Output directory for processed files
├── requirements.txt # Python dependencies
└── README.md # This file
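The `text/` directory in the layout above holds the processed output. A minimal sketch of the export step might look like the following. The function name `save_chunks`, the `<video_id>_<level>.txt` file naming, and the blank-line separator between chunks are all assumptions for illustration; the real script may organize files differently.

```python
from pathlib import Path
from typing import List


def save_chunks(chunks: List[str], video_id: str, level: str,
                out_dir: str = "text") -> Path:
    """Write processed chunks to <out_dir>/<video_id>_<level>.txt."""
    folder = Path(out_dir)
    # Create the output directory on first use.
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{video_id}_{level}.txt"
    # UTF-8 keeps any Cantonese text intact; chunks are separated by a blank line.
    path.write_text("\n\n".join(chunks), encoding="utf-8")
    return path
```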
Parameters:
- link (str): YouTube video URL
- level (str): Target language proficiency level
- max_tokens (int): Maximum tokens per chunk
- is_chinese (bool): Enable Chinese text processing
Returns:
- List of processed text chunks
Parameters:
- video_url (str): YouTube video URL
Returns:
- Full transcript text or None if error
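The "full text or None" contract described above can be sketched as follows. This assumes transcript segments arrive as dicts with a `"text"` key. `fetch_segments` is a hypothetical callable standing in for the real YouTube Transcript API call, and `transcript_text` is an illustrative name, not the project's function.

```python
from typing import Callable, Dict, List, Optional

Segment = Dict[str, object]


def transcript_text(video_id: str,
                    fetch_segments: Callable[[str], List[Segment]],
                    separator: str = " ") -> Optional[str]:
    """Join transcript segments into one string, or return None on error."""
    try:
        segments = fetch_segments(video_id)
    except Exception as exc:
        # Missing or disabled transcripts surface here as exceptions.
        print(f"Could not fetch transcript for {video_id}: {exc}")
        return None
    text = separator.join(
        str(seg["text"]).strip() for seg in segments if seg.get("text")
    )
    # Treat an empty transcript the same as a failed fetch.
    return text or None
```

Callers can then check for `None` once and skip chunking and AI processing when no transcript is available.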
- Language Learning - Adapt YouTube content to your proficiency level
- Content Creation - Generate educational materials from videos
- Research - Process video content for analysis
- Accessibility - Create readable transcripts from video content
- 🎧 Podcast Creation - Use with ElevenReader to transform YouTube videos into English podcasts for on-the-go learning
Input: Complex Cantonese YouTube video
Output: Simplified text adapted to B1 level with proper sentence structure and vocabulary
Original: 今日我哋要講嘅係一個好複雜嘅概念...
Processed: Today what we're going to talk about is a very complex concept...
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- YouTube Transcript API for transcript extraction
- tiktoken for accurate token counting
- Gemini for AI processing capabilities
If you encounter any issues or have questions:
- Open an issue on GitHub
- Check the Wiki for detailed documentation
- Join our Discussions for community support
Transform your processed transcripts into engaging audio content:
- Process YouTube Video - Extract and adapt transcript using this tool
- Export Text File - Save the processed content to a text file
- Upload to ElevenReader - Visit ElevenReader.io and upload your text file
- Generate Podcast - Convert your adapted transcript into an English podcast
- Listen & Learn - Enjoy your personalized audio content on any device
Perfect Workflow:
YouTube Video → Transcript Extraction → AI Processing → Text File → ElevenReader → English Podcast
This integration allows you to:
- Turn any YouTube video into an English learning transcript
- Create audio content at your desired proficiency level (use with ElevenReader)
- Learn through multiple modalities (reading + listening)
Made with ❤️ for language learners worldwide