ViNote = Video + Note
ViNote AI · Turn Every Video into Your Knowledge Asset
ViNoter · Super Video Agent
Video to Everything: Notes, Q&A, Articles, Subtitles, Cards, Mind Maps - All in One
- Conversational Operation: Complete all video processing tasks through natural language dialogue
- Intelligent Intent Understanding: Automatically recognize user needs without manual function switching
- Cross-Platform Search: Support for Bilibili, YouTube and other multi-platform video search
- Process Automation: Search→Transcribe→Notes→Translate, seamlessly integrated
- Based on ANP Protocol: Leading open-source decentralized Agent collaboration standard
- Multi-Platform Support: YouTube, Bilibili, and other major video platforms
- Local Video Support: Support for local video file path input (MP4, AVI, MOV, MKV, etc.)
- High-Quality Transcription: Local audio transcription based on Faster-Whisper
- Smart Optimization: AI-driven text optimization and formatting
- Multi-Language Support: Automatic language detection and translation
- Structured Output: Automatically generate outlines, key points, and summaries
- Markdown Format: Perfect compatibility with all note-taking apps
- Real-Time Progress: SSE real-time progress updates
- Intelligent Q&A: AI Q&A system based on video content
- Context Understanding: Deep comprehension of video content
- Streaming Output: Real-time responses for better user experience
- Multi-Format Support: Support for various video formats and resolutions
- Preview Feature: Preview video information before downloading
- Progress Tracking: Real-time download progress display
- Docker 20.10+
- Docker Compose 2.0+
- Clone the Project
git clone https://github.com/zrt-ai-lab/ViNote.git
cd ViNote
- Configure Environment Variables and Cookies
# Copy environment configuration file
cp .env.example .env
# Edit .env file and add your OpenAI API Key
# OPENAI_API_KEY=your-api-key-here
# OPENAI_BASE_URL=https://api.openai.com/v1
# OPENAI_MODEL=gpt-4o
# Copy cookies configuration (optional, required for Bilibili)
cp cookies.txt.example bilibili_cookies.txt
# Edit bilibili_cookies.txt if you need to download Bilibili videos
# See "🍪 Cookies Configuration" section below for details
- Start Services
# Build and start
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
- Access Application: open your browser and visit http://localhost:8999
- Python 3.10+
- FFmpeg (for audio/video processing)
- uv package manager
- Clone the Project
git clone https://github.com/zrt-ai-lab/ViNote.git
cd ViNote
- Install uv Package Manager
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
- Install FFmpeg
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt-get update && sudo apt-get install ffmpeg
# Windows
# Download and install from https://ffmpeg.org/download.html
- Install Dependencies
# Install dependencies with uv sync (recommended; creates the .venv virtual environment automatically)
uv sync
# Or create the virtual environment yourself and install in editable mode
uv venv
uv pip install -e .
- Configure Environment Variables and Cookies
# Copy environment configuration file
cp .env.example .env
# Edit .env file with your configuration
# Copy cookies configuration (optional, required for Bilibili)
cp cookies.txt.example bilibili_cookies.txt
# Edit bilibili_cookies.txt if you need to download Bilibili videos
# See "🍪 Cookies Configuration" section below for details
- Start Services
🚀 One-Click Start (Recommended)
Start all services with a single command:
# Make script executable (first time only)
chmod +x start.sh
# Start all services
./start.sh
This will automatically:
- ✅ Generate DID keys (if they do not exist yet)
- ✅ Start DID Authentication Server (port 9000)
- ✅ Start Video Search Server (port 8000)
- ✅ Start ViNote Main Application (port 8999)
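After `start.sh` has run, one way to confirm that the three services are listening is to probe their ports. The sketch below is not part of ViNote: the names `SERVICE_PORTS`, `port_is_open`, and `check_services` are hypothetical, and it assumes the services are reachable on `127.0.0.1` at the ports listed above.

```python
import socket
from typing import Dict

# Ports taken from the list above; the dict itself is an illustrative assumption.
SERVICE_PORTS = {
    "DID Authentication Server": 9000,
    "Video Search Server": 8000,
    "ViNote Main Application": 8999,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> Dict[str, bool]:
    """Probe every expected service port and report which ones accept connections."""
    return {name: port_is_open(host, port) for name, port in SERVICE_PORTS.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{'OK  ' if up else 'DOWN'} {name}")
```

An open port only shows that something is listening there; it does not prove the service behind it is healthy.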
Manual Start (Advanced)
If you prefer to start services manually in separate terminals:
💡 Terminal 1 - Generate DID Keys (first time only):
cd backend/anp
python gen_did_keys.py
cd ../..
Terminal 2 - DID Authentication Server:
cd backend/anp
python client_did_server.py
Terminal 3 - Video Search Server:
cd backend/anp
python search_server_agent.py
Terminal 4 - ViNote Main Application:
uv run uvicorn backend.main:app --reload --port 8999
Basic Usage (Without ViNoter Super Agent)
If you only need basic features without ViNoter:
# Using uv run (recommended)
uv run uvicorn backend.main:app --reload --port 8999
# Or activate virtual environment first
source .venv/bin/activate # macOS/Linux
uvicorn backend.main:app --reload --port 8999
- Access Application: open your browser and visit http://localhost:8999
ViNoter is a super agent based on the ANP protocol that completes video search, transcription, note generation, and all other operations through natural conversation.
Before using ViNoter, you need to start the ANP server:
- Generate DID Keys (first time only)
cd backend/anp
python gen_did_keys.py
- Start ANP Services (requires 3 terminals)
Terminal 1 - DID Authentication Server:
cd backend/anp
python client_did_server.py
Terminal 2 - Video Search Server:
cd backend/anp
python search_server_agent.py
Terminal 3 - ViNote Main Application:
# Return to project root directory
cd ../..
uv run uvicorn backend.main:app --reload --port 8999
- Open the application homepage and select the "ViNoter Super Search" tab
- Enter your request in the dialogue box, for example:
Scenario 1: Search Videos
You: "Help me search for Python tutorials on Bilibili"
ViNoter: "Found 10 related videos for you:
1. [Black Horse Programmer] Python Zero-Based Introduction
2. [Tsinghua University] Python Data Analysis
...
Which one would you like to choose?"
Scenario 2: Video Transcription
You: "Choose the first one and transcribe it for me"
ViNoter: "Sure, processing for you:
✓ Downloading video
✓ Extracting audio
✓ Transcribing... (Progress 45%)
✓ Transcription complete!
I've saved the transcript for you. Would you like me to generate notes?"
Scenario 3: Multi-Platform Search
You: "Help me search for machine learning tutorials on both YouTube and Bilibili"
ViNoter: "Searching across platforms...
YouTube results: 5 videos
Bilibili results: 8 videos
Showing you the 10 most relevant..."
- 🗣️ Natural Conversation: Just say what you need, like chatting with a friend
- 🤖 Intelligent Understanding: Automatically understands intent, no need to manually switch functions
- 🔄 Process Integration: Search→Transcribe→Notes→Translate, seamlessly integrated
- 📊 Real-time Feedback: Streaming output with real-time progress visibility
- 🌐 Cross-Platform: Supports multiple platforms including Bilibili, YouTube, etc.
💡 Tip: ViNoter is based on ANP (Agent Network Protocol), an open-source decentralized Agent collaboration standard. For more details, see backend/anp/README.md
- Open the application homepage and select "AI Video Notes"
- In "Online URL" mode, paste video link (supports YouTube, Bilibili, etc.)
- Click "Preview" to view video information
- Select summary language (11 languages, including Chinese, English, and Japanese)
- Click "Generate Notes"
- Wait for completion (view real-time progress)
- Download generated Markdown notes
- Open the application homepage and select "AI Video Notes"
- Switch to "Local Path" mode
- Enter the absolute path of your local video file, for example:
  - Mac/Linux: /Users/zhangsan/Videos/lecture.mp4
  - Windows: C:\Users\zhangsan\Videos\lecture.mp4
  - Docker: /app/videos/lecture.mp4 (requires mounted directory)
- Click "Preview" to verify the file
- Select summary language
- Click "Generate Notes"
- Wait for completion and download notes
💡 Supported Video Formats: MP4, AVI, MOV, MKV, MP3, WAV, etc.
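A local path can be sanity-checked before it is entered. The following is a minimal sketch, not ViNote's actual validation logic: the function name `validate_media_path` is hypothetical, and the extension set contains only the formats named above (the list ends in "etc.", so the application may accept more).

```python
from pathlib import Path

# Extensions named in this README; the real application may accept more.
KNOWN_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".mp3", ".wav"}


def validate_media_path(raw_path: str) -> Path:
    """Return the path if it is absolute, has a known extension, and is a file."""
    path = Path(raw_path)
    if not path.is_absolute():
        raise ValueError(f"not an absolute path: {raw_path}")
    if path.suffix.lower() not in KNOWN_EXTENSIONS:
        raise ValueError(f"unsupported extension: {path.suffix or '(none)'}")
    if not path.is_file():
        raise FileNotFoundError(raw_path)
    return path
```

Whether a path counts as absolute depends on the operating system the check runs on, so a Windows-style path such as `C:\Users\...` is rejected when this runs on macOS or Linux. Inside Docker, the path must refer to the mounted directory as seen from the container.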
- Open the application homepage and select "AI Video Q&A"
- In "Online URL" mode, paste video link (supports YouTube, Bilibili, etc.)
- Click "Preview" to view video information
- Click "Start Preprocessing" button
- Wait for AI preprocessing to complete (extract audio and transcribe)
- Enter your question in the input box
- AI will answer in real-time based on video content
- Open the application homepage and select "AI Video Q&A"
- Switch to "Local Path" mode
- Enter the absolute path of your local video file
- Click "Preview" to verify the file
- Click "Start Preprocessing" button
- Wait for AI preprocessing to complete
- Enter questions in the input box; AI answers in real time
💡 Tip: After preprocessing is complete, you can ask any question about the video, and the AI will answer based on the complete video content.
- Select "Video Download" tab
- Paste video link and click "Preview"
- Choose desired video quality
- Click "Start Download"
- Save file after download completes
vinote/
├── backend/ # Backend code
│ ├── anp/ # ANP Agent Protocol Demo Module 🆕
│ │ ├── search_client_agent.py # Client agent
│ │ ├── search_server_agent.py # Server agent (needs to be started before using ViNoter)
│ │ ├── client_did_server.py # DID authentication server
│ │ ├── gen_did_keys.py # DID key generation tool
│ │ ├── README.md # ANP module documentation
│ │ ├── client_did_keys/ # Client DID keys
│ │ ├── did_keys/ # Server DID keys
│ │ └── jwt_keys/ # JWT keys
│ ├── config/ # Configuration management
│ │ ├── ai_config.py # AI model configuration
│ │ └── settings.py # Application settings
│ ├── core/ # Core functionality
│ │ └── ai_client.py # AI client singleton
│ ├── models/ # Data models
│ │ └── schemas.py # Pydantic models
│ ├── services/ # Business logic layer
│ │ ├── note_generator.py # Note generation
│ │ ├── content_summarizer.py # Content summarization
│ │ ├── text_optimizer.py # Text optimization
│ │ ├── text_translator.py # Text translation
│ │ ├── audio_transcriber.py # Audio transcription
│ │ ├── video_downloader.py # Video download
│ │ ├── video_preview_service.py # Video preview
│ │ ├── video_download_service.py # Download service
│ │ ├── video_qa_service.py # Video Q&A
│ │ └── video_search_agent.py # Video search agent service 🆕
│ ├── utils/ # Utility functions
│ │ ├── file_handler.py # File handling
│ │ └── text_processor.py # Text processing
│ └── main.py # FastAPI application entry
├── static/ # Frontend static files
│ ├── index.html # Main page
│ ├── css/ # Style files
│ │ └── search-agent.css # Smart search styles 🆕
│ ├── js/ # JavaScript files
│ │ ├── app.js # Main frontend logic
│ │ ├── modules/
│ │ │ ├── searchAgent.js # Smart search module 🆕
│ │ │ ├── transcription.js # Transcription module
│ │ │ ├── videoPreview.js # Video preview
│ │ │ └── ... # Other modules
│ │ └── utils/
│ └── *.png/jpg # Image resources
├── temp/ # Temporary files directory
│ ├── downloads/ # Downloaded files
│ └── backups/ # Task backups
├── cookies.txt.example # Cookies configuration example 🆕
├── .env.example # Environment variables example
├── pyproject.toml # Project configuration (uv)
├── uv.lock # Dependency version lock 🆕
├── Dockerfile # Docker image configuration
├── docker-compose.yml # Docker compose configuration
└── README.md # Project documentation
| Variable | Description | Default | Required |
|---|---|---|---|
| OPENAI_API_KEY | OpenAI API Key | - | ✅ |
| OPENAI_BASE_URL | OpenAI API Base URL | https://api.openai.com/v1 | ✅ |
| OPENAI_MODEL | Model to use | gpt-4o | ✅ |
| WHISPER_MODEL_SIZE | Whisper model size | base | ✅ |
| YOUTUBE_API_KEY | YouTube Data API v3 Key | - | ⭐ Recommended |
| APP_HOST | Service listening address | 0.0.0.0 | ❌ |
| APP_PORT | Service port | 8001 | ❌ |
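To illustrate how the variables and defaults in this table fit together, here is a minimal sketch of reading them from the environment. It is not the project's `backend/config/settings.py`: the `Settings` dataclass and `load_settings` function are hypothetical names, and only `OPENAI_API_KEY` is enforced here because it is the one required variable without a default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    openai_base_url: str
    openai_model: str
    whisper_model_size: str
    youtube_api_key: Optional[str]
    app_host: str
    app_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying the table's defaults."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required but not set")
    return Settings(
        openai_api_key=api_key,
        openai_base_url=env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        openai_model=env.get("OPENAI_MODEL", "gpt-4o"),
        whisper_model_size=env.get("WHISPER_MODEL_SIZE", "base"),
        # An empty or missing key means "not configured".
        youtube_api_key=env.get("YOUTUBE_API_KEY") or None,
        app_host=env.get("APP_HOST", "0.0.0.0"),
        app_port=int(env.get("APP_PORT", "8001")),
    )
```

Passing a plain dict, for example `load_settings({"OPENAI_API_KEY": "sk-test"})`, makes the defaults easy to inspect without touching the real environment.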
Performance Comparison:
- ❌ Without Configuration: Using yt-dlp to fetch video info (2-5 seconds)
- ✅ With Configuration: Using YouTube API to fetch video info (0.1-0.3 seconds) ⚡
- 🚀 Speed Improvement: 10-50x faster!
Use Cases:
| Feature | Without Config | With Config | Improvement |
|---|---|---|---|
| YouTube Video Preview | 2-5s | 0.1-0.3s | ⚡ 10-50x |
| YouTube Note Generation | 5-10s | 4-9s | ⚡ Faster |
| YouTube Video Search | Slow | Super Fast | ⚡ 10-50x |
| Bilibili Videos | 2-5s | 2-5s | Unchanged ✅ |
💡 Important Notes:
- ✅ Highly recommended for YouTube videos: Dramatically improves processing speed
- ✅ Bilibili videos unaffected: Continue using cookies method
- ✅ Optional configuration: Works without it (auto-fallback to yt-dlp)
- ✅ Generous free quota: 10,000 units/day, sufficient for daily use
1. Get YouTube API Key
Visit Google Cloud Console:
- Create or select a project
- Enable "YouTube Data API v3"
- Create API credentials (API Key)
- Copy the generated API Key
2. Configure Environment Variable
Add to .env file:
YOUTUBE_API_KEY=your_api_key_here
3. Verify Configuration
After starting the service, check logs:
✅ YouTube API configured successfully
API Key: AIzaSy...
Free Quota (per day):
- Total quota: 10,000 units
- Video preview: 1 unit/request → 10,000 requests/day
- Video search: 100 units/request → 100 searches/day
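Previews and searches draw from the same daily pool, so the two figures above are upper bounds that apply only when one operation is used exclusively. The short sketch below works through the arithmetic with the unit costs quoted in this section; the constant and function names are illustrative.

```python
# Figures quoted in this section.
DAILY_QUOTA_UNITS = 10_000
PREVIEW_COST_UNITS = 1
SEARCH_COST_UNITS = 100


def remaining_previews(searches_done: int) -> int:
    """Previews still affordable today after the given number of searches."""
    used = searches_done * SEARCH_COST_UNITS
    return max(0, (DAILY_QUOTA_UNITS - used) // PREVIEW_COST_UNITS)


print(DAILY_QUOTA_UNITS // SEARCH_COST_UNITS)  # 100 searches if nothing else is used
print(remaining_previews(20))                  # 8000 previews left after 20 searches
```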
Auto-Fallback on Quota Exhausted:
- Quota exhausted → Auto-switch to yt-dlp
- API failure → Auto-switch to yt-dlp
- Seamless for users, no impact on functionality ✅
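The fallback behavior described above follows a common try-then-fall-back pattern. The sketch below shows that general pattern only; it does not reproduce ViNote's code and calls neither the YouTube API nor yt-dlp. `QuotaExceeded`, `with_fallback`, `fetch_via_api`, and `fetch_via_ytdlp` are hypothetical stand-ins.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical error standing in for an exhausted API quota."""


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Return primary(); if it raises, return fallback() instead."""
    try:
        return primary()
    except Exception:
        return fallback()


def fetch_via_api() -> dict:
    # Stand-in for an API call that fails because the quota is used up.
    raise QuotaExceeded("daily quota exhausted")


def fetch_via_ytdlp() -> dict:
    # Stand-in for the slower fallback path.
    return {"title": "example", "source": "yt-dlp"}


info = with_fallback(fetch_via_api, fetch_via_ytdlp)
print(info["source"])  # prints "yt-dlp"
```

Catching every exception keeps the sketch short; a real implementation would more likely catch only quota and network errors and log the reason for the switch.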
For complete setup tutorial, FAQ, and troubleshooting: 📖 YOUTUBE_API_SETUP.md
| Model | Parameters | GPU VRAM (fp16) | CPU RAM (int8) | Speed | Quality | Use Case |
|---|---|---|---|---|---|---|
| tiny | 39M | ~1GB | ~600MB | ⚡⚡⚡⚡⚡ | ⭐⭐ | Quick testing, real-time transcription |
| base | 74M | ~1GB | ~800MB | ⚡⚡⚡⚡ | ⭐⭐⭐ | Balanced choice ✅ |
| small | 244M | ~2GB | ~1.5GB (1477MB) | ⚡⚡⚡ | ⭐⭐⭐⭐ | Medium quality |
| medium | 769M | ~3-4GB | ~2.5GB | ⚡⚡ | ⭐⭐⭐⭐ | High quality |
| large-v1 | 1550M | ~4.5GB | ~3GB | ⚡ | ⭐⭐⭐⭐⭐ | Highest quality (legacy) |
| large-v2 | 1550M | ~4.5GB (4525MB) | ~2.9GB (2926MB int8) | ⚡ | ⭐⭐⭐⭐⭐ | Highest quality |
| large-v3 / large | 1550M | ~4.5GB | ~3GB | ⚡ | ⭐⭐⭐⭐⭐ | Highest quality (recommended) |
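When choosing a value for `WHISPER_MODEL_SIZE` on a CPU-only machine, the memory column can be turned into a simple lookup. This is an illustrative helper, not part of ViNote: the function name is hypothetical, the figures are the approximate int8 CPU RAM values from the table rounded to whole megabytes, and real usage varies with audio length and runtime overhead.

```python
from typing import Optional

# (model, approximate CPU RAM in MB with int8), from the table above, largest first.
CPU_INT8_RAM_MB = [
    ("large-v3", 3000),
    ("medium", 2500),
    ("small", 1500),
    ("base", 800),
    ("tiny", 600),
]


def largest_model_that_fits(available_mb: int) -> Optional[str]:
    """Return the largest listed model whose approximate RAM need fits, else None."""
    for name, needed_mb in CPU_INT8_RAM_MB:
        if needed_mb <= available_mb:
            return name
    return None


print(largest_model_that_fits(1000))  # base
print(largest_model_that_fits(4000))  # large-v3
```

Leaving headroom above these numbers is prudent, since the rest of the application also needs memory.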
Bilibili has anti-scraping mechanisms that require login credentials. If you encounter download failures (such as HTTP 412 errors), you need to configure the cookies file.
- ✅ Bypass anti-scraping verification on Bilibili
- ✅ Support downloading videos that require login to watch
- ✅ Improve download success rate and stability
💡 Important Notice:
- YouTube videos do NOT need cookies: public videos are accessed without login
- Bilibili videos need cookies: configure them by following the steps below
Method 1: Using yt-dlp Command (Recommended ⭐⭐⭐⭐⭐)
# 1. Ensure yt-dlp is installed
pip install yt-dlp
# 2. Export Bilibili Cookies
yt-dlp --cookies-from-browser chrome --cookies bilibili_cookies.txt https://www.bilibili.com
# Note:
# - chrome can be replaced with firefox, edge, safari, brave, etc.
# - macOS will prompt for system password to access keychain
Method 2: Copy Example File Manually
# 1. Copy the example file
cp cookies.txt.example bilibili_cookies.txt
# 2. Edit bilibili_cookies.txt and fill in real cookie values (Netscape format)
# Refer to comments in the file
Method 3: Using Browser Extension
- Install a browser extension (such as EditThisCookie or Cookie-Editor)
- Log in to bilibili.com
- Export cookies in Netscape format
- Save as bilibili_cookies.txt
bilibili_cookies.txt file format (Netscape HTTP Cookie File):
# Netscape HTTP Cookie File
# Bilibili Cookies
.bilibili.com TRUE / FALSE 1893456000 SESSDATA your_SESSDATA_value (required)
.bilibili.com TRUE / FALSE 1893456000 bili_jct your_bili_jct_value
.bilibili.com TRUE / FALSE 1893456000 DedeUserID your_user_id
.bilibili.com TRUE / FALSE 1893456000 buvid3 device_fingerprint
.bilibili.com TRUE / FALSE 1893456000 sid session_id
- 🔒 bilibili_cookies.txt contains login credentials
- 🔄 Cookies typically expire in 3-6 months and need regular updates
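Before starting a download, it can help to check that the cookies file is well-formed. In the Netscape format, each cookie line carries seven tab-separated fields: domain, include-subdomains flag, path, secure flag, expiry (Unix time), name, and value. The sketch below is a hypothetical helper, not ViNote code; it treats `SESSDATA` as the only mandatory cookie because that is the one marked "(required)" above.

```python
from typing import Dict, Iterable, List


def read_netscape_cookies(text: str) -> Dict[str, str]:
    """Map cookie name to value from the text of a Netscape-format cookie file."""
    cookies: Dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("#HttpOnly_"):
            # Some exporters mark HttpOnly cookies with this prefix; keep the cookie.
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")
        if len(fields) != 7:
            raise ValueError(f"expected 7 tab-separated fields, got {len(fields)}: {line!r}")
        name, value = fields[5], fields[6]
        cookies[name] = value
    return cookies


def missing_cookies(cookies: Dict[str, str], required: Iterable[str] = ("SESSDATA",)) -> List[str]:
    """Return the required cookie names that are absent or empty."""
    return [name for name in required if not cookies.get(name)]


if __name__ == "__main__":
    with open("bilibili_cookies.txt", encoding="utf-8") as fh:
        print(missing_cookies(read_netscape_cookies(fh.read())) or "all required cookies present")
```

A `ValueError` here usually means the fields were separated by spaces instead of tabs, which is easy to introduce when editing the file by hand.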
1. ViNoter Super Search Module ⭐⭐⭐⭐⭐
- ✅ Super video agent based on ANP protocol
- ✅ Conversational video search on websites (supports Bilibili, YouTube, etc.)
- ✅ Conversational video transcription with direct download after completion
- ✅ Intelligently understands user intent and automatically calls appropriate tools
- ✅ Streaming conversation experience with real-time progress feedback
2. ANP Protocol Video Search Demo System 🔐
- ✅ Client Agent: Intelligent conversation client (search_client_agent.py)
- ✅ DID Server: Decentralized identity authentication server (client_did_server.py)
- ✅ Server Agent: Video search server (search_server_agent.py)
- ✅ Complete DID identity authentication process
- ✅ Secure communication mechanism between Agents
3. Transcription Progress Optimization 📊
- ✅ Backend adds detailed transcription progress tracking
- ✅ Frame-by-frame progress output for developer debugging
- ✅ Real-time progress percentage display
- ✅ Real-time transcription status updates
4. Bilibili Video 412 Error Fix 🛠️
- ✅ Added Cookie authentication support
- ✅ Bilibili uses dedicated bilibili_cookies.txt
- ✅ Built-in Developer Tools for easy Cookie format conversion
5. Dependency Management Improvements 📦
- ✅ Added ANP protocol related dependencies
- ✅ Ensured environment reproducibility
Prerequisites for using ViNoter Agent:
- ANP's search_server_agent.py server must be started locally
- For detailed configuration, see backend/anp/README.md
- DID key pairs must be generated first
- ✅ Local Video Support: Support for local video file input via absolute path
- Supported formats: MP4, AVI, MOV, MKV, MP3, WAV, etc.
- Support for Mac/Linux/Windows paths
- Docker environment supports directory mounting
- ✅ Video Notes Local Mode: Process local videos directly to generate notes
- ✅ Video Q&A Local Mode: Intelligent Q&A based on local video content
- Optimized path validation logic
- Improved user interface experience
- Enhanced documentation
- ✅ Online video download and transcription
- ✅ AI-driven note generation
- ✅ Video Q&A system
- ✅ Video download functionality
- ✅ Multi-language support
- ✅ Real-time progress tracking
- ✅ ViNoter Super Agent
- ✅ Video audio download and transcription
- ✅ AI-driven note generation
- ✅ Intelligent text optimization
- ✅ Multi-language translation support
- ✅ Video Q&A system
- ✅ Video download functionality
- 🔲 Video content to article
- 🔲 Multi-platform publishing (WeChat, Zhihu, Xiaohongshu, etc.)
- 🔲 Custom publishing templates
- 🔲 Image-text mixed layout editor
- 🔲 Extract video subtitles
- 🔲 Multi-format support (SRT, VTT, ASS, etc.)
- 🔲 Automatically extract knowledge points
- 🔲 Generate study cards
- 🔲 Automatically generate mind maps
- 🔲 Multiple mind map styles
- 🔲 Export as image/PDF
ViNote integrates an ANP (Agent Network Protocol) based video search demo system, demonstrating decentralized identity authentication and intelligent Agent communication capabilities.
ANP (Agent Network Protocol) is an agent networking protocol built on DID (Decentralized Identifiers), supporting:
- 🔐 Decentralized Identity Authentication: Secure authentication based on DID standards
- 🤖 Intelligent Agent Communication: Supports multi-Agent collaboration and tool invocation
- 🌐 Distributed Architecture: No need for centralized servers
cd backend/anp
python gen_did_keys.py
This will generate DID documents and keys for both server and client.
Terminal 1 - Client DID Server:
cd backend/anp
python client_did_server.py
Terminal 2 - Video Search Server:
cd backend/anp
python search_server_agent.py
Terminal 3 - Intelligent Client:
cd backend/anp
python search_client_agent.py
Enter natural language queries in the client terminal:
You: Help me search for Python tutorials on Bilibili
The system will automatically:
- 🤔 Parse your intent
- 🔍 Call corresponding search interface
- 📊 Return summarized results
ViNote main application has integrated ANP video search functionality. You can configure ANP server address via environment variables:
# .env file
ANP_SERVER_URL=http://localhost:8999/ad.json
For detailed ANP documentation and example code, see:
Contributions are welcome! Please follow these steps:
- Fork this repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- 📋 Check the Roadmap to select features to develop
- 🐛 Fix bugs in Issues
- 📝 Improve documentation and examples
- ✨ Propose new feature ideas
This project is licensed under the MIT License - see the LICENSE file for details
This project is built upon the following excellent open-source projects and services:
- yt-dlp - Powerful video download tool supporting hundreds of video platforms
- Faster-Whisper - Efficient Whisper implementation with excellent transcription performance
- FastAPI - Modern Python web framework, high-performance and easy to use
- OpenAI API - Powerful AI text processing capabilities
- AI-Video-Transcriber - An open-source AI video transcription and summarization tool that provided important design inspiration for this project
Thanks to all open-source contributors! 💖
- Issue Feedback: GitHub Issues
- Email: 864410260@qq.com
If this project helps you, please give it a ⭐️ Star!
Made with ❤️ by ViNote Team

