AI video generation API backend, designed for agent-driven workflows.
- Framework: FastAPI + SQLModel
- Language: Python 3.11+
- Video Generation: Fal AI Kling 2.5 Turbo Pro (cloud API)
- Image Generation: Google Gemini (gemini-2.5-flash-preview-image-generation)
- Language Engine: Google Gemini 3 Flash (google-genai SDK)
- Web Search: Tavily
- TTS: ElevenLabs (multilingual + word-level timestamps)
- Database: SQLite (local)
- Architecture: Single process, asyncio.create_task() (no queue, no worker)
1. generate_script # Tavily search + Gemini idea analysis & script generation
2. enhance_prompts # Refine t2i/subtitle prompts
3. generate_character # Generate character reference image (Gemini Image)
4. generate_images # Generate scene images (Gemini Image)
5. generate_video_prompts # Vision analysis + i2v motion prompts
6. generate_audio # ElevenLabs TTS + timestamps + BGM
7. compose_scenes # Per-scene composition (video/KB + audio + subtitles)
8. finalize_video # Concatenate scenes (stream copy) + mix BGM
9. generate_metadata # Title, description, tags
10. upload # Save to local output directory
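
As a rough sketch of how the ten steps above might be driven in order, the following runs placeholder coroutines sequentially and reports a percentage after each one. The step names are taken from the list; `make_placeholder`, `run_pipeline`, and the shared `ctx` dict are hypothetical and stand in for the real implementations.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Step names as listed in this README, in execution order.
STEP_NAMES = [
    "generate_script",
    "enhance_prompts",
    "generate_character",
    "generate_images",
    "generate_video_prompts",
    "generate_audio",
    "compose_scenes",
    "finalize_video",
    "generate_metadata",
    "upload",
]

Step = Callable[[dict], Awaitable[None]]


def make_placeholder(name: str) -> Step:
    """Build a dummy step that only records its own name in the context."""

    async def step(ctx: dict) -> None:
        ctx.setdefault("log", []).append(name)

    return step


async def run_pipeline(ctx: dict) -> list[tuple[str, int]]:
    """Run every step in order and return (step name, percent complete) pairs."""
    progress: list[tuple[str, int]] = []
    for index, name in enumerate(STEP_NAMES, start=1):
        await make_placeholder(name)(ctx)
        progress.append((name, round(100 * index / len(STEP_NAMES))))
    return progress


if __name__ == "__main__":
    for name, percent in asyncio.run(run_pipeline({})):
        print(f"{percent:3d}% {name}")
```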
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/api/jobs` | POST | Create job |
| `/api/jobs` | GET | List jobs |
| `/api/jobs/{id}` | GET | Job details |
| `/api/jobs/{id}/stream` | GET | SSE progress stream |
| `/api/jobs/{id}` | DELETE | Cancel/delete job |
Request body for creating a job (`POST /api/jobs`):

    {
      "idea": "Video description",
      "style": "cinematic | anime",
      "voice": "male | female",
      "test_mode": false
    }

Development commands:

    uv sync                      # Install dependencies
    uv run media-engine          # Start server
    uv run pytest                # Run tests
    uv run ruff check src tests  # Lint

Environment variables:

    FAL_API_KEY=           # Fal AI (Kling 2.5)
    GOOGLE_API_KEY=        # Google Gemini
    TAVILY_API_KEY=        # Tavily search
    ELEVENLABS_API_KEY=    # ElevenLabs TTS
    API_HOST=0.0.0.0
    API_PORT=8000
    OUTPUT_DIR=./outputs

The database is SQLite (`media_engine.db`); no extra configuration is needed.
- Gemini only: Both text and image generation use Gemini — no OpenAI dependency
- Cloud video generation: Fal AI Kling 2.5 Turbo Pro, no local GPU required
- Auto voice selection: AI picks a male or female voice based on content analysis; override with the `voice` parameter
- Image safety: Automatic content filtering with safe-prompt retry
Private - All rights reserved.