wcAmon/media-engine

Media Engine

AI video generation API service (a backend built for agents).

English version below

Tech Stack

  • Framework: FastAPI + SQLModel
  • Language: Python 3.11+
  • Video Generation: Fal AI Kling 2.5 Turbo Pro (cloud API)
  • Image Generation: Google Gemini (gemini-2.5-flash-preview-image-generation)
  • Language Engine: Google Gemini 3 Flash (google-genai SDK)
  • Web Search: Tavily
  • TTS: ElevenLabs (multilingual + word-level timestamps)
  • Database: SQLite (local)
  • Architecture: Single process, asyncio.create_task() (no queue, no worker)

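Pipeline

1. generate_script        # Tavily search + Gemini idea analysis and script generation
2. enhance_prompts        # Refine t2i/subtitle prompts
3. generate_character     # Generate the character reference image (Gemini Image)
4. generate_images        # Per-scene images (Gemini Image)
5. generate_video_prompts # Vision analysis of the images + i2v motion prompts
6. generate_audio         # ElevenLabs TTS + timestamps + BGM
7. compose_scenes         # Each scene composed independently (video/KB + audio + subtitles)
8. finalize_video         # Concatenate scenes (copy) + mix in BGM
9. generate_metadata      # Title, description, tags
10. upload                # Save to the local output directory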

API Endpoints

Endpoint               Method  Description
/health                GET     Health check
/api/jobs              POST    Create job
/api/jobs              GET     List jobs
/api/jobs/{id}         GET     Job details
/api/jobs/{id}/stream  GET     SSE progress stream
/api/jobs/{id}         DELETE  Cancel/delete job

Create Job

{
  "idea": "Video description",
  "style": "cinematic",
  "voice": "female",
  "test_mode": false
}

style accepts "cinematic" or "anime"; voice accepts "male" or "female".

Development

uv sync              # Install dependencies
uv run media-engine  # Start the server
uv run pytest        # Run tests
uv run ruff check src tests  # Lint

Environment Variables

FAL_API_KEY=        # Fal AI (Kling 2.5)
GOOGLE_API_KEY=     # Google Gemini
TAVILY_API_KEY=     # Tavily search
ELEVENLABS_API_KEY= # ElevenLabs TTS
API_HOST=0.0.0.0
API_PORT=8000
OUTPUT_DIR=./outputs

The database is SQLite (media_engine.db); no extra configuration is needed.

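Notes

  • Gemini only: text and image generation both use Gemini, so OpenAI is not needed
  • Cloud video generation: Fal AI Kling 2.5 Turbo Pro, no local GPU required
  • Auto voice selection: the AI analyzes the content and picks a male or female voice; override it with the voice parameter
  • Image safety: inappropriate content is filtered automatically, and retries add safety modifiers to the prompt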

Media Engine (English)

AI video generation API backend, designed for agent-driven workflows.

Tech Stack

  • Framework: FastAPI + SQLModel
  • Language: Python 3.11+
  • Video Generation: Fal AI Kling 2.5 Turbo Pro (cloud API)
  • Image Generation: Google Gemini (gemini-2.5-flash-preview-image-generation)
  • Language Engine: Google Gemini 3 Flash (google-genai SDK)
  • Web Search: Tavily
  • TTS: ElevenLabs (multilingual + word-level timestamps)
  • Database: SQLite (local)
  • Architecture: Single process, asyncio.create_task() (no queue, no worker)
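The single-process design listed above can be sketched roughly as follows. This is a minimal illustration of scheduling background work with asyncio.create_task(), not the project's actual code: run_job, start_job, cancel_job, and the in-memory _tasks registry are hypothetical names, and the short sleep stands in for the real pipeline.

```python
import asyncio

# Hypothetical in-memory registry of running jobs (an assumption for this
# sketch; the real service stores job state in SQLite).
_tasks: dict[str, asyncio.Task] = {}


async def run_job(job_id: str) -> str:
    """Stand-in for the real pipeline; sleeps instead of calling external APIs."""
    await asyncio.sleep(0.01)
    return f"{job_id}:done"


def start_job(job_id: str) -> asyncio.Task:
    """Schedule the job on the running event loop and keep a reference to it."""
    task = asyncio.create_task(run_job(job_id))
    _tasks[job_id] = task  # a strong reference keeps the task from being garbage collected
    task.add_done_callback(lambda _t: _tasks.pop(job_id, None))
    return task


async def cancel_job(job_id: str) -> bool:
    """Cancel a running job; returns False if the job is unknown or already finished."""
    task = _tasks.get(job_id)
    if task is None:
        return False
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the job was cancelled on purpose
    return True


async def demo() -> tuple[str, bool]:
    finished = await start_job("job-1")   # run one job to completion
    start_job("job-2")                    # schedule a second job...
    cancelled = await cancel_job("job-2") # ...and cancel it before it finishes
    return finished, cancelled
```

Without a queue or separate worker, running jobs live only inside the server process, so a restart would drop any in-flight task unless job state is recovered from the database.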

Pipeline

1. generate_script        # Tavily search + Gemini idea analysis & script generation
2. enhance_prompts        # Refine t2i/subtitle prompts
3. generate_character     # Generate character reference image (Gemini Image)
4. generate_images        # Generate scene images (Gemini Image)
5. generate_video_prompts # Vision analysis + i2v motion prompts
6. generate_audio         # ElevenLabs TTS + timestamps + BGM
7. compose_scenes         # Per-scene composition (video/KB + audio + subtitles)
8. finalize_video         # Concatenate scenes (stream copy) + mix BGM
9. generate_metadata      # Title, description, tags
10. upload                # Save to local output directory
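One way to picture the ten steps above is as an ordered list of async handlers run one after another. The sketch below uses the step names from the list, but run_pipeline, make_recorder, the shared ctx dictionary, and the on_progress callback are assumptions made for illustration, not the project's real interfaces.

```python
import asyncio
from typing import Awaitable, Callable

# Step names taken from the pipeline list above, in execution order.
STEPS = [
    "generate_script",
    "enhance_prompts",
    "generate_character",
    "generate_images",
    "generate_video_prompts",
    "generate_audio",
    "compose_scenes",
    "finalize_video",
    "generate_metadata",
    "upload",
]

Step = Callable[[dict], Awaitable[None]]
Progress = Callable[[str, int, int], None]


async def run_pipeline(handlers: dict[str, Step], ctx: dict, on_progress: Progress) -> dict:
    """Run every step in order, reporting progress before each one starts."""
    total = len(STEPS)
    for index, name in enumerate(STEPS, start=1):
        on_progress(name, index, total)
        await handlers[name](ctx)  # each step reads and writes the shared context
    return ctx


def make_recorder(name: str) -> Step:
    """Build a placeholder handler that only records that its step ran."""
    async def handler(ctx: dict) -> None:
        ctx.setdefault("done", []).append(name)
    return handler
```

A progress callback like this is a natural place to publish the events that the SSE endpoint streams to clients.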

API Endpoints

Endpoint               Method  Description
/health                GET     Health check
/api/jobs              POST    Create job
/api/jobs              GET     List jobs
/api/jobs/{id}         GET     Job details
/api/jobs/{id}/stream  GET     SSE progress stream
/api/jobs/{id}         DELETE  Cancel/delete job
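The /api/jobs/{id}/stream endpoint delivers progress as Server-Sent Events. The README does not document the event names or payload fields, so the sketch below only shows the generic SSE wire format: format_sse and parse_sse are hypothetical helpers, and the "progress" event with "step" and "index" fields is an invented example payload.

```python
import json
from typing import Iterable, Iterator


def format_sse(event: str, data: dict) -> str:
    """Serialize one event in SSE wire format (a blank line terminates the event)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event, payload) pairs from an iterable of SSE text lines."""
    event, data_lines = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has any data.
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
```

A client would typically feed this parser the decoded lines of the streaming HTTP response and stop once the job reports a terminal state.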

Create Job

{
  "idea": "Video description",
  "style": "cinematic",
  "voice": "female",
  "test_mode": false
}

style accepts "cinematic" or "anime"; voice accepts "male" or "female".

Development

uv sync              # Install dependencies
uv run media-engine  # Start server
uv run pytest        # Run tests
uv run ruff check src tests  # Lint

Environment Variables

FAL_API_KEY=        # Fal AI (Kling 2.5)
GOOGLE_API_KEY=     # Google Gemini
TAVILY_API_KEY=     # Tavily search
ELEVENLABS_API_KEY= # ElevenLabs TTS
API_HOST=0.0.0.0
API_PORT=8000
OUTPUT_DIR=./outputs

The database is SQLite (media_engine.db); no extra configuration is needed.
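Reading these variables at startup could look roughly like the sketch below. The variable names and the defaults for API_HOST, API_PORT, and OUTPUT_DIR come from the listing above; the Settings class, the load_settings function, and the decision to fail fast when an API key is missing are assumptions, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping

# Treating all four API keys as required is an assumption made for this sketch.
REQUIRED_KEYS = ("FAL_API_KEY", "GOOGLE_API_KEY", "TAVILY_API_KEY", "ELEVENLABS_API_KEY")


@dataclass(frozen=True)
class Settings:
    fal_api_key: str
    google_api_key: str
    tavily_api_key: str
    elevenlabs_api_key: str
    api_host: str
    api_port: int
    output_dir: Path


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from an environment mapping, failing fast on missing keys."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        fal_api_key=env["FAL_API_KEY"],
        google_api_key=env["GOOGLE_API_KEY"],
        tavily_api_key=env["TAVILY_API_KEY"],
        elevenlabs_api_key=env["ELEVENLABS_API_KEY"],
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8000")),
        output_dir=Path(env.get("OUTPUT_DIR", "./outputs")),
    )
```

Passing the mapping as a parameter keeps the function easy to test with a plain dictionary instead of the real process environment.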

Notes

  • Gemini only: text and image generation both use Gemini, so there is no OpenAI dependency
  • Cloud video generation: Fal AI Kling 2.5 Turbo Pro, no local GPU required
  • Auto voice selection: the AI picks a male or female voice based on content analysis; override it with the voice parameter
  • Image safety: inappropriate content is filtered automatically, and blocked generations are retried with safety modifiers added to the prompt
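The safe-prompt retry mentioned in the last note could work along the lines of the sketch below. Everything here is hypothetical: the ContentBlocked exception, the SAFE_SUFFIX wording, the retry count, and the generate callable are stand-ins, since the README does not describe how the service detects blocked content or which modifiers it adds.

```python
from typing import Callable


class ContentBlocked(Exception):
    """Hypothetical error raised when the image backend rejects a prompt."""


# Example safety modifiers; the wording the service actually uses is not documented.
SAFE_SUFFIX = ", family-friendly, no violence, no nudity"


def generate_with_safe_retry(
    prompt: str,
    generate: Callable[[str], bytes],
    max_retries: int = 2,
) -> bytes:
    """Try the original prompt first, then retry with safety modifiers appended."""
    attempt_prompt = prompt
    for attempt in range(max_retries + 1):
        try:
            return generate(attempt_prompt)
        except ContentBlocked:
            if attempt == max_retries:
                raise  # give up once the retry budget is spent
            attempt_prompt = prompt + SAFE_SUFFIX
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Injecting the generate callable keeps the retry logic independent of any particular image API and makes it testable with a fake backend.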

License

Private - All rights reserved.

About

automatic video generator
