Fix TikTok WAF blocking by using Chrome 120 impersonation #61
karilaa-dev merged 10 commits into main
Simplify configuration by removing most performance-related env vars and hardcoding values optimized for maximum resource usage:
- ThreadPoolExecutor: 500 workers (vs default 32)
- aiohttp connections: unlimited (limit=0)
- curl_cffi pool: 10000 max_clients
- Image downloads: no concurrency limit (removed semaphore)

Keep only 3 user-configurable limits via env vars:
- MAX_USER_QUEUE_SIZE (default 0 = no limit)
- STREAMING_DURATION_THRESHOLD (default 300s)
- MAX_VIDEO_DURATION (default 0 = no limit)
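A minimal sketch of how the three remaining limits could be read from the environment, using the names and defaults listed in this commit. The `Limits` dataclass and the `load_limits` and `_int_env` helpers are hypothetical names for illustration, and falling back to the default on a non-integer value is an assumption, not something the commit states.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Limits:
    """The three user-configurable limits named in the commit message."""
    max_user_queue_size: int = 0             # 0 = no limit
    streaming_duration_threshold: int = 300  # seconds
    max_video_duration: int = 0              # 0 = no limit


def _int_env(env: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer variable; use the default if it is unset or invalid."""
    raw = env.get(name, "").strip()
    if not raw:
        return default
    try:
        return int(raw)
    except ValueError:
        # Assumption: a malformed value silently falls back to the default.
        return default


def load_limits(env: Mapping[str, str] = os.environ) -> Limits:
    """Build the limits from an environment-like mapping."""
    return Limits(
        max_user_queue_size=_int_env(env, "MAX_USER_QUEUE_SIZE", 0),
        streaming_duration_threshold=_int_env(env, "STREAMING_DURATION_THRESHOLD", 300),
        max_video_duration=_int_env(env, "MAX_VIDEO_DURATION", 0),
    )
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test, for example `load_limits({"MAX_VIDEO_DURATION": "600"})`.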
…adata
TikTok's browser impersonation (impersonate=True) doesn't work through HTTP proxies, causing extraction to fail with "Unable to extract webpage video data".

Changed approach:
- Use direct connection (no proxy) for video info extraction with impersonate
- Use proxy for media downloads to hide server IP

This fixes the issue where all proxy attempts would fail due to TikTok's JavaScript challenge blocking non-browser requests through proxies.
Create the new YoutubeDL instance before closing the old one to ensure we have a valid ydl even if initialization fails.
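The ordering described here can be sketched in a few lines. `YdlHolder` and its `factory` argument are hypothetical stand-ins; in the real client the factory would presumably construct a `YoutubeDL` instance, but that wiring is not shown in this PR page.

```python
from typing import Any, Callable


class YdlHolder:
    """Holds one live instance and swaps it without leaving a gap."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self.ydl = factory()

    def refresh(self) -> None:
        # Build the replacement first. If the factory raises, self.ydl
        # still points at the old, still-open instance.
        new_ydl = self._factory()
        old_ydl, self.ydl = self.ydl, new_ydl
        # Only close the old instance once the new one is in place.
        old_ydl.close()
```

Closing first and creating second would leave the holder with a closed instance whenever construction fails, which is the case this commit avoids.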
Return extraction error if video_data is None despite a non-error status code, preventing downstream issues from invalid data.
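A sketch of the guard, under stated assumptions: `validate_extraction`, the `status_ok` flag, and the error strings are hypothetical, since the actual function names and status values in `tiktok_api/client.py` are not visible here.

```python
from typing import Optional, Tuple


def validate_extraction(
    video_data: Optional[dict], status_ok: bool
) -> Tuple[Optional[dict], Optional[str]]:
    """Return (data, None) on success or (None, error message) on failure."""
    if not status_ok:
        return None, "extraction failed: error status"
    if video_data is None:
        # The status looked fine but no payload came back; report an
        # extraction error instead of passing None downstream.
        return None, "extraction failed: no video data"
    return video_data, None
```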
Remove logic that stripped proxy from ydl_opts during extraction. Datacenter IPs are typically blocked by TikTok, so extraction must use the configured proxy to work on servers.
TikTok's WAF blocks newer Chrome versions (136+) when used with proxies due to TLS fingerprint / User-Agent mismatches.

This commit:
- Use fixed Chrome 120 impersonation target instead of auto-selecting newest
- Set matching User-Agent header for yt-dlp extraction and media downloads
- Add per-proxy session pool to avoid proxy contamination between requests
- Bake proxy into curl_cffi sessions at construction time
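The per-proxy pool idea can be illustrated with a small, library-independent sketch. `ProxySessionPool`, `session_kwargs`, and the exact User-Agent string are assumptions made for illustration; the real code builds curl_cffi sessions, whereas here the factory is injected so the example runs without third-party packages.

```python
import threading
from typing import Any, Callable, Dict, Optional

# Fixed target rather than "newest available". "chrome120" is the name
# curl_cffi uses for this target; the UA string below is an assumed
# Chrome 120 desktop value chosen to match it.
IMPERSONATE_TARGET = "chrome120"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def session_kwargs(proxy: Optional[str]) -> Dict[str, Any]:
    """Constructor arguments for one session, with the proxy baked in."""
    kwargs: Dict[str, Any] = {
        "impersonate": IMPERSONATE_TARGET,
        "headers": {"User-Agent": USER_AGENT},
    }
    if proxy is not None:
        # Assumption: the session constructor accepts a `proxy` keyword.
        kwargs["proxy"] = proxy
    return kwargs


class ProxySessionPool:
    """Keeps one session per proxy URL (None means a direct connection)."""

    def __init__(self, factory: Callable[[Optional[str]], Any]) -> None:
        self._factory = factory
        self._sessions: Dict[Optional[str], Any] = {}
        self._lock = threading.Lock()

    def get(self, proxy: Optional[str] = None) -> Any:
        with self._lock:
            session = self._sessions.get(proxy)
            if session is None:
                # Created once per proxy, so cookies and connections from
                # one proxy never leak into requests sent through another.
                session = self._factory(proxy)
                self._sessions[proxy] = session
            return session
```

With curl_cffi installed, the factory might be something like `lambda p: Session(**session_kwargs(p))`, though which proxy keyword a given curl_cffi version accepts should be checked against its documentation.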
User description
Summary
Problem
TikTok's WAF blocks newer Chrome versions (136+) when used with proxies due to TLS fingerprint / User-Agent mismatches. This caused extraction failures with "Unable to extract webpage video data" errors.
Test plan
PR Type
Bug fix, Enhancement
Description
Fix TikTok WAF blocking with Chrome 120 impersonation
Add per-proxy session pool to prevent contamination
Set matching User-Agent for extraction and downloads
Simplify extraction logic to always use configured proxy
Diagram Walkthrough
flowchart LR
  A["TikTok Request"] --> B{Proxy Configured?}
  B -->|Yes| C["Get/Create Session for Proxy"]
  B -->|No| D["Get/Create Direct Session"]
  C --> E["Use Chrome 120 Impersonation"]
  D --> E
  E --> F["Set Matching User-Agent Header"]
  F --> G["Extract Video Data"]
  G --> H["Download Media"]
File Walkthrough
client.py
Fix TikTok WAF blocking with Chrome 120 impersonation
tiktok_api/client.py
constants