Fix TikTok WAF blocking by using Chrome 120 impersonation #61

Merged
karilaa-dev merged 10 commits into main from dev on Jan 16, 2026

@karilaa-dev (Owner) commented Jan 16, 2026

User description

Summary

  • Use fixed Chrome 120 impersonation target instead of auto-selecting newest Chrome version
  • Set matching User-Agent header for yt-dlp extraction and media downloads
  • Add per-proxy session pool to avoid proxy contamination between requests
  • Bake proxy into curl_cffi sessions at construction time

Problem

TikTok's WAF blocks newer Chrome versions (136+) when used with proxies due to TLS fingerprint / User-Agent mismatches. This caused extraction failures with "Unable to extract webpage video data" errors.
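
To make the mismatch concrete, here is a minimal sketch of a startup check that the Chrome major version in the User-Agent agrees with the curl_cffi impersonation target. The helper name `ua_matches_target` and its regexes are illustrative assumptions, not code from this PR.

```python
import re


def ua_matches_target(impersonate_target: str, user_agent: str) -> bool:
    """Hypothetical helper: True if the Chrome major version in user_agent
    equals the one in a curl_cffi-style target such as "chrome120"."""
    # Accept plain targets ("chrome120") and suffixed ones ("chrome99_android").
    target = re.fullmatch(r"chrome(\d+)(?:_\w+)?", impersonate_target)
    ua = re.search(r"Chrome/(\d+)\.", user_agent)
    if target is None or ua is None:
        return False
    return target.group(1) == ua.group(1)


UA_120 = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
print(ua_matches_target("chrome120", UA_120))  # consistent pair
print(ua_matches_target("chrome136", UA_120))  # the kind of mismatch described above
```

Running such a check once at import time would surface a drifted constant before any request reaches TikTok.
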

Test plan

  • Tested slideshow extraction with proxy - works correctly
  • Verified Chrome 120 impersonation bypasses TikTok WAF
  • Confirmed per-proxy session pool prevents contamination

PR Type

Bug fix, Enhancement


Description

  • Fix TikTok WAF blocking with Chrome 120 impersonation

  • Add per-proxy session pool to prevent contamination

  • Set matching User-Agent for extraction and downloads

  • Simplify extraction logic to always use configured proxy


Diagram Walkthrough

```mermaid
flowchart LR
  A["TikTok Request"] --> B{Proxy Configured?}
  B -->|Yes| C["Get/Create Session for Proxy"]
  B -->|No| D["Get/Create Direct Session"]
  C --> E["Use Chrome 120 Impersonation"]
  D --> E
  E --> F["Set Matching User-Agent Header"]
  F --> G["Extract Video Data"]
  G --> H["Download Media"]
```

File Walkthrough

Relevant files

Bug fix: tiktok_api/client.py (+98/-148)
Fix TikTok WAF blocking with Chrome 120 impersonation

  • Added fixed Chrome 120 impersonation target and matching User-Agent constants
  • Replaced single curl session with per-proxy session pool dictionary
  • Updated session creation to bake proxy at construction time
  • Simplified extraction logic to always use configured proxy
  • Updated bypass headers to use fixed User-Agent matching impersonation
  • Updated yt-dlp options with Chrome 120 impersonation target

Simplify configuration by removing most performance-related env vars
and hardcoding values optimized for maximum resource usage:

- ThreadPoolExecutor: 500 workers (vs. the default of min(32, os.cpu_count() + 4))
- aiohttp connections: unlimited (limit=0)
- curl_cffi pool: 10000 max_clients
- Image downloads: no concurrency limit (removed semaphore)

Keep only 3 user-configurable limits via env vars:
- MAX_USER_QUEUE_SIZE (default 0 = no limit)
- STREAMING_DURATION_THRESHOLD (default 300s)
- MAX_VIDEO_DURATION (default 0 = no limit)
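
A minimal sketch of reading the three remaining limits, assuming plain integer env vars where 0 means "no limit". The `Limits` dataclass, the helper names, and mapping 0 to `None` are illustrative assumptions; the unit of MAX_VIDEO_DURATION is not stated above and is assumed to be seconds.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Limits:
    max_user_queue_size: Optional[int]  # None means no limit
    streaming_duration_threshold: int  # seconds
    max_video_duration: Optional[int]  # None means no limit (unit assumed: seconds)


def read_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Parse a non-negative integer env var, falling back to default when unset or empty."""
    raw = env.get(name, "").strip()
    value = int(raw) if raw else default
    if value < 0:
        raise ValueError(f"{name} must be >= 0, got {value}")
    return value


def as_limit(value: int) -> Optional[int]:
    # The commit documents 0 as "no limit"; represent that as None.
    return None if value == 0 else value


def load_limits(env: Mapping[str, str] = os.environ) -> Limits:
    return Limits(
        max_user_queue_size=as_limit(read_int(env, "MAX_USER_QUEUE_SIZE", 0)),
        streaming_duration_threshold=read_int(env, "STREAMING_DURATION_THRESHOLD", 300),
        max_video_duration=as_limit(read_int(env, "MAX_VIDEO_DURATION", 0)),
    )
```

Passing the environment as a mapping keeps the parser testable without mutating `os.environ`.
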
…adata

TikTok's browser impersonation (impersonate=True) doesn't work through HTTP
proxies, causing extraction to fail with "Unable to extract webpage video data".

Changed approach:
- Use direct connection (no proxy) for video info extraction with impersonate
- Use proxy for media downloads to hide server IP

This fixes the issue where all proxy attempts would fail due to TikTok's
JavaScript challenge blocking non-browser requests through proxies.

Create the new YoutubeDL instance before closing the old one to ensure
we have a valid ydl even if initialization fails.
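
The create-before-close ordering can be sketched as follows. `ReloadableClient` and its `factory` argument are hypothetical stand-ins for the client and yt-dlp's `YoutubeDL` constructor, so the sketch runs without yt-dlp installed.

```python
from typing import Any, Callable


class ReloadableClient:
    """Hypothetical holder that swaps its extractor without ever losing a valid one."""

    def __init__(self, factory: Callable[[dict], Any], opts: dict) -> None:
        self._factory = factory
        self.ydl = factory(opts)

    def reload(self, opts: dict) -> None:
        # Build the replacement first: if this raises, self.ydl is untouched.
        new_ydl = self._factory(opts)
        old_ydl, self.ydl = self.ydl, new_ydl
        # Only after the swap succeeds do we release the old instance.
        old_ydl.close()
```

Closing the old instance first would leave the client holding a closed object whenever construction of the new one fails.
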

Return extraction error if video_data is None despite a non-error status
code, preventing downstream issues from invalid data.
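
A minimal sketch of that guard. The `Status` enum, `ExtractionResult` type, `finalize` helper, and error message are assumptions for illustration rather than the client's actual return types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class Status(Enum):
    OK = "ok"
    EXTRACTION_ERROR = "extraction_error"


@dataclass
class ExtractionResult:
    status: Status
    video_data: Optional[dict[str, Any]] = None
    error: Optional[str] = None


def finalize(result: ExtractionResult) -> ExtractionResult:
    """Downgrade a non-error result that carries no data to an extraction error."""
    if result.status is not Status.EXTRACTION_ERROR and result.video_data is None:
        return ExtractionResult(Status.EXTRACTION_ERROR, None, "no video data returned")
    return result
```
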

Remove logic that stripped proxy from ydl_opts during extraction.
Datacenter IPs are typically blocked by TikTok, so extraction must
use the configured proxy to work on servers.

TikTok's WAF blocks newer Chrome versions (136+) when used with proxies
due to TLS fingerprint / User-Agent mismatches. This commit:

- Use fixed Chrome 120 impersonation target instead of auto-selecting newest
- Set matching User-Agent header for yt-dlp extraction and media downloads
- Add per-proxy session pool to avoid proxy contamination between requests
- Bake proxy into curl_cffi sessions at construction time

zam-review bot commented Jan 16, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Session Pool Memory Leak

The _curl_session_pool dictionary stores sessions keyed by proxy URL but has no cleanup mechanism. If many different proxies are used over time, this pool could grow indefinitely, causing memory leaks. Consider implementing a session eviction policy (e.g., LRU) or a maximum pool size limit.

```python
_curl_session_pool: dict[Optional[str], CurlAsyncSession] = {}
_curl_session_lock = threading.Lock()
_impersonate_target: Optional[str] = None

@classmethod
def _get_impersonate_target(cls) -> str:
    """Get the impersonation target for TikTok requests.

    TikTok's WAF blocks newer Chrome versions (136+) when used with proxies
    due to TLS fingerprint / User-Agent mismatches. Chrome 120 is known to
    work reliably with proxies.

    Returns:
        curl_cffi-compatible impersonate string (e.g., "chrome120")
    """
    # Use fixed Chrome 120 target that works with TikTok's WAF
    # This must match TIKTOK_IMPERSONATE_TARGET and TIKTOK_USER_AGENT
    logger.debug(
        f"Using impersonation target: {TIKTOK_IMPERSONATE_TARGET} "
        f"(curl_cffi {curl_cffi.__version__})"
    )
    return TIKTOK_IMPERSONATE_TARGET

@classmethod
def _get_curl_session(cls, proxy: Optional[str] = None) -> CurlAsyncSession:
    """Get or create curl_cffi AsyncSession for a specific proxy.

    Sessions are pooled by proxy URL to avoid proxy contamination.
    curl_cffi bakes the proxy into the session at creation time, so we need
    separate sessions for different proxies.

    The session uses the fixed target from _get_impersonate_target()
    (Chrome 120), so its TLS fingerprint matches TIKTOK_USER_AGENT.

    Args:
        proxy: Proxy URL string, or None for direct connection.

    Returns:
        CurlAsyncSession configured with the specified proxy.
    """
    with cls._curl_session_lock:
        # Check if session exists for this proxy
        if proxy not in cls._curl_session_pool:
            pool_size = 1000  # Per-proxy pool size
            if cls._impersonate_target is None:
                cls._impersonate_target = cls._get_impersonate_target()

            # Create session with proxy baked in at construction time
            cls._curl_session_pool[proxy] = CurlAsyncSession(
                impersonate=cls._impersonate_target,
                proxy=proxy,  # curl_cffi converts this to {"all": proxy}
                max_clients=pool_size,
            )
            logger.info(
                f"Created curl_cffi session for proxy={_strip_proxy_auth(proxy)}, "
                f"impersonate={cls._impersonate_target}, max_clients={pool_size}"
            )
        return cls._curl_session_pool[proxy]
```
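
One way to address the unbounded-pool concern is a small LRU keyed by proxy URL, sketched below with the standard library only. The class name `LRUSessionPool`, the `max_sessions` bound, and the `factory`/`on_evict` callbacks are hypothetical, not part of the PR.

```python
import threading
from collections import OrderedDict
from typing import Callable, Generic, Optional, TypeVar

S = TypeVar("S")


class LRUSessionPool(Generic[S]):
    """Thread-safe, size-bounded pool of sessions keyed by proxy URL (None = direct)."""

    def __init__(
        self,
        factory: Callable[[Optional[str]], S],
        on_evict: Callable[[S], None],
        max_sessions: int = 32,
    ) -> None:
        if max_sessions < 1:
            raise ValueError("max_sessions must be >= 1")
        self._factory = factory
        self._on_evict = on_evict
        self._max_sessions = max_sessions
        self._sessions: "OrderedDict[Optional[str], S]" = OrderedDict()
        self._lock = threading.Lock()

    def get(self, proxy: Optional[str]) -> S:
        evicted: list[S] = []
        with self._lock:
            if proxy in self._sessions:
                self._sessions.move_to_end(proxy)  # mark as most recently used
                return self._sessions[proxy]
            session = self._factory(proxy)  # proxy is baked in at construction
            self._sessions[proxy] = session
            while len(self._sessions) > self._max_sessions:
                _, old = self._sessions.popitem(last=False)  # least recently used
                evicted.append(old)
        # Release evicted sessions outside the lock so a slow close can't block callers.
        for old in evicted:
            self._on_evict(old)
        return session

    def __len__(self) -> int:
        with self._lock:
            return len(self._sessions)
```

With curl_cffi, `factory` would construct the session as in `_get_curl_session`, and `on_evict` would need to schedule `session.close()` on the event loop, since `AsyncSession.close()` is a coroutine. Evicting a session that another request is still using would break that request, so the bound should comfortably exceed the number of proxies in active rotation.
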
Missing Null Check

The code uses ImpersonateTarget at line 760 without checking whether it is None. If the import fails (caught in the except block at line 37), ImpersonateTarget is None and calling it raises a TypeError ("'NoneType' object is not callable"). Add a null check before using ImpersonateTarget.

```python
if ImpersonateTarget is not None:
    opts["impersonate"] = ImpersonateTarget("chrome", "120", "macos", None)
opts["http_headers"] = {"User-Agent": TIKTOK_USER_AGENT}
```
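
For context, the guard above pairs with an optional import along these lines. This is a sketch, assuming yt-dlp exposes `ImpersonateTarget` from `yt_dlp.networking.impersonate` (as recent releases do); `build_ydl_opts` is a hypothetical helper name, and the sketch still runs when yt-dlp is not installed.

```python
from typing import Any, Optional

try:
    from yt_dlp.networking.impersonate import ImpersonateTarget
except ImportError:  # yt-dlp missing or too old to support impersonation
    ImpersonateTarget = None

TIKTOK_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_ydl_opts(proxy: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical helper assembling yt-dlp options for TikTok extraction."""
    opts: dict[str, Any] = {"quiet": True}
    if proxy:
        opts["proxy"] = proxy
    # Calling ImpersonateTarget after a failed import would raise TypeError,
    # so only set the impersonation target when the class is available.
    if ImpersonateTarget is not None:
        opts["impersonate"] = ImpersonateTarget("chrome", "120", "macos", None)
    # Send the User-Agent that matches the Chrome 120 fingerprint either way.
    opts["http_headers"] = {"User-Agent": TIKTOK_USER_AGENT}
    return opts
```
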
Hardcoded Browser Version

The impersonation target is now hardcoded to Chrome 120. While this fixes the current WAF issue, it may become outdated as TikTok's WAF evolves. Consider making this configurable or adding a fallback mechanism if Chrome 120 stops working.

```python
TIKTOK_IMPERSONATE_TARGET = "chrome120"
TIKTOK_USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
```
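
A minimal sketch of the reviewer's suggestion, assuming a hypothetical `TIKTOK_CHROME_VERSION` environment variable (not part of this PR) that defaults to 120. Deriving both the curl_cffi target and the User-Agent from one version number keeps the two constants from drifting apart. curl_cffi only ships fingerprints for specific Chrome releases, so an arbitrary version number is not guaranteed to be a valid target.

```python
import os
from typing import Mapping

UA_TEMPLATE = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/{major}.0.0.0 Safari/537.36"
)


def tiktok_fingerprint(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (impersonate_target, user_agent) derived from one Chrome major version."""
    raw = (env.get("TIKTOK_CHROME_VERSION") or "120").strip()
    try:
        major = int(raw)
    except ValueError:
        raise ValueError(f"TIKTOK_CHROME_VERSION must be an integer, got {raw!r}") from None
    if major <= 0:
        raise ValueError(f"TIKTOK_CHROME_VERSION must be positive, got {major}")
    return f"chrome{major}", UA_TEMPLATE.format(major=major)
```

A fallback mechanism could build on this by retrying with a second known-good version when extraction fails with the WAF error.
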

karilaa-dev merged commit c0d33dd into main on Jan 16, 2026
1 check passed
zam-review bot commented Jan 16, 2026

Failed to generate code suggestions for PR
