Fix TikTok extraction with proxies #60

Merged
karilaa-dev merged 7 commits into main from dev
Jan 16, 2026

Conversation

@karilaa-dev
Owner

karilaa-dev commented Jan 16, 2026

User description

Summary

  • Fix TikTok video extraction failing when proxies are configured
  • Use direct connection for metadata extraction (with browser impersonation)
  • Use proxies for media downloads to hide server IP

Test plan

  • Tested video extraction with proxies enabled
  • Verified proxy is used for downloads
  • Confirmed extraction bypasses proxy to use impersonate feature

PR Type

Bug fix, Documentation


Description

  • Fix TikTok extraction with proxies using direct connection for metadata

  • Simplify proxy handling by temporarily disabling it for extraction

  • Update CODEBASE_MAP.md with new configuration details

  • Document hardcoded performance values and removed env vars


Diagram Walkthrough

flowchart LR
  A[Proxy Configured] --> B[Disable Proxy Temporarily]
  B --> C[Extract Metadata with Impersonate]
  C --> D[Restore Proxy for Downloads]
  D --> E[Complete Extraction]

File Walkthrough

Relevant files
Bug fix
client.py
Fix TikTok extraction with proxies

tiktok_api/client.py

  • Simplify proxy handling by temporarily disabling it for metadata
    extraction
  • Use ie._extract_web_data_and_status() for both proxy and non-proxy
    cases
  • Remove complex manual extraction logic for proxy scenario
  • Update download context to use saved proxy for downloads
+25/-74 
Documentation
CODEBASE_MAP.md
Update codebase documentation

docs/CODEBASE_MAP.md

  • Update last_mapped date and token counts
  • Add Telegram API credentials documentation
  • Update performance configuration section
  • Add notes about removed fields and hardcoded values
+20/-11 

Simplify configuration by removing most performance-related env vars
and hardcoding values optimized for maximum resource usage:

- ThreadPoolExecutor: 500 workers (vs default 32)
- aiohttp connections: unlimited (limit=0)
- curl_cffi pool: 10000 max_clients
- Image downloads: no concurrency limit (removed semaphore)

Keep only 3 user-configurable limits via env vars:
- MAX_USER_QUEUE_SIZE (default 0 = no limit)
- STREAMING_DURATION_THRESHOLD (default 300s)
- MAX_VIDEO_DURATION (default 0 = no limit)
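
As a rough illustration of the three remaining settings, the sketch below reads them from the environment with the defaults listed above. The `Limits` dataclass and the `load_limits` and `exceeds_duration` helpers are hypothetical names chosen for this example; they are not taken from the project's code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical container for the three user-configurable limits."""
    max_user_queue_size: int           # 0 = no limit
    streaming_duration_threshold: int  # seconds
    max_video_duration: int            # seconds, 0 = no limit


def load_limits(env=None):
    """Read the limits from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return Limits(
        max_user_queue_size=int(env.get("MAX_USER_QUEUE_SIZE", "0")),
        streaming_duration_threshold=int(env.get("STREAMING_DURATION_THRESHOLD", "300")),
        max_video_duration=int(env.get("MAX_VIDEO_DURATION", "0")),
    )


def exceeds_duration(limits, duration_seconds):
    """Treat a configured limit of 0 as 'no limit'."""
    return limits.max_video_duration > 0 and duration_seconds > limits.max_video_duration
```

Passing a plain dict to `load_limits` instead of relying on `os.environ` keeps the parsing easy to check in isolation.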
…adata

TikTok's browser impersonation (impersonate=True) doesn't work through HTTP
proxies, causing extraction to fail with "Unable to extract webpage video data".

Changed approach:
- Use direct connection (no proxy) for video info extraction with impersonate
- Use proxy for media downloads to hide server IP

This fixes the issue where all proxy attempts would fail due to TikTok's
JavaScript challenge blocking non-browser requests through proxies.
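
The approach described in this commit message can be sketched as a context manager that drops the proxy setting for the extraction step and restores it before the download step. This is only a minimal sketch under stated assumptions: the options are a plain dict with a `proxy` key, and `proxy_disabled`, `extract_then_download`, `extract`, and `download` are hypothetical names standing in for the real code in `tiktok_api/client.py`, which is not shown here.

```python
from contextlib import contextmanager


@contextmanager
def proxy_disabled(params):
    """Temporarily remove the 'proxy' entry from an options dict.

    Yields the saved proxy value (or None) and restores it on exit,
    even if the body raises.
    """
    had_proxy = "proxy" in params
    saved = params.pop("proxy", None)
    try:
        yield saved
    finally:
        if had_proxy:
            params["proxy"] = saved


def extract_then_download(params, extract, download):
    """Run `extract` without a proxy, then `download` with it restored.

    `extract` and `download` are hypothetical callables that receive the
    options dict; they stand in for the real extraction and download steps.
    """
    with proxy_disabled(params):
        info = extract(params)      # direct connection, no proxy configured
    return download(params, info)   # proxy entry is back in place
```

Restoring the proxy in a `finally` clause means a failed extraction cannot leave later downloads running without the proxy.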
@zam-review

zam-review bot commented Jan 16, 2026

PR Description updated to latest commit (8745d5a)

@karilaa-dev
Owner Author

/review

@zam-review

zam-review bot commented Jan 16, 2026

PR Reviewer Guide 🔍

(Review updated until commit 14c1abe)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Resource Leak Potential

If yt_dlp.YoutubeDL(ydl_opts) fails on line 864, the old_ydl instance will never be closed, potentially leaking resources. Consider wrapping this in a try-finally block or using a context manager pattern to ensure cleanup even if initialization fails.

old_ydl = ydl
ydl = yt_dlp.YoutubeDL(ydl_opts)
old_ydl.close()  # Close old instance after new one is ready

Error Handling Change

The code now returns None, "extraction", None instead of raising TikTokExtractionError when video data extraction fails (lines 888, 906). This changes the error handling contract and may confuse callers expecting exceptions. Verify that all calling code handles this return value correctly.

# Validate that we got video data
if not video_data:
    logger.error(f"No video data returned for {video_id} (status={status})")
    return None, "extraction", None

Hardcoded Configuration Values

The documentation indicates that the thread pool (500 workers) and curl_cffi connections (10,000) are now hardcoded for maximum throughput, and that previous environment variables such as THREAD_POOL_SIZE and MAX_CONCURRENT_IMAGES have been removed. MAX_USER_QUEUE_SIZE remains configurable, as the table excerpt below shows. Hardcoding these values reduces deployment flexibility and should be validated against different deployment scenarios.

| `MAX_USER_QUEUE_SIZE` | 0 | Max concurrent per user (0=unlimited) |
| `MAX_VIDEO_DURATION` | 0 | Max video duration (seconds, 0=unlimited) |
| `STREAMING_DURATION_THRESHOLD` | 300 | Stream videos longer than this (seconds) |
| `LOG_LEVEL` | INFO | Logging level |

**Note:** Thread pool (500 workers) and curl_cffi connections (10,000) are hardcoded for maximum throughput.

Create the new YoutubeDL instance before closing the old one to ensure
we have a valid ydl even if initialization fails.
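
The ordering this commit describes (build the replacement first, close the old object only afterwards) can be sketched generically. The example is an assumption-laden illustration: `replace_instance`, `factory`, and the `Resource` stand-in class are hypothetical and do not use yt_dlp, so it only shows the control flow rather than the actual change.

```python
class Resource:
    """Stand-in for any object with a close() method (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


def replace_instance(current, factory):
    """Build a replacement first, then close the old object.

    If `factory()` raises, `current` is left open and untouched, and the
    exception propagates to the caller, who still owns `current`.
    """
    replacement = factory()   # may raise; `current` is still valid here
    current.close()           # only reached once the replacement exists
    return replacement
```

With this ordering a failed construction leaves the caller holding a usable old instance, which the caller remains responsible for closing later.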
@karilaa-dev
Owner Author

/review

@zam-review

zam-review bot commented Jan 16, 2026

Persistent review updated to latest commit 248c050

Return extraction error if video_data is None despite a non-error status
code, preventing downstream issues from invalid data.
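
A minimal sketch of the check this commit describes, together with a caller that handles the returned tuple instead of catching an exception. The tuple shape `(None, "extraction", None)` comes from the snippet quoted in the review; the meaning assigned to each position, and the names `validate_video_data` and `describe`, are assumptions made for this example.

```python
import logging

logger = logging.getLogger(__name__)


def validate_video_data(video_data, status, video_id):
    """Return a 3-tuple; the first and second slots are assumed to be
    (data, error_kind), and the third is left as None as in the review.

    Empty or missing data is reported as an "extraction" error even when
    `status` itself does not indicate a failure.
    """
    if not video_data:
        logger.error("No video data returned for %s (status=%s)", video_id, status)
        return None, "extraction", None
    return video_data, None, None


def describe(result):
    """Hypothetical caller: inspect the error slot before using the data."""
    data, error_kind, _ = result
    if error_kind is not None:
        return f"failed: {error_kind}"
    return f"ok: {data['id']}"
```

Callers written this way have to check the error slot explicitly, which is the contract change the reviewer asked to verify.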
@karilaa-dev
Owner Author

/review

@zam-review

zam-review bot commented Jan 16, 2026

Persistent review updated to latest commit 14c1abe

karilaa-dev merged commit 9fd94b7 into main on Jan 16, 2026
1 check passed