Managing Failed Requests in Python

This guide explains how to handle failed HTTP requests in Python using retry strategies and custom logic.

What Are Status Codes?
Retry Strategies
HTTPAdapter
Tenacity
Building a Custom Retry Mechanism
Conclusion

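What Are Status Codes?

Status codes are standardized three-digit numbers used by various protocols to indicate the outcome of a request. According to Mozilla, HTTP status codes fall into the following categories: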

100-199: Informational responses
200-299: Successful responses
300-399: Redirection messages
400-499: Client error responses
500-599: Server error responses

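When developing client-side applications such as web scrapers, it is important to pay attention to status codes in the 400 and 500 ranges. Codes in the 400 range typically indicate client-side errors such as authentication failures, rate limiting, timeouts, or the well-known 404: Not Found error. Codes in the 500 range, on the other hand, indicate server-side problems that may require retries or an alternative handling strategy.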
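As a rough illustration of the categories above, the sketch below maps a status code to its range. The function name classify_status and the category labels are hypothetical, chosen only for this example.

```python
def classify_status(code):
    """Return the category name for an HTTP status code (hypothetical helper)."""
    if not 100 <= code <= 599:
        raise ValueError(f"Not a valid HTTP status code: {code}")
    # The first digit of the code selects the category.
    categories = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return categories[code // 100]


print(classify_status(404))  # client error
print(classify_status(503))  # server error
```

A scraper could branch on the returned category, for example fixing the request for client errors and retrying for server errors.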

The following is a list of common error codes you may encounter when web scraping (excerpted from Mozilla's official documentation).

Status Code	Meaning	Description
400	Bad Request	Check the request format
401	Unauthorized	Check your API key
403	Forbidden	You do not have access to this data
404	Not Found	The site or endpoint does not exist
408	Request Timeout	The request timed out; try again
429	Too Many Requests	Slow down and send fewer requests
500	Internal Server Error	Generic server error; retry the request
501	Not Implemented	The server does not support this yet
502	Bad Gateway	Failed to get a valid response from the upstream server
503	Service Unavailable	The server is temporarily down; retry later
504	Gateway Timeout	Timed out waiting for the upstream server
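One way to act on this table is to separate the codes that are worth retrying from the ones that call for fixing the request itself. The sketch below is only an assumption about where to draw that line: the RETRYABLE_CODES set and the is_retryable helper are hypothetical names, and the right set depends on the target site.

```python
# Codes from the table above whose description suggests waiting and trying again.
# This grouping is an assumption for illustration, not a fixed rule.
RETRYABLE_CODES = {408, 429, 500, 502, 503, 504}


def is_retryable(status_code):
    """Return True if a retry is likely to help for this status code."""
    return status_code in RETRYABLE_CODES


for code in (401, 404, 429, 503):
    action = "retry" if is_retryable(code) else "fix the request"
    print(code, "->", action)
```

Codes such as 401 or 404 are left out because repeating the same request is unlikely to change the result.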

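Retry Strategies

When implementing a retry mechanism in Python, you can use ready-made tools such as the HTTPAdapter class from requests or the Tenacity library. Alternatively, you can develop custom retry logic based on your specific needs.

A well-designed retry strategy should include both a retry limit and a backoff mechanism. The retry limit prevents infinite loops, so a failing request cannot go on indefinitely. A backoff strategy, which gradually increases the delay between retries, helps avoid the excessive requests that can get you blocked or overload the server.

Retry limit: Defining a retry limit is essential. After a specified number of attempts (X), the scraper should stop retrying to avoid an infinite loop.
Backoff algorithm: Gradually increasing the wait between retries prevents overloading the server. Start with a small delay such as 0.3 seconds, then increase it step by step to 0.6 seconds, 1.2 seconds, and so on.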
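The doubling schedule described above can be sketched in a few lines. The backoff_delays function and its parameters are hypothetical names used only for this illustration.

```python
def backoff_delays(initial, retries, factor=2.0):
    """Return the wait (in seconds) before each retry, growing by `factor`."""
    return [initial * factor ** attempt for attempt in range(retries)]


delays = backoff_delays(0.3, 3)
print(delays)  # [0.3, 0.6, 1.2]
```

Because the list length equals the retry limit, the same function covers both halves of the strategy: how many times to retry and how long to wait each time.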

HTTPAdapter

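With HTTPAdapter, there are three settings you need to configure: total, backoff_factor, and status_forcelist. allowed_methods is not strictly required, but it helps define the retry conditions and makes the code safer. The code below uses httpbin to generate errors on demand and trigger the retry logic.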

import logging
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

# Create a session
session = requests.Session()

# Configure retry settings
retry = Retry(
    total=3,  # Maximum retries
    backoff_factor=0.3,  # Base factor for exponential backoff between retries
    status_forcelist=(429, 500, 502, 503, 504),  # Status codes to trigger a retry
    allowed_methods={"GET", "POST"},  # Allow retries for GET and POST
    raise_on_status=False,  # Return the last response once retries are exhausted
)

# Mount the adapter with our custom settings
adapter = HTTPAdapter(max_retries=retry)
session.mount("http://", adapter)
session.mount("https://", adapter)

# Function to make a request and test retry logic
def make_request(url, method="GET"):
    response = None  # Stays None if the request never produces a response
    try:
        logger.info("Making a %s request to %s with retry logic...", method, url)

        if method == "GET":
            response = session.get(url)
        elif method == "POST":
            response = session.post(url)
        else:
            logger.error("Unsupported HTTP method: %s", method)
            return

        response.raise_for_status()
        logger.info("✅ Request successful: %s", response.status_code)

    except requests.exceptions.RequestException as e:
        logger.error("❌ Request failed after retries: %s", e)
        # response.history only tracks redirects; urllib3 keeps the retry
        # history on the Retry object attached to the raw response.
        retries = getattr(response.raw, "retries", None) if response is not None else None
        logger.info("Retries attempted: %d", len(retries.history) if retries is not None else 0)

# Test Cases
make_request("https://httpbin.org/status/200")  # ✅ Should succeed without retries
make_request("https://httpbin.org/status/500")  # ❌ Should retry 3 times and fail
make_request("https://httpbin.org/status/404")  # ❌ Should fail immediately (no retries)
make_request("https://httpbin.org/status/500", method="POST")  # ❌ Should retry 3 times and fail

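Once you have created the Session object, do the following:

Create a Retry object and define:
- total: The maximum number of times to retry a request.
- backoff_factor: The base factor for the wait between retries. The delay grows exponentially as the number of retries increases.
- status_forcelist: A list of bad status codes. Any code in this list automatically triggers a retry.
Create an HTTPAdapter object using the retry variable: adapter = HTTPAdapter(max_retries=retry)
Once the adapter is created, mount it for both HTTP and HTTPS using session.mount().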
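The steps above can be wrapped in a small factory so that every part of a scraper shares the same retry settings. This is a minimal sketch: build_retry_session is a hypothetical helper name, and the default values simply mirror the example in this section.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retry_session(total=3, backoff_factor=0.3,
                        status_forcelist=(429, 500, 502, 503, 504)):
    """Return a requests.Session that retries on the given status codes."""
    retry = Retry(
        total=total,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
        allowed_methods={"GET", "POST"},
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Mount the adapter for both schemes so every request uses the retry settings.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = build_retry_session()
# Inspect the configuration without sending any request.
print(session.get_adapter("https://example.com").max_retries.total)  # 3
```

Calling session.get_adapter() is a convenient way to confirm which retry settings apply to a given URL before any traffic is sent.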

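When you run this code, requests that return a status in status_forcelist are retried three times (total=3), and you get output similar to the following.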

2024-06-10 12:00:00 - INFO - Making a GET request to https://httpbin.org/status/200 with retry logic...
2024-06-10 12:00:00 - INFO - ✅ Request successful: 200

2024-06-10 12:00:01 - INFO - Making a GET request to https://httpbin.org/status/500 with retry logic...
2024-06-10 12:00:02 - ERROR - ❌ Request failed after retries: 500 Server Error: INTERNAL SERVER ERROR for url: ...
2024-06-10 12:00:02 - INFO - Retries attempted: 3

2024-06-10 12:00:03 - INFO - Making a GET request to https://httpbin.org/status/404 with retry logic...
2024-06-10 12:00:03 - ERROR - ❌ Request failed after retries: 404 Client Error: NOT FOUND for url: ...
2024-06-10 12:00:03 - INFO - Retries attempted: 0

2024-06-10 12:00:04 - INFO - Making a POST request to https://httpbin.org/status/500 with retry logic...
2024-06-10 12:00:05 - ERROR - ❌ Request failed after retries: 500 Server Error: INTERNAL SERVER ERROR for url: ...
2024-06-10 12:00:05 - INFO - Retries attempted: 3

Tenacity

You can also use Tenacity, a popular open-source retry library for Python. It is not limited to HTTP and provides an expressive way to implement retries.

First, install Tenacity.

pip install tenacity

Once it is installed, create a decorator and wrap your requests function with it. In the @retry decorator, add the stop, wait, and retry arguments.

import logging
import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type, retry_if_result, RetryError

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

# Log the upcoming wait before each retry
def log_retry(retry_state):
    logger.warning("Retrying after %.1f seconds...", retry_state.next_action.sleep)

# Define a retry strategy
@retry(
    stop=stop_after_attempt(3),  # Give up after 3 attempts in total
    wait=wait_exponential(multiplier=0.3),  # Exponential backoff
    retry=(
        retry_if_exception_type(requests.exceptions.RequestException) |  # Retry on request failures
        retry_if_result(lambda r: r.status_code in {500, 502, 503, 504})  # Retry on specific HTTP status codes
    ),
    before_sleep=log_retry,  # Emit a WARNING line before each wait
)
def make_request(url):
    logger.info("Making a request with retry logic to %s...", url)
    response = requests.get(url)
    response.raise_for_status()
    logger.info("✅ Request successful: %s", response.status_code)
    return response

# Attempt to make the request
try:
    make_request("https://httpbin.org/status/500")  # Test with a failing status code
except RetryError as e:
    logger.error("❌ Request failed after all retries: %s", e)

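The logic and settings here closely resemble the first example with HTTPAdapter.

stop=stop_after_attempt(3) tells tenacity to give up after three failed attempts.
wait=wait_exponential(multiplier=0.3) uses the same base wait as before and backs off exponentially, as in the previous example.
retry=retry_if_exception_type(requests.exceptions.RequestException) tells tenacity to apply this logic whenever a RequestException is raised. The retry_if_result condition additionally retries when a returned response carries one of the listed 5xx status codes.
make_request() sends a request to the error endpoint. It inherits all of the behavior from the decorator defined above.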

Running this code produces similar output.

2024-06-10 12:00:00 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:01 - WARNING - Retrying after 0.3 seconds...
2024-06-10 12:00:01 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:02 - WARNING - Retrying after 0.6 seconds...
2024-06-10 12:00:02 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:03 - ERROR - ❌ Request failed after all retries: RetryError[...]

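Building a Custom Retry Mechanism

You can also create a custom retry mechanism. This is often the best approach when dealing with specialized code. With a relatively small amount of code, you can achieve functionality equivalent to what existing libraries provide while tailoring it to your specific needs.

The code below imports sleep for the exponential backoff, defines the settings (TOTAL_RETRIES, INITIAL_BACKOFF, BAD_CODES), and uses a while loop to hold the retry logic. As long as attempts remain and the request has not yet succeeded, it keeps trying the request.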

import logging
import requests
from time import sleep

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

# Create a session
session = requests.Session()

# Define retry settings
TOTAL_RETRIES = 3
INITIAL_BACKOFF = 0.3
BAD_CODES = {429, 500, 502, 503, 504}

def make_request(url):
    current_tries = 0
    backoff = INITIAL_BACKOFF
    success = False

    while current_tries < TOTAL_RETRIES and not success:
        try:
            logger.info("Making a request with retry logic to %s...", url)
            response = session.get(url)
            
            if response.status_code in BAD_CODES:
                raise requests.exceptions.HTTPError(f"Received {response.status_code}, triggering retry")
            
            response.raise_for_status()
            logger.info("✅ Request successful: %s", response.status_code)
            success = True
            return response

        except requests.exceptions.RequestException as e:
            logger.error("❌ Request failed: %s, retries left: %d", e, TOTAL_RETRIES - current_tries - 1)
            if current_tries < TOTAL_RETRIES - 1:
                logger.info("⏳ Retrying in %.1f seconds...", backoff)
                sleep(backoff)
                backoff *= 2  # Exponential backoff
            current_tries += 1

    logger.error("🚨 Request failed after all retries.")
    return None

# Test Cases
make_request("https://httpbin.org/status/500")  # ❌ Should retry 3 times and fail
make_request("https://httpbin.org/status/200")  # ✅ Should succeed without retries

The actual logic here is handled by a simple while loop.

If response.status_code is in BAD_CODES, the script raises an exception.
If the request fails, the script does the following:
- Logs an error message to the console.
- If attempts remain, sleep(backoff) waits before sending the next request.
- backoff *= 2 doubles the wait for the next attempt.
- Increments current_tries so the loop cannot run forever.
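The same loop can be generalized so it is not tied to a single request function. The sketch below is one possible shape, not the only one: call_with_retries and its parameters are hypothetical names, and the sleep function is passed in as an argument so the backoff can be observed without actually waiting.

```python
import time


def call_with_retries(func, max_attempts=3, initial_backoff=0.3, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached."""
    backoff = initial_backoff
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: let the last error propagate
            sleep(backoff)  # Wait before the next attempt
            backoff *= 2  # Exponential backoff


# Usage: a function that fails twice, then succeeds.
calls = {"n": 0}
waits = []  # Records the delays instead of sleeping


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated failure")
    return "ok"


result = call_with_retries(flaky, sleep=waits.append)
print(result, waits)  # ok [0.3, 0.6]
```

Injecting the sleep function keeps the helper easy to test; in production the default time.sleep applies.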

Below is the output of the custom retry code.

2024-06-10 12:00:00 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:01 - ERROR - ❌ Request failed: Received 500, triggering retry, retries left: 2
2024-06-10 12:00:01 - INFO - ⏳ Retrying in 0.3 seconds...
2024-06-10 12:00:02 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:03 - ERROR - ❌ Request failed: Received 500, triggering retry, retries left: 1
2024-06-10 12:00:03 - INFO - ⏳ Retrying in 0.6 seconds...
2024-06-10 12:00:04 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:05 - ERROR - ❌ Request failed: Received 500, triggering retry, retries left: 0
2024-06-10 12:00:05 - ERROR - 🚨 Request failed after all retries.

Conclusion

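To help you avoid failed requests of all kinds, we have developed products such as Web Unlocker API and Scraping Browser. These tools automatically handle anti-bot measures, CAPTCHA challenges, and IP blocks, enabling seamless and efficient web scraping even on the most difficult websites.

Sign up now and start your free trial today.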
