Error Handling and Retries in Scrapers - Python Scraping

Build robust scrapers with proper error handling, automatic retries, exponential backoff, and graceful failure recovery.

Web scraping is inherently unreliable. Servers go down, connections time out, and pages change without warning. A production scraper must handle all of these gracefully.

Common Errors in Scraping

Error	Cause	Solution
`ConnectionError`	Server unreachable	Retry with backoff
`Timeout`	Slow response	Set timeout, retry
HTTP 403	Blocked/Forbidden	Rotate user agents or use proxies
HTTP 429	Rate limited	Slow down, add delays
HTTP 500	Server error	Retry later
`AttributeError`	HTML structure changed	Validate selectors
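The status-code rows of the table can be read as a small decision rule. The following is a minimal sketch, assuming a hypothetical helper named `suggest_action` whose return labels are invented for illustration; it is not part of `requests` or any other library.

```python
def suggest_action(status_code):
    """Map an HTTP status code to the action suggested in the table above.

    `suggest_action` and its return labels are hypothetical names
    chosen for this sketch.
    """
    if status_code == 429:
        return "slow_down"    # rate limited: add delays before retrying
    if status_code == 403:
        return "rotate"       # blocked: rotate user agents or use proxies
    if 500 <= status_code < 600:
        return "retry_later"  # server error: retry with backoff
    if 400 <= status_code < 500:
        return "give_up"      # other client errors are not fixed by retrying
    return "ok"               # anything below 400


for code in (200, 403, 429, 503, 404):
    print(code, suggest_action(code))
```

Keeping the rule in one function makes it easier to adjust the policy later without touching the fetching code.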

Basic Error Handling

import requests
from bs4 import BeautifulSoup

def scrape_page(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Raises exception for 4xx/5xx
    except requests.exceptions.Timeout:
        print(f"Timeout: {url}")
        return None
    except requests.exceptions.HTTPError as e:
        print(f"HTTP Error {e.response.status_code}: {url}")
        return None
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None

    soup = BeautifulSoup(response.text, "html.parser")
    title = soup.select_one("title")
    return title.get_text() if title else "No title found"

result = scrape_page("https://quotes.toscrape.com/")
print(result)

Retry with Exponential Backoff

import time
import requests


def fetch_with_retry(url, max_retries=3, base_delay=1):
    """Fetch url, retrying on 429, 5xx, and network errors with backoff."""
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=10)
        except requests.exceptions.RequestException as e:
            reason = f"Attempt {attempt + 1} failed: {e}"
        else:
            if response.ok:  # any status below 400
                return response

            if response.status_code == 429:
                reason = "Rate limited"
            elif response.status_code >= 500:
                reason = f"Server error {response.status_code}"
            else:
                # 4xx errors (except 429) will not succeed on retry
                print(f"Client error {response.status_code} for {url}")
                return None

        # Wait base_delay, then double it each time; skip the sleep
        # after the final attempt, since nothing follows it
        if attempt < max_retries - 1:
            delay = base_delay * (2 ** attempt)
            print(f"{reason}. Retrying in {delay}s...")
            time.sleep(delay)
        else:
            print(reason)

    print(f"All {max_retries} attempts failed for {url}")
    return None
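Two common refinements to the delay calculation are capping the wait, adding random jitter, and honouring a `Retry-After` header when the server sends one. The sketch below is one way to do this, assuming a hypothetical helper `backoff_delay` and a `max_delay` parameter that are not in the original function. It only handles the numeric (seconds) form of `Retry-After`.

```python
import random


def backoff_delay(attempt, base_delay=1.0, max_delay=60.0, retry_after=None):
    """Return the number of seconds to wait before the next attempt.

    attempt     -- zero-based attempt number, as in the retry loop
    retry_after -- value of a Retry-After header, if the server sent one
                   (only the numeric form is handled in this sketch)
    """
    if retry_after is not None:
        try:
            # Honour the server's hint, clamped to the range [0, max_delay].
            return max(0.0, min(float(retry_after), max_delay))
        except ValueError:
            pass  # header held an HTTP date; fall back to computed backoff

    # Exponential growth, capped so late attempts do not wait unboundedly.
    ceiling = min(max_delay, base_delay * (2 ** attempt))
    # "Full jitter": pick a random point in [0, ceiling] so that many
    # clients retrying at once do not all hit the server together.
    return random.uniform(0, ceiling)
```

Inside the retry loop, `time.sleep(backoff_delay(attempt, base_delay, retry_after=response.headers.get("Retry-After")))` could replace the fixed delay whenever a response is available.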

Using the tenacity Library

The tenacity library provides a clean decorator-based retry mechanism.

pip install tenacity

import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type


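@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    # HTTPError (raised by raise_for_status) is a RequestException subclass,
    # so 4xx and 5xx responses are retried along with network failures.
    retry=retry_if_exception_type(requests.exceptions.RequestException),
    reraise=True,  # re-raise the last error instead of tenacity.RetryError
)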
def fetch_url(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response


try:
    resp = fetch_url("https://quotes.toscrape.com/")
    print(f"Success: {resp.status_code}")
except Exception as e:
    print(f"Failed after retries: {e}")

Using requests Session with Retry Adapter

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def create_session(retries=3, backoff_factor=0.5):
    session = requests.Session()
    retry_strategy = Retry(
        total=retries,
        backoff_factor=backoff_factor,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


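session = create_session()
try:
    response = session.get("https://quotes.toscrape.com/", timeout=10)
    print(response.status_code)
except requests.exceptions.RequestException as e:
    # Raised on connection errors or once the adapter's retries are exhausted
    print(f"Request failed: {e}")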

Tips

Always set a timeout on every request; never let a request hang indefinitely.
Use exponential backoff to avoid hammering a struggling server.
Log errors with the URL so you can reprocess failed pages later.
Proxy services such as ScraperAPI and ScrapingAnt can handle retries and IP rotation for you, reducing the error-handling burden on your code.
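The tip about logging failed URLs for later reprocessing can be sketched as follows. This assumes a `fetch` callable that raises on failure (as the tenacity-decorated `fetch_url` does); `scrape_all`, `load_failed`, and the file name `failed_urls.jsonl` are hypothetical names chosen for the example.

```python
import json
import logging

logger = logging.getLogger("scraper")


def scrape_all(urls, fetch, failed_path="failed_urls.jsonl"):
    """Run fetch(url) for each URL and record failures for a later rerun.

    `fetch` is any callable that returns a result or raises an exception;
    `failed_path` is a hypothetical file name chosen for this sketch.
    """
    results = {}
    with open(failed_path, "a", encoding="utf-8") as failed:
        for url in urls:
            try:
                results[url] = fetch(url)
            except Exception as exc:
                # Keep the URL next to the error so the page can be retried.
                logger.warning("Failed %s: %s", url, exc)
                record = {"url": url, "error": repr(exc)}
                failed.write(json.dumps(record) + "\n")
    return results


def load_failed(failed_path="failed_urls.jsonl"):
    """Return the URLs recorded by scrape_all, in the order they failed."""
    with open(failed_path, encoding="utf-8") as failed:
        return [json.loads(line)["url"] for line in failed if line.strip()]
```

A later run could then call `scrape_all(load_failed(), fetch_url)` to retry only the pages that failed. Writing one JSON object per line keeps the file appendable even if the scraper is interrupted partway through.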

Next Steps

Learn to handle login-protected pages and authentication
Build scrapers that manage cookies and sessions