Rate Limiting and Polite Scraping - Anti-Detection

Learn how to implement rate limiting in your scrapers to avoid detection, prevent bans, and scrape responsibly.

Hammering a website with rapid-fire requests is the fastest way to get blocked. Polite scraping means controlling your request rate to stay under the radar and avoid overloading servers.

Why Rate Limiting Matters

Prevents your IP from being banned
Avoids triggering anti-bot systems
Reduces server load on the target site
Makes your traffic look more human-like
Keeps you on the right side of ethical scraping

Simple Delay Between Requests

import requests
import time
import random

urls = [
    "https://example.com/page/1",
    "https://example.com/page/2",
    "https://example.com/page/3",
]

for url in urls:
    response = requests.get(url, timeout=15)
    print(f"{url}: {response.status_code}")

    # Random delay between 1-3 seconds
    delay = random.uniform(1.0, 3.0)
    time.sleep(delay)

Token Bucket Rate Limiter

For more precise control, implement a token bucket algorithm:

import time
import requests
from threading import Lock

class RateLimiter:
    def __init__(self, requests_per_second: float):
        self.min_interval = 1.0 / requests_per_second
        self.last_request = 0.0
        self.lock = Lock()

    def wait(self):
        with self.lock:
            now = time.time()
            elapsed = now - self.last_request
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
            self.last_request = time.time()

# Allow 2 requests per second
limiter = RateLimiter(requests_per_second=2.0)

for i in range(10):
    limiter.wait()
    response = requests.get("https://httpbin.org/ip", timeout=15)
    print(f"Request {i+1} at {time.strftime('%H:%M:%S')}: {response.status_code}")

Respecting Retry-After Headers

When a server returns HTTP 429 (Too Many Requests), it often includes a Retry-After header:

import requests
import time

def polite_fetch(url, max_retries=3):
    for attempt in range(max_retries):
        response = requests.get(url, timeout=15)

        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 60))
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)
            continue

        return response

    return None

Recommended Request Rates

Site Type	Suggested Rate	Notes
Small business sites	1 req/3-5s	Be very gentle
Medium sites	1-2 req/s	Watch for 429s
Large platforms	3-5 req/s	They can handle more
APIs with rate limits	Follow their docs	Stay within limits

Polite Scraping Checklist

Add random delays between requests (1-5 seconds for most sites)
Respect Retry-After headers
Check robots.txt for crawl-delay directives
Scrape during off-peak hours when possible
Identify your bot with a descriptive User-Agent if appropriate
Cache responses to avoid re-fetching the same pages

Rate limiting also saves you money when using services like ScraperAPI since you consume fewer API credits.