Rate Limiting in Web Scraping - How to Be Polite

Learn how to implement rate limiting in your web scrapers. Covers delays, backoff strategies, and respectful scraping practices.

Aggressive scraping gets you blocked and can harm target servers. Smart rate limiting keeps your scrapers running reliably while being respectful.

Why Rate Limiting Matters

Avoid IP bans: Most sites block IPs that send too many requests
Prevent server overload: Excessive traffic can degrade the target site
Legal protection: Polite scraping is less likely to attract legal attention
Better data quality: Rushed scraping leads to missed pages and errors

Basic Delay Implementation

import time
import random
import requests

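def scrape_with_delay(urls, min_delay=1, max_delay=3):
    """Fetch each URL in turn, pausing a random interval between requests."""
    results = []
    for i, url in enumerate(urls):
        # A timeout keeps one unresponsive server from hanging the whole run
        resp = requests.get(url, timeout=10)
        results.append(resp.text)

        # Random delay between requests (not needed after the last one)
        if i < len(urls) - 1:
            time.sleep(random.uniform(min_delay, max_delay))

    return results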

Exponential Backoff

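When a server responds with HTTP 429 (Too Many Requests), back off exponentially: double the wait after each rate-limited attempt, and add a little random jitter so that retries from parallel workers do not line up.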

import random
import time
import requests

def fetch_with_backoff(url, max_retries=5):
    for attempt in range(max_retries):
        resp = requests.get(url, timeout=10)

        if resp.status_code == 429:
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Waiting {wait_time:.1f}s...")
            time.sleep(wait_time)
            continue

        # Raise on any other 4xx/5xx status; otherwise the request succeeded
        resp.raise_for_status()
        return resp

    raise RuntimeError(f"Failed after {max_retries} retries")
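
Some servers send a Retry-After header along with a 429 response. The following is a minimal sketch of how that hint could be combined with exponential backoff. The helper name `compute_wait` and the `base` and `cap` parameters are assumptions made for illustration, and the sketch only handles the numeric (seconds) form of the header; the HTTP-date form falls back to the computed backoff.

```python
import random

def compute_wait(attempt, retry_after=None, base=1.0, cap=60.0):
    """Return seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Honor the server's hint when it is a plain number of seconds
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # HTTP-date form: fall through to exponential backoff
    # Exponential growth (base * 2^attempt) plus up to 1s of jitter, capped
    return min(cap, base * (2 ** attempt) + random.uniform(0, 1))
```

Inside a retry loop this could be called as `compute_wait(attempt, resp.headers.get("Retry-After"))`. The cap keeps a long run of failures from producing waits of several minutes.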

Respecting robots.txt

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

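if rp.can_fetch("*", "https://example.com/page"):
    # Allowed by robots.txt for this user agent
    crawl_delay = rp.crawl_delay("*")  # None if there is no Crawl-delay directive
    if crawl_delay is not None:
        print(f"Crawl delay: {crawl_delay}s")
    else:
        print("No crawl delay specified; use your own default")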
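
To see how the parser behaves without touching the network, the rules can be fed in directly with `parse()`. This is a sketch only: the robots.txt content below is made up for illustration, and `polite_delay` with its 2-second default is a hypothetical helper, not part of the standard library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from memory so no request is made
robots_lines = [
    "User-agent: *",
    "Crawl-delay: 5",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(robots_lines)

def polite_delay(parser, user_agent="*", default=2.0):
    """Use the site's Crawl-delay when present, otherwise a default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default

allowed = rp.can_fetch("*", "https://example.com/page")
blocked = rp.can_fetch("*", "https://example.com/private/data")
delay = polite_delay(rp)
```

Here `allowed` is true, `blocked` is false, and `delay` comes from the Crawl-delay line. For a site with no such directive, `polite_delay` falls back to the default.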

Rate Limiting Strategies

Strategy	Implementation	Best For
Fixed delay	`time.sleep(2)`	Simple scrapers
Random delay	`time.sleep(random.uniform(1, 3))`	Most use cases
Exponential backoff	Double delay on each retry	Handling 429 errors
Token bucket	Allow N requests per minute	Production scrapers
Adaptive	Slow down when errors increase	Large-scale crawling
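
The adaptive strategy in the last row can be sketched as a delay that doubles on error responses and shrinks slowly on successes. The class name `AdaptiveDelay`, the 0.9 recovery factor, and the choice to treat 429 and 5xx as "slow down" signals are all assumptions for illustration; tune them to the target site.

```python
class AdaptiveDelay:
    """Grow the delay on errors, shrink it slowly on successes."""

    def __init__(self, initial=1.0, minimum=0.5, maximum=60.0):
        self.delay = initial
        self.minimum = minimum
        self.maximum = maximum

    def record(self, status_code):
        """Update the delay from one response status and return it."""
        if status_code == 429 or status_code >= 500:
            # Multiplicative increase: back off quickly when the server struggles
            self.delay = min(self.maximum, self.delay * 2)
        else:
            # Gentle decrease: recover speed gradually after successes
            self.delay = max(self.minimum, self.delay * 0.9)
        return self.delay
```

A scraper would call `time.sleep(limiter.record(resp.status_code))` after each response, so the pace follows the server's behavior rather than a fixed number.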

Token Bucket Rate Limiter

import time
from threading import Lock

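class RateLimiter:
    """Simplified limiter: enforces a minimum interval between requests.

    This behaves like a token bucket that holds a single token, so it
    never allows bursts.
    """

    def __init__(self, requests_per_second=1):
        self.rate = requests_per_second
        self.last_request = float("-inf")  # So the first call never waits
        self.lock = Lock()

    def wait(self):
        # Sleeping while holding the lock makes concurrent threads queue up
        with self.lock:
            # monotonic() is unaffected by system clock adjustments
            elapsed = time.monotonic() - self.last_request
            wait_time = max(0, (1 / self.rate) - elapsed)
            time.sleep(wait_time)
            self.last_request = time.monotonic()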
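
The class above spaces requests evenly. A token bucket in the fuller sense also permits short bursts: tokens accumulate at a steady rate up to a capacity, and each request spends one. The following is a minimal sketch under that definition. The `TokenBucket` name, the `capacity` parameter, and the injectable `clock` (useful for testing) are assumptions for illustration.

```python
import time
from threading import Lock

class TokenBucket:
    def __init__(self, rate=1.0, capacity=5, clock=time.monotonic):
        self.rate = rate              # Tokens added per second
        self.capacity = capacity      # Maximum burst size
        self.tokens = float(capacity) # Start with a full bucket
        self.clock = clock
        self.updated = clock()
        self.lock = Lock()

    def _refill(self):
        # Add tokens for the time elapsed since the last update, up to capacity
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self):
        """Take one token if available; return True on success."""
        with self.lock:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def acquire(self):
        """Block, polling briefly, until a token is available."""
        while not self.try_acquire():
            time.sleep(0.01)
```

Calling `bucket.acquire()` before each request lets a scraper send up to `capacity` requests back to back, then settles at `rate` requests per second.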

Let ScraperAPI Handle It

Hosted services such as ScraperAPI and ScrapingAnt can manage rate limiting for you. They distribute requests across their proxy pools at rates suited to each target site.

Best Practices

Start slow: Begin with one request every 2-3 seconds
Check robots.txt for crawl delay guidelines
Monitor HTTP 429 responses: They mean you are going too fast
Use random delays: Fixed intervals look robotic
Scrape during off-peak hours: Less load on the server, fewer blocks
Log your request rate: Track requests per minute for debugging
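
For the last point, one way to track requests per minute is a sliding window of timestamps. This is a rough sketch; `RequestRateTracker` and its injectable `clock` are hypothetical names introduced here, not part of any library.

```python
import time
from collections import deque

class RequestRateTracker:
    """Count requests made within the most recent `window` seconds."""

    def __init__(self, window=60.0, clock=time.monotonic):
        self.window = window
        self.clock = clock
        self.timestamps = deque()

    def record(self):
        # Call once per request sent
        self.timestamps.append(self.clock())

    def requests_in_window(self):
        # Drop timestamps that have aged out of the window, then count the rest
        cutoff = self.clock() - self.window
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)
```

Logging `tracker.requests_in_window()` every so often shows whether the scraper is actually running at the pace you intended.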