Rate Limiting in Web Scraping - How to Be Polite

Learn how to implement rate limiting in your web scrapers. Covers delays, backoff strategies, and respectful scraping practices.

Aggressive scraping gets you blocked and can harm target servers. Smart rate limiting keeps your scrapers running reliably while being respectful.

Why Rate Limiting Matters

  • Avoid IP bans: Most sites block IPs that send too many requests
  • Prevent server overload: Excessive traffic can degrade the target site
  • Legal protection: Polite scraping is less likely to attract legal attention
  • Better data quality: Rushed scraping leads to missed pages and errors

Basic Delay Implementation

import time
import random
import requests

def scrape_with_delay(urls, min_delay=1, max_delay=3):
    results = []
    for url in urls:
        # A timeout keeps one stalled connection from hanging the whole run
        resp = requests.get(url, timeout=10)
        results.append(resp.text)
        
        # Random delay between requests
        delay = random.uniform(min_delay, max_delay)
        time.sleep(delay)
    
    return results

Exponential Backoff

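When a server responds with HTTP 429 (Too Many Requests), back off exponentially: double the wait after each consecutive rate-limited attempt, and add a little random jitter so parallel workers do not retry in lockstep.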

import random
import time
import requests

def fetch_with_backoff(url, max_retries=5):
    for attempt in range(max_retries):
        resp = requests.get(url, timeout=10)

        if resp.status_code == 200:
            return resp

        if resp.status_code == 429:
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Waiting {wait_time:.1f}s...")
            time.sleep(wait_time)
        else:
            # Raise immediately on other 4xx/5xx errors
            resp.raise_for_status()

    raise Exception(f"Failed after {max_retries} retries")
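
A 429 response may also include a Retry-After header, given either as a number of seconds or as an HTTP date. When it is present, it is usually a better guide than a computed backoff. The sketch below shows one way to read it; retry_after_seconds and its fallback and now parameters are hypothetical names introduced for illustration, not part of requests.

```python
import time
from email.utils import parsedate_to_datetime

def retry_after_seconds(header_value, fallback, now=None):
    """Hypothetical helper: convert a Retry-After header value to seconds.

    Returns `fallback` when the header is missing or cannot be parsed.
    """
    if not header_value:
        return fallback
    value = header_value.strip()
    # Form 1: a plain number of seconds, e.g. "120"
    if value.isdigit():
        return float(value)
    # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return fallback
    now = time.time() if now is None else now
    # A date in the past means no further waiting is needed
    return max(0.0, target.timestamp() - now)
```

Inside fetch_with_backoff, the computed exponential wait could be passed as fallback, with resp.headers.get("Retry-After") as the header value, so the server's hint wins whenever it is provided.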

Respecting robots.txt

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

if rp.can_fetch("*", "https://example.com/page"):
    # Allowed by robots.txt for this user agent
    crawl_delay = rp.crawl_delay("*")  # None if no Crawl-delay directive is set
    if crawl_delay is not None:
        print(f"Crawl delay: {crawl_delay}s")
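
The two checks above can be folded into one decision: skip the URL if it is disallowed, otherwise wait for the site's crawl delay or a default. The following is a minimal sketch under that assumption; polite_delay and default_delay are hypothetical names, and the rules are parsed from inline text (not fetched) so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

def polite_delay(rp, user_agent, url, default_delay=2.0):
    """Hypothetical helper: seconds to wait before fetching, or None if disallowed."""
    if not rp.can_fetch(user_agent, url):
        return None
    delay = rp.crawl_delay(user_agent)
    # Fall back to our own default when the site sets no Crawl-delay
    return float(delay) if delay is not None else default_delay

# Example rules, parsed from text instead of the network
rules = """
User-agent: *
Crawl-delay: 5
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

for url in ("https://example.com/page", "https://example.com/private/data"):
    print(url, polite_delay(rp, "*", url))
```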

Rate Limiting Strategies

Strategy | Implementation | Best For
Fixed delay | time.sleep(2) | Simple scrapers
Random delay | time.sleep(random.uniform(1, 3)) | Most use cases
Exponential backoff | Double delay on each retry | Handling 429 errors
Token bucket | Allow N requests per minute | Production scrapers
Adaptive | Slow down when errors increase | Large-scale crawling
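
The adaptive strategy in the table has no example elsewhere in this tutorial, so here is one possible sketch. It assumes that 429 and 5xx responses are the error signal, that the delay doubles on each error, and that it shrinks by 10% on each success; AdaptiveDelay and all of its parameters are hypothetical choices, not a standard API.

```python
class AdaptiveDelay:
    """Hypothetical helper: grows the delay after errors, shrinks it after successes."""

    def __init__(self, base=1.0, maximum=60.0, grow=2.0, shrink=0.9):
        self.base = base          # delay never drops below this
        self.maximum = maximum    # delay never rises above this
        self.grow = grow          # multiplier applied after an error
        self.shrink = shrink      # multiplier applied after a success
        self.current = base

    def record(self, status_code):
        """Update the delay from a response status and return the new value."""
        if status_code == 429 or status_code >= 500:
            self.current = min(self.maximum, self.current * self.grow)
        else:
            self.current = max(self.base, self.current * self.shrink)
        return self.current
```

A scraper would call record(resp.status_code) after each response and sleep for the returned number of seconds before the next request.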

Token Bucket Rate Limiter

import time
from threading import Lock

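class RateLimiter:
    """Enforces a minimum interval between requests.

    This is the simplest form of a token bucket: capacity 1, so no bursts.
    """
    def __init__(self, requests_per_second=1):
        self.rate = requests_per_second
        self.last_request = float("-inf")  # no request made yet
        self.lock = Lock()

    def wait(self):
        # Holding the lock while sleeping serializes threads sharing the limiter
        with self.lock:
            # monotonic() is unaffected by system clock adjustments
            elapsed = time.monotonic() - self.last_request
            wait_time = max(0, (1 / self.rate) - elapsed)
            time.sleep(wait_time)
            self.last_request = time.monotonic()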
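
The class above spaces requests evenly and never allows a burst. A fuller token bucket also keeps a capacity: unused tokens accumulate up to that limit, so a short burst can go through before the steady rate applies. The sketch below is one way to write it; TokenBucket, capacity, and the injectable clock and sleep parameters are hypothetical and exist mainly to make the class testable.

```python
import time
from threading import Lock

class TokenBucket:
    """Hypothetical helper: allows bursts up to `capacity`, refills at `rate` tokens/sec."""

    def __init__(self, rate=1.0, capacity=5, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.updated = clock()
        self.lock = Lock()

    def _refill(self):
        # Add tokens for the time elapsed since the last update, capped at capacity
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def acquire(self):
        """Block until one token is available, then consume it."""
        with self.lock:
            self._refill()
            while self.tokens < 1:
                # Sleep just long enough for the missing fraction of a token
                self.sleep((1 - self.tokens) / self.rate)
                self._refill()
            self.tokens -= 1
```

Calling acquire() before each request gives roughly rate requests per second on average, with at most capacity requests back to back.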

Let ScraperAPI Handle It

ScraperAPI and ScrapingAnt manage rate limiting for you. They distribute requests across their proxy pool at optimal speeds for each target site.

Best Practices

  1. Start slow: Begin with 1 request every 2-3 seconds
  2. Check robots.txt for crawl delay guidelines
  3. Monitor HTTP 429 responses: They mean you are going too fast
  4. Use random delays: Fixed intervals look robotic
  5. Scrape during off-peak hours: Less load on the server, fewer blocks
  6. Log your request rate: Track requests per minute for debugging
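
For the last item, a sliding-window counter is one simple way to track requests per minute. This is a sketch only; RequestRateTracker, window, and the injectable clock are hypothetical names introduced for illustration.

```python
import time
from collections import deque

class RequestRateTracker:
    """Hypothetical helper: counts requests made within the last `window` seconds."""

    def __init__(self, window=60.0, clock=time.monotonic):
        self.window = window
        self.clock = clock
        self.timestamps = deque()

    def record(self):
        """Note one request and return how many fall inside the window."""
        now = self.clock()
        self.timestamps.append(now)
        # Drop timestamps that have aged out of the window
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        return len(self.timestamps)
```

Calling record() right after each request and logging the returned count gives a running requests-per-minute figure to compare against the target rate.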