Rate Limiting and Polite Scraping
Learn how to implement rate limiting in your scrapers to avoid detection, prevent bans, and scrape responsibly.
Hammering a website with rapid-fire requests is the fastest way to get blocked. Polite scraping means controlling your request rate to stay under the radar and avoid overloading servers.
Why Rate Limiting Matters
- Prevents your IP from being banned
- Avoids triggering anti-bot systems
- Reduces server load on the target site
- Makes your traffic look more human-like
- Keeps you on the right side of ethical scraping
Simple Delay Between Requests
import requests
import time
import random

urls = [
    "https://example.com/page/1",
    "https://example.com/page/2",
    "https://example.com/page/3",
]

for url in urls:
    response = requests.get(url, timeout=15)
    print(f"{url}: {response.status_code}")
    # Random delay between 1-3 seconds
    delay = random.uniform(1.0, 3.0)
    time.sleep(delay)
Token Bucket Rate Limiter
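For more precise control, wrap the timing logic in a reusable, thread-safe limiter. The class below is a simplified take on the token bucket idea: it enforces a minimum interval between requests and allows no bursts.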
import time
import requests
from threading import Lock

class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, requests_per_second: float):
        self.min_interval = 1.0 / requests_per_second
        self.last_request = float("-inf")  # so the first call never waits
        self.lock = Lock()

    def wait(self):
        with self.lock:
            # monotonic() is unaffected by system clock adjustments
            now = time.monotonic()
            elapsed = now - self.last_request
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
            self.last_request = time.monotonic()

# Allow 2 requests per second
limiter = RateLimiter(requests_per_second=2.0)
for i in range(10):
    limiter.wait()
    response = requests.get("https://httpbin.org/ip", timeout=15)
    print(f"Request {i+1} at {time.strftime('%H:%M:%S')}: {response.status_code}")
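The class above spaces requests evenly. A token bucket in the stricter sense also permits short bursts: tokens accumulate at a fixed rate up to a capacity, and each request spends one token. The following is a minimal sketch of that idea, not a definitive implementation. The names `TokenBucket` and `acquire`, and the injectable `clock` and `sleep` parameters, are illustrative assumptions and do not come from any library.

```python
import threading
import time


class TokenBucket:
    """Hypothetical token bucket: up to `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be > 0 and capacity must be >= 1")
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.clock = clock             # injectable for testing (assumption)
        self.sleep = sleep             # injectable for testing (assumption)
        self.updated = clock()
        self.lock = threading.Lock()

    def _refill(self):
        # Add the tokens earned since the last update, capped at capacity.
        now = self.clock()
        earned = (now - self.updated) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.updated = now

    def acquire(self):
        """Block until one token is available, then consume it."""
        with self.lock:
            self._refill()
            while self.tokens < 1.0:
                # Sleep just long enough for the missing fraction to refill.
                self.sleep((1.0 - self.tokens) / self.rate)
                self._refill()
            self.tokens -= 1.0
```

With `TokenBucket(rate=2, capacity=5)`, the first five calls to `acquire()` return immediately and later calls settle at roughly two per second. Call `acquire()` before each `requests.get(...)` in the same way as `limiter.wait()`.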
Respecting Retry-After Headers
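When a server returns HTTP 429 (Too Many Requests), it often includes a Retry-After header telling you how long to wait, given either as a number of seconds or as an HTTP date:
import requests
import time

def polite_fetch(url, max_retries=3):
    """Fetch url, sleeping and retrying when the server answers 429."""
    for attempt in range(max_retries):
        response = requests.get(url, timeout=15)
        if response.status_code == 429:
            try:
                # Retry-After is usually a number of seconds
                retry_after = int(response.headers.get("Retry-After", 60))
            except ValueError:
                # It may also be an HTTP date; fall back to a fixed wait
                retry_after = 60
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)
            continue
        return response
    # Still rate limited after max_retries attempts
    return None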
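Because Retry-After may carry an HTTP date instead of a number of seconds, a small parser can handle both forms. The sketch below uses only the standard library. The function name `retry_after_seconds` and its `default` and `now` parameters are hypothetical, chosen for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=60.0, now=None):
    """Convert a Retry-After header value into a number of seconds to wait.

    Hypothetical helper: returns `default` when the header is missing
    or cannot be parsed in either form.
    """
    if value is None:
        return default
    value = value.strip()

    # Form 1: delay in whole seconds, e.g. "120"
    try:
        return max(0.0, float(int(value)))
    except ValueError:
        pass

    # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        # Treat a date without zone information as UTC (assumption).
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no further waiting is needed.
    return max(0.0, (when - now).total_seconds())
```

Inside `polite_fetch`, the wait could then be computed as `retry_after_seconds(response.headers.get("Retry-After"))` and passed to `time.sleep`.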
Recommended Request Rates
| Site Type | Suggested Rate | Notes |
|---|---|---|
| Small business sites | 1 req/3-5s | Be very gentle |
| Medium sites | 1-2 req/s | Watch for 429s |
| Large platforms | 3-5 req/s | They can handle more |
| APIs with rate limits | Follow their docs | Stay within limits |
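When one scraper visits several sites, the rates in the table can be applied per host instead of globally. The following is a rough sketch under stated assumptions: the class `PerHostThrottle`, its methods, and the `.example` host names are all hypothetical, and the class is not thread-safe as written.

```python
import time
from urllib.parse import urlsplit


class PerHostThrottle:
    """Hypothetical helper: a separate minimum interval between requests per host."""

    def __init__(self, default_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.default_interval = default_interval
        self.intervals = {}     # host -> minimum seconds between requests
        self.last_request = {}  # host -> time of the most recent request
        self.clock = clock      # injectable for testing (assumption)
        self.sleep = sleep      # injectable for testing (assumption)

    def set_interval(self, host, seconds):
        """Configure a gentler or faster pace for one host."""
        self.intervals[host] = seconds

    def wait(self, url):
        """Sleep until the host of `url` may be contacted again."""
        host = urlsplit(url).netloc
        interval = self.intervals.get(host, self.default_interval)
        last = self.last_request.get(host)
        if last is not None:
            remaining = interval - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
        self.last_request[host] = self.clock()


throttle = PerHostThrottle(default_interval=1.0)   # medium sites: about 1 req/s
throttle.set_interval("small-shop.example", 4.0)   # small business site: 1 req per 4s
throttle.set_interval("big-platform.example", 0.25)  # large platform: about 4 req/s
```

Calling `throttle.wait(url)` before each request keeps a slow pace on the small site without holding back requests to the others.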
Polite Scraping Checklist
- Add random delays between requests (1-5 seconds for most sites)
- Respect Retry-After headers
- Check robots.txt for crawl-delay directives
- Scrape during off-peak hours when possible
- Identify your bot with a descriptive User-Agent if appropriate
- Cache responses to avoid re-fetching the same pages
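The robots.txt item in the checklist can be automated with Python's standard `urllib.robotparser` module. The sketch below parses robots.txt text that has already been downloaded and returns the Crawl-delay for a given user agent. The function name `crawl_delay_from_robots` and the fallback `default` value are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_from_robots(robots_txt, user_agent="*", default=1.0):
    """Return the Crawl-delay (seconds) that applies to user_agent.

    Hypothetical helper: falls back to `default` when the file
    declares no Crawl-delay for that agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


sample = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

print(crawl_delay_from_robots(sample))
```

For a live site, `RobotFileParser.set_url()` followed by `read()` downloads and parses the file, and `can_fetch()` reports whether a URL is allowed. The standard-library parser appears to recognize only whole-number Crawl-delay values, so fractional delays fall back to the default here.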
Rate limiting can also save you money with paid services like ScraperAPI: fewer blocked and retried requests means fewer API credits consumed.