How to Bypass Cloudflare Protection When Scraping
Learn techniques to bypass Cloudflare's anti-bot protection for web scraping. Covers challenge pages, Turnstile CAPTCHAs, and practical solutions.
Cloudflare protects over 20% of all websites, making it the most common anti-bot system scrapers encounter. Here is how to handle it.
Cloudflare's Protection Layers
- JavaScript Challenge, A "Checking your browser" interstitial page
- Managed Challenge (Turnstile), An interactive CAPTCHA-like challenge
- IP Reputation, Blocking known datacenter and VPN IP ranges
- Browser Fingerprinting, Detecting headless browsers and bots
- Rate Limiting, Blocking IPs that make too many requests
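Before choosing a method, it helps to recognize when a response is a challenge page rather than real content. The following is a minimal sketch of such a check, not an official detection method. The function name `is_cloudflare_challenge` is hypothetical, and the markers it relies on (a `cf-mitigated: challenge` response header, a `Server: cloudflare` header, and the "Just a moment..." page title) are heuristics based on commonly observed challenge responses that may change over time.

```python
from typing import Mapping

# Heuristic markers (assumptions, not a guaranteed contract).
CHALLENGE_TITLE = "Just a moment..."
CHALLENGE_STATUS_CODES = {403, 503}


def is_cloudflare_challenge(status: int, headers: Mapping[str, str], body: str) -> bool:
    """Return True if the response looks like a Cloudflare challenge page."""
    # Normalize header names, since HTTP header names are case-insensitive.
    lowered = {k.lower(): v for k, v in headers.items()}

    # A cf-mitigated header with the value "challenge" is the strongest signal.
    if lowered.get("cf-mitigated", "").lower() == "challenge":
        return True

    # Fallback: require all three weaker signals together to limit false positives.
    served_by_cloudflare = lowered.get("server", "").lower() == "cloudflare"
    looks_blocked = status in CHALLENGE_STATUS_CODES
    has_challenge_title = f"<title>{CHALLENGE_TITLE}</title>" in body
    return served_by_cloudflare and looks_blocked and has_challenge_title
```

With the requests library, this could be called as `is_cloudflare_challenge(resp.status_code, resp.headers, resp.text)` to decide whether to fall back to one of the methods below.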
Method 1: ScraperAPI (Recommended)
The simplest solution is using ScraperAPI, which has built-in Cloudflare bypass.
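import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://cloudflare-protected-site.com"

# Pass the target URL via params so requests URL-encodes it;
# interpolating it into the query string breaks on URLs containing "&" or "?".
resp = requests.get(
    "https://api.scraperapi.com",
    params={"api_key": API_KEY, "url": url, "render": "true"},
    timeout=70,  # rendered requests can be slow
)
print(resp.status_code)  # 200 if the request got past Cloudflare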
This works because ScraperAPI uses real browser instances with residential IPs that pass Cloudflare's checks.
Method 2: Playwright with Stealth
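Use Playwright configured to resemble a real browser. The example here runs in headed mode with a realistic user agent and viewport; a stealth plugin can be layered on top to patch further automation signals.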
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)  # Headed mode helps
    context = browser.new_context(
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        viewport={"width": 1920, "height": 1080},
    )
    page = context.new_page()
    page.goto("https://protected-site.com")
    # Wait for Cloudflare challenge to resolve
    page.wait_for_load_state("networkidle")
    content = page.content()
    browser.close()
Limitation: This only works for JS challenges, not managed challenges.
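Waiting for network idle is an indirect signal, since the network can go quiet while the interstitial is still showing. One alternative, sketched here under assumptions, is to poll the page title until it no longer matches the challenge page. The helper name `wait_for_challenge_clear` is hypothetical, and the "Just a moment..." title is an assumed marker for the interstitial, not a guaranteed one. The `sleep` parameter exists only so the helper can be exercised without real waiting.

```python
import time
from typing import Callable

CHALLENGE_TITLE = "Just a moment..."  # assumed interstitial title


def wait_for_challenge_clear(
    get_title: Callable[[], str],
    timeout: float = 30.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_title() until it stops matching the challenge title.

    Returns True if the challenge cleared within the timeout, else False.
    The timeout counts time spent sleeping, so it approximates wall-clock time.
    """
    waited = 0.0
    while True:
        if get_title().strip() != CHALLENGE_TITLE:
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

In the Playwright example, this could be called as `wait_for_challenge_clear(page.title)` after `page.goto(...)`. Reading the title while the page is mid-navigation may raise an error, so a more robust version would likely catch that and retry.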
Method 3: Cloudscraper Library
The cloudscraper library handles basic Cloudflare JS challenges.
import cloudscraper
scraper = cloudscraper.create_scraper()
resp = scraper.get("https://protected-site.com")
print(resp.text)
Limitation: It only works against older versions of Cloudflare's JS challenge and breaks frequently when Cloudflare updates it.
What Does NOT Work
- Simple requests.get(), Immediately blocked
- Headless Chrome without stealth, Detected and blocked
- Free proxies, IP reputation too low
- Disabling JavaScript, Challenge requires JS execution
Best Practices
- Use ScraperAPI or ScrapingAnt, They maintain Cloudflare bypass as a core feature
- Use residential proxies, Datacenter IPs are flagged by Cloudflare
- Rotate browser fingerprints, Not just IPs, but TLS fingerprints and headers
- Add realistic delays, Rapid back-to-back page loads trigger bot detection
- Cache cf_clearance cookies, Reuse valid Cloudflare sessions when possible
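Two of these practices, realistic delays and cf_clearance caching, can be sketched in a few lines. This is an illustration under assumptions rather than a definitive implementation: the function names (`polite_delay`, `save_clearance`, `load_clearance`) and the JSON file layout are hypothetical, and the cookie dictionaries are assumed to have the shape Playwright returns from `context.cookies()`, where `expires` is a Unix timestamp or -1 for a session cookie. The User-Agent is stored alongside the cookie because a clearance cookie is generally only honored for the same User-Agent and IP address that earned it.

```python
import json
import random
import time
from pathlib import Path
from typing import Optional


def polite_delay(base: float = 2.0, jitter: float = 1.5,
                 rng: Optional[random.Random] = None) -> float:
    """Return a randomized delay in seconds between base and base + jitter."""
    rng = rng or random
    return base + rng.uniform(0.0, jitter)


def save_clearance(cookies: list, user_agent: str, path: Path) -> bool:
    """Store the cf_clearance cookie together with the User-Agent that earned it."""
    for cookie in cookies:
        if cookie.get("name") == "cf_clearance":
            record = {"cookie": cookie, "user_agent": user_agent}
            path.write_text(json.dumps(record))
            return True
    return False  # no clearance cookie present, nothing saved


def load_clearance(path: Path, user_agent: str,
                   now: Optional[float] = None) -> Optional[dict]:
    """Return the cached cookie if it is unexpired and matches the User-Agent."""
    if not path.exists():
        return None
    record = json.loads(path.read_text())
    now = time.time() if now is None else now
    cookie = record["cookie"]
    # A different User-Agent would likely invalidate the clearance.
    if record["user_agent"] != user_agent:
        return None
    # -1 marks a session cookie with no fixed expiry.
    expires = cookie.get("expires", -1)
    if expires != -1 and expires <= now:
        return None
    return cookie
```

With Playwright, a session might call `save_clearance(context.cookies(), user_agent, Path("cf_clearance.json"))` after passing a challenge, then restore the cookie later with `context.add_cookies([cookie])` and pause between page loads with `time.sleep(polite_delay())`.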