Scraping Central is reader-supported. When you buy through links on our site, we may earn an affiliate commission.

How to Bypass Cloudflare Protection When Scraping

Learn techniques to bypass Cloudflare's anti-bot protection for web scraping. Covers challenge pages, Turnstile CAPTCHAs, and practical solutions.

Cloudflare protects over 20% of all websites, making it the most common anti-bot system scrapers encounter. Here is how to handle it.

Cloudflare's Protection Layers

  1. JavaScript Challenge: A "Checking your browser" interstitial page
  2. Managed Challenge (Turnstile): An interactive CAPTCHA-like challenge
  3. IP Reputation: Blocking known datacenter and VPN IP ranges
  4. Browser Fingerprinting: Detecting headless browsers and bots
  5. Rate Limiting: Blocking IPs that make too many requests
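
Before picking a method, it helps to know which layer stopped a request. The sketch below is one heuristic way to tell them apart from a response's status code, headers, and body. The function name classify_cloudflare_response and its return labels are made up for this example. The cf-mitigated: challenge header, the "Just a moment..." page title, and the /cdn-cgi/challenge-platform/ path are signals Cloudflare challenge pages currently carry, but they can change without notice, so treat the result as a hint rather than a verdict.

```python
def classify_cloudflare_response(status_code, headers, body=""):
    """Heuristically label a response as 'ok', 'challenge', 'rate_limited' or 'blocked'.

    The labels are illustrative; they are not Cloudflare terminology.
    """
    # Header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in headers.items()}
    body = body.lower()

    # Cloudflare marks challenge responses with this header.
    if headers.get("cf-mitigated", "").lower() == "challenge":
        return "challenge"

    served_by_cloudflare = "cloudflare" in headers.get("server", "").lower()
    if not served_by_cloudflare:
        return "ok"

    if status_code == 429:
        return "rate_limited"

    if status_code in (403, 503):
        # Fall back to markers that challenge pages typically contain.
        if "just a moment" in body or "challenge-platform" in body:
            return "challenge"
        return "blocked"

    return "ok"
```

With requests, this could be called as classify_cloudflare_response(resp.status_code, resp.headers, resp.text). A "challenge" result suggests a browser-based method is needed, while "blocked" or "rate_limited" points toward IP reputation or request rate.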

Method 1: ScraperAPI (Recommended)

The simplest solution is using ScraperAPI, which has built-in Cloudflare bypass.

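import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://cloudflare-protected-site.com"

# Pass the target as a query parameter so requests URL-encodes it.
# Interpolating it into the URL breaks targets that contain "&" or "?".
resp = requests.get(
    "https://api.scraperapi.com/",
    params={"api_key": API_KEY, "url": url, "render": "true"},
    timeout=90,  # rendered requests can take a while
)
print(resp.status_code)  # 200 means the page was fetched successfully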

This works because ScraperAPI routes requests through real browser instances on residential IPs, which are far more likely to pass Cloudflare's checks than a bare HTTP client on a datacenter IP.
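
Even with a proxy API, individual requests can fail, so a small retry loop is worth having. The following is a minimal sketch rather than an official client: fetch_via_scraperapi, its parameters, and the backoff values are assumptions for illustration. It takes the HTTP function as an argument (for example requests.get) so it can be exercised without touching the network.

```python
import time


def fetch_via_scraperapi(http_get, target_url, api_key, attempts=3,
                         base_delay=2.0, sleep=time.sleep):
    """Fetch target_url through the API, retrying non-200 responses.

    http_get is any callable with the same signature as requests.get.
    Returns the last response received, whether or not it succeeded.
    """
    params = {"api_key": api_key, "url": target_url, "render": "true"}
    response = None
    for attempt in range(attempts):
        response = http_get("https://api.scraperapi.com/", params=params, timeout=90)
        if response.status_code == 200:
            return response
        # Exponential backoff between attempts: 2s, 4s, 8s, ...
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response
```

Typical usage would be resp = fetch_via_scraperapi(requests.get, url, API_KEY). Check resp.status_code afterwards, since the function hands back the final response even when every attempt failed.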

Method 2: Playwright with Stealth

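Use Playwright configured to look like a real browser. The example runs in headed mode with a desktop user agent and viewport. A stealth plugin such as playwright-stealth can be layered on top to patch common headless giveaways.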

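from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)  # Headed mode helps
    context = browser.new_context(
        # Example user agent only: keep the Chrome version in line with the
        # Chromium build you run, since a mismatch is itself a bot signal.
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        ),
        viewport={"width": 1920, "height": 1080},
    )
    page = context.new_page()
    page.goto("https://protected-site.com")

    # Wait for the Cloudflare challenge to resolve. Network idle is only a
    # rough proxy for that, so verify the page content before trusting it.
    page.wait_for_load_state("networkidle")
    content = page.content()
    browser.close()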

Limitation: This only works for JS challenges, not managed challenges.
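
Waiting for network idle does not guarantee the challenge has actually cleared. One alternative, sketched here under assumptions, is to poll the page title until it no longer looks like a challenge interstitial. The helper wait_until_cleared and the marker strings are illustrative choices: "Just a moment..." and "Checking your browser" are titles Cloudflare has used, but they may differ by site or change over time.

```python
import time

# Lower-cased fragments of titles seen on Cloudflare interstitials (assumed, not exhaustive).
CHALLENGE_TITLE_MARKERS = ("just a moment", "checking your browser")


def wait_until_cleared(get_title, timeout=30.0, interval=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_title() until it stops looking like a challenge page.

    Returns True once the title is clear, or False if the timeout expires first.
    """
    deadline = clock() + timeout
    while True:
        title = get_title().lower()
        if not any(marker in title for marker in CHALLENGE_TITLE_MARKERS):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In the Playwright example, this could replace the network-idle wait with wait_until_cleared(page.title), reading page.content() only when it returns True. A production version would probably also catch errors raised while the page is mid-navigation.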

Method 3: Cloudscraper Library

The cloudscraper library handles basic Cloudflare JS challenges.

import cloudscraper

scraper = cloudscraper.create_scraper()
resp = scraper.get("https://protected-site.com")
print(resp.text)

Limitation: cloudscraper only handles older versions of the Cloudflare challenge and frequently breaks when Cloudflare updates its checks.

What Does NOT Work

  • Simple requests.get(): Blocked as soon as a challenge is served
  • Headless Chrome without stealth: Detected and blocked
  • Free proxies: IP reputation too low
  • Disabling JavaScript: The challenge requires JS execution

Best Practices

  1. Use ScraperAPI or ScrapingAnt: They maintain Cloudflare bypass as a core feature
  2. Use residential proxies: Datacenter IP ranges are commonly flagged by Cloudflare
  3. Rotate browser fingerprints: Not just IPs, but TLS fingerprints and headers too
  4. Add realistic delays: Rapid, evenly spaced requests trigger bot detection
  5. Cache cf_clearance cookies: Reuse valid Cloudflare sessions when possible
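
The last two practices can be sketched in a few lines of standard-library Python. Everything named here (jittered_delay, save_clearance, load_clearance, the JSON layout, and the 30-minute default lifetime) is an assumption for illustration, not a documented interface. The cache also records the user agent, because in practice a cf_clearance cookie tends to stop working when it is replayed with a different user agent or from a different IP.

```python
import json
import random
import time
from pathlib import Path


def jittered_delay(base=3.0, spread=2.0):
    """Return a pause in seconds drawn uniformly from [base, base + spread]."""
    return random.uniform(base, base + spread)


def save_clearance(path, cookie_value, user_agent, now=time.time):
    """Store the cf_clearance value with the user agent that earned it."""
    record = {
        "cf_clearance": cookie_value,
        "user_agent": user_agent,
        "saved_at": now(),
    }
    Path(path).write_text(json.dumps(record))


def load_clearance(path, user_agent, max_age=1800, now=time.time):
    """Return the cached cookie value, or None if missing, stale, or mismatched."""
    cache = Path(path)
    if not cache.exists():
        return None
    record = json.loads(cache.read_text())
    if record["user_agent"] != user_agent:
        return None  # cookie was issued to a different browser identity
    if now() - record["saved_at"] > max_age:
        return None  # assume the clearance has expired
    return record["cf_clearance"]
```

A scraper might call time.sleep(jittered_delay()) between requests, and try load_clearance(...) before launching a browser, falling back to a fresh challenge solve when it returns None.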