Scraping Central is reader-supported. When you buy through links on our site, we may earn an affiliate commission.


CAPTCHA Solving for Web Scraping - Complete Guide

How to handle CAPTCHAs when web scraping. Covers reCAPTCHA, hCaptcha, Turnstile, and automated solving services.

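CAPTCHAs are one of the biggest obstacles in web scraping. This guide covers the types you are most likely to meet and three ways to handle them: avoiding them, paying a solving service, and letting a real browser pass the check.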

Types of CAPTCHAs

Type | Difficulty to bypass | Notes
reCAPTCHA v2 | Medium | Widely used (checkbox plus image challenges)
reCAPTCHA v3 | Hard | Invisible; scores each visitor instead of showing a challenge
hCaptcha | Medium | Many sites (Cloudflare used it before switching to Turnstile)
Cloudflare Turnstile | Hard | Cloudflare-protected sites
Image CAPTCHAs | Easy | Legacy sites
Text CAPTCHAs | Easy | Older forms
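Before picking a strategy it helps to know which type a page actually uses. The sketch below is a heuristic, assuming the site loads each widget with its vendor's standard embed snippet (the `api.js` script URLs and the `g-recaptcha`, `h-captcha` and `cf-turnstile` class names); `detect_captchas` is a hypothetical helper, and a site that loads widgets dynamically or from a proxy path will not be detected from static HTML.

```python
import re
from typing import List

# Heuristic markers, assumed from each vendor's standard embed snippet
TURNSTILE = re.compile(r"challenges\.cloudflare\.com/turnstile|cf-turnstile")
HCAPTCHA = re.compile(r"hcaptcha\.com/1/api\.js|h-captcha")
# reCAPTCHA v3 loads api.js with render=<site key>; v2 uses no render or render=explicit
RECAPTCHA_V3 = re.compile(r"recaptcha/(?:api|enterprise)\.js\?[^\"'>]*render=(?!explicit)[\w-]+")
RECAPTCHA_ANY = re.compile(r"recaptcha/(?:api|enterprise)\.js|g-recaptcha")


def detect_captchas(html: str) -> List[str]:
    """Return the CAPTCHA types whose markers appear in the page's HTML."""
    found = []
    if TURNSTILE.search(html):
        found.append("turnstile")
    if HCAPTCHA.search(html):
        found.append("hcaptcha")
    if RECAPTCHA_V3.search(html):
        found.append("recaptcha_v3")
    elif RECAPTCHA_ANY.search(html):
        found.append("recaptcha_v2")
    return found
```

An empty result does not prove the page is CAPTCHA-free; it only means none of these markers were in the HTML you fetched.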

Strategy 1: Avoid CAPTCHAs Entirely

The best approach is to never trigger CAPTCHAs in the first place.

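  • Use residential proxies: clean residential IPs trigger CAPTCHAs far less often than datacenter IPs
  • Rotate user agents and fingerprints: sending the same headers and fingerprint on every request is an easy pattern to detect
  • Add realistic delays: keep request rates close to human browsing speed
  • Use ScraperAPI: its managed proxy pool avoids CAPTCHAs on most sites

A basic ScraperAPI request looks like this: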
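import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"

# ScraperAPI fetches the target URL through its proxy pool;
# render=true asks it to execute the page's JavaScript first.
# Passing params lets requests URL-encode the target URL correctly.
resp = requests.get(
    "https://api.scraperapi.com/",
    params={"api_key": API_KEY, "url": "https://example.com", "render": "true"},
    timeout=60,  # rendered requests can take up to a minute
)
resp.raise_for_status()
html = resp.text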
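If you run your own requests instead of a managed API, the delay and user-agent points above can be sketched as follows. This is a minimal sketch under stated assumptions: the user-agent strings are illustrative examples, the 2 to 6 second delay range is an arbitrary default, and `human_delay`, `build_headers` and `polite_get` are hypothetical helper names. It does not change deeper browser fingerprints such as TLS or JavaScript signals.

```python
import random
import time
from typing import Optional

import requests

# Illustrative desktop user-agent strings; keep a current list in practice
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def human_delay(min_s: float = 2.0, max_s: float = 6.0) -> float:
    """Random pause length in seconds, so requests do not arrive at a fixed rhythm."""
    return random.uniform(min_s, max_s)


def build_headers() -> dict:
    """Pick a user agent at random for this request."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }


def polite_get(session: requests.Session, url: str, proxy: Optional[str] = None,
               min_s: float = 2.0, max_s: float = 6.0) -> requests.Response:
    """Sleep for a human-like interval, then fetch url with rotated headers.

    proxy is an optional proxy URL, for example a residential proxy endpoint.
    """
    time.sleep(human_delay(min_s, max_s))
    proxies = {"http": proxy, "https": proxy} if proxy else None
    return session.get(url, headers=build_headers(), proxies=proxies, timeout=30)
```

Usage: inside `with requests.Session() as s:`, call `polite_get(s, "https://example.com")` for each page. Reusing one session keeps cookies between requests, which also looks more like a normal visitor.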

Strategy 2: CAPTCHA Solving Services

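When you cannot avoid a CAPTCHA, use a solving service: you send it the CAPTCHA's site key and page URL, it solves the challenge (with human workers or automated solvers), and it returns a token that you submit to the site as if a user had solved it.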

Popular Services

Prices are in USD per 1,000 solves.

Service | reCAPTCHA v2 | reCAPTCHA v3 | hCaptcha | Typical solve time
2Captcha | $2.99 | $2.99 | $2.99 | 20-40 s
Anti-Captcha | $2.00 | $2.00 | $2.00 | 15-30 s
CapMonster | $1.20 | $1.20 | $1.20 | 10-20 s
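To turn the table into a budget: cost grows linearly with volume, and because each solve blocks for tens of seconds, throughput depends on how many solve requests you keep in flight (by Little's law, roughly solves per hour × seconds per solve ÷ 3,600). A back-of-the-envelope sketch using the listed prices, which change over time; `solve_cost` and `concurrent_solves_needed` are illustrative helper names.

```python
import math

# USD per 1,000 reCAPTCHA v2 solves, taken from the table above
PRICE_PER_1K = {"2Captcha": 2.99, "Anti-Captcha": 2.00, "CapMonster": 1.20}


def solve_cost(solves: int, price_per_1k: float) -> float:
    """Total cost in USD for a number of solves, rounded to cents."""
    return round(solves / 1000 * price_per_1k, 2)


def concurrent_solves_needed(solves_per_hour: int, avg_solve_seconds: float) -> int:
    """Solve requests that must be in flight at once to sustain the hourly rate."""
    return math.ceil(solves_per_hour * avg_solve_seconds / 3600)


if __name__ == "__main__":
    for name, price in PRICE_PER_1K.items():
        print(f"{name}: ${solve_cost(50_000, price):.2f} for 50,000 solves")
    # 1,000 solves/hour at ~30 s each: 1,000 * 30 / 3,600 = 8.3, rounded up to 9
    print(concurrent_solves_needed(1_000, 30))
```

At 50,000 solves a month the gap between the cheapest and most expensive service in the table is about $90, so speed and success rate may matter more than price at small volumes.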

Example: 2Captcha Integration

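import time

import requests

CAPTCHA_API_KEY = "YOUR_2CAPTCHA_KEY"

# Step 1: Submit the CAPTCHA. googlekey is the site key from the page's
# data-sitekey attribute; json=1 asks 2Captcha for JSON responses.
resp = requests.post("https://2captcha.com/in.php", data={
    "key": CAPTCHA_API_KEY,
    "method": "userrecaptcha",
    "googlekey": "site_key_here",
    "pageurl": "https://example.com/page",
    "json": 1,
}, timeout=30)
submitted = resp.json()
if submitted["status"] != 1:
    raise RuntimeError(f"Submit failed: {submitted['request']}")
captcha_id = submitted["request"]

# Step 2: Poll for the solution, giving up after 3 minutes
token = None
deadline = time.time() + 180
time.sleep(15)  # 2Captcha suggests waiting 15-20 s before the first reCAPTCHA poll
while time.time() < deadline:
    result = requests.get("https://2captcha.com/res.php", params={
        "key": CAPTCHA_API_KEY, "action": "get", "id": captcha_id, "json": 1,
    }, timeout=30).json()
    if result["status"] == 1:
        token = result["request"]  # the g-recaptcha-response token
        break
    if result["request"] != "CAPCHA_NOT_READY":  # sic, 2Captcha's own spelling
        raise RuntimeError(f"Solve failed: {result['request']}")
    time.sleep(5)

if token is None:
    raise TimeoutError("2Captcha returned no solution within 3 minutes")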
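The example above hardcodes `site_key_here`. In practice the site key usually sits in a `data-sitekey` attribute on the widget, and the solved token goes back to the site in the form field the widget would normally fill (`g-recaptcha-response` for reCAPTCHA). A minimal sketch of both steps, assuming a plain HTML form post; `find_sitekey` and `with_captcha_token` are hypothetical helper names, and some sites check the token through a JavaScript callback instead, which needs a browser.

```python
import re
from typing import Optional

# Matches data-sitekey="..." as used by the standard reCAPTCHA, hCaptcha and Turnstile widgets
SITEKEY_RE = re.compile(r"""data-sitekey=["']([\w-]+)["']""")


def find_sitekey(html: str) -> Optional[str]:
    """Return the first widget site key in the page, or None."""
    match = SITEKEY_RE.search(html)
    return match.group(1) if match else None


def with_captcha_token(form_fields: dict, token: str,
                       field: str = "g-recaptcha-response") -> dict:
    """Copy the form data and add the solved token under the widget's field name.

    Typical field names: g-recaptcha-response (reCAPTCHA),
    h-captcha-response (hCaptcha), cf-turnstile-response (Turnstile).
    """
    payload = dict(form_fields)
    payload[field] = token
    return payload

# Usage (network calls, shown for illustration; form_url is hypothetical):
# html = requests.get("https://example.com/page", timeout=30).text
# site_key = find_sitekey(html)  # send as "googlekey" to 2Captcha
# requests.post(form_url, data=with_captcha_token({"email": "me@example.com"}, token), timeout=30)
```

reCAPTCHA tokens expire quickly, so submit the form as soon as the solver returns the token.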

Strategy 3: Browser Automation

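Turnstile and invisible CAPTCHAs such as reCAPTCHA v3 judge browser signals instead of asking the user to complete a task, so a real browser (ideally headed, optionally with a stealth plugin such as playwright-stealth) sometimes passes them without any solving.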

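from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # A headed (visible) browser shows fewer automation signals than headless mode
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://example.com")
    # Invisible checks such as Turnstile may pass on their own; give them time to run
    page.wait_for_timeout(5000)
    html = page.content()
    browser.close()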
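Once the browser has passed the check, the site usually sets cookies that record it (Cloudflare's is `cf_clearance`). A common follow-up is to copy those cookies, together with the same user agent, into a `requests` session so later pages can be fetched without a browser. A minimal sketch; `session_from_browser_cookies` and `harvest_cookies` are hypothetical helper names, and whether the cookies keep working outside the browser depends on the site (Cloudflare, for example, may also check that requests come from the same IP address and look like the same browser).

```python
from typing import List, Tuple

import requests


def session_from_browser_cookies(cookies: List[dict], user_agent: str) -> requests.Session:
    """Build a requests session that reuses cookies exported from the browser."""
    session = requests.Session()
    session.headers["User-Agent"] = user_agent  # should match the browser that earned the cookies
    for cookie in cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


def harvest_cookies(url: str, wait_ms: int = 5000) -> Tuple[List[dict], str]:
    """Open url in a headed browser, wait for the check, return (cookies, user agent)."""
    # Imported here so the helper above can be used without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto(url)
        page.wait_for_timeout(wait_ms)
        cookies = context.cookies()
        user_agent = page.evaluate("navigator.userAgent")
        browser.close()
    return cookies, user_agent

# Usage:
# cookies, ua = harvest_cookies("https://example.com")
# session = session_from_browser_cookies(cookies, ua)
# resp = session.get("https://example.com/next-page", timeout=30)
```

If the session starts getting challenged again, the cookies have expired or been rejected; run the browser step again to refresh them.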

Best Practices

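  1. Prevention over solving: use ScrapingAnt or ScraperAPI to avoid most CAPTCHAs before they appear
  2. Budget for solving costs: at the prices listed above ($1.20 to $2.99 per 1,000 solves), costs grow linearly with volume
  3. Implement fallbacks: if one solving service fails or slows down, retry with another
  4. Reuse sessions, not tokens: a reCAPTCHA token expires after two minutes and can be verified only once, but the cookies a site sets after a successful solve (such as Cloudflare's cf_clearance) often stay valid much longer
  5. Monitor solve rates: track success rates and solve times per service, and switch when one degrades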
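Points 3 and 5 combine naturally: try services in order, record how each one performs, and prefer whichever has the best success rate so far. A minimal sketch, assuming each service is wrapped in a function that takes a site key and page URL and returns a token or raises (for example, the 2Captcha polling code above wrapped in a function); `SolverChain` and that wrapper signature are hypothetical, not any vendor's SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A solver takes (site_key, page_url) and returns a token, raising on failure
Solver = Callable[[str, str], str]


@dataclass
class SolverStats:
    attempts: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0


class SolverChain:
    """Try several solving services, best observed success rate first."""

    def __init__(self, solvers: List[Tuple[str, Solver]]):
        self.solvers = solvers
        self.stats: Dict[str, SolverStats] = {name: SolverStats() for name, _ in solvers}

    def _ordered(self) -> List[Tuple[str, Solver]]:
        # Untried services score 1.0 so they are not ranked behind known failures;
        # sorted() is stable, so ties keep the configured order
        def score(item: Tuple[str, Solver]) -> float:
            stats = self.stats[item[0]]
            return stats.success_rate if stats.attempts else 1.0
        return sorted(self.solvers, key=score, reverse=True)

    def solve(self, site_key: str, page_url: str) -> str:
        errors = []
        for name, solver in self._ordered():
            stats = self.stats[name]
            stats.attempts += 1
            try:
                token = solver(site_key, page_url)
            except Exception as exc:  # any failure moves on to the next service
                errors.append(f"{name}: {exc}")
                continue
            stats.successes += 1
            return token
        raise RuntimeError("All solvers failed: " + "; ".join(errors))

    def report(self) -> Dict[str, float]:
        """Success rate per service, for logging or alerting."""
        return {name: s.success_rate for name, s in self.stats.items()}
```

Because the ordering is recomputed on every call, a service that starts failing drops behind the others automatically, while `report()` gives you the numbers to decide whether to drop it altogether.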