CAPTCHA Solving for Web Scraping - Complete Guide

This guide explains how to handle CAPTCHAs when web scraping. It covers reCAPTCHA, hCaptcha, Cloudflare Turnstile, and automated solving services.

CAPTCHAs are one of the biggest obstacles in web scraping. Here is how to deal with the types you are most likely to encounter.

Types of CAPTCHAs

Type	Difficulty	Typical Use / Notes
reCAPTCHA v2	Medium	Widely used (checkbox + image)
reCAPTCHA v3	Hard	Invisible, score-based
hCaptcha	Medium	Cloudflare sites, many others
Cloudflare Turnstile	Hard	Cloudflare-protected sites
Image CAPTCHAs	Easy	Legacy sites
Text CAPTCHAs	Easy	Older forms
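
Before choosing a strategy, it helps to know which widget a page embeds. The sketch below scans raw HTML for the container class names used in the standard embed snippets of reCAPTCHA (`g-recaptcha`), hCaptcha (`h-captcha`), and Turnstile (`cf-turnstile`) and pulls out the `data-sitekey` attribute. The function name `detect_captcha` is hypothetical, and the regex-based scan is an assumption made for brevity: widgets injected by JavaScript (as reCAPTCHA v3 usually is) will not show up this way.

```python
import re

# Container class names from each widget's standard embed snippet.
WIDGET_CLASSES = {
    "g-recaptcha": "reCAPTCHA",
    "h-captcha": "hCaptcha",
    "cf-turnstile": "Cloudflare Turnstile",
}

_TAG_RE = re.compile(r"<[^>]+>")
_CLASS_RE = re.compile(r"\bclass=[\"']([^\"']*)[\"']", re.I)
_SITEKEY_RE = re.compile(r"\bdata-sitekey=[\"']([^\"']+)[\"']", re.I)


def detect_captcha(html):
    """Return (captcha_type, sitekey), or (None, None) if no widget is found."""
    for tag in _TAG_RE.findall(html):
        cls = _CLASS_RE.search(tag)
        if cls is None:
            continue
        for name in cls.group(1).split():
            if name in WIDGET_CLASSES:
                key = _SITEKEY_RE.search(tag)
                return WIDGET_CLASSES[name], (key.group(1) if key else None)
    return None, None
```

The site key returned here is the value a solving service asks for alongside the page URL.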

Strategy 1: Avoid CAPTCHAs Entirely

The best approach is to never trigger CAPTCHAs in the first place.

Use residential proxies: clean IPs rarely trigger CAPTCHAs.
Rotate user agents and fingerprints: avoid repeating patterns that detection systems flag.
Add realistic delays: browse at human-like speeds.
Use ScraperAPI: its smart proxy system avoids CAPTCHAs on most sites.

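import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"

# ScraperAPI handles CAPTCHA avoidance automatically.
# Passing params lets requests URL-encode the target URL correctly.
resp = requests.get(
    "https://api.scraperapi.com/",
    params={"api_key": API_KEY, "url": "https://example.com", "render": "true"},
    timeout=70,  # rendered requests can be slow
)
resp.raise_for_status()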
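
If you make requests yourself rather than through a proxy API, the user-agent rotation and realistic delays listed above can be sketched roughly as follows. The names `polite_headers`, `human_delay`, and `fetch_all` are hypothetical, the user-agent strings are illustrative examples that should be replaced with current ones, and the 2 to 3.5 second pause is an assumed value, not a recommendation from any particular site.

```python
import random
import time

# Illustrative pool only; replace with current, real browser strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def polite_headers():
    """Pick a random user agent for the next request."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }


def human_delay(base=2.0, jitter=1.5):
    """Return a randomized pause in seconds: base plus up to `jitter` extra."""
    return base + random.uniform(0, jitter)


def fetch_all(urls, fetch, sleep=time.sleep):
    """Call fetch(url, headers) for each URL, pausing between requests."""
    results = []
    for i, url in enumerate(urls):
        if i:  # no pause before the first request
            sleep(human_delay())
        results.append(fetch(url, polite_headers()))
    return results
```

With `requests`, the `fetch` argument could be `lambda url, headers: requests.get(url, headers=headers, timeout=30)`. Passing `fetch` and `sleep` in as parameters keeps the pacing logic separate from the HTTP client.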

Strategy 2: CAPTCHA Solving Services

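When you cannot avoid a CAPTCHA, a solving service can solve it for you: you send the site key and page URL, and the service returns a response token to submit with the page's form.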

Popular Services

Service	reCAPTCHA v2	reCAPTCHA v3	hCaptcha	Speed
2Captcha	$2.99/1K	$2.99/1K	$2.99/1K	20-40s
Anti-Captcha	$2.00/1K	$2.00/1K	$2.00/1K	15-30s
CapMonster	$1.20/1K	$1.20/1K	$1.20/1K	10-20s
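
To turn the per-1,000 prices in the table into a project budget, multiply the number of pages by the fraction of pages that show a CAPTCHA. The sketch below uses the table's prices; the 100,000-page job and the 5% CAPTCHA rate are made-up inputs for illustration, and `estimate_cost` is a hypothetical helper.

```python
# Prices in USD per 1,000 solves, copied from the table above.
PRICE_PER_1K = {"2Captcha": 2.99, "Anti-Captcha": 2.00, "CapMonster": 1.20}


def estimate_cost(pages, captcha_rate, price_per_1k):
    """Estimated solving cost in USD for a scraping job.

    captcha_rate is the fraction of pages that trigger a CAPTCHA (0.0 to 1.0).
    """
    solves = pages * captcha_rate
    return solves / 1000 * price_per_1k


# Assumed job: 100,000 pages, 5% of which show a CAPTCHA (5,000 solves).
for service, price in PRICE_PER_1K.items():
    print(f"{service}: ${estimate_cost(100_000, 0.05, price):.2f}")
```

Under those assumptions the job needs 5,000 solves, which comes to about $14.95, $10.00, and $6.00 respectively. Failed solves and retries push the real figure higher.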

Example: 2Captcha Integration

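import requests
import time

CAPTCHA_API_KEY = "YOUR_2CAPTCHA_KEY"

# Step 1: Submit CAPTCHA
resp = requests.post("https://2captcha.com/in.php", data={
    "key": CAPTCHA_API_KEY,
    "method": "userrecaptcha",
    "googlekey": "site_key_here",
    "pageurl": "https://example.com/page"
}, timeout=30)
# A successful reply looks like "OK|<id>"; anything else is an error code
if not resp.text.startswith("OK|"):
    raise RuntimeError(f"2Captcha submit failed: {resp.text}")
captcha_id = resp.text.split("|", 1)[1]

# Step 2: Poll for solution (give up after about 3 minutes)
token = None
for _ in range(18):
    time.sleep(10)
    result = requests.get(
        "https://2captcha.com/res.php",
        params={"key": CAPTCHA_API_KEY, "action": "get", "id": captcha_id},
        timeout=30,
    )
    if result.text == "CAPCHA_NOT_READY":  # spelling is 2Captcha's own
        continue
    if not result.text.startswith("OK|"):
        raise RuntimeError(f"2Captcha solve failed: {result.text}")
    token = result.text.split("|", 1)[1]
    break
if token is None:
    raise TimeoutError("2Captcha did not return a solution in time")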
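
The token from the polling loop above still has to reach the target site. For the standard widgets this usually means sending it in the form field the widget would have filled in: `g-recaptcha-response` for reCAPTCHA, `h-captcha-response` for hCaptcha, and `cf-turnstile-response` for Turnstile. The following is a minimal sketch under that assumption; `build_form_payload` is a hypothetical helper, and some sites pass the token through a JavaScript callback or a custom field instead.

```python
# Form field each widget normally populates with the solved token.
RESPONSE_FIELDS = {
    "recaptcha": "g-recaptcha-response",
    "hcaptcha": "h-captcha-response",
    "turnstile": "cf-turnstile-response",
}


def build_form_payload(form_fields, token, captcha="recaptcha"):
    """Return a copy of the form data with the CAPTCHA token added."""
    payload = dict(form_fields)  # copy so the caller's dict is untouched
    payload[RESPONSE_FIELDS[captcha]] = token
    return payload
```

The payload can then be posted as usual, for example `requests.post(form_url, data=build_form_payload({"email": "user@example.com"}, token))`, where `form_url` and the `email` field are placeholders for the real form.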

Strategy 3: Browser Automation

For Turnstile and other invisible CAPTCHAs, a real browser (optionally with stealth plugins) sometimes passes the challenge without any solving service.

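from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Headed mode: headless browsers are generally easier for sites to detect
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://example.com")
    # Some CAPTCHAs auto-solve for real browsers; give the challenge time to finish
    page.wait_for_timeout(5000)
    html = page.content()  # page HTML after the wait
    browser.close()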

Best Practices

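Prevention over solving: use ScrapingAnt or ScraperAPI to avoid most CAPTCHAs in the first place.
Budget for solving costs: factor CAPTCHA solving into your project costs.
Implement fallbacks: try a second solving service when the first one fails.
Cache solutions: some CAPTCHA tokens stay valid for a few minutes (reCAPTCHA v2 tokens expire after about two minutes and can be verified only once), so hold only tokens that have not been submitted yet.
Monitor solve rates: track success rates and switch services if needed.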
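
The fallback and monitoring practices above can be combined in one small wrapper. The sketch below is one possible shape, not any service's official client: `SolverPool` is a hypothetical class, and each solver is assumed to be a callable that takes a site key and page URL and either returns a token or raises an exception.

```python
class SolverPool:
    """Try solving services in order and record per-service success rates."""

    def __init__(self, solvers):
        # solvers: dict mapping a service name to callable(site_key, page_url)
        self.solvers = dict(solvers)
        self.stats = {name: {"ok": 0, "fail": 0} for name in self.solvers}

    def success_rate(self, name):
        """Fraction of successful solves, or None if the service is untried."""
        counts = self.stats[name]
        total = counts["ok"] + counts["fail"]
        return counts["ok"] / total if total else None

    def solve(self, site_key, page_url):
        """Return (service_name, token) from the first service that succeeds."""
        errors = {}
        for name, solver in self.solvers.items():
            try:
                token = solver(site_key, page_url)
            except Exception as exc:  # any failure triggers the next fallback
                self.stats[name]["fail"] += 1
                errors[name] = exc
                continue
            self.stats[name]["ok"] += 1
            return name, token
        raise RuntimeError(f"all solvers failed: {errors}")
```

A service whose `success_rate` drops over time is a candidate for moving to the end of the dict or removing altogether.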