CAPTCHA Solving Strategies for Scrapers - Anti-Detection

Learn strategies to handle CAPTCHAs when web scraping, including solving services, avoidance techniques, and automation.

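CAPTCHAs are designed to stop automated access. When your scraper encounters one, you have three options: avoid triggering it in the first place, pay a solving service, or delegate the whole request to a scraping API that handles it for you.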

Types of CAPTCHAs You Will Encounter

Type	Difficulty	Common On
reCAPTCHA v2	Medium	Forms, login pages
reCAPTCHA v3	Hard (invisible)	Everywhere
hCaptcha	Medium	Sites that embed hCaptcha directly (formerly Cloudflare challenge pages)
Cloudflare Turnstile	Hard	Cloudflare-protected sites
Image CAPTCHAs	Easy	Older websites
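
Before choosing a strategy, it helps to know which CAPTCHA a page is serving. The sketch below is a heuristic, not a complete detector: it looks for the default widget container classes and script hosts that each provider normally uses (g-recaptcha, h-captcha, cf-turnstile). The function names detect_captcha and extract_site_key are made up for this example, and sites that inject the widget with JavaScript will not match against the raw HTML.

```python
import re
from typing import Optional

# Marker substrings usually present in each provider's widget markup.
# hCaptcha and Turnstile are checked before reCAPTCHA because both can emit
# a reCAPTCHA-compatible "g-recaptcha-response" field.
CAPTCHA_MARKERS = {
    "hcaptcha": ("h-captcha", "hcaptcha.com"),
    "turnstile": ("cf-turnstile", "challenges.cloudflare.com/turnstile"),
    "recaptcha": ("g-recaptcha", "google.com/recaptcha", "recaptcha.net/recaptcha"),
}

# The site key is normally exposed as a data-sitekey attribute on the widget.
SITEKEY_RE = re.compile(r'data-sitekey=["\']([^"\']+)["\']')


def detect_captcha(html: str) -> Optional[str]:
    """Return a provider name if the HTML contains a known marker, else None."""
    lowered = html.lower()
    for provider, markers in CAPTCHA_MARKERS.items():
        if any(marker in lowered for marker in markers):
            return provider
    return None


def extract_site_key(html: str) -> Optional[str]:
    """Return the first data-sitekey value found in the HTML, else None."""
    match = SITEKEY_RE.search(html)
    return match.group(1) if match else None
```

The site key returned by extract_site_key is the value a solving service needs, so this check can feed directly into Strategy 2.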

Strategy 1: Avoid CAPTCHAs Entirely

The best CAPTCHA is one you never see. Use these techniques to avoid triggering them:

Rotate proxies and user agents
Add realistic delays between requests
Use residential IPs instead of datacenter IPs
Persist session cookies across requests so each visit looks like a continuing session
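
The techniques above can be combined in a few lines. This is a minimal sketch, assuming the requests library: the user-agent strings are only examples, the proxy URL in the usage note is a placeholder, and build_session, polite_delay, and crawl are hypothetical helper names. Each session keeps one user agent and one proxy for its whole lifetime, since changing identity while reusing the same cookies tends to look less natural, not more.

```python
import random
import time
from typing import Callable, Iterable, List, Optional

import requests

# Example user-agent strings; keep your own list current.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def build_session(proxy: Optional[str] = None) -> requests.Session:
    """Create a session with one user agent and (optionally) one proxy.

    requests.Session stores cookies automatically between requests.
    """
    session = requests.Session()
    session.headers["User-Agent"] = random.choice(USER_AGENTS)
    if proxy:
        session.proxies.update({"http": proxy, "https": proxy})
    return session


def polite_delay(base: float = 2.0, jitter: float = 3.0) -> float:
    """Return a randomized pause in seconds: base plus up to `jitter` extra."""
    return base + random.uniform(0, jitter)


def crawl(urls: Iterable[str], session, sleep: Callable[[float], None] = time.sleep) -> List:
    """Fetch each URL with the same session, pausing between requests."""
    responses = []
    for index, url in enumerate(urls):
        if index:
            sleep(polite_delay())  # pause between requests, not before the first
        responses.append(session.get(url, timeout=30))
    return responses
```

A call might look like crawl(urls, build_session(proxy="http://user:pass@proxy.example.com:8000")). To rotate identities, build a fresh session with a different proxy for each batch of URLs instead of swapping headers mid-session.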

Strategy 2: Use a CAPTCHA Solving Service

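Services like 2Captcha and Anti-Captcha use human workers to solve CAPTCHAs and return the resulting token through an API. You submit the page URL and the widget's site key (the data-sitekey attribute in the page HTML), then poll until the token is ready. Here is how to integrate 2Captcha for reCAPTCHA v2: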

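import requests
import time

TWOCAPTCHA_KEY = "YOUR_2CAPTCHA_KEY"

def solve_recaptcha_v2(site_key: str, page_url: str) -> str:
    # Step 1: Submit the CAPTCHA (HTTPS keeps the API key out of plaintext traffic)
    submit = requests.post("https://2captcha.com/in.php", data={
        "key": TWOCAPTCHA_KEY,
        "method": "userrecaptcha",
        "googlekey": site_key,
        "pageurl": page_url,
        "json": 1,
    }, timeout=30).json()

    # On failure, "status" is 0 and "request" holds an error code, not a task ID
    if submit["status"] != 1:
        raise RuntimeError(f"2Captcha submit failed: {submit['request']}")

    task_id = submit["request"]

    # Step 2: Poll for the solution (up to 30 attempts, 5 seconds apart)
    for _ in range(30):
        time.sleep(5)
        result = requests.get("https://2captcha.com/res.php", params={
            "key": TWOCAPTCHA_KEY,
            "action": "get",
            "id": task_id,
            "json": 1,
        }, timeout=30).json()

        if result["status"] == 1:
            return result["request"]  # The g-recaptcha-response token

        # Any reply other than "not ready" is a permanent error; stop polling
        if result["request"] != "CAPCHA_NOT_READY":
            raise RuntimeError(f"2Captcha solve failed: {result['request']}")

    raise TimeoutError("CAPTCHA solving timed out")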

# Usage
token = solve_recaptcha_v2(
    site_key="6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-",
    page_url="https://example.com/login"
)

# Submit the form with the solved token
requests.post("https://example.com/login", data={
    "username": "user",
    "password": "pass",
    "g-recaptcha-response": token,
})

Strategy 3: Let ScraperAPI Handle It

ScraperAPI automatically solves CAPTCHAs as part of the request pipeline:

import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"

# ScraperAPI solves CAPTCHAs automatically
response = requests.get(
    "https://api.scraperapi.com",
    params={
        "api_key": API_KEY,
        "url": "https://captcha-protected-site.com",
        "render": "true",
    },
    timeout=90,
)
print(response.text[:500])

ScrapingAnt also provides built-in CAPTCHA handling.

Cost Comparison

Method	Cost per 1,000 CAPTCHAs (USD)	Typical solve time
2Captcha	$2-3	20-60 seconds
Anti-Captcha	$2-3	20-60 seconds
ScraperAPI	Included in plan	Automatic
ScrapingAnt	Included in plan	Automatic
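
To turn the per-1,000 prices above into a budget, multiply by how often your scraper actually hits a CAPTCHA. The sketch below is plain arithmetic. The traffic volume and the 5% CAPTCHA rate in the usage note are invented inputs for illustration, not measurements, and estimate_daily_cost is a hypothetical helper name.

```python
def estimate_daily_cost(pages_per_day: int, captcha_rate: float, price_per_1000: float) -> float:
    """Estimate daily solving spend in USD.

    pages_per_day:  number of pages the scraper requests each day
    captcha_rate:   fraction of requests that hit a CAPTCHA (0.05 means 5%)
    price_per_1000: solving-service price per 1,000 CAPTCHAs in USD
    """
    captchas_per_day = pages_per_day * captcha_rate
    return captchas_per_day / 1000 * price_per_1000


if __name__ == "__main__":
    # Hypothetical workload: 100,000 pages per day, 5% of them challenged.
    low = estimate_daily_cost(100_000, 0.05, 2.0)
    high = estimate_daily_cost(100_000, 0.05, 3.0)
    print(f"Estimated spend: ${low:.2f} to ${high:.2f} per day")
```

With those assumed inputs, the $2-3 price range works out to roughly $10 to $15 per day. Halving the CAPTCHA rate through better avoidance halves the bill, which is why avoidance comes first.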

Best Practice

Design your scraper to avoid CAPTCHAs first. Only add solving logic as a fallback. Every CAPTCHA you solve costs money and slows down your scraper.
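
One way to structure the fallback is sketched below, assuming you supply two callables of your own: fetch, which downloads a page normally (ideally with a fresh proxy or session on each call), and solve_and_fetch, which goes through a solving service or scraping API. Both names, and the marker-based looks_like_captcha check, are hypothetical and only illustrate the control flow.

```python
from typing import Callable, Tuple

# Default widget container classes for reCAPTCHA, hCaptcha, and Turnstile.
CAPTCHA_HINTS = ("g-recaptcha", "h-captcha", "cf-turnstile")


def looks_like_captcha(html: str) -> bool:
    """Heuristic: True if the page contains a known CAPTCHA widget marker."""
    lowered = html.lower()
    return any(hint in lowered for hint in CAPTCHA_HINTS)


def fetch_with_fallback(
    url: str,
    fetch: Callable[[str], str],
    solve_and_fetch: Callable[[str], str],
    max_plain_attempts: int = 2,
) -> Tuple[str, bool]:
    """Try the cheap path first; pay for solving only if it keeps failing.

    Returns (html, used_solver) so callers can track how often they pay.
    """
    for _ in range(max_plain_attempts):
        html = fetch(url)
        if not looks_like_captcha(html):
            return html, False
    # Every plain attempt was challenged, so fall back to the paid path.
    return solve_and_fetch(url), True
```

Logging the used_solver flag over time shows whether your avoidance measures are working or whether solving costs are creeping up.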