
CAPTCHA Solving Strategies for Scrapers

Learn strategies to handle CAPTCHAs when web scraping, including solving services, avoidance techniques, and automation.

Anti-Detection · #7 · Intermediate · 2 min read

CAPTCHAs are designed to stop automated access. When your scraper encounters one, you have several options ranging from avoidance to solving.

Types of CAPTCHAs You Will Encounter

Type                  Difficulty        Common On
reCAPTCHA v2          Medium            Forms, login pages
reCAPTCHA v3          Hard (invisible)  Everywhere
hCaptcha              Medium            Cloudflare sites
Cloudflare Turnstile  Hard              Cloudflare-protected sites
Image CAPTCHAs        Easy              Older websites
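
Before choosing a strategy, it helps to know which widget a page is serving. The sketch below guesses the type from substrings that the vendors' default embed code leaves in the HTML. The `CAPTCHA_MARKERS` table and the `detect_captcha` name are illustrative assumptions, and custom integrations may need additional markers.

```python
from typing import Optional

# Substrings left in the HTML by each widget's standard embed code.
# Assumption: the site uses the vendors' default markup.
# Turnstile and hCaptcha are checked first because both can expose a
# reCAPTCHA-compatible response field, which would otherwise match "g-recaptcha".
CAPTCHA_MARKERS = {
    "Cloudflare Turnstile": ("cf-turnstile", "challenges.cloudflare.com/turnstile"),
    "hCaptcha": ("h-captcha", "hcaptcha.com/1/api.js"),
    "reCAPTCHA": ("g-recaptcha", "recaptcha/api.js"),
}


def detect_captcha(html: str) -> Optional[str]:
    """Return the name of the first CAPTCHA type whose marker appears, else None."""
    lowered = html.lower()
    for name, markers in CAPTCHA_MARKERS.items():
        if any(marker in lowered for marker in markers):
            return name
    return None
```

A `None` result only means no known marker was found. A site can still block requests in other ways, such as an HTTP 403 or an empty page.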

Strategy 1: Avoid CAPTCHAs Entirely

The best CAPTCHA is one you never see. Use these techniques to avoid triggering them:

  • Rotate proxies and user agents
  • Add realistic delays between requests
  • Use residential IPs instead of datacenter
  • Maintain session cookies properly
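
A minimal sketch of the first two techniques, using only the standard library. The proxy URLs, user-agent strings, and the helper names `next_request_settings` and `human_delay` are placeholders invented for illustration, so substitute your own pools and timing.

```python
import itertools
import random

# Hypothetical pools: replace with your own proxies and user agents.
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]

_proxy_cycle = itertools.cycle(PROXIES)


def next_request_settings() -> dict:
    """Return headers and proxies for the next request, rotating through the pools."""
    proxy = next(_proxy_cycle)
    return {
        "headers": {"User-Agent": random.choice(USER_AGENTS)},
        "proxies": {"http": proxy, "https": proxy},
    }


def human_delay(base: float = 2.0, jitter: float = 1.5) -> float:
    """Return a randomized pause in seconds so requests are not evenly spaced."""
    return base + random.uniform(0, jitter)
```

The returned dictionary can be unpacked into a `requests` call, for example `session.get(url, timeout=30, **next_request_settings())`, followed by `time.sleep(human_delay())`. Reusing one `requests.Session` per identity keeps its cookies across requests, which covers the last bullet.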

Strategy 2: Use a CAPTCHA Solving Service

Services like 2Captcha and Anti-Captcha use human workers to solve CAPTCHAs. Here is how to integrate 2Captcha for reCAPTCHA v2:

import requests
import time

TWOCAPTCHA_KEY = "YOUR_2CAPTCHA_KEY"

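def solve_recaptcha_v2(site_key: str, page_url: str) -> str:
    """Return a g-recaptcha-response token for the page, solved via 2Captcha."""
    # Step 1: Submit the CAPTCHA (over HTTPS, so the API key is not sent in clear text)
    submit = requests.post("https://2captcha.com/in.php", data={
        "key": TWOCAPTCHA_KEY,
        "method": "userrecaptcha",
        "googlekey": site_key,
        "pageurl": page_url,
        "json": 1,
    }, timeout=30).json()

    # On failure, "status" is 0 and "request" holds an error code, not a task ID
    if submit["status"] != 1:
        raise RuntimeError(f"2Captcha rejected the task: {submit['request']}")

    task_id = submit["request"]

    # Step 2: Poll for the solution (up to 30 polls, 5 seconds apart)
    for _ in range(30):
        time.sleep(5)
        result = requests.get("https://2captcha.com/res.php", params={
            "key": TWOCAPTCHA_KEY,
            "action": "get",
            "id": task_id,
            "json": 1,
        }, timeout=30).json()

        if result["status"] == 1:
            return result["request"]  # The g-recaptcha-response token

        # Anything other than "not ready yet" is a permanent error, so stop polling
        if result["request"] != "CAPCHA_NOT_READY":
            raise RuntimeError(f"2Captcha failed: {result['request']}")

    raise TimeoutError("CAPTCHA solving timed out")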

# Usage
token = solve_recaptcha_v2(
    site_key="6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-",
    page_url="https://example.com/login"
)

# Submit the form with the solved token
requests.post("https://example.com/login", data={
    "username": "user",
    "password": "pass",
    "g-recaptcha-response": token,
})
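
The `site_key` passed to `solve_recaptcha_v2` is normally published in the page itself, in the widget's `data-sitekey` attribute. The following is a rough sketch of pulling it out with the standard library. The `extract_site_key` helper is a hypothetical name, and it assumes the key is present in the static HTML rather than injected later by JavaScript.

```python
from html.parser import HTMLParser
from typing import Optional


class _SiteKeyParser(HTMLParser):
    """Remember the first data-sitekey attribute seen on any tag."""

    def __init__(self) -> None:
        super().__init__()
        self.site_key: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.site_key is None:
            # attrs is a list of (name, value) pairs with lowercased names
            value = dict(attrs).get("data-sitekey")
            if value:
                self.site_key = value


def extract_site_key(html: str) -> Optional[str]:
    """Return the page's CAPTCHA site key, or None if no widget markup is found."""
    parser = _SiteKeyParser()
    parser.feed(html)
    return parser.site_key
```

If this returns `None` on a page that clearly shows a CAPTCHA, the widget is probably rendered client-side and the key has to be read from a rendered page or from the site's scripts instead.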

Strategy 3: Let ScraperAPI Handle It

ScraperAPI automatically solves CAPTCHAs as part of the request pipeline:

import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"

# ScraperAPI solves CAPTCHAs automatically
response = requests.get(
    "https://api.scraperapi.com/",
    params={
        "api_key": API_KEY,
        "url": "https://captcha-protected-site.com",
        "render": "true",
    },
    timeout=90,
)
response.raise_for_status()  # Fail loudly instead of printing an error page
print(response.text[:500])

ScrapingAnt also provides built-in CAPTCHA handling.

Cost Comparison

Method        Cost per 1,000 CAPTCHAs  Speed
2Captcha      $2-3                     20-60 seconds
Anti-Captcha  $2-3                     20-60 seconds
ScraperAPI    Included in plan         Automatic
ScrapingAnt   Included in plan         Automatic
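
To see what the per-solve figures mean for a whole crawl, here is a back-of-the-envelope calculator. The function name and the example inputs (a 5% CAPTCHA rate and a 40-second average solve) are assumptions chosen for illustration. Only the $3 price and the 20-60 second range come from the table.

```python
def estimate_solving_cost(
    total_requests: int,
    captcha_rate: float,
    price_per_1000: float,
    seconds_per_solve: float,
) -> dict:
    """Estimate CAPTCHA count, spend, and cumulative solve time for a crawl."""
    captchas = total_requests * captcha_rate
    return {
        "captchas": captchas,
        "cost_usd": captchas / 1000 * price_per_1000,
        # Total time if solves ran one after another; parallel solving shortens this
        "solve_hours": captchas * seconds_per_solve / 3600,
    }


# Assumed scenario: 100,000 requests, 5% hit a CAPTCHA, $3 per 1,000, 40 s each
estimate = estimate_solving_cost(100_000, 0.05, 3.0, 40.0)
```

Under those assumptions the crawl hits roughly 5,000 CAPTCHAs, costing about $15 and about 56 hours of cumulative solve time. The waiting usually hurts more than the bill.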

Best Practice

Design your scraper to avoid CAPTCHAs first. Only add solving logic as a fallback. Every CAPTCHA you solve costs money and slows down your scraper.
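
One way to structure that fallback is sketched below. The `fetch_with_fallback` function and its callable parameters are hypothetical names. The idea is to retry the free path a few times, ideally with a fresh proxy and user agent on each attempt, and to pay for a solve only if every attempt is still blocked.

```python
from typing import Callable, Tuple


def fetch_with_fallback(
    url: str,
    fetch: Callable[[str], str],
    is_blocked: Callable[[str], bool],
    solve_and_fetch: Callable[[str], str],
    max_free_attempts: int = 3,
) -> Tuple[str, bool]:
    """Return (html, used_solver), calling the paid solver only as a last resort."""
    for _ in range(max_free_attempts):
        html = fetch(url)  # Expected to use a new identity on each call
        if not is_blocked(html):
            return html, False
    # Every free attempt hit a CAPTCHA, so fall back to the paid path
    return solve_and_fetch(url), True
```

Passing the behaviors in as callables keeps the retry policy separate from the HTTP client and the solving service, so either can be swapped without touching this logic. Logging the `used_solver` flag shows how often the avoidance techniques are failing.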