CAPTCHA Solving for Web Scraping - Complete Guide
How to handle CAPTCHAs when web scraping. Covers reCAPTCHA, hCaptcha, Turnstile, and automated solving services.
CAPTCHAs are one of the biggest obstacles in web scraping. Here is how to deal with the types you are most likely to encounter.
Types of CAPTCHAs
| Type | Difficulty | Where and how it appears |
|---|---|---|
| reCAPTCHA v2 | Medium | Widely used (checkbox + image) |
| reCAPTCHA v3 | Hard | Invisible, score-based |
| hCaptcha | Medium | Cloudflare sites, many others |
| Cloudflare Turnstile | Hard | Cloudflare-protected sites |
| Image CAPTCHAs | Easy | Legacy sites |
| Text CAPTCHAs | Easy | Older forms |
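Before choosing a strategy it helps to know which of these types a page actually serves. The sketch below guesses the type from the raw HTML by looking for the widget class names and script hosts that reCAPTCHA, hCaptcha, and Turnstile use by default. The function name `detect_captcha_types` and the marker table are illustrative assumptions, and a site that loads its widget in a non-standard way will not be detected.

```python
# Default widget class names and script hosts for each vendor.
# This marker list is an assumption for illustration, not an exhaustive set.
CAPTCHA_MARKERS = {
    "recaptcha": ("g-recaptcha", "google.com/recaptcha"),
    "hcaptcha": ("h-captcha", "hcaptcha.com/1/api.js"),
    "turnstile": ("cf-turnstile", "challenges.cloudflare.com/turnstile"),
}


def detect_captcha_types(html: str) -> list[str]:
    """Return the CAPTCHA vendors whose markers appear in the HTML."""
    lowered = html.lower()
    return [
        name
        for name, markers in CAPTCHA_MARKERS.items()
        if any(marker in lowered for marker in markers)
    ]
```

An empty list only means none of the known markers were found; legacy image and text CAPTCHAs have no common marker and need a site-specific check.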
Strategy 1: Avoid CAPTCHAs Entirely
The best approach is to never trigger CAPTCHAs in the first place.
- Use residential proxies: clean IPs rarely trigger CAPTCHAs
- Rotate user agents and fingerprints: avoid repeating patterns that detection systems can match
- Add realistic delays: browse at human-like speeds
- Use a managed scraping API such as ScraperAPI: its proxy rotation avoids CAPTCHAs on most sites
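import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"

# ScraperAPI handles CAPTCHA avoidance on its side.
# Passing params lets requests URL-encode the target URL correctly.
resp = requests.get(
    "https://api.scraperapi.com/",
    params={"api_key": API_KEY, "url": "https://example.com", "render": "true"},
    timeout=70,  # rendered requests can be slow
)
resp.raise_for_status()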
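The delay and user-agent bullets above can also be applied without a managed API. The following is a minimal sketch, assuming you pass in a `requests.Session` (or any object with a compatible `get` method). The helper names `pick_headers`, `human_delay`, and `polite_get`, the delay range, and the user-agent pool are all illustrative choices, not recommended values.

```python
import random
import time

# Example pool only; keep it in sync with current real browser versions.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def pick_headers():
    """Choose a random user agent for the next request."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }


def human_delay(base=2.0, jitter=3.0):
    """Return a pause between base and base + jitter seconds."""
    return base + random.uniform(0, jitter)


def polite_get(session, url, sleep=time.sleep, **kwargs):
    """Wait a randomized interval, then GET the URL with rotated headers."""
    sleep(human_delay())
    return session.get(url, headers=pick_headers(), timeout=30, **kwargs)
```

Typical use would be `polite_get(requests.Session(), "https://example.com")`. The `sleep` parameter exists only so the pause can be replaced in tests.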
Strategy 2: CAPTCHA Solving Services
When you cannot avoid CAPTCHAs, use solving services.
Popular Services
| Service | reCAPTCHA v2 | reCAPTCHA v3 | hCaptcha | Speed |
|---|---|---|---|---|
| 2Captcha | $2.99/1K | $2.99/1K | $2.99/1K | 20-40s |
| Anti-Captcha | $2.00/1K | $2.00/1K | $2.00/1K | 15-30s |
| CapMonster | $1.20/1K | $1.20/1K | $1.20/1K | 10-20s |
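To turn the per-1K prices in the table into a project budget, multiply the expected number of CAPTCHAs by the unit price and allow for tokens that fail. The sketch below does that arithmetic with `Decimal` to avoid floating-point rounding on money. The function name `estimate_solving_cost` and the idea of dividing by a success rate are assumptions for illustration; check how your chosen service bills failed or rejected solves.

```python
from decimal import ROUND_CEILING, ROUND_HALF_UP, Decimal


def estimate_solving_cost(num_captchas, price_per_1k, success_rate="1"):
    """Estimate the cost in dollars of getting num_captchas usable solutions.

    success_rate is the fraction of paid solves that the target site accepts
    (an assumed input; measure it for your own workload).
    """
    rate = Decimal(str(success_rate))
    if not Decimal(0) < rate <= Decimal(1):
        raise ValueError("success_rate must be in (0, 1]")
    # Paid attempts needed so that, on average, num_captchas are accepted.
    attempts = (Decimal(num_captchas) / rate).to_integral_value(rounding=ROUND_CEILING)
    cost = attempts * Decimal(str(price_per_1k)) / Decimal(1000)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, `estimate_solving_cost(50_000, "2.99")` prices 50,000 solves at the 2Captcha rate from the table, and passing `success_rate="0.8"` shows how quickly rejected tokens inflate the bill.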
Example: 2Captcha Integration
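import time

import requests

CAPTCHA_API_KEY = "YOUR_2CAPTCHA_KEY"

# Step 1: Submit the CAPTCHA. A successful reply looks like "OK|<task id>".
resp = requests.post(
    "https://2captcha.com/in.php",
    data={
        "key": CAPTCHA_API_KEY,
        "method": "userrecaptcha",
        "googlekey": "site_key_here",
        "pageurl": "https://example.com/page",
    },
    timeout=30,
)
if not resp.text.startswith("OK|"):
    raise RuntimeError(f"2Captcha submit failed: {resp.text}")
captcha_id = resp.text.split("|", 1)[1]

# Step 2: Poll for the solution, giving up after about 3 minutes
token = None
for _ in range(18):
    time.sleep(10)
    result = requests.get(
        "https://2captcha.com/res.php",
        params={"key": CAPTCHA_API_KEY, "action": "get", "id": captcha_id},
        timeout=30,
    )
    if result.text.strip() == "CAPCHA_NOT_READY":  # spelling is the API's own
        continue
    if not result.text.startswith("OK|"):
        raise RuntimeError(f"2Captcha solve failed: {result.text}")
    token = result.text.split("|", 1)[1]
    break
if token is None:
    raise TimeoutError("2Captcha did not return a solution in time")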
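Once the polling loop yields a `token`, it still has to reach the target site. The reCAPTCHA v2 widget submits its token in a form field named `g-recaptcha-response` (hCaptcha uses `h-captcha-response`), so a plain form POST can often be reproduced as below. This is a sketch under assumptions: the helper names are hypothetical, `session` is expected to be a `requests.Session`, and it only works when the site accepts a normal form submission rather than a JavaScript callback.

```python
def build_recaptcha_form(token, form_fields):
    """Copy the form fields and add the solved token under the standard name."""
    payload = dict(form_fields)
    payload["g-recaptcha-response"] = token
    return payload


def submit_with_token(session, url, token, form_fields):
    """POST the form with the token attached, reusing the session's cookies."""
    payload = build_recaptcha_form(token, form_fields)
    return session.post(url, data=payload, timeout=30)
```

Submit promptly and from the same session that loaded the page, since the token is tied to the site key and page URL you sent to the solver.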
Strategy 3: Browser Automation
For Turnstile and invisible CAPTCHAs, a real, headed browser (optionally with stealth plugins) sometimes passes the challenge without any solving step.
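from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # A headed browser looks less like automation than headless mode
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://example.com")
    # Some CAPTCHAs auto-solve for real browsers; give them time to finish
    page.wait_for_timeout(5000)
    html = page.content()  # grab the page once the challenge has had time to clear
    browser.close()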
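A fixed five-second wait is a guess. For Turnstile, the widget by default writes its token into a hidden input named `cf-turnstile-response`, so the script can wait for that field to be filled instead. The helper below is a sketch that takes a Playwright `page` as an argument; the function name and the 15-second default are assumptions, and a site that uses a custom response field name will need a different selector.

```python
TURNSTILE_INPUT = 'input[name="cf-turnstile-response"]'

# Runs in the browser: true once the hidden input exists and holds a token.
_HAS_TOKEN_JS = """selector => {
    const el = document.querySelector(selector);
    return !!el && el.value.length > 0;
}"""


def wait_for_turnstile_token(page, timeout_ms=15000):
    """Block until Turnstile has filled its hidden input, then return the token.

    Playwright raises its own TimeoutError if the token never appears.
    """
    page.wait_for_function(_HAS_TOKEN_JS, arg=TURNSTILE_INPUT, timeout=timeout_ms)
    return page.eval_on_selector(TURNSTILE_INPUT, "el => el.value")
```

Call it right after `page.goto(...)` in place of the fixed timeout; if it raises, the challenge did not clear on its own and a solving service is the fallback.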
Best Practices
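- Prefer prevention over solving: a service such as ScrapingAnt or ScraperAPI avoids most CAPTCHAs before they appear
- Budget for solving costs: factor CAPTCHA solving into your project costs
- Implement fallbacks: try a second solving service when the first fails or times out
- Cache solutions with care: tokens stay valid only briefly (a reCAPTCHA token expires after about two minutes and can be verified once), so use each one promptly
- Monitor solve rates: track success rates per service and switch when they drop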
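The fallback and monitoring points can be combined in one small wrapper. The sketch below tries each solver in order and counts successes and failures per service. `SolverPool` is a hypothetical class, and each solver is assumed to be a callable taking `(site_key, page_url)` that returns a token or raises on failure; wrapping a real service's client in such a callable is left to you.

```python
from collections import defaultdict


class SolverPool:
    """Try solvers in order and record how often each one succeeds."""

    def __init__(self, solvers):
        # solvers: list of (name, callable) pairs, in order of preference
        self.solvers = list(solvers)
        self.stats = defaultdict(lambda: {"ok": 0, "fail": 0})

    def solve(self, site_key, page_url):
        errors = []
        for name, solver in self.solvers:
            try:
                token = solver(site_key, page_url)
            except Exception as exc:  # any failure moves on to the next service
                self.stats[name]["fail"] += 1
                errors.append(f"{name}: {exc}")
                continue
            self.stats[name]["ok"] += 1
            return token
        raise RuntimeError("all solvers failed: " + "; ".join(errors))

    def success_rate(self, name):
        """Return the fraction of successful solves, or None if never tried."""
        counts = self.stats[name]
        total = counts["ok"] + counts["fail"]
        return counts["ok"] / total if total else None
```

Reordering `solvers` when a service's `success_rate` drops is one simple way to act on the numbers.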