CAPTCHA Solving Strategies for Scrapers
Learn strategies to handle CAPTCHAs when web scraping, including solving services, avoidance techniques, and automation.
CAPTCHAs are designed to stop automated access. When your scraper encounters one, you have several options ranging from avoidance to solving.
Types of CAPTCHAs You Will Encounter
| Type | Difficulty | Common On |
|---|---|---|
| reCAPTCHA v2 | Medium | Forms, login pages |
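| reCAPTCHA v3 | Hard (invisible, score-based) | Site-wide background scoring, often on every page |
| hCaptcha | Medium | Sites that embed hCaptcha (Cloudflare's default before Turnstile) |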
| Cloudflare Turnstile | Hard | Cloudflare-protected sites |
| Image CAPTCHAs | Easy | Older websites |
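Before choosing a strategy, it helps to know which type a page is serving. The sketch below guesses the type from the HTML alone. It assumes the vendors' usual embed markers (the `g-recaptcha`, `h-captcha`, and `cf-turnstile` widget classes, a `data-sitekey` attribute, and a `render=` parameter on the reCAPTCHA v3 script URL). The function name `detect_captcha` is illustrative. Sites can rename or lazy-load these markers, so treat a miss as "unknown" rather than "no CAPTCHA".

```python
import re

# Assumed markers, based on each vendor's standard embed snippet.
WIDGETS = [
    ("turnstile", "cf-turnstile"),
    ("hcaptcha", "h-captcha"),
    ("recaptcha_v2", "g-recaptcha"),
]
SITE_KEY = re.compile(r'data-sitekey="([^"]+)"')
# reCAPTCHA v3 is loaded with ?render=<site key>; render=explicit is a v2 mode.
V3_SCRIPT = re.compile(r'recaptcha/api\.js\?[^"\']*render=(?!explicit)([^"\'&]+)')


def detect_captcha(html: str):
    """Return (captcha_type, site_key), or (None, None) if nothing matches."""
    for name, css_class in WIDGETS:
        if re.search(rf'class="[^"]*\b{re.escape(css_class)}\b', html):
            key = SITE_KEY.search(html)
            return name, (key.group(1) if key else None)
    v3 = V3_SCRIPT.search(html)
    if v3:
        return "recaptcha_v3", v3.group(1)
    return None, None
```

The site key returned here is the same value a solving service expects as its site key parameter.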
Strategy 1: Avoid CAPTCHAs Entirely
The best CAPTCHA is one you never see. Use these techniques to avoid triggering them:
- Rotate proxies and user agents
- Add realistic delays between requests
- Use residential IPs instead of datacenter IPs
- Maintain session cookies properly
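A minimal sketch of how these techniques can fit together with `requests`. The proxy URLs, the user-agent strings, and the helper names (`build_sessions`, `polite_delay`, `crawl`) are placeholders for illustration, not values from any real provider. Each proxy gets its own `Session`, so cookies stay tied to one IP and one user agent instead of jumping between them.

```python
import itertools
import random
import time

import requests

# Hypothetical values: replace with your own proxy endpoints and a
# maintained list of current browser user-agent strings.
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]


def build_sessions(proxies, user_agents):
    """Create one session per proxy, each with a fixed user agent and its own cookies."""
    sessions = []
    for proxy in proxies:
        session = requests.Session()
        session.proxies.update({"http": proxy, "https": proxy})
        session.headers["User-Agent"] = random.choice(user_agents)
        sessions.append(session)
    return sessions


def polite_delay(base: float = 2.0, jitter: float = 3.0) -> float:
    """Return a randomized pause in seconds so requests are not evenly spaced."""
    return base + random.uniform(0, jitter)


def crawl(urls):
    """Fetch URLs while rotating through the sessions, pausing between requests."""
    rotation = itertools.cycle(build_sessions(PROXIES, USER_AGENTS))
    for url in urls:
        response = next(rotation).get(url, timeout=30)
        yield url, response.status_code
        time.sleep(polite_delay())
```

The 2 to 5 second delay range is an arbitrary starting point; tune it to the target site's tolerance.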
Strategy 2: Use a CAPTCHA Solving Service
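Services like 2Captcha and Anti-Captcha use human workers to solve CAPTCHAs. You submit the site key and page URL, poll until a worker returns a token, and then send that token along with your form. Here is how to integrate 2Captcha for reCAPTCHA v2: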
import requests
import time

TWOCAPTCHA_KEY = "YOUR_2CAPTCHA_KEY"

def solve_recaptcha_v2(site_key: str, page_url: str) -> str:
    # Step 1: Submit the CAPTCHA
    submit = requests.post("https://2captcha.com/in.php", data={
        "key": TWOCAPTCHA_KEY,
        "method": "userrecaptcha",
        "googlekey": site_key,
        "pageurl": page_url,
        "json": 1,
    }, timeout=30).json()
    if submit["status"] != 1:
        # On failure, "request" holds an error code instead of a task ID
        raise RuntimeError(f"2Captcha submit failed: {submit['request']}")
    task_id = submit["request"]

    # Step 2: Poll for the solution (up to 30 polls, 5 seconds apart)
    for _ in range(30):
        time.sleep(5)
        result = requests.get("https://2captcha.com/res.php", params={
            "key": TWOCAPTCHA_KEY,
            "action": "get",
            "id": task_id,
            "json": 1,
        }, timeout=30).json()
        if result["status"] == 1:
            return result["request"]  # The g-recaptcha-response token
        if result["request"] != "CAPCHA_NOT_READY":
            # Anything other than "not ready" is an error, so stop polling
            raise RuntimeError(f"2Captcha solve failed: {result['request']}")
    raise TimeoutError("CAPTCHA solving timed out")

# Usage
token = solve_recaptcha_v2(
    site_key="6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-",
    page_url="https://example.com/login",
)

# Submit the form with the solved token
requests.post("https://example.com/login", data={
    "username": "user",
    "password": "pass",
    "g-recaptcha-response": token,
}, timeout=30)
Strategy 3: Let ScraperAPI Handle It
ScraperAPI handles CAPTCHAs as part of its request pipeline, so your code only sends the target URL:
import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"

# ScraperAPI handles CAPTCHAs on its side; we only pass the target URL
response = requests.get(
    "https://api.scraperapi.com",
    params={
        "api_key": API_KEY,
        "url": "https://captcha-protected-site.com",
        "render": "true",
    },
    timeout=90,  # rendered requests can take a while
)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.text[:500])
ScrapingAnt also provides built-in CAPTCHA handling.
Cost Comparison
| Method | Approx. cost per 1,000 CAPTCHAs (USD) | Speed |
|---|---|---|
| 2Captcha | $2-3 | 20-60 seconds |
| Anti-Captcha | $2-3 | 20-60 seconds |
| ScraperAPI | Included in plan | Automatic |
| ScrapingAnt | Included in plan | Automatic |
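To see what the per-solve pricing means for a whole crawl, here is a rough back-of-the-envelope sketch. The request count and the 5% CAPTCHA rate are hypothetical inputs. The $3 price and 40-second solve time are picked from the ranges in the table above. The function name `estimate_solver_cost` is illustrative.

```python
def estimate_solver_cost(total_requests: int, captcha_rate: float,
                         price_per_1000: float, seconds_per_solve: float):
    """Return (cost in USD, cumulative solve time in hours) for a crawl."""
    captchas = total_requests * captcha_rate
    cost = captchas / 1000 * price_per_1000
    hours = captchas * seconds_per_solve / 3600
    return cost, hours


# Hypothetical crawl: 100,000 requests, 5% of which hit a CAPTCHA
cost, hours = estimate_solver_cost(100_000, 0.05, 3.0, 40)
print(f"${cost:.2f} in solver fees, {hours:.1f} hours of cumulative solve time")
# prints: $15.00 in solver fees, 55.6 hours of cumulative solve time
```

The fee is small here, but the solve time is not, unless the solves run in parallel. Halving the CAPTCHA rate through avoidance halves both numbers.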
Best Practice
Design your scraper to avoid CAPTCHAs first. Only add solving logic as a fallback. Every CAPTCHA you solve costs money and slows down your scraper.
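One way to structure the fallback is sketched below. The three callables are assumptions standing in for your own code: `fetch` performs a normal request (ideally through a fresh proxy each time), `looks_like_captcha` inspects the response body, and `solve_and_retry` is the paid path, for example a wrapper around a solving service.

```python
from typing import Callable, Tuple


def fetch_with_fallback(
    url: str,
    fetch: Callable[[str], str],
    looks_like_captcha: Callable[[str], bool],
    solve_and_retry: Callable[[str], str],
    max_plain_attempts: int = 3,
) -> Tuple[str, bool]:
    """Try plain fetches first; use the solver only if every attempt hits a CAPTCHA.

    Returns (body, used_solver) so callers can track how often they pay for a solve.
    """
    for _ in range(max_plain_attempts):
        body = fetch(url)
        if not looks_like_captcha(body):
            return body, False
    # All plain attempts were challenged, so fall back to the paid path
    return solve_and_retry(url), True
```

Logging the `used_solver` flag over time shows whether your avoidance measures are working or whether the solver has quietly become the main path.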