Proxy Basics for Web Scraping
Understand proxy types, when to use them, and how to integrate proxies into your Python scrapers.
When you scrape at scale, websites will often rate-limit or block your IP address. Proxies route your requests through other IP addresses, so traffic is spread across many addresses and is harder to attribute to a single client.
Types of Proxies
| Type | Speed | Cost | Detection Risk | Best For |
|---|---|---|---|---|
| Datacenter | Fast | Cheap ($1-5/GB) | Higher | General scraping, APIs |
| Residential | Medium | Expensive ($5-15/GB) | Lower | Anti-bot sites |
| Mobile | Slow | Most expensive | Lowest | Heavily protected sites |
| ISP | Fast | Mid-range | Low | Balance of speed and stealth |
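The per-GB prices in the table translate into a crawl budget with simple arithmetic. The sketch below estimates bandwidth cost from a page count and an average page size. The function name `estimate_cost_usd`, the 100,000-page crawl, and the 500 KB average page size are illustrative assumptions; the prices are the range endpoints from the table.

```python
def estimate_cost_usd(pages: int, avg_page_kb: float, price_per_gb: float) -> float:
    """Estimate proxy bandwidth cost for a crawl (using 1 GB = 1,000,000 KB)."""
    total_gb = pages * avg_page_kb / 1_000_000
    return total_gb * price_per_gb


# Hypothetical crawl: 100,000 pages at roughly 500 KB each, i.e. 50 GB of traffic.
pages, avg_page_kb = 100_000, 500

# Price endpoints taken from the table above (USD per GB).
for label, price in [
    ("datacenter (low)", 1),
    ("datacenter (high)", 5),
    ("residential (low)", 5),
    ("residential (high)", 15),
]:
    print(f"{label}: ${estimate_cost_usd(pages, avg_page_kb, price):,.2f}")
```

Real page sizes vary widely, and retries or blocked responses also consume bandwidth, so treat the result as a rough lower bound.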
Using Proxies with Requests
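import requests

# Placeholder credentials and host; substitute your provider's values.
# Both keys point at the same http:// proxy URL: HTTPS traffic is tunnelled through it.
proxies = {
    "http": "http://user:pass@proxy-server:8080",
    "https": "http://user:pass@proxy-server:8080",
}

response = requests.get(
    "https://httpbin.org/ip",
    proxies=proxies,
    timeout=30,
)
print(response.json())  # shows the IP address the target site sees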
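Proxy passwords often contain characters such as `@` or `:` that have special meaning inside a URL. A minimal sketch of one way to handle this is to percent-encode the credentials before building the proxy URL. The helper name `build_proxies` and the sample password are hypothetical; `urllib.parse.quote` is from the standard library.

```python
from urllib.parse import quote


def build_proxies(user: str, password: str, host: str, port: int) -> dict:
    """Return a requests-style proxies dict with URL-safe credentials."""
    # Percent-encode the credentials so characters like "@" or ":" in a
    # password are not mistaken for URL delimiters.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    proxy_url = f"http://{auth}@{host}:{port}"
    # The same HTTP proxy is used for both plain and TLS traffic.
    return {"http": proxy_url, "https": proxy_url}


# Hypothetical credentials and host, for illustration only.
proxies = build_proxies("user", "p@ss:word", "proxy-server", 8080)
print(proxies["https"])
```

The resulting dict can be passed as `proxies=proxies` to `requests.get`, exactly as in the example above.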
Rotating Proxies
For large-scale scraping, use a rotating proxy service that automatically assigns a different IP per request:
import requests

# Most proxy services provide a single endpoint that rotates IPs
proxy_url = "http://user:pass@gate.smartproxy.com:7777"

for i in range(10):
    response = requests.get(
        "https://httpbin.org/ip",
        proxies={"http": proxy_url, "https": proxy_url},
        timeout=30,  # avoid hanging on a slow or dead proxy
    )
    print(f"Request {i+1}: {response.json()['origin']}")
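If your provider hands you a list of individual proxies instead of a rotating gateway, rotation can also be done on the client side. The following is a minimal round-robin sketch, not a provider API: the helper `proxy_rotator` is hypothetical, and the addresses are placeholders from a documentation-only IP range.

```python
from itertools import cycle


def proxy_rotator(pool):
    """Return a function that yields a requests-style proxies dict,
    stepping through the pool in round-robin order on each call."""
    rotation = cycle(pool)

    def next_proxies() -> dict:
        proxy_url = next(rotation)
        return {"http": proxy_url, "https": proxy_url}

    return next_proxies


# Placeholder addresses; substitute the proxies your provider gives you.
next_proxies = proxy_rotator([
    "http://user:pass@203.0.113.10:8080",
    "http://user:pass@203.0.113.11:8080",
    "http://user:pass@203.0.113.12:8080",
])

# Four calls wrap around the three-proxy pool.
for _ in range(4):
    print(next_proxies()["http"])
```

Each request would then pass `proxies=next_proxies()` to `requests.get`.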
When Do You Need Proxies?
- Scraping more than a few hundred pages from one site
- Getting 403 Forbidden or CAPTCHA responses
- Scraping sites with aggressive anti-bot measures
- Needing requests to come from specific geographic locations
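The block signals in the list above can be checked programmatically. Here is a rough heuristic sketch: the function name `looks_blocked` and the marker strings are illustrative assumptions, and status 429 (Too Many Requests) is included as an extra rate-limiting signal that the list does not mention.

```python
# Illustrative, non-exhaustive phrases that often appear on challenge pages.
CAPTCHA_MARKERS = ("captcha", "unusual traffic", "are you a robot")


def looks_blocked(status_code: int, body: str) -> bool:
    """Heuristic: does this response suggest the site is blocking us?"""
    # 403 Forbidden and 429 Too Many Requests are common block responses.
    if status_code in (403, 429):
        return True
    # Some sites return 200 but serve a CAPTCHA page instead of content.
    lowered = body.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


print(looks_blocked(403, ""))
print(looks_blocked(200, "<h1>Please complete the CAPTCHA</h1>"))
print(looks_blocked(200, "<h1>Products</h1>"))
```

With Requests, this would be called as `looks_blocked(response.status_code, response.text)`. Expect false positives on pages that legitimately mention these phrases, so tune the markers per site.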
Choosing a Proxy Provider
Look for:
- IP pool size: larger pools mean less chance of getting a used or blocked IP
- Geographic coverage: match your target site's audience
- Rotation options: per-request, sticky sessions, or timed rotation
- Bandwidth pricing: compare cost per GB
- Success rate: some providers guarantee high success rates
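To make the rotation options concrete: per-request rotation changes the IP on every call, while sticky or timed rotation keeps one IP for a while before switching. Providers usually implement this on their side, so the class below is only a client-side sketch of the timed variant. `TimedRotator` and its parameters are hypothetical names, and the clock is injectable purely so the behaviour can be demonstrated without waiting.

```python
import time
from itertools import cycle


class TimedRotator:
    """Keep one proxy for `interval` seconds, then move to the next."""

    def __init__(self, pool, interval, clock=time.monotonic):
        self._rotation = cycle(pool)
        self._interval = interval
        self._clock = clock
        self._current = next(self._rotation)
        self._since = clock()

    def current(self) -> str:
        # Switch proxies only once the current one has been held long enough.
        if self._clock() - self._since >= self._interval:
            self._current = next(self._rotation)
            self._since = self._clock()
        return self._current


# Demonstration with a fake clock and placeholder proxy names.
now = [0.0]
rotator = TimedRotator(["proxy-a", "proxy-b"], interval=10, clock=lambda: now[0])

for t in (0, 5, 10, 15, 20):
    now[0] = t
    print(t, rotator.current())
```

A sticky session is the same idea with a long interval: multi-step flows such as logins keep one IP, and the rotator only moves on after the session window ends.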
Next Steps
- Learn to handle proxy failures and retries
- Combine proxies with user agent rotation
- Set up proxy authentication and session management