Using ScraperAPI with Python
Integrate ScraperAPI into your Python scrapers for automatic proxy rotation, CAPTCHA solving, and JavaScript rendering. Complete guide with code examples.
ScraperAPI handles the hardest parts of web scraping (proxy rotation, CAPTCHA solving, browser fingerprinting, and JavaScript rendering) through a single API call. Instead of managing your own proxy pool, you send requests through ScraperAPI and get back the page's HTML.
Getting Started
- Sign up at scraperapi.com to get your API key
- Install requests and beautifulsoup4 (if you have not already):
pip install requests beautifulsoup4
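The examples in this guide hard-code the key as a placeholder string for brevity. A minimal sketch of reading it from the environment instead is shown here. The variable name `SCRAPERAPI_KEY` and the helper `load_api_key` are assumptions made for this example, not names that ScraperAPI requires.

```python
import os


def load_api_key(env_var: str = "SCRAPERAPI_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With this in place, `API_KEY = load_api_key()` could stand in for the placeholder string in any of the snippets that follow.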
Method 1: API Endpoint
The simplest integration: pass your target URL as a query parameter to the API endpoint.
import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://quotes.toscrape.com/"

# Pass the target as a parameter so requests URL-encodes it
params = {"api_key": API_KEY, "url": url}
response = requests.get("http://api.scraperapi.com/", params=params)
soup = BeautifulSoup(response.text, "html.parser")

for quote in soup.select("div.quote"):
    text = quote.select_one("span.text").get_text()
    author = quote.select_one("small.author").get_text()
    print(f"{author}: {text[:50]}...")
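Because the target URL travels inside another URL's query string, it needs to be percent-encoded. Otherwise a target that has its own `?` or `&` would be split into separate ScraperAPI parameters. The sketch below uses only the standard library to show that encoding, which is the same encoding `requests` applies when values are passed through `params`. The helper name `build_api_url` and the example target URL are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_ENDPOINT = "http://api.scraperapi.com/"


def build_api_url(api_key: str, target_url: str) -> str:
    """Build the request URL with the target URL percent-encoded."""
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


# Hypothetical target that carries its own query string
target = "https://example.com/search?q=quotes&page=2"
api_url = build_api_url("YOUR_SCRAPERAPI_KEY", target)
print(api_url)

# Decoding the query string recovers the target URL in one piece
decoded = parse_qs(urlsplit(api_url).query)
print(decoded["url"][0] == target)
```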
Method 2: Proxy Mode
Use ScraperAPI as a proxy. This mode needs only minimal changes to an existing scraper.
import requests
from bs4 import BeautifulSoup
API_KEY = "YOUR_SCRAPERAPI_KEY"
proxies = {
    "http": f"http://scraperapi:{API_KEY}@proxy-server.scraperapi.com:8001",
    "https": f"http://scraperapi:{API_KEY}@proxy-server.scraperapi.com:8001",
}
response = requests.get(
    "https://quotes.toscrape.com/",
    proxies=proxies,
    verify=False,  # Certificate verification is turned off for proxy mode
)
soup = BeautifulSoup(response.text, "html.parser")
print(f"Found {len(soup.select('div.quote'))} quotes")
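If several parts of a scraper need the same proxy settings, the mapping can be built in one place. This is a small sketch under that assumption. The helper name `scraperapi_proxies` is hypothetical, while the host, port, and `scraperapi` username are taken from the example above.

```python
PROXY_HOST = "proxy-server.scraperapi.com"
PROXY_PORT = 8001


def scraperapi_proxies(api_key: str) -> dict[str, str]:
    """Return a requests-style proxies mapping for ScraperAPI proxy mode."""
    proxy_url = f"http://scraperapi:{api_key}@{PROXY_HOST}:{PROXY_PORT}"
    # Both schemes are routed through the same proxy endpoint
    return {"http": proxy_url, "https": proxy_url}
```

With an existing `requests.Session`, setting `session.proxies = scraperapi_proxies(API_KEY)` and `session.verify = False` once should route every request made through that session via the proxy, without changing the individual calls.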
JavaScript Rendering
Add render=true to scrape pages that load content with JavaScript.
import requests
from bs4 import BeautifulSoup
API_KEY = "YOUR_SCRAPERAPI_KEY"
params = {
    "api_key": API_KEY,
    "url": "https://example.com/spa-page",
    "render": "true",  # Renders JavaScript before returning HTML
}
response = requests.get("http://api.scraperapi.com/", params=params)
soup = BeautifulSoup(response.text, "html.parser")
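Since rendered requests use more API credits, one possible approach is to request the page without rendering first and fall back to `render=true` only when the expected content is missing. The sketch below illustrates that idea with an injected `fetch` callable, so it runs without network access. The function names, the `marker` check, and the stand-in fetcher are all assumptions made for this example, and a substring check is only a rough heuristic.

```python
from typing import Callable

# fetch(url, render) -> HTML string; the caller supplies the real HTTP call
Fetcher = Callable[[str, bool], str]


def fetch_with_render_fallback(
    fetch: Fetcher, url: str, marker: str
) -> tuple[str, bool]:
    """Fetch without rendering first; retry with rendering if `marker` is absent.

    Returns the HTML and whether the rendered request was needed.
    """
    html = fetch(url, False)
    if marker in html:
        return html, False
    # The expected content is missing, so it is probably injected by JavaScript
    return fetch(url, True), True


# Stand-in fetcher for demonstration only. A real one would call the API
# endpoint and add "render": "true" to params when `render` is True.
def fake_fetch(url: str, render: bool) -> str:
    if render:
        return '<div class="quote">rendered content</div>'
    return '<div id="app"></div>'


html, rendered = fetch_with_render_fallback(
    fake_fetch, "https://example.com/spa-page", 'class="quote"'
)
print(rendered)
```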
Geo-Targeting
Get localized results by specifying a country.
import requests
API_KEY = "YOUR_SCRAPERAPI_KEY"
params = {
    "api_key": API_KEY,
    "url": "https://www.google.com/search?q=web+scraping",
    "country_code": "us",  # Get US results
}
response = requests.get("http://api.scraperapi.com/", params=params)
print(response.text[:500])
Scraping Multiple Pages with ScraperAPI
import requests
from bs4 import BeautifulSoup
import time
API_KEY = "YOUR_SCRAPERAPI_KEY"
all_quotes = []
for page in range(1, 6):
    target_url = f"https://quotes.toscrape.com/page/{page}/"
    params = {"api_key": API_KEY, "url": target_url}
    response = requests.get("http://api.scraperapi.com/", params=params)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")
        for quote in soup.select("div.quote"):
            all_quotes.append({
                "text": quote.select_one("span.text").get_text(),
                "author": quote.select_one("small.author").get_text(),
            })
    else:
        print(f"Page {page} failed: {response.status_code}")
    time.sleep(1)  # Respect rate limits

print(f"Total quotes scraped: {len(all_quotes)}")
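The loop above only logs a page that fails. If you would rather collect the failures and give them another pass, one possible shape is sketched below. The `scrape_pages` helper and its `fetch_page` callback are hypothetical names. The callback is assumed to return a list of parsed items, or `None` when the response status is not 200.

```python
import time
from typing import Callable, Iterable, Optional

# fetch_page(page_number) returns the parsed items, or None if the request failed
PageFetcher = Callable[[int], Optional[list]]


def scrape_pages(
    fetch_page: PageFetcher,
    pages: Iterable[int],
    max_attempts: int = 2,
    delay: float = 1.0,
) -> tuple[list, list[int]]:
    """Scrape every page, giving failed pages up to `max_attempts` tries in total.

    Returns the collected items and the pages that never succeeded.
    """
    results: list = []
    pending = list(pages)
    for _ in range(max_attempts):
        failed = []
        for page in pending:
            items = fetch_page(page)
            if items is None:
                failed.append(page)
            else:
                results.extend(items)
            time.sleep(delay)  # Pause between requests
        pending = failed
        if not pending:
            break
    return results, pending
```

A call such as `quotes, failed = scrape_pages(fetch_page, range(1, 6))` would then return the scraped items together with any page numbers that still failed after the last attempt. Items from retried pages are appended after the first pass, so the result is not strictly in page order.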
ScraperAPI Features
| Feature | Parameter | Use case |
|---|---|---|
| JavaScript rendering | render=true | SPAs, dynamic sites |
| Geo-targeting | country_code=us | Localized results |
| Premium proxies | premium=true | Hard-to-scrape sites |
| Session stickiness | session_number=123 | Multi-page sessions |
| Custom headers | keep_headers=true | Pass your own headers |
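These parameters can be combined in a single request. As a rough sketch, a helper like the one below could assemble the `params` dictionary and include only the options that are switched on. The function `build_params` and its keyword arguments are invented for this example, and only the parameter names listed in the table are used.

```python
from typing import Optional


def build_params(
    api_key: str,
    url: str,
    *,
    render: bool = False,
    country_code: Optional[str] = None,
    premium: bool = False,
    session_number: Optional[int] = None,
    keep_headers: bool = False,
) -> dict[str, str]:
    """Assemble ScraperAPI query parameters, including only the options in use."""
    params = {"api_key": api_key, "url": url}
    if render:
        params["render"] = "true"
    if country_code:
        params["country_code"] = country_code
    if premium:
        params["premium"] = "true"
    if session_number is not None:
        params["session_number"] = str(session_number)
    if keep_headers:
        params["keep_headers"] = "true"
    return params


params = build_params(
    "YOUR_SCRAPERAPI_KEY",
    "https://quotes.toscrape.com/",
    render=True,
    country_code="us",
)
print(params)
```

The resulting dictionary can be passed directly as `params=` to `requests.get("http://api.scraperapi.com/", ...)`, as in the earlier examples.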
Tips
- Start with the free tier (5,000 API credits) to test your scraper before scaling.
- Use session_number when scraping multi-step flows (login, then scrape) to keep the same IP.
- Enable render=true only when you need it, because it uses more API credits.
- ScraperAPI automatically retries failed requests, so you can simplify your error-handling code.
Next Steps
- Try ScrapingAnt as an alternative proxy and rendering service
- Learn lxml and XPath for faster HTML parsing