Scraping REST APIs with Python Requests
Master the Python Requests library for scraping REST APIs. Learn GET, POST, headers, query parameters, and error handling.
API Scraping · #2 · beginner · 2 min read
The requests library is the go-to tool for making HTTP calls in Python. It handles headers, parameters, cookies, and sessions with a clean, readable API.
Installation
pip install requests
GET Requests with Query Parameters
import requests
url = "https://jsonplaceholder.typicode.com/posts"
params = {"userId": 1, "_limit": 5}
headers = {"Accept": "application/json"}
response = requests.get(url, params=params, headers=headers, timeout=30)
response.raise_for_status()
posts = response.json()
for post in posts:
    print(f"[{post['id']}] {post['title'][:50]}")
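To check what URL the params dictionary actually produces, you can prepare the request without sending it. This is a minimal sketch that reuses the endpoint and parameters from the example above; the names prepared and final_url are illustrative.

```python
import requests

url = "https://jsonplaceholder.typicode.com/posts"
params = {"userId": 1, "_limit": 5}

# Build the request without sending it; requests URL-encodes params for us.
prepared = requests.Request("GET", url, params=params).prepare()
final_url = prepared.url

print(final_url)
```

Inspecting the prepared URL this way is handy when an API returns unexpected results and you suspect a parameter is being encoded differently than intended.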
POST Requests (Form Submission / API Calls)
Some APIs require POST requests with a JSON body:
import requests
url = "https://jsonplaceholder.typicode.com/posts"
payload = {
    "title": "Scraping Central",
    "body": "API scraping is efficient",
    "userId": 1,
}
response = requests.post(url, json=payload, timeout=30)
print(response.status_code) # 201
print(response.json()["id"]) # New resource ID
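The heading also mentions form submission, which uses data= rather than json=. The sketch below compares the two by preparing both requests without sending them, so you can see the Content-Type and body each one would carry. The shortened payload and the variable names json_request and form_request are illustrative.

```python
import requests

url = "https://jsonplaceholder.typicode.com/posts"
payload = {"title": "Scraping Central", "userId": 1}

# json= serializes the dict to a JSON body and sets Content-Type for us.
json_request = requests.Request("POST", url, json=payload).prepare()

# data= with a dict sends a URL-encoded form body, like an HTML form would.
form_request = requests.Request("POST", url, data=payload).prepare()

for name, prepared in [("json", json_request), ("form", form_request)]:
    print(name, prepared.headers["Content-Type"], prepared.body)
```

As a rule of thumb, use json= when the API documents a JSON body and data= when you are imitating a browser form post.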
Using Sessions for Persistent Connections
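Sessions reuse the underlying TCP connection and persist cookies across requests. Connection reuse makes repeated calls to the same host faster, and cookie persistence is what keeps you logged in during authenticated scraping: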
import requests
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "application/json",
})
# All requests in the session share these headers
for page in range(1, 4):
    response = session.get(
        "https://jsonplaceholder.typicode.com/posts",
        params={"_page": page, "_limit": 10},
        timeout=30,
    )
    response.raise_for_status()
    data = response.json()
    print(f"Page {page}: {len(data)} posts")
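Session headers act as defaults: a header passed on a single request is merged with them rather than replacing them. The sketch below shows this by preparing a request through the session without sending it. The X-Example header is a hypothetical name used only for illustration.

```python
import requests

session = requests.Session()
session.headers.update({"Accept": "application/json"})

# A per-request header is merged with the session defaults, not a replacement.
request = requests.Request(
    "GET",
    "https://jsonplaceholder.typicode.com/posts",
    headers={"X-Example": "demo"},  # hypothetical header, for illustration only
)
prepared = session.prepare_request(request)

print(prepared.headers["Accept"])     # comes from the session
print(prepared.headers["X-Example"])  # comes from this request
session.close()
```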
Robust Error Handling
import requests
from requests.exceptions import HTTPError, ConnectionError, Timeout
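try:
    response = requests.get("https://api.example.com/data", timeout=10)
    response.raise_for_status()  # raises HTTPError on 4xx/5xx responses
    data = response.json()
except Timeout:
    # Caught before ConnectionError because a connect timeout is both
    print("Request timed out - try again or increase timeout")
except ConnectionError:
    print("Could not connect - check the URL or your network")
except HTTPError as e:
    print(f"HTTP error: {e.response.status_code}")
except ValueError:
    # response.json() raises a ValueError subclass when the body is not JSON
    print("Response body was not valid JSON")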
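The handler above only reports failures. One way to act on them is to retry transient errors with a growing delay. This is a rough sketch under stated assumptions, not a definitive implementation: fetch_json and its attempts, backoff, and get parameters are hypothetical names, and the choice to retry only timeouts, connection errors, and 5xx responses is an assumption you may want to adjust.

```python
import time

import requests
from requests.exceptions import ConnectionError, HTTPError, Timeout


def fetch_json(url, *, attempts=3, backoff=1.0, timeout=10, get=requests.get):
    """Return parsed JSON from url, retrying transient failures.

    The `get` parameter exists so a stand-in can be passed during testing.
    """
    for attempt in range(1, attempts + 1):
        try:
            response = get(url, timeout=timeout)
            response.raise_for_status()
            return response.json()
        except (Timeout, ConnectionError):
            # Transient network problem: give up only on the last attempt.
            if attempt == attempts:
                raise
        except HTTPError as e:
            # Retry server-side errors (5xx) only; a 4xx will not fix itself.
            status = e.response.status_code if e.response is not None else None
            if status is None or status < 500 or attempt == attempts:
                raise
        # Exponential backoff: backoff, 2*backoff, 4*backoff, ...
        time.sleep(backoff * 2 ** (attempt - 1))
```

A call such as fetch_json("https://api.example.com/data") then either returns the decoded body or raises the last exception, so the caller can still decide how to report it.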
Key Tips
- Always set a timeout to avoid hanging requests
- Use sessions when making many requests to the same host
- Set a realistic User-Agent header to avoid blocks
- Check response.status_code (or call response.raise_for_status()) before parsing the body
- For high-volume scraping behind anti-bot walls, route requests through ScraperAPI to handle proxies and retries automatically
Next Steps
- Handle API authentication (API keys, OAuth, Bearer tokens)
- Scrape paginated API endpoints
- Switch to HTTPX for async scraping