Scraping REST APIs with Python Requests

Master the Python Requests library for scraping REST APIs. Learn GET, POST, headers, query parameters, and error handling.

The requests library is the go-to tool for making HTTP calls in Python. It handles headers, parameters, cookies, and sessions with a clean, readable API.

Installation

pip install requests

GET Requests with Query Parameters

import requests

url = "https://jsonplaceholder.typicode.com/posts"
params = {"userId": 1, "_limit": 5}
headers = {"Accept": "application/json"}

response = requests.get(url, params=params, headers=headers, timeout=30)
response.raise_for_status()

posts = response.json()
for post in posts:
    print(f"[{post['id']}] {post['title'][:50]}")
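To see what the params argument does to the URL, you can prepare a request without sending it. The sketch below is illustrative only: build_url is a hypothetical helper name, not part of requests, and the extra "q" and "id" parameters are made up to show how None values and lists are handled.

```python
import requests


def build_url(url, params):
    """Return the final URL requests would call, without sending anything."""
    prepared = requests.Request("GET", url, params=params).prepare()
    return prepared.url


base = "https://jsonplaceholder.typicode.com/posts"

# Dictionary entries become key=value pairs in the query string.
print(build_url(base, {"userId": 1, "_limit": 5}))
# https://jsonplaceholder.typicode.com/posts?userId=1&_limit=5

# Keys whose value is None are left out (hypothetical "q" parameter).
print(build_url(base, {"userId": 1, "q": None}))

# A list value repeats the key once per item (hypothetical "id" parameter).
print(build_url(base, {"id": [1, 2]}))
```

Building the query string this way avoids hand-written string concatenation and lets requests handle URL encoding for you.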

POST Requests (Form Submission / API Calls)

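Some APIs require POST requests with a JSON body. The json= argument used here serializes the dictionary and sets the Content-Type header to application/json; for a classic form submission, pass the payload as data= instead: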

import requests

url = "https://jsonplaceholder.typicode.com/posts"
payload = {
    "title": "Scraping Central",
    "body": "API scraping is efficient",
    "userId": 1
}

response = requests.post(url, json=payload, timeout=30)
print(response.status_code)  # 201
print(response.json()["id"])  # New resource ID
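The difference between a JSON body and a form submission can be checked offline by preparing both requests and inspecting them. This is a minimal sketch; the shortened payload is only for illustration and nothing is sent over the network.

```python
import json

import requests

url = "https://jsonplaceholder.typicode.com/posts"
payload = {"title": "Scraping Central", "userId": 1}

# json= serializes the dict to JSON and sets the matching Content-Type.
json_req = requests.Request("POST", url, json=payload).prepare()

# data= with a dict sends form-encoded fields, like an HTML form would.
form_req = requests.Request("POST", url, data=payload).prepare()

print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # the original payload, round-tripped

print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # title=Scraping+Central&userId=1
```

If an endpoint rejects your POST, comparing these two shapes against what the browser sends is usually a good first check.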

Using Sessions for Persistent Connections

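A Session reuses the underlying TCP connection for requests to the same host and persists cookies between them. That makes repeated requests faster, and the persisted cookies are what keep you logged in during authenticated scraping: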

import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "application/json",
})

# All requests in the session share these headers
for page in range(1, 4):
    response = session.get(
        "https://jsonplaceholder.typicode.com/posts",
        params={"_page": page, "_limit": 10},
        timeout=30,
    )
    response.raise_for_status()
    data = response.json()
    print(f"Page {page}: {len(data)} posts")
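The way session-level headers combine with per-request headers can be inspected without any network traffic by using Session.prepare_request. In this sketch the X-Example header is a made-up name for illustration, and the session is opened with a with block so its connections are closed on exit.

```python
import requests

with requests.Session() as session:
    # Session-level default, applied to every request made through it.
    session.headers.update({"Accept": "application/json"})

    request = requests.Request(
        "GET",
        "https://jsonplaceholder.typicode.com/posts",
        params={"_page": 1, "_limit": 10},
        headers={"X-Example": "per-request"},  # hypothetical header
    )
    # Merge session defaults into this request without sending it.
    prepared = session.prepare_request(request)

print(prepared.headers["Accept"])     # application/json
print(prepared.headers["X-Example"])  # per-request
print(prepared.url)
```

Per-request headers are merged on top of the session defaults, so a header passed to an individual call overrides the session value for that call only.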

Robust Error Handling

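import requests
from requests.exceptions import HTTPError, ConnectionError, Timeout, RequestException

try:
    response = requests.get("https://api.example.com/data", timeout=10)
    response.raise_for_status()  # raises HTTPError on 4xx/5xx responses
    data = response.json()
except Timeout:
    # Keep this before ConnectionError: ConnectTimeout subclasses both
    print("Request timed out - try again or increase timeout")
except ConnectionError:
    print("Could not connect - check the URL or your network")
except HTTPError as e:
    print(f"HTTP error: {e.response.status_code}")
except RequestException as e:
    # Catch-all for any other requests failure (too many redirects, invalid URL, ...)
    print(f"Request failed: {e}")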
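The "try again" advice for timeouts can be turned into a small retry loop. The following is one possible sketch, not a feature of requests itself: get_with_retries, max_attempts, and backoff are hypothetical names, and the exponential delay is an assumption about what a reasonable retry policy looks like.

```python
import time

import requests


def get_with_retries(session, url, max_attempts=3, backoff=1.0, **kwargs):
    """GET url through session, retrying on timeouts and connection errors.

    Hypothetical helper: waits backoff, 2*backoff, 4*backoff, ... seconds
    between attempts and re-raises the last error if every attempt fails.
    """
    kwargs.setdefault("timeout", 10)  # never send without a timeout
    for attempt in range(1, max_attempts + 1):
        try:
            response = session.get(url, **kwargs)
            response.raise_for_status()
            return response
        except (requests.Timeout, requests.ConnectionError):
            if attempt == max_attempts:
                raise  # out of attempts: let the caller handle it
            time.sleep(backoff * 2 ** (attempt - 1))
```

A call might look like get_with_retries(requests.Session(), "https://api.example.com/data"). HTTP errors such as 404 are deliberately not retried here, since repeating the same request rarely fixes them.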

Key Tips

Always set a timeout to avoid hanging requests
Use sessions when making many requests to the same host
Set a realistic User-Agent header to avoid blocks
Check response.status_code, or call response.raise_for_status(), before parsing the body
For high-volume scraping behind anti-bot walls, route requests through ScraperAPI to handle proxies and retries automatically

Next Steps

Handle API authentication (API keys, OAuth, Bearer tokens)
Scrape paginated API endpoints
Switch to HTTPX for async scraping