Scraping REST APIs with Python Requests

Master the Python Requests library for scraping REST APIs. Learn GET, POST, headers, query parameters, and error handling.

The requests library is the go-to tool for making HTTP calls in Python. It handles headers, parameters, cookies, and sessions with a clean, readable API.

Installation

pip install requests

GET Requests with Query Parameters

import requests

url = "https://jsonplaceholder.typicode.com/posts"
params = {"userId": 1, "_limit": 5}
headers = {"Accept": "application/json"}

response = requests.get(url, params=params, headers=headers, timeout=30)
response.raise_for_status()

posts = response.json()
for post in posts:
    print(f"[{post['id']}] {post['title'][:50]}")
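To see what params does to the URL before anything is sent, the request can be prepared offline and inspected. This is a minimal sketch; the second parameter dictionary (id, q) is made up for illustration and is not part of the example above.

```python
import requests

url = "https://jsonplaceholder.typicode.com/posts"

# Prepare the request without sending it, so the final URL can be inspected.
prepared = requests.Request("GET", url, params={"userId": 1, "_limit": 5}).prepare()
print(prepared.url)

# Hypothetical parameters: a list value becomes a repeated key,
# and spaces in string values are URL-encoded.
prepared_list = requests.Request(
    "GET", url, params={"id": [1, 2], "q": "hello world"}
).prepare()
print(prepared_list.url)
```

Letting requests build the query string avoids hand-rolled string concatenation and the escaping mistakes that tend to come with it.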

POST Requests (Form Submission / API Calls)

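Some APIs require POST requests with a JSON body. Passing the payload with json= makes requests serialize it and set the Content-Type header to application/json; for an HTML form submission, pass the same dictionary with data= instead: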

import requests

url = "https://jsonplaceholder.typicode.com/posts"
payload = {
    "title": "Scraping Central",
    "body": "API scraping is efficient",
    "userId": 1
}

response = requests.post(url, json=payload, timeout=30)
response.raise_for_status()
print(response.status_code)  # 201 (Created)
print(response.json()["id"])  # ID of the newly created resource
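The difference between a JSON call and a form submission comes down to which keyword carries the payload. The sketch below prepares both variants offline (nothing is sent) so the resulting Content-Type header and body can be compared. The variable names json_request and form_request are illustrative only.

```python
import requests

url = "https://jsonplaceholder.typicode.com/posts"
payload = {
    "title": "Scraping Central",
    "body": "API scraping is efficient",
    "userId": 1,
}

# json= serializes the dict to a JSON document.
json_request = requests.Request("POST", url, json=payload).prepare()
# data= form-encodes the same dict, like an HTML form submission.
form_request = requests.Request("POST", url, data=payload).prepare()

print(json_request.headers["Content-Type"])
print(form_request.headers["Content-Type"])
print(form_request.body)
```

If an endpoint rejects a POST that looks correct, checking which of the two encodings it expects is a reasonable first step.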

Using Sessions for Persistent Connections

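A Session reuses the underlying TCP connection for requests to the same host and persists cookies across requests. That cuts per-request latency and is essential for authenticated scraping, where a login cookie must carry over to later requests: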

import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "application/json",
})

# All requests in the session share these headers
for page in range(1, 4):
    response = session.get(
        "https://jsonplaceholder.typicode.com/posts",
        params={"_page": page, "_limit": 10},
        timeout=30,
    )
    response.raise_for_status()
    data = response.json()
    print(f"Page {page}: {len(data)} posts")
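Session-level headers are defaults, not fixed values: headers passed on an individual request are merged on top of them. The following sketch prepares a request through the session without sending it, to show the merge. The X-Request-Source header is a made-up name used only for illustration.

```python
import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "application/json",
})

# Per-request headers override session headers with the same name;
# everything else is inherited from the session.
request = requests.Request(
    "GET",
    "https://jsonplaceholder.typicode.com/posts",
    headers={"Accept": "text/html", "X-Request-Source": "docs-example"},
)
prepared = session.prepare_request(request)

print(prepared.headers["User-Agent"])  # inherited from the session
print(prepared.headers["Accept"])      # overridden for this request
```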

Robust Error Handling

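import requests
from requests.exceptions import ConnectionError, HTTPError, RequestException, Timeout

try:
    response = requests.get("https://api.example.com/data", timeout=10)
    response.raise_for_status()  # Raises HTTPError for 4xx/5xx responses
    data = response.json()  # Raises an error if the body is not valid JSON
except Timeout:
    print("Request timed out - try again or increase timeout")
except ConnectionError:
    print("Could not connect - check the URL or your network")
except HTTPError as e:
    print(f"HTTP error: {e.response.status_code}")
except RequestException as e:
    # Catch-all for any other requests failure (e.g. too many redirects)
    print(f"Request failed: {e}")
except ValueError:
    # Older requests versions raise a plain ValueError for invalid JSON
    print("Response body was not valid JSON")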

Key Tips

  • Always set a timeout to avoid hanging requests
  • Use sessions when making many requests to the same host
  • Set a realistic User-Agent header to avoid blocks
  • Check response.status_code (or call response.raise_for_status()) before parsing the body
  • For high-volume scraping behind anti-bot walls, route requests through ScraperAPI to handle proxies and retries automatically
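Several of these tips can be combined in one place. The sketch below builds a session whose HTTPS adapter retries transient failures with backoff, using urllib3's Retry class. The function name build_session and the specific retry count, backoff factor, and status codes are assumptions chosen for illustration, not recommendations from this article.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session():
    """Hypothetical helper: a session with default headers and retries."""
    session = requests.Session()
    session.headers.update({"Accept": "application/json"})

    # Assumed policy: up to 3 retries with increasing delays between
    # attempts, also retrying on these status codes.
    retry = Retry(
        total=3,
        backoff_factor=0.5,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session


session = build_session()
adapter = session.get_adapter("https://jsonplaceholder.typicode.com/posts")
print(adapter.max_retries.total)
```

Requests made with this session, for example session.get(url, timeout=30), still need an explicit timeout, since the retry policy does not set one.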

Next Steps

  • Handle API authentication (API keys, OAuth, Bearer tokens)
  • Scrape paginated API endpoints
  • Switch to HTTPX for async scraping