Scraping JSON APIs and Processing Responses

Learn how to scrape JSON APIs, navigate nested response structures, and extract exactly the data you need using Python.

JSON is the de facto response format for most modern web APIs. Navigating complex JSON structures and extracting data from them efficiently is a core scraping skill.

Basic JSON Response Handling

import requests

response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
response.raise_for_status()

users = response.json()

# Extract specific fields
for user in users:
    print(f"{user['name']} ({user['email']}) - {user['company']['name']}")

Navigating Deeply Nested JSON

Real-world APIs often return deeply nested structures:

import requests

response = requests.get(
    "https://api.github.com/repos/python/cpython",
    timeout=15,
)
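response.raise_for_status()
data = response.json()

# Safe nested access: .get() avoids KeyError, and "or {}" covers keys
# that are present but null (GitHub returns "license": null for
# repositories without a detected license)
owner_name = (data.get("owner") or {}).get("login", "unknown")
license_name = (data.get("license") or {}).get("name", "No license")
stars = data.get("stargazers_count", 0)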

print(f"Owner: {owner_name}")
print(f"License: {license_name}")
print(f"Stars: {stars:,}")
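Chained .get() calls become verbose as nesting grows, and a {} default does not help when a key is present but set to null. One way to handle both cases is a small path-walking helper. The sketch below is an illustration only: deep_get and the sample repo dict are hypothetical names, not part of requests or the GitHub API.

```python
from typing import Any


def deep_get(data: Any, *keys: Any, default: Any = None) -> Any:
    """Follow keys through nested dicts and lists.

    Returns default as soon as a step is missing, null, or of the wrong type.
    """
    current = data
    for key in keys:
        if isinstance(current, dict):
            current = current.get(key)
        elif (
            isinstance(current, list)
            and isinstance(key, int)
            and -len(current) <= key < len(current)
        ):
            current = current[key]
        else:
            return default
        if current is None:
            return default
    return current


# Hypothetical sample shaped loosely like a repository payload
repo = {"owner": {"login": "python"}, "license": None, "topics": ["interpreter"]}

print(deep_get(repo, "owner", "login", default="unknown"))
print(deep_get(repo, "license", "name", default="No license"))
print(deep_get(repo, "topics", 0))
```

Because the helper treats null the same as a missing key, it cannot distinguish between the two; if that distinction matters for your data, check for the key explicitly instead.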

Flattening Nested JSON for Analysis

import csv

import requests

response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
response.raise_for_status()
users = response.json()

# Flatten nested fields into simple dicts
flat_users = []
for user in users:
    flat_users.append({
        "name": user["name"],
        "email": user["email"],
        "city": user["address"]["city"],
        "lat": user["address"]["geo"]["lat"],
        "lng": user["address"]["geo"]["lng"],
        "company": user["company"]["name"],
    })

# Flat dicts are easy to write to CSV or load into a DataFrame.
# Explicit field names also keep this working if the list is empty.
fieldnames = ["name", "email", "city", "lat", "lng", "company"]
with open("users.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(flat_users)

print(f"Saved {len(flat_users)} users to users.csv")
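Picking fields by hand works well when you know the schema. When you do not, a generic recursive flattener can produce dotted column names automatically. The following is a minimal sketch under that assumption; flatten, the "." separator, and the sample record are illustrative choices, not something the API prescribes.

```python
def flatten(obj: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Collapse nested dicts into one level, joining keys with sep.

    Lists and scalar values are copied as-is; empty nested dicts are dropped.
    """
    flat = {}
    for key, value in obj.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical record with the same nesting as the users endpoint
user = {
    "name": "Example User",
    "address": {"city": "Springfield", "geo": {"lat": "1.0", "lng": "2.0"}},
}

for column, value in flatten(user).items():
    print(f"{column} = {value}")
```

The resulting keys (for example address.geo.lat) can be used directly as CSV field names. If pandas is already a dependency, pandas.json_normalize covers similar ground.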

Handling Different Response Structures

APIs wrap their data differently. Here are common patterns:

def extract_items(response_json):
    """Handle common API response wrappers."""
    # Pattern 1: Direct list
    if isinstance(response_json, list):
        return response_json

    # Pattern 2: { "data": [...] }
    if "data" in response_json:
        return response_json["data"]

    # Pattern 3: { "results": [...] }
    if "results" in response_json:
        return response_json["results"]

    # Pattern 4: { "items": [...] }
    if "items" in response_json:
        return response_json["items"]

    # Pattern 5: { "response": { "items": [...] } }
    if "response" in response_json:
        return extract_items(response_json["response"])

    # Fallback: treat a single object as a one-item list
    return [response_json]
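The same idea can be written in a table-driven form, which makes it easier to add wrapper keys for a new API. This is a sketch of one possible variant, not a replacement for the function above: unwrap_items and WRAPPER_KEYS are hypothetical names, and unlike extract_items it recurses into every wrapper key so that the result is always a list.

```python
# Wrapper keys to try, in priority order (an assumption; adjust per API)
WRAPPER_KEYS = ("data", "results", "items", "response")


def unwrap_items(payload, keys=WRAPPER_KEYS):
    """Return the list of records inside a JSON payload, whatever the wrapper."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        for key in keys:
            if key in payload:
                # Recurse so nested wrappers and single objects are handled too
                return unwrap_items(payload[key], keys)
        # No known wrapper key: treat the dict itself as a single record
        return [payload]
    # Scalars and null carry no records
    return []


samples = [
    [{"id": 1}],
    {"data": [{"id": 2}]},
    {"response": {"items": [{"id": 3}]}},
    {"id": 4},
]
for sample in samples:
    print(unwrap_items(sample))
```

One trade-off to be aware of: a record that happens to contain a field named data or items will be unwrapped rather than returned whole, so narrow the keys argument when an API's records use those names.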

Validating JSON Responses

Always check that the response is valid JSON before processing:

import requests

response = requests.get("https://api.example.com/data", timeout=15)

# Check content type
content_type = response.headers.get("Content-Type", "")
if "application/json" not in content_type:
    print(f"Unexpected content type: {content_type}")

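# Safe JSON parsing. requests.exceptions.JSONDecodeError is available in
# requests 2.27+; it subclasses ValueError, so "except ValueError" is a
# portable alternative on older versions.
try:
    data = response.json()
except requests.exceptions.JSONDecodeError:
    print(f"Invalid JSON response: {response.text[:200]}")
    data = None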
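Both checks can be folded into one reusable function that works on the raw body text, which also makes the logic testable without a network call. The sketch below uses only the standard json module; parse_json_body and its (data, error) return convention are assumptions made for illustration.

```python
import json


def parse_json_body(text, content_type=""):
    """Validate and parse a response body.

    Returns (data, None) on success or (None, error_message) on failure.
    """
    # Matching on "json" also accepts types such as application/problem+json
    if "json" not in content_type.lower():
        return None, f"unexpected content type: {content_type!r}"
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"


# Hypothetical bodies: a valid payload, an HTML error page, and truncated JSON
print(parse_json_body('{"ok": true}', "application/json; charset=utf-8"))
print(parse_json_body("<html>Blocked</html>", "text/html"))
print(parse_json_body('{"ok": tr', "application/json"))
```

With requests, it could be called as parse_json_body(response.text, response.headers.get("Content-Type", "")).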

When scraping JSON APIs at scale, services like ScraperAPI can improve request success rates against protected endpoints and return the JSON for you to process directly.

Next Steps

Use async HTTPX for faster JSON API scraping
Parse JSON responses with jq and jsonpath
Build an end-to-end API data pipeline