Scraping JSON APIs and Processing Responses
Learn how to scrape JSON APIs, navigate nested response structures, and extract exactly the data you need using Python.
JSON is the dominant response format for modern web APIs. Knowing how to navigate complex JSON structures efficiently and extract data from them is a core scraping skill.
Basic JSON Response Handling
import requests
response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
response.raise_for_status()
users = response.json()
# Extract specific fields
for user in users:
    print(f"{user['name']} ({user['email']}) - {user['company']['name']}")
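To practise the extraction step without hitting the network, you can parse a JSON string directly with the standard library's `json.loads`. For a typical UTF-8 response, `response.json()` performs essentially the same decoding on the response body. The sketch below assumes a small hand-written sample shaped like a trimmed jsonplaceholder user. It is illustrative data, not a captured response.

```python
import json

# Hand-written sample shaped like a trimmed jsonplaceholder user (illustrative only)
SAMPLE_BODY = """
[
  {
    "name": "Ada Example",
    "email": "ada@example.com",
    "company": {"name": "Example Corp"}
  }
]
"""

# JSON arrays become Python lists, and JSON objects become dicts
users = json.loads(SAMPLE_BODY)

lines = [
    f"{user['name']} ({user['email']}) - {user['company']['name']}"
    for user in users
]
for line in lines:
    print(line)
```

Keeping a few saved sample bodies like this around makes it easy to test extraction logic offline before pointing it at the live API.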
Navigating Deeply Nested JSON
Real-world APIs often return deeply nested structures:
import requests
response = requests.get(
    "https://api.github.com/repos/python/cpython",
    timeout=15,
)
response.raise_for_status()
data = response.json()
# Safe nested access: .get() avoids KeyError, and "or {}" also covers keys
# that are present but null (GitHub sends "license": null for unlicensed repos)
owner_name = (data.get("owner") or {}).get("login", "unknown")
license_name = (data.get("license") or {}).get("name", "No license")
stars = data.get("stargazers_count", 0)
print(f"Owner: {owner_name}")
print(f"License: {license_name}")
print(f"Stars: {stars:,}")
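Chained `.get()` calls get verbose once paths run four or five levels deep or pass through lists. One option is a small path-walking helper. The sketch below uses a hypothetical `get_path` function, which is not part of `requests` or the standard library. It walks dict keys and list indices and returns a default on any miss, including a null value along the way. The `repo` dict is hand-written sample data.

```python
from typing import Any


def get_path(data: Any, *path: Any, default: Any = None) -> Any:
    """Walk nested dicts/lists by key or index; return default on any miss.

    Hypothetical helper for illustration. Unlike chained .get() calls, it
    also survives intermediate values that are null (None) or the wrong type.
    """
    current = data
    for step in path:
        if isinstance(current, dict) and step in current:
            current = current[step]
        elif (
            isinstance(current, list)
            and isinstance(step, int)
            and -len(current) <= step < len(current)
        ):
            current = current[step]
        else:
            return default
    # A key that exists but holds null also falls back to the default
    return default if current is None else current


# Hand-written sample data, shaped loosely like a repository object
repo = {
    "owner": {"login": "python"},
    "license": None,  # APIs often send null instead of omitting the key
    "topics": ["python", "interpreter"],
}

print(get_path(repo, "owner", "login", default="unknown"))
print(get_path(repo, "license", "name", default="No license"))
print(get_path(repo, "topics", 0))
print(get_path(repo, "topics", 5, default="n/a"))
```

Because every miss collapses to the default, this style suits optional display fields. For fields your code cannot work without, a loud `KeyError` is often the better signal.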
Flattening Nested JSON for Analysis
import csv
import requests
response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
response.raise_for_status()
users = response.json()
# Flatten nested fields into simple, one-level dicts
flat_users = []
for user in users:
    flat_users.append({
        "name": user["name"],
        "email": user["email"],
        "city": user["address"]["city"],
        "lat": user["address"]["geo"]["lat"],
        "lng": user["address"]["geo"]["lng"],
        "company": user["company"]["name"],
    })
# Now easy to write to CSV or load into a DataFrame
with open("users.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=flat_users[0].keys())
    writer.writeheader()
    writer.writerows(flat_users)
print(f"Saved {len(flat_users)} users to users.csv")
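Hand-picking fields works well when you know the shape in advance. When you want every nested field, a recursive flattener that joins keys with a separator saves typing. The sketch below is a plain-Python illustration using a hypothetical `flatten` function. It produces dotted keys such as `address.geo.lat`. This is similar in spirit to the default output of `pandas.json_normalize`, but it is not that function. It leaves lists untouched rather than expanding them into columns. The `user` dict is hand-written sample data.

```python
from typing import Any


def flatten(obj: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Recursively flatten nested dicts into one level with joined keys.

    Lists are kept as-is rather than expanded, since a list of records
    usually deserves its own table instead of extra columns.
    """
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict) and value:
            # Descend into non-empty nested objects, extending the key prefix
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


# Hand-written sample shaped like a jsonplaceholder user (illustrative only)
user = {
    "name": "Ada Example",
    "address": {"city": "Sampleville", "geo": {"lat": "1.5", "lng": "-2.25"}},
    "company": {"name": "Example Corp"},
    "tags": ["a", "b"],
}
print(flatten(user))
```

If flattened records differ in shape, build the CSV `fieldnames` from the union of all records' keys rather than from the first record. By default, `csv.DictWriter` raises `ValueError` on keys that are missing from `fieldnames`.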
Handling Different Response Structures
APIs wrap their data differently. Here are common patterns:
def extract_items(response_json):
    """Handle common API response wrappers."""
    # Pattern 1: Direct list
    if isinstance(response_json, list):
        return response_json
    # Scalars (str, number, bool, null) are returned as a single item
    if not isinstance(response_json, dict):
        return [response_json]
    # Pattern 2: { "data": [...] }
    if "data" in response_json:
        return response_json["data"]
    # Pattern 3: { "results": [...] }
    if "results" in response_json:
        return response_json["results"]
    # Pattern 4: { "items": [...] }
    if "items" in response_json:
        return response_json["items"]
    # Pattern 5: { "response": { "items": [...] } }
    if "response" in response_json:
        return extract_items(response_json["response"])
    # Fallback: treat the whole object as a single item
    return [response_json]
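Some APIs use envelope keys that no fixed list anticipates, such as `docs` or `records`. A more generic fallback is to search the payload for the first list of objects. The sketch below is a heuristic with a hypothetical `find_record_list` name, not a replacement for reading the API's documentation. It searches breadth-first, so shallower lists win, and it can still pick the wrong list if a payload contains several lists of objects.

```python
from collections import deque
from typing import Any


def find_record_list(payload: Any) -> list:
    """Heuristic: return the first non-empty list of dicts in a JSON payload.

    Hypothetical helper for illustration. Searches breadth-first, so a
    top-level list wins over one nested deeper in the structure.
    """
    queue = deque([payload])
    while queue:
        node = queue.popleft()
        if isinstance(node, list):
            if node and all(isinstance(item, dict) for item in node):
                return node
            continue  # empty lists and lists of scalars are not records
        if isinstance(node, dict):
            queue.extend(node.values())
    return []


print(find_record_list({"meta": {"page": 1}, "results": [{"id": 1}, {"id": 2}]}))
print(find_record_list({"response": {"docs": [{"id": 3}]}}))
print(find_record_list([{"id": 4}]))
print(find_record_list({"count": 0}))
```

Explicit envelope keys are more predictable, so a heuristic like this is best kept as a last resort and logged when it fires.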
Validating JSON Responses
Always check that the response is valid JSON before processing:
import requests
response = requests.get("https://api.example.com/data", timeout=15)
# Check content type
content_type = response.headers.get("Content-Type", "")
if "application/json" not in content_type:
    print(f"Unexpected content type: {content_type}")
# Safe JSON parsing (requests.exceptions.JSONDecodeError exists in requests 2.27+)
try:
    data = response.json()
except requests.exceptions.JSONDecodeError:
    print(f"Invalid JSON response: {response.text[:200]}")
    data = None
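The same checks can be wrapped in a function and tested offline against saved bodies, such as an HTML error page returned with a 200 status. The sketch below uses a hypothetical `parse_json_body` helper and catches the standard library's `json.JSONDecodeError`, because it calls `json.loads` directly. It is deliberately stricter than the code above. It refuses non-JSON content types outright, while still accepting structured types such as `application/vnd.api+json`. Some APIs mislabel JSON as `text/plain`, so you may need to relax this rule.

```python
import json
from typing import Any, Optional


def parse_json_body(content_type: str, body: str) -> Optional[Any]:
    """Return parsed JSON, or None if the body is not JSON.

    Hypothetical helper for illustration. Accepts "application/json" and
    structured-syntax types ending in "+json"; parameters such as
    "; charset=utf-8" are ignored.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json" and not media_type.endswith("+json"):
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


print(parse_json_body("application/json; charset=utf-8", '{"ok": true}'))
print(parse_json_body("application/vnd.api+json", '{"data": []}'))
print(parse_json_body("text/html", "<html>Rate limited</html>"))
print(parse_json_body("application/json", "<html>Bad gateway</html>"))
```

One caveat of returning `None` for failures is that a body consisting of the literal JSON `null` also yields `None`. If that distinction matters, raise an exception or return a sentinel instead.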
When scraping JSON APIs at scale, proxy services such as ScraperAPI can help your requests get through to protected endpoints. They return the JSON responses, which you can then process with the same code shown above.
Next Steps
- Use async HTTPX for faster JSON API scraping
- Parse JSON responses with jq and jsonpath
- Build an end-to-end API data pipeline