Parsing JSON Responses in Python
Learn to parse, navigate, and extract data from JSON API responses in Python using the json module, jmespath, and pandas.
Data Parsing · #4 · beginner · 3 min read
JSON is the de facto standard data format for web APIs. Python's built-in json module parses it natively, and tools like jmespath simplify complex extractions.
Basic JSON Parsing
import json
# Parse a JSON string
json_string = '{"name": "ScraperAPI", "price": 49.99, "features": ["proxies", "captcha"]}'
data = json.loads(json_string)
print(data["name"]) # ScraperAPI
print(data["features"][0]) # proxies
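json.loads raises json.JSONDecodeError when the input is not valid JSON. The following is a minimal sketch of catching it; the helper name parse_or_none is hypothetical and not part of the json module:

```python
import json


def parse_or_none(text):
    """Return the parsed value, or None if text is not valid JSON.

    parse_or_none is an illustrative helper, not part of the json module.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.lineno and err.colno point at the position that failed to parse
        print(f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
        return None


good = parse_or_none('{"name": "ScraperAPI", "price": 49.99}')
bad = parse_or_none("{'name': 'ScraperAPI'}")  # single quotes are not valid JSON
```

A JSON null also parses to None, so use a dedicated sentinel instead if you need to tell the two cases apart.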
Parsing API Responses
import requests
response = requests.get(
    "https://jsonplaceholder.typicode.com/users/1",
    timeout=15,
)
user = response.json() # Automatically parses JSON
print(f"Name: {user['name']}")
print(f"City: {user['address']['city']}")
print(f"Company: {user['company']['name']}")
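response.json() raises an exception if the body is not JSON, which can happen when a server returns an HTML error page. One possible way to guard against that is sketched below; json_or_none is a hypothetical helper name, and it assumes only that the object passed in behaves like a requests.Response (it has raise_for_status() and json()):

```python
def json_or_none(response):
    """Return the decoded JSON body of a requests-style response, or None.

    json_or_none is an illustrative helper; it relies only on the
    raise_for_status() and json() methods of requests.Response.
    """
    # For a real requests.Response this raises HTTPError on 4xx/5xx codes
    response.raise_for_status()
    try:
        return response.json()
    except ValueError:
        # The JSON decode errors raised by requests subclass ValueError
        return None
```

With requests imported, it could be called as json_or_none(requests.get(url, timeout=15)).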
Safe Navigation of Nested JSON
Real API responses are often deeply nested, and any key may be missing. Chain .get() calls with defaults to avoid a KeyError:
import requests
response = requests.get(
    "https://api.github.com/repos/psf/requests",
    timeout=15,
)
repo = response.json()
# Unsafe: crashes if key is missing
# owner_type = repo["owner"]["type"]
# Safe: returns default value if key is missing
owner_type = repo.get("owner", {}).get("type", "unknown")
# "license" can be present but null, so fall back to {} before the second .get()
license_name = (repo.get("license") or {}).get("spdx_id", "No license")
topics = repo.get("topics", [])
print(f"Owner type: {owner_type}")
print(f"License: {license_name}")
print(f"Topics: {', '.join(topics[:5])}")
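Chained .get() calls become hard to read beyond two levels, and they do not help when a key is present but holds null. A small helper can cover both cases. This is a sketch only: dig is a hypothetical name, and the repo dictionary is made-up sample data imitating the shape used above:

```python
def dig(data, *keys, default=None):
    """Follow keys/indexes into nested dicts and lists, returning default
    if any step is missing or a value along the way is None."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
        if current is None:
            return default
    return current


# Made-up sample data, not a real API response
repo = {"owner": {"type": "Organization"}, "license": None, "topics": ["http"]}
owner_type = dig(repo, "owner", "type", default="unknown")
license_name = dig(repo, "license", "spdx_id", default="No license")
first_topic = dig(repo, "topics", 0, default="none")
```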
Extracting Data with JMESPath
JMESPath is a query language for JSON, similar in spirit to XPath for XML:
pip install jmespath
import jmespath
import requests
response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
users = response.json()
# Extract all names
names = jmespath.search("[*].name", users)
print(names)
# Extract name and city pairs
info = jmespath.search("[*].{name: name, city: address.city}", users)
for i in info[:3]:
    print(f"{i['name']} lives in {i['city']}")
# Filter users whose company.bs field contains 'e-commerce'
biz = jmespath.search(
    "[?contains(company.bs, 'e-commerce')].name",
    users,
)
print(f"E-commerce users: {biz}")
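To make the query syntax less opaque, here is roughly what those three kinds of expression compute, written as plain list comprehensions. The users list is made-up sample data in which every key is present; JMESPath itself returns null for a missing key, whereas these comprehensions would raise KeyError:

```python
# Made-up sample data with the same shape as the users endpoint
users = [
    {"name": "Ada", "address": {"city": "London"}, "company": {"bs": "e-commerce platforms"}},
    {"name": "Linus", "address": {"city": "Helsinki"}, "company": {"bs": "kernel tooling"}},
]

# Projection, like "[*].name"
names = [u["name"] for u in users]

# Multi-select hash, like "[*].{name: name, city: address.city}"
info = [{"name": u["name"], "city": u["address"]["city"]} for u in users]

# Filter with a function call, like "[?contains(company.bs, 'e-commerce')].name"
biz = [u["name"] for u in users if "e-commerce" in u["company"]["bs"]]
```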
JSON to pandas DataFrame
For analysis, convert JSON directly to a DataFrame:
import pandas as pd
import requests
response = requests.get(
    "https://jsonplaceholder.typicode.com/posts",
    timeout=15,
)
posts = response.json()
# Flat JSON -> DataFrame
df = pd.DataFrame(posts)
print(df[["userId", "id", "title"]].head())
# Nested JSON -> use json_normalize
response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
users = response.json()
df = pd.json_normalize(users, sep="_")
print(df[["name", "address_city", "company_name"]].head())
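Column names such as address_city come from joining the nested keys with the separator. The following pure-Python sketch illustrates that naming rule for nested dictionaries; flatten is a hypothetical helper written for illustration, not how pandas implements json_normalize, and the user record is made up:

```python
def flatten(record, sep="_", prefix=""):
    """Flatten nested dicts into one level, joining keys with sep.

    Illustrative only: lists and scalars are kept as values unchanged.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the joined key path down as the new prefix
            flat.update(flatten(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


# Made-up sample record
user = {
    "name": "Ada",
    "address": {"city": "London", "geo": {"lat": "51.5", "lng": "-0.1"}},
    "company": {"name": "Analytical Engines"},
}
row = flatten(user)
```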
Handling Large JSON Files
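For JSON files too large to fit in memory, json.load() is not an option because it reads the whole document at once. A streaming parser such as the third-party ijson package yields one array element at a time instead:
pip install ijson
import ijson
# Stream-parse a large top-level JSON array, one element at a time
def parse_large_json(filepath):
    with open(filepath, "rb") as f:
        # "item" is the ijson prefix for each element of a top-level array
        for item in ijson.items(f, "item"):
            yield {
                "id": item["id"],
                "name": item["name"],
                "value": item.get("value", 0),
            }
# Process records without loading the whole file into memory
for record in parse_large_json("large_dataset.json"):
    print(record)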
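If you control the file format, JSON Lines (one JSON object per line) is another option, because the built-in json module can then parse one line at a time. A minimal sketch, assuming that format; iter_json_lines is a hypothetical helper and the sample input is made up:

```python
import io
import json


def iter_json_lines(lines):
    """Yield one parsed object per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# io.StringIO stands in for an open file; with a real file, pass the file object
sample = io.StringIO('{"id": 1, "name": "a"}\n\n{"id": 2, "name": "b", "value": 7}\n')
# Collected into a list only for the demo; iterate lazily for large inputs
records = [
    {"id": item["id"], "name": item["name"], "value": item.get("value", 0)}
    for item in iter_json_lines(sample)
]
```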
Common JSON Patterns in APIs
| Pattern | Access Method |
|---|---|
| {"data": [...]} | response.json()["data"] |
| {"results": [...], "total": N} | response.json()["results"] |
| {"items": [...], "next": "cursor"} | response.json()["items"] |
| [{...}, {...}] | response.json() (direct list) |
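When a scraper talks to several APIs, the envelope patterns in the table can be handled in one place. This is a sketch under the assumption that only these envelope keys occur; extract_records and ENVELOPE_KEYS are hypothetical names:

```python
ENVELOPE_KEYS = ("data", "results", "items")  # keys from the table above


def extract_records(payload):
    """Return the list of records from a bare list or a known envelope."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        for key in ENVELOPE_KEYS:
            value = payload.get(key)
            if isinstance(value, list):
                return value
    # Fail loudly instead of guessing at an unknown structure
    raise ValueError("Unrecognized response shape")
```

It would typically be called as extract_records(response.json()). Raising on an unknown shape surfaces API changes early instead of silently returning an empty list.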
Next Steps
- Use regex for extracting data from non-JSON text
- Parse JSON with jq and jsonpath on the command line
- Clean and validate parsed JSON data