Parsing JSON Responses in Python - Data Parsing

Learn to parse, navigate, and extract data from JSON API responses in Python using the json module, jmespath, and pandas.

JSON is the de facto data format for web APIs. Python's built-in json module parses it into dicts, lists, strings, and numbers, and tools like jmespath and pandas simplify complex extractions.

Basic JSON Parsing

import json

# Parse a JSON string
json_string = '{"name": "ScraperAPI", "price": 49.99, "features": ["proxies", "captcha"]}'
data = json.loads(json_string)

print(data["name"])          # ScraperAPI
print(data["features"][0])   # proxies
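
json.loads raises json.JSONDecodeError (a subclass of ValueError) when the input is not valid JSON. The following is a minimal sketch of guarding against that; the helper name parse_or_none and the sample strings are illustrative assumptions, not part of the example above.

```python
import json


def parse_or_none(text):
    """Return the parsed JSON value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.lineno, exc.colno and exc.msg describe where parsing failed
        print(f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}")
        return None


good = parse_or_none('{"name": "ScraperAPI", "price": 49.99}')
bad = parse_or_none('{"name": "ScraperAPI", "price": }')  # value is missing

print(good["price"])  # 49.99
print(bad)            # None
```

Note that the JSON literal null also parses to None, so a caller that must tell "invalid" from "null" apart would need a different sentinel.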

Parsing API Responses

import requests

response = requests.get(
    "https://jsonplaceholder.typicode.com/users/1",
    timeout=15,
)
response.raise_for_status()  # Raise an HTTPError for 4xx/5xx responses
user = response.json()       # Decode the JSON body into Python objects

print(f"Name: {user['name']}")
print(f"City: {user['address']['city']}")
print(f"Company: {user['company']['name']}")
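
The snippet above assumes the request succeeds and the body is valid JSON. A hedged sketch of a more defensive helper follows. The name json_or_none is hypothetical; it relies only on the ok attribute and json() method of a requests Response, and on the fact that JSON decoding errors raised by requests are ValueError subclasses. StubResponse is a made-up offline stand-in so the sketch runs without network access.

```python
import json


def json_or_none(response):
    """Return the decoded JSON body, or None on an HTTP error or invalid JSON."""
    if not response.ok:  # ok is True only for status codes below 400
        return None
    try:
        return response.json()
    except ValueError:  # covers the JSON decode errors raised by .json()
        return None


class StubResponse:
    """Offline stand-in for requests.Response, used only for this demo."""

    def __init__(self, ok, body):
        self.ok = ok
        self._body = body

    def json(self):
        return json.loads(self._body)


print(json_or_none(StubResponse(True, '{"name": "Leanne Graham"}')))
print(json_or_none(StubResponse(False, "{}")))       # HTTP error -> None
print(json_or_none(StubResponse(True, "<html>")))    # not JSON -> None
```

With a real request, the call would look like json_or_none(requests.get(url, timeout=15)).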

Safe Navigation of Nested JSON

Real API responses are often deeply nested, and any key may be missing. Chain .get() calls with defaults so that a missing key does not raise KeyError:

import requests

response = requests.get(
    "https://api.github.com/repos/psf/requests",
    timeout=15,
)
repo = response.json()

# Unsafe: raises KeyError if a key is missing
# owner_type = repo["owner"]["type"]

# Safe: falls back to a default if a key is missing.
# "or {}" also covers keys that are present but null, such as
# "license" on a repository without a license.
owner_type = (repo.get("owner") or {}).get("type", "unknown")
license_name = (repo.get("license") or {}).get("spdx_id", "No license")
topics = repo.get("topics") or []

print(f"Owner type: {owner_type}")
print(f"License: {license_name}")
print(f"Topics: {', '.join(topics[:5])}")
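
Chained lookups get verbose as nesting grows, and a key that exists with a null value (parsed as None) defeats a plain .get(key, {}) default. One possible way to handle both is a small path helper. This is a sketch only: the name dig and the sample repo dict are assumptions made for illustration.

```python
def dig(data, *keys, default=None):
    """Walk nested dicts/lists by key or index; return default if any step fails."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # KeyError: missing dict key; IndexError: list too short;
            # TypeError: current is None or otherwise not subscriptable
            return default
    return default if current is None else current


# Made-up sample shaped like part of a GitHub repository payload
repo = {"owner": {"type": "Organization"}, "license": None, "topics": ["http", "python"]}

print(dig(repo, "owner", "type", default="unknown"))          # Organization
print(dig(repo, "license", "spdx_id", default="No license"))  # No license
print(dig(repo, "topics", 0))                                 # http
print(dig(repo, "topics", 5, default="n/a"))                  # n/a
```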

Extracting Data with JMESPath

JMESPath is a query language for JSON, comparable to XPath for XML. Install it first:

pip install jmespath

import jmespath
import requests

response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
users = response.json()

# Extract all names
names = jmespath.search("[*].name", users)
print(names)

# Extract name and city pairs
info = jmespath.search("[*].{name: name, city: address.city}", users)
for i in info[:3]:
    print(f"{i['name']} lives in {i['city']}")

# Filter users whose company.bs text contains "e-commerce"
biz = jmespath.search(
    "[?contains(company.bs, 'e-commerce')].name",
    users,
)
print(f"E-commerce users: {biz}")
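
For readers new to JMESPath, the three queries above behave roughly like the plain-Python comprehensions below. This is an approximation on a small made-up sample (the two users are invented, not taken from the API). One difference is that JMESPath yields null for a missing key, whereas these comprehensions would raise KeyError.

```python
# Invented sample with the same shape as the /users response
users = [
    {"name": "Ann", "address": {"city": "Oslo"}, "company": {"bs": "e-commerce platforms"}},
    {"name": "Bob", "address": {"city": "Lima"}, "company": {"bs": "cloud hosting"}},
]

# Rough equivalent of "[*].name"
names = [u["name"] for u in users]

# Rough equivalent of "[*].{name: name, city: address.city}"
info = [{"name": u["name"], "city": u["address"]["city"]} for u in users]

# Rough equivalent of the company.bs filter followed by ".name"
biz = [u["name"] for u in users if "e-commerce" in u["company"]["bs"]]

print(names)  # ['Ann', 'Bob']
print(info)
print(biz)    # ['Ann']
```

The JMESPath version pays off when the query has to come from configuration or user input, since it is a plain string rather than code.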

JSON to pandas DataFrame

For analysis, convert JSON directly to a DataFrame:

import pandas as pd
import requests

response = requests.get(
    "https://jsonplaceholder.typicode.com/posts",
    timeout=15,
)
posts = response.json()

# Flat JSON -> DataFrame
df = pd.DataFrame(posts)
print(df[["userId", "id", "title"]].head())

# Nested JSON -> use json_normalize
response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
users = response.json()

df = pd.json_normalize(users, sep="_")
print(df[["name", "address_city", "company_name"]].head())

Handling Large JSON Files

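For JSON files too large to fit in memory, json.load() is not enough, because it reads the whole document before returning. A streaming parser such as the third-party ijson package (pip install ijson) yields one array element at a time instead:

import ijson

# Stream-parse a large top-level JSON array
def parse_large_json(filepath):
    # ijson expects a binary file object
    with open(filepath, "rb") as f:
        # The "item" prefix selects each element of the top-level array
        for item in ijson.items(f, "item"):
            yield {
                "id": item["id"],
                "name": item["name"],
                "value": item.get("value", 0),
            }

# Only one record is held in memory at a time
for record in parse_large_json("large_dataset.json"):
    print(record)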
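
When the data can be stored as JSON Lines (one JSON object per line) rather than as one large array, the standard library alone can read it record by record. The sketch below assumes that format; the function name iter_json_lines and the sample records are hypothetical, and the temporary file exists only so the example runs on its own.

```python
import json
import os
import tempfile


def iter_json_lines(filepath):
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with open(filepath, "r", encoding="utf-8") as f:
        for line in f:  # the file object reads one line at a time
            line = line.strip()
            if line:
                yield json.loads(line)


# Build a tiny sample file for the demo
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"id": 1, "name": "a", "value": 10}\n')
    tmp.write('{"id": 2, "name": "b"}\n')
    sample_path = tmp.name

records = list(iter_json_lines(sample_path))  # list() only for the demo
os.remove(sample_path)
print(records)
```

In real use, iterate over the generator directly instead of calling list(), so that only one record is in memory at a time.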

Common JSON Patterns in APIs

Pattern	Access Method
`{"data": [...]}`	`response.json()["data"]`
`{"results": [...], "total": N}`	`response.json()["results"]`
`{"items": [...], "next": "cursor"}`	`response.json()["items"]`
`[{...}, {...}]`	`response.json()` (direct list)
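
The patterns in the table can be handled by one small helper that accepts either a bare list or an envelope object. This is a sketch under the assumption that the records live under one of the keys listed above; the function name extract_records and the sample payloads are made up for illustration.

```python
def extract_records(payload, keys=("data", "results", "items")):
    """Return the list of records from a bare list or a common envelope dict."""
    if isinstance(payload, list):
        return payload  # the response is already a direct list
    if isinstance(payload, dict):
        for key in keys:
            value = payload.get(key)
            if isinstance(value, list):
                return value
    raise ValueError("No record list found in payload")


print(extract_records({"data": [{"id": 1}]}))                  # [{'id': 1}]
print(extract_records({"results": [{"id": 2}], "total": 1}))   # [{'id': 2}]
print(extract_records({"items": [], "next": "cursor"}))        # []
print(extract_records([{"id": 3}]))                            # [{'id': 3}]
```

Raising on an unknown shape, rather than returning an empty list, makes a changed API format visible immediately instead of silently producing no data.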

Next Steps

Use regex for extracting data from non-JSON text
Parse JSON with jq and jsonpath in the command line
Clean and validate parsed JSON data