
Parsing JSON Responses in Python

Learn to parse, navigate, and extract data from JSON API responses in Python using the json module, jmespath, and pandas.

Data Parsing · #4 · Beginner · 3 min read

JSON is the most common data format for web APIs. Python's built-in json module parses it without extra dependencies, and tools like jmespath simplify complex extractions.

Basic JSON Parsing

import json

# Parse a JSON string
json_string = '{"name": "ScraperAPI", "price": 49.99, "features": ["proxies", "captcha"]}'
data = json.loads(json_string)

print(data["name"])          # ScraperAPI
print(data["features"][0])   # proxies
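
json.loads() raises json.JSONDecodeError when the input is not valid JSON, which can happen when an endpoint returns an HTML error page instead of data. Below is a minimal sketch of guarding against that. The helper name safe_loads and its default parameter are illustrative and not part of the json module.

```python
import json


def safe_loads(text, default=None):
    """Return the parsed JSON, or `default` if `text` is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.lineno and exc.colno point at where parsing failed
        print(f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}")
        return default


good = safe_loads('{"name": "ScraperAPI", "price": 49.99}')
bad = safe_loads("<html>502 Bad Gateway</html>", default={})

print(good["name"])  # ScraperAPI
print(bad)           # {}
```

Returning a default keeps a scraping loop running past one bad response. If a malformed payload should stop the run, let the exception propagate instead.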

Parsing API Responses

import requests

response = requests.get(
    "https://jsonplaceholder.typicode.com/users/1",
    timeout=15,
)
user = response.json()  # Automatically parses JSON

print(f"Name: {user['name']}")
print(f"City: {user['address']['city']}")
print(f"Company: {user['company']['name']}")

Safe Navigation of Nested JSON

Real API responses are often deeply nested, and any key may be missing or null. Chain .get() with defaults to avoid a KeyError:

import requests

response = requests.get(
    "https://api.github.com/repos/psf/requests",
    timeout=15,
)
repo = response.json()

# Unsafe: crashes if key is missing
# owner_type = repo["owner"]["type"]

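# Safe: returns a default value if the key is missing
owner_type = repo.get("owner", {}).get("type", "unknown")
# "license" is null (None) for unlicensed repos, and .get("license", {})
# would return that None, so fall back to {} explicitly with `or`
license_name = (repo.get("license") or {}).get("spdx_id", "No license")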
topics = repo.get("topics", [])

print(f"Owner type: {owner_type}")
print(f"License: {license_name}")
print(f"Topics: {', '.join(topics[:5])}")
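
Chained .get() calls get verbose for deep paths, and they still fail when an intermediate value is null rather than missing. One way to handle both cases is a small path helper. This is a sketch only: the name dig and the sample repo dictionary are made up for illustration and are not part of requests or the GitHub API.

```python
def dig(data, *path, default=None):
    """Follow `path` (dict keys or list indexes) through nested JSON data."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # Missing key, out-of-range index, or a None/scalar in the way
            return default
    return default if current is None else current


repo = {
    "owner": {"type": "Organization"},
    "license": None,  # JSON null, as APIs often send for "no value"
    "topics": ["http", "python"],
}

print(dig(repo, "owner", "type", default="unknown"))          # Organization
print(dig(repo, "license", "spdx_id", default="No license"))  # No license
print(dig(repo, "topics", 0))                                 # http
print(dig(repo, "topics", 5, default="n/a"))                  # n/a
```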

Extracting Data with JMESPath

JMESPath is a query language for JSON: think XPath, but for JSON structures. Install it first:

pip install jmespath

Then query parsed data with jmespath.search():

import jmespath
import requests

response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
users = response.json()

# Extract all names
names = jmespath.search("[*].name", users)
print(names)

# Extract name and city pairs
info = jmespath.search("[*].{name: name, city: address.city}", users)
for i in info[:3]:
    print(f"{i['name']} lives in {i['city']}")

# Filter users by company
biz = jmespath.search(
    "[?company.bs.contains(@, 'e-commerce')].name",
    users,
)
print(f"E-commerce users: {biz}")
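
To make the query syntax less opaque, here is roughly what the three expressions above do, written as plain Python comprehensions. The two sample users are hypothetical and only mimic the shape of the /users response. One difference to keep in mind: JMESPath evaluates a missing key to null, while these comprehensions raise KeyError.

```python
# Hypothetical sample in the same shape as the /users response
users = [
    {
        "name": "Ana",
        "address": {"city": "Lisbon"},
        "company": {"bs": "b2b e-commerce tools"},
    },
    {
        "name": "Ben",
        "address": {"city": "Leeds"},
        "company": {"bs": "cloud storage"},
    },
]

# "[*].name"
names = [u["name"] for u in users]

# "[*].{name: name, city: address.city}"
info = [{"name": u["name"], "city": u["address"]["city"]} for u in users]

# Keep users whose company.bs contains 'e-commerce', then take .name
biz = [u["name"] for u in users if "e-commerce" in u["company"]["bs"]]

print(names)  # ['Ana', 'Ben']
print(info)
print(biz)    # ['Ana']
```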

JSON to pandas DataFrame

For analysis, convert JSON directly to a DataFrame:

import pandas as pd
import requests

response = requests.get(
    "https://jsonplaceholder.typicode.com/posts",
    timeout=15,
)
posts = response.json()

# Flat JSON -> DataFrame
df = pd.DataFrame(posts)
print(df[["userId", "id", "title"]].head())

# Nested JSON -> use json_normalize
response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
users = response.json()

df = pd.json_normalize(users, sep="_")
print(df[["name", "address_city", "company_name"]].head())
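
The column names address_city and company_name come from joining nested keys with the separator. The sketch below shows that naming idea in plain Python. It is not how pandas implements json_normalize, and the flatten function and sample record are illustrative only.

```python
def flatten(record, sep="_", prefix=""):
    """Flatten nested dicts into one level, joining key names with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined name along
            flat.update(flatten(value, sep=sep, prefix=name))
        else:
            flat[name] = value  # lists and scalars are kept as-is
    return flat


user = {
    "name": "Ana",
    "address": {"city": "Lisbon", "geo": {"lat": "38.72", "lng": "-9.14"}},
    "company": {"name": "Acme"},
}

print(flatten(user))
```

Each nested level adds one more segment to the key, so address.geo.lat becomes address_geo_lat.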

Handling Large JSON Files

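For JSON files too large to fit in memory, json.load() is not enough, because it reads the whole document before returning anything. A streaming parser such as the third-party ijson package (pip install ijson) yields one array element at a time:

import ijson

# Stream-parse a large top-level JSON array, one element at a time
def parse_large_json(filepath):
    # ijson reads bytes, so open the file in binary mode
    with open(filepath, "rb") as f:
        # "item" is ijson's prefix for each element of the top-level array
        for item in ijson.items(f, "item"):
            yield {
                "id": item["id"],
                "name": item["name"],
                "value": item.get("value", 0),
            }

# Only one record is held in memory at a time
for record in parse_large_json("large_dataset.json"):
    print(record)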
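
If you control how the data is written, JSON Lines (one JSON object per line) is another option that needs only the standard library, since each line can be parsed on its own. Here is a minimal sketch. The function name iter_json_lines and the sample records are illustrative, and io.StringIO stands in for a real file handle.

```python
import io
import json


def iter_json_lines(lines):
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# io.StringIO stands in for open("large_dataset.jsonl", encoding="utf-8")
sample = io.StringIO(
    '{"id": 1, "name": "a", "value": 10}\n'
    '{"id": 2, "name": "b"}\n'
)

total = 0
for record in iter_json_lines(sample):
    total += record.get("value", 0)

print(total)  # 10
```

Because file objects are iterated line by line, only the current record is held in memory.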

Common JSON Patterns in APIs

Pattern                               Access Method
{"data": [...]}                       response.json()["data"]
{"results": [...], "total": N}        response.json()["results"]
{"items": [...], "next": "cursor"}    response.json()["items"]
[{...}, {...}]                        response.json() (direct list)
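
When a scraper talks to several APIs, these patterns can be handled in one place. The helper below is a sketch under the assumption that the record list sits either at the top level or under one of the wrapper keys from the table. The name extract_records and its keys parameter are illustrative.

```python
def extract_records(payload, keys=("data", "results", "items")):
    """Return the list of records from a bare list or a common envelope."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        for key in keys:
            value = payload.get(key)
            if isinstance(value, list):
                return value
    raise ValueError("No record list found in payload")


print(extract_records({"data": [{"id": 1}]}))                     # [{'id': 1}]
print(extract_records({"results": [{"id": 2}], "total": 1}))      # [{'id': 2}]
print(extract_records({"items": [{"id": 3}], "next": "cursor"}))  # [{'id': 3}]
print(extract_records([{"id": 4}]))                               # [{'id': 4}]
```

Raising ValueError for an unknown shape surfaces API changes early, where returning an empty list would hide them.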

Next Steps

  • Use regex for extracting data from non-JSON text
  • Parse JSON with jq and jsonpath on the command line
  • Clean and validate parsed JSON data