Using jq and JSONPath for JSON Parsing
Master jq for command-line JSON processing and JSONPath for querying JSON in Python. Filter, transform, and extract data from API responses.
Data Parsing · #12 · Intermediate · 3 min read
When working with JSON API responses, jq (on the command line) and JSONPath (used here from Python) let you slice, filter, and transform data with short query expressions instead of hand-written loops.
jq: Command-Line JSON Processing
jq is essential for exploring API responses in the terminal:
# Install jq
# macOS: brew install jq
# Ubuntu: sudo apt install jq
# Pretty-print JSON
curl -s "https://jsonplaceholder.typicode.com/users/1" | jq .
# Extract a single field
curl -s "https://jsonplaceholder.typicode.com/users/1" | jq '.name'
# "Leanne Graham"
# Extract nested fields
curl -s "https://jsonplaceholder.typicode.com/users/1" | jq '.address.city'
# "Gwenborough"
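The lookups above map directly onto dictionary access in Python. The sketch below is a minimal illustration, assuming a trimmed, hard-coded copy of the user record (only the fields shown above) and a hypothetical helper named `get_path` that imitates how jq yields `null` when a key is missing from an object.

```python
import json

# Trimmed sample of the /users/1 response; only the fields used above.
raw = '{"id": 1, "name": "Leanne Graham", "address": {"city": "Gwenborough"}}'
user = json.loads(raw)


def get_path(doc, *keys):
    """Follow a chain of object keys, returning None if a key is missing.

    Hypothetical helper: imitates jq yielding null for a missing key.
    Unlike jq, it also returns None (instead of raising an error) when an
    intermediate value is a string or a number.
    """
    current = doc
    for key in keys:
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


name = get_path(user, "name")                # like jq '.name'
city = get_path(user, "address", "city")     # like jq '.address.city'
missing = get_path(user, "company", "name")  # key absent in this sample

print(json.dumps(user, indent=2))  # roughly what jq '.' prints
print(name, city, missing)
```

Returning `None` for a missing key keeps exploratory code from stopping on the first incomplete record, which is the same convenience jq offers in the terminal.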
jq with Arrays
# Get all names from an array
curl -s "https://jsonplaceholder.typicode.com/users" | jq '.[].name'
# Select specific fields into new objects
curl -s "https://jsonplaceholder.typicode.com/users" | \
  jq '[.[] | {name: .name, city: .address.city, company: .company.name}]'
# Filter array items
curl -s "https://jsonplaceholder.typicode.com/posts" | \
  jq '[.[] | select(.userId == 1)] | length'
# 10
# First 3 items
curl -s "https://jsonplaceholder.typicode.com/posts" | jq '.[:3]'
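It can help to read each array filter next to its plain-Python counterpart. The following sketch assumes a small hand-written list shaped like the `/posts` response (the records are hypothetical, not real API data) and pairs each jq expression with a comprehension or slice.

```python
# Hypothetical sample shaped like the /posts response (not real API data).
posts = [
    {"userId": 1, "id": 1, "title": "first"},
    {"userId": 1, "id": 2, "title": "second"},
    {"userId": 2, "id": 3, "title": "third"},
    {"userId": 3, "id": 4, "title": "fourth"},
]

# jq '.[].title' : one value per array element
titles = [post["title"] for post in posts]

# jq '[.[] | {id: .id, title: .title}]' : project each item to a new object
slim = [{"id": post["id"], "title": post["title"]} for post in posts]

# jq '[.[] | select(.userId == 1)] | length' : filter, then count
count_user_1 = len([post for post in posts if post["userId"] == 1])

# jq '.[:3]' : first three items
first_three = posts[:3]

print(titles)
print(count_user_1)
print([post["id"] for post in first_three])
```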
jq Recipes for Scraping
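# Count items (the GitHub API returns at most 30 repos per page by default)
curl -s "https://api.github.com/users/torvalds/repos" | jq 'length'
# Sort by field, descending, and list the top 5 names (within the fetched page)
curl -s "https://api.github.com/users/torvalds/repos" | \
  jq 'sort_by(.stargazers_count) | reverse | .[:5] | .[].name'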
# Convert to CSV-like output
curl -s "https://jsonplaceholder.typicode.com/users" | \
  jq -r '.[] | [.name, .email, .address.city] | @csv'
# Flatten nested structure
curl -s "https://jsonplaceholder.typicode.com/users" | \
  jq '.[] | {name, email, city: .address.city, lat: .address.geo.lat}'
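When a recipe needs to move from the terminal into a script, the sorting and CSV steps can be approximated with the standard library. This is a sketch under stated assumptions: the `repos` and `users` lists are hypothetical stand-ins for the API responses, and `csv.QUOTE_ALL` is used to get output similar to what `@csv` produces for string fields.

```python
import csv
import io

# Hypothetical sample shaped like a GitHub repos response (not real data).
repos = [
    {"name": "alpha", "stargazers_count": 12},
    {"name": "beta", "stargazers_count": 340},
    {"name": "gamma", "stargazers_count": 7},
]

# jq 'sort_by(.stargazers_count) | reverse | .[:2] | .[].name'
ranked = sorted(repos, key=lambda repo: repo["stargazers_count"], reverse=True)
top_two = [repo["name"] for repo in ranked[:2]]

# Hypothetical sample shaped like the /users response.
users = [
    {"name": "Ann Lee", "email": "ann@example.com", "address": {"city": "Oslo"}},
    {"name": "Bo, Jr.", "email": "bo@example.com", "address": {"city": "Lima"}},
]

# jq -r '.[] | [.name, .email, .address.city] | @csv'
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
for user in users:
    writer.writerow([user["name"], user["email"], user["address"]["city"]])
csv_text = buffer.getvalue()

print(top_two)
print(csv_text, end="")
```

Writing through `csv.writer` rather than joining strings by hand keeps values that contain commas, such as the second name in the sample, correctly quoted.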
JSONPath in Python
JSONPath provides XPath-like queries for JSON. Use the jsonpath-ng library:
pip install jsonpath-ng
from jsonpath_ng.ext import parse
import requests
response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
response.raise_for_status()  # fail early on HTTP errors
users = response.json()
# All names
expr = parse("$[*].name")
names = [match.value for match in expr.find(users)]
print(f"Names: {names[:3]}")
# All cities (nested)
expr = parse("$[*].address.city")
cities = [match.value for match in expr.find(users)]
print(f"Cities: {cities[:3]}")
# Company names
expr = parse("$[*].company.name")
companies = [match.value for match in expr.find(users)]
print(f"Companies: {companies[:3]}")
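To make the path syntax less opaque, the toy evaluator below walks a tiny subset of JSONPath (`$`, dotted field names, and `[*]`) over hard-coded sample data. It is an illustration only, not how jsonpath-ng is implemented; the function name `find_simple` and the sample records are hypothetical.

```python
import re

# Hypothetical sample shaped like the /users response (not real API data).
users = [
    {"name": "Ann", "address": {"city": "Oslo"}, "company": {"name": "Acme"}},
    {"name": "Bo", "address": {"city": "Lima"}, "company": {"name": "Initech"}},
]


def find_simple(path, doc):
    """Evaluate a path built only from '$', '.field' and '[*]' steps.

    Any other syntax (filters, indexes, slices) is not recognised and is
    silently skipped, so this is a teaching aid rather than a real parser.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    # "$[*].address.city" becomes the steps ["*", "address", "city"].
    steps = [
        match.group(1) or "*"
        for match in re.finditer(r"\[\*\]|\.(\w+)", path[1:])
    ]
    current = [doc]
    for step in steps:
        next_values = []
        for value in current:
            if step == "*" and isinstance(value, list):
                next_values.extend(value)          # fan out over the array
            elif step != "*" and isinstance(value, dict) and step in value:
                next_values.append(value[step])    # descend into the field
        current = next_values
    return current


print(find_simple("$[*].name", users))
print(find_simple("$[*].address.city", users))
print(find_simple("$[*].company.name", users))
```

Each step transforms a list of candidate values into the next list, which is why a single path can return many matches and why a missing field simply produces fewer results instead of an exception.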
JSONPath vs jq vs Python
import requests
from jsonpath_ng.ext import parse
response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
response.raise_for_status()  # fail early on HTTP errors
data = response.json()
# Task: Get names of users in cities starting with "S"
# Approach 1: Pure Python
result_python = [u["name"] for u in data if u["address"]["city"].startswith("S")]
# Approach 2: JSONPath (with ext for filtering)
expr = parse('$[?address.city =~ "^S"].name')
result_jsonpath = [m.value for m in expr.find(data)]
# Approach 3: jq (in terminal)
# jq '[.[] | select(.address.city | startswith("S"))] | .[].name'
print(f"Python: {result_python}")
print(f"JSONPath: {result_jsonpath}")
jq in Python with pyjq
pip install pyjq
import pyjq
import requests
response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
response.raise_for_status()  # fail early on HTTP errors
data = response.json()
# Use jq syntax directly in Python
names = pyjq.first('[.[].name]', data)
print(f"All names: {names[:3]}")
# Complex query
summary = pyjq.first(
    '[.[] | {name: .name, city: .address.city}] | sort_by(.city)',
    data,
)
for item in summary[:3]:
    print(f"{item['name']} -> {item['city']}")
Quick Reference
| Task | jq | JSONPath |
|---|---|---|
| Get field | `.name` | `$.name` |
| Array element | `.[0]` | `$[0]` |
| All elements | `.[]` | `$[*]` |
| Nested field | `.address.city` | `$.address.city` |
| Filter | `select(.age > 18)` | `$[?(@.age > 18)]` |
| Length | `length` | N/A (use Python) |
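For the Length row, and as a cross-check on the others, each task in the table has a direct plain-Python counterpart. The sketch below assumes two hypothetical records with an `age` field, matching the filter example in the table.

```python
# Hypothetical records used only to illustrate the table rows.
person = {"name": "Ann", "age": 30, "address": {"city": "Oslo"}}
people = [
    person,
    {"name": "Bo", "age": 17, "address": {"city": "Lima"}},
]

get_field = person["name"]                # Get field
array_element = people[0]                 # Array element
all_elements = list(people)               # All elements
nested_field = person["address"]["city"]  # Nested field
adults = [p for p in people if p["age"] > 18]  # Filter
length = len(people)                      # Length (the "use Python" case)

print(get_field, nested_field, [p["name"] for p in adults], length)
```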
Next Steps
- Deduplicate scraped data
- Normalize and validate JSON data
- Build end-to-end data processing pipelines