
Using jq and JSONPath for JSON Parsing

Master jq for command-line JSON processing and JSONPath for querying JSON in Python. Filter, transform, and extract data from API responses.

Data Parsing · #12 · Intermediate · 3 min read

When working with JSON API responses, jq (on the command line) and JSONPath (from Python, through a library such as jsonpath-ng) give you compact query languages for slicing, filtering, and transforming data without writing explicit loops.

jq: Command-Line JSON Processing

jq is essential for exploring API responses in the terminal:

# Install jq
# macOS: brew install jq
# Ubuntu: sudo apt install jq

# Pretty-print JSON
curl -s "https://jsonplaceholder.typicode.com/users/1" | jq .

# Extract a single field
curl -s "https://jsonplaceholder.typicode.com/users/1" | jq '.name'
# "Leanne Graham"

# Extract nested fields
curl -s "https://jsonplaceholder.typicode.com/users/1" | jq '.address.city'
# "Gwenborough"
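
For comparison, the same field lookups can be sketched in plain Python. This is a minimal illustration, not how jq is implemented: the get_path helper is a hypothetical name, and the sample dict is a trimmed stand-in for the /users/1 response. It loosely mirrors jq's habit of returning null for a missing key instead of failing.

```python
# Trimmed stand-in for the /users/1 response (only the fields used here).
user = {
    "name": "Leanne Graham",
    "address": {"city": "Gwenborough"},
}


def get_path(obj, dotted_path):
    """Follow a dotted path such as 'address.city'.

    Returns None when a key is missing, similar to jq printing null.
    (Hypothetical helper written for this example.)
    """
    current = obj
    for key in dotted_path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


print(get_path(user, "name"))          # like: jq '.name'
print(get_path(user, "address.city"))  # like: jq '.address.city'
print(get_path(user, "company.name"))  # missing key, prints None
```

Unlike jq, this helper also returns None when the path runs into a non-object value; jq would report an error in that case.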

jq with Arrays

# Get all names from an array
curl -s "https://jsonplaceholder.typicode.com/users" | jq '.[].name'

# Select specific fields into new objects
curl -s "https://jsonplaceholder.typicode.com/users" | \
  jq '[.[] | {name: .name, city: .address.city, company: .company.name}]'

# Filter array items
curl -s "https://jsonplaceholder.typicode.com/posts" | \
  jq '[.[] | select(.userId == 1)] | length'
# 10

# First 3 items
curl -s "https://jsonplaceholder.typicode.com/posts" | jq '.[:3]'
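
If it helps to see what these array filters do, here is a rough Python equivalent of each one. The posts list is made-up sample data shaped like the /posts response (the real endpoint returns 100 items), so the counts here differ from the live API.

```python
# Made-up sample shaped like the /posts response.
posts = [
    {"userId": 1, "id": 1, "title": "first"},
    {"userId": 1, "id": 2, "title": "second"},
    {"userId": 2, "id": 11, "title": "third"},
    {"userId": 3, "id": 21, "title": "fourth"},
]

# jq: .[].title
titles = [p["title"] for p in posts]

# jq: [.[] | {id: .id, title: .title}]
trimmed = [{"id": p["id"], "title": p["title"]} for p in posts]

# jq: [.[] | select(.userId == 1)] | length
count_user_1 = sum(1 for p in posts if p["userId"] == 1)

# jq: .[:3]
first_three = posts[:3]

print(titles)
print(trimmed[0])
print(count_user_1)
print(len(first_three))
```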

jq Recipes for Scraping

# Count items (GitHub paginates: by default only the first 30 repos are returned)
curl -s "https://api.github.com/users/torvalds/repos" | jq 'length'

# Sort by field
curl -s "https://api.github.com/users/torvalds/repos" | \
  jq 'sort_by(.stargazers_count) | reverse | .[:5] | .[].name'

# Convert to CSV rows (-r prints raw text instead of JSON-quoted strings)
curl -s "https://jsonplaceholder.typicode.com/users" | \
  jq -r '.[] | [.name, .email, .address.city] | @csv'

# Flatten nested structure
curl -s "https://jsonplaceholder.typicode.com/users" | \
  jq '.[] | {name, email, city: .address.city, lat: .address.geo.lat}'
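
The flatten and CSV recipes can also be approximated with Python's standard library once the JSON is loaded. The sketch below assumes a small inline sample instead of a live request; the second record is hypothetical and deliberately lacks geo data to show how missing fields are handled. The flatten function name is an assumption for this example.

```python
import csv
import io

# Inline sample shaped like the /users response.
users = [
    {
        "name": "Leanne Graham",
        "email": "Sincere@april.biz",
        "address": {"city": "Gwenborough", "geo": {"lat": "-37.3159"}},
    },
    {
        # Hypothetical record with no geo block.
        "name": "Example User",
        "email": "user@example.com",
        "address": {"city": "Springfield"},
    },
]


def flatten(user):
    """Pull nested values up to the top level; missing fields become None."""
    address = user.get("address", {})
    return {
        "name": user.get("name"),
        "email": user.get("email"),
        "city": address.get("city"),
        "lat": address.get("geo", {}).get("lat"),
    }


rows = [flatten(u) for u in users]

# QUOTE_NONNUMERIC quotes every string field, close to what jq's @csv emits.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
for row in rows:
    writer.writerow([row["name"], row["email"], row["city"]])

csv_text = buffer.getvalue()
print(csv_text, end="")
```

Keeping the flatten step separate from the CSV step means the same rows can later be written to a database or a JSON file without touching the extraction logic.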

JSONPath in Python

JSONPath provides XPath-like queries for JSON. Use the jsonpath-ng library:

# Install first: pip install jsonpath-ng
from jsonpath_ng.ext import parse
import requests

response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
users = response.json()

# All names
expr = parse("$[*].name")
names = [match.value for match in expr.find(users)]
print(f"Names: {names[:3]}")

# All cities (nested)
expr = parse("$[*].address.city")
cities = [match.value for match in expr.find(users)]
print(f"Cities: {cities[:3]}")

# Company names
expr = parse("$[*].company.name")
companies = [match.value for match in expr.find(users)]
print(f"Companies: {companies[:3]}")

JSONPath vs jq vs Python

import requests
from jsonpath_ng.ext import parse

response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
data = response.json()

# Task: Get names of users in cities starting with "S"

# Approach 1: Pure Python
result_python = [u["name"] for u in data if u["address"]["city"].startswith("S")]

# Approach 2: JSONPath (with ext for filtering)
expr = parse('$[?address.city =~ "^S"].name')
result_jsonpath = [m.value for m in expr.find(data)]

# Approach 3: jq (in terminal)
# jq '.[] | select(.address.city | startswith("S")) | .name'

print(f"Python: {result_python}")
print(f"JSONPath: {result_jsonpath}")

jq in Python with pyjq

# Install first: pip install pyjq
# (pyjq builds jq during installation, so a C build toolchain may be needed)
import pyjq
import requests

response = requests.get(
    "https://jsonplaceholder.typicode.com/users",
    timeout=15,
)
data = response.json()

# Use jq syntax directly in Python
names = pyjq.first('[.[].name]', data)
print(f"All names: {names[:3]}")

# Complex query
summary = pyjq.first(
    '[.[] | {name: .name, city: .address.city}] | sort_by(.city)',
    data,
)
for item in summary[:3]:
    print(f"{item['name']} -> {item['city']}")

Quick Reference

Task            jq                        JSONPath
Get field       .name                     $.name
Array element   .[0]                      $[0]
All elements    .[]                       $[*]
Nested field    .address.city             $.address.city
Filter          .[] | select(.age > 18)   $[?(@.age > 18)]
Length          length                    N/A (use Python's len() on the matches)

Next Steps

  • Deduplicate scraped data
  • Normalize and validate JSON data
  • Build end-to-end data processing pipelines