Working with GraphQL APIs
Learn how to discover and scrape GraphQL APIs, craft queries, handle variables, and paginate through GraphQL endpoints.
GraphQL APIs expose a single endpoint where you send queries describing exactly the data you want. Many modern apps (Shopify, GitHub, Yelp) use GraphQL, and scraping them requires a different approach than REST.
How GraphQL Differs from REST
| Aspect | REST | GraphQL |
|---|---|---|
| Endpoints | Many (/users, /posts) | Single (/graphql) |
| Data shape | Fixed by the server | Defined by your query |
| Over-fetching | Common | You get exactly what you ask for |
| Discovery | Swagger/OpenAPI docs | Introspection queries |
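To make the first two rows of the table concrete, the sketch below builds the requests a client might prepare in each style. The host `api.example.com`, the paths, and the `user` and `posts` fields are hypothetical, chosen only for illustration. Nothing is sent over the network.

```python
import json

# Hypothetical API host, used only for illustration.
BASE = "https://api.example.com"

# REST: one URL per resource; the server decides which fields come back.
rest_requests = [
    ("GET", f"{BASE}/users/42"),
    ("GET", f"{BASE}/users/42/posts"),
]

# GraphQL: a single endpoint; the query names the fields to return.
# The `user` and `posts` fields are assumed, not taken from a real schema.
graphql_query = """
query {
  user(id: 42) {
    name
    posts { title }
  }
}
"""
graphql_request = ("POST", f"{BASE}/graphql", json.dumps({"query": graphql_query}))

print(len(rest_requests), "REST calls vs 1 GraphQL call")
```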
Sending a GraphQL Query
GraphQL requests are usually sent as HTTP POST requests with a JSON body containing a query field (some servers also accept GET for read-only queries):
import requests
url = "https://countries.trevorblades.com/graphql"
query = """
{
  countries(filter: { continent: { eq: "AS" } }) {
    name
    capital
    currency
  }
}
"""
response = requests.post(
    url,
    json={"query": query},
    timeout=15,
)
response.raise_for_status()
countries = response.json()["data"]["countries"]
for c in countries[:5]:
    print(f"{c['name']} - Capital: {c['capital']}, Currency: {c['currency']}")
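GraphQL servers commonly report query problems inside the JSON body, under a top-level `errors` key, often while still returning HTTP 200, so `raise_for_status()` alone may not catch them. The following is a minimal sketch of a check you could run on the decoded response. `GraphQLError` and `extract_data` are hypothetical names introduced here, and the sample payloads are hand-written rather than captured from a server.

```python
class GraphQLError(Exception):
    """Raised when a GraphQL response body contains an `errors` list."""


def extract_data(payload: dict) -> dict:
    """Return payload["data"], or raise if the server reported errors."""
    errors = payload.get("errors")
    if errors:
        messages = "; ".join(e.get("message", "unknown error") for e in errors)
        raise GraphQLError(messages)
    data = payload.get("data")
    if data is None:
        raise GraphQLError("response contained neither data nor errors")
    return data


# Usage with hand-written sample payloads (no network needed).
ok = {"data": {"countries": [{"name": "India"}]}}
print(extract_data(ok)["countries"][0]["name"])

bad = {"errors": [{"message": 'Cannot query field "capitol" on type "Country".'}]}
try:
    extract_data(bad)
except GraphQLError as exc:
    print("GraphQL error:", exc)
```

This sketch treats any reported error as fatal. Some servers return partial `data` alongside `errors`, so a scraper that can tolerate gaps might log the errors and keep the data instead.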
Using Variables
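Variables keep queries clean and reusable: the query text stays fixed while the values travel separately in a variables object, so nothing has to be string-formatted into the query: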
import requests
url = "https://countries.trevorblades.com/graphql"
# $code is declared once in the query and filled in from `variables`
query = """
query GetCountry($code: ID!) {
  country(code: $code) {
    name
    capital
    languages { name }
  }
}
"""
variables = {"code": "IN"}
response = requests.post(
    url,
    json={"query": query, "variables": variables},
    timeout=15,
)
response.raise_for_status()
data = response.json()["data"]["country"]
langs = ", ".join(lang["name"] for lang in data["languages"])
print(f"{data['name']} ({data['capital']}) - Languages: {langs}")
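Because the variables are sent next to the query rather than inside it, one query string can serve many lookups. A small sketch of that reuse follows. `build_payload` is a hypothetical helper, and the country codes are arbitrary examples.

```python
import json

QUERY = """
query GetCountry($code: ID!) {
  country(code: $code) {
    name
    capital
  }
}
"""


def build_payload(code: str) -> dict:
    """Build the JSON body for one country lookup; the query text never changes."""
    return {"query": QUERY, "variables": {"code": code}}


# One query string, many variable sets.
payloads = [build_payload(code) for code in ("IN", "JP", "BR")]
for p in payloads:
    print(json.dumps(p["variables"]))
```

Each payload can then be passed as the `json=` argument of `requests.post`, in the same way as the single request shown earlier.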
Discovering the Schema with Introspection
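Many GraphQL endpoints expose their schema through introspection, although production APIs often disable it. The query below lists each type and its first few fields: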
import requests
introspection_query = """
{
  __schema {
    queryType { name }
    types {
      name
      fields { name }
    }
  }
}
"""
response = requests.post(
    "https://countries.trevorblades.com/graphql",
    json={"query": introspection_query},
    timeout=15,
)
response.raise_for_status()
types = response.json()["data"]["__schema"]["types"]
for t in types:
    # Skip built-in introspection types (prefixed "__") and types without fields
    if not t["name"].startswith("__") and t["fields"]:
        fields = [f["name"] for f in t["fields"]]
        print(f"{t['name']}: {', '.join(fields[:5])}")
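When scraping, the root query fields and their arguments are often the most useful part of the schema. The sketch below pairs an introspection query for those fields with a formatter for the result. `type_name` and `describe_root_fields` are hypothetical helpers, the sample data is hand-written to resemble an introspection result rather than copied from a server, and the type rendering unwraps only one `NON_NULL` or `LIST` layer.

```python
ROOT_FIELDS_QUERY = """
{
  __schema {
    queryType {
      fields {
        name
        args {
          name
          type { kind name ofType { kind name } }
        }
      }
    }
  }
}
"""


def type_name(t: dict) -> str:
    """Render a type reference, unwrapping one NON_NULL or LIST layer."""
    if t["kind"] == "NON_NULL":
        return f"{t['ofType']['name']}!"
    if t["kind"] == "LIST":
        return f"[{t['ofType']['name']}]"
    return t["name"]


def describe_root_fields(schema: dict) -> list:
    """Return one signature string per root query field."""
    lines = []
    for field in schema["queryType"]["fields"]:
        args = ", ".join(f"{a['name']}: {type_name(a['type'])}" for a in field["args"])
        lines.append(f"{field['name']}({args})")
    return lines


# Hand-written sample shaped like the `__schema` object of an introspection result.
sample_schema = {
    "queryType": {
        "fields": [
            {
                "name": "country",
                "args": [
                    {
                        "name": "code",
                        "type": {
                            "kind": "NON_NULL",
                            "name": None,
                            "ofType": {"kind": "SCALAR", "name": "ID"},
                        },
                    }
                ],
            },
            {"name": "continents", "args": []},
        ]
    }
}

for line in describe_root_fields(sample_schema):
    print(line)
```

Against a live endpoint, you would send `ROOT_FIELDS_QUERY` as the `query` value and pass `response.json()["data"]["__schema"]` to `describe_root_fields`.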
GraphQL Pagination (Relay-Style Cursors)
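Relay-style connections return results one page at a time: request the first N items after a cursor, then follow pageInfo.endCursor until hasNextPage is false. The endpoint and the products field below are placeholders:
import requests
url = "https://api.example.com/graphql"
# The query text never changes, so define it once outside the loop
query = """
query($cursor: String) {
  products(first: 50, after: $cursor) {
    edges {
      node { id name price }
      cursor
    }
    pageInfo { hasNextPage endCursor }
  }
}
"""
all_items = []
cursor = None  # None requests the first page
while True:
    variables = {"cursor": cursor}
    response = requests.post(url, json={"query": query, "variables": variables}, timeout=15)
    response.raise_for_status()
    data = response.json()["data"]["products"]
    all_items.extend(edge["node"] for edge in data["edges"])
    if not data["pageInfo"]["hasNextPage"]:
        break
    cursor = data["pageInfo"]["endCursor"]
print(f"Collected {len(all_items)} products")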
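The cursor loop can also be separated from the HTTP call, which makes it testable without a live endpoint. The following is a sketch under that assumption: `paginate` and its `max_pages` guard are hypothetical names, and the two-page `PAGES` dictionary is fake data standing in for a server.

```python
from typing import Callable, Iterator, Optional


def paginate(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield nodes from a Relay-style connection, following endCursor.

    fetch_page(cursor) must return the connection object, i.e. a dict with
    "edges" and "pageInfo" keys. max_pages guards against endless loops.
    """
    cursor = None
    for _ in range(max_pages):
        connection = fetch_page(cursor)
        for edge in connection["edges"]:
            yield edge["node"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return
        cursor = page_info["endCursor"]


# Fake two-page backend standing in for a real endpoint (illustrative data).
PAGES = {
    None: {
        "edges": [{"node": {"id": 1}}, {"node": {"id": 2}}],
        "pageInfo": {"hasNextPage": True, "endCursor": "c2"},
    },
    "c2": {
        "edges": [{"node": {"id": 3}}],
        "pageInfo": {"hasNextPage": False, "endCursor": "c3"},
    },
}

items = list(paginate(lambda cursor: PAGES[cursor]))
print([item["id"] for item in items])
```

In real use, `fetch_page` would wrap the `requests.post` call and return the connection object from the decoded response, for example `response.json()["data"]["products"]`.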
If a GraphQL endpoint is protected by Cloudflare or similar bot detection, proxy your requests through ScrapingAnt for automatic browser fingerprinting and CAPTCHA solving.
Next Steps
- Reverse engineer hidden APIs from JavaScript-heavy sites
- Handle GraphQL rate limits and query complexity limits
- Combine GraphQL scraping with async HTTPX