
Working with GraphQL APIs

Learn how to discover and scrape GraphQL APIs, craft queries, handle variables, and paginate through GraphQL endpoints.

API Scraping · #5 · Intermediate · 3 min read

GraphQL APIs expose a single endpoint where you send queries describing exactly the data you want. Many modern apps (Shopify, GitHub, Yelp) use GraphQL, and scraping them requires a different approach than REST.

How GraphQL Differs from REST

Aspect | REST | GraphQL
Endpoints | Many (/users, /posts) | Single (/graphql)
Data shape | Fixed by the server | Defined by your query
Over-fetching | Common | You get exactly what you ask for
Discovery | Swagger/OpenAPI docs | Introspection queries

Sending a GraphQL Query

GraphQL requests are typically POST requests with a JSON body containing a query field (some servers also accept GET requests with the query passed as a URL parameter):

import requests

url = "https://countries.trevorblades.com/graphql"

query = """
{
  countries(filter: { continent: { eq: "AS" } }) {
    name
    capital
    currency
  }
}
"""

response = requests.post(
    url,
    json={"query": query},
    timeout=15,
)
response.raise_for_status()

countries = response.json()["data"]["countries"]
for c in countries[:5]:
    print(f"{c['name']} - Capital: {c['capital']}, Currency: {c['currency']}")
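
One caveat with the snippet above: GraphQL servers commonly answer with HTTP 200 even when a query fails, and report the problem in a top-level errors list instead, so raise_for_status() alone may not catch it. The following is a minimal sketch of a guard for that case. The names GraphQLError and extract_data are hypothetical helpers invented for this example, and the two payloads are hand-written samples in the usual response shape, not real server output.

```python
class GraphQLError(Exception):
    """Raised when a decoded GraphQL response carries an "errors" list."""


def extract_data(payload: dict) -> dict:
    """Return the "data" part of a decoded GraphQL response, or raise.

    `payload` is what response.json() returns. extract_data is a
    hypothetical helper name used only in this sketch.
    """
    errors = payload.get("errors")
    if errors:
        # Error objects normally carry a human-readable "message" field.
        messages = "; ".join(e.get("message", "unknown error") for e in errors)
        raise GraphQLError(messages)
    data = payload.get("data")
    if data is None:
        raise GraphQLError("response contained no data")
    return data


# Hand-written sample payloads, for illustration only.
ok = {"data": {"countries": [{"name": "India"}]}}
bad = {"errors": [{"message": 'Cannot query field "nam" on type "Country".'}]}

print(extract_data(ok)["countries"][0]["name"])
try:
    extract_data(bad)
except GraphQLError as exc:
    print(f"Query failed: {exc}")
```

This sketch treats any reported error as fatal. Some servers return partial data alongside errors, so a scraper that can tolerate gaps might log the errors and keep the data instead.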

Using Variables

Variables keep queries clean and reusable:

import requests

url = "https://countries.trevorblades.com/graphql"

query = """
query GetCountry($code: ID!) {
  country(code: $code) {
    name
    capital
    languages { name }
  }
}
"""

variables = {"code": "IN"}

response = requests.post(
    url,
    json={"query": query, "variables": variables},
    timeout=15,
)
response.raise_for_status()
data = response.json()["data"]["country"]
langs = ", ".join(lang["name"] for lang in data["languages"])
print(f"{data['name']} ({data['capital']}) - Languages: {langs}")
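
To illustrate the reuse that variables allow, here is a small sketch that builds request bodies for several country codes from one unchanged query string. build_payload is a hypothetical helper name, and nothing is sent over the network; each payload could be passed to requests.post(url, json=payload, timeout=15) as in the example above.

```python
import json

QUERY = """
query GetCountry($code: ID!) {
  country(code: $code) {
    name
    capital
  }
}
"""


def build_payload(query: str, **variables) -> dict:
    """Build the JSON body for a GraphQL request (hypothetical helper)."""
    return {"query": query, "variables": variables}


# The query text never changes; only the variables do.
payloads = [build_payload(QUERY, code=code) for code in ["IN", "JP", "BR"]]

for payload in payloads:
    # Values travel as JSON, so nothing is spliced into the query string.
    print(json.dumps(payload["variables"]))
```

Because the values are sent as JSON next to the query, there is no need to escape quotes or format strings into the query text.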

Discovering the Schema with Introspection

Many GraphQL endpoints expose their schema through introspection, although production APIs often disable it:

import requests

introspection_query = """
{
  __schema {
    queryType { name }
    types {
      name
      fields { name }
    }
  }
}
"""

response = requests.post(
    "https://countries.trevorblades.com/graphql",
    json={"query": introspection_query},
    timeout=15,
)
response.raise_for_status()

types = response.json()["data"]["__schema"]["types"]
for t in types:
    if not t["name"].startswith("__") and t["fields"]:
        fields = [f["name"] for f in t["fields"]]
        print(f"{t['name']}: {', '.join(fields[:5])}")
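
The query above lists field names only. Introspection can also report each field's type as a nested structure of kind, name, and ofType, where NON_NULL and LIST wrappers sit around the named type. The sketch below turns such a structure into the familiar schema notation. render_type and FIELD_TYPE_SELECTION are hypothetical names chosen for this example, and the sample dictionary is hand-written rather than fetched from a server.

```python
# Selection that could replace `fields { name }` in an introspection query
# to fetch type information, three wrapper levels deep.
FIELD_TYPE_SELECTION = """
fields {
  name
  type { kind name ofType { kind name ofType { kind name ofType { kind name } } } }
}
"""


def render_type(type_ref: dict) -> str:
    """Render an introspection type reference in schema notation."""
    kind = type_ref["kind"]
    if kind == "NON_NULL":
        # Non-null wrapper: render the inner type and append "!".
        return render_type(type_ref["ofType"]) + "!"
    if kind == "LIST":
        # List wrapper: render the inner type inside brackets.
        return "[" + render_type(type_ref["ofType"]) + "]"
    # Named type (SCALAR, OBJECT, ENUM, ...): the name is the rendering.
    return type_ref["name"]


# Hand-written sample: a non-null list of non-null Language objects.
sample = {
    "kind": "NON_NULL",
    "name": None,
    "ofType": {
        "kind": "LIST",
        "name": None,
        "ofType": {
            "kind": "NON_NULL",
            "name": None,
            "ofType": {"kind": "OBJECT", "name": "Language", "ofType": None},
        },
    },
}

print(render_type(sample))  # [Language!]!
```

Knowing which fields are lists or nullable helps when deciding how to flatten the scraped data.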

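GraphQL Pagination (Relay-Style Cursors)

Relay-style connections return results one page at a time. Each response includes pageInfo.hasNextPage and pageInfo.endCursor, and you pass endCursor back as the after argument to request the next page. The endpoint and the products field below are placeholders; substitute the names from the schema you are scraping.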

import requests

url = "https://api.example.com/graphql"
all_items = []
cursor = None

# The query text is constant, so define it once outside the loop.
query = """
query($cursor: String) {
  products(first: 50, after: $cursor) {
    edges {
      node { id name price }
      cursor
    }
    pageInfo { hasNextPage endCursor }
  }
}
"""

while True:
    # cursor is None on the first request, which asks for the first page.
    response = requests.post(
        url,
        json={"query": query, "variables": {"cursor": cursor}},
        timeout=15,
    )
    response.raise_for_status()
    data = response.json()["data"]["products"]

    for edge in data["edges"]:
        all_items.append(edge["node"])

    if not data["pageInfo"]["hasNextPage"]:
        break
    cursor = data["pageInfo"]["endCursor"]

print(f"Collected {len(all_items)} products")
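
The same loop can be packaged as a generator so that the paging logic is separate from the HTTP call and can be tried without a live endpoint. This is a sketch under assumptions: iter_nodes, fetch_page, and max_pages are hypothetical names, and FAKE_PAGES is made-up data standing in for server responses.

```python
from typing import Callable, Iterator, Optional


def iter_nodes(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield every node from a Relay-style connection.

    fetch_page(cursor) must return the connection object, i.e. a dict
    with "edges" and "pageInfo". max_pages is a safety cap in case a
    server keeps reporting hasNextPage forever.
    """
    cursor = None
    for _ in range(max_pages):
        connection = fetch_page(cursor)
        for edge in connection["edges"]:
            yield edge["node"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return
        cursor = page_info["endCursor"]


# Made-up two-page dataset keyed by the cursor that requests each page.
FAKE_PAGES = {
    None: {
        "edges": [{"node": {"id": 1}}, {"node": {"id": 2}}],
        "pageInfo": {"hasNextPage": True, "endCursor": "c2"},
    },
    "c2": {
        "edges": [{"node": {"id": 3}}],
        "pageInfo": {"hasNextPage": False, "endCursor": "c3"},
    },
}

nodes = list(iter_nodes(lambda cursor: FAKE_PAGES[cursor]))
print([n["id"] for n in nodes])  # [1, 2, 3]
```

In real use, fetch_page would wrap the requests.post call from the loop above and return response.json()["data"]["products"].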

If a GraphQL endpoint is protected by Cloudflare or similar bot detection, proxy your requests through ScrapingAnt for automatic browser fingerprinting and CAPTCHA solving.

Next Steps

  • Reverse engineer hidden APIs from JavaScript-heavy sites
  • Handle GraphQL rate limits and query complexity limits
  • Combine GraphQL scraping with async HTTPX