
3.41 · intermediate · 5 min read

GraphQL Scraping: Queries and Endpoints

GraphQL is a single POST endpoint, a typed schema, and a query language. It differs from REST in almost every way, and it is increasingly common.

What you’ll learn

  • Recognise a GraphQL endpoint in DevTools.
  • Use introspection to map the schema.
  • Write queries that pull only the fields you need.
  • Send GraphQL queries via Python and PHP.

GraphQL is one endpoint (usually /graphql), one HTTP method (POST), one content type (application/json), and a query language for requesting exactly the fields you want. It is increasingly common on modern APIs (Shopify, GitHub, Contentful, Hasura) and worth knowing.

Recognizing it in DevTools

Spot a GraphQL API by:

  • All XHR/POSTs go to a single URL like /graphql or /api/graphql.
  • Request body: {"query": "{ ... }", "variables": {...}, "operationName": "..."}.
  • Response body: {"data": {...}, "errors": [...]}.

If you see this shape, it's GraphQL.
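The checks above can be sketched as a small classifier for captured requests. The helper name and exact heuristic are mine, not a standard API:

```python
# Heuristic classifier for captured DevTools requests, based on the three
# signals above: POST method, a JSON body with a mandatory "query" key, and
# either a /graphql-ish URL or the optional "variables"/"operationName" keys.
def looks_like_graphql(method: str, url: str, body) -> bool:
    if method.upper() != "POST" or not isinstance(body, dict):
        return False
    return "query" in body and bool(
        "graphql" in url.lower() or {"variables", "operationName"} & body.keys()
    )
```

Feed it each XHR you capture; anything it flags is worth replaying by hand.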

Catalog108 endpoint

Catalog108 exposes:

  • /api/graphql, full GraphQL with introspection enabled.
  • /api/graphql/no-introspection, rejects introspection queries (covered later).
  • /api/graphql/persisted, sha256Hash-based persisted queries (lesson 3.42).
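A quick way to check which of these routes answers is to probe each with `{ __typename }`, the smallest legal GraphQL query: a real GraphQL endpoint replies with "data" or "errors", while a REST route or 404 does not. This is a sketch; the helper names are mine, and you should adjust paths to your target:

```python
import requests

# "{ __typename }" is the cheapest legal GraphQL query, making it a good
# liveness probe for candidate endpoints.
PROBE = {"query": "{ __typename }"}

def is_graphql_response(body: dict) -> bool:
    # GraphQL responses always carry "data" and/or "errors" at the top level.
    return "data" in body or "errors" in body

def probe_endpoints(base, paths):
    live = {}
    for path in paths:
        r = requests.post(base + path, json=PROBE, timeout=10)
        try:
            live[path] = is_graphql_response(r.json())
        except ValueError:  # non-JSON body means it's not GraphQL
            live[path] = False
    return live

# probe_endpoints("https://practice.scrapingcentral.com",
#                 ["/api/graphql", "/api/graphql/no-introspection"])
```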

Introspection, the magic skeleton key

GraphQL endpoints (when introspection is enabled) expose their entire schema to anyone who asks:

import requests

INTROSPECTION_QUERY = """
{
  __schema {
    types {
      name
      kind
      fields {
        name
        type {
          name
          kind
          ofType { name kind }
        }
      }
    }
  }
}
"""

r = requests.post(
    "https://practice.scrapingcentral.com/api/graphql",
    json={"query": INTROSPECTION_QUERY},
)
schema = r.json()["data"]["__schema"]
print([t["name"] for t in schema["types"] if not t["name"].startswith("__")])

You get back: every type, every field, every relationship. The full API surface, machine-readable. Tools like Insomnia, Postman, and Apollo Studio render this graphically.

If the endpoint disables introspection (lesson 3.42), you fall back to capturing real queries from the site.

Writing queries

query GetProducts {
  products(first: 12) {
    id
    name
    price
    category {
      name
    }
    reviews(first: 5) {
      author
      rating
      text
    }
  }
}

Fetches 12 products, each with selected fields and the first 5 reviews. Two key features:

  • You pick the fields. No over-fetching. No under-fetching.
  • Nested resources in one request. No N+1 problem, reviews come with the product.

Sending a query from Python

import requests

QUERY = """
query Products($first: Int!) {
  products(first: $first) {
    id
    name
    price
  }
}
"""

r = requests.post(
    "https://practice.scrapingcentral.com/api/graphql",
    json={
        "query": QUERY,
        "variables": {"first": 12},
        "operationName": "Products",
    },
)
data = r.json()
if "errors" in data:
    print("GraphQL errors:", data["errors"])
else:
    for p in data["data"]["products"]:
        print(p["id"], p["name"], p["price"])

Notes:

  • Always POST, even for operations that conceptually "read."
  • Use variables for parameters; it keeps queries reusable.
  • Set operationName when the document contains multiple operations; the server executes the named one.
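The operationName point can be sketched concretely: one document holding two named operations, with the payload selecting which one runs. The Categories query is a hypothetical example, not a confirmed Catalog108 field; check introspection for the real schema:

```python
# One GraphQL document, two named operations. "operationName" tells the
# server which one to execute; the other is ignored.
DOC = """
query Products($first: Int!) {
  products(first: $first) { id name price }
}
query Categories {
  categories { id name }
}
"""

def payload(operation_name, variables=None):
    """Build the standard GraphQL-over-HTTP request body."""
    return {
        "query": DOC,
        "operationName": operation_name,
        "variables": variables or {},
    }

# requests.post(URL, json=payload("Products", {"first": 12}))
```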

PHP version

use GuzzleHttp\Client;

$client = new Client();
$query = '
query Products($first: Int!) {
  products(first: $first) {
    id name price
  }
}';

$res = $client->post('https://practice.scrapingcentral.com/api/graphql', [
    'json' => [
        'query' => $query,
        'variables' => ['first' => 12],
        'operationName' => 'Products',
    ],
]);

$data = json_decode($res->getBody()->getContents(), true);
foreach ($data['data']['products'] as $p) {
    echo "{$p['id']} {$p['name']} {$p['price']}\n";
}

Pagination patterns

GraphQL APIs typically use one of:

  • Offset: products(offset: 0, limit: 12).
  • Cursor (Relay-style): products(first: 12, after: "cursor") { edges { cursor node { ... } } pageInfo { hasNextPage endCursor } }.

Relay-style is the standard for serious APIs (Shopify, GitHub):

query Page($cursor: String) {
  products(first: 50, after: $cursor) {
    edges {
      node { id name price }
      cursor
    }
    pageInfo {
      hasNextPage
      endCursor
    }
  }
}

Pagination loop:

def iter_products(url, query):
    # yield must live inside a function; this makes the loop a generator.
    cursor = None
    while True:
        r = requests.post(url, json={
            "query": query,
            "variables": {"cursor": cursor},
        })
        page = r.json()["data"]["products"]
        for edge in page["edges"]:
            yield edge["node"]
        if not page["pageInfo"]["hasNextPage"]:
            break
        cursor = page["pageInfo"]["endCursor"]

Error handling

GraphQL almost always returns 200, even on errors. Errors come in the response body:

{
  "data": {"products": null},
  "errors": [
    {"message": "Field 'foo' doesn't exist on type 'Product'", "path": ["products", "foo"]}
  ]
}

Your scraper should check both the HTTP status and the errors key:

def gql(query, variables=None):
    r = requests.post(URL, json={"query": query, "variables": variables or {}})
    r.raise_for_status()
    data = r.json()
    if data.get("errors"):
        raise RuntimeError(f"GraphQL: {data['errors']}")
    return data["data"]

Mutations

For writes:

mutation CreateReview($input: CreateReviewInput!) {
  createReview(input: $input) {
    id
    rating
  }
}

gql(MUTATION, variables={"input": {"productId": 1, "rating": 5, "text": "..."}})

Mutations are sent exactly like queries, over the same POST endpoint; the mutation keyword marks the operation as a write.

Subscriptions

For real-time:

subscription PriceChange {
  priceChanged { productId newPrice }
}

Run over WebSocket, not POST. Lesson 3.43 covers WebSocket scraping.

Auth on GraphQL

Same as REST:

  • Authorization: Bearer <token> header.
  • Cookies (with a Session).
  • API key in header or query string.

Catalog108's /api/graphql accepts the same JWT token issued by /api/auth/login.
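The Bearer-token option can be sketched with a shared Session, so every GraphQL call carries the JWT automatically. Obtaining the token from /api/auth/login is as described above; the exact response field holding it is target-specific:

```python
import requests

def bearer_session(token: str) -> requests.Session:
    """A Session that attaches the JWT to every request it sends."""
    s = requests.Session()
    s.headers["Authorization"] = f"Bearer {token}"
    return s

# s = bearer_session(jwt)  # jwt obtained from /api/auth/login
# r = s.post("https://practice.scrapingcentral.com/api/graphql",
#            json={"query": "{ products(first: 1) { id } }"})
```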

Why GraphQL is great for scrapers

  • One endpoint to learn. Find /graphql and you know the whole API.
  • Field selection. Pull only what you need; tiny responses.
  • Nested fetches. Get product + reviews + category in one call.
  • Introspection. Schema is machine-discoverable.
  • Stable schema. Breaking changes are usually versioned carefully.

Why GraphQL is annoying for scrapers

  • POST-only. Cache-unfriendly.
  • Complex queries can be slow. Some servers throttle by complexity score.
  • Persisted queries. Some APIs require precomputed query hashes (lesson 3.42).
  • Server-side query limits. Max depth, max nodes, etc.

Hands-on lab

Hit Catalog108's /challenges/api/graphql/playground. Issue an introspection query and dump the schema. Write a query that fetches products(first: 10) { id name price reviews(first:3) { rating } }. Confirm you get nested data in one call. Then write a paginated loop using cursor-based pagination. Compare it to a REST equivalent; the GraphQL version is structurally tighter.


Practice this lesson on Catalog108, our first-party scraping sandbox.

Open lab target → /challenges/api/graphql/playground

Quiz, check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.

Question 1 / 8

Which HTTP method does GraphQL use, even for read queries?
