
3.42 · Advanced · 5 min read

Persisted Queries, The Modern GraphQL Trap

Production GraphQL doesn't accept arbitrary queries, only known hashes. Scrapers must extract the precomputed hashes from the bundle or capture them live.

What you’ll learn

  • Recognise a persisted-query GraphQL endpoint.
  • Extract sha256Hashes from the site's JS bundle.
  • Send persisted queries with hash + variables.
  • Fall back to raw queries when persisted-only mode is relaxed.

Modern production GraphQL doesn't let clients send arbitrary queries. Instead, the server has a registry of approved queries identified by their SHA-256 hash. Clients send the hash + variables; the server looks up the registered query and runs it.

For scrapers this is a problem. Introspection won't tell you what queries are allowed, and arbitrary queries are rejected. You must extract the hashes from the site itself.

Why APIs do this

Two reasons:

  1. Security. No arbitrary queries means no expensive query attacks (deep recursive queries, full table scans). Servers only run pre-vetted operations.
  2. Bandwidth. The client sends a 64-char hash instead of a 4 KB query body. At scale, meaningful.

Apollo's Automatic Persisted Queries (APQ) is the canonical implementation. Shopify, Hasura, and many in-house APIs use a variant.
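Under APQ, the sha256Hash is simply the hex SHA-256 digest of the exact query document text, which you can reproduce in a few lines (the query string below is illustrative, not from a real registry):

```python
import hashlib

# Under APQ, sha256Hash is the hex SHA-256 of the exact query document.
# Whitespace matters: client and server must hash the same bytes.
query = "query GetProducts($first: Int) { products(first: $first) { id name } }"
sha = hashlib.sha256(query.encode("utf-8")).hexdigest()
print(len(sha), sha[:8])  # 64 hex chars
```

This only helps when you already have the query text (say, extracted from the bundle next to its hash); in strict mode, knowing the hash function doesn't let you invent new operations.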

How a persisted-query request looks

POST /api/graphql/persisted

{
  "operationName": "GetProducts",
  "variables": {"first": 12},
  "extensions": {
    "persistedQuery": {
      "version": 1,
      "sha256Hash": "7e8d2f3a1b4c5e6d8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e"
    }
  }
}

No query field. Just the hash. Server looks up the hash in its registry, runs the query, returns data.

If the hash isn't recognized, the response is:

{
  "errors": [{"message": "PersistedQueryNotFound", "extensions": {...}}]
}

APQ, the half-way protocol

Apollo's APQ uses a fallback: if the hash isn't known, the client retries with the full query AND the hash, registering it for future use. Servers can disable this fallback (strict APQ).

In strict mode, the only way to query is to know an existing hash.
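The fallback handshake can be sketched as a small helper. The transport is injected as a callable so the flow can be exercised without a live server; the function and parameter names are my own, not Apollo's:

```python
import hashlib


def apq_query(post, query, variables=None, operation_name=None):
    """Sketch of Apollo's APQ optimistic handshake.

    `post` is any callable that takes a JSON-serializable body dict and
    returns the decoded response dict (e.g. a thin wrapper around
    requests.Session.post). Injected so the flow can be tested offline.
    """
    sha = hashlib.sha256(query.encode("utf-8")).hexdigest()
    body = {
        "variables": variables or {},
        "extensions": {"persistedQuery": {"version": 1, "sha256Hash": sha}},
    }
    if operation_name:
        body["operationName"] = operation_name
    # 1. Optimistic attempt: hash only, no query text.
    resp = post(body)
    not_found = any(
        e.get("message") == "PersistedQueryNotFound"
        for e in resp.get("errors") or []
    )
    if not_found:
        # 2. Fallback: resend with the full query AND the hash,
        #    so the server can register it for future hash-only calls.
        resp = post({**body, "query": query})
    return resp
```

Against a strict server, step 2 simply fails again; against an APQ server, the second call registers the query and every later call succeeds with the hash alone.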

How to extract hashes

The hashes are baked into the site's JS bundle, often as constants:

// In a Webpack-bundled JS file
const GET_PRODUCTS_QUERY = {
  documentId: "7e8d2f3a1b4c5e6d8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e",
  // ... or
  hash: "7e8d2f3a...",
};

Or as constants in a generated file:

const GetProductsDocument = {
  __meta__: {
    hash: "7e8d2f3a...",
    operationName: "GetProducts",
  },
};

The pattern varies: documentId, hash, sha256Hash, persisted_query_id. Grep for all of these.
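Those key variants can be swept in a single pass. A sketch, where the sample string stands in for a downloaded bundle:

```python
import re

# Key names vary by codegen tool; one alternation covers the common ones.
HASH_KEY_RE = re.compile(
    r'(?:documentId|hash|sha256Hash|persisted_query_id)'
    r'\s*[:=]\s*["\']([a-f0-9]{64})["\']'
)

# Hypothetical bundle snippet standing in for a real main.js:
sample = (
    'const A = {documentId: "' + "a" * 64 + '"};'
    'const B = {sha256Hash: "' + "b" * 64 + '"};'
)
hashes = sorted(set(HASH_KEY_RE.findall(sample)))
print(hashes)  # one entry per distinct 64-char hex hash
```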

Finding hashes, the workflow

  1. Open the site, capture a real GraphQL call in DevTools. Note the sha256Hash value.
  2. Search the JS bundles for that hash. Find the surrounding code.
  3. Inventory all similar hashes near it; they're usually clustered (one operations registry per bundle).
  4. Match each hash to its operation name (often a property on the same object).
curl -s https://target.example.com/static/js/main.bundle.js > main.js
grep -oE '[a-f0-9]{64}' main.js | sort -u | head
grep -B2 -A2 '7e8d2f3a' main.js

Catalog108 example

/challenges/api/graphql/persisted requires sha256Hash. The hashes are in the lab's JS:

import requests, re

# 1. Find the bundle
page = requests.get("https://practice.scrapingcentral.com/challenges/api/graphql/persisted").text
bundle = re.search(r'src="([^"]+\.js[^"]*)"', page).group(1)

# 2. Download
js = requests.get(f"https://practice.scrapingcentral.com{bundle}").text

# 3. Extract hashes; operation names are often paired with them
matches = re.findall(r'operationName:\s*["\']([^"\']+)["\'][^}]+sha256Hash:\s*["\']([a-f0-9]{64})["\']', js)
hashes = dict(matches)
print(hashes)
# → {"GetProducts": "abc123...", "GetProduct": "def456..."}

# 4. Use a persisted query
r = requests.post(
    "https://practice.scrapingcentral.com/api/graphql/persisted",
    json={
        "operationName": "GetProducts",
        "variables": {"first": 12},
        "extensions": {
            "persistedQuery": {
                "version": 1,
                "sha256Hash": hashes["GetProducts"],
            },
        },
    },
)
print(r.json())

PHP version

use GuzzleHttp\Client;

$client = new Client();

// Extract hashes from bundle (similar logic to Python)
$bundle = $client->get('https://practice.scrapingcentral.com/static/js/persisted-app.js')->getBody()->getContents();
preg_match_all('/operationName:\s*["\']([^"\']+)["\'][^}]+sha256Hash:\s*["\']([a-f0-9]{64})["\']/', $bundle, $m, PREG_SET_ORDER);
$hashes = [];
foreach ($m as $pair) {
  $hashes[$pair[1]] = $pair[2];
}

// Use a persisted query
$res = $client->post('https://practice.scrapingcentral.com/api/graphql/persisted', [
    'json' => [
        'operationName' => 'GetProducts',
        'variables' => ['first' => 12],
        'extensions' => [
            'persistedQuery' => [
                'version' => 1,
                'sha256Hash' => $hashes['GetProducts'],
            ],
        ],
    ],
]);
print_r(json_decode($res->getBody()->getContents(), true));

When extraction is harder

Some sites obfuscate. Two harder cases:

  1. Hashes assembled at runtime. Split into parts and concatenated just before the request. Find them with a Sources breakpoint on the network call.
  2. Custom hash schemes. Not sha256 but something proprietary. Reverse-engineer the hashing function.

The Sources-breakpoint technique (lesson 3.46): set a breakpoint on the line that issues the GraphQL POST and inspect the body just before it's sent. Whatever's there is exactly what the site sends; copy it verbatim.

Persisted-query-only vs flexible

Implementations fall into three modes:

  • Strict: only registered hashes accepted. You can ONLY use what the site uses.
  • APQ fallback: if hash unknown, send full query + hash, server registers it. You can introduce new queries.
  • Hybrid: introspection disabled but raw queries still accepted on a separate endpoint. Look for it.

Check by sending a raw query (no hash) and seeing whether it works.
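That check can be systematized into a small probe. A sketch with the transport injected as a callable (so it can be pointed at any endpoint, or stubbed); the mode labels are my own:

```python
import hashlib


def classify_endpoint(post):
    """Probe which persisted-query mode an endpoint runs in.

    `post` is a callable taking a JSON body dict and returning the
    decoded response dict (wrap your HTTP client). Returns one of
    "raw-ok", "apq-fallback", or "strict".
    """
    probe = "query Probe { __typename }"
    # 1. Raw query, no persistedQuery extension at all.
    resp = post({"query": probe})
    if resp.get("data") is not None:
        return "raw-ok"
    # 2. Full query plus its correct hash: an APQ server registers and
    #    runs it; a strict server still refuses anything unregistered.
    sha = hashlib.sha256(probe.encode("utf-8")).hexdigest()
    resp = post({
        "query": probe,
        "extensions": {"persistedQuery": {"version": 1, "sha256Hash": sha}},
    })
    if resp.get("data") is not None:
        return "apq-fallback"
    return "strict"
```

Note the second probe sends the hash that actually matches the query text: APQ servers verify the hash against the supplied document and reject mismatches.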

Working around restrictions

If strict and you need data the registered queries don't expose:

  • Look for variants: the site may have 50 registered queries, and one of them may expose the field you need.
  • Look for a sibling endpoint (some sites have a more permissive /admin/graphql or /preview/graphql).
  • Accept the constraint and work within it. Persisted queries are intentional limits.

A complete persisted-query client

import re
import requests


class PersistedGqlClient:
    def __init__(self, base: str, bundle_url: str):
        self.base = base
        self.session = requests.Session()
        self.hashes = self._extract(bundle_url)

    def _extract(self, bundle_url: str) -> dict[str, str]:
        js = self.session.get(bundle_url).text
        matches = re.findall(
            r'operationName:\s*["\']([^"\']+)["\'][^}]+sha256Hash:\s*["\']([a-f0-9]{64})["\']',
            js,
        )
        return dict(matches)

    def query(self, op_name: str, variables: dict = None) -> dict:
        if op_name not in self.hashes:
            raise ValueError(f"Unknown operation: {op_name}")
        r = self.session.post(self.base, json={
            "operationName": op_name,
            "variables": variables or {},
            "extensions": {
                "persistedQuery": {
                    "version": 1,
                    "sha256Hash": self.hashes[op_name],
                },
            },
        })
        r.raise_for_status()
        data = r.json()
        if data.get("errors"):
            raise RuntimeError(data["errors"])
        return data["data"]

Hands-on lab

Hit /challenges/api/graphql/persisted on Catalog108. Extract the hashes from the JS bundle. Use one of them to issue a persisted query. Then try sending a raw query (no hash) to the same endpoint, observe the rejection. You've now navigated the most common modern GraphQL hardening pattern.

