Persisted Queries, The Modern GraphQL Trap
Production GraphQL doesn't accept arbitrary queries, only known hashes. Scrapers must extract the precomputed hashes from the bundle or capture them live.
What you’ll learn
- Recognise a persisted-query GraphQL endpoint.
- Extract sha256Hash values from the site's JS bundle.
- Send persisted queries with hash + variables.
- Fall back to raw queries when persisted-only mode is relaxed.
Modern production GraphQL doesn't let clients send arbitrary queries. Instead, the server has a registry of approved queries identified by their SHA-256 hash. Clients send the hash + variables; the server looks up the registered query and runs it.
For scrapers this is a problem. Introspection won't tell you what queries are allowed, and arbitrary queries are rejected. You must extract the hashes from the site itself.
Why APIs do this
Two reasons:
- Security. No arbitrary queries means no expensive query attacks (deep recursive queries, full table scans). Servers only run pre-vetted operations.
- Bandwidth. The client sends a 64-char hash instead of a 4 KB query body. At scale, meaningful.
Apollo's Automatic Persisted Queries (APQ) is the canonical implementation. Shopify, Hasura, and many in-house APIs use a variant.
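Under APQ, the hash is nothing exotic: it's the hex SHA-256 digest of the exact query text the client would otherwise send. A minimal sketch (the GetProducts query shown is illustrative):

```python
import hashlib

# Under APQ, sha256Hash is the hex SHA-256 digest of the exact
# query text the client would otherwise send in the "query" field.
query = "query GetProducts($first: Int!) { products(first: $first) { id name } }"
sha256_hash = hashlib.sha256(query.encode("utf-8")).hexdigest()
print(sha256_hash)  # 64 hex characters identifying this exact query
```

Change one character of the query and the digest changes entirely, which is why a registry of approved hashes pins clients to approved queries.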
How a persisted-query request looks
POST /api/graphql/persisted
{
  "operationName": "GetProducts",
  "variables": {"first": 12},
  "extensions": {
    "persistedQuery": {
      "version": 1,
      "sha256Hash": "7e8d2f3a1b4c5e6d8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e"
    }
  }
}
No query field. Just the hash. Server looks up the hash in its registry, runs the query, returns data.
If the hash isn't recognized, the response looks like:
{
  "errors": [{"message": "PersistedQueryNotFound", "extensions": {...}}]
}
APQ, the half-way protocol
Apollo's APQ uses a fallback: if the hash isn't known, the client retries with the full query AND the hash, registering it for future use. Servers can disable this fallback (strict APQ).
In strict mode, the only way to query is to know an existing hash.
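The optimistic handshake can be sketched like this, assuming an APQ-style endpoint; `apq_body` and `apq_post` are names invented here, not library API:

```python
import hashlib

import requests


def apq_body(query: str, variables: dict) -> dict:
    """Build the hash-only request body an APQ client sends first."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return {
        "variables": variables,
        "extensions": {"persistedQuery": {"version": 1, "sha256Hash": digest}},
    }


def apq_post(url: str, query: str, variables: dict) -> dict:
    """Optimistic APQ round trip: hash first, full query only on a miss."""
    body = apq_body(query, variables)
    data = requests.post(url, json=body).json()
    messages = [e.get("message") for e in data.get("errors", [])]
    if "PersistedQueryNotFound" in messages:
        # Fallback leg: resend with the full query AND the hash so the
        # server can register it. Strict servers reject this leg too.
        body["query"] = query
        data = requests.post(url, json=body).json()
    return data
```

Against a strict server the second leg fails just like the first, which is the behaviour the rest of this lesson works around.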
How to extract hashes
The hashes are baked into the site's JS bundle, often as constants:
// In a Webpack-bundled JS file
const GET_PRODUCTS_QUERY = {
  documentId: "7e8d2f3a1b4c5e6d8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e",
  // ... or
  hash: "7e8d2f3a...",
};
Or as constants in a generated file:
const GetProductsDocument = {
  __meta__: {
    hash: "7e8d2f3a...",
    operationName: "GetProducts",
  },
};
The key name varies: documentId, hash, sha256Hash, persisted_query_id. Grep for all of these.
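One way to sweep a bundle for all of these key names in a single pass, sketched in Python (the sample string is a made-up bundle fragment):

```python
import re

# The registry key name varies by codegen tool; match all the common
# ones, capturing the 64-char hex value that follows.
KEY_PATTERN = re.compile(
    r'(?:documentId|sha256Hash|hash|persisted_query_id)\s*:\s*["\']([a-f0-9]{64})["\']'
)

def find_hashes(js: str) -> set[str]:
    return set(KEY_PATTERN.findall(js))

# A made-up bundle fragment in one of the shapes shown earlier:
sample = 'const Q = { documentId: "' + "a1" * 32 + '" };'
print(find_hashes(sample))  # one 64-char hash
```

Putting `sha256Hash` before `hash` in the alternation keeps the longer key from being half-matched; the capture group is the same either way.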
Finding hashes, the workflow
- Open the site, capture a real GraphQL call in DevTools. Note the sha256Hash value.
- Search the JS bundles for that hash. Find the surrounding code.
- Inventory all similar hashes near it, they're usually clustered (one operations registry per bundle).
- Match each hash to its operation name (often a property on the same object).
curl -s https://target.example.com/static/js/main.bundle.js > main.js
grep -oE '[a-f0-9]{64}' main.js | sort -u | head
grep -B2 -A2 '7e8d2f3a' main.js
Catalog108 example
/challenges/api/graphql/persisted requires a sha256Hash. The hashes are in the lab's JS:
import requests, re
# 1. Find the bundle
page = requests.get("https://practice.scrapingcentral.com/challenges/api/graphql/persisted").text
bundle = re.search(r'src="([^"]+\.js[^"]*)"', page).group(1)
# 2. Download
js = requests.get(f"https://practice.scrapingcentral.com{bundle}").text
# 3. Extract hashes, names often paired
matches = re.findall(r'operationName:\s*["\']([^"\']+)["\'][^}]+sha256Hash:\s*["\']([a-f0-9]{64})["\']', js)
hashes = dict(matches)
print(hashes)
# → {"GetProducts": "abc123...", "GetProduct": "def456..."}
# 4. Use a persisted query
r = requests.post(
    "https://practice.scrapingcentral.com/api/graphql/persisted",
    json={
        "operationName": "GetProducts",
        "variables": {"first": 12},
        "extensions": {
            "persistedQuery": {
                "version": 1,
                "sha256Hash": hashes["GetProducts"],
            },
        },
    },
)
print(r.json())
PHP version
use GuzzleHttp\Client;
$client = new Client();
// Extract hashes from bundle (similar logic to Python)
$bundle = $client->get('https://practice.scrapingcentral.com/static/js/persisted-app.js')->getBody()->getContents();
preg_match_all('/operationName:\s*["\']([^"\']+)["\'][^}]+sha256Hash:\s*["\']([a-f0-9]{64})["\']/', $bundle, $m, PREG_SET_ORDER);
$hashes = [];
foreach ($m as $pair) {
    $hashes[$pair[1]] = $pair[2];
}
// Use a persisted query
$res = $client->post('https://practice.scrapingcentral.com/api/graphql/persisted', [
    'json' => [
        'operationName' => 'GetProducts',
        'variables' => ['first' => 12],
        'extensions' => [
            'persistedQuery' => [
                'version' => 1,
                'sha256Hash' => $hashes['GetProducts'],
            ],
        ],
    ],
]);
print_r(json_decode($res->getBody()->getContents(), true));
When extraction is harder
Some sites obfuscate. Two harder cases:
- Hashes assembled at runtime. Split into parts, concatenated. Find via Sources breakpoints on the network call.
- Custom hash schemes. Not sha256 but something proprietary. Reverse-engineer the hashing function.
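As an illustration of the first case, a hypothetical obfuscated bundle might carry the hash in chunks that are only joined at call time:

```python
# Hypothetical obfuscated bundle: the 64-char hash is stored in chunks
# and concatenated at runtime, so grep -oE '[a-f0-9]{64}' finds nothing.
chunks = ["7e8d2f3a1b4c5e6d", "8f9a0b1c2d3e4f5a",
          "6b7c8d9e0f1a2b3c", "4d5e6f7a8b9c0d1e"]
sha256_hash = "".join(chunks)
assert len(sha256_hash) == 64
# Matching shorter hex runs, or a breakpoint on the network call,
# still recovers the joined value.
```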
The Sources-breakpoint technique (lesson 3.46): set a breakpoint on the line that issues the GraphQL POST, inspect the body just before send. Whatever's there is what the site is sending, copy verbatim.
Persisted-query-only vs flexible
Some implementations:
- Strict: only registered hashes accepted. You can ONLY use what the site uses.
- APQ fallback: if hash unknown, send full query + hash, server registers it. You can introduce new queries.
- Hybrid: introspection disabled but raw queries still accepted on a separate endpoint. Look for it.
Check by sending a raw query (no hash) and seeing whether it works.
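That check can be scripted: POST a trivial hash-less query and classify the reaction. A sketch mirroring the three modes above (`classify` and `endpoint_mode` are names invented here):

```python
import requests

# A trivial raw query with no persistedQuery extension.
PROBE = {"query": "query Probe { __typename }"}


def classify(data: dict) -> str:
    """Map a GraphQL JSON response to one of the three modes."""
    if data.get("data"):
        return "flexible: raw queries accepted"
    messages = " ".join(e.get("message", "") for e in data.get("errors", []))
    if "PersistedQueryNotFound" in messages:
        return "APQ: send full query + hash to register it"
    return "strict: only registered hashes work"


def endpoint_mode(url: str) -> str:
    """POST the raw probe and classify how the endpoint reacts."""
    return classify(requests.post(url, json=PROBE).json())
```

Exact error strings vary by implementation, so treat the strict/flexible split as reliable and the APQ detection as a heuristic.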
Working around restrictions
If strict and you need data the registered queries don't expose:
- Look for variants, the site may have 50 registered queries; one may have the field you need.
- Look for a sibling endpoint (some sites have a more permissive /admin/graphql or /preview/graphql).
- Accept the constraint and work within it. Persisted queries are intentional limits.
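Hunting for sibling endpoints can also be scripted; a sketch with guessed paths (adjust CANDIDATES for the target):

```python
import requests

# Candidate paths are guesses; extend the list for the target site.
CANDIDATES = ["/graphql", "/api/graphql", "/admin/graphql", "/preview/graphql"]

def probe_siblings(base: str) -> dict[str, int]:
    """POST a trivial raw query to each candidate path; -1 = unreachable."""
    probe = {"query": "query Probe { __typename }"}
    results = {}
    for path in CANDIDATES:
        try:
            results[path] = requests.post(base + path, json=probe, timeout=10).status_code
        except requests.RequestException:
            results[path] = -1
    return results
```

A 200 on a sibling path is only a lead, not a win: confirm it actually accepts raw queries before building on it.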
A complete persisted-query client
import re

import requests


class PersistedGqlClient:
    def __init__(self, base: str, bundle_url: str):
        self.base = base
        self.session = requests.Session()
        self.hashes = self._extract(bundle_url)

    def _extract(self, bundle_url: str) -> dict[str, str]:
        js = self.session.get(bundle_url).text
        matches = re.findall(
            r'operationName:\s*["\']([^"\']+)["\'][^}]+sha256Hash:\s*["\']([a-f0-9]{64})["\']',
            js,
        )
        return dict(matches)

    def query(self, op_name: str, variables: dict | None = None) -> dict:
        if op_name not in self.hashes:
            raise ValueError(f"Unknown operation: {op_name}")
        r = self.session.post(self.base, json={
            "operationName": op_name,
            "variables": variables or {},
            "extensions": {
                "persistedQuery": {
                    "version": 1,
                    "sha256Hash": self.hashes[op_name],
                },
            },
        })
        r.raise_for_status()
        data = r.json()
        if data.get("errors"):
            raise RuntimeError(data["errors"])
        return data["data"]
Hands-on lab
Hit /challenges/api/graphql/persisted on Catalog108. Extract the hashes from the JS bundle. Use one of them to issue a persisted query. Then try sending a raw query (no hash) to the same endpoint, observe the rejection. You've now navigated the most common modern GraphQL hardening pattern.