Required vs Optional Headers, Minimum Viable Request
A captured curl has 15 headers. Production code needs three. How to find and keep only what's load-bearing.
What you’ll learn
- Categorise headers as required, situational, or noise.
- Use binary elimination to find required headers in under a minute.
- Recognise the role of User-Agent, Referer, Accept, Origin, and the sec-* family.
- Build the smallest possible request that still works.
A copy-as-cURL gives you everything the browser sent, 15 to 25 headers depending on the site. In production you want the smallest possible set that still works. This isn't just aesthetics: fewer headers means faster requests, less fingerprint surface, less maintenance when something changes.
The three categories
Every header in a captured request is one of:
- Required: the endpoint returns a different response (or refuses) without it. Must keep.
- Situational: required only for some endpoints, or only under some conditions. Test on your specific target.
- Noise: the server ignores it. Strip.
Most captured requests are 80% noise. The art is finding the 20%.
The binary elimination algorithm
Don't strip headers one at a time; that's slow. Use binary elimination:
- Run the curl as-is. Note the response.
- Strip half the headers. Re-run. Same response? The discarded half was noise. Keep going with the remaining half.
- Strip half of what remains. Re-run. Same? Discard. Different? Bisect into that half to find the load-bearing header.
- Continue until you can't strip anything without breaking the response.
15 headers → ~4 iterations → done in under a minute.
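The loop above can be sketched as a small helper. Everything here is illustrative: `minimal_headers`, `fake_probe`, and the header values are made-up names, and the `works` probe stands in for re-sending the real request and comparing the response against the full-header baseline.

```python
def minimal_headers(captured, works):
    """Binary elimination over a captured header set.

    `works` is a probe you supply: it re-sends the request with the given
    headers and returns True if the response still matches the baseline.
    Repeatedly tries to drop a chunk of the remaining headers; a droppable
    chunk was noise. The final chunk-of-one pass doubles as a leave-one-out
    check that every surviving header is genuinely load-bearing.
    """
    needed = list(captured.items())
    chunk = max(1, len(needed) // 2)
    while True:
        i = 0
        while i < len(needed):
            trial = needed[:i] + needed[i + chunk:]
            if works(dict(trial)):
                needed = trial        # the dropped chunk was all noise
            else:
                i += chunk            # something load-bearing is in there
        if chunk == 1:
            return dict(needed)
        chunk //= 2


# Stand-in probe for illustration: pretend the endpoint needs exactly
# Authorization and User-Agent. In real use this fires an HTTP request.
def fake_probe(headers):
    return {"authorization", "user-agent"} <= set(headers)

captured = {
    "accept": "application/json",
    "accept-language": "en-US,en;q=0.9",
    "authorization": "Bearer t0k3n",
    "cache-control": "no-cache",
    "pragma": "no-cache",
    "user-agent": "Mozilla/5.0 ...",
}
print(minimal_headers(captured, fake_probe))
# → {'authorization': 'Bearer t0k3n', 'user-agent': 'Mozilla/5.0 ...'}
```

Because each pass halves the chunk size, a 12-header capture converges in a handful of probe requests rather than twelve.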
Headers and what they're typically for
A taxonomy of headers a scraper meets:
| Header | Category | Notes |
|---|---|---|
| Authorization | Required (when auth) | Bearer/Basic/custom. The actual auth. |
| Cookie | Required (when session auth) | The session, CSRF, A/B-test buckets. |
| Content-Type | Required (POST/PUT) | Otherwise the server can't parse the body. |
| Accept | Situational | Some APIs return HTML if Accept doesn't include application/json. |
| User-Agent | Situational | Some endpoints 403 without a browser-shaped UA. |
| Referer | Situational | Anti-leech and some CSRF flows check it. |
| Origin | Situational | CORS preflights; some APIs require it on POST. |
| X-Requested-With: XMLHttpRequest | Situational | Old-school AJAX marker; some Rails/Django apps require it. |
| X-CSRF-Token | Required (on POST form actions) | Single-use; capture fresh. |
| Accept-Encoding | Noise (mostly) | Libraries handle this. |
| Accept-Language | Noise (mostly) | Affects localisation only. |
| sec-fetch-* | Noise (almost always) | Browser metadata; servers usually ignore it. |
| sec-ch-ua* | Noise (almost always) | Client Hints; servers usually ignore them. |
| Cache-Control, Pragma | Noise | Client-side caching hints. |
| DNT, Upgrade-Insecure-Requests | Noise | Browser-only. |
A real worked example
Capture a curl on Catalog108's /api/products:
```shell
curl 'https://practice.scrapingcentral.com/api/products' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-encoding: gzip, deflate, br' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'cache-control: no-cache' \
  -H 'pragma: no-cache' \
  -H 'sec-ch-ua: "Chromium";v="120"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 ...' \
  --compressed
```
12 headers. Binary elimination:
Round 1, strip the first eight headers:

```shell
curl 'https://practice.scrapingcentral.com/api/products' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 ...'
```

Still works. The first eight (accept, accept-encoding, accept-language, cache-control, pragma, and the three sec-ch-ua* headers) were all noise.
Round 2, strip the sec-fetch-* trio:
```shell
curl 'https://practice.scrapingcentral.com/api/products' \
  -H 'user-agent: Mozilla/5.0 ...'
```
Still works. The sec-fetch headers were noise too.
Round 3, strip the user-agent:
```shell
curl 'https://practice.scrapingcentral.com/api/products'
```
Still works on Catalog108 because the endpoint is public. Minimum viable: zero headers.
For comparison, on a real anti-bot-protected site, dropping the User-Agent often produces a 403; that header becomes required.
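"Still works" in the rounds above deserves a definition. Comparing raw bodies fails when responses embed timestamps or request IDs that change between runs. One workable sketch (`shape` and `same_response` are illustrative helpers, not from any library) compares status code plus body structure instead:

```python
def shape(value):
    """Collapse decoded JSON to its structure: dict keys and value types
    survive, volatile leaf values (IDs, timestamps, tokens) do not.
    Lists are summarised by their first element, which is enough for
    uniform API collections."""
    if isinstance(value, dict):
        return {k: shape(v) for k, v in sorted(value.items())}
    if isinstance(value, list):
        return [shape(value[0])] if value else []
    return type(value).__name__

def same_response(status_a, body_a, status_b, body_b):
    """'Same response' for elimination purposes: equal status code and
    equal body structure, ignoring values that change between runs."""
    return status_a == status_b and shape(body_a) == shape(body_b)

# Two listings with different data but identical structure count as the
# same; a 403 error body does not.
a = {"products": [{"id": 1, "name": "Widget"}], "total": 64}
b = {"products": [{"id": 2, "name": "Gadget"}], "total": 64}
print(same_response(200, a, 200, b))                       # → True
print(same_response(200, a, 403, {"error": "forbidden"}))  # → False
```

This is one reasonable equality rule, not the only one; for HTML responses you might compare status plus a key selector's presence instead.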
Required-header patterns by auth type
| Auth style | Required header(s) |
|---|---|
| None (public) | Often zero |
| Cookie session | Cookie |
| Bearer / JWT | Authorization: Bearer <token> |
| API key in header | Custom (e.g. X-API-Key) |
| API key in query | None (it's in the URL) |
| HMAC-signed | X-Signature (or whatever) + X-Timestamp |
| CSRF POST | Cookie + X-CSRF-Token + (often) Origin |
| OAuth 2 bearer | Authorization: Bearer ... |
When in doubt, try with Authorization alone. If 403, add User-Agent. If still 403, add Referer. If still 403, suspect TLS fingerprinting (lesson 3.49).
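That escalation heuristic can be captured in a few lines. Everything below is a placeholder for illustration: the ladder entries, the UA and Referer strings, the URL, and the `fake_send` probe, which stands in for whatever function actually performs the request and returns a status code.

```python
# Hypothetical escalation ladder: Authorization alone, then add a
# browser-shaped UA, then add a plausible Referer.
ESCALATION = [
    {},
    {"User-Agent": "Mozilla/5.0 ..."},
    {"User-Agent": "Mozilla/5.0 ...",
     "Referer": "https://target.example.com/"},
]

def first_working(send, base_headers):
    """Walk the ladder; return the first header set the server accepts,
    or None, at which point suspect TLS fingerprinting."""
    for extra in ESCALATION:
        headers = {**base_headers, **extra}
        if send(headers) != 403:
            return headers
    return None

# Stand-in for illustration: this fake server 403s unless a User-Agent
# accompanies the Authorization header.
def fake_send(headers):
    ok = "Authorization" in headers and "User-Agent" in headers
    return 200 if ok else 403

result = first_working(fake_send, {"Authorization": "Bearer t0k3n"})
print(sorted(result))  # → ['Authorization', 'User-Agent']
```

Injecting `send` keeps the ladder testable without a network; in production it would wrap your HTTP client.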
Python: minimal headers

```python
import requests

# Minimum viable for the Catalog108 public endpoint
r = requests.get("https://practice.scrapingcentral.com/api/products")

# For an authenticated endpoint (token obtained earlier, e.g. from a login flow)
r = requests.get(
    "https://practice.scrapingcentral.com/api/auth/me",
    headers={"Authorization": f"Bearer {token}"},
)

# For an anti-bot site, you may need
r = requests.get(
    "https://target.example.com/api/data",
    headers={
        "Authorization": f"Bearer {token}",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 ...",
        "Referer": "https://target.example.com/products",
    },
)
```
Three headers is the typical upper bound for a clean production scraper.
PHP: minimal headers

```php
use GuzzleHttp\Client;

$client = new Client(['base_uri' => 'https://practice.scrapingcentral.com']);

// Minimum viable
$res = $client->get('/api/products');

// Authenticated
$res = $client->get('/api/auth/me', [
    'headers' => ['Authorization' => "Bearer $token"],
]);
```
Why this matters in production
Three reasons to keep headers minimal:
- Maintenance. Every header you copy is a header you have to remember to update when the site changes. Five headers are five potential bugs.
- Fingerprinting surface. Headers, their order, and their casing form part of your HTTP fingerprint. The more browser-looking headers you copy, the easier you are to detect when they don't perfectly match a real browser.
- Bandwidth and CPU. Negligible per request, but at 10M requests/day, every byte matters.
Captured curls are starting points, not endings. Always trim.
Hands-on lab
Capture a curl from /api/products in DevTools. Apply the binary elimination algorithm: strip half, test, repeat. Note how few headers Catalog108 actually requires (likely zero for public endpoints, just Authorization for authenticated ones). Then translate the minimal version to Python or PHP. You've just shaved off 80% of the request, and your scraper is now structurally simpler and less brittle.
Practice this lesson on Catalog108, our first-party scraping sandbox; the lab target is /api/products.
Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you'll see the explanation right after.