Required vs Optional Headers, Minimum Viable Request
A captured curl has 15 headers. Production code needs three. How to find and keep only what's load-bearing.
What you’ll learn
- Categorise headers as required, situational, or noise.
- Use binary elimination to find required headers in under a minute.
- Recognise the role of User-Agent, Referer, Accept, Origin, and the sec-* family.
- Build the smallest possible request that still works.
A copy-as-cURL gives you everything the browser sent, 15 to 25 headers depending on the site. In production you want the smallest possible set that still works. This isn't just aesthetics: fewer headers means faster requests, less fingerprint surface, less maintenance when something changes.
The three categories
Every header in a captured request is one of:
- Required: the endpoint returns a different response (or refuses) without it. Must keep.
- Situational: required only for some endpoints, or only under some conditions. Test on your specific target.
- Noise: the server ignores it. Strip.
Most captured requests are 80% noise. The art is finding the 20%.
The binary elimination algorithm
Don't strip headers one at a time; that's slow. Use binary elimination:
- Run the curl as-is. Note the response.
- Strip half the headers. Re-run. Same response? The discarded half was noise. Keep going with the remaining half.
- Strip half of what remains. Re-run. Same? Discard. Different? Bisect into that half to find the load-bearing header.
- Continue until you can't strip anything without breaking the response.
15 headers → ~4 iterations → done in under a minute.
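The loop above can be sketched as a small helper. Everything here is illustrative: `minimal_headers`, `fake_probe`, and the header values are made-up names, and the `works` probe stands in for re-sending the real request and comparing the response against the full-header baseline.

```python
def minimal_headers(captured, works):
    """Binary elimination over a captured header set.

    `works` is a probe you supply: it re-sends the request with the given
    headers and returns True if the response still matches the baseline.
    Repeatedly tries to drop a chunk of the remaining headers; a droppable
    chunk was noise. The final chunk-of-one pass doubles as a leave-one-out
    check that every surviving header is genuinely load-bearing.
    """
    needed = list(captured.items())
    chunk = max(1, len(needed) // 2)
    while True:
        i = 0
        while i < len(needed):
            trial = needed[:i] + needed[i + chunk:]
            if works(dict(trial)):
                needed = trial        # the dropped chunk was all noise
            else:
                i += chunk            # something load-bearing is in there
        if chunk == 1:
            return dict(needed)
        chunk //= 2


# Stand-in probe for illustration: pretend the endpoint needs exactly
# Authorization and User-Agent. In real use this fires an HTTP request.
def fake_probe(headers):
    return {"authorization", "user-agent"} <= set(headers)

captured = {
    "accept": "application/json",
    "accept-language": "en-US,en;q=0.9",
    "authorization": "Bearer t0k3n",
    "cache-control": "no-cache",
    "pragma": "no-cache",
    "user-agent": "Mozilla/5.0 ...",
}
print(minimal_headers(captured, fake_probe))
# → {'authorization': 'Bearer t0k3n', 'user-agent': 'Mozilla/5.0 ...'}
```

Because each pass halves the chunk size, a 12-header capture converges in a handful of probe requests rather than twelve.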
Headers and what they're typically for
A taxonomy of headers a scraper meets:
| Header | Category | Notes |
|---|---|---|
| Authorization | Required (when auth) | Bearer/Basic/custom. The actual auth. |
| Cookie | Required (when session auth) | The session, CSRF, A/B-test buckets. |
| Content-Type | Required (POST/PUT) | Otherwise the server can't parse the body. |
| Accept | Situational | Some APIs return HTML if Accept doesn't include application/json. |
| User-Agent | Situational | Some endpoints 403 without a browser-shaped UA. |
| Referer | Situational | Anti-leech and some CSRF flows check it. |
| Origin | Situational | CORS preflights; some APIs require it on POST. |
| X-Requested-With: XMLHttpRequest | Situational | Old-school AJAX marker; some Rails/Django apps require it. |
| X-CSRF-Token | Required (on POST form actions) | Single-use; capture fresh. |
| Accept-Encoding | Noise (mostly) | Libraries handle this. |
| Accept-Language | Noise (mostly) | Affects localisation only. |
| sec-fetch-* | Noise (almost always) | Browser metadata; servers usually ignore it. |
| sec-ch-ua* | Noise (almost always) | Client Hints; servers usually ignore them. |
| Cache-Control, Pragma | Noise | Client-side caching hints. |
| DNT, Upgrade-Insecure-Requests | Noise | Browser-only. |
A real worked example
Capture a curl on Catalog108's /api/products:
```shell
curl 'https://practice.scrapingcentral.com/api/products' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-encoding: gzip, deflate, br' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'cache-control: no-cache' \
  -H 'pragma: no-cache' \
  -H 'sec-ch-ua: "Chromium";v="120"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 ...' \
  --compressed
```
12 headers. Binary elimination:
Round 1, strip the first eight headers:

```shell
curl 'https://practice.scrapingcentral.com/api/products' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 ...'
```

Still works. The first eight (accept, accept-encoding, accept-language, cache-control, pragma, and the three sec-ch-ua* headers) were all noise.
Round 2, strip the sec-fetch-* trio:
```shell
curl 'https://practice.scrapingcentral.com/api/products' \
  -H 'user-agent: Mozilla/5.0 ...'
```
Still works. The sec-fetch headers were noise too.
Round 3, strip the user-agent:
```shell
curl 'https://practice.scrapingcentral.com/api/products'
```
Still works on Catalog108 because the endpoint is public. Minimum viable: zero headers.
For comparison, on a real anti-bot-protected site, dropping the User-Agent often produces a 403; that header becomes required.
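"Still works" in the rounds above deserves a definition. Comparing raw bodies fails when responses embed timestamps or request IDs that change between runs. One workable sketch (`shape` and `same_response` are illustrative helpers, not from any library) compares status code plus body structure instead:

```python
def shape(value):
    """Collapse decoded JSON to its structure: dict keys and value types
    survive, volatile leaf values (IDs, timestamps, tokens) do not.
    Lists are summarised by their first element, which is enough for
    uniform API collections."""
    if isinstance(value, dict):
        return {k: shape(v) for k, v in sorted(value.items())}
    if isinstance(value, list):
        return [shape(value[0])] if value else []
    return type(value).__name__

def same_response(status_a, body_a, status_b, body_b):
    """'Same response' for elimination purposes: equal status code and
    equal body structure, ignoring values that change between runs."""
    return status_a == status_b and shape(body_a) == shape(body_b)

# Two listings with different data but identical structure count as the
# same; a 403 error body does not.
a = {"products": [{"id": 1, "name": "Widget"}], "total": 64}
b = {"products": [{"id": 2, "name": "Gadget"}], "total": 64}
print(same_response(200, a, 200, b))                       # → True
print(same_response(200, a, 403, {"error": "forbidden"}))  # → False
```

This is one reasonable equality rule, not the only one; for HTML responses you might compare status plus a key selector's presence instead.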
Required-header patterns by auth type
| Auth style | Required header(s) |
|---|---|
| None (public) | Often zero |
| Cookie session | Cookie |
| Bearer / JWT | Authorization: Bearer <token> |
| API key in header | Custom (e.g. X-API-Key) |
| API key in query | None (it's in the URL) |
| HMAC-signed | X-Signature (or whatever) + X-Timestamp |
| CSRF POST | Cookie + X-CSRF-Token + (often) Origin |
| OAuth 2 bearer | Authorization: Bearer ... |
When in doubt, try with Authorization alone. If 403, add User-Agent. If still 403, add Referer. If still 403, suspect TLS fingerprinting (lesson 3.49).
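That escalation heuristic can be captured in a few lines. Everything below is a placeholder for illustration: the ladder entries, the UA and Referer strings, the URL, and the `fake_send` probe, which stands in for whatever function actually performs the request and returns a status code.

```python
# Hypothetical escalation ladder: Authorization alone, then add a
# browser-shaped UA, then add a plausible Referer.
ESCALATION = [
    {},
    {"User-Agent": "Mozilla/5.0 ..."},
    {"User-Agent": "Mozilla/5.0 ...",
     "Referer": "https://target.example.com/"},
]

def first_working(send, base_headers):
    """Walk the ladder; return the first header set the server accepts,
    or None, at which point suspect TLS fingerprinting."""
    for extra in ESCALATION:
        headers = {**base_headers, **extra}
        if send(headers) != 403:
            return headers
    return None

# Stand-in for illustration: this fake server 403s unless a User-Agent
# accompanies the Authorization header.
def fake_send(headers):
    ok = "Authorization" in headers and "User-Agent" in headers
    return 200 if ok else 403

result = first_working(fake_send, {"Authorization": "Bearer t0k3n"})
print(sorted(result))  # → ['Authorization', 'User-Agent']
```

Injecting `send` keeps the ladder testable without a network; in production it would wrap your HTTP client.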
Python: minimal headers

```python
import requests

# Minimum viable for the Catalog108 public endpoint
r = requests.get("https://practice.scrapingcentral.com/api/products")

# For an authenticated endpoint (token obtained earlier, e.g. from a login flow)
r = requests.get(
    "https://practice.scrapingcentral.com/api/auth/me",
    headers={"Authorization": f"Bearer {token}"},
)

# For an anti-bot site, you may need
r = requests.get(
    "https://target.example.com/api/data",
    headers={
        "Authorization": f"Bearer {token}",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 ...",
        "Referer": "https://target.example.com/products",
    },
)
```
Three headers is the typical upper bound for a clean production scraper.
PHP: minimal headers

```php
use GuzzleHttp\Client;

$client = new Client(['base_uri' => 'https://practice.scrapingcentral.com']);

// Minimum viable
$res = $client->get('/api/products');

// Authenticated
$res = $client->get('/api/auth/me', [
    'headers' => ['Authorization' => "Bearer $token"],
]);
```
Why this matters in production
Three reasons to keep headers minimal:
- Maintenance. Every header you copy is a header you have to remember to update when the site changes. Five headers are five potential bugs.
- Fingerprinting surface. Headers, their order, and their casing form part of your HTTP fingerprint. The more browser-looking headers you copy, the easier you are to detect when they don't perfectly match a real browser.
- Bandwidth and CPU. Negligible per request, but at 10M requests/day, every byte matters.
Captured curls are starting points, not endings. Always trim.
Hands-on lab
Capture a curl from /api/products in DevTools. Apply the binary elimination algorithm: strip half, test, repeat. Note how few headers Catalog108 actually requires (likely zero for public endpoints, just Authorization for authenticated ones). Then translate the minimal version to Python or PHP. You've just shaved off 80% of the request, and your scraper is now structurally simpler and less brittle.
Practice this lesson on Catalog108, our first-party scraping sandbox; the lab target is /api/products.
Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you'll see the explanation right after.