
3.8 · Beginner · 5 min read

Required vs Optional Headers, Minimum Viable Request

A captured curl has 15 headers. Production code needs three. How to find and keep only what's load-bearing.

What you’ll learn

  • Categorise headers as required, situational, or noise.
  • Use binary elimination to find required headers in under a minute.
  • Recognise the role of User-Agent, Referer, Accept, Origin, and the sec-* family.
  • Build the smallest possible request that still works.

A copy-as-cURL gives you everything the browser sent, 15 to 25 headers depending on the site. In production you want the smallest possible set that still works. This isn't just aesthetics: fewer headers mean faster requests, a smaller fingerprint surface, and less maintenance when something changes.

The three categories

Every header in a captured request is one of:

  1. Required: the endpoint returns a different response (or refuses) without it. Must keep.
  2. Situational: required only for some endpoints, or only under some conditions. Test on your specific target.
  3. Noise: the server ignores it. Strip.

Most captured requests are 80% noise. The art is finding the 20%.

The binary elimination algorithm

Don't strip headers one at a time; that's slow. Use binary elimination:

  1. Run the curl as-is. Note the response.
  2. Strip half the headers. Re-run. Same response? The discarded half was noise. Keep going with the remaining half.
  3. Strip half of what remains. Re-run. Same? Discard. Different? Bisect into that half to find the load-bearing header.
  4. Continue until you can't strip anything without breaking the response.

15 headers → ~4 iterations → done in under a minute.
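The loop above is easy to automate. A sketch in Python, assuming you supply a `works` callback that replays the request with a candidate header subset and compares the response against the full-header baseline (all names here are my own, not a standard API):

```python
def minimize_headers(headers, works):
    """Binary elimination over a captured header dict.

    `works(subset)` must send the request with only `subset` and
    return True if the response matches the full-header baseline.
    Assumes works(headers) is True and responses are deterministic.
    """
    def solve(candidates, fixed):
        # Find which of `candidates` are load-bearing, given that the
        # headers named in `fixed` are already being sent.
        if works({k: headers[k] for k in fixed}):
            return []                       # everything left is noise
        if len(candidates) == 1:
            return candidates               # single load-bearing header
        mid = len(candidates) // 2
        a, b = candidates[:mid], candidates[mid:]
        if works({k: headers[k] for k in fixed + a}):
            return solve(a, fixed)          # all of b was noise
        if works({k: headers[k] for k in fixed + b}):
            return solve(b, fixed)          # all of a was noise
        # Required headers live in both halves: pin one half, bisect
        # the other, then repeat with the survivors pinned.
        need_a = solve(a, fixed + b)
        need_b = solve(b, fixed + need_a)
        return need_a + need_b

    keep = solve(list(headers), [])
    return {k: headers[k] for k in keep}
```

Against a real target, `works` would be a thin wrapper around `requests.get` that compares status code and body against the first, as-captured run.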

Headers and what they're typically for

A taxonomy of headers a scraper meets:

Header | Category | Notes
Authorization | Required (when auth) | Bearer/Basic/Custom. The actual auth.
Cookie | Required (when session auth) | The session, CSRF, A/B-test buckets.
Content-Type | Required (POST/PUT) | Otherwise the server can't parse the body.
Accept | Situational | Some APIs return HTML if Accept doesn't include application/json.
User-Agent | Situational | Some endpoints 403 without a browser-shaped UA.
Referer | Situational | Anti-leech and some CSRF flows check it.
Origin | Situational | CORS preflights; some APIs require it on POST.
X-Requested-With: XMLHttpRequest | Situational | Old-school AJAX marker; some Rails/Django apps require it.
X-CSRF-Token | Required (on POST form actions) | Single-use; capture fresh.
Accept-Encoding | Noise (mostly) | HTTP libraries handle this.
Accept-Language | Noise (mostly) | Affects localisation only.
sec-fetch-* | Noise (almost always) | Browser metadata; servers usually ignore.
sec-ch-ua* | Noise (almost always) | Client Hints; servers usually ignore.
Cache-Control, Pragma | Noise | Client-side caching hints.
DNT, Upgrade-Insecure-Requests | Noise | Browser-only.
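The "Noise" rows can be stripped mechanically before you start bisecting. A quick first pass in Python; the classification mirrors the table above and is a heuristic, not a guarantee, so verify on your target:

```python
# Names and prefixes the taxonomy marks as noise.
NOISE_NAMES = {
    "accept-encoding", "accept-language", "cache-control",
    "pragma", "dnt", "upgrade-insecure-requests",
}
NOISE_PREFIXES = ("sec-fetch-", "sec-ch-ua")

def strip_probable_noise(headers):
    """Drop headers classified as noise; keep everything else, so
    required and situational headers survive for binary elimination."""
    return {
        name: value for name, value in headers.items()
        if name.lower() not in NOISE_NAMES
        and not name.lower().startswith(NOISE_PREFIXES)
    }
```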

A real worked example

Capture a curl on Catalog108's /api/products:

curl 'https://practice.scrapingcentral.com/api/products' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-encoding: gzip, deflate, br' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'cache-control: no-cache' \
  -H 'pragma: no-cache' \
  -H 'sec-ch-ua: "Chromium";v="120"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 ...' \
  --compressed

12 headers. Binary elimination:

Round 1, strip the first eight headers:

curl 'https://practice.scrapingcentral.com/api/products' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 ...'

Still works. The first eight (accept, accept-encoding, accept-language, cache-control, pragma, and the three sec-ch-ua headers) were all noise.

Round 2, strip the sec-fetch-* trio:

curl 'https://practice.scrapingcentral.com/api/products' \
  -H 'user-agent: Mozilla/5.0 ...'

Still works. The sec-fetch headers were noise too.

Round 3, strip the user-agent:

curl 'https://practice.scrapingcentral.com/api/products'

Still works on Catalog108 because the endpoint is public. Minimum viable: zero headers.

For comparison, on a real anti-bot-protected site, dropping the User-Agent often produces a 403; there, the User-Agent becomes required.
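"Same response" across rounds is easier to judge mechanically than by eye. One way, assuming a deterministic body (dynamic fields like timestamps or request IDs would need masking first):

```python
import hashlib

def response_fingerprint(status_code: int, body: bytes) -> str:
    """Summarise a response as status plus a body hash, so two
    elimination rounds can be compared with a string equality check."""
    return f"{status_code}:{hashlib.sha256(body).hexdigest()}"
```

Record the fingerprint of the as-captured run, then compare each stripped round against it.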

Required-header patterns by auth type

Auth style | Required header(s)
None (public) | Often zero
Cookie session | Cookie
Bearer / JWT | Authorization: Bearer <token>
API key in header | Custom (e.g. X-API-Key)
API key in query | None (it's in the URL)
HMAC-signed | X-Signature (or whatever) + X-Timestamp
CSRF POST | Cookie + X-CSRF-Token + (often) Origin
OAuth 2 bearer | Authorization: Bearer ...

When in doubt, try Authorization alone. If you get a 403, add User-Agent. If still 403, add Referer. If still 403, suspect TLS fingerprinting (lesson 3.49).
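That rule of thumb is mechanical enough to script. A sketch, where `get` is any requests-style callable (e.g. `requests.get`) and the URL, token, and header values are placeholders of my own:

```python
def find_minimal_auth_headers(url, token, ua, referer, get):
    """Try Authorization alone, then add User-Agent, then Referer,
    returning the first header set that doesn't draw a 403.
    Returns None if all three fail (suspect TLS fingerprinting)."""
    auth = {"Authorization": f"Bearer {token}"}
    ladder = [
        auth,
        {**auth, "User-Agent": ua},
        {**auth, "User-Agent": ua, "Referer": referer},
    ]
    for headers in ladder:
        if get(url, headers=headers, timeout=10).status_code != 403:
            return headers
    return None
```

Pass `get=requests.get` in real use; injecting the callable also makes the ladder easy to dry-run against a stub.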

Python: enforced minimal headers

import requests

# Minimum viable for Catalog108 public endpoint
r = requests.get("https://practice.scrapingcentral.com/api/products")

# For an authenticated endpoint
r = requests.get(
    "https://practice.scrapingcentral.com/api/auth/me",
    headers={"Authorization": f"Bearer {token}"},
)

# For an anti-bot site, you may need
r = requests.get(
    "https://target.example.com/api/data",
    headers={
        "Authorization": f"Bearer {token}",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 ...",
        "Referer": "https://target.example.com/products",
    },
)

Three headers is the typical upper bound for a clean production scraper.

PHP: enforced minimal headers

require 'vendor/autoload.php';

use GuzzleHttp\Client;

$client = new Client(['base_uri' => 'https://practice.scrapingcentral.com']);

// Minimum viable
$res = $client->get('/api/products');

// Authenticated
$res = $client->get('/api/auth/me', [
  'headers' => ['Authorization' => "Bearer $token"],
]);

Why this matters in production

Three reasons to keep headers minimal:

  1. Maintenance. Every header you copy is a header you have to remember to update when the site changes. Five headers are five potential bugs.
  2. Fingerprinting surface. Headers + order + casing form part of your HTTP fingerprint. The more browser-y-looking headers you copy, the easier you are to detect when they don't perfectly match a real browser.
  3. Bandwidth and CPU. Negligible per request, but at 10M requests/day, every byte matters.

Captured curls are starting points, not endings. Always trim.

Hands-on lab

Capture a curl from /api/products in DevTools. Apply the binary elimination algorithm: strip half, test, repeat. Note how few headers Catalog108 actually requires (likely zero for public endpoints, just Authorization for authenticated ones). Then translate the minimal version to Python or PHP. You've just shaved off 80% of the request, and your scraper is now structurally simpler and less brittle.


Quiz, check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.


Which of these headers is almost ALWAYS noise that a scraper can safely strip?
