The Three Layers of Modern Web Data (HTML, XHR, Mobile)
Every site exposes data through up to three layers. Knowing which layer to target, and the trade-offs of each, is the senior scraper's first decision.
What you’ll learn
- Distinguish the HTML, XHR/API, and mobile-app data layers.
- List the trade-offs (stability, parseability, auth complexity) of each.
- Pick the right layer for a given target.
- Inspect Catalog108's three-layer surface area.
A site doesn't have one source of data; it has up to three, each a layer over the same underlying database. Picking the right layer is the senior-level decision that separates a scraper that ships in a day from one that takes a week and breaks on every redesign.
Layer 1: Rendered HTML
The page as the browser displays it. Whether server-rendered or client-rendered, the final markup is what the Elements panel shows.
- Discovery cost: zero. View source, write selectors, done.
- Stability: worst. Class names change, layouts shift, DOM moves every release.
- Auth complexity: low. Cookies and a session are usually all you need.
- Best for: static sites with no XHR layer, prototype scrapers, sites with very stable markup (Wikipedia, government portals), one-off academic projects.
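Layer-1 scraping can be sketched with nothing but the standard library. The class name `product-tile` below is a hypothetical example, and that's precisely the point: it's the kind of detail a redesign silently breaks.

```python
from html.parser import HTMLParser

# Minimal Layer-1 sketch using only the stdlib. The "product-tile" class
# is an assumed, illustrative selector -- exactly the brittle detail that
# changes when the site ships a redesign.
class ProductTileParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_tile = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "div" and ("class", "product-tile") in attrs:
            self.in_tile = True

    def handle_data(self, data):
        if self.in_tile and data.strip():
            self.products.append(data.strip())
            self.in_tile = False

html = '<div class="product-tile">Widget A</div><div class="product-tile">Widget B</div>'
parser = ProductTileParser()
parser.feed(html)
print(parser.products)  # ['Widget A', 'Widget B']
```

In practice you'd reach for BeautifulSoup or lxml instead, but the failure mode is identical: the scraper is coupled to markup, not to data.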
Layer 2: XHR / API (the JSON layer)
The internal JSON API the site uses to populate itself. Visible in DevTools → Network → Fetch/XHR.
- Discovery cost: medium. You need to find the right endpoint, decode auth, replicate headers. The rest of this sub-path is dedicated to this layer.
- Stability: good. APIs change less often than markup; when they do, the breakage is typically additive (new optional fields) rather than destructive.
- Auth complexity: medium to high. JWT, OAuth, signed requests, and CSRF all live here.
- Best for: anything modern and JavaScript-heavy. SPAs, Next.js sites, dashboards, e-commerce, social feeds.
This is where the bulk of professional scraping happens.
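Replicating a discovered XHR call usually means rebuilding its query string and headers from what DevTools' "Copy as cURL" shows. A minimal sketch, assuming an illustrative endpoint and the two headers that commonly gate JSON responses:

```python
from urllib.parse import urlencode

# Sketch of replicating an XHR endpoint discovered in DevTools.
# The endpoint path, parameter names, and header set are assumptions
# modeled on a typical JSON API, not a documented contract.
BASE = "https://practice.scrapingcentral.com/api/products"

def build_request(page: int, per_page: int = 50):
    params = urlencode({"page": page, "per_page": per_page})
    headers = {
        "Accept": "application/json",            # many servers key the JSON layer on this
        "X-Requested-With": "XMLHttpRequest",    # common for XHR calls, not universal
    }
    return f"{BASE}?{params}", headers

url, headers = build_request(1)
print(url)
```

Constructing the request separately from sending it keeps the auth/header logic testable without hitting the network.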
Layer 3: Mobile-app API
The endpoints the mobile app talks to. Often the same backend as the web API, but with different conventions: usually simpler auth (long-lived tokens or signed requests), no CSRF, no browser-specific headers, and sometimes a fully different schema.
- Discovery cost: high. You need a proxy (mitmproxy, Charles, Proxyman), root/jailbreak or a specially configured emulator, sometimes SSL-pinning bypasses (Sub-Path 3, lessons 47–48).
- Stability: very good. Mobile apps update slowly; their APIs are kept stable for backward compatibility with old app versions.
- Auth complexity: variable. Sometimes simpler (one API key signed into a request), sometimes worse (certificate pinning).
- Best for: when the web API is locked down by an anti-bot service, when the mobile API exposes data the web doesn't, when you need long-lived tokens.
Famous examples: Instagram, Twitter/X, Reddit, Uber, and DoorDash all have private mobile APIs that scrapers prefer because the web equivalents are heavily fingerprinted.
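Mobile APIs often replace browser-style auth with request signing. A sketch of one hypothetical scheme, where the app signs `METHOD|PATH|TIMESTAMP` with a key extracted from the app package (the scheme, key, and header names here are all invented for illustration):

```python
import hashlib
import hmac

# Hypothetical mobile-style request signing: HMAC-SHA256 over a canonical
# string, keyed with a secret baked into the app. Every name here is an
# illustrative assumption, not a real scheme.
APP_KEY = b"demo-key-extracted-from-app"  # placeholder, not a real key

def sign(method: str, path: str, ts: int) -> str:
    msg = f"{method}|{path}|{ts}".encode()
    return hmac.new(APP_KEY, msg, hashlib.sha256).hexdigest()

ts = 1700000000  # fixed timestamp so the signature is reproducible
sig = sign("GET", "/api/products", ts)
headers = {"X-Timestamp": str(ts), "X-Signature": sig}
print(len(sig))  # 64 hex chars for SHA-256
```

This is why mobile auth can be *simpler* than the web layer: once you've recovered the key and the canonical string, there's no CSRF token or cookie dance to replay.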
Trade-off matrix
| Layer | Discovery | Stability | Auth | When to choose |
|---|---|---|---|---|
| HTML | Easy | Bad | Easy | Pure SSR sites, no XHR available |
| XHR / API | Medium | Good | Medium | Default for modern sites |
| Mobile API | Hard | Best | Variable | Web layer is locked down |
How to inspect each layer on Catalog108
Catalog108 exposes all three layers explicitly for practice.
Layer 1, HTML:
```shell
curl https://practice.scrapingcentral.com/products | head -100
```
You'll see the full SSR HTML with embedded <script id="__NEXT_DATA__"> and a hydrated React tree.
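When the SSR HTML embeds a `__NEXT_DATA__` script, you can often skip CSS selectors entirely and pull the JSON straight out of the markup. A sketch with stdlib `re` and `json`, using a stand-in for the real page:

```python
import json
import re

# Sketch: extracting the embedded Next.js state instead of parsing the DOM.
# The html string below is a minimal stand-in for the real SSR page; the
# "props.pageProps.products" path is an assumption about its shape.
html = ('<html><script id="__NEXT_DATA__" type="application/json">'
        '{"props":{"pageProps":{"products":[{"name":"Widget"}]}}}'
        '</script></html>')

m = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.S)
data = json.loads(m.group(1))
print(data["props"]["pageProps"]["products"])  # [{'name': 'Widget'}]
```

This is a useful halfway house: Layer-1 transport, Layer-2 parseability.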
Layer 2, XHR:
Visit /products in a browser, open Network → Fetch/XHR, look for /api/products:
```shell
curl https://practice.scrapingcentral.com/api/products
```
Returns clean JSON. Same data, no markup. This is where you'll spend Sub-Path 3.
Layer 3, mobile-style:
Catalog108 doesn't ship a real mobile app, but it exposes endpoints that mimic mobile-style auth: long-lived bearer tokens, no cookie session, no CSRF:
```shell
curl -H "Authorization: Bearer <token>" \
  https://practice.scrapingcentral.com/api/products
```
The lessons on JWT, HMAC, and OAuth all use this mobile-shaped pattern.
The decision rule
When you sit down with a new target, the order is:
- Check XHR first. Reload with Network → Fetch/XHR open. See JSON? Done, go after that.
- If no XHR, check HTML. Pure SSR? Fine, use the static-scraping toolkit.
- If XHR is locked down (heavy auth, fingerprinting, CAPTCHAs): consider the mobile API. Set up mitmproxy on a test phone, capture, replicate.
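The decision order above fits in a tiny function. The booleans stand in for what you'd actually observe in DevTools or a proxy; the names are illustrative:

```python
# The three-step decision rule as code. Inputs are stand-ins for what you
# observe during recon: "did Network -> Fetch/XHR show JSON?" and "is that
# JSON layer gated by heavy auth / fingerprinting / CAPTCHAs?"
def pick_layer(has_xhr: bool, xhr_locked_down: bool) -> str:
    if has_xhr and not xhr_locked_down:
        return "xhr"     # JSON endpoint visible and reachable: target it
    if not has_xhr:
        return "html"    # pure SSR: fall back to the static-scraping toolkit
    return "mobile"      # XHR exists but is locked down: go after the mobile API

print(pick_layer(True, False))   # xhr
print(pick_layer(False, False))  # html
print(pick_layer(True, True))    # mobile
```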
You almost never start by writing CSS selectors against rendered HTML in 2026 unless the other two layers have been ruled out.
A worked example
Imagine you want to scrape a retailer's product catalog.
- Junior path: open Chrome, view source, find `<div class="product-tile">`, write a BeautifulSoup loop, deploy, watch it break next Tuesday when marketing changes the class names.
- Senior path: open Network → Fetch/XHR, find `/api/v2/products?store=123&page=1`, copy as cURL, replicate. The endpoint accepts a `?per_page=200` parameter that the site never uses. One request gets you 200 products. Twenty requests get you 4,000. The scraper runs for a year without touching a class name.
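The senior path's arithmetic can be sketched as a URL generator. The retailer domain and endpoint here are the hypothetical ones from the example above:

```python
from urllib.parse import urlencode

# Sketch of paging the hypothetical /api/v2/products endpoint at 200 items
# per request. "retailer.example" and the parameter names are assumptions
# carried over from the worked example, not a real API.
def page_urls(store: int, total: int, per_page: int = 200):
    pages = -(-total // per_page)  # ceiling division
    for page in range(1, pages + 1):
        qs = urlencode({"store": store, "page": page, "per_page": per_page})
        yield f"https://retailer.example/api/v2/products?{qs}"

urls = list(page_urls(store=123, total=4000))
print(len(urls))  # 20
```

Twenty URLs instead of hundreds of HTML pages, and not a CSS selector in sight.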
That's the value of layer-awareness.
Python snippet: feeling all three layers
```python
import requests

# Layer 1: HTML
html = requests.get("https://practice.scrapingcentral.com/products").text
print("HTML length:", len(html))

# Layer 2: JSON XHR
api = requests.get("https://practice.scrapingcentral.com/api/products").json()
print("Products:", len(api["products"]))

# Layer 3: authenticated, mobile-shaped
token = requests.post(
    "https://practice.scrapingcentral.com/api/auth/login",
    json={"email": "student@practice.scrapingcentral.com", "password": "practice123"},
).json()["access_token"]
me = requests.get(
    "https://practice.scrapingcentral.com/api/auth/me",
    headers={"Authorization": f"Bearer {token}"},
).json()
print("Authenticated as:", me)
```
The PHP equivalent uses Guzzle with the same three calls. You'll see all three layers in the next several lessons.
Hands-on lab
Open /products in your browser. Use View Source to inspect Layer 1 and Network → Fetch/XHR to inspect Layer 2. For Layer 3, hit /api/auth/login with the demo credentials and use the bearer token to call /api/auth/me. Note how much cleaner Layers 2 and 3 are than parsing the SSR HTML; this is the shift the rest of the sub-path teaches you to make instinctively.