
Lesson 2.1 · Intermediate · 4 min read

Detecting JS-Rendered Content (Three-Test Diagnostic)

Before reaching for a headless browser, run three fast tests to confirm the page actually needs one. Most pages don't.

What you’ll learn

  • Apply the three-test diagnostic: view-source vs DOM, JS-disabled reload, and curl-vs-browser comparison.
  • Recognise the four ways modern sites hydrate content: SSR, CSR, hybrid, and progressive enhancement.
  • Identify when static scraping will work even on a page that 'looks' dynamic.
  • Decide between browser automation and a JSON-endpoint hunt before writing any code.

The most expensive mistake in scraping is reaching for a headless browser when you didn't need one. Browser automation is ten to fifty times slower than HTTP scraping, uses ten times the memory, and breaks more often. Before you spin up Playwright, prove the page actually needs it.

Three tests, two minutes, no code.

Test 1: view-source vs Elements

In your browser, open the target page. Right-click and pick View Page Source (Ctrl+U / Cmd+Opt+U). This is the raw HTML the server sent, exactly what curl would get. Now open DevTools → Elements. This is the live DOM after JavaScript has run.

Search both for a string you want to scrape (a product name, a price, a date). Four possible outcomes:

view-source                 Elements       Verdict
Present                     Present        Server-rendered. Use requests / Guzzle.
Absent                      Present        Client-rendered. You need a browser or the underlying API.
Present in a JSON <script>  Present        Hydration payload. Parse the JSON directly; fastest path.
Absent                      Absent (yet)   Lazy-loaded on scroll/click. Browser or API.

The hydration-payload case is gold. Frameworks like Next.js, Nuxt, and SvelteKit embed the initial data in a script tag; Next.js, for example, uses <script id="__NEXT_DATA__">. You can extract it with requests plus a JSON parse; no browser needed. Lesson 1.18 covers the technique in detail.
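If Test 1 lands you in that hydration row, the extraction can be as light as the sketch below. The URL is a placeholder and the payload keys vary by site; this assumes requests and BeautifulSoup are installed.

import json

import requests
from bs4 import BeautifulSoup

URL = "https://example.com/products"  # placeholder; use your target page

html = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Next.js embeds its initial data in <script id="__NEXT_DATA__">.
tag = soup.find("script", id="__NEXT_DATA__")
if tag and tag.string:
    payload = json.loads(tag.string)
    # The keys below are illustrative -- inspect the payload to locate your data.
    print(list(payload.get("props", {}).keys()))
else:
    print("No hydration payload found; run the other tests.")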

Test 2: disable JavaScript and reload

In Chrome DevTools, open the command palette (Ctrl+Shift+P / Cmd+Shift+P), type "Disable JavaScript", and reload. Three possible outcomes:

  • The page looks identical. Server-rendered. Static scraping will work.
  • A skeleton or spinner shows forever. Pure SPA. You need a browser or the API.
  • The shell loads but the data is missing. Hybrid. The shell is server-rendered; the data is fetched via XHR. Often you can hit the XHR endpoint directly, as sketched below.

Re-enable JavaScript when you're done; DevTools doesn't always undo it cleanly on close.
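If Test 2 reveals the hybrid case, open the Network tab, filter by Fetch/XHR, and find the request that carries the data. It is usually a plain JSON endpoint you can call directly. A minimal sketch, assuming a hypothetical /api/products endpoint discovered that way:

import requests

# Hypothetical endpoint copied from the Network tab -- yours will differ.
API_URL = "https://example.com/api/products?page=1"

resp = requests.get(
    API_URL,
    headers={
        "User-Agent": "Mozilla/5.0",
        "Accept": "application/json",
        # Some APIs also expect a Referer matching the page that calls them.
        "Referer": "https://example.com/products",
    },
    timeout=10,
)
resp.raise_for_status()
for item in resp.json().get("items", []):  # "items" is an assumed key
    print(item)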

Test 3: curl vs browser

The definitive test. Reproduce what your scraper would actually see:

curl -s -A "Mozilla/5.0" https://practice.scrapingcentral.com/challenges/dynamic/spa-pure \
  | grep -i "product"

If the data you want appears in the output, static scraping works. If you see only a <div id="root"></div> and a bundle of JS files, you've confirmed a client-rendered SPA.

Add -D headers.txt to capture response headers; sometimes the server returns SSR for browser User-Agents and a stripped shell for curl's default User-Agent (curl/8.0). The -A flag handles the obvious case; if results still differ, you're looking at fingerprinting (Sub-Path 5).
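If you prefer to run Test 3 from the client your scraper will actually use, here is a minimal Python sketch with requests; the needle is whatever string you expect to find in the data.

import requests

URL = "https://practice.scrapingcentral.com/challenges/dynamic/spa-pure"
NEEDLE = "product"  # a string you expect to see in the data

raw = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=10).text

if NEEDLE.lower() in raw.lower():
    print("Found in raw HTML: static scraping should work.")
elif 'id="root"' in raw or 'id="app"' in raw:
    print("Empty shell: likely a client-rendered SPA.")
else:
    print("Inconclusive: compare headers and the browser response.")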

The four rendering shapes

Run all three tests on a few hundred sites and you'll find that every page falls into one of four buckets:

  1. Pure SSR. Server returns full HTML with data embedded. curl is enough. Examples: Wikipedia, most government sites, classic e-commerce.
  2. Pure CSR. Server returns a minimal shell; React/Vue/Svelte renders everything client-side. Examples: most dashboards, modern Twitter, the /spa-pure lab.
  3. Hybrid (SSR + hydration). Server returns rendered HTML plus the data as JSON in a <script> tag. The framework "hydrates" the static markup into an interactive app. Examples: most Next.js / Nuxt sites.
  4. Progressive enhancement. Server returns working HTML; JS adds nice-to-haves like sorting and filters. Static scraping works for the data; only interactions need a browser.

Knowing which bucket you're in determines the tool. Don't skip this step.

Decision tree

┌─ Data in view-source? ─── YES ──► Use requests/Guzzle. Done.
│  │
│  NO
│  ▼
├─ Data in a hydration script tag (__NEXT_DATA__, __NUXT__)? ── YES ──► Parse the JSON.
│  │
│  NO
│  ▼
├─ Data visible in DevTools Network as an XHR/Fetch response? ── YES ──► Hit that API.
│  │
│  NO
│  ▼
└─ Use a headless browser (Playwright/Selenium).

In practice you fall through to the bottom of this tree maybe twenty percent of the time. The other eighty percent is solvable without a browser; you just need to look first.
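The first two branches of the tree are easy to script. A rough sketch, checking only the two hydration markers named above (not an exhaustive list):

import requests

# The markers named in the tree; other frameworks use different ones.
HYDRATION_MARKERS = ("__NEXT_DATA__", "__NUXT__")

def diagnose(url: str, needle: str) -> str:
    """Walk the first two branches of the decision tree for one page."""
    html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10).text
    if needle.lower() in html.lower():
        return "Data in raw HTML: use requests/Guzzle."
    if any(marker in html for marker in HYDRATION_MARKERS):
        return "Hydration payload present: parse the embedded JSON."
    return "Check the Network tab for an XHR/Fetch API; otherwise use a headless browser."

print(diagnose("https://practice.scrapingcentral.com/challenges/dynamic/spa-pure", "product"))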

Why scrapers reach for browsers too quickly

Two reasons, both psychological:

  • The browser "just works". Throw Playwright at any page and it returns the rendered DOM. No reasoning required, no curl-debugging. The cost (slowness, resource use, fragility) shows up later, in production.
  • The page "looks dynamic". A spinner, a lazy-load fade, an animated counter, none of these necessarily mean the data is client-rendered. They might be cosmetic. Run the tests.

The instinct to default to a browser is the single most common reason scrapers are slow, expensive to run, and unstable. Diagnose first.

Hands-on lab

Open /challenges/dynamic/spa-pure in your browser. Run all three tests. Then open /products (a hybrid page) and run the same three tests. Compare what you see in view-source and in Network. You should be able to articulate, in one sentence, why one needs a browser and the other doesn't.


Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.

Question 1 of 8

You search for a product name in View Page Source and find it. You also see it in the Elements panel. What kind of rendering is the page using?
