
Scraping Dynamic Content Without a Browser

Extract data from JavaScript-heavy websites without using a browser. Discover hidden APIs, intercept XHR requests, and parse JSON responses.

Python Scraping · #14 · intermediate · 3 min read

Many websites load data dynamically using JavaScript. When you fetch such a page with requests, the HTML you get back is missing the content you want, because that content is only added after the page's JavaScript runs. But you often do not need a browser: finding the underlying API is usually faster and more reliable.

Finding Hidden APIs

Modern websites typically load data from JSON APIs via XHR or Fetch requests. Here is how to find them:

  1. Open Chrome DevTools (F12)
  2. Go to the Network tab
  3. Filter by Fetch/XHR
  4. Interact with the page (scroll, click, search)
  5. Look for JSON responses containing the data you need
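
Once a candidate request shows up in the Network tab, it helps to confirm from Python that the endpoint really returns JSON outside the browser. The following is a minimal sketch; `looks_like_json` is a hypothetical helper name, and it assumes you pass in the `Content-Type` header value and the body text of a response you have already fetched.

```python
import json


def looks_like_json(content_type: str, body: str) -> bool:
    """Return True if a response looks like a JSON API payload."""
    # Servers usually label JSON as application/json, sometimes with a
    # charset suffix or a vendor type such as application/vnd.api+json.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return True
    # Some endpoints send JSON under text/plain or text/html, so fall back
    # to trying to parse the body.
    try:
        parsed = json.loads(body)
    except ValueError:
        return False
    # Bare numbers or strings parse as JSON but are not useful API payloads.
    return isinstance(parsed, (dict, list))
```

With requests, this could be called as `looks_like_json(response.headers.get("Content-Type", ""), response.text)`.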

Scraping a JSON API Directly

Once you find the API endpoint, you can call it directly. No HTML parsing is needed.

import requests

# This is the API endpoint discovered via DevTools
url = "https://quotes.toscrape.com/api/quotes"
params = {"page": 1}

all_quotes = []
has_next = True

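while has_next:
    # A timeout keeps the loop from hanging on a stalled connection
    response = requests.get(url, params=params, timeout=10)
    # Stop on HTTP errors instead of trying to parse an error page
    response.raise_for_status()
    data = response.json()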

    for quote in data["quotes"]:
        all_quotes.append({
            "text": quote["text"],
            "author": quote["author"]["name"],
            "tags": quote["tags"],
        })

    has_next = data["has_next"]
    params["page"] += 1

print(f"Fetched {len(all_quotes)} quotes from the API")
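
The same pagination loop can be written as a generator, which separates the paging logic from the HTTP call and makes it easy to try against canned data. This is a sketch under assumptions: `iter_quotes`, `fetch_page`, and `max_pages` are illustrative names, and the response shape (`quotes`, `has_next`, `author.name`) is the one used in the example above.

```python
from typing import Callable, Iterator


def iter_quotes(fetch_page: Callable[[int], dict], max_pages: int = 100) -> Iterator[dict]:
    """Yield one flat dict per quote until the API reports no next page."""
    # max_pages is a safety cap in case has_next never becomes false.
    for page in range(1, max_pages + 1):
        data = fetch_page(page)
        for quote in data.get("quotes", []):
            yield {
                "text": quote["text"],
                "author": quote["author"]["name"],
                "tags": quote["tags"],
            }
        if not data.get("has_next"):
            break


# Possible usage with requests (not executed here):
#
#   def fetch_page(page):
#       r = requests.get(url, params={"page": page}, timeout=10)
#       r.raise_for_status()
#       return r.json()
#
#   all_quotes = list(iter_quotes(fetch_page))
```

Passing the fetch function in as an argument is a design choice, not a requirement; it lets you swap in a cached or rate-limited fetcher later without touching the loop.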

Replicating XHR Requests

Sometimes API calls require specific headers or parameters. Copy them from DevTools.

import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "application/json",
    "Referer": "https://example.com/search",
    "X-Requested-With": "XMLHttpRequest",
})

# Replicate the exact API call from DevTools
response = session.get(
    "https://example.com/api/search",
    params={"q": "python", "page": 1, "sort": "relevance"},
)

# Fail early on HTTP errors instead of parsing an error page as JSON
response.raise_for_status()
data = response.json()
for item in data.get("results", []):
    print(item["title"])
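
Copying headers by hand is error-prone when a request carries many of them. As a rough sketch, the raw "Name: value" text copied from the request headers panel can be turned into a dict for `session.headers.update(...)`. The helper name `parse_raw_headers` is hypothetical, and the exact text DevTools gives you may differ between browser versions.

```python
def parse_raw_headers(raw: str) -> dict[str, str]:
    """Convert pasted 'Name: value' lines into a headers dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority",
        # which are not regular headers and should not be sent manually.
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so URL values keep their colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue
        headers[name.strip()] = value.strip()
    return headers
```

It is usually worth dropping headers such as `Cookie` or `Content-Length` from the result before reuse, since they are tied to one browser session or one request body.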

Extracting Data from Embedded JSON

Some sites embed JSON data directly in the HTML inside <script> tags.

import requests
import json
import re
from bs4 import BeautifulSoup

response = requests.get("https://example.com/product/123")
soup = BeautifulSoup(response.text, "html.parser")

# Find script tags containing JSON data
for script in soup.select("script"):
    text = script.string or ""
    if "productData" in text:
        # Extract the JSON object using regex
        match = re.search(r"productData\s*=\s*(\{.*?\});", text, re.DOTALL)
        if match:
            product = json.loads(match.group(1))
            print(f"Name: {product['name']}")
            print(f"Price: {product['price']}")
            break
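
The regex above stops at the first `};` it sees, so it can cut the object short if a string value happens to contain that sequence. One alternative, sketched here, is to locate the assignment with a regex and let `json.JSONDecoder.raw_decode` find where the object ends. `extract_js_object` is a hypothetical helper name, and the approach only works when the embedded literal is strict JSON (double-quoted keys, no trailing commas).

```python
import json
import re


def extract_js_object(script_text: str, variable: str):
    """Return the JSON value assigned to `variable`, or None if not found."""
    # Find "variable =" and consume the whitespace after the equals sign,
    # because raw_decode does not skip leading whitespace itself.
    match = re.search(re.escape(variable) + r"\s*=\s*", script_text)
    if not match:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (for example the trailing semicolon).
        value, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    except json.JSONDecodeError:
        return None
    return value
```

In the loop above, this could replace the `re.search` and `json.loads` pair with `product = extract_js_object(text, "productData")`.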

Checking for __NEXT_DATA__ (Next.js Sites)

Many Next.js sites (typically those built with the Pages Router) embed page data in a special script tag with the id __NEXT_DATA__.

import requests
import json
from bs4 import BeautifulSoup

response = requests.get("https://example-nextjs-site.com/products")
soup = BeautifulSoup(response.text, "html.parser")

next_data = soup.select_one("script#__NEXT_DATA__")
if next_data:
    data = json.loads(next_data.string)
    props = data["props"]["pageProps"]
    print(json.dumps(props, indent=2)[:500])
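
The `pageProps` blob is often large and deeply nested, and its layout varies from site to site. A small recursive search can help locate the field you care about before you hard-code a path. This is only a sketch; `find_key` is a hypothetical helper and the key names in the usage line are made up.

```python
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in nested JSON data."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            # Keep descending, since the same key may appear deeper down.
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)
```

For example, `list(find_key(props, "price"))` would collect every `price` value in the page data, assuming the site uses that key name.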

When You Really Need a Browser

Sometimes there is no usable hidden API and the content is rendered entirely client-side. In those cases:

  • Use ScraperAPI with the render=true parameter to get rendered HTML without running your own browser.
  • Use ScrapingAnt, which renders pages in a real browser and returns the final HTML.
  • As a last resort, use Playwright or Selenium locally.

Tips

  • Always check for hidden APIs first. They are faster and return cleaner data than scraping HTML.
  • Look for application/json responses in the Network tab.
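  • Check the page source for <script type="application/ld+json"> tags. These contain structured data (schema.org) that is easy to parse.
  • The __NEXT_DATA__ trick works on many Next.js sites and gives you the page's data as JSON. If the script tag is missing (for example on sites built with the newer App Router), fall back to looking for an API.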
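
The JSON-LD tip can be sketched with only the standard library. The example below collects every `application/ld+json` script from an HTML string; `JsonLdParser` and `extract_json_ld` are illustrative names, and the sketch assumes each tag holds one valid JSON document (invalid ones are skipped).

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the parsed contents of application/ld+json script tags."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._chunks = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._chunks = []

    def handle_data(self, data):
        # Script bodies may arrive in several pieces, so buffer them.
        if self._in_json_ld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # Skip blocks that are not valid JSON


def extract_json_ld(html: str) -> list:
    """Return a list of parsed JSON-LD documents found in the HTML."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

With requests, `extract_json_ld(response.text)` would return a list of dicts (or lists) that you can filter by their `@type` field.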

Next Steps

  • Learn to use ScraperAPI for JavaScript rendering at scale
  • Explore ScrapingAnt for browser-based scraping without infrastructure