How to Handle JavaScript Rendering in Web Scraping

Learn how to scrape JavaScript-rendered websites. Covers headless browsers, rendering APIs, and techniques for extracting dynamically loaded content.

Most modern websites rely on JavaScript to render at least part of their content. If your scraper only fetches raw HTML, it can miss much of the data on those pages.

How to Tell If a Site Uses JS Rendering

View the page source (Ctrl+U): if the data is not in the raw HTML, it is rendered by JavaScript.
Disable JavaScript in the browser settings: if the page breaks or the data disappears, it needs JS.
Compare the requests.get() response with what you see in the browser.
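The comparison in the last check can be automated. The sketch below is a minimal version of it: needs_js_rendering is a hypothetical helper name, and the two HTML strings are made-up stand-ins for a real response body.

```python
def needs_js_rendering(raw_html: str, expected_text: str) -> bool:
    """Return True when text you can see in the browser is absent from the raw HTML."""
    return expected_text.lower() not in raw_html.lower()


# Static stand-ins for the body of a real HTTP response
server_rendered = "<ul class='product-list'><li>Blue Widget</li></ul>"
client_rendered = "<div id='root'></div><script src='/app.js'></script>"

print(needs_js_rendering(server_rendered, "Blue Widget"))  # False
print(needs_js_rendering(client_rendered, "Blue Widget"))  # True
```

In practice you would pass requests.get(url).text as raw_html and a product name or price you can see in the browser as expected_text.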

Option 1: Headless Browsers

Playwright (Recommended)

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://spa-website.com")
    
    # Wait for specific content to load
    page.wait_for_selector(".product-list")
    
    # Get the fully rendered HTML
    html = page.content()
    browser.close()

Selenium

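from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

try:
    driver.get("https://spa-website.com")
    # implicitly_wait() only affects element lookups, so wait explicitly
    # for the rendered content before reading the page source
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".product-list"))
    )
    html = driver.page_source
finally:
    driver.quit()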

Option 2: Rendering APIs

Running headless browsers is resource-intensive. ScraperAPI handles rendering in the cloud.

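import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://spa-website.com"

# Just add render=true; passing params lets requests URL-encode the target URL
resp = requests.get(
    "https://api.scraperapi.com/",
    params={"api_key": API_KEY, "url": url, "render": "true"},
    timeout=70,  # rendered requests can take a while
)
resp.raise_for_status()
html = resp.text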

No browser management, no memory issues, no infrastructure to maintain.
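One detail worth checking when the target URL carries its own query string: it has to be percent-encoded, otherwise its parameters are read as parameters of the rendering API rather than of the target site. The sketch below uses only the standard library; build_render_url is a hypothetical helper, the search URL is made up, and the api_key, url, and render parameters are the ones from the example above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_render_url(api_key: str, target_url: str) -> str:
    """Build the rendering request URL with the target URL safely encoded."""
    query = urlencode({"api_key": api_key, "url": target_url, "render": "true"})
    return f"https://api.scraperapi.com/?{query}"


# A target URL with its own query string (made-up example)
target = "https://spa-website.com/search?q=shoes&page=2"
request_url = build_render_url("YOUR_SCRAPERAPI_KEY", target)

# Decoding the query shows the target URL survived as a single "url" value
decoded = parse_qs(urlsplit(request_url).query)
print(decoded["url"][0] == target)  # True
```

Without the encoding, page=2 would arrive as a separate top-level parameter and the target site would only see the part of the URL before the ampersand.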

Option 3: Find the API Endpoints

Often, JavaScript-rendered content comes from API calls. Intercept these to skip rendering entirely.

Open Chrome DevTools > Network tab
Filter by Fetch/XHR
Find the API call that loads the data
Call that API directly

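import requests

# Direct API call - no rendering needed
api_url = "https://spa-website.com/api/products?page=1"
resp = requests.get(api_url, headers={"Accept": "application/json"}, timeout=30)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
data = resp.json()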

This is the fastest and most efficient approach when it works.
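Endpoints like the one above are usually paginated, so a scraper has to walk the pages. The sketch below assumes, hypothetically, that each page returns a JSON list and that an empty list marks the end; iter_items and fake_fetch are illustrative names, and the fake data stands in for real responses.

```python
from typing import Callable, Iterator, List


def iter_items(
    fetch_page: Callable[[int], List[dict]], max_pages: int = 100
) -> Iterator[dict]:
    """Yield items page by page until a page comes back empty."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            break
        yield from items


# Stand-in for the real HTTP call (assumption: pages are numbered from 1)
FAKE_PAGES = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}


def fake_fetch(page: int) -> List[dict]:
    return FAKE_PAGES.get(page, [])


print([item["id"] for item in iter_items(fake_fetch)])  # [1, 2, 3]
```

A real fetch_page would wrap the requests.get() call shown earlier and substitute the page number into the query string. The max_pages cap is there so a misbehaving endpoint cannot keep the loop running forever.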

Comparison

Method	Speed	Resource Usage	Ease of Use	Reliability
Direct API calls	Fastest	Minimal	Hard to find	Variable
ScraperAPI render	Fast	None (cloud)	Very easy	High
Playwright	Medium	High	Medium	High
Selenium	Slow	Very high	Easy	Medium

Best Practices

Always check for API endpoints first: they are the fastest option and avoid rendering altogether.
Use ScrapingAnt or ScraperAPI for cloud rendering: it saves infrastructure costs.
Set explicit wait conditions: do not rely on arbitrary sleep() calls.
Reuse browser instances: starting a new browser per request is wasteful.
Block unneeded resources: skip loading images and fonts to speed up rendering.
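The last three practices can be combined in one Playwright sketch: a single browser is reused for every URL, image and font requests are aborted, and each page waits on a selector instead of sleeping. The function names are hypothetical, and the .product-list selector is borrowed from the Playwright example earlier in the article.

```python
BLOCKED_RESOURCE_TYPES = {"image", "font"}


def should_block(resource_type: str) -> bool:
    """Decide whether a request can be skipped without losing the data."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def handle_route(route) -> None:
    """Abort requests for heavy assets and let everything else through."""
    if should_block(route.request.resource_type):
        route.abort()
    else:
        route.continue_()


def scrape_pages(urls, selector=".product-list"):
    """Render several URLs with one browser instance and return their HTML."""
    # Imported here so the helpers above work without Playwright installed
    from playwright.sync_api import sync_playwright

    results = {}
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)  # started once, reused below
        page = browser.new_page()
        page.route("**/*", handle_route)  # every request passes through handle_route
        for url in urls:
            page.goto(url)
            page.wait_for_selector(selector)  # explicit wait, no sleep()
            results[url] = page.content()
        browser.close()
    return results
```

Calling scrape_pages(["https://spa-website.com"]) would return a dict mapping each URL to its rendered HTML. Stylesheets and scripts are left unblocked on purpose, since blocking scripts would stop the page from rendering at all.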