How to Handle JavaScript Rendering in Web Scraping

Learn how to scrape JavaScript-rendered websites. Covers headless browsers, rendering APIs, and techniques for extracting dynamically loaded content.

Over 70% of modern websites rely on JavaScript to render content. If your scraper only fetches the raw HTML of such a site, it will miss any data that is loaded dynamically.

How to Tell If a Site Uses JS Rendering

  1. View the page source (Ctrl+U). If the data is not in the raw HTML, it is rendered by JavaScript.
  2. Disable JavaScript in your browser settings. If the page breaks or the data disappears, it needs JS.
  3. Compare the requests.get() response with what you see in the browser.
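
The third check is easy to script. Here is a minimal sketch, assuming you already know a piece of text that is visible in the browser (the `marker` argument and both function names are illustrative, not part of any library):

```python
def appears_in_raw_html(raw_html: str, marker: str) -> bool:
    """Return True if text seen in the browser is already in the unrendered HTML."""
    return marker.lower() in raw_html.lower()


def needs_js_rendering(url: str, marker: str) -> bool:
    """Fetch the page without a browser and report whether the marker is missing."""
    # Imported here so the pure helper above has no third-party dependency.
    import requests

    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return not appears_in_raw_html(resp.text, marker)
```

If `needs_js_rendering()` returns True for text you can clearly see in the browser, the content is most likely injected by JavaScript. A plain substring match is crude, so treat the result as a hint rather than proof.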

Option 1: Headless Browsers

Playwright (Recommended)

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://spa-website.com")
    
    # Wait for specific content to load
    page.wait_for_selector(".product-list")
    
    # Get the fully rendered HTML
    html = page.content()
    browser.close()

Selenium

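from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

try:
    driver.get("https://spa-website.com")
    # implicitly_wait() only affects element lookups, so wait explicitly
    # for the rendered content before reading the page source
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".product-list"))
    )
    html = driver.page_source
finally:
    # Always release the browser, even if the wait times out
    driver.quit()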

Option 2: Rendering APIs

Running headless browsers is resource-intensive. ScraperAPI handles rendering in the cloud.

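import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://spa-website.com"

# Just add render=true; passing params lets requests URL-encode the target URL
resp = requests.get(
    "https://api.scraperapi.com/",
    params={"api_key": API_KEY, "url": url, "render": "true"},
    timeout=60,
)
resp.raise_for_status()
html = resp.text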

No browser management, no memory issues, no infrastructure to maintain.
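
One detail to watch: if the target URL carries its own query string, it has to be percent-encoded, otherwise its parameters get read as parameters of the rendering API call. The sketch below shows the encoding with the standard library. The helper name `build_render_url` and the example target URL are made up for illustration.

```python
from urllib.parse import urlencode

SCRAPERAPI_ENDPOINT = "https://api.scraperapi.com/"


def build_render_url(api_key: str, target_url: str) -> str:
    """Build a rendering request URL with the target URL percent-encoded."""
    query = urlencode({"api_key": api_key, "url": target_url, "render": "true"})
    return f"{SCRAPERAPI_ENDPOINT}?{query}"


# Hypothetical target that has its own query string
print(build_render_url(
    "YOUR_SCRAPERAPI_KEY",
    "https://spa-website.com/products?page=2&sort=price",
))
```

In the printed URL, the `&sort=price` part of the target appears as `%26sort%3Dprice`, so it stays attached to the target page instead of being sent to the rendering service as a separate parameter.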

Option 3: Find the API Endpoints

Often, JavaScript-rendered content comes from API calls. Intercept these to skip rendering entirely.

  1. Open Chrome DevTools > Network tab
  2. Filter by Fetch/XHR
  3. Find the API call that loads the data
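  4. Call that API directly

import requests

# Direct API call - no rendering needed
api_url = "https://spa-website.com/api/products?page=1"
resp = requests.get(api_url, headers={"Accept": "application/json"}, timeout=10)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
data = resp.json()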

This is the fastest and most efficient approach when it works.
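
Endpoints like the one above are usually paginated. Here is a rough sketch of walking every page. It assumes the endpoint returns a JSON list per page and an empty list once you run past the last page, which you should confirm in DevTools for the site you are working with. Both function names are illustrative.

```python
from typing import Callable, Iterator


def iter_all_items(fetch_page: Callable[[int], list], max_pages: int = 100) -> Iterator[dict]:
    """Yield items page by page until a page comes back empty."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            break
        yield from items


def fetch_products_page(page: int) -> list:
    """Fetch one page from the example products endpoint (response shape assumed)."""
    import requests  # imported here so iter_all_items stays dependency-free

    resp = requests.get(
        "https://spa-website.com/api/products",
        params={"page": page},
        headers={"Accept": "application/json"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```

Calling `list(iter_all_items(fetch_products_page))` would collect every product. The `max_pages` cap is a safety net in case the endpoint never returns an empty page.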

Comparison

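| Method            | Speed   | Resource Usage | Ease of Use  | Reliability |
|-------------------|---------|----------------|--------------|-------------|
| Direct API calls  | Fastest | Minimal        | Hard to find | Variable    |
| ScraperAPI render | Fast    | None (cloud)   | Very easy    | High        |
| Playwright        | Medium  | High           | Medium       | High        |
| Selenium          | Slow    | Very high      | Easy         | Medium      |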

Best Practices

  1. Always check for API endpoints first. They are faster and more reliable.
  2. Use a rendering API such as ScrapingAnt or ScraperAPI for cloud rendering. It saves infrastructure costs.
  3. Set explicit wait conditions. Do not rely on arbitrary sleep() calls.
  4. Reuse browser instances. Starting a new browser per request is wasteful.
  5. Set up resource blocking. Skip loading images and fonts to speed up rendering.
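
Practices 3 to 5 can be combined in one Playwright sketch: a single browser is reused for every URL, images and fonts are aborted before they download, and each page waits on a selector instead of sleeping. The function names and the default `.product-list` selector are placeholders to adapt to your target site.

```python
BLOCKED_RESOURCE_TYPES = {"image", "font"}


def should_block(resource_type: str) -> bool:
    """Decide whether a request can be skipped without losing page data."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def render_pages(urls, wait_selector=".product-list"):
    """Render several URLs with one browser instance and return {url: html}."""
    # Imported here so should_block() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    def handle_route(route):
        # Abort images and fonts, let everything else through
        if should_block(route.request.resource_type):
            route.abort()
        else:
            route.continue_()

    results = {}
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)  # started once, reused below
        page = browser.new_page()
        page.route("**/*", handle_route)
        for url in urls:
            page.goto(url)
            page.wait_for_selector(wait_selector)  # explicit wait, no sleep()
            results[url] = page.content()
        browser.close()
    return results
```

Keeping the blocking decision in its own small function makes it easy to extend the set, for example with "media" or "stylesheet", after checking that the page still renders the data you need.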