How to Handle JavaScript Rendering in Web Scraping
Learn how to scrape JavaScript-rendered websites. Covers headless browsers, rendering APIs, and techniques for extracting dynamically loaded content.
Over 70% of modern websites rely on JavaScript to render content. If your scraper only fetches raw HTML, you are missing most of the data.
How to Tell If a Site Uses JS Rendering
- View page source (Ctrl+U): if the data is not in the raw HTML, it is rendered by JavaScript
- Disable JavaScript in browser settings: if the content disappears or the page breaks, it needs JS
- Compare the requests.get() response with what you see in the browser
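The last check can be scripted. Below is a minimal sketch, assuming you already know a few pieces of text that are visible in the browser; missing_markers is an illustrative helper, not part of any library, and the inline HTML string stands in for the text of a real response.

```python
def missing_markers(raw_html: str, markers: list[str]) -> list[str]:
    """Return the markers that do not appear in the raw HTML (case-insensitive)."""
    haystack = raw_html.lower()
    return [m for m in markers if m.lower() not in haystack]


# Inline HTML standing in for requests.get(url).text on a typical SPA shell
raw_html = '<html><body><div id="app"></div><script src="/bundle.js"></script></body></html>'

# "Wireless Mouse" is a made-up product name you would see in the browser
missing = missing_markers(raw_html, ["Wireless Mouse", "bundle.js"])
print(missing)  # prints ['Wireless Mouse']
```

If text you can see in the browser is missing from the raw HTML, the page most likely builds that content with JavaScript. This is a hint rather than proof, since the text may also be split across tags or encoded differently.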
Option 1: Headless Browsers
Playwright (Recommended)
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://spa-website.com")
    # Wait for specific content to load
    page.wait_for_selector(".product-list")
    # Get the fully rendered HTML
    html = page.content()
    browser.close()
Selenium
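from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
try:
    driver.get("https://spa-website.com")
    # implicitly_wait() only affects element lookups, so wait explicitly for the
    # rendered content (same example selector as the Playwright snippet)
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".product-list"))
    )
    html = driver.page_source
finally:
    driver.quit()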
Option 2: Rendering APIs
Running headless browsers is resource-intensive. ScraperAPI handles rendering in the cloud.
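import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://spa-website.com"

# Just add render=true; passing params lets requests URL-encode the target URL
resp = requests.get(
    "http://api.scraperapi.com",
    params={"api_key": API_KEY, "url": url, "render": "true"},
    timeout=60,  # rendered requests can take longer than plain fetches
)
html = resp.text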
No browser management, no memory issues, no infrastructure to maintain.
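One detail to watch when building the request URL by hand: if the target URL carries its own query string, it has to be percent-encoded, otherwise its parameters are read as parameters of the rendering API. Here is a minimal sketch using only the standard library. build_render_url is a hypothetical helper, the product URL is made up, and the endpoint and parameter names are the ones from the example above.

```python
from urllib.parse import urlencode


def build_render_url(api_key: str, target_url: str) -> str:
    """Build the rendering request URL with the target URL percent-encoded."""
    query = urlencode({"api_key": api_key, "url": target_url, "render": "true"})
    return f"http://api.scraperapi.com/?{query}"


# The target URL has its own query string (page and sort)
request_url = build_render_url(
    "YOUR_SCRAPERAPI_KEY",
    "https://spa-website.com/products?page=2&sort=price",
)
print(request_url)
```

In the printed URL, the inner "&sort=price" shows up as "%26sort%3Dprice", so it stays part of the url parameter instead of becoming a separate one.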
Option 3: Find the API Endpoints
Often, JavaScript-rendered content comes from API calls. Intercept these to skip rendering entirely.
- Open Chrome DevTools > Network tab
- Filter by Fetch/XHR
- Find the API call that loads the data
- Call that API directly
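import requests

# Direct API call - no rendering needed
api_url = "https://spa-website.com/api/products?page=1"
resp = requests.get(api_url, headers={"Accept": "application/json"}, timeout=10)
resp.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error page
data = resp.json()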
This is the fastest and most efficient approach when it works.
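Once an endpoint is found, walking through all of its pages is often just a matter of changing one query parameter. The sketch below assumes the endpoint accepts a page parameter and returns a JSON list that is empty past the last page. Real APIs differ (cursors, wrapped payloads), so check the actual responses in DevTools first. iter_items is an illustrative name, and the fetch function is passed in so the loop can be tried without network access.

```python
from typing import Callable, Iterator


def iter_items(fetch_page: Callable[[int], list], max_pages: int = 100) -> Iterator[dict]:
    """Yield items page by page until an empty page or max_pages is reached."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            break
        yield from items


# Stand-in for a real fetcher such as:
#   lambda page: requests.get(api_url, params={"page": page}, timeout=10).json()
fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
products = list(iter_items(lambda page: fake_pages.get(page, [])))
print(len(products))  # prints 3
```

The max_pages cap is a safety net for an endpoint that never returns an empty page.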
Comparison
| Method | Speed | Resource Usage | Ease of Use | Reliability |
|---|---|---|---|---|
| Direct API calls | Fastest | Minimal | Hard to find | Variable |
| ScraperAPI render | Fast | None (cloud) | Very easy | High |
| Playwright | Medium | High | Medium | High |
| Selenium | Slow | Very high | Easy | Medium |
Best Practices
- Always check for API endpoints first: they are faster and more reliable
- Use ScrapingAnt or ScraperAPI for cloud rendering: it saves infrastructure costs
- Set explicit wait conditions: do not rely on arbitrary sleep() calls
- Reuse browser instances: starting a new browser per request is wasteful
- Set resource blocking: skip loading images and fonts to speed up rendering
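The last three practices can be combined in one Playwright sketch. It assumes every page can be considered loaded once a given selector appears. should_block and render_pages are illustrative names, and the set of blocked resource types is a choice you can adjust.

```python
BLOCKED_RESOURCE_TYPES = {"image", "font"}


def should_block(resource_type: str) -> bool:
    """Decide whether a request of this Playwright resource type is skipped."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def render_pages(urls: list[str], selector: str, timeout_ms: int = 10_000) -> dict[str, str]:
    """Render several URLs with one browser instance and return their HTML."""
    # Imported here so should_block() can be used without Playwright installed
    from playwright.sync_api import sync_playwright

    def handle_route(route):
        # Abort image and font requests, let everything else through
        if should_block(route.request.resource_type):
            route.abort()
        else:
            route.continue_()

    results = {}
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)  # started once, reused below
        page = browser.new_page()
        page.route("**/*", handle_route)
        for url in urls:
            page.goto(url)
            # Explicit wait condition instead of an arbitrary sleep()
            page.wait_for_selector(selector, timeout=timeout_ms)
            results[url] = page.content()
        browser.close()
    return results
```

A call such as render_pages(["https://spa-website.com"], ".product-list") would return a dict mapping each URL to its rendered HTML. If a selector never appears, wait_for_selector raises a timeout error, which is usually easier to debug than a silent empty result after a fixed sleep.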