Selenium: Handling JavaScript-Rendered Pages
Learn how to scrape JavaScript-rendered pages with Selenium. Handle dynamic content, AJAX calls, and single-page applications.
Many modern websites load their content dynamically with JavaScript. If you fetch one of these pages with a plain HTTP request, you get back only the initial HTML shell, without the content the scripts would have added. Selenium solves this by driving a real browser that executes the JavaScript, so you see the page the way a human visitor would.
The Problem
import requests
from bs4 import BeautifulSoup
# The response HTML contains no quotes: they are inserted later by JavaScript
resp = requests.get("https://quotes.toscrape.com/js/")
soup = BeautifulSoup(resp.text, "html.parser")
quotes = soup.select(".quote")
print(len(quotes)) # 0, no quotes found!
The Selenium Solution
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get("https://quotes.toscrape.com/js/")
# Wait for JS to render the quotes
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".quote")))
quotes = driver.find_elements(By.CSS_SELECTOR, ".quote")
print(len(quotes)) # 10, all quotes found!
for quote in quotes:
    # Each .quote element holds the quote text and its author
    text = quote.find_element(By.CSS_SELECTOR, ".text").text
    author = quote.find_element(By.CSS_SELECTOR, ".author").text
    print(f"{text}, {author}")
driver.quit()
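If anything between webdriver.Chrome() and driver.quit() raises (a timeout in wait.until, for example), the script above exits without closing the browser and leaves a Chrome process running. One way to guarantee cleanup is a small context manager. The sketch below uses a hypothetical helper, managed_driver, built on contextlib.contextmanager; it is not part of Selenium. It accepts any zero-argument factory, so the same code works with Chrome, Firefox, or a stand-in object in tests:

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Yield the driver created by factory() and always call quit() afterwards.

    managed_driver is a hypothetical helper for illustration, not a Selenium API.
    """
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on normal exit and when the with-block raises,
        # so the browser process is never left behind.
        driver.quit()


# Usage with Selenium (requires Chrome and chromedriver):
# from selenium import webdriver
# from selenium.webdriver.chrome.options import Options
# options = Options()
# options.add_argument("--headless")
# with managed_driver(lambda: webdriver.Chrome(options=options)) as driver:
#     driver.get("https://quotes.toscrape.com/js/")
```

Passing a factory rather than a ready driver means the browser is only started once the with-block is entered.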
Waiting for AJAX Requests
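Some pages fetch data via AJAX after the initial page load, so the elements you need may not exist yet when driver.get() returns. Instead of sleeping for a fixed time, wait for a specific condition: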
# Wait until a loading spinner disappears
wait.until(EC.invisibility_of_element_located(
(By.CSS_SELECTOR, ".loading")
))
# Wait until a specific number of elements appear
wait.until(lambda d: len(d.find_elements(By.CSS_SELECTOR, ".item")) >= 20)
# Wait for text to appear in an element
wait.until(EC.text_to_be_present_in_element(
(By.CSS_SELECTOR, "#status"), "Complete"
))
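As the lambda example shows, wait.until() accepts any callable that takes the driver. It calls it repeatedly until the result is truthy, then returns that result. If a condition is reused across a scraper, it can be packaged as a small class. The sketch below is a hypothetical condition, at_least_n_elements; the name and behavior are illustrative and not part of Selenium's expected_conditions module:

```python
class at_least_n_elements:
    """Hypothetical wait condition: truthy once at least n elements match locator.

    WebDriverWait.until() calls this object with the driver on each poll and
    returns the first truthy value it produces.
    """

    def __init__(self, locator, n):
        if n < 1:
            # An empty list is falsy, so n=0 could never satisfy until().
            raise ValueError("n must be at least 1")
        self.locator = locator  # e.g. (By.CSS_SELECTOR, ".item")
        self.n = n

    def __call__(self, driver):
        # find_elements returns [] for zero matches instead of raising.
        elements = driver.find_elements(*self.locator)
        # Returning the list itself lets until() hand the elements back.
        return elements if len(elements) >= self.n else False
```

With the wait object from earlier, usage would look like items = wait.until(at_least_n_elements((By.CSS_SELECTOR, ".item"), 20)), which both waits and returns the matched elements in one step.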
Executing Custom JavaScript
Sometimes you need to run JavaScript directly to trigger content loading or extract data from JS variables:
# Scroll to bottom to trigger lazy loading
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
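# Extract data from a JavaScript variable (the name __INITIAL_DATA__ varies by site)
# JS objects and arrays come back as Python dicts and lists
data = driver.execute_script("return window.__INITIAL_DATA__")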
# Get computed styles or hidden attributes
color = driver.execute_script(
"return getComputedStyle(arguments[0]).color",
driver.find_element(By.CSS_SELECTOR, ".price")
)
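A single scroll usually loads only one more batch of lazy-loaded content. A common pattern is to keep scrolling until document.body.scrollHeight stops growing. The sketch below assumes the page grows its height as new items arrive; the helper name scroll_until_stable and the pause and max_rounds defaults are assumptions to tune per site:

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Hypothetical helper. Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        # Give the page's JavaScript time to fetch and insert new content.
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new loaded; assume the end was reached
        last_height = new_height
    return max_rounds  # safety cap for feeds that never end
```

A fixed pause is the simplest choice. For pages with a visible loading indicator, waiting for that indicator to disappear (as in the AJAX section) is usually more reliable.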
Getting the Rendered Page Source
After JavaScript has executed, you can get the fully rendered HTML:
rendered_html = driver.page_source
# Now parse with BeautifulSoup if you prefer
from bs4 import BeautifulSoup
soup = BeautifulSoup(rendered_html, "html.parser")
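Once the rendered HTML is in hand, extraction works exactly as on a static page. A minimal sketch for the quotes example, assuming the .quote, .text and .author selectors used earlier (other sites need their own selectors); parse_quotes is a hypothetical helper name:

```python
from bs4 import BeautifulSoup


def parse_quotes(rendered_html):
    """Return (text, author) pairs from quotes.toscrape.com-style markup."""
    soup = BeautifulSoup(rendered_html, "html.parser")
    results = []
    for quote in soup.select(".quote"):
        text = quote.select_one(".text")
        author = quote.select_one(".author")
        # select_one returns None when nothing matches; skip incomplete entries.
        if text is not None and author is not None:
            results.append((text.get_text(strip=True), author.get_text(strip=True)))
    return results
```

With a live driver this becomes quotes = parse_quotes(driver.page_source), which lets you close the browser before doing the parsing.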
Easier Alternative
If you need rendered HTML without running browsers yourself, ScrapingAnt and ScraperAPI both offer JavaScript rendering as a service: you send a URL in an API call and get back the fully rendered page source.
Next Steps
- Learn to take screenshots and generate PDFs with Playwright
- Handle infinite scroll pages
- Set up Selenium with proxies