Selenium WebDriver Basics for Web Scraping
Learn the fundamentals of Selenium WebDriver for web scraping. Set up Chrome WebDriver, navigate pages, and extract data from dynamic websites.
Selenium WebDriver is one of the oldest and most widely used browser automation tools. Originally built for testing, it has become a go-to choice for scraping JavaScript-heavy websites. Selenium controls a real browser instance, which means it can render JavaScript, handle cookies, and interact with pages just like a human user would.
Installation
pip install selenium webdriver-manager
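The webdriver-manager package downloads the ChromeDriver build that matches your installed Chrome and caches it, so you don't have to download the driver manually. Selenium 4.6 and later also bundle Selenium Manager, which resolves the driver automatically when no driver path is supplied, so webdriver-manager is optional on recent versions.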
Basic Scraping Example
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Configure headless Chrome
options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")  # commonly needed when running as root in containers
options.add_argument("--disable-dev-shm-usage")  # avoids crashes when /dev/shm is small

# Launch browser
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options,
)

try:
    # Implicit wait: every element lookup retries for up to 10 seconds
    # before giving up. It is a setting, not a pause, so set it up front.
    driver.implicitly_wait(10)
    driver.get("https://quotes.toscrape.com/js/")

    quotes = driver.find_elements(By.CSS_SELECTOR, ".quote")
    for quote in quotes:
        text = quote.find_element(By.CSS_SELECTOR, ".text").text
        author = quote.find_element(By.CSS_SELECTOR, ".author").text
        print(f"{text}, {author}")
finally:
    # Always close the browser, even if scraping fails
    driver.quit()
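Printing is fine for a first run, but scraped data usually ends up in a file. The following is a minimal sketch of one way to collect the quotes into records and serialize them as CSV. The names `Quote`, `extract_quotes`, and `quotes_to_csv` are hypothetical helpers introduced for illustration, and `extract_quotes` assumes a `driver` that has already loaded the quotes page as in the example above.

```python
import csv
import io
from dataclasses import asdict, dataclass


@dataclass
class Quote:
    """One scraped quote (hypothetical record type for this sketch)."""
    text: str
    author: str


def extract_quotes(driver):
    """Collect every .quote element on the current page as Quote records."""
    # Imported lazily so the CSV helper below can be used without Selenium.
    from selenium.webdriver.common.by import By

    records = []
    for el in driver.find_elements(By.CSS_SELECTOR, ".quote"):
        records.append(
            Quote(
                text=el.find_element(By.CSS_SELECTOR, ".text").text,
                author=el.find_element(By.CSS_SELECTOR, ".author").text,
            )
        )
    return records


def quotes_to_csv(quotes):
    """Serialize Quote records to a CSV string with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["text", "author"])
    writer.writeheader()
    for quote in quotes:
        writer.writerow(asdict(quote))
    return buf.getvalue()
```

Inside the `try` block you could then call `quotes_to_csv(extract_quotes(driver))` and write the result to disk. Using the `csv` module rather than joining strings by hand means quotes that contain commas are escaped correctly.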
Key Locator Strategies
Selenium provides several ways to find elements on a page:
| Locator | Example |
|---|---|
| By.CSS_SELECTOR | driver.find_element(By.CSS_SELECTOR, ".price") |
| By.XPATH | driver.find_element(By.XPATH, "//div[@class='item']") |
| By.ID | driver.find_element(By.ID, "search-box") |
| By.CLASS_NAME | driver.find_element(By.CLASS_NAME, "product") |
| By.TAG_NAME | driver.find_elements(By.TAG_NAME, "a") |
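One practical difference between the two lookup methods in the table: `find_element` raises `NoSuchElementException` when nothing matches, while `find_elements` returns an empty list. For optional fields, a small wrapper around `find_elements` avoids scattering `try`/`except` blocks through the scraper. The sketch below is one possible approach; `first_text_or_none` and `demo` are hypothetical names, and the `.tags` selector is an assumption about the page markup.

```python
def first_text_or_none(parent, by, value):
    """Return the text of the first matching element, or None if none match.

    `parent` can be a WebDriver or a WebElement; both expose find_elements.
    """
    matches = parent.find_elements(by, value)
    return matches[0].text if matches else None


def demo(driver):
    """Usage sketch: assumes `driver` has already loaded a page of quotes."""
    # Imported lazily so first_text_or_none has no hard Selenium dependency.
    from selenium.webdriver.common.by import By

    for quote in driver.find_elements(By.CSS_SELECTOR, ".quote"):
        author = first_text_or_none(quote, By.CSS_SELECTOR, ".author")
        # ".tags" is an assumed selector; a missing match yields None
        # instead of raising NoSuchElementException.
        tags = first_text_or_none(quote, By.CSS_SELECTOR, ".tags")
        print(author, tags)
```

Note that with an implicit wait configured, a lookup that finds nothing still waits for the full timeout before returning the empty list.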
Explicit Waits
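Implicit waits apply globally to every element lookup. Explicit waits give you finer control by blocking until a specific condition is met. The Selenium documentation advises against mixing the two, since doing so can cause unpredictable wait times: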
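from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Reuses `driver` and `By` from the basic example.
# Block for up to 10 seconds until the first .quote element is in the DOM.
wait = WebDriverWait(driver, 10)
element = wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".quote"))
)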
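`WebDriverWait.until` accepts any callable that takes the driver and returns a truthy value once the condition holds, so you are not limited to the built-in `expected_conditions`. The sketch below waits until a minimum number of elements have rendered and falls back to an empty list on timeout. `at_least_n_elements` and `wait_for_quotes` are hypothetical helpers, and the default of 10 quotes per page is an assumption about the target site.

```python
def at_least_n_elements(by, value, n):
    """Build a wait condition that passes once n or more elements match."""
    def condition(driver):
        found = driver.find_elements(by, value)
        # Returning False tells WebDriverWait to keep polling;
        # a non-empty list is truthy and becomes until()'s return value.
        return found if len(found) >= n else False
    return condition


def wait_for_quotes(driver, minimum=10, timeout=10):
    """Return the rendered .quote elements, or [] if they never appear."""
    # Imported lazily so the condition factory above stays Selenium-free.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        return WebDriverWait(driver, timeout).until(
            at_least_n_elements(By.CSS_SELECTOR, ".quote", minimum)
        )
    except TimeoutException:
        # until() raises TimeoutException when the condition never passes.
        return []
```

Catching `TimeoutException` lets the scraper record an empty page and move on instead of aborting the whole run.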
When to Consider an API Alternative
Selenium is powerful but resource-intensive: each scraping session launches a full browser process. For large-scale projects, consider a hosted service such as ScraperAPI or ScrapingAnt, which render pages on their own servers and return the resulting HTML, so you don't have to manage browser instances locally.
Next Steps
- Learn to handle JavaScript-rendered pages with Selenium
- Set up Selenium with proxies for IP rotation
- Explore Selenium Grid for parallel scraping