Selenium WebDriver Basics for Web Scraping - Browser Automation

Learn the fundamentals of Selenium WebDriver for web scraping. Set up Chrome WebDriver, navigate pages, and extract data from dynamic websites.

Selenium WebDriver is one of the oldest and most widely used browser automation tools. Originally built for testing, it has become a go-to choice for scraping JavaScript-heavy websites. Selenium controls a real browser instance, which means it can render JavaScript, handle cookies, and interact with pages just like a human user would.

Installation

pip install selenium webdriver-manager

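The webdriver-manager package downloads the ChromeDriver build that matches your installed Chrome and caches it, so you do not have to fetch the driver by hand. Selenium 4.6 and later also bundle Selenium Manager, which resolves the driver automatically, so webdriver-manager is optional on recent versions.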
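Before launching a browser, it can help to confirm that both packages are installed in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and is not part of Selenium or webdriver-manager.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the two packages installed by the pip command above.
    for name in ("selenium", "webdriver-manager"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

If either line reports "not installed", the pip command most likely ran against a different Python environment than the one executing the script.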

Basic Scraping Example

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

# Configure headless Chrome
options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")

# Launch browser
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options
)

try:
    # Implicit wait: later element lookups poll for up to 10 seconds
    # before giving up. This sets a timeout; it does not pause here.
    driver.implicitly_wait(10)

    driver.get("https://quotes.toscrape.com/js/")

    quotes = driver.find_elements(By.CSS_SELECTOR, ".quote")
    for quote in quotes:
        text = quote.find_element(By.CSS_SELECTOR, ".text").text
        author = quote.find_element(By.CSS_SELECTOR, ".author").text
        print(f"{text}, {author}")
finally:
    driver.quit()
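The example prints each quote joined by a comma, which becomes ambiguous as soon as the quote text itself contains a comma. A scraper would usually collect rows and serialize them with proper quoting instead. The sketch below assumes each row is a dict with `text` and `author` keys; the function name `quotes_to_csv` and the sample row are illustrative stand-ins, not scraped data.

```python
import csv
import io


def quotes_to_csv(records):
    """Serialize a list of {"text": ..., "author": ...} dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["text", "author"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Stand-in row; in the scraper these would be built inside the loop, e.g.
# records.append({"text": text, "author": author})
sample = [{"text": "A short, quoted line", "author": "Jane Doe"}]
print(quotes_to_csv(sample))
```

The csv module wraps any field containing a comma in double quotes, so the text and author columns stay separable when the file is read back.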

Key Locator Strategies

Selenium provides several ways to find elements on a page:

Locator	Example
`By.CSS_SELECTOR`	`driver.find_element(By.CSS_SELECTOR, ".price")`
`By.XPATH`	`driver.find_element(By.XPATH, "//div[@class='item']")`
`By.ID`	`driver.find_element(By.ID, "search-box")`
`By.CLASS_NAME`	`driver.find_element(By.CLASS_NAME, "product")`
`By.TAG_NAME`	`driver.find_elements(By.TAG_NAME, "a")`
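The ID, class name, and tag name locators in the table each have a straightforward CSS counterpart, which is one reason `By.CSS_SELECTOR` alone covers most scraping needs. The sketch below illustrates the correspondence with a hypothetical helper, `css_equivalent`; it is not a Selenium function. It takes the plain string values behind the `By` constants ("id", "class name", "tag name") so that it runs without a browser.

```python
def css_equivalent(by: str, value: str) -> str:
    """Return a CSS selector matching the same elements as a simple locator.

    `by` is the string behind a By constant: "id", "class name", or
    "tag name". Other strategies (such as XPath) are not handled here.
    """
    if by == "id":
        return f'[id="{value}"]'
    if by == "class name":
        name = value.strip()
        if " " in name:
            # By.CLASS_NAME expects one class, not a space-separated list.
            raise ValueError(
                "class name locators take a single class; "
                "use a CSS selector such as '.a.b' for several"
            )
        return f".{name}"
    if by == "tag name":
        return value
    raise ValueError(f"no simple CSS equivalent for {by!r}")


print(css_equivalent("id", "search-box"))
print(css_equivalent("class name", "product"))
print(css_equivalent("tag name", "a"))
```

The `ValueError` branch reflects a common pitfall: passing several space-separated classes to `By.CLASS_NAME` does not match elements carrying all of them, whereas a CSS selector like `.item.active` does.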

Explicit Waits

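An implicit wait applies to every element lookup for the lifetime of the driver session. An explicit wait blocks until a specific condition holds, with its own timeout, which gives you finer control. The Selenium documentation advises against mixing the two, since the combined wait times become unpredictable: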

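from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Assumes `driver` is a running WebDriver instance with no implicit wait set.
# Poll for up to 10 seconds until a .quote element exists in the DOM;
# until() raises TimeoutException if none appears in time.
wait = WebDriverWait(driver, 10)
element = wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".quote"))
)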
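Conceptually, an explicit wait is a polling loop: call a condition repeatedly until it returns something truthy or the timeout expires. The sketch below is a simplified model of that idea in plain Python, not Selenium's implementation. The names `wait_until` and `appears_on_third_call` are hypothetical. The real `WebDriverWait` differs in details: it raises Selenium's own `TimeoutException`, passes the driver to the condition, and by default ignores `NoSuchElementException` while polling.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in condition: becomes truthy on the third call, like an element
# that only appears after some JavaScript has run.
calls = {"count": 0}


def appears_on_third_call():
    calls["count"] += 1
    return "element" if calls["count"] >= 3 else None


found = wait_until(appears_on_third_call, timeout=2.0, poll_interval=0.01)
print(found, calls["count"])
```

Because the loop returns as soon as the condition succeeds, a generous timeout costs nothing on fast pages; it only bounds how long a slow or broken page can stall the scraper.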

When to Consider an API Alternative

Selenium is powerful but resource-intensive: each scraping session launches a full browser process, which costs memory and CPU. For large-scale projects, consider a hosted service such as ScraperAPI or ScrapingAnt, which render pages in browsers on their own servers and return the resulting HTML, so you do not have to manage browser instances locally.

Next Steps

Learn to handle JavaScript-rendered pages with Selenium
Set up Selenium with proxies for IP rotation
Explore Selenium Grid for parallel scraping