Scraping Central is reader-supported. When you buy through links on our site, we may earn an affiliate commission.

Lesson 2.14 · Intermediate · 4 min read

Selenium in Python (Legacy but Still Common)

Selenium predates Playwright by a decade and still dominates legacy codebases. Know it well enough to read, port, and not fear inheriting a Selenium scraper.

What you’ll learn

  • Install Selenium 4 and let Selenium Manager (or legacy `webdriver-manager`) handle the driver.
  • Translate Playwright concepts to Selenium: driver, options, explicit waits.
  • Write a Selenium scraper using `WebDriverWait` correctly (not `time.sleep`).
  • Identify why Playwright won, and where Selenium is still genuinely useful.

Selenium is the original browser automation library. Most "old" scraping tutorials, most enterprise QA suites, and a huge fraction of legacy production scrapers use it. You need to be fluent enough to read inherited code, port it to Playwright when it's worth the effort, and ship working scrapers in shops that have standardised on Selenium.

Install

Selenium 4 includes Selenium Manager, which downloads matching drivers automatically:

pip install selenium

That's it. The first time you run a script, Selenium downloads the matching ChromeDriver into a managed cache. No more chromedriver-binary headaches.

If you're on an older codebase that pins Selenium 3, you'll see `webdriver-manager` or hand-managed driver paths instead:

pip install selenium webdriver-manager
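With that pinned setup, driver bootstrapping typically looks like this (shown in Selenium 4 style; Selenium 3 passed an `executable_path=` argument instead of a Service object):

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# webdriver-manager downloads a ChromeDriver matching the installed
# Chrome and returns its path; Service wraps that path for Selenium 4.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.quit()
```

On current Selenium 4 you can drop all of this and let Selenium Manager do the same job with a bare `webdriver.Chrome()`.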

Your first scraper

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("https://practice.scrapingcentral.com/")
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "h1"))
    )
    h1 = driver.find_element(By.CSS_SELECTOR, "h1").text
    print(h1)
finally:
    driver.quit()

Compare to Playwright:

page.goto("https://practice.scrapingcentral.com/")
print(page.locator("h1").first.inner_text())

Selenium needs explicit waits everywhere. Playwright bakes them in. That's the headline difference and the main reason teams migrate.

Concept mapping

Playwright → Selenium

  • Browser → WebDriver (your driver object)
  • BrowserContext → no direct equivalent; use a new WebDriver instance
  • Page → the active document on the driver
  • Locator (auto-waiting) → WebElement (one-shot, stale-prone)
  • wait_for_selector → WebDriverWait + ExpectedConditions
  • page.evaluate("...") → driver.execute_script("...")
  • expect_response → BrowserMob Proxy or a DevTools Protocol setup

The missing BrowserContext equivalent is a real gap: to isolate sessions, you launch separate drivers, each with its own browser process. That's significantly heavier than Playwright contexts.
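A minimal sketch of that isolation pattern, assuming you wrap driver creation in a context manager of your own (`isolated_session` is our name, not a Selenium API):

```python
from contextlib import contextmanager

@contextmanager
def isolated_session(driver_factory):
    # Poor man's BrowserContext: every isolated session is a whole new
    # browser process, created by the factory and torn down on exit.
    driver = driver_factory()
    try:
        yield driver
    finally:
        driver.quit()

# Usage (assumes Selenium is installed):
# with isolated_session(lambda: webdriver.Chrome()) as d:
#     d.get("https://practice.scrapingcentral.com/")
```

Because `quit()` sits in a `finally`, the browser process dies even if the scraping body raises.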

Driver options

from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--window-size=1280,800")
options.add_argument("user-agent=Mozilla/5.0 ...")

driver = webdriver.Chrome(options=options)

--headless=new is the modern flag (the original --headless is being phased out). --no-sandbox and --disable-dev-shm-usage are effectively mandatory in Docker/CI environments. The user-agent=... argument matters: headless Chrome's default UA contains "HeadlessChrome" and betrays you instantly.

Selectors: By.* and find_element*

driver.find_element(By.CSS_SELECTOR, ".product-card")
driver.find_element(By.XPATH, "//button[normalize-space()='Submit']")
driver.find_element(By.ID, "main")
driver.find_element(By.TAG_NAME, "h1")
driver.find_element(By.LINK_TEXT, "Sign in")

driver.find_elements(By.CSS_SELECTOR, ".product-card")  # plural, returns list

find_element returns one WebElement (or raises NoSuchElementException). find_elements returns a list (possibly empty). Note: both query the DOM right now, with no auto-wait. Call them too early and you get an exception, an empty list, or stale references.
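One common workaround for the raise-vs-empty asymmetry is a small helper of your own (`find_or_none` is not part of Selenium) that leans on find_elements returning an empty list instead of raising:

```python
def find_or_none(driver, by, selector):
    # find_elements returns [] when nothing matches, so use it as a
    # non-throwing probe and take the first hit if there is one.
    matches = driver.find_elements(by, selector)
    return matches[0] if matches else None

# price = find_or_none(driver, By.CSS_SELECTOR, ".price")
# if price is not None:
#     print(price.text)
```

Remember this still queries the DOM immediately; it doesn't replace an explicit wait, it only replaces the try/except around a one-shot lookup.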

Explicit waits, the Selenium way

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)

el = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".product-card")))
el = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".product-card")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.submit")))
wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, ".spinner")))
wait.until(EC.text_to_be_present_in_element((By.CSS_SELECTOR, ".status"), "Loaded"))

WebDriverWait polls the condition every 500 ms by default (tunable via the poll_frequency argument). The condition is any callable that takes the driver (Selenium provides built-in expected_conditions, or you write your own).

You must use these for any element that isn't synchronously present. Skipping them is the #1 cause of flaky Selenium scrapers.

# Custom condition: a lambda that returns truthy when ready
wait.until(lambda d: len(d.find_elements(By.CSS_SELECTOR, ".product-card")) >= 24)
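Conditions can also be classes with a `__call__` method, which is how the built-in expected_conditions are structured. This sketch (the name `count_at_least` is ours, not Selenium's) mirrors the lambda above but is reusable and self-documenting:

```python
class count_at_least:
    # A wait condition is just a callable that receives the driver and
    # returns something truthy when satisfied, falsy to keep polling.
    def __init__(self, locator, minimum):
        self.locator = locator
        self.minimum = minimum

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        return elements if len(elements) >= self.minimum else False

# wait.until(count_at_least((By.CSS_SELECTOR, ".product-card"), 24))
```

Returning the element list (rather than True) means `wait.until(...)` hands you the matched elements directly, the same convention the built-ins follow.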

Actions

el = driver.find_element(By.CSS_SELECTOR, "button.submit")
el.click()

input_el = driver.find_element(By.NAME, "email")
input_el.clear()
input_el.send_keys("demo@example.com")

from selenium.webdriver.common.keys import Keys
input_el.send_keys(Keys.RETURN)

click(), clear(), and send_keys() are Selenium's verbs. There's no fill() (set-and-dispatch); clear() followed by send_keys() is the manual equivalent.
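If you miss fill(), a two-line helper captures the pattern (`fill` here is our own name, not a Selenium method):

```python
def fill(element, text):
    # Emulate Playwright's fill(): wipe the current value, then type.
    # send_keys simulates keystrokes, so key events still fire.
    element.clear()
    element.send_keys(text)

# fill(driver.find_element(By.NAME, "email"), "demo@example.com")
```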

For advanced interactions:

from selenium.webdriver.common.action_chains import ActionChains

ActionChains(driver) \
  .move_to_element(target) \
  .pause(0.5) \
  .click() \
  .perform()

ActionChains sequences pointer/keyboard operations: drag-and-drop, hover-then-click, modifier-key combos. It's verbose, but it's the only way to express these in Selenium.

Why Playwright won

  • Auto-waiting. Selenium requires explicit waits everywhere; Playwright bakes them in.
  • One process, many contexts. Selenium needs separate drivers; Playwright shares.
  • Better network APIs. Playwright has route() and expect_response() built in; Selenium 4 has BiDi protocol support but it's newer and less mature.
  • Better dev experience. Codegen, trace viewer, video recording, all native to Playwright; Selenium relies on third-party tools.

Most new scraping projects start on Playwright. Selenium remains where existing test suites or QA toolchains use it.

When Selenium is still the right call

  1. Legacy codebases. You inherit a 500-line Selenium scraper. Rewriting in Playwright might cost more than maintaining.
  2. Selenium Grid + Sauce Labs/BrowserStack. Enterprise QA cloud infrastructure is Selenium-native. Playwright has its own test runner (@playwright/test), but Selenium has the established cross-browser cloud.
  3. undetected-chromedriver. A patched ChromeDriver that evades many anti-bot detections. Lesson 2.28 covers it.
  4. W3C WebDriver standardisation. WebDriver is a standard; CDP is a Chromium-only protocol. For genuinely cross-browser scraping (Firefox + Safari + Chrome + IE-relics), Selenium still has the edge.

Hands-on lab

Open /challenges/dynamic/date-picker/custom. Write a Selenium script that opens the date picker, selects a specific date, and reads the resulting input value. Use WebDriverWait + EC.element_to_be_clickable for every interaction, never time.sleep. Compare line counts with the equivalent Playwright script; Selenium will be roughly twice as long.


Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.


What's the main behavioural difference between Selenium's `find_element` and Playwright's `locator`?
