Selenium in Python (Legacy but Still Common)
Selenium predates Playwright by a decade and still dominates legacy codebases. Know it well enough to read, port, and not fear inheriting a Selenium scraper.
What you’ll learn
- Install Selenium 4 and understand how drivers are managed (Selenium Manager, or `webdriver-manager` on older codebases).
- Translate Playwright concepts to Selenium: driver, options, explicit waits.
- Write a Selenium scraper using `WebDriverWait` correctly (not `time.sleep`).
- Identify why Playwright won, and where Selenium is still genuinely useful.
Selenium is the original browser automation library. Most "old" scraping tutorials, most enterprise QA suites, and a huge fraction of legacy production scrapers use it. You need to be fluent enough to read inherited code, port it to Playwright when it's worth the effort, and ship working scrapers in shops that have standardised on Selenium.
Install
Selenium 4 ships with Selenium Manager, which downloads the matching driver automatically:

```shell
pip install selenium
```

That's it. The first time you run a script, Selenium downloads the matching ChromeDriver into a managed cache. No more `chromedriver-binary` headaches.
If you're on an older codebase that pins Selenium 3, you'll see `webdriver-manager` or hand-managed driver paths instead:

```shell
pip install selenium webdriver-manager
```
Your first scraper
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("https://practice.scrapingcentral.com/")
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "h1"))
    )
    h1 = driver.find_element(By.CSS_SELECTOR, "h1").text
    print(h1)
finally:
    driver.quit()
```
Compare to Playwright:
```python
page.goto("https://practice.scrapingcentral.com/")
print(page.locator("h1").first.inner_text())
```
Selenium needs explicit waits everywhere. Playwright bakes them in. That's the headline difference and the main reason teams migrate.
Concept mapping
| Playwright | Selenium |
|---|---|
| `Browser` | `WebDriver` (your `driver` object) |
| `BrowserContext` | No direct equivalent; use a new `WebDriver` instance |
| `Page` | The active document on the driver |
| `Locator` (auto-waiting) | `WebElement` (one-shot, stale-prone) |
| `wait_for_selector` | `WebDriverWait` + `ExpectedConditions` |
| `page.evaluate("...")` | `driver.execute_script("...")` |
| `expect_response` | BrowserMob Proxy or DevTools Protocol setup |
The "no BrowserContext equivalent" is a real gap: to isolate sessions you launch separate drivers, each with its own browser process. That is significantly heavier than Playwright contexts.
Driver options
```python
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--window-size=1280,800")
options.add_argument("user-agent=Mozilla/5.0 ...")

driver = webdriver.Chrome(options=options)
```
`--headless=new` is the modern flag (the original `--headless` is being phased out). `--no-sandbox` and `--disable-dev-shm-usage` are effectively mandatory in Docker/CI environments. The `user-agent=...` argument is critical because Selenium's default headless UA betrays `HeadlessChrome` instantly.
Selectors: By.* and find_element*
```python
driver.find_element(By.CSS_SELECTOR, ".product-card")
driver.find_element(By.XPATH, "//button[normalize-space()='Submit']")
driver.find_element(By.ID, "main")
driver.find_element(By.TAG_NAME, "h1")
driver.find_element(By.LINK_TEXT, "Sign in")

driver.find_elements(By.CSS_SELECTOR, ".product-card")  # plural, returns list
```
`find_element` returns one `WebElement` (or raises `NoSuchElementException`). `find_elements` returns a list (possibly empty). Note: these query the DOM right now. There's no auto-wait. Call them too early and you get nothing, or stale references.
Explicit waits, the Selenium way
```python
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)

el = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".product-card")))
el = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".product-card")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.submit")))
wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, ".spinner")))
wait.until(EC.text_to_be_present_in_element((By.CSS_SELECTOR, ".status"), "Loaded"))
```
`WebDriverWait` polls the condition every 500 ms by default. The condition is a callable (Selenium provides built-in `expected_conditions`, or you write your own).

You must use these for any element that isn't synchronously present. Skipping them is the #1 cause of flaky Selenium scrapers.
```python
# Custom condition: a lambda that returns truthy when ready
wait.until(lambda d: len(d.find_elements(By.CSS_SELECTOR, ".product-card")) >= 24)
```
Actions
```python
from selenium.webdriver.common.keys import Keys

el = driver.find_element(By.CSS_SELECTOR, "button.submit")
el.click()

input_el = driver.find_element(By.NAME, "email")
input_el.clear()
input_el.send_keys("demo@example.com")
input_el.send_keys(Keys.RETURN)
```
`click()`, `clear()`, and `send_keys()` are Selenium's verbs. There's no `fill()` (set-and-dispatch); `clear()` + `send_keys()` is the manual equivalent.
For advanced interactions:
```python
from selenium.webdriver.common.action_chains import ActionChains

ActionChains(driver) \
    .move_to_element(target) \
    .pause(0.5) \
    .click() \
    .perform()
```
Use `ActionChains` for sequencing pointer/keyboard operations: drag-and-drop, hover-then-click, modifier-key combos. It's verbose, but it's the only way in Selenium.
Why Playwright won
- Auto-waiting. Selenium requires explicit waits everywhere; Playwright bakes them in.
- One process, many contexts. Selenium needs separate drivers; Playwright shares.
- Better network APIs. Playwright has `route()` and `expect_response()` built in; Selenium 4 has BiDi protocol support, but it's newer and less mature.
- Better dev experience. Codegen, trace viewer, video recording, all native to Playwright; Selenium relies on third-party tools.
Most new scraping projects start on Playwright. Selenium remains where existing test suites or QA toolchains use it.
When Selenium is still the right call
- Legacy codebases. You inherit a 500-line Selenium scraper. Rewriting it in Playwright might cost more than maintaining it.
- Selenium Grid + Sauce Labs/BrowserStack. Enterprise QA cloud infrastructure is Selenium-native. Playwright has its own runner (`@playwright/test`), but Selenium has the established cross-browser cloud.
- `undetected-chromedriver`. A patched ChromeDriver that evades many anti-bot detections. Lesson 2.28 covers it.
- W3C WebDriver standardisation. WebDriver is a standard; CDP is a Chromium-only protocol. For genuinely cross-browser scraping (Firefox + Safari + Chrome + IE relics), Selenium still has the edge.
Hands-on lab
Open `/challenges/dynamic/date-picker/custom`. Write a Selenium script that opens the date picker, selects a specific date, and reads the resulting input value. Use `WebDriverWait` + `EC.element_to_be_clickable` for every interaction, never `time.sleep`. Compare line counts with the equivalent Playwright script; the Selenium version will be roughly twice as long.