Selenium in Python (Legacy but Still Common)
Selenium predates Playwright by a decade and still dominates legacy codebases. Know it well enough to read, port, and not fear inheriting a Selenium scraper.
What you’ll learn
- Install Selenium 4 and understand how drivers are managed (Selenium Manager, or `webdriver-manager` on older codebases).
- Translate Playwright concepts to Selenium: driver, options, explicit waits.
- Write a Selenium scraper using `WebDriverWait` correctly (not `time.sleep`).
- Identify why Playwright won, and where Selenium is still genuinely useful.
Selenium is the original browser automation library. Most "old" scraping tutorials, most enterprise QA suites, and a huge fraction of legacy production scrapers use it. You need to be fluent enough to read inherited code, port it to Playwright when it's worth the effort, and ship working scrapers in shops that have standardised on Selenium.
Install
Selenium 4 ships with Selenium Manager, which downloads the matching driver automatically:

```shell
pip install selenium
```

That's it. The first time you run a script, Selenium downloads the matching ChromeDriver into a managed cache. No more `chromedriver-binary` headaches.
If you're on an older codebase that pins Selenium 3, you'll see `webdriver-manager` or hand-managed driver paths instead:

```shell
pip install selenium webdriver-manager
```
Your first scraper
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("https://practice.scrapingcentral.com/")
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "h1"))
    )
    h1 = driver.find_element(By.CSS_SELECTOR, "h1").text
    print(h1)
finally:
    driver.quit()
```
Compare to Playwright:
```python
page.goto("https://practice.scrapingcentral.com/")
print(page.locator("h1").first.inner_text())
```
Selenium needs explicit waits everywhere. Playwright bakes them in. That's the headline difference and the main reason teams migrate.
Concept mapping
| Playwright | Selenium |
|---|---|
| `Browser` | `WebDriver` (your `driver` object) |
| `BrowserContext` | No direct equivalent; use a new `WebDriver` instance |
| `Page` | The active document on the driver |
| `Locator` (auto-waiting) | `WebElement` (one-shot, stale-prone) |
| `wait_for_selector` | `WebDriverWait` + `ExpectedConditions` |
| `page.evaluate("...")` | `driver.execute_script("...")` |
| `expect_response` | BrowserMob Proxy or DevTools Protocol setup |
The "no BrowserContext equivalent" is a real gap: to isolate sessions you launch separate drivers, each with its own browser process. That is significantly heavier than Playwright contexts.
Driver options
```python
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--window-size=1280,800")
options.add_argument("user-agent=Mozilla/5.0 ...")

driver = webdriver.Chrome(options=options)
```
`--headless=new` is the modern flag (the original `--headless` is being phased out). `--no-sandbox` and `--disable-dev-shm-usage` are effectively mandatory in Docker/CI environments. The `user-agent=...` argument is critical because Selenium's default headless UA betrays `HeadlessChrome` instantly.
Selectors: By.* and find_element*
```python
driver.find_element(By.CSS_SELECTOR, ".product-card")
driver.find_element(By.XPATH, "//button[normalize-space()='Submit']")
driver.find_element(By.ID, "main")
driver.find_element(By.TAG_NAME, "h1")
driver.find_element(By.LINK_TEXT, "Sign in")

driver.find_elements(By.CSS_SELECTOR, ".product-card")  # plural, returns list
```
`find_element` returns one `WebElement` (or raises `NoSuchElementException`). `find_elements` returns a list (possibly empty). Note: these query the DOM right now. There's no auto-wait. Call them too early and you get nothing, or stale references.
Explicit waits, the Selenium way
```python
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)

el = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".product-card")))
el = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".product-card")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.submit")))
wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, ".spinner")))
wait.until(EC.text_to_be_present_in_element((By.CSS_SELECTOR, ".status"), "Loaded"))
```
`WebDriverWait` polls the condition every 500 ms by default. The condition is a callable (Selenium provides built-in `expected_conditions`, or you write your own).

You must use these for any element that isn't synchronously present. Skipping them is the #1 cause of flaky Selenium scrapers.
```python
# Custom condition: a lambda that returns truthy when ready
wait.until(lambda d: len(d.find_elements(By.CSS_SELECTOR, ".product-card")) >= 24)
```
Actions
```python
from selenium.webdriver.common.keys import Keys

el = driver.find_element(By.CSS_SELECTOR, "button.submit")
el.click()

input_el = driver.find_element(By.NAME, "email")
input_el.clear()
input_el.send_keys("demo@example.com")
input_el.send_keys(Keys.RETURN)
```
`click()`, `clear()`, and `send_keys()` are Selenium's verbs. There's no `fill()` (set-and-dispatch); `clear()` + `send_keys()` is the manual equivalent.
For advanced interactions:
```python
from selenium.webdriver.common.action_chains import ActionChains

ActionChains(driver) \
    .move_to_element(target) \
    .pause(0.5) \
    .click() \
    .perform()
```
Use `ActionChains` for sequencing pointer/keyboard operations: drag-and-drop, hover-then-click, modifier-key combos. It's verbose, but it's the only way in Selenium.
Why Playwright won
- Auto-waiting. Selenium requires explicit waits everywhere; Playwright bakes them in.
- One process, many contexts. Selenium needs separate drivers; Playwright shares.
- Better network APIs. Playwright has `route()` and `expect_response()` built in; Selenium 4 has BiDi protocol support, but it's newer and less mature.
- Better dev experience. Codegen, trace viewer, video recording, all native to Playwright; Selenium relies on third-party tools.
Most new scraping projects start on Playwright. Selenium remains where existing test suites or QA toolchains use it.
When Selenium is still the right call
- Legacy codebases. You inherit a 500-line Selenium scraper. Rewriting it in Playwright might cost more than maintaining it.
- Selenium Grid + Sauce Labs/BrowserStack. Enterprise QA cloud infrastructure is Selenium-native. Playwright has its own runner (`@playwright/test`), but Selenium has the established cross-browser cloud.
- `undetected-chromedriver`. A patched ChromeDriver that evades many anti-bot detections. Lesson 2.28 covers it.
- W3C WebDriver standardisation. WebDriver is a standard; CDP is a Chromium-only protocol. For genuinely cross-browser scraping (Firefox + Safari + Chrome + IE relics), Selenium still has the edge.
Hands-on lab
Open `/challenges/dynamic/date-picker/custom`. Write a Selenium script that opens the date picker, selects a specific date, and reads the resulting input value. Use `WebDriverWait` + `EC.element_to_be_clickable` for every interaction, never `time.sleep`. Compare line counts with the equivalent Playwright script; the Selenium version will be roughly twice as long.