Locator Strategies: CSS, XPath, Role, Text, Test-ID (Dynamic Web & Browser Automation)

Choosing the right selector type is the single biggest factor in scraper stability. This lesson lays out a clear hierarchy of which to prefer, when, and why.

Every scraper bug eventually boils down to a selector that broke. Choosing the right selector type for the right job is the single biggest factor in long-term scraper stability. This lesson is the hierarchy you should internalise.

The stability hierarchy

Ordered from most stable to least:

1. data-testid attributes: engineers add these explicitly so tests can target them. They rarely change.
2. ARIA roles + accessible names: anchored in semantic HTML, they survive cosmetic redesigns.
3. Visible text content: humans see and react to text, so it changes less aggressively than CSS classes.
4. Stable structural CSS: <main>, <article>, semantic landmarks.
5. Class-name CSS: fine when classes are meaningful (.product-card), bad when they're generated (.css-3a7e1b).
6. XPath with positional logic: div[2]/div[1] style; breaks on any DOM reshuffle.
7. Coordinates / pixel positions: never use unless you have no choice.

Use the highest level that works. Drop a step only when you must.
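One way to apply "use the highest level that works" in code is to list candidate locators from most to least stable and take the first one that resolves to exactly one element. The sketch below assumes a Playwright page object is passed in; the helper name first_unique and the ADD_TO_CART list are hypothetical, and the selector strings are illustrative rather than taken from a real site.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each candidate is (label, factory). The factory takes a Playwright page
# and returns a Locator. Labels and selectors here are illustrative.
Candidate = Tuple[str, Callable]


def first_unique(page, candidates: Sequence[Candidate]) -> Optional[tuple]:
    """Return (label, locator) for the first candidate matching exactly one element."""
    for label, make in candidates:
        locator = make(page)
        if locator.count() == 1:  # zero or several matches: try the next level down
            return label, locator
    return None


# Ordered from most stable to least stable.
ADD_TO_CART = [
    ("test-id", lambda p: p.get_by_test_id("add-to-cart")),
    ("role", lambda p: p.get_by_role("button", name="Add to cart")),
    ("text", lambda p: p.get_by_text("Add to cart", exact=True)),
    ("css", lambda p: p.locator("button.add-to-cart")),
]
```

Logging the returned label on each run shows when a scraper has silently fallen back to a weaker strategy, which is usually an early sign that the site has changed.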

CSS selectors

CSS is Playwright's default selector engine: a plain string passed to page.locator() is parsed as CSS. Most structural scraping selectors should be CSS:

page.locator(".product-card")  # class
page.locator("#main-nav")  # id
page.locator("a[href*='/products/']")  # attribute contains
page.locator("article > h2")  # direct child
page.locator("[data-status='in-stock']")  # custom attribute
page.locator(".product-card:has(.sale-badge)") # :has() pseudo-class

CSS is fast and well-supported. The :has() pseudo-class is a recent addition that handles "parent matching child" cleanly, which used to be one of the main places XPath beat CSS.
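As a minimal sketch of :has() in an extraction loop, the function below collects data only from product cards that contain a sale badge. The .product-card and .sale-badge classes come from the examples above; the <h2> title and the .price class are assumptions about the markup, and sale_items is a hypothetical helper name.

```python
def sale_items(page):
    """Collect title and price text from product cards that contain a sale badge."""
    cards = page.locator(".product-card:has(.sale-badge)")
    items = []
    for i in range(cards.count()):
        card = cards.nth(i)  # scope further lookups to this one card
        items.append({
            "title": card.locator("h2").inner_text(),      # assumed: title is an <h2>
            "price": card.locator(".price").inner_text(),  # assumed class name
        })
    return items
```

Because each inner lookup is scoped to a single card, a title from one card cannot be paired with a price from another.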

XPath: when to use it

XPath can do two things that standard CSS cannot:

# 1. Match by visible text content
page.locator("xpath=//button[normalize-space()='Add to cart']")

# 2. Walk up the tree (`ancestor::`)
page.locator("xpath=//span[text()='Price']/ancestor::tr[1]")

Playwright treats a selector as XPath when it starts with // or .., or when you prefix it with xpath=. For most jobs that traditionally needed XPath, Playwright's selector engine now has CSS-style equivalents:

Goal	XPath	CSS equivalent
Match text	`//button[text()='X']`	`button:text-is("X")` (Playwright extension)
Has descendant	`//div[.//span]`	`div:has(span)`
Direct child	`//div/h1`	`div > h1`

Use XPath only when you need true tree-walking. For everything else, prefer CSS or text matchers.
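To make the trade-off concrete, here is a sketch of the "walk up to the row" case written both ways. It assumes a table where a <span> holds the label "Price"; both function names are hypothetical. The two are not strictly identical: ancestor::tr[1] returns only the nearest row, while the filter version returns every <tr> that contains the label, which differs if tables are nested.

```python
def price_row_xpath(page):
    """XPath: find the label, then walk up to its nearest <tr>."""
    return page.locator("xpath=//span[text()='Price']/ancestor::tr[1]")


def price_row_filter(page):
    """No XPath: select rows, then keep those that contain the label."""
    # :text-is() is a Playwright CSS extension for an exact text match.
    label = page.locator("span:text-is('Price')")
    return page.locator("tr").filter(has=label)
```

The filter version reads top-down (rows, then a condition on rows), which tends to be easier to debug than a path that starts deep in the tree and climbs.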

Role-based locators

Role selectors target the semantic structure of the page (what the element means, not how it looks):

page.get_by_role("button", name="Add to cart")
page.get_by_role("link", name="See all products")
page.get_by_role("heading", name="Featured")
page.get_by_role("textbox", name="Search")
page.get_by_role("checkbox", name="Accept terms")
page.get_by_role("listitem")

The name argument matches the accessible name, which is usually the visible label and sometimes an aria-label. Roles are stable across redesigns because they're tied to the element's purpose, not its CSS.

Roles are also forgiving: a <button> and a <div role="button"> both match get_by_role("button").
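By default, get_by_role matches name as a case-insensitive substring, so "Add to cart" would also match a button labelled "Add to cart now". The sketch below shows two ways to control that: exact=True for a strict match, and a compiled regex to tolerate a known alternative wording. The function names and the "Add to bag" variant are illustrative assumptions.

```python
import re


def add_to_cart_exact(page):
    """Match only a button whose accessible name is exactly 'Add to cart'."""
    return page.get_by_role("button", name="Add to cart", exact=True)


def add_to_cart_or_bag(page):
    """Tolerate a reword to 'Add to bag' by matching the name with a regex."""
    pattern = re.compile(r"^Add to (cart|bag)$", re.IGNORECASE)
    return page.get_by_role("button", name=pattern)
```

The regex form is a deliberate trade: it survives one anticipated copy change at the cost of being slightly less obvious to read.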

Text-based locators

import re

page.get_by_text("Add to cart")  # default: case-insensitive substring match
page.get_by_text("Add to cart", exact=True)  # exact: case-sensitive, whole string
page.get_by_text(re.compile(r"^Add to "))  # regex

get_by_text matches against visible (rendered) text. Useful for clicking elements whose only stable identifier is the words inside them.

Caveats:

It matches the smallest element containing the text. If your "Add to cart" string is in a <span> inside a <button>, you'll get the <span>; clicking it still activates the button, but calls such as get_attribute() read from the <span>, not the button.
For non-button elements, prefer get_by_role if available: text appears in more places than you think.

Test-ID locators

<button data-testid="submit-order">Order now</button>

page.get_by_test_id("submit-order")

When a site has data-testid attributes, use them. They're added precisely so testing tools (like yours) can target elements stably. The attribute survives most code changes because engineers know "tests reference this string."

Configure the attribute name if the site uses something else (data-test, data-cy):

playwright.selectors.set_test_id_attribute("data-test")
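For context, here is a sketch of where that call sits in a full script: it is made once on the Playwright object, before any get_by_test_id lookups. The helper names are hypothetical, and the inline HTML passed to set_content is a stand-in for a real page that uses data-test.

```python
def use_test_id_attribute(playwright, attribute: str = "data-test") -> None:
    """Tell get_by_test_id() which attribute to read; call once, before locating."""
    playwright.selectors.set_test_id_attribute(attribute)


def demo() -> str:
    # Imported lazily so the helper above can be imported without a browser installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        use_test_id_attribute(p, "data-test")
        browser = p.chromium.launch()
        page = browser.new_page()
        # Inline HTML stands in for a real page that uses data-test.
        page.set_content('<button data-test="submit-order">Order now</button>')
        text = page.get_by_test_id("submit-order").inner_text()
        browser.close()
        return text
```

Calling demo() requires Playwright and a Chromium build to be installed; it should return the button's text.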

Real example: same element, five ways

The "Add to cart" button on /products/1-white-wooden-vase:

# 1. Test-ID (best if present)
page.get_by_test_id("add-to-cart")

# 2. Role + accessible name (recommended default)
page.get_by_role("button", name="Add to cart")

# 3. Text
page.get_by_text("Add to cart")

# 4. CSS by class
page.locator("button.add-to-cart")

# 5. XPath
page.locator("xpath=//button[normalize-space()='Add to cart']")

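If the site redesigns its CSS, #4 breaks. If the wording changes to "Add to bag", #2, #3 and #5 break, because all three match on the label. If the element itself changes, say from <button> to <div role="button">, #4 and #5 break because they name the tag. Only #1 survives all three changes, and #2 survives everything except a rewording.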

Three brittle patterns to never write

nth-child / nth-of-type based on position.

page.locator("ul > li:nth-child(3) > a")  # breaks if a row is added

Generated class names.

page.locator(".sc-bdVaJa.kxjJDU")  # breaks every CSS rebuild

These are CSS-in-JS hashes. They can change on any rebuild, so treat them as different on every deployment.

Long XPath positional chains.

page.locator("xpath=/html/body/div/div[2]/div[1]/div/a")  # breaks on any structural change

Auto-generated by "Copy XPath" in DevTools. Always rewrite before committing.
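The three patterns above are regular enough to catch mechanically before a selector is committed. The following is a rough heuristic sketch, not a complete linter: the regexes, the BRITTLE_PATTERNS name, and the list of generated-class prefixes (css-, sc-, jss-) are assumptions that will need tuning per site.

```python
import re
from typing import List

# Heuristic patterns only; expect both false positives and misses.
BRITTLE_PATTERNS = {
    "positional nth-child/nth-of-type": re.compile(r":nth-(child|of-type)\(\s*\d+\s*\)"),
    "generated class name": re.compile(r"\.(css|sc|jss)-[A-Za-z0-9]{4,}"),
    "positional XPath chain": re.compile(r"(/\w+\[\d+\]){2,}|^(xpath=)?/html/"),
}


def brittle_reasons(selector: str) -> List[str]:
    """Return the names of every brittle pattern the selector appears to use."""
    return [name for name, pattern in BRITTLE_PATTERNS.items() if pattern.search(selector)]


if __name__ == "__main__":
    for sel in ["ul > li:nth-child(3) > a", ".sc-bdVaJa.kxjJDU", ".product-card"]:
        print(sel, "->", brittle_reasons(sel) or "ok")
```

A check like this fits naturally in a pre-commit hook or a unit test that scans a scraper's selector constants.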

Scoping with chained locators

Instead of one long CSS selector, chain locators:

# Bad: one fragile compound selector
page.locator("table tbody tr:has-text('Yellow Mug') td:nth-child(3) button")

# Good: layered, each step has meaning
row = page.locator("tr").filter(has_text="Yellow Mug")
row.locator("button", has_text="Delete").click()

Chained locators document intent and isolate failure: if the second step breaks, you know exactly which scope didn't resolve.
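To show how chaining isolates failure, the sketch below checks each scope separately and raises an error that names the step that did not resolve. The function name delete_product is hypothetical, and it assumes each table row has a button whose accessible name is "Delete".

```python
def delete_product(page, name: str) -> None:
    """Click the Delete button in the table row that mentions `name`."""
    # Step 1: narrow to the row. A failure here means the product is missing or duplicated.
    row = page.locator("tr").filter(has_text=name)
    row_count = row.count()
    if row_count != 1:
        raise LookupError(f"expected one row for {name!r}, found {row_count}")

    # Step 2: find the button inside that row only.
    button = row.get_by_role("button", name="Delete")
    button_count = button.count()
    if button_count != 1:
        raise LookupError(f"expected one Delete button in row {name!r}, found {button_count}")

    button.click()
```

With the single compound selector, both failure modes would surface as the same timeout; here the error message says whether the row or the button was the problem.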

Hands-on lab

Open /products/1-white-wooden-vase. Write five different locators for the "Add to cart" button, one each of: test-id, role, text, CSS, XPath. Call .count() on each locator to verify that it resolves to exactly one element. Note which selectors are most readable. That readability is your future self thanking you.


Quiz: check your understanding

Which selector strategy is MOST stable across site redesigns?