
Lesson 2.6 · Intermediate · 5 min read

Locators: The Auto-Waiting Magic

Locators are Playwright's most important abstraction. They auto-wait, auto-retry, and eliminate the entire category of timing bugs that plague Selenium.

What you’ll learn

  • Explain what a locator is and how it differs from a one-shot element handle.
  • List the conditions a locator auto-waits for before acting (attached, visible, enabled, stable).
  • Use chained locators to scope queries reliably.
  • Recognise when auto-waiting is NOT enough, and which explicit waits to add.

The single biggest reason Playwright eats Selenium for lunch is the locator. A locator isn't a reference to a DOM element, it's a lazy query that re-runs every time you act on it, with automatic waiting baked in. Once you internalise this, most of your timing bugs disappear.

Locator vs ElementHandle

Selenium has find_element(). It returns a live reference to a DOM node as it existed at that moment. If React re-renders five milliseconds later, your reference goes stale and the next call on it throws StaleElementReferenceException. You spend half your Selenium career writing retry loops around this.

Playwright's locator(selector) doesn't return an element. It returns a description of how to find one. Every action on a locator re-queries the DOM, so re-renders don't break you:

button = page.locator("button.submit")
button.click()  # queries the DOM right now, finds the button, clicks
# ...React tears down and rebuilds the button...
button.click()  # re-queries from scratch, finds the new button, clicks

There's still an ElementHandle API (page.query_selector(...)) for when you need a frozen reference, for example, to use the same node across multiple frames of an animation. But for 95% of scraping work, locators are what you want.

What auto-waiting actually waits for

When you call locator.click(), Playwright waits, by default up to 30 seconds, for the element to satisfy actionability checks:

  • Attached: the node exists in the DOM.
  • Visible: non-empty bounding box, not display:none, not visibility:hidden.
  • Stable: hasn't moved or resized for two consecutive animation frames.
  • Receives events: no other element sits on top and intercepts the click.
  • Enabled: not disabled (for form controls).

Only after all five are true does the click actually fire. This eliminates 90% of timing bugs without any explicit wait. No more time.sleep(2). No more "click failed because the button was still animating."

Different actions wait for different subsets:

  • click() / dblclick(): all five checks.
  • fill(): attached + visible + enabled + editable.
  • inner_text() / text_content(): attached only (no visibility check).
  • is_visible() / is_enabled(): no actionability waiting; they report the element's current state (is_visible() returns False immediately if the element isn't there).
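The distinction matters in practice: state checks return immediately, while actions block until the checks pass. A minimal sketch (the helper name snapshot_state is ours, not a Playwright API):

```python
def snapshot_state(locator):
    """Report the element's state right now, without auto-waiting.

    is_visible() returns immediately (False if the element isn't even
    attached yet); contrast with click(), which would block until all
    of the actionability checks pass.
    """
    return {"visible": locator.is_visible(), "enabled": locator.is_enabled()}

# Usage (assumes `page` is an open Playwright page):
#   state = snapshot_state(page.locator("button.submit"))
#   if not state["visible"]:
#       ...  # not rendered yet; click() here would wait instead of failing
```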

Chained locators

Locators chain. Each call narrows the scope:

# Find the .product-card containing the text "Mug", then click its .add-to-cart button
page.locator(".product-card").filter(has_text="Mug").locator(".add-to-cart").click()

This reads like English. You can build deeply scoped selectors without writing fragile compound CSS.

Useful chaining methods:

  • .locator(selector): scope further; find selector within the current match.
  • .filter(has_text="..."): keep only matches whose text content contains the string.
  • .filter(has=other_locator): keep only matches that contain other_locator.
  • .first / .last / .nth(i): pick one of N matches.
  • .all(): materialise the matches into a list of locators.

filter is especially useful. The pattern "the row where the name column says X, click the delete button in that row" is two locators and a .filter().
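That row-deletion pattern, as a sketch (the .row and .delete selectors are hypothetical; adapt them to the real page):

```python
def delete_row(page, name: str) -> None:
    """Click the delete button inside the row whose text contains `name`.

    filter(has_text=...) narrows the .row matches to the row containing
    `name`; the inner .locator(".delete") is then scoped to that row only,
    so you never click a delete button in the wrong row.
    """
    page.locator(".row").filter(has_text=name).locator(".delete").click()
```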

Built-in selectors

Beyond raw CSS and XPath, Playwright understands a handful of high-level matchers:

page.get_by_role("button", name="Submit")  # ARIA role + accessible name
page.get_by_text("Add to cart")  # visible text content
page.get_by_label("Email")  # form field by its label
page.get_by_placeholder("Search...")  # input placeholder
page.get_by_alt_text("Logo")  # image alt text
page.get_by_title("Help")  # element title attribute
page.get_by_test_id("submit-btn")  # data-testid attribute

These are the recommended default selectors for stability. They survive cosmetic redesigns: a class name might change, but the ARIA role of a "Submit" button doesn't. We dig into the full strategy in Lesson 2.7.
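The built-in matchers chain and filter exactly like locator(). A sketch (the pricing-card page structure here is entirely hypothetical):

```python
def choose_plan(page, plan: str) -> None:
    """Click the 'Choose' button inside the pricing card for `plan`.

    Assumes a hypothetical page where each plan is a listitem containing
    a button named 'Choose'. Role-based scoping survives CSS redesigns
    that would break class-name selectors.
    """
    (page.get_by_role("listitem")
         .filter(has_text=plan)
         .get_by_role("button", name="Choose")
         .click())
```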

Auto-waiting is NOT enough

Don't rely on auto-wait as your only synchronization. Three cases need explicit waits:

1. Waiting for content that appears via XHR.

page.goto("https://practice.scrapingcentral.com/challenges/dynamic/lazy-images")
page.wait_for_selector(".product-card img[src]:not([src^='data:'])")

The <img> tags exist immediately (auto-wait would succeed), but their src is a placeholder until the lazy-load fires. You need to wait for the attribute condition, not the element.

2. Waiting for a network response.

with page.expect_response("**/api/products*") as resp:
    page.click("text=Load more")
response = resp.value
data = response.json()

Auto-wait doesn't know about your custom API call; expressing the wait in terms of the network is more reliable than guessing about DOM updates.

3. Waiting for a function to return true.

page.wait_for_function("() => document.querySelectorAll('.product-card').length >= 20")

The DOM-side equivalent of a custom condition. Useful for "wait until at least 20 products have loaded."

Timeouts: be specific

Default timeout is 30 seconds. Override per-call:

page.locator(".product-card").click(timeout=5000)
page.wait_for_selector(".loaded", timeout=10000, state="visible")

Or per-context:

context.set_default_timeout(10000)  # 10s for everything in this context

Production scrapers should set this lower than 30s: a 30-second wait usually means something is broken, and you want to fail fast and retry cleanly.
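Failing fast pairs naturally with a retry wrapper. A sketch of the pattern (the wrapper is ours, not a Playwright API; with Playwright you'd pass retry_on=(TimeoutError,) using the TimeoutError from playwright.sync_api):

```python
import time

def with_retries(action, attempts=3, backoff=1.0, retry_on=(Exception,)):
    """Call `action`; on a matching exception, back off and retry.

    Paired with short per-call timeouts (e.g. click(timeout=5000)),
    a broken page costs attempts * 5s at worst instead of a 30s hang.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise                      # out of attempts: propagate
            time.sleep(backoff * attempt)  # linear backoff between tries

# Usage sketch (assumes an open `page`):
#   with_retries(lambda: page.locator(".product-card").click(timeout=5000))
```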

The "strict mode" violation

Playwright errors loudly if a locator matches multiple elements when you expected one:

Error: locator.click: strict mode violation:
  page.locator(".btn") resolved to 3 elements

This is a feature, not a bug. It catches selectors that are accidentally too broad. Fix it by adding more specificity (.first, a filter, a closer container), never by ignoring the warning.

Hands-on lab

Open /challenges/dynamic/lazy-images. Write a script that opens the page and counts how many product images have a non-placeholder src. Compare two approaches: (a) call count() immediately after goto(), and (b) use wait_for_selector(".product-card img:not([src^='data:'])") first. You should see (a) return zero or a small number and (b) return all of them. That's auto-waiting hitting its limit: the image element exists immediately, but its attribute doesn't.
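A sketch of the lab script. The URL and selector come from the lesson; the helper names count_loaded and compare_counts are ours, and the Playwright import is shown only in the commented usage so the helpers stay importable without a browser:

```python
LOADED = ".product-card img[src]:not([src^='data:'])"  # real (non-placeholder) srcs

def count_loaded(page) -> int:
    """Count product images whose src is no longer a data: placeholder."""
    return page.locator(LOADED).count()

def compare_counts(page, url: str):
    """Return (count right after goto, count after an explicit wait)."""
    page.goto(url)
    before = count_loaded(page)     # (a) usually 0: srcs are still placeholders
    page.wait_for_selector(LOADED)  # (b) wait on the attribute condition
    return before, count_loaded(page)

# To run against the lab target:
#   from playwright.sync_api import sync_playwright
#   with sync_playwright() as p:
#       page = p.chromium.launch().new_page()
#       print(compare_counts(page,
#           "https://practice.scrapingcentral.com/challenges/dynamic/lazy-images"))
```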


Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.
