Playwright Waiting Strategies and Selectors

Learn Playwright's waiting strategies and powerful selector engine to build reliable scrapers that handle dynamic content loading.

Browser Automation · #4 · Intermediate · 2 min read

Timing is one of the most common reasons scrapers break: the script tries to extract content before the page has finished rendering it. Playwright addresses this with built-in auto-waiting and a set of explicit wait methods. Knowing when to use each is the difference between a flaky scraper and a reliable one.

Auto-Waiting

Playwright automatically waits for an element to be actionable before performing actions such as click, fill, or type. For a click, that means the element is visible, enabled, stable (not animating), and not covered by another element; other actions run the subset of these checks that applies to them. Most interactions therefore need no manual sleeps.

# Playwright auto-waits for the button to be clickable
page.click("#submit-button")
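
To make the idea concrete, here is a simplified model of an auto-wait loop in plain Python. It is not Playwright's implementation: the function name wait_until_actionable, the check names, and the 50 ms polling interval are assumptions chosen for illustration.

```python
import time
from typing import Callable, Dict


def wait_until_actionable(
    checks: Dict[str, Callable[[], bool]],
    timeout_ms: float = 30_000,
    poll_ms: float = 50,
) -> None:
    """Poll every check until all pass, or raise TimeoutError at the deadline."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        failing = [name for name, check in checks.items() if not check()]
        if not failing:
            return  # every condition holds, so the action may proceed
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not actionable after {timeout_ms} ms: {failing}")
        time.sleep(poll_ms / 1000)


# Hypothetical element state: the button becomes enabled 0.2 s from now.
enabled_at = time.monotonic() + 0.2
wait_until_actionable(
    {
        "visible": lambda: True,
        "enabled": lambda: time.monotonic() >= enabled_at,
    },
    timeout_ms=2_000,
)
print("button is actionable; a click would happen now")
```

The point of the sketch is that the caller never chooses a sleep duration. The loop ends as soon as the conditions hold and fails loudly if they never do.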

Explicit Waiting Strategies

When you need to wait for content to appear before extracting data, use explicit waits:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://quotes.toscrape.com/js/")

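    # Wait for the element to be attached to the DOM (it may still be hidden)
    page.wait_for_selector(".quote", state="attached")

    # Wait for the element to be visible (this is the default state)
    page.wait_for_selector(".quote", state="visible")

    # Wait for the element to be hidden or removed from the DOM;
    # returns immediately if no such element exists
    page.wait_for_selector(".loading-spinner", state="hidden")

    # Wait until there have been no network connections for at least 500 ms
    page.wait_for_load_state("networkidle")

    # Wait for the page URL to match a glob pattern. This pattern matches the
    # page already loaded; a pattern that never matches would time out.
    page.wait_for_url("**/js/**")

    # Custom wait with a JavaScript predicate evaluated in the page
    page.wait_for_function("document.querySelectorAll('.quote').length > 5")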

    quotes = page.query_selector_all(".quote .text")
    for q in quotes:
        print(q.inner_text())

    browser.close()
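
The same waits can be wrapped in a small reusable function. The sketch below uses the locator API and passes the threshold to wait_for_function through its arg parameter. The names scrape_quotes, min_count, and main are illustrative and do not come from the example above.

```python
from typing import List


def scrape_quotes(page, min_count: int = 5) -> List[str]:
    """Wait until at least min_count quotes are rendered, then return their text."""
    # The predicate runs in the browser; arg is handed to it as n.
    page.wait_for_function(
        "n => document.querySelectorAll('.quote').length >= n",
        arg=min_count,
    )
    # Locators re-query the DOM when used, so they do not go stale.
    return page.locator(".quote .text").all_inner_texts()


def main() -> None:
    # Imported here so scrape_quotes stays usable with any page-like object.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://quotes.toscrape.com/js/")
        for text in scrape_quotes(page):
            print(text)
        browser.close()
```

Call main() to run it against the demo site. Keeping the wait and the extraction in one function means every caller gets the same readiness guarantee.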

Load State Options

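State                 Meaning
"load"                The load event has fired (default)
"domcontentloaded"    The DOMContentLoaded event has fired: the HTML is parsed, but images and stylesheets may still be loading
"networkidle"         No network connections for at least 500 ms (Playwright's documentation discourages relying on this state)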
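
Because "networkidle" can be slow, or may never settle on pages that poll in the background, one common pattern is to navigate with a cheaper load state and then wait for the element you actually need. A minimal sketch, in which open_when_ready, ready_selector, and the 15-second timeout are hypothetical choices:

```python
# The three load states listed in the table above.
VALID_LOAD_STATES = {"load", "domcontentloaded", "networkidle"}


def open_when_ready(page, url: str, ready_selector: str,
                    wait_until: str = "domcontentloaded",
                    timeout_ms: float = 15_000) -> None:
    """Navigate to url, then block until ready_selector is visible."""
    if wait_until not in VALID_LOAD_STATES:
        raise ValueError(f"unknown load state: {wait_until!r}")
    # goto returns once the chosen load state has been reached.
    page.goto(url, wait_until=wait_until)
    # The content itself is the real readiness signal for a scraper.
    page.wait_for_selector(ready_selector, state="visible", timeout=timeout_ms)


# Example call, given a Playwright page:
# open_when_ready(page, "https://quotes.toscrape.com/js/", ".quote")
```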

Powerful Selector Engine

Playwright supports CSS selectors, XPath, text selectors, and its own pseudo-classes such as :has-text() and :nth-match():

# CSS selector
page.query_selector("div.product > span.price")

# Text selector: unquoted text is a case-insensitive substring match
# (quote it, as in text="Add to Cart", for an exact match)
page.query_selector("text=Add to Cart")

# Text selector with a regular expression (the i flag ignores case)
page.query_selector("text=/free shipping/i")

# Playwright pseudo-class: an article whose text contains "Python"
page.query_selector("article:has-text('Python')")

# XPath
page.query_selector("xpath=//table//tr[2]/td[1]")

# nth-match picks one of several matches (the index is 1-based, so this is the third .item)
page.query_selector(":nth-match(.item, 3)")
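
All of these selector strings can also be passed to page.locator(), which Playwright's documentation recommends over query_selector because a locator re-resolves its element each time it is used. A small sketch (the helper name collect_texts is an assumption):

```python
from typing import List


def collect_texts(page, selector: str) -> List[str]:
    """Return the stripped, non-empty inner text of every element matching selector."""
    texts = page.locator(selector).all_inner_texts()
    return [t.strip() for t in texts if t.strip()]


# Example calls, given a Playwright page:
# prices = collect_texts(page, "div.product > span.price")
# python_articles = collect_texts(page, "article:has-text('Python')")
# first_cells = collect_texts(page, "xpath=//table//tr[2]/td[1]")
```

The helper never needs to know which selector engine it was given, so CSS, text, and XPath selectors are interchangeable at the call site.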

Timeout Configuration

Timeouts are given in milliseconds and default to 30 seconds. Set custom values to fail fast or to give slow pages more time:

# Per-action timeout
page.wait_for_selector(".results", timeout=15000)

# Global default timeout
page.set_default_timeout(30000)
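
When a wait exceeds its timeout, Playwright raises TimeoutError (importable from playwright.sync_api). The sketch below catches it and retries after a reload. The name wait_with_retries, the attempt count, and the reload-between-attempts policy are illustrative choices, not part of Playwright.

```python
try:
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
except ImportError:
    # Assumption: fall back to the built-in exception so this sketch still
    # imports in environments where Playwright is not installed.
    PlaywrightTimeoutError = TimeoutError


def wait_with_retries(page, selector: str, attempts: int = 3,
                      timeout_ms: float = 5_000) -> bool:
    """Return True once selector is visible, False if every attempt times out."""
    for attempt in range(1, attempts + 1):
        try:
            page.wait_for_selector(selector, timeout=timeout_ms)
            return True
        except PlaywrightTimeoutError:
            if attempt < attempts:
                page.reload()  # give a slow or stuck page another chance
    return False
```

Returning a boolean instead of raising lets the caller decide whether a missing element means "skip this page" or "abort the run".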

Tip for Large-Scale Scraping

If you find yourself fighting with wait strategies because sites are slow or inconsistent, a hosted service such as ScraperAPI can handle rendering and waiting on its own infrastructure and return the rendered HTML. This can simplify your scraping pipeline significantly.

Next Steps

  • Handle popups and dialogs that interrupt scraping
  • Learn to scrape infinite scroll pages
  • Intercept network requests for direct data access