Playwright Waiting Strategies and Selectors
Learn Playwright's waiting strategies and powerful selector engine to build reliable scrapers that handle dynamic content loading.
The most common reason scrapers break is timing: the script tries to extract content before the page has finished loading it. Playwright addresses this with built-in auto-waiting and a set of explicit wait methods. Mastering them is the difference between a flaky scraper and a reliable one.
Auto-Waiting
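Playwright automatically waits for an element to be actionable before performing actions such as click, fill, or type. The exact checks depend on the action: click waits until the element is visible, stable (not animating), enabled, and able to receive pointer events, while fill waits until it is visible, enabled, and editable. You do not need manual sleeps for most interactions.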
# Playwright auto-waits for the button to be clickable
page.click("#submit-button")
Explicit Waiting Strategies
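Auto-waiting covers actions, not lookups: query_selector and query_selector_all return whatever is in the DOM at the moment they are called. When you need content to appear before extracting data, use explicit waits: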
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://quotes.toscrape.com/js/")

    # Wait for an element to appear; the default state is "visible"
    page.wait_for_selector(".quote")

    # Same wait with the state spelled out (use state="attached" for DOM presence only)
    page.wait_for_selector(".quote", state="visible")

    # Wait for an element to be hidden or removed (returns at once if it never existed)
    page.wait_for_selector(".loading-spinner", state="hidden")

    # Wait until there have been no network requests for 500 ms
    page.wait_for_load_state("networkidle")

    # Wait for a specific URL pattern after a navigation.
    # Left commented out: this page never navigates to a matching URL,
    # so the call would only time out here.
    # page.wait_for_url("**/results**")

    # Custom wait with a JavaScript predicate
    page.wait_for_function("document.querySelectorAll('.quote').length > 5")

    quotes = page.query_selector_all(".quote .text")
    for q in quotes:
        print(q.inner_text())

    browser.close()
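The wait-then-extract pattern can be wrapped in a small helper. The following is a minimal sketch rather than anything Playwright ships: `extract_texts` and its default timeout are hypothetical choices, and `page` is assumed to be a sync-API `Page` object. It relies only on `page.wait_for_selector` and `Locator.all_inner_texts`.

```python
def extract_texts(page, selector, timeout_ms=10_000):
    """Wait for `selector` to resolve to a visible element, then return all matching texts.

    `page` is assumed to be a playwright.sync_api.Page. The helper name and
    the default timeout are illustrative, not part of Playwright.
    """
    # Blocks until the selector resolves to a visible element;
    # Playwright raises its TimeoutError if that does not happen in time.
    page.wait_for_selector(selector, state="visible", timeout=timeout_ms)
    # all_inner_texts() returns one string per matching element.
    return page.locator(selector).all_inner_texts()
```

Called as `extract_texts(page, ".quote .text")` on the page from the example, this should return the quote strings as a list instead of printing them one by one.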
Load State Options
| State | Meaning |
|---|---|
| "load" | The load event has fired (default) |
| "domcontentloaded" | The DOM is fully parsed |
| "networkidle" | No network requests for 500ms |
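To show how these states might be used in practice, here is a hedged sketch of a navigation helper. `open_page`, `LOAD_STATES`, and the `ready_selector` parameter are hypothetical names introduced for illustration, and `page` is assumed to be a sync-API `Page`. The helper passes the chosen state to `page.goto` through its `wait_until` argument and then, optionally, waits for an element that signals the content is ready.

```python
# The three states from the table above.
LOAD_STATES = ("load", "domcontentloaded", "networkidle")


def open_page(page, url, state="domcontentloaded", ready_selector=None, timeout_ms=30_000):
    """Navigate to `url`, wait for `state`, then optionally for `ready_selector`.

    `page` is assumed to be a playwright.sync_api.Page; every name defined
    here is illustrative.
    """
    if state not in LOAD_STATES:
        raise ValueError(f"unknown load state: {state!r}")
    # goto() accepts these state names through its wait_until argument.
    page.goto(url, wait_until=state, timeout=timeout_ms)
    if ready_selector is not None:
        # Wait for the element that indicates the data has been rendered.
        page.wait_for_selector(ready_selector, timeout=timeout_ms)
```

Pairing a cheap state such as "domcontentloaded" with one content selector is often quicker than "networkidle" on pages that keep polling in the background, though the right choice depends on the site.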
Powerful Selector Engine
Playwright supports CSS selectors, XPath, text selectors, and its own pseudo-class extensions such as :has-text() and :nth-match():
# CSS selector
page.query_selector("div.product > span.price")
# Text selector: case-insensitive substring match
# (quote the text, as in text="Add to Cart", for an exact match)
page.query_selector("text=Add to Cart")
# Text selector with a case-insensitive regular expression
page.query_selector("text=/free shipping/i")
# Pseudo-class: find an article whose text content contains "Python"
page.query_selector("article:has-text('Python')")
# XPath
page.query_selector("xpath=//table//tr[2]/td[1]")
# nth-match picks the third match (counting from 1) among multiple matches
page.query_selector(":nth-match(.item, 3)")
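Because every engine is addressed through the same selector string, selectors of different kinds can be tried interchangeably. As a rough sketch, the hypothetical helper `first_match` below walks a list of selectors and returns the first one that matches, which can help when a site's markup varies between pages. `page` is assumed to be a sync-API `Page`.

```python
def first_match(page, selectors):
    """Return (selector, element) for the first selector that matches, else (None, None).

    `page` is assumed to be a playwright.sync_api.Page; `first_match` is an
    illustrative name, not a Playwright API.
    """
    for selector in selectors:
        # query_selector returns None when nothing matches; it does not wait.
        element = page.query_selector(selector)
        if element is not None:
            return selector, element
    return None, None


# Assumed usage, mixing a CSS and an XPath selector from this section:
# selector, element = first_match(page, [
#     "div.product > span.price",
#     "xpath=//table//tr[2]/td[1]",
# ])
```

Since `query_selector` does not wait, a helper like this belongs after an explicit wait has confirmed the page content is present.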
Timeout Configuration
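Timeouts are given in milliseconds and default to 30 seconds. Set custom values to avoid hanging on slow pages:
# Per-action timeout (15 seconds)
page.wait_for_selector(".results", timeout=15000)
# Default timeout for all subsequent operations on this page (30 seconds)
page.set_default_timeout(30000)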
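When a wait does time out, Playwright raises an exception, and one common response is to retry the whole step with a more generous limit. The sketch below is an assumption-laden illustration: `retry_with_timeouts` and its parameters are hypothetical, and the helper is written generically so that the exception type to retry on is passed in by the caller.

```python
def retry_with_timeouts(attempt, timeouts_ms=(5_000, 15_000, 30_000), retry_on=(Exception,)):
    """Call `attempt(timeout_ms)` with each timeout in turn until one succeeds.

    Returns the result of the first successful call and re-raises the last
    error if every attempt fails. All names here are illustrative.
    """
    last_error = None
    for timeout_ms in timeouts_ms:
        try:
            return attempt(timeout_ms)
        except retry_on as error:
            # Remember the failure and move on to the next, longer timeout.
            last_error = error
    if last_error is None:
        raise ValueError("timeouts_ms must not be empty")
    raise last_error


# Assumed usage with Playwright's sync API (page is an existing Page):
# from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
#
# def load_quotes(timeout_ms):
#     page.goto("https://quotes.toscrape.com/js/", timeout=timeout_ms)
#     return page.wait_for_selector(".quote", timeout=timeout_ms)
#
# retry_with_timeouts(load_quotes, retry_on=(PlaywrightTimeoutError,))
```

Each attempt repeats the navigation, so a retry starts from a fresh page load instead of simply extending one long wait.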
Tip for Large-Scale Scraping
If you find yourself fighting with wait strategies because sites are slow or inconsistent, ScraperAPI handles rendering and waiting on their infrastructure, returning fully loaded HTML. This can simplify your scraping pipeline significantly.
Next Steps
- Handle popups and dialogs that interrupt scraping
- Learn to scrape infinite scroll pages
- Intercept network requests for direct data access