

Waiting Strategies (The Make-or-Break Skill)

Time-based sleeps are the #1 cause of flaky scrapers. Replace them with the five deterministic wait primitives Playwright provides.

What you’ll learn

  • List the five Playwright wait primitives and when each is appropriate.
  • Never write `time.sleep()` in a scraper again.
  • Synchronise on the right signal: DOM state, network response, or custom function.
  • Tune timeouts to fail fast and retry, instead of waiting 30 seconds for nothing.

If you only learn one skill from this sub-path, learn this: never use time.sleep() in a scraper. Sleeps are guesses, and guesses are the root cause of every flaky scraper in production. Playwright gives you five deterministic wait primitives; once you know which to reach for, your scrapers stop breaking on slow networks.

Why time.sleep is wrong

page.click("button.load-more")
time.sleep(2)  # hope it's loaded
data = page.locator(".item").all()

Three problems:

  1. Too short. On a slow network, 2 seconds isn't enough. You get partial data and don't know.
  2. Too long. On a fast network, you waste 1.9 seconds per click. Over thousands of pages, that's hours.
  3. Wrong signal. Time doesn't tell you whether the content arrived. You're synchronising on the wall clock, not on the actual event.
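
The same interaction, rewritten without the sleep. A sketch only; the selectors and the target count are illustrative:

```python
def load_more_and_collect(page, expected_count=20):
    """Click 'load more', then synchronise on the DOM instead of the clock."""
    page.click("button.load-more")
    # Returns the moment the nth item is attached -- no guessed sleep,
    # no wasted time on fast networks, no partial data on slow ones.
    page.wait_for_selector(f".item:nth-of-type({expected_count})")
    return page.locator(".item").all()
```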

Every Playwright wait method waits for something specific and returns as soon as it happens. That's the whole game.

The five wait primitives

| Primitive | Synchronises on | Use when |
| --- | --- | --- |
| Auto-wait inside actions | Element actionability | Almost always; built into `click`, `fill`, etc. |
| `wait_for_selector` | DOM state of a selector | Element should appear/disappear/change state |
| `wait_for_load_state` | Page lifecycle event | Cross-cutting page-level event |
| `expect_response` / `expect_request` | Network event matching a URL | Data arrives via XHR/Fetch |
| `wait_for_function` | Arbitrary JS predicate | Custom condition: count of items, value of a variable |

Together they cover every wait you'll need.

wait_for_selector

The most common after auto-wait. Wait for an element to reach a specific state:

page.wait_for_selector(".product-card", state="visible", timeout=10000)
page.wait_for_selector(".spinner", state="hidden")  # wait until spinner is gone
page.wait_for_selector(".error-banner", state="attached")  # exists in DOM, may not be visible
page.wait_for_selector(".tooltip", state="detached")  # removed from DOM

States:

  • attached, in the DOM.
  • detached, not in the DOM.
  • visible, in the DOM and visible (the default).
  • hidden, either not in the DOM, or in the DOM but not visible.

state="hidden" is gold for spinner-watching: wait until the loader is gone, then read.
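
A spinner-watching helper, sketched under the assumption that the page shows a `.spinner` element while loading (both selector names are illustrative):

```python
def read_after_spinner(page, spinner=".spinner", content=".product-card"):
    # Wait for the loader to disappear, then for the content to be visible,
    # then read -- two cheap DOM waits instead of one guessed sleep.
    page.wait_for_selector(spinner, state="hidden")
    page.wait_for_selector(content, state="visible")
    return page.locator(content).all_inner_texts()
```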

wait_for_load_state

Page-lifecycle synchronisation:

page.wait_for_load_state("domcontentloaded")
page.wait_for_load_state("load")
page.wait_for_load_state("networkidle")

  • domcontentloaded, HTML parsed; subresources may still be loading.
  • load, load event fired; main subresources loaded.
  • networkidle, no network activity for at least 500 ms.

networkidle is convenient but flaky on sites with analytics beacons, long polling, WebSockets, or other live-update streams: the network may never go idle. Prefer domcontentloaded plus a specific selector wait.
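
That preferred pattern, as a sketch (the ready selector is whatever element proves your data actually arrived; `.product-card` is illustrative):

```python
def goto_and_settle(page, url, ready_selector=".product-card"):
    # "domcontentloaded" fires as soon as the HTML is parsed; the selector
    # wait then pins the one element that proves the data arrived.
    page.goto(url, wait_until="domcontentloaded")
    page.wait_for_selector(ready_selector, state="visible")
```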

expect_response

The most powerful wait when data arrives via XHR/Fetch:

with page.expect_response("**/api/products*") as resp_info:
  page.click("text=Load more")
response = resp_info.value
data = response.json()

You declare what you're waiting for (a response URL pattern) and Playwright captures it. The with block runs your trigger action; the response is available after exit.

Variants:

# Match by predicate function
with page.expect_response(lambda r: r.url.endswith("/products") and r.status == 200):
  page.click("text=Load more")

# Wait for a request even if no response yet
with page.expect_request("**/api/checkout"):
  page.click("text=Buy")

Use this whenever a UI action triggers a known API call. It's deterministic: the wait returns the instant the response arrives, with the response payload in hand.
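
Put together, a pagination loop driven by expect_response might look like this sketch. The `/api/products` URL and the `products`/`has_next` response shape are assumptions about the target site:

```python
def scrape_all_pages(page, max_pages=50):
    items = []
    for _ in range(max_pages):
        # Declare the wait first, then trigger it -- the response can't be missed.
        with page.expect_response(lambda r: "/api/products" in r.url) as resp_info:
            page.click("text=Load more")
        payload = resp_info.value.json()
        items.extend(payload["products"])
        if not payload.get("has_next"):
            break
    return items
```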

wait_for_function

Custom predicates that run in the browser:

page.wait_for_function("() => document.querySelectorAll('.product-card').length >= 24")
page.wait_for_function("() => window.__APP_READY__ === true")
page.wait_for_function("(target) => document.title === target", arg="Catalog108 – Products")

wait_for_function polls the predicate inside the browser context. Returns when truthy. Useful for:

  • Waiting on a global state flag the app sets.
  • Waiting for N items to load (not just one).
  • Waiting on conditions that don't map cleanly to a single selector.

Race patterns

Sometimes you don't know which signal will fire first: success or error.

import asyncio

async def main():
    page = ...
    await page.click("button.submit")
    # ".success, .error" matches whichever element appears first
    result = await page.wait_for_selector(".success, .error")
    if "success" in (await result.get_attribute("class") or ""):
        print("ok")
    else:
        print("failed")

A comma in a CSS selector is an OR: wait_for_selector returns when either element appears. Cheaper than racing two wait_for calls in parallel.
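
If you prefer locators over raw selectors, Playwright's Locator.or_() expresses the same race. A sketch; the selectors are illustrative:

```python
def wait_for_outcome(page, timeout=8000):
    # or_() builds a locator matching whichever of the two appears first.
    outcome = page.locator(".success").or_(page.locator(".error"))
    outcome.wait_for(state="visible", timeout=timeout)
    return "ok" if page.locator(".success").is_visible() else "failed"
```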

Tuning timeouts: fail fast

Default timeout is 30 seconds. For most scraping, that's too long:

context.set_default_timeout(8000)  # 8s for everything
context.set_default_navigation_timeout(15000)

A 30-second wait usually means the scrape is broken; slow networks rarely take that long. Set 5–10s, fail fast, retry the whole page. Better than hanging on a 28-second timeout.
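
A minimal retry wrapper around that idea. In real code you'd catch playwright.sync_api.TimeoutError specifically; a plain Exception is used here to keep the sketch dependency-free:

```python
def scrape_with_retries(scrape_once, attempts=3):
    """Fail fast and retry the whole page: 3 x 8s beats one 28s hang."""
    last_error = None
    for _ in range(attempts):
        try:
            return scrape_once()
        except Exception as exc:  # real code: playwright's TimeoutError
            last_error = exc
    raise last_error
```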

Per-call overrides for legitimate slow cases:

page.wait_for_selector(".big-export", timeout=60000)  # this one is genuinely slow

The pattern: act → wait → assert

Every interaction follows this shape:

page.click("button.load-more")
page.wait_for_selector(".product-card:nth-of-type(48)")
count = page.locator(".product-card").count()
assert count == 48

  1. Act. Click, fill, navigate.
  2. Wait. Synchronise on the specific change you expect.
  3. Assert. Verify the change actually happened.

The assert is non-optional. Otherwise a silent failure (wrong selector, network timeout swallowed somewhere) goes unnoticed and your scrape ships bad data.

A small but important warning

page.wait_for_timeout(ms) does exist in the API. It is a literal setTimeout. Don't use it. It's there for debugging only. If you find yourself reaching for it, ask which actual signal you should be waiting for instead.

Hands-on lab

Open /challenges/dynamic/auto-typed/animated. The page types text character-by-character via JS. Write three versions of a scraper that captures the final text: (1) with time.sleep(5), (2) with wait_for_function checking the text length, (3) with wait_for_selector watching for a "done" indicator. Time all three. The function-based version should win on both speed and reliability.

Practice this lesson on Catalog108, our first-party scraping sandbox: /challenges/dynamic/auto-typed/animated

Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.

Question 1 of 8

What is the main reason `time.sleep(2)` should NEVER appear in a production scraper?
