2.20 · intermediate · 5 min read

Modals, Popups, Cookie Banners, Auto-Dismissing

Every modern site throws three to five overlays at your scraper before you reach the content. Recognise them, dismiss them, ignore them, without breaking the scrape.

What you’ll learn

  • Categorise overlays: cookie banner, marketing popup, login wall, newsletter sign-up, geolocation prompt.
  • Auto-dismiss using best-effort handlers that don't fail if the modal is absent.
  • Choose between dismissing once vs. blocking the modal's render via CSS / JS.
  • Bypass overlays entirely by setting the cookie they're trying to plant.

The first thing a real user does on most sites is dismiss two or three overlays. Your scraper has to do the same, but unlike a user, it has to handle modals that may or may not appear, may appear after a delay, and may block clicks until dismissed. This lesson is the playbook.

The four overlay categories

Type | Trigger | Usual dismiss
Cookie/GDPR banner | Page load | "Accept" button, or sometimes "Decline"
Marketing popup | After N seconds or scroll % | Close × or "No thanks"
Login wall | After M page views, or content gated | Cannot be dismissed without auth; handle differently
Newsletter / app prompt | On exit intent or first visit | Close ×

Each has a different lifecycle, but the patterns for handling them are similar.

The best-effort handler pattern

The mistake most scrapers make: writing code that fails when the modal isn't there.

# Bad: waits out the default 30 s timeout, then raises if there is no banner today
page.locator("button.accept-cookies").click()

The right pattern is to dismiss if present, ignore if not:

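def dismiss_if_present(page, selector, timeout=2000):
    """Click the first match of `selector`; do nothing if it never appears."""
    try:
        # timeout is in milliseconds; the click waits at most this long.
        page.locator(selector).first.click(timeout=timeout)
    except Exception:
        # No such overlay on this visit (or it was not clickable): carry on.
        pass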

dismiss_if_present(page, "button.accept-cookies")
dismiss_if_present(page, ".marketing-modal .close")
dismiss_if_present(page, ".newsletter-popup button[aria-label='Close']")

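The pattern is a short timeout (2-5 seconds) plus a swallowed exception, so the scraper continues whether or not the modal showed up. The trade-off is that every absent overlay costs its full timeout, so keep the selector list short.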
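Swallowing the exception also hides which overlays actually showed up. A minimal sketch of a variant that reports the outcome, assuming the same Playwright sync API as above; the name `try_dismiss` and the logger name are illustrative, not part of Playwright:

```python
import logging

log = logging.getLogger("overlays")  # logger name is an arbitrary choice


def try_dismiss(page, selector, timeout=2000):
    """Click the first match of `selector` if it appears within `timeout` ms.

    Returns True when the click went through and False otherwise, so the
    caller can record which overlays a given visit actually produced.
    `page` is expected to be a Playwright Page (sync API).
    """
    try:
        page.locator(selector).first.click(timeout=timeout)
    except Exception as exc:  # typically Playwright's TimeoutError
        log.debug("overlay %r not dismissed: %s", selector, exc)
        return False
    log.info("dismissed overlay %r", selector)
    return True
```

Collecting the selectors for which `try_dismiss` returned True gives a per-page record of what appeared, which helps when a site changes its overlays and the scrape starts slowing down.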

Pre-emptive dismissal: set the cookie

For cookie banners specifically, there is a better approach: skip the dialog entirely by setting the cookie the dialog plants. View the cookies in DevTools after clicking "Accept" and you'll see something like cookieConsent=1 or gdpr_accepted=true. Set it before navigation:

context = browser.new_context()
context.add_cookies([{
  "name": "cookieConsent",
  "value": "1",
  "domain": "practice.scrapingcentral.com",
  "path": "/",
}])
page = context.new_page()
page.goto("https://practice.scrapingcentral.com/challenges/dynamic/modals/cookie-banner")
# Cookie banner never appears; content renders directly.

The same trick works for newsletter dismissal cookies, "I'm 18+" age gates, and geo-acknowledgement banners. Inspect what the dismiss click sets, then set it directly.
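The inspection step can also be scripted instead of done by hand in DevTools. A sketch, assuming the site stores consent in a cookie (some use localStorage instead, which this does not cover): snapshot `context.cookies()` before and after the dismiss click and keep what changed. `cookie_diff` and `as_presets` are hypothetical helper names; only `context.cookies()` and `context.add_cookies()` are Playwright's.

```python
def cookie_diff(before, after):
    """Return cookies in `after` that are new or changed relative to `before`.

    Both arguments are lists of cookie dicts in the shape returned by
    Playwright's context.cookies(): each has at least name, value, domain, path.
    """
    old = {(c["name"], c["domain"], c["path"]): c["value"] for c in before}
    return [
        c for c in after
        if old.get((c["name"], c["domain"], c["path"])) != c["value"]
    ]


def as_presets(cookies):
    """Keep only the fields needed to replay the cookies via add_cookies()."""
    return [
        {"name": c["name"], "value": c["value"],
         "domain": c["domain"], "path": c["path"]}
        for c in cookies
    ]


# Discovery run (once):
#   before = context.cookies()
#   page.locator("button.accept-cookies").click()
#   presets = as_presets(cookie_diff(before, context.cookies()))
# Later runs:
#   context.add_cookies(presets)  # before page.goto(...)
```

Dropping the expiry turns the replayed cookies into session cookies, which is usually enough for a scrape that creates a fresh context per run.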

Blocking the modal's render

When you can't pre-set a cookie, the next-best option is to prevent the modal from ever showing. Two approaches:

1. CSS injection.

page.add_style_tag(content="""
  .marketing-modal,
  .newsletter-popup,
  [class*='cookie-banner'] {
    display: none !important;
  }
""")

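The modal still renders into the DOM, but it is invisible and doesn't intercept clicks. add_style_tag only affects the document that is currently loaded, so call it after page.goto and again after each navigation.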

2. JS removal.

page.add_init_script("""
  // document.body does not exist yet when init scripts run, so observe the document itself.
  new MutationObserver(() => {
    document.querySelectorAll('.marketing-modal, .newsletter-popup').forEach(el => el.remove());
  }).observe(document, { childList: true, subtree: true });
""")

add_init_script runs before any page script, so register it before page.goto. The observer removes the modal the instant it appears. This is more invasive but more reliable.
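Both approaches take the same input, a list of overlay selectors, so the CSS and the script can be generated from one list rather than hand-written twice. A small sketch; `hide_css` and `removal_script` are hypothetical helpers, the selector list is assumed to be non-empty, and the generated script observes `document` because `document.body` is not available yet when init scripts run.

```python
import json


def hide_css(selectors):
    """Build a stylesheet that hides every element matching `selectors`."""
    return ",\n".join(selectors) + " {\n  display: none !important;\n}\n"


def removal_script(selectors):
    """Build an init script that deletes matching elements as they are added.

    json.dumps turns the combined selector into a quoted JavaScript string
    literal, so quotes inside attribute selectors need no manual escaping.
    """
    combined = json.dumps(", ".join(selectors))
    return (
        "new MutationObserver(() => {\n"
        f"  document.querySelectorAll({combined}).forEach(el => el.remove());\n"
        "}).observe(document, { childList: true, subtree: true });\n"
    )


OVERLAYS = [".marketing-modal", ".newsletter-popup", "[class*='cookie-banner']"]
# page.add_init_script(removal_script(OVERLAYS))   # before page.goto(...)
# page.add_style_tag(content=hide_css(OVERLAYS))   # after page.goto(...)
```

Keeping the selectors in one place means a site-specific overlay only has to be added once to be covered by both mechanisms.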

Handling the login wall

Login walls are different: you can't just dismiss them. Three strategies:

  1. Authenticate. Lesson 2.25 covers persistent contexts and stored sessions.
  2. Find the API. The login wall protects the UI; sometimes the underlying API is less protected. Check Network for unauthenticated XHRs.
  3. Use the Google-cache trick or archive.org. Sometimes the content is mirrored elsewhere without the wall. Both are of declining value as access has tightened.

The first option is the right one for production. The others are workarounds.
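For the second strategy, the Network check can be automated by listening for responses while the gated page loads. A rough sketch, assuming the Playwright sync API; the function names and the JSON-only filter are illustrative choices, and a 200 here only means the endpoint answered in this session, not that it is open to every client.

```python
def looks_like_open_api(response):
    """True for a successful XHR/fetch response that carries JSON.

    `response` is expected to be a Playwright Response: .status, .url,
    .headers (dict with lower-cased names) and .request.resource_type.
    """
    if response.request.resource_type not in ("xhr", "fetch"):
        return False
    if response.status != 200:
        return False
    return "json" in response.headers.get("content-type", "")


def record_api_candidates(page):
    """Attach a listener to `page`; return the list it appends URLs to."""
    found = []

    def on_response(response):
        if looks_like_open_api(response):
            found.append(response.url)

    page.on("response", on_response)
    return found


# candidates = record_api_candidates(page)   # in a context with no login cookies
# page.goto(url)
# print(candidates)
```

Run it in a fresh context so that any endpoint it lists was reached without stored credentials.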

Order matters

Some sites cascade: the cookie banner blocks clicks on the marketing popup, which blocks clicks on the content. Dismiss them top-to-bottom in render order:

page.goto(url)
dismiss_if_present(page, "button.accept-cookies", 3000)
dismiss_if_present(page, ".marketing-modal .close")
dismiss_if_present(page, ".newsletter-popup .close")
# Now the content is reachable

If your scraper is timing out on a click and the screenshot shows an overlay on top, this is almost certainly the cause. Dismiss the overlay first.
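That diagnosis can be folded into the click itself. A sketch of one way to do it, assuming the Playwright sync API; `click_through_overlays` is a hypothetical helper: try the real click first, and only if it fails clear the overlays top-to-bottom and retry once.

```python
def click_through_overlays(page, target, overlay_selectors, timeout=3000):
    """Click `target`; if that fails, clear overlays in order and retry once.

    `overlay_selectors` should be listed top-most first, matching the order
    in which the overlays block each other. Timeouts are in milliseconds.
    """
    try:
        page.locator(target).first.click(timeout=timeout)
        return
    except Exception:
        pass  # possibly covered by an overlay; fall through to cleanup

    for selector in overlay_selectors:
        try:
            page.locator(selector).first.click(timeout=timeout)
        except Exception:
            continue  # this overlay is not on the page

    # Second attempt: any failure now is a real error and should propagate.
    page.locator(target).first.click(timeout=timeout)
```

On pages without overlays this costs nothing extra, since the cleanup loop only runs after the first click has already failed.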

Dialog vs modal vs overlay vs toast

The terminology shifts. For Playwright purposes:

What you see | What Playwright treats it as
Browser-native alert(), confirm(), prompt() | A Dialog; handle via page.on("dialog", ...)
In-page React/Vue modal | A regular DOM element; query and click
Cookie banner anchored to the bottom | DOM element
Toast notification (briefly visible) | DOM element; usually self-dismisses

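Browser-native dialogs are special: they pause page execution until handled. Playwright dismisses them automatically when no listener is registered; to accept one or read its message, register a handler: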

page.on("dialog", lambda d: d.accept())  # or d.dismiss()

You must register the listener before the action that triggers the dialog. Once accepted/dismissed, the page continues.
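A one-line lambda discards the dialog's text, which is often the only clue to why it appeared. A slightly fuller sketch, assuming the Playwright sync API; `make_dialog_handler` is a hypothetical name, while `type`, `message`, `accept()` and `dismiss()` are members of Playwright's Dialog object.

```python
def make_dialog_handler(seen, accept=True, prompt_text=""):
    """Return a handler for page.on("dialog", ...) that logs and resolves dialogs.

    Each dialog's (type, message) pair is appended to `seen`. Dialogs are then
    accepted or dismissed; for prompt() dialogs, `prompt_text` is submitted.
    """
    def handle(dialog):
        seen.append((dialog.type, dialog.message))
        if not accept:
            dialog.dismiss()
        elif dialog.type == "prompt":
            dialog.accept(prompt_text)
        else:
            dialog.accept()
    return handle


# dialogs = []
# page.on("dialog", make_dialog_handler(dialogs))   # register first
# page.locator("#delete").click()                    # hypothetical trigger
# print(dialogs)
```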

A reusable helper

class OverlayDismisser:
    """Dismiss common overlays best-effort."""

    DEFAULT_SELECTORS = [
        "button:has-text('Accept all')",
        "button:has-text('Accept')",
        "[class*='cookie'] button[class*='accept']",
        "[class*='cookie'] button[aria-label*='Accept']",
        ".marketing-modal .close",
        ".newsletter-popup button[aria-label='Close']",
        "[class*='popup'] button[aria-label='Close']",
    ]

    def __init__(self, page, extra=None, timeout=2000):
        self.page = page
        self.selectors = list(self.DEFAULT_SELECTORS) + (extra or [])
        self.timeout = timeout

    def run(self):
        for sel in self.selectors:
            try:
                self.page.locator(sel).first.click(timeout=self.timeout)
            except Exception:
                continue

OverlayDismisser(page).run()

Plug it in after every page.goto. The defaults cover ~70% of common overlays; add site-specific selectors via the extra argument.
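With seven default selectors and a 2-second timeout, a page with no overlays at all spends about 14 seconds in run() (7 × 2 s), because each click waits out its full timeout. One possible variation, sketched under the assumption that the overlays have already rendered by the time it runs: check visibility first, which returns immediately, and click only what is on screen. `dismiss_visible` is a hypothetical name; overlays that appear after a delay are missed, so this complements the timeout-based pass rather than replacing it.

```python
def dismiss_visible(page, selectors, click_timeout=1000):
    """Click each selector whose first match is visible right now.

    Locator.is_visible() does not wait, so absent overlays cost almost
    nothing. Returns the selectors that were clicked.
    """
    clicked = []
    for selector in selectors:
        target = page.locator(selector).first
        try:
            if target.is_visible():
                target.click(timeout=click_timeout)
                clicked.append(selector)
        except Exception:
            continue  # overlay vanished or was covered; skip it
    return clicked
```

It can be called with the same list, for example dismiss_visible(page, OverlayDismisser.DEFAULT_SELECTORS), once the page has settled.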

Hands-on lab

Open /challenges/dynamic/modals/cookie-banner. Write a scraper that: (1) navigates to the page, (2) handles the cookie banner via the best-effort pattern, (3) reads the underlying content. Then look at what cookie the banner sets when accepted, and rewrite the scraper to pre-set that cookie before goto so that the banner never appears. Verify both approaches produce the same content.

Practice this lesson on Catalog108, our first-party scraping sandbox.

