Modals, Popups, Cookie Banners, Auto-Dismissing
Every modern site throws three to five overlays at your scraper before you reach the content. Recognise them, dismiss them, ignore them, without breaking the scrape.
What you’ll learn
- Categorise overlays: cookie banner, marketing popup, login wall, newsletter sign-up, geolocation prompt.
- Auto-dismiss using best-effort handlers that don't fail if the modal is absent.
- Choose between dismissing an overlay once and blocking its render via CSS or JS.
- Bypass overlays entirely by setting the cookie they're trying to plant.
The first thing a real user does on most sites is dismiss two or three overlays. Your scraper has to do the same, but unlike a user, it has to handle modals that may or may not appear, may appear after a delay, and may block clicks until dismissed. This lesson is the playbook.
The four overlay categories
| Type | Trigger | Usual dismiss |
|---|---|---|
| Cookie/GDPR banner | Page load | "Accept" button or sometimes "Decline" |
| Marketing popup | After N seconds or scroll % | Close × or "No thanks" |
| Login wall | After M page views, or content gated | Cannot be dismissed without auth; handle differently |
| Newsletter / app prompt | On exit intent or first visit | Close × |
Each has a different lifecycle, but the patterns for handling them are similar.
The best-effort handler pattern
The mistake most scrapers make: writing code that fails when the modal isn't there.
# Bad: errors out if no banner today
page.locator("button.accept-cookies").click()
The right pattern is to dismiss if present, ignore if not:
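def dismiss_if_present(page, selector, timeout=2000):
    """Click the first match if it shows up within `timeout` ms; otherwise move on."""
    try:
        page.locator(selector).first.click(timeout=timeout)
    except Exception:
        # Modal absent (or not clickable in time): carry on with the scrape.
        pass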
dismiss_if_present(page, "button.accept-cookies")
dismiss_if_present(page, ".marketing-modal .close")
dismiss_if_present(page, ".newsletter-popup button[aria-label='Close']")
The handler combines a short timeout (2-5 seconds) with a swallowed exception, so the scraper continues whether or not the modal showed up. Keep the timeout short: every overlay that is absent costs the full wait.
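One drawback of swallowing every exception is that a banner which is present but unclickable looks the same as a banner that never appeared. The following is a minimal sketch of a stricter variant, assuming a Playwright sync-API page; the name `dismiss_if_visible` and its boolean return value are illustrative and not part of the lesson's own helper. It uses `Locator.wait_for` to separate "never appeared" from "appeared but could not be clicked".

```python
def dismiss_if_visible(page, selector, timeout=2000):
    """Return True if the overlay appeared and was clicked, False if it never showed.

    Hypothetical helper: `page` is assumed to be a Playwright sync-API Page.
    """
    target = page.locator(selector).first
    try:
        # Wait only for visibility; a timeout here means "no overlay today".
        target.wait_for(state="visible", timeout=timeout)
    except Exception:
        return False
    # The overlay is on screen, so a failed click is a real problem: let it raise.
    target.click(timeout=timeout)
    return True
```

The return value also makes it easy to log which overlays actually showed up on a given run, which helps when a site changes its banner markup.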
Pre-emptive dismissal: set the cookie
For cookie banners specifically, there is a better approach: skip the dialog entirely by setting the cookie the dialog would plant. Click "Accept" once by hand, then view the cookies in DevTools; you'll see something like cookieConsent=1 or gdpr_accepted=true. Set it before navigation:
context = browser.new_context()
context.add_cookies([{
    "name": "cookieConsent",
    "value": "1",
    "domain": "practice.scrapingcentral.com",
    "path": "/",
}])
page = context.new_page()
page.goto("https://practice.scrapingcentral.com/challenges/dynamic/modals/cookie-banner")
# Cookie banner never appears; content renders directly.
Same trick for newsletter dismissal cookies, "I'm 18+" age gates, geo-acknowledgement banners. Inspect what the dismiss-click sets, then set it directly.
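When several dismissal cookies need to be set for the same site, building the dicts by hand gets repetitive. The sketch below is one way to generate them from the target URL. The function name `consent_cookies` is hypothetical, and the cookie names passed in are whatever you observed in DevTools, not values any particular site is known to use.

```python
from urllib.parse import urlsplit


def consent_cookies(url, values):
    """Build Playwright-style cookie dicts scoped to the host of `url`.

    `values` maps cookie names to values, e.g. {"cookieConsent": "1"}.
    """
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"cannot derive a cookie domain from {url!r}")
    return [
        {"name": name, "value": str(value), "domain": host, "path": "/"}
        for name, value in values.items()
    ]


# Usage (cookie names are examples; check what the real site sets):
# context.add_cookies(consent_cookies(url, {"cookieConsent": "1"}))
```

Deriving the domain from the URL keeps the cookie and the navigation target in sync, which avoids the common mistake of setting a cookie for a host the page never sends it to.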
Blocking the modal's render
When you can't pre-set a cookie, the next-best option is to prevent the modal from ever showing. Two approaches:
1. CSS injection.
# Inject after page.goto(): the style tag lives in the current document only.
page.add_style_tag(content="""
.marketing-modal,
.newsletter-popup,
[class*='cookie-banner'] {
    display: none !important;
}
""")
The modal still renders into the DOM, but is invisible and doesn't intercept clicks.
2. JS removal.
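# Register before page.goto() so the script runs in every new document.
page.add_init_script("""
// document.body does not exist yet when init scripts run, so observe the document itself.
new MutationObserver(() => {
    document.querySelectorAll('.marketing-modal, .newsletter-popup').forEach(el => el.remove());
}).observe(document, { childList: true, subtree: true });
""")
add_init_script runs in every new document before any page script, so register it before page.goto. The observer removes the modal the instant it appears. More invasive but more reliable.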
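Both approaches need the same list of overlay selectors, so it can help to generate the CSS and the script from one source. This is a minimal sketch under that assumption: the helper names `hide_css` and `removal_script` and the `OVERLAYS` list are illustrative, and the selectors are the example class names used in this lesson rather than anything universal.

```python
import json


def hide_css(selectors):
    """Return a CSS rule that hides every selector in the list."""
    return ",\n".join(selectors) + " {\n    display: none !important;\n}\n"


def removal_script(selectors):
    """Return an init script that removes matching nodes whenever the DOM changes."""
    # json.dumps yields a safely quoted JavaScript string literal.
    joined = json.dumps(", ".join(selectors))
    return (
        "new MutationObserver(() => {\n"
        f"    document.querySelectorAll({joined}).forEach(el => el.remove());\n"
        "}).observe(document, { childList: true, subtree: true });\n"
    )


OVERLAYS = [".marketing-modal", ".newsletter-popup", "[class*='cookie-banner']"]

# page.add_init_script(removal_script(OVERLAYS))   # before page.goto()
# page.add_style_tag(content=hide_css(OVERLAYS))   # after page.goto()
```

Keeping the selectors in one list means a site-specific addition only has to be made once, whichever blocking approach you end up using.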
Handling the login wall
Login walls are different: you can't just dismiss them. Three strategies:
- Authenticate. Lesson 2.25 covers persistent contexts and stored sessions.
- Find the API. The login wall protects the UI; sometimes the underlying API is less protected. Check Network for unauthenticated XHRs.
- Use the Google-cache trick or archive.org. Sometimes the content is mirrored elsewhere without the wall. This is of declining value, as both have tightened access.
The first option is the right one for production. The others are workarounds.
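For the "find the API" strategy, the Network tab check can also be done from the scraper itself. The sketch below assumes a Playwright sync-API page and records every XHR or fetch response whose content type mentions JSON; the function name `record_json_responses` and the `sink` list are hypothetical. Whether any of the recorded endpoints work without authentication still has to be verified by hand.

```python
def record_json_responses(page, sink):
    """Append (status, url) to `sink` for each XHR/fetch response that returns JSON."""

    def on_response(response):
        kind = response.request.resource_type
        # Playwright exposes response header names in lower case.
        content_type = response.headers.get("content-type", "")
        if kind in ("xhr", "fetch") and "json" in content_type:
            sink.append((response.status, response.url))

    page.on("response", on_response)
    return on_response


# Usage: register before navigating, then inspect `found` afterwards.
# found = []
# record_json_responses(page, found)
# page.goto(url)
```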
Order matters
Some sites cascade: the cookie banner blocks clicks on the marketing popup, which blocks clicks on the content. Dismiss them top-to-bottom in render order:
page.goto(url)
dismiss_if_present(page, "button.accept-cookies", 3000)
dismiss_if_present(page, ".marketing-modal .close")
dismiss_if_present(page, ".newsletter-popup .close")
# Now the content is reachable
If your scraper is timing out on a click and the screenshot shows an overlay on top, this is almost certainly the cause. Dismiss the overlay first.
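When a screenshot is not at hand, the same diagnosis can be made programmatically by asking the browser which element sits on top of the target. This is a rough sketch, assuming a Playwright sync-API page and a target that is already inside the viewport; the helper name `topmost_element_at` is hypothetical. It relies on the standard `document.elementFromPoint` DOM call.

```python
def topmost_element_at(page, selector):
    """Return the start of the outerHTML of whatever covers the centre of `selector`.

    Returns None if the target has no bounding box (for example, it is not rendered).
    """
    box = page.locator(selector).first.bounding_box()
    if box is None:
        return None
    # Centre of the target, in viewport coordinates.
    point = {"x": box["x"] + box["width"] / 2, "y": box["y"] + box["height"] / 2}
    return page.evaluate(
        """({x, y}) => {
            const el = document.elementFromPoint(x, y);
            return el ? el.outerHTML.slice(0, 200) : null;
        }""",
        point,
    )
```

If the returned markup belongs to a banner or backdrop rather than to the element you meant to click, that overlay is the one to dismiss first.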
Dialog vs modal vs overlay vs toast
The terminology shifts. For Playwright purposes:
| What you see | What Playwright treats it as |
|---|---|
| Browser-native alert(), confirm(), prompt() | A Dialog, handle via page.on('dialog', ...) |
| In-page React/Vue modal | A regular DOM element, query and click |
| Cookie banner anchored to the bottom | DOM element |
| Toast notification (briefly visible) | DOM element, usually self-dismisses |
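Browser-native dialogs are special: they pause page execution until handled. Playwright auto-dismisses them when no listener is registered, so accepting one (or reading its message) requires a handler: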
page.on("dialog", lambda d: d.accept()) # or d.dismiss()
You must register the listener before the action that triggers the dialog. Once accepted/dismissed, the page continues.
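A one-line lambda is enough to get past a dialog, but it discards the message, which is often the only clue as to why the dialog fired. The sketch below is one possible handler that records each dialog before responding. The factory name `make_dialog_handler` and the `accept_confirms` flag are illustrative; `dialog.type`, `dialog.message`, `accept()` and `dismiss()` are the Playwright Dialog members it assumes.

```python
def make_dialog_handler(seen, accept_confirms=True):
    """Return a handler that logs (type, message) to `seen`, then answers the dialog."""

    def handle(dialog):
        seen.append((dialog.type, dialog.message))
        if dialog.type == "confirm" and not accept_confirms:
            # Answer "Cancel" to confirm() prompts when asked to.
            dialog.dismiss()
        else:
            dialog.accept()

    return handle


# Usage: register before the action that triggers the dialog.
# seen = []
# page.on("dialog", make_dialog_handler(seen))
```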
A reusable helper
class OverlayDismisser:
    """Dismiss common overlays best-effort."""

    DEFAULT_SELECTORS = [
        "button:has-text('Accept all')",
        "button:has-text('Accept')",
        "[class*='cookie'] button[class*='accept']",
        "[class*='cookie'] button[aria-label*='Accept']",
        ".marketing-modal .close",
        ".newsletter-popup button[aria-label='Close']",
        "[class*='popup'] button[aria-label='Close']",
    ]

    def __init__(self, page, extra=None, timeout=2000):
        self.page = page
        self.selectors = list(self.DEFAULT_SELECTORS) + (extra or [])
        self.timeout = timeout

    def run(self):
        for sel in self.selectors:
            try:
                self.page.locator(sel).first.click(timeout=self.timeout)
            except Exception:
                continue
OverlayDismisser(page).run()
Plug it in after every page.goto. The defaults cover ~70% of common overlays; add site-specific selectors via the extra argument.
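One cost of the helper above is that every selector that matches nothing waits out the full timeout, so the seven default selectors at 2000 ms each can add roughly 14 seconds to a page that shows no overlays at all. A possible alternative, sketched here under the assumption of a Playwright sync-API page, is to wait once for overlays to settle and then click only what is actually visible. The function name `dismiss_visible` and its parameters are hypothetical.

```python
def dismiss_visible(page, selectors, settle_ms=1500, click_timeout=1000):
    """Wait once, then click each selector that is currently visible.

    Returns the list of selectors that were clicked.
    """
    # Single up-front pause instead of one timeout per selector.
    page.wait_for_timeout(settle_ms)
    dismissed = []
    for sel in selectors:
        target = page.locator(sel).first
        try:
            # is_visible() answers immediately; it does not wait for the element.
            if target.is_visible():
                target.click(timeout=click_timeout)
                dismissed.append(sel)
        except Exception:
            continue
    return dismissed
```

The trade-off is that an overlay appearing after the settle period is missed, so this suits overlays triggered on page load better than ones triggered by scroll or a long delay.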
Hands-on lab
Open /challenges/dynamic/modals/cookie-banner. Write a scraper that: (1) navigates to the page, (2) handles the cookie banner via the best-effort pattern, (3) reads the underlying content. Then look at what cookie the banner sets when accepted, and rewrite the scraper to pre-set that cookie before goto so that the banner never appears. Verify both approaches produce the same content.
Practice this lesson on Catalog108, our first-party scraping sandbox.