iframes and Shadow DOM: Piercing Nested Contexts
Two ways content can hide from a flat document.querySelectorAll. Pierce them correctly and you can scrape anything; pierce them wrong and you'll wonder why your selectors return nothing.
What you’ll learn
- Distinguish iframes from shadow DOM and recognise each in the page.
- Switch into an iframe with `page.frame_locator()` and back.
- Pierce open shadow DOM via Playwright's auto-piercing locators.
- Handle closed shadow DOM, and accept when content is genuinely unreachable.
Most "I can't find this element" bugs come down to the element living inside a context your selectors don't reach. Two contexts cause this: iframes and shadow DOM. They look identical in the rendered page but require entirely different handling. This lesson covers both.
The difference at a glance
| | iframe | Shadow DOM |
|---|---|---|
| What it is | An embedded document with its own DOM | A nested DOM attached to a host element |
| In source | `<iframe src="...">` | `<custom-element>` with internal `#shadow-root` |
| Cross-origin? | Often, especially for ads/widgets | Always same-origin |
| Reach from parent | Must "switch into" the frame | Open: pierce automatically; Closed: can't reach |
| DevTools breadcrumb | "frame > html > ..." | "host > #shadow-root > ..." |
An iframe is a full document embedded with an <iframe> tag. Shadow DOM is a scoping mechanism for web components. Both keep their internals out of reach of a plain document.querySelectorAll run against the parent page, which is why a naive scraper misses them.
Recognising the patterns
iframe in the source:
<iframe src="https://embedded.example.com/widget" id="checkout-frame"></iframe>
The content inside is fetched as a separate document. Inspect it in DevTools by clicking into the iframe's content area; you'll see "(iframe)" badges.
Shadow DOM as shown in the DevTools Elements panel (the #shadow-root line itself does not appear in the raw HTML):
<custom-product-card>
#shadow-root (open)
<div class="product-name">Yellow Mug</div>
<div class="price">$12</div>
</custom-product-card>
The #shadow-root line is the boundary. Below it, selectors from outside don't reach unless you explicitly pierce. The (open) vs (closed) annotation matters a lot.
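One way to check which of the two contexts a page contains is to ask the browser directly. The sketch below assumes Playwright's sync API; `survey_contexts` and `SURVEY_JS` are hypothetical names, and the scan covers only the top-level document (it does not recurse into frames or nested shadow roots).

```python
# JavaScript evaluated in the page: count <iframe> elements and list the tag
# names of hosts with an OPEN shadow root. Closed roots report
# shadowRoot === null, so they look like plain elements from here.
SURVEY_JS = """
() => ({
  iframes: document.querySelectorAll('iframe').length,
  openShadowHosts: Array.from(document.querySelectorAll('*'))
    .filter(el => el.shadowRoot)
    .map(el => el.tagName.toLowerCase()),
})
""".strip()


def survey_contexts(page):
    """Return {'iframes': int, 'openShadowHosts': [tag, ...]} for `page`.

    `page` is expected to be a Playwright Page; any object with a compatible
    .evaluate() method works.
    """
    result = page.evaluate(SURVEY_JS)
    return {
        "iframes": int(result["iframes"]),
        # De-duplicate and sort so repeated components are listed once.
        "openShadowHosts": sorted(set(result["openShadowHosts"])),
    }
```

A non-zero `iframes` count points towards frame_locator; a non-empty `openShadowHosts` list means ordinary locators should already reach the content.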
Working with iframes
Use page.frame_locator() to scope queries inside an iframe:
frame = page.frame_locator("#checkout-frame")
frame.locator("input[name='card-number']").fill("4242424242424242")
frame.locator("button.pay").click()
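frame_locator returns a FrameLocator. It is not a full Locator (it has no click or fill of its own); it exposes locator(), the get_by_* methods, and frame_locator(), each rooted inside the iframe. Chain from it to reach the element you want to act on.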
For nested iframes:
inner = page.frame_locator("#outer").frame_locator("#inner")
inner.locator("h1").inner_text()
Each frame_locator call descends one level.
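When the nesting depth varies, the chaining can be folded over a list of selectors. This is a minimal sketch; `descend_frames` is a hypothetical helper, and it relies only on the fact that both Page and FrameLocator expose frame_locator().

```python
from functools import reduce


def descend_frames(page, frame_selectors):
    """Chain frame_locator() once per selector, outermost frame first.

    With an empty list the page itself is returned unchanged.
    """
    return reduce(
        lambda scope, selector: scope.frame_locator(selector),
        frame_selectors,
        page,
    )


# Usage with a real Playwright page (selectors taken from the example above):
#   inner = descend_frames(page, ["#outer", "#inner"])
#   heading = inner.locator("h1").inner_text()
```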
Older API: page.frame()
You'll still see this in older code:
frame = page.frame(name="checkout-frame")
frame.fill("input[name='card-number']", "...")
page.frame() returns a Frame object you act on directly, or None if no frame matches at the moment of the call. The newer frame_locator is preferred because it's a lazy locator with all the auto-wait benefits. Use frame_locator for new code.
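If you maintain code built on page.frame(), it helps to fail loudly when the lookup finds nothing, because page.frame() does not wait for the frame to appear. A small sketch, with `require_frame` as a hypothetical wrapper name:

```python
def require_frame(page, name):
    """Return the frame called `name`, or raise instead of returning None."""
    frame = page.frame(name=name)
    if frame is None:
        # Without this check the failure shows up later as an
        # AttributeError on None, far from the real cause.
        raise LookupError(f"no frame named {name!r} on {page.url}")
    return frame


# Usage with a real Playwright page:
#   frame = require_frame(page, "checkout-frame")
#   frame.fill("input[name='card-number']", "...")
```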
Waiting for iframes
iframes load asynchronously. Wait for the iframe element first, then act on its contents:
page.wait_for_selector("#checkout-frame")
frame = page.frame_locator("#checkout-frame")
frame.locator("input").wait_for(state="attached")
The frame's DOM may not be populated until the inner document loads. wait_for on a known internal element synchronises both.
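The wait can be wrapped into one reusable step. The sketch below assumes the sync API; `wait_for_frame_content` and its `timeout_ms` parameter are hypothetical names. It takes `.first` so the wait does not trip over strict mode when several elements match.

```python
def wait_for_frame_content(page, frame_selector, inner_selector, timeout_ms=10_000):
    """Block until `inner_selector` is attached inside the given iframe.

    Returns the locator so the caller can act on it straight away.
    """
    inner = page.frame_locator(frame_selector).locator(inner_selector).first
    # Waiting on an element inside the frame implies the iframe element
    # exists and its inner document has started to populate.
    inner.wait_for(state="attached", timeout=timeout_ms)
    return inner


# Usage with a real Playwright page:
#   card = wait_for_frame_content(page, "#checkout-frame", "input")
#   card.fill("4242424242424242")
```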
Cross-origin iframes
If the iframe is cross-origin, browser security still lets Playwright drive it, because Playwright operates at the browser level, not the JS level. The same frame_locator works.
However, cookies and storage are isolated by origin. If the iframe needs an auth cookie, it must be set for the iframe's origin, not the parent's.
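In practice that means deriving the cookie's domain from the iframe's URL rather than the page's. A minimal sketch: `cookie_for_frame` is a hypothetical helper, and the cookie name and value in the usage line are placeholders. The dict shape (name, value, domain, path, secure) is the one accepted by the browser context's add_cookies().

```python
from urllib.parse import urlsplit


def cookie_for_frame(name, value, frame_url):
    """Build a cookie dict scoped to the origin that serves the iframe."""
    parts = urlsplit(frame_url)
    if not parts.hostname:
        raise ValueError(f"cannot derive a cookie domain from {frame_url!r}")
    return {
        "name": name,
        "value": value,
        "domain": parts.hostname,          # the iframe's host, not the parent's
        "path": "/",
        "secure": parts.scheme == "https",
    }


# Usage with a real Playwright browser context (placeholder name/value):
#   context.add_cookies([
#       cookie_for_frame("session", "...", "https://embedded.example.com/widget")
#   ])
```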
Working with shadow DOM, open
Playwright's locators pierce open shadow DOM automatically. You don't need a special syntax:
# host element has open shadow DOM containing .product-name
page.locator(".product-name").inner_text() # works
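That's the magic. A raw document.querySelectorAll does not cross shadow boundaries, and neither do plain CSS selectors in Selenium or Puppeteer; there you descend into shadowRoot yourself or use a tool-specific shadow-piercing syntax. Playwright treats open shadow trees as part of the document for CSS and text selectors. XPath is the exception: Playwright's XPath selectors do not pierce shadow roots.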
When you need to scope to a specific component:
page.locator("custom-product-card").locator(".product-name").first
Both locators in the chain pierce open shadow roots.
Open shadow DOM in raw JS
For comparison, here's what other tools (and Playwright if you used evaluate) would need:
const host = document.querySelector("custom-product-card");
const name = host.shadowRoot.querySelector(".product-name").textContent;
shadowRoot accesses the open shadow tree. Playwright hides this for you in normal selectors.
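To make the difference concrete, here is a toy model in plain Python (not Playwright code, and not how any browser is implemented). `Node`, `query_flat`, and `query_piercing` are illustrative names: the first query mimics a flat querySelectorAll, the second also descends into open shadow roots and leaves closed ones opaque.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    tag: str
    classes: tuple = ()
    children: list = field(default_factory=list)
    shadow: Optional[list] = None   # children of the shadow root, if any
    shadow_mode: str = "open"       # "open" or "closed"; unused without a shadow


def query_flat(node, cls):
    """Light-DOM descendants only, like document.querySelectorAll('.cls')."""
    found = []
    for child in node.children:
        if cls in child.classes:
            found.append(child)
        found.extend(query_flat(child, cls))
    return found


def query_piercing(node, cls):
    """Also walks into open shadow roots; closed roots are skipped."""
    scopes = [node.children]
    if node.shadow is not None and node.shadow_mode == "open":
        scopes.append(node.shadow)
    found = []
    for scope in scopes:
        for child in scope:
            if cls in child.classes:
                found.append(child)
            found.extend(query_piercing(child, cls))
    return found


card = Node("custom-product-card",
            shadow=[Node("div", ("product-name",)), Node("div", ("price",))])
sealed = Node("x-secret",
              shadow=[Node("div", ("product-name",))], shadow_mode="closed")
doc = Node("body", children=[card, sealed])

print(len(query_flat(doc, "product-name")))      # 0
print(len(query_piercing(doc, "product-name")))  # 1
```

The flat query finds nothing because both matches sit behind a shadow boundary; the piercing query finds the one inside the open root and still cannot see the closed one.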
Working with shadow DOM, closed
Closed shadow DOM is genuinely closed. host.shadowRoot returns null. The element's internals are unreachable from external JS, by design.
Playwright does not get around this: its documentation lists closed-mode shadow roots as unsupported by locators, so treat any match inside one as unreliable. Test before relying on it.
# May or may not work depending on shadow boundary configuration
page.locator(".internal").inner_text()
If it doesn't work, the content is genuinely scraper-hostile. Three remaining options:
- Hit the API. Closed shadow DOM components often render data fetched via XHR or fetch. Capture the request and reproduce it.
- OCR a screenshot. Last resort. The lesson on canvas rendering covers the technique.
- Accept it's unscrapeable. Sometimes the right call.
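For the first option, a response listener is usually enough to discover which endpoint feeds the component. The sketch below assumes the sync API; `is_api_response` and `capture_api_urls` are hypothetical helpers, and the "JSON over XHR/fetch" filter is a heuristic that you may need to loosen for a given site.

```python
def is_api_response(response):
    """True for XHR/fetch responses whose content type looks like JSON."""
    if response.request.resource_type not in ("xhr", "fetch"):
        return False
    content_type = response.headers.get("content-type", "")
    return "json" in content_type.lower()


def capture_api_urls(page, sink):
    """Append the URL of every matching response seen on `page` to `sink`."""
    page.on(
        "response",
        lambda response: sink.append(response.url)
        if is_api_response(response)
        else None,
    )


# Usage with a real Playwright page (register the listener before navigating):
#   seen = []
#   capture_api_urls(page, seen)
#   page.goto(url)
#   print(seen)
```

Once a candidate URL shows up in the list, you can request it directly and skip the component entirely.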
Finding the shadow boundary in DevTools
In Chrome DevTools → Elements, when you hover or click an element inside a shadow tree, you'll see #shadow-root (open) or (closed) in the breadcrumbs. The custom element's tag name is the host.
You can also run:
$0.shadowRoot // in console, on the selected element
If it returns a ShadowRoot, it's open. If it returns null on an element that obviously has a shadow tree (you see the breadcrumb), it's closed.
A combined case: iframe with shadow DOM inside
Modern Stripe checkout, for example, is an iframe whose content uses shadow DOM. Compose:
frame = page.frame_locator("#stripe-iframe")
# Inside the frame, shadow DOM is pierced automatically by Playwright locators
frame.locator("input[name='card-number']").fill("4242 4242 4242 4242")
The frame_locator switches context to the iframe. Locators inside that scope pierce shadow DOM automatically. The two mechanisms compose cleanly.
Common errors and fixes
| Error | Cause | Fix |
|---|---|---|
| locator resolved to 0 elements and the element is clearly visible | Inside an unrecognised iframe | Use frame_locator |
| locator resolved to 0 elements on a custom-element page | Closed shadow DOM | Find the API or accept it's unscrapeable |
| intermittent timeout on iframe content | Frame loads async | Wait on a known internal selector first |
| same selector returns different counts each run | Multiple frames with same internal class | Scope to the right frame_locator |
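When you suspect an unrecognised iframe, or several frames sharing a class name, it can help to ask every frame how many matches it holds. A diagnostic sketch, assuming the sync API; `find_in_frames` is a hypothetical helper, and it takes a snapshot without waiting, so run it after the page has settled.

```python
def find_in_frames(page, selector):
    """Return [(frame_url, match_count), ...] for frames where `selector` matches.

    page.frames includes the main frame as well as every child frame.
    """
    hits = []
    for frame in page.frames:
        count = frame.locator(selector).count()
        if count:
            hits.append((frame.url, count))
    return hits


# Usage with a real Playwright page:
#   for url, count in find_in_frames(page, ".price"):
#       print(count, "match(es) in", url)
```

A hit only in a child frame tells you which iframe to pass to frame_locator; hits in several frames explain counts that change as frames load.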
Hands-on lab
Open /challenges/dynamic/iframe/same-origin. Inspect the page and find the iframe. Write a Playwright script that uses frame_locator to extract the heading inside the iframe. Then visit /challenges/dynamic/shadow-dom/open and confirm that Playwright's auto-piercing works without frame_locator. Finally, open /challenges/dynamic/shadow-dom/closed and see whether your scraper can reach inside or whether you've hit a genuinely closed boundary.
Practice this lesson on Catalog108, our first-party scraping sandbox.