2.21 · intermediate · 5 min read

iframes and Shadow DOM: Piercing Nested Contexts

Two ways content can hide from a flat document.querySelectorAll. Pierce them correctly and you can scrape anything; pierce them wrong and you'll wonder why your selectors return nothing.

What you’ll learn

  • Distinguish iframes from shadow DOM and recognise each in the page.
  • Switch into an iframe with `page.frame_locator()` and back.
  • Pierce open shadow DOM via Playwright's auto-piercing locators.
  • Handle closed shadow DOM, and accept when content is genuinely unreachable.

Most "I can't find this element" bugs come down to the element living inside a context your selectors don't reach. Two contexts cause this: iframes and shadow DOM. They look identical in the rendered page but require entirely different handling. This lesson covers both.

The difference at a glance

| | iframe | Shadow DOM |
|---|---|---|
| What it is | An embedded document with its own DOM | A nested DOM attached to a host element |
| In source | `<iframe src="...">` | `<custom-element>` with internal `#shadow-root` |
| Cross-origin? | Often, especially for ads/widgets | Always same-origin |
| Reach from parent | Must "switch into" the frame | Open: pierce automatically; Closed: can't reach |
| DevTools breadcrumb | "frame > html > ..." | "host > #shadow-root > ..." |

iframes are full documents embedded through an <iframe> tag. Shadow DOM is a scoping mechanism for web components. Both keep their internals out of reach of a plain document.querySelectorAll run against the top-level document, and that includes your scraper's selectors.
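
To make that isolation concrete, here is a toy Python model of a page holding one open shadow host and one iframe. It is not browser code: the `Node` class and both query functions are invented for illustration. The flat query sees neither price; the piercing query descends into the open shadow root only, and the iframe's document still has to be entered explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy DOM node. All names here are illustrative, not a browser API."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    shadow_root: Optional["Node"] = None     # nested tree attached to a host
    shadow_mode: str = "open"                # "open" or "closed"
    frame_document: Optional["Node"] = None  # separate document inside an iframe


def flat_query(root: Node, tag: str) -> List[Node]:
    """Mimics a flat querySelectorAll: walks ordinary children only."""
    found = [root] if root.tag == tag else []
    for child in root.children:
        found += flat_query(child, tag)
    return found


def piercing_query(root: Node, tag: str) -> List[Node]:
    """Also descends into open shadow roots, but never into frame documents."""
    found = [root] if root.tag == tag else []
    for child in root.children:
        found += piercing_query(child, tag)
    if root.shadow_root is not None and root.shadow_mode == "open":
        found += piercing_query(root.shadow_root, tag)
    return found


page = Node("html", [
    Node("custom-product-card", shadow_root=Node("shadow", [Node("price")])),
    Node("iframe", frame_document=Node("html", [Node("price")])),
])

print(len(flat_query(page, "price")))      # 0: both prices are hidden
print(len(piercing_query(page, "price")))  # 1: only the shadow DOM price
iframe = flat_query(page, "iframe")[0]
print(len(piercing_query(iframe.frame_document, "price")))  # 1: after "switching in"
```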

Recognising the patterns

iframe in the source:

<iframe src="https://embedded.example.com/widget" id="checkout-frame"></iframe>

The content inside is fetched as a separate document. To inspect it in DevTools, click into the iframe's content area; you'll see "(iframe)" badges.

Shadow DOM in the source:

<custom-product-card>
  #shadow-root (open)
  <div class="product-name">Yellow Mug</div>
  <div class="price">$12</div>
</custom-product-card>

The #shadow-root line is the boundary. Below it, selectors from outside don't reach unless you explicitly pierce. The (open) vs (closed) annotation matters a lot.

Working with iframes

Use page.frame_locator() to scope queries inside an iframe:

frame = page.frame_locator("#checkout-frame")
frame.locator("input[name='card-number']").fill("4242424242424242")
frame.locator("button.pay").click()

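frame_locator returns a FrameLocator. It is not a full Locator: it exposes locator(), the get_by_* helpers, and a nested frame_locator(), all rooted inside the iframe. The Locators it hands back behave like any other and can be chained as usual.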

For nested iframes:

inner = page.frame_locator("#outer").frame_locator("#inner")
inner.locator("h1").inner_text()

Each frame_locator call descends one level.
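
When the nesting depth varies between pages, the same descent can be written as a loop. This is a minimal sketch: `locate_in_nested_frames` is a hypothetical helper name, and it assumes `page` is a Playwright Page (or anything else exposing `frame_locator()` and `locator()`).

```python
def locate_in_nested_frames(page, frame_selectors, target_selector):
    """Descend one iframe per selector, then return a locator in the innermost frame."""
    scope = page
    for selector in frame_selectors:
        # Page.frame_locator() and FrameLocator.frame_locator() both go one level down.
        scope = scope.frame_locator(selector)
    return scope.locator(target_selector)
```

With the example above, `locate_in_nested_frames(page, ["#outer", "#inner"], "h1").inner_text()` should be equivalent to the chained form.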

Older API: page.frame()

You'll still see this in older code:

frame = page.frame(name="checkout-frame")
frame.fill("input[name='card-number']", "...")

page.frame() returns a Frame object you act on directly, or None if no matching frame is attached at the moment you call it. The newer frame_locator is preferred because it's a lazy locator with all the auto-wait benefits. Use frame_locator for new code.
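
If you have to maintain code built on page.frame(), guarding against the None case keeps the failure readable. A small sketch, where `require_frame` is a hypothetical helper and `page` is assumed to be a Playwright Page:

```python
def require_frame(page, name):
    """Return the Frame with this name, or fail loudly instead of on a later attribute access."""
    frame = page.frame(name=name)  # returns None when no such frame is attached yet
    if frame is None:
        raise LookupError(f"no frame named {name!r} is attached to the page")
    return frame
```

Unlike frame_locator, this does not wait: if the iframe has not been attached yet, the lookup fails immediately.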

Waiting for iframes

iframes load asynchronously. Wait for the iframe element first, then act on its contents:

page.wait_for_selector("#checkout-frame")
frame = page.frame_locator("#checkout-frame")
frame.locator("input").wait_for(state="attached")

The frame's DOM may not be populated until the inner document loads. wait_for on a known internal element synchronises both.
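
The same synchronisation can be wrapped in one function with an explicit timeout. This is a sketch under assumptions: `wait_for_frame_content` and its default timeout are hypothetical, and it relies on the frame locator being resolved lazily, so the wait on the inner element is the one that matters.

```python
def wait_for_frame_content(page, frame_selector, inner_selector, timeout_ms=10_000):
    """Block until an element inside the iframe is attached, then return its locator."""
    inner = page.frame_locator(frame_selector).locator(inner_selector).first
    # Resolving this locator needs both the iframe element and its inner document,
    # so one wait covers the frame and its contents.
    inner.wait_for(state="attached", timeout=timeout_ms)
    return inner
```

Using `.first` avoids a strictness error when the inner selector matches several elements, for example several inputs in a checkout form.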

Cross-origin iframes

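If the iframe is cross-origin, Playwright can still drive it. Playwright controls the browser itself rather than running as script inside the parent page, so the same-origin restrictions that stop page JS from reading the frame don't stop it. The same frame_locator works.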

However, cookies and storage are isolated by origin. If the iframe needs an auth cookie, it must be set for the iframe's origin, not the parent's.
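
As an illustration of scoping a cookie to the iframe's origin, the sketch below derives the origin from the iframe's `src` and builds a cookie dict in the shape `BrowserContext.add_cookies()` accepts. The helper name, cookie name, and token value are placeholders, not anything the target site defines.

```python
from urllib.parse import urlsplit


def cookie_for_frame_origin(frame_src, name, value):
    """Build a cookie dict scoped to the iframe's origin, not the parent page's."""
    parts = urlsplit(frame_src)
    return {"name": name, "value": value, "url": f"{parts.scheme}://{parts.netloc}"}


cookie = cookie_for_frame_origin(
    "https://embedded.example.com/widget", "session", "example-token"
)
print(cookie["url"])  # https://embedded.example.com
```

You would then pass it to the browser context before navigating, as in `context.add_cookies([cookie])`.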

Working with shadow DOM, open

Playwright's locators pierce open shadow DOM automatically. You don't need a special syntax:

# host element has open shadow DOM containing .product-name
page.locator(".product-name").inner_text()  # works

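That's the magic. Selenium, Puppeteer, and raw JS document.querySelectorAll don't pierce with a plain CSS selector; you have to descend into shadowRoot manually or use a tool-specific piercing syntax. Playwright treats the open shadow tree as part of the document for selector purposes. One exception: XPath selectors in Playwright do not pierce shadow roots, so stick to CSS or text-based locators here.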

When you need to scope to a specific component:

page.locator("custom-product-card").locator(".product-name").first

Both locators in the chain pierce open shadow boundaries, so the inner selector is matched inside that component's shadow tree.
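
Scoping per component is what lets you turn a page of cards into rows. A minimal sketch, assuming the `custom-product-card` markup shown earlier and a Playwright version that provides `Locator.all()`; `scrape_cards` is a hypothetical helper name.

```python
def scrape_cards(page, host_selector="custom-product-card"):
    """Return one dict per card, reading fields from inside each open shadow tree."""
    rows = []
    # Locator.all() takes a snapshot and does not wait, so call it once the cards have rendered.
    for card in page.locator(host_selector).all():
        rows.append({
            "name": card.locator(".product-name").inner_text(),
            "price": card.locator(".price").inner_text(),
        })
    return rows
```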

Open shadow DOM in raw JS

For comparison, here's what other tools (and Playwright if you used evaluate) would need:

const host = document.querySelector("custom-product-card");
const name = host.shadowRoot.querySelector(".product-name").textContent;

shadowRoot accesses the open shadow tree. Playwright hides this for you in normal selectors.
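
For completeness, this is roughly what the manual route looks like when driven from Python through `page.evaluate()`. It is a sketch, not a recommended pattern: `shadow_text` and `SHADOW_TEXT_JS` are hypothetical names, and the JS returns null when the host is missing or its shadow root is not open.

```python
SHADOW_TEXT_JS = """
([hostSelector, innerSelector]) => {
  const host = document.querySelector(hostSelector);
  if (!host || !host.shadowRoot) return null;
  const inner = host.shadowRoot.querySelector(innerSelector);
  return inner ? inner.textContent : null;
}
""".strip()


def shadow_text(page, host_selector, inner_selector):
    """Read text from inside an open shadow root by descending manually in page JS."""
    # The list is passed to the JS function as a single array argument.
    return page.evaluate(SHADOW_TEXT_JS, [host_selector, inner_selector])
```

Compared with `page.locator(".product-name").inner_text()`, this version gets no auto-waiting and breaks as soon as a second shadow level is involved.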

Working with shadow DOM, closed

Closed shadow DOM is genuinely closed. host.shadowRoot returns null. The element's internals are unreachable from external JS, by design.

Playwright does not paper over this: its documentation lists closed-mode shadow roots as unsupported by locators. Treat any selector that happens to reach inside one as incidental, and test against the real page before relying on it.

# Not expected to resolve if .internal sits behind a closed shadow root
page.locator(".internal").inner_text()

If it doesn't work, the content is genuinely scraper-hostile. Three remaining options:

  1. Hit the API. Closed shadow DOM components often render data fetched via XHR or fetch. Capture the request and reproduce it.
  2. OCR a screenshot. Last resort. The lesson on canvas rendering covers the technique.
  3. Accept it's unscrapeable. Sometimes the right call.
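
Option 1 can be sketched with Playwright's `expect_response`. Everything site-specific here is an assumption: the `/api/` path fragment, the JSON content type, and the helper names are placeholders you would replace after watching the Network tab on the real page.

```python
def looks_like_json_api(url, content_type):
    """Heuristic filter; the '/api/' fragment is an assumption about the target site."""
    return "/api/" in url and "application/json" in content_type.lower()


def capture_api_json(page, page_url, timeout_ms=15_000):
    """Load the page and return the parsed body of the first matching JSON response."""
    def is_api(response):
        content_type = response.headers.get("content-type", "")
        return looks_like_json_api(response.url, content_type)

    # Start listening before navigating so an early response is not missed.
    with page.expect_response(is_api, timeout=timeout_ms) as response_info:
        page.goto(page_url)
    return response_info.value.json()
```

Once you know the endpoint, it is usually simpler to request it directly than to keep driving a browser.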

Finding the shadow boundary in DevTools

In Chrome DevTools → Elements, when you hover or click an element inside a shadow tree, you'll see #shadow-root (open) or (closed) in the breadcrumbs. The custom element's tag name is the host.

You can also run:

$0.shadowRoot  // in console, on the selected element

If it returns a ShadowRoot, it's open. If it returns null on an element that obviously has a shadow tree (you see the breadcrumb), it's closed.
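
The same check can be run from a script instead of the console. A minimal sketch, where `shadow_root_state` is a hypothetical helper and `host_locator` is assumed to be a Playwright Locator that resolves to exactly one host element:

```python
def shadow_root_state(host_locator):
    """Report whether the host exposes an open shadow root to page JS."""
    has_open_root = host_locator.evaluate("el => el.shadowRoot !== null")
    # null means either a closed root or no shadow tree at all; JS cannot tell them apart.
    return "open" if has_open_root else "closed-or-absent"
```

For example, `shadow_root_state(page.locator("custom-product-card").first)`. A "closed-or-absent" result still needs the DevTools breadcrumb to tell the two cases apart.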

A combined case: iframe with shadow DOM inside

Modern Stripe checkout, for example, is an iframe whose content uses shadow DOM. Compose:

frame = page.frame_locator("#stripe-iframe")
# Inside the frame, shadow DOM is pierced automatically by Playwright locators
frame.locator("input[name='card-number']").fill("4242 4242 4242 4242")

The frame_locator switches context to the iframe. Locators inside that scope pierce shadow DOM automatically. The two mechanisms compose cleanly.

Common errors and fixes

| Error | Cause | Fix |
|---|---|---|
| Locator resolved to 0 elements and the element is clearly visible | Inside an unrecognised iframe | Use frame_locator |
| Locator resolved to 0 elements on a custom-element page | Closed shadow DOM | Find the API or accept it's unscrapeable |
| Intermittent timeout on iframe content | Frame loads async | Wait on a known internal selector first |
| Same selector returns different counts each run | Multiple frames with same internal class | Scope to the right frame_locator |
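
When you can't tell which row applies, it helps to see where a selector matches. The sketch below counts matches in every frame on the page; `count_matches_per_frame` is a hypothetical helper, and it assumes `page` is a Playwright Page whose frames have finished loading.

```python
def count_matches_per_frame(page, selector):
    """Return (frame URL, match count) pairs for the main frame and every child frame."""
    return [(frame.url, frame.locator(selector).count()) for frame in page.frames]
```

Zero matches in the main frame but a non-zero count in a child frame points to the iframe row; zero everywhere on a page built from custom elements points to closed shadow DOM.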

Hands-on lab

Open /challenges/dynamic/iframe/same-origin. Inspect the page and find the iframe. Write a Playwright script that uses frame_locator to extract the heading inside the iframe. Then visit /challenges/dynamic/shadow-dom/open and confirm Playwright's auto-piercing works without frame_locator. Finally, open /challenges/dynamic/shadow-dom/closed and see whether your scraper can reach inside or whether you've hit a genuinely closed boundary.

Practice this lesson on Catalog108, our first-party scraping sandbox.
