iframes and Shadow DOM, Piercing Nested Contexts, Dynamic Web & Browser Automation

Two ways content can hide from a flat document.querySelectorAll. Pierce them correctly and almost everything on the page becomes reachable; pierce them wrong and you'll wonder why your selectors return nothing.

Most "I can't find this element" bugs come down to the element living inside a context your selectors don't reach. Two contexts cause this: iframes and shadow DOM. They look identical in the rendered page but require entirely different handling. This lesson covers both.

The difference at a glance

	iframe	Shadow DOM
What it is	An embedded document with its own DOM	A nested DOM attached to a host element
In source	`<iframe src="...">`	`<custom-element>` with internal `#shadow-root`
Cross-origin?	Often, especially for ads/widgets	No, same origin as the host document
Reach from parent	Must "switch into" the frame	Open: Playwright locators pierce automatically; Closed: not reachable from page JS
Devtools breadcrumb	"frame > html > ..."	"host > #shadow-root > ..."

iframes are full documents inside an <iframe> tag. Shadow DOM is a scoping mechanism for web components. Both keep their internals out of reach of a plain document.querySelectorAll on the parent page, and therefore out of reach of a naive scraper.

Recognising the patterns

iframe in the source:

<iframe src="https://embedded.example.com/widget" id="checkout-frame"></iframe>

The content inside is fetched as a separate document. To inspect it in DevTools, click into the iframe's content area; you'll see "(iframe)" badges.

Shadow DOM in the source:

<custom-product-card>
  #shadow-root (open)
  <div class="product-name">Yellow Mug</div>
  <div class="price">$12</div>
</custom-product-card>

The #shadow-root line is the boundary. Below it, selectors from outside don't reach unless you explicitly pierce. The (open) vs (closed) annotation matters a lot.

Working with iframes

Use page.frame_locator() to scope queries inside an iframe:

frame = page.frame_locator("#checkout-frame")
frame.locator("input[name='card-number']").fill("4242424242424242")
frame.locator("button.pay").click()

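frame_locator returns a FrameLocator. It is not a full Locator: it offers locator(), the get_by_* helpers, and a nested frame_locator(), and every Locator it produces is rooted inside the iframe. Chain from it as you would from page.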

For nested iframes:

inner = page.frame_locator("#outer").frame_locator("#inner")
inner.locator("h1").inner_text()

Each frame_locator call descends one level.
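
The one-call-per-level rule generalises to any depth. A minimal sketch, assuming page is a Playwright Page from the sync API; descend_frames is a hypothetical helper name, not part of Playwright, and it relies only on frame_locator():

```python
from functools import reduce


def descend_frames(page, frame_selectors):
    """Return a FrameLocator for the innermost iframe.

    `frame_selectors` holds one CSS selector per nesting level, outermost first.
    `descend_frames` is a hypothetical helper, not a Playwright API.
    """
    if not frame_selectors:
        raise ValueError("need at least one iframe selector")
    # Page and FrameLocator both expose frame_locator(), so a fold is enough.
    return reduce(lambda scope, sel: scope.frame_locator(sel), frame_selectors, page)
```

With the two-level example above, descend_frames(page, ["#outer", "#inner"]).locator("h1").inner_text() should behave like the chained version.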

Older API: `page.frame()`

You'll still see this in older code:

frame = page.frame(name="checkout-frame")
frame.fill("input[name='card-number']", "...")

page.frame() returns a Frame object you act on directly. The newer frame_locator is preferred because it's a lazy locator with all the auto-wait benefits. Use frame_locator for new code.
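
One practical difference when maintaining older code: page.frame() returns None when nothing matches, so the failure only surfaces on the next line as an AttributeError. A small guard makes that explicit. This is a sketch assuming the sync API; require_frame is a hypothetical helper name:

```python
def require_frame(page, name):
    """Return the Frame whose name matches, or raise a descriptive error.

    page.frame() returns None when no frame matches. `require_frame` is a
    hypothetical helper, not a Playwright API.
    """
    frame = page.frame(name=name)
    if frame is None:
        # List what is actually attached to make the mismatch easy to spot.
        known = [f.name for f in page.frames]
        raise LookupError(f"no frame named {name!r}; frames present: {known}")
    return frame
```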

Waiting for iframes

iframes load asynchronously. Wait for the iframe element first, then act on its contents:

page.wait_for_selector("#checkout-frame")
frame = page.frame_locator("#checkout-frame")
frame.locator("input").wait_for(state="attached")

The frame's DOM may not be populated until the inner document loads. wait_for on a known internal element synchronises both.
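
The same wait can be wrapped with an explicit timeout. A minimal sketch, assuming the sync API; wait_for_frame_content and the 10-second default are illustrative choices, not Playwright defaults:

```python
def wait_for_frame_content(page, frame_selector, inner_selector, timeout_ms=10_000):
    """Block until `inner_selector` is attached inside the iframe, then return its locator."""
    target = page.frame_locator(frame_selector).locator(inner_selector)
    # wait_for raises Playwright's TimeoutError if the element never attaches.
    target.wait_for(state="attached", timeout=timeout_ms)
    return target
```

Because the chained locator is re-resolved on every retry, this single wait should also cover an iframe element that appears late, though that is worth confirming on the page you are scraping.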

Cross-origin iframes

If the iframe is cross-origin, browser security still lets Playwright drive it, because Playwright operates at the browser level rather than as page JS. The same frame_locator works.

However, cookies and storage are isolated by origin. If the iframe needs an auth cookie, it must be set for the iframe's origin, not the parent's.
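
To scope a cookie to the iframe's origin, derive the origin from the iframe's src and pass the result to context.add_cookies(). A sketch under assumptions: cookie_for_frame is a hypothetical helper, and the cookie name and value in the usage note are placeholders:

```python
from urllib.parse import urlsplit


def cookie_for_frame(iframe_src, name, value):
    """Build a cookie dict scoped to the iframe's origin, for context.add_cookies()."""
    parts = urlsplit(iframe_src)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {iframe_src!r}")
    # add_cookies accepts either `url` or `domain` + `path`; `url` is the shorter form.
    return {"name": name, "value": value, "url": f"{parts.scheme}://{parts.netloc}"}
```

Usage, before navigating: context.add_cookies([cookie_for_frame("https://embedded.example.com/widget", "session", "abc123")]).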

Working with shadow DOM, open

Playwright's locators pierce open shadow DOM automatically. You don't need a special syntax:

# host element has open shadow DOM containing .product-name
page.locator(".product-name").inner_text()  # works

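That's the magic. Raw JS document.querySelectorAll does not pierce, and the default CSS selectors in Selenium and Puppeteer don't either; there you descend into shadowRoot by hand or switch to a dedicated shadow-piercing API. Playwright treats the open shadow tree as part of the document for CSS and text selectors. XPath is the exception: it does not cross shadow roots.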

When you need to scope to a specific component:

page.locator("custom-product-card").locator(".product-name").first

Both selectors are evaluated piercing-aware.
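
Scoping matters most when the page repeats the component. A minimal sketch, assuming the custom-product-card markup shown earlier and the sync API; scrape_cards is a hypothetical helper name:

```python
def scrape_cards(page):
    """Return one {"name", "price"} dict per <custom-product-card> on the page."""
    cards = page.locator("custom-product-card")
    rows = []
    for i in range(cards.count()):
        card = cards.nth(i)  # scope each query to a single host element
        rows.append({
            "name": card.locator(".product-name").inner_text(),
            "price": card.locator(".price").inner_text(),
        })
    return rows
```

count() does not auto-wait, so wait for the first card to be attached before calling this on a page that renders cards asynchronously.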

Open shadow DOM in raw JS

For comparison, here's what other tools (and Playwright if you used evaluate) would need:

const host = document.querySelector("custom-product-card");
const name = host.shadowRoot.querySelector(".product-name").textContent;

shadowRoot accesses the open shadow tree. Playwright hides this for you in normal selectors.
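
If you do go through evaluate from Python, the manual descent looks roughly like this. A sketch assuming the sync API; shadow_text is a hypothetical helper, and the JavaScript runs in the page, so it sees only what page scripts can see:

```python
# JavaScript run in the page: descend into the host's open shadow root by hand.
_READ_SHADOW_TEXT = """
(host, innerSelector) => {
  const root = host.shadowRoot;  // null for closed roots and for plain elements
  if (!root) return null;
  const el = root.querySelector(innerSelector);
  return el ? el.textContent : null;
}
"""


def shadow_text(page, host_selector, inner_selector):
    """Return the text of `inner_selector` inside the first host's open shadow root, or None."""
    # Locator.evaluate passes the matched element as the first JS argument.
    return page.locator(host_selector).first.evaluate(_READ_SHADOW_TEXT, inner_selector)
```

A None result means either that the host has no open shadow root or that nothing inside it matched.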

Working with shadow DOM, closed

Closed shadow DOM is genuinely closed. host.shadowRoot returns null. The element's internals are unreachable from external JS, by design.

Playwright does not officially cover this case: its documentation lists closed-mode shadow roots as unsupported. A locator may occasionally still resolve inside one, depending on the browser and on how the component attaches its root, but treat that as incidental. Test before relying on it.

# May or may not work depending on shadow boundary configuration
page.locator(".internal").inner_text()

If it doesn't work, the content is genuinely scraper-hostile. Three remaining options:

Hit the API. Closed shadow DOM components usually render data fetched via XHR. Capture and reproduce.
OCR a screenshot. Last resort. The lesson on canvas rendering covers the technique.
Accept it's unscrapeable. Sometimes the right call.
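
For the first option, Playwright can hand you the response the component fetched. A minimal sketch, assuming the sync API and assuming the data arrives as JSON over XHR or fetch; is_api_call and capture_api_json are hypothetical helper names, and the URL fragment you match on has to come from the Network tab of the real site:

```python
def is_api_call(url_fragment):
    """Build a predicate matching XHR/fetch responses whose URL contains `url_fragment`."""
    def predicate(response):
        return (
            response.request.resource_type in ("xhr", "fetch")
            and url_fragment in response.url
        )
    return predicate


def capture_api_json(page, page_url, url_fragment):
    """Load `page_url` and return the parsed JSON of the first matching API response."""
    # expect_response must be entered before the action that triggers the request.
    with page.expect_response(is_api_call(url_fragment)) as response_info:
        page.goto(page_url)
    return response_info.value.json()
```

Once the endpoint and its parameters are known, calling it directly is usually cheaper than driving a browser at all.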

Finding the shadow boundary in DevTools

In Chrome DevTools → Elements, when you hover or click an element inside a shadow tree, you'll see #shadow-root (open) or (closed) in the breadcrumbs. The custom element's tag name is the host.

You can also run:

$0.shadowRoot  // in console, on the selected element

If it returns a ShadowRoot, it's open. If it returns null on an element that obviously has a shadow tree (you see the breadcrumb), it's closed.

A combined case: iframe with shadow DOM inside

Payment widgets are the typical case: Stripe's checkout fields, for example, are served in an iframe, and content inside such a frame may itself use shadow DOM. Compose:

frame = page.frame_locator("#stripe-iframe")
# Inside the frame, shadow DOM is pierced automatically by Playwright locators
frame.locator("input[name='card-number']").fill("4242 4242 4242 4242")

The frame_locator switches context to the iframe. Locators inside that scope pierce shadow DOM automatically. The two mechanisms compose cleanly.

Common errors and fixes

Error	Cause	Fix
`locator resolved to 0 elements` and the element is clearly visible	Inside an unrecognised iframe	Use `frame_locator`
`locator resolved to 0 elements` on a custom-element page	Closed shadow DOM	Find the API or accept it's unscrapeable
Intermittent timeout on iframe content	Frame loads async	Wait on a known internal selector first
Same selector returns different counts each run	Multiple frames with same internal class	Scope to the right `frame_locator`
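
When it isn't obvious which row applies, counting matches per frame usually settles it. A diagnostic sketch, assuming the sync API; count_matches_per_frame is a hypothetical helper name:

```python
def count_matches_per_frame(page, selector):
    """Return [(frame_url, match_count), ...] for every frame attached to the page."""
    report = []
    for frame in page.frames:  # the main frame plus every nested iframe
        report.append((frame.url, frame.locator(selector).count()))
    return report
```

A zero for the main frame and a non-zero count elsewhere points to an iframe. Matches in several frames explain unstable counts. Zeros everywhere on a custom-element page suggest a closed shadow root or content that has not rendered yet.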

Hands-on lab

Open /challenges/dynamic/iframe/same-origin. Inspect the page and find the iframe. Write a Playwright script that uses frame_locator to extract the heading inside the iframe. Then visit /challenges/dynamic/shadow-dom/open and confirm that Playwright's auto-piercing works without frame_locator. Finally, open /challenges/dynamic/shadow-dom/closed and see whether your scraper can reach inside or whether you've hit a genuinely closed boundary.

Quiz, check your understanding

What's the key DIFFERENCE between iframes and shadow DOM for a scraper?