
2.19 · intermediate · 5 min read

Lazy-Loaded Images and Skeleton Loaders

Images that appear blank, skeleton placeholders that fool naive scrapers, and the right way to wait for actual content.

What you’ll learn

  • Identify lazy-load patterns: native `loading="lazy"`, IntersectionObserver, data-src, blur-up.
  • Wait for real `src` values, not placeholders.
  • Handle skeleton-loader UIs that hide the loading state from auto-wait.
  • Decide when to scroll-trigger image loads vs. when to read the source attribute directly.

If your scraper returns empty src attributes or placeholder URLs, you've hit lazy loading. The image element exists, but its real source URL only gets set when the browser thinks you're about to see it. Same idea as infinite scroll: render only what's needed. Same fix: wait on the actual content, not the element.

The four lazy-load patterns

| Pattern | How to spot it | How the real URL is set |
| --- | --- | --- |
| Native `loading="lazy"` | `<img src="real.jpg" loading="lazy">` | Browser handles it; URL is already there |
| data-src swap | `<img src="placeholder.gif" data-src="real.jpg">` | JS sets `el.src = el.dataset.src` when in view |
| IntersectionObserver class swap | `<img class="lazy" data-srcset="..." />` | Library swaps class and sets `srcset` |
| Blur-up | `<img src="data:image/jpeg;base64...tiny...">` | JS replaces base64 with hi-res |

Each leaves a different fingerprint when you hit it without driving the browser to the right scroll position.
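You can triage a page by checking each image against these fingerprints. A minimal sketch working on a plain attribute dict (the function name and the "eager" fallback label are my own; note that the IntersectionObserver pattern is statically indistinguishable from a data-src swap):

```python
def classify_lazy_pattern(attrs: dict) -> str:
    """Guess which lazy-load pattern an <img> uses from its attributes."""
    src = attrs.get("src", "")
    if src.startswith("data:image/jpeg"):
        return "blur-up"            # tiny inline JPEG thumbnail as placeholder
    if any(k in attrs for k in ("data-src", "data-srcset", "data-original")):
        return "data-src swap"      # real URL parked in a data-* attribute
    if attrs.get("loading") == "lazy":
        return "native"             # real URL already in src
    return "eager"                  # no lazy loading detected
```
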

Native loading="lazy"

The easiest case. The browser handles deferred loading entirely; your scraper doesn't care about the lazy attribute, because the src is already the real URL.

<img src="https://cdn.example.com/products/1.jpg" loading="lazy" alt="...">

requests + BeautifulSoup gets you the URL directly. No browser needed. This pattern became dominant after Chrome added native support; as of 2025, roughly 60% of sites have migrated to it.
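A minimal sketch of the HTTP-only extraction, assuming BeautifulSoup is installed (the helper name is illustrative; in practice you'd feed it `requests.get(url).text`):

```python
from bs4 import BeautifulSoup

def native_lazy_urls(html: str) -> list[str]:
    """With native lazy loading, src already holds the real URL."""
    soup = BeautifulSoup(html, "html.parser")
    return [img["src"]
            for img in soup.select('img[loading="lazy"]')
            if img.get("src")]
```
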

data-src swap (the classic)

The historically dominant lazy-load library pattern:

<img src="data:image/gif;base64,R0l..." data-src="https://cdn.example.com/products/1.jpg">

The src is a tiny base64 placeholder. Real URL lives in data-src. JS swaps them on scroll.

For HTTP scrapers, read data-src directly:

img.get("data-src") or img.get("src")

That's it. No browser, no scrolling. The URL is right there in the markup.
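Expanded into a small helper (name is my own), assuming BeautifulSoup; the `.get` calls avoid a KeyError when an attribute is missing, and the `data:` check filters out base64 placeholders:

```python
from bs4 import BeautifulSoup

def real_image_urls(html: str) -> list[str]:
    """Prefer data-src, fall back to src, skip base64 placeholders."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        url = img.get("data-src") or img.get("src") or ""
        if url and not url.startswith("data:"):
            urls.append(url)
    return urls
```
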

For browser scrapers, wait for the real src:

page.wait_for_selector("img.product-image[src]:not([src^='data:'])")

The selector [src]:not([src^='data:']) matches img tags whose src is set AND doesn't start with data: (the base64 placeholder prefix). Auto-waiting on img alone wouldn't work: the element exists immediately, just with a placeholder src.

IntersectionObserver-based lazy loading

The modern pattern. A library like lazysizes or custom code:

<img class="lazyload" data-src="real.jpg" data-srcset="real@2x.jpg 2x">
const obs = new IntersectionObserver(entries => {
  entries.forEach(e => {
    if (e.isIntersecting) {
      e.target.src = e.target.dataset.src;
      e.target.classList.add("loaded");
    }
  });
});

When scraping with HTTP, read data-src / data-srcset. When scraping with a browser, scroll to bring each image into view:

for img in page.locator("img.product-image").all():
  img.scroll_into_view_if_needed()
  img.wait_for(state="attached")

page.wait_for_function(
  """() => Array.from(document.querySelectorAll('img.product-image'))
  .every(img => !img.src.startsWith('data:') && img.complete)"""
)

That waits until every image has both a non-placeholder src AND has finished loading.

Blur-up / progressive

<img src="data:image/jpeg;base64,/9j/4AA...verySmall...">

Sites generate a tiny (often 32×32 px) base64-encoded blurred thumbnail as the initial src. Real URL is set after the high-res loads. Common with Next.js Image component and Cloudinary.

Tell-tale signs:

  • src starts with data:image/...;base64, and is short (< 1 KB).
  • A data-original, data-src, or srcset attribute contains the real URL.
  • Sometimes embedded in a <noscript> fallback you can parse directly.
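The first tell-tale sign is easy to turn into a check. A sketch of the heuristic (function name and the 1 KB cutoff are my own, matching the rule of thumb above):

```python
def is_blur_placeholder(src: str, max_len: int = 1024) -> bool:
    """Heuristic: an inline base64 image shorter than ~1 KB is a placeholder."""
    return src.startswith("data:image/") and len(src) < max_len
```
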

For HTTP scrapers, the <noscript> fallback is a gift:

<img class="placeholder" src="data:..." data-src="...">
<noscript><img src="https://cdn.example.com/real.jpg"></noscript>

Parse the <noscript> content with BeautifulSoup; that's the canonical URL.
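A sketch of that parse, assuming BeautifulSoup (the helper name is illustrative). Some parsers keep <noscript> children as raw text rather than tags, so re-parsing the inner HTML handles both cases:

```python
from bs4 import BeautifulSoup

def noscript_image_urls(html: str) -> list[str]:
    """Extract canonical image URLs from <noscript> fallbacks."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for ns in soup.find_all("noscript"):
        # Re-parse the inner HTML in case it was kept as raw text.
        inner = BeautifulSoup(ns.decode_contents(), "html.parser")
        for img in inner.find_all("img"):
            if img.get("src"):
                urls.append(img["src"])
    return urls
```
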

Skeleton loaders

A separate problem. The page shows fake skeleton blocks (gray rectangles, shimmer animations) while data loads. Your auto-wait fires on the skeleton being visible, not the content:

page.goto(url)
page.wait_for_selector(".product-card")  # MATCHES THE SKELETON
print(page.locator(".product-card").count())  # 4 skeletons, no real products

Fix: wait for a content-specific signal, not a generic card class:

page.wait_for_selector(".product-card .product-name")  # has a real name child
page.wait_for_selector(".product-card:not(.skeleton)")  # not a skeleton variant
page.wait_for_function("() => document.querySelectorAll('.product-card .product-name').length > 0")

Three options, same idea: the selector must distinguish loaded cards from placeholder cards.

The right scraping playbook

  1. View Source. If data-src, data-original, or <noscript> attributes contain the real URLs, use HTTP scraping and parse them.
  2. Check srcset. Often the real URL is in srcset even when src is a placeholder.
  3. Native lazy. If src is already the real URL with loading="lazy", you're done.
  4. Browser only when needed. If the URL is genuinely computed in JS and not present in the markup at all, you have to render.

In practice, 70-80% of lazy-loaded image scrapes are doable with HTTP. The remainder require careful scroll-then-wait choreography.

Loading completeness

In a browser, "wait for the URL" isn't always enough; the image may have a real src but still be downloading:

page.wait_for_function(
  """() => Array.from(document.images).every(img => img.complete && img.naturalHeight > 0)"""
)

img.complete is true once the request finishes (success or error). naturalHeight > 0 ensures success specifically. Useful when you're capturing screenshots and need fully-loaded thumbnails.

A common foot-gun: srcset

Browser uses srcset to pick the right resolution. Your scraper probably wants the highest-resolution variant:

def best_src(img):
    srcset = img.get("srcset") or ""
    candidates = [c.strip().split() for c in srcset.split(",") if c.strip()]
    # each candidate is [url, '2x' or '500w']
    if candidates:
        return candidates[-1][0]  # last is usually the highest
    return img.get("data-src") or img.get("src")

Don't just take src; on retina-aware sites it's the lowest-resolution variant.

Hands-on lab

Open /challenges/dynamic/lazy-images. View source first. Note which attributes hold the real URLs. Then write two scrapers: (1) requests + BeautifulSoup parsing data-src, (2) Playwright scrolling each card into view and reading the resolved src. Both should produce the same list of image URLs. Compare timings, the HTTP version should be 10-20x faster.

The lab target lives on Catalog108, our first-party scraping sandbox: /challenges/dynamic/lazy-images.

Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.

Question 1 of 8

What does the HTML attribute `loading="lazy"` on an `<img>` mean for a scraper?
