Sub-path 3 of 6

Dynamic Web & Browser Automation

When static fails, drive a real browser.

For JS-rendered sites, SPAs, infinite scroll, modals, iframes, Shadow DOM. Playwright is the main tool, with Selenium and Puppeteer for completeness. Each lesson runs against the dynamic challenges at Catalog108.

~4 weeks part-time · 30 lessons

Lessons

  1. 2.1

    Detecting JS-Rendered Content (Three-Test Diagnostic)

    Before reaching for a headless browser, run three fast tests to confirm the page actually needs one. Most pages don't.

    Lab: /challenges/dynamic/spa-pure

    intermediate
  2. 2.2

    Client-Side Rendering vs SSR vs Hybrid

    The architectural difference between rendering modes determines whether you scrape with curl, parse a hydration payload, or drive a browser.

    Lab: /products

    intermediate
  3. 2.3

    When Browser Automation Is the Right Tool (And When It Isn't)

    A decision framework for choosing between HTTP scraping, API hunting, and headless browsers, with the honest trade-offs.

    intermediate
  4. 2.4

    Playwright Install + First Script (Python)

    Install Playwright, drive a real browser, screenshot a page, and extract text: the minimum viable browser-automation pipeline.

    Lab: /

    intermediate
  5. 2.5

    Browser, Context, Page: The Mental Model

    Three nested objects define every Playwright script. Get the relationship right and concurrency, isolation, and sessions all become obvious.

    Lab: /products

    intermediate
  6. 2.6

    Locators: The Auto-Waiting Magic

    Locators are Playwright's most important abstraction. They auto-wait, auto-retry, and eliminate the entire category of timing bugs that plague Selenium.

    Lab: /challenges/dynamic/lazy-images

    intermediate
  7. 2.7

    Locator Strategies: CSS, XPath, Role, Text, Test-ID

    Choosing the right selector type is the single biggest factor in scraper stability. A clear hierarchy of which to prefer, when, and why.

    Lab: /products/1-white-wooden-vase

    intermediate
  8. 2.8

    Actions: click, fill, hover, type, drag

    The verbs of browser automation. Each action has subtle options that change behaviour; knowing them is the difference between flaky and rock-solid scrapers.

    Lab: /challenges/dynamic/click-required/reveal

    intermediate
  9. 2.9

    Waiting Strategies (The Make-or-Break Skill)

    Time-based sleeps are the #1 cause of flaky scrapers. Replace them with the four deterministic wait primitives Playwright provides.

    Lab: /challenges/dynamic/auto-typed/animated

    intermediate
  10. 2.10

    Playwright in Node: Why You'd Choose It

    Playwright's Node API is the original, the fastest-evolving, and the natural fit when the target site is itself JavaScript-heavy. Same concepts, async-first.

    Lab: /challenges/dynamic/spa-routed

    intermediate
  11. 2.11

    Async/Await Patterns in Node Scrapers

    Async is the model, but the wrong patterns leak browsers, deadlock on shared state, and silently swallow errors. Four idioms to internalise.

    Lab: /challenges/dynamic/heavy-dom/10k-items

    intermediate
  12. 2.12

    Symfony Panther: Playwright/ChromeDriver for PHP

    PHP's first-class browser automation library. Same model as Playwright, friendly Symfony integration, real browser control.

    Lab: /challenges/dynamic/spa-pure

    intermediate
  13. 2.13

    Building a Headless Scraper as a Symfony Console Command

    Wrap Panther in a Symfony Console command for cron-friendly, configurable, observable PHP scrapers.

    Lab: /challenges/dynamic/infinite-scroll/button-jsappend

    intermediate
  14. 2.14

    Selenium in Python (Legacy but Still Common)

    Selenium predates Playwright by a decade and still dominates legacy codebases. Know it well enough to read and port a Selenium scraper, and not to fear inheriting one.

    Lab: /challenges/dynamic/date-picker/custom

    intermediate
  15. 2.15

    Selenium in PHP via `php-webdriver/webdriver`

    Selenium for PHP. Maintained and W3C-compliant, it is the right tool when Panther doesn't fit or you need raw WebDriver control.

    Lab: /challenges/dynamic/date-picker/custom

    intermediate
  16. 2.16

    Puppeteer in Node.js

    Google's own browser-automation library, the Chromium-focused predecessor of Playwright. Smaller, simpler, and still excellent for Chrome-specific scrapes.

    Lab: /challenges/dynamic/drag-drop/list-reorder

    intermediate
  17. 2.17

    Choosing Between Playwright, Selenium, and Puppeteer

    A working framework for picking the right browser automation tool for your project, and when the answer is 'don't use any of them.'

    intermediate
  18. 2.18

    Infinite Scroll: Five Implementation Patterns

    Every infinite scroll falls into one of five patterns. Identify the pattern first, then pick the right scraping technique, or find the underlying API and skip browser automation entirely.

    Lab: /challenges/dynamic/infinite-scroll/intersection

    intermediate
  19. 2.19

    Lazy-Loaded Images and Skeleton Loaders

    Images that appear blank, skeleton placeholders that fool naive scrapers, and the right way to wait for actual content.

    Lab: /challenges/dynamic/lazy-images

    intermediate
  20. 2.20

    Modals, Popups, Cookie Banners: Auto-Dismissing

    Every modern site throws three to five overlays at your scraper before you reach the content. Recognise them, then dismiss or ignore them without breaking the scrape.

    Lab: /challenges/dynamic/modals/cookie-banner

    intermediate
  21. 2.21

    iframes and Shadow DOM: Piercing Nested Contexts

    Two ways content can hide from a flat document.querySelectorAll. Pierce them correctly and you can scrape anything; pierce them wrong and you'll wonder why your selectors return nothing.

    Lab: /challenges/dynamic/iframe/same-origin

    intermediate
  22. 2.22

    Drag-and-Drop, Date Pickers, Complex Form Controls

    The form controls that look custom because they are. Three patterns (drag-drop, custom date pickers, and rich select widgets) and how to drive them reliably.

    Lab: /events

    intermediate
  23. 2.23

    Capturing XHR / Fetch Calls the Page Makes

    The defining browser-automation pattern: drive the page just enough to discover the underlying API, then bypass the browser entirely. This is how production scrapers get fast.

    Lab: /locations

    intermediate
  24. 2.24

    Blocking Resources for 3–5x Speedup

    Most page weight is images, fonts, ads, and analytics. Blocking them at the browser level slashes scrape time without losing the data you actually want.

    Lab: /products

    intermediate
  25. 2.25

    Persistent Contexts and Browser Profiles

    Save a logged-in session once, replay it forever. The pattern that turns five-minute auth flows into 50-millisecond cookie injections.

    Lab: /account/dashboard

    intermediate
  26. 2.26

    Browser Pool Patterns for Concurrency

    Running one browser at a time is wasteful. Running 50 is a memory disaster. The right pattern: a bounded pool of contexts under a shared browser process.

    Lab: /products

    advanced
  27. 2.27

    How Sites Detect Headless Browsers

    Forty signals that distinguish a Playwright/Selenium browser from a real one. Knowing them is the prerequisite to evading them.

    Lab: /challenges/antibot/webdriver-detected

    advanced
  28. 2.28

    `playwright-stealth` and `undetected-chromedriver`

    Two community-maintained toolkits that automate the dozens of fingerprint patches a stealth scraper needs. Install, configure, verify, then stop reinventing.

    Lab: /challenges/antibot/canvas-fingerprint

    advanced
  29. 2.29

    Camoufox and Other Patched Browsers

    When JS-level stealth isn't enough, the next step is a browser whose binary itself has been patched to forge canvas, WebGL, and font fingerprints.

    Lab: /challenges/antibot/canvas-fingerprint

    advanced
  30. 2.30

    Mobile Emulation and Geolocation Spoofing

    Many sites serve different content to mobile users, geo-target by IP and JS, and gate features by user-agent. Emulating these correctly opens scrapes you'd otherwise be locked out of.

    Lab: /locations

    advanced

Every lesson has a hands-on lab target on Catalog108, our first-party practice scraping sandbox. Each lab page has a /grade endpoint that returns pass/fail on your scraper output.