Scraping Infinite Scroll Pages - Browser Automation

Learn techniques to scrape infinite scroll pages using Playwright and Selenium. Handle lazy-loaded content and extract all data from endlessly scrolling websites.

Infinite scroll pages load new content as the user scrolls down, replacing traditional pagination. Sites like Twitter, Instagram, Pinterest, and many e-commerce platforms use this pattern. Scraping these pages requires automating the scroll action and waiting for new content to load after each scroll.
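Every approach below runs the same loop: scroll, wait, re-read the items, and stop when a scroll brings in nothing new. The sketch below separates that stopping logic from any browser library so it can be checked on its own. The names scroll_until_stable, scroll, count_items and the patience parameter are hypothetical. With patience set to 2 or more, one slow batch no longer ends the scrape early.

```python
from typing import Callable


def scroll_until_stable(
    scroll: Callable[[], None],
    count_items: Callable[[], int],
    max_scrolls: int = 50,
    patience: int = 2,
) -> int:
    """Scroll until the item count stops growing; return the final count.

    Gives up after `patience` consecutive scrolls that add nothing, so one
    slow batch does not end the scrape, and never scrolls more than
    `max_scrolls` times in total.
    """
    count = count_items()
    unchanged = 0
    for _ in range(max_scrolls):
        scroll()  # in real use: scroll, then wait for the batch to load
        new_count = count_items()
        if new_count > count:
            count = new_count
            unchanged = 0  # progress resets the patience counter
        else:
            unchanged += 1
            if unchanged >= patience:
                break
    return count


# Simulated feed: 10 items up front, three more batches of 10, then nothing
loaded = [10]
batches = [10, 10, 10]


def fake_scroll() -> None:
    if batches:
        loaded[0] += batches.pop(0)


print(scroll_until_stable(fake_scroll, lambda: loaded[0]))  # 40
```

With Playwright, count_items could be page.locator(".quote").count, and scroll could be a small function that calls page.evaluate("window.scrollTo(0, document.body.scrollHeight)") followed by page.wait_for_timeout(2000).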

Playwright Approach

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://quotes.toscrape.com/scroll")
    page.wait_for_selector(".quote")

    all_quotes = set()
    previous_count = 0

    while True:
        # Scroll to the bottom of the page to trigger the next batch
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")

        # Give new content time to load; page.wait_for_timeout() is Playwright's
        # own pause and, unlike time.sleep(), lets it keep processing events
        page.wait_for_timeout(2000)

        # Collect the text of every quote currently in the DOM
        for text in page.locator(".quote .text").all_inner_texts():
            all_quotes.add(text)

        # Stop once a scroll adds nothing new
        if len(all_quotes) == previous_count:
            break
        previous_count = len(all_quotes)

    print(f"Scraped {len(all_quotes)} quotes")
    browser.close()
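The loop above keeps only quote text in a set, so two quotes with identical wording collapse into one and the authors are lost. A variation that stores (text, author) records, and pulls every quote's fields with one evaluate_all call per round, might look like this. It assumes each .quote contains a .text and an .author element, as on the quotes.toscrape.com demo. The names merge_new and EXTRACT_JS are hypothetical. Playwright is imported inside main() so the merging helper can be checked without a browser.

```python
def merge_new(seen: dict, rows: list) -> int:
    """Store (text, author) rows not seen before; return how many were new."""
    added = 0
    for text, author in rows:
        key = (text, author)
        if key not in seen:
            seen[key] = {"text": text, "author": author}
            added += 1
    return added


# Runs in the page: maps every matched .quote to [text, author] in one round trip
EXTRACT_JS = """
els => els.map(e => [
    e.querySelector('.text')?.innerText ?? '',
    e.querySelector('.author')?.innerText ?? '',
])
"""


def main() -> None:
    # Imported here so merge_new can be used without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://quotes.toscrape.com/scroll")
        page.wait_for_selector(".quote")

        seen: dict = {}
        while True:
            rows = page.locator(".quote").evaluate_all(EXTRACT_JS)
            if merge_new(seen, rows) == 0:
                break  # the last scroll added no new quotes
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            page.wait_for_timeout(2000)

        print(f"Scraped {len(seen)} quotes")
        browser.close()
```

Calling main() runs it against the demo page. The records in seen.values() are plain dicts, ready for json.dump or a CSV writer.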

Selenium Approach

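from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import time

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

try:
    driver.get("https://quotes.toscrape.com/scroll")
    # Wait for the first quotes to render instead of sleeping blindly
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".quote"))
    )

    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(2)  # give the next batch time to load

        # If the page did not grow, there is nothing left to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height

    quotes = driver.find_elements(By.CSS_SELECTOR, ".quote .text")
    print(f"Found {len(quotes)} quotes")
finally:
    driver.quit()  # always close the browser, even if scraping fails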
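The loop above pauses a fixed two seconds after each scroll. An explicit wait on the content itself is an alternative. It waits until more .quote elements exist than before the scroll and treats a timeout as the end of the feed. This is a minimal sketch assuming the same page and selectors. The helper more_items_than is a hypothetical name. It is written as a plain callable because WebDriverWait.until() accepts any function that takes the driver.

```python
def more_items_than(previous: int, selector: str = ".quote"):
    """Build a WebDriverWait condition: true once more than `previous` items match."""

    def condition(driver) -> bool:
        # "css selector" is the value of By.CSS_SELECTOR
        return len(driver.find_elements("css selector", selector)) > previous

    return condition


def main() -> None:
    # Imported here so the condition can be tested without Selenium installed
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://quotes.toscrape.com/scroll")
        WebDriverWait(driver, 10).until(more_items_than(0))

        while True:
            count = len(driver.find_elements(By.CSS_SELECTOR, ".quote"))
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
            try:
                WebDriverWait(driver, 10).until(more_items_than(count))
            except TimeoutException:
                break  # nothing new within 10 s: assume the feed ended

        quotes = driver.find_elements(By.CSS_SELECTOR, ".quote .text")
        print(f"Found {len(quotes)} quotes")
    finally:
        driver.quit()
```

One trade-off: the last scroll always waits out the full timeout before the loop ends, so shorten it if the site responds quickly. Calling main() runs it against the demo page.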

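Smarter Scroll: Wait for New Content

A fixed time.sleep() either wastes time or gives up too early when a batch loads slowly. Waiting for the network to go idle looks like the fix, but in Playwright the "networkidle" load state is reached once per page load. After the initial load it is already satisfied, so page.wait_for_load_state("networkidle") after a scroll typically returns immediately instead of waiting for the request the scroll triggered. Playwright's documentation also discourages "networkidle". A more reliable signal is the content itself: wait until the page holds more items than it did before the scroll.

# Playwright, wait until the scroll has added more .quote elements
previous = page.locator(".quote").count()
page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
page.wait_for_function("n => document.querySelectorAll('.quote').length > n", arg=previous)

If nothing new arrives within the timeout (30 seconds by default), wait_for_function raises Playwright's TimeoutError. You can catch it and treat it as the end of the feed.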
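A content-based wait checks whether the item count grew rather than watching network activity. It can also be written as a short Python-side poll that returns True or False instead of raising, which keeps the scroll loop free of exception handling. The sketch below does this. The helper scroll_and_wait_for_more and its poll_ms parameter are hypothetical. It only needs an object with locator(), evaluate() and wait_for_timeout(), such as a Playwright Page, so the logic can also be checked with a stand-in object.

```python
def scroll_and_wait_for_more(
    page, selector: str = ".quote", timeout_ms: int = 10_000, poll_ms: int = 250
) -> bool:
    """Scroll to the bottom, then poll until more `selector` matches appear.

    Returns True if the count grew within roughly `timeout_ms`, else False.
    """
    items = page.locator(selector)  # locators re-query the DOM on every count()
    previous = items.count()
    page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    waited = 0
    while waited < timeout_ms:
        page.wait_for_timeout(poll_ms)  # Playwright's pause between checks
        waited += poll_ms
        if items.count() > previous:
            return True
    return False


def main() -> None:
    # Imported here so the helper can be exercised without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://quotes.toscrape.com/scroll")
        page.wait_for_selector(".quote")

        for _ in range(50):  # hard cap in case the feed never ends
            if not scroll_and_wait_for_more(page):
                break  # no new items within the timeout

        print(f"Loaded {page.locator('.quote').count()} quotes")
        browser.close()
```

Unlike a fixed pause, the poll returns as soon as the batch lands, so fast responses cost only a fraction of a second per scroll.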

Scrolling Inside a Container

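Sometimes the scrollable element is not the page itself but a specific div, such as a feed pane or results list styled with overflow: auto. Scrolling the window does nothing in that case, so set the container's scrollTop instead: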

# Playwright, scroll a specific container
page.evaluate("""
    const container = document.querySelector('.results-container');
    container.scrollTop = container.scrollHeight;
""")
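A single scrollTop assignment loads only one batch. To exhaust the container, repeat it until the container's scrollHeight stops growing. That is the same stop condition the Selenium example applies to the whole page. The sketch assumes container is anything with an evaluate(js) method that passes the element to the function, such as a Playwright Locator. The names scroll_container_until_stable and SCROLL_TO_END_JS are hypothetical.

```python
# Scrolls the element passed in to its end and reports its scrollHeight
SCROLL_TO_END_JS = "el => { el.scrollTop = el.scrollHeight; return el.scrollHeight; }"


def scroll_container_until_stable(container, wait, max_rounds: int = 50) -> int:
    """Scroll `container` to its end until its scrollHeight stops growing.

    `container` needs an evaluate(js) method (e.g. a Playwright Locator);
    `wait` is called between rounds. Returns the final scrollHeight.
    """
    height = container.evaluate(SCROLL_TO_END_JS)
    for _ in range(max_rounds):
        wait()  # let the next batch load
        new_height = container.evaluate(SCROLL_TO_END_JS)
        if new_height == height:
            break  # the container did not grow: nothing left to load
        height = new_height
    return height
```

With Playwright, pass page.locator(".results-container") as container and lambda: page.wait_for_timeout(2000) as wait, where .results-container is the placeholder selector from the snippet above. Locator.evaluate also waits for the element to exist before running the script.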

Setting a Scroll Limit

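To avoid scrolling forever on a feed that never ends, cap the number of scrolls, and still stop early once a scroll loads nothing new:

MAX_SCROLLS = 50
for _ in range(MAX_SCROLLS):
    previous = page.locator(".quote").count()
    page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    page.wait_for_timeout(2000)  # give the next batch time to load
    if page.locator(".quote").count() == previous:
        break  # nothing new loaded: end of the feed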
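A scroll cap is one kind of limit. In practice you may also want to stop once you have enough items, or once a time budget runs out. The sketch below combines these. The function scroll_with_limits, its max_items and max_seconds parameters, and the injectable clock are hypothetical. The function returns which limit stopped it, so each run can be logged.

```python
import time
from typing import Callable, Optional


def scroll_with_limits(
    scroll: Callable[[], None],
    count_items: Callable[[], int],
    max_scrolls: int = 50,
    max_items: Optional[int] = None,
    max_seconds: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Scroll until a limit is hit or the feed stops growing; return the reason."""
    start = clock()
    count = count_items()
    for _ in range(max_scrolls):
        if max_items is not None and count >= max_items:
            return "max_items"
        if max_seconds is not None and clock() - start >= max_seconds:
            return "max_seconds"
        scroll()  # in real use: scroll, then wait for the batch to load
        new_count = count_items()
        if new_count == count:
            return "no_new_items"
        count = new_count
    return "max_scrolls"
```

With Playwright, count_items could be page.locator(".quote").count, and scroll a small function that runs the scrollTo call and then page.wait_for_timeout(2000).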

Easier Alternative

Infinite scroll scraping is resource-intensive and slow. If the site's data is available through an underlying API (check the Network tab in DevTools), fetching the API directly is far more efficient. For sites without a public API, ScraperAPI offers built-in infinite scroll handling via their render option, saving you from managing browser automation yourself.
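To show why the API route is lighter, here is a sketch that pages through a JSON endpoint with plain HTTP requests and no browser. It assumes the scroll demo loads its data from https://quotes.toscrape.com/api/quotes?page=N and that each response carries a quotes list and a has_next flag. Confirm both in the Network tab before relying on them. The names API_URL, http_get_json and iter_api_items are hypothetical, and the fetch function is injectable so the pagination logic can be checked offline.

```python
import json
import urllib.request
from typing import Callable, Iterator

# Assumed endpoint; confirm the real URL in DevTools' Network tab
API_URL = "https://quotes.toscrape.com/api/quotes?page={page}"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body (standard library only)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def iter_api_items(
    fetch_json: Callable[[str], dict] = http_get_json,
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield items page by page until the API reports no next page."""
    for page in range(1, max_pages + 1):
        data = fetch_json(API_URL.format(page=page))
        # Assumed response shape: {"quotes": [...], "has_next": bool}
        yield from data.get("quotes", [])
        if not data.get("has_next"):
            break
```

list(iter_api_items()) would fetch every page in a handful of small requests, with no rendering, scrolling, or waiting involved.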

Next Steps

Handle forms, dropdowns, and click interactions
Learn browser fingerprinting and stealth techniques
Intercept network requests to find hidden APIs