Introduction to Playwright for Web Scraping
Learn to scrape JavaScript-heavy websites with Playwright, including SPAs, lazy-loaded pages, and other dynamic content.
Playwright is a modern browser automation library that can control Chromium, Firefox, and WebKit. Because it drives a real browser that executes JavaScript, it can scrape rendered pages that Requests + BeautifulSoup cannot handle, since those tools only fetch and parse the raw HTML.
When to Use Playwright
- The page loads its content with JavaScript (SPAs built with React, Vue, or similar frameworks)
- You need to click buttons, fill forms, or scroll
- Content loads lazily as you scroll
- The site requires login or session management
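The lazy-loading case in the list above can be sketched as a scroll loop. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper name `scroll_until_stable`, its default values, and the `/scroll` page of the practice site are chosen for illustration. It scrolls to the bottom, pauses briefly, and stops once the number of matching elements stops growing.

```python
def scroll_until_stable(page, item_selector, max_rounds=20, pause_ms=500):
    """Scroll to the bottom until no new items appear; return the final count.

    `page` is a Playwright sync Page. The helper name and defaults are
    illustrative assumptions, not part of Playwright.
    """
    previous = -1
    current = len(page.query_selector_all(item_selector))
    rounds = 0
    while current > previous and rounds < max_rounds:
        previous = current
        # Jump to the bottom so the site's lazy loader fetches the next batch.
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        # Fixed pause: simple, but a slow network may need a longer value.
        page.wait_for_timeout(pause_ms)
        current = len(page.query_selector_all(item_selector))
        rounds += 1
    return current


def run():
    # Imported here so the helper above can be reused without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Assumed URL: an infinite-scroll variant of the practice site.
        page.goto("https://quotes.toscrape.com/scroll")
        page.wait_for_selector(".quote")
        total = scroll_until_stable(page, ".quote")
        print(f"Loaded {total} quotes")
        browser.close()
```

Call `run()` to try it against the live page. The `max_rounds` cap keeps the loop from running forever on a feed that never ends.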
Install
pip install playwright
playwright install  # downloads the Chromium, Firefox, and WebKit browser binaries
Basic Example
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://quotes.toscrape.com/js/")
    # Wait until JavaScript has rendered at least one quote
    page.wait_for_selector(".quote")
    quotes = page.query_selector_all(".quote")
    for quote in quotes:
        text = quote.query_selector(".text").inner_text()
        author = quote.query_selector(".author").inner_text()
        print(f"{text}, {author}")
    browser.close()
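For later processing it is often more convenient to collect the results as data than to print them. The sketch below assumes the same page and selectors as the example above; `extract_quotes` is a hypothetical helper name. It also skips any quote block that lacks a `.text` or `.author` child, because `query_selector` returns `None` when nothing matches.

```python
def extract_quotes(page):
    """Return a list of {"text", "author"} dicts from a Playwright sync Page."""
    results = []
    for quote in page.query_selector_all(".quote"):
        text_el = quote.query_selector(".text")
        author_el = quote.query_selector(".author")
        # query_selector returns None when no element matches, so guard both.
        if text_el is None or author_el is None:
            continue
        results.append({
            "text": text_el.inner_text().strip(),
            "author": author_el.inner_text().strip(),
        })
    return results


def run():
    # Imported here so extract_quotes can be reused without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://quotes.toscrape.com/js/")
        page.wait_for_selector(".quote")
        for row in extract_quotes(page):
            print(row)
        browser.close()
```

Call `run()` to print one dictionary per quote. Keeping the extraction in its own function means it can be checked against a stand-in page object without launching a browser.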
Key Methods
| Method | Purpose |
|---|---|
| page.goto(url) | Navigate to a URL |
| page.wait_for_selector(sel) | Wait for element to appear |
| page.query_selector(sel) | Find one element |
| page.query_selector_all(sel) | Find all matching elements |
| page.click(sel) | Click an element |
| page.fill(sel, value) | Fill an input field |
| page.evaluate(js) | Run JavaScript on the page |
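To illustrate how `fill`, `click`, `wait_for_selector`, and `evaluate` combine, here is a hedged sketch of a form login. The `/login` URL, the `#username` and `#password` selectors, the submit-button selector, the logout-link selector used to detect success, and the placeholder credentials are all assumptions for illustration; inspect the target page to find the real ones.

```python
def log_in(page, username, password, success_selector="a[href='/logout']"):
    """Fill and submit a login form, then return the resulting page title.

    All selectors are assumptions about the target page's markup.
    """
    page.fill("#username", username)
    page.fill("#password", password)
    page.click("input[type='submit']")
    # Wait for an element that should only exist once the login has succeeded.
    page.wait_for_selector(success_selector)
    # evaluate() runs a JavaScript expression in the page and returns its value.
    return page.evaluate("document.title")


def run():
    # Imported here so log_in can be reused without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Assumed URL and placeholder credentials.
        page.goto("https://quotes.toscrape.com/login")
        title = log_in(page, "demo-user", "demo-pass")
        print(f"Logged in, page title: {title}")
        browser.close()
```

Call `run()` to try it. Waiting for a post-login element rather than sleeping for a fixed time keeps the script from reading the page before the form submission has completed.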
Headless vs Headed Mode
# Headless (no visible browser, faster, for production)
browser = p.chromium.launch(headless=True)
# Headed (visible browser, for debugging)
browser = p.chromium.launch(headless=False)
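One way to switch between the two modes without editing the script is to read a flag from the environment. The sketch below is only one possible arrangement: the variable name `PW_DEBUG` and the helper `launch_options` are made up for this example, while `headless` and `slow_mo` are existing `launch()` parameters.

```python
import os


def launch_options(env=None):
    """Build keyword arguments for browser_type.launch() from the environment.

    The variable name PW_DEBUG is a made-up convention for this sketch.
    """
    env = os.environ if env is None else env
    debug = env.get("PW_DEBUG", "") == "1"
    if debug:
        # Headed, with each action slowed by 250 ms so it is easy to follow.
        return {"headless": False, "slow_mo": 250}
    return {"headless": True}


def run():
    # Imported here so launch_options can be reused without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_options())
        page = browser.new_page()
        page.goto("https://quotes.toscrape.com/js/")
        print(page.title())
        browser.close()
```

With this in place, calling `run()` with `PW_DEBUG=1` set opens a visible, slowed-down browser, and calling it without the variable stays headless.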
Next Steps
- Learn to handle pagination with Playwright
- Use Playwright with proxies for large-scale scraping
- Export scraped data to CSV or database
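As a small preview of the CSV export step, the sketch below uses only Python's standard `csv` module. It assumes the scraped quotes have already been collected as dictionaries with `text` and `author` keys; the function name `save_quotes_csv` and the file name are hypothetical.

```python
import csv


def save_quotes_csv(rows, path):
    """Write a list of {"text", "author"} dicts to a CSV file with a header.

    Returns the number of data rows written.
    """
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "author"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `save_quotes_csv([{"text": "A quote", "author": "Someone"}], "quotes.csv")` writes a header line followed by one data row.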