Playwright Install + First Script (Python)
Install Playwright, drive a real browser, screenshot a page, and extract text: the minimum viable browser-automation pipeline.
What you’ll learn
- Install `playwright` and its bundled Chromium/Firefox/WebKit binaries.
- Write a script that launches a browser, opens a page, and extracts content.
- Distinguish sync vs async API in Python, and choose between them.
- Run the same script in headed mode for debugging and headless mode for production.
Playwright is the modern standard for browser automation: faster than Selenium, more reliable than Puppeteer's Python ports, and maintained by a team at Microsoft. This lesson gets you running.
Install
Playwright ships in two halves: the Python library and the browser binaries.
pip install playwright
playwright install chromium
The first command installs the playwright package. The second downloads a known-good Chromium build into ~/.cache/ms-playwright/. You can also install Firefox and WebKit:
playwright install firefox webkit
Or all three at once with playwright install (no argument). Most production scrapers stick to Chromium; it's the fastest and best-tested of the three. Firefox and WebKit are useful for cross-browser bug repros, not daily scraping.
Verify the install
python -c "from playwright.sync_api import sync_playwright; print('ok')"
If that prints ok, you're done. If you see an import error, you missed the pip install. If you see "executable doesn't exist", you missed playwright install chromium.
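If you want that check as a script rather than a one-liner, here is a small sketch that classifies which half of the install is missing and answers with the command that fixes it. The `install_status` helper is a name of my own, not part of Playwright:

```python
import importlib.util


def install_status() -> str:
    """Return 'ok', or the command that fixes the missing half."""
    # Half one: the Python package itself.
    if importlib.util.find_spec("playwright") is None:
        return "pip install playwright"
    # Half two: the downloaded browser binary.
    try:
        from playwright.sync_api import sync_playwright
        with sync_playwright() as p:
            p.chromium.launch(headless=True).close()
        return "ok"
    except Exception:
        return "playwright install chromium"


if __name__ == "__main__":
    print(install_status())
```

Running it prints `ok` on a healthy install, or tells you which of the two install commands to re-run.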
Your first scraper
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://practice.scrapingcentral.com/")
    print(page.title())
    print(page.locator("h1").first.inner_text())
    browser.close()
Run it. You should see the page title and the <h1> text. That is a working Playwright scraper. Everything else in this sub-path is a variation on these eight lines.
What each line does
- `with sync_playwright() as p:` starts the Playwright supervisor process. The `with` block guarantees clean shutdown.
- `p.chromium.launch(headless=True)` spawns a Chromium instance. `headless=False` opens a real visible window (debugging).
- `browser.new_page()` creates a fresh page (tab) inside the default browser context.
- `page.goto(url)` navigates and waits for the page's `load` event by default. Returns a `Response` object.
- `page.locator("h1").first.inner_text()` queries the DOM, picks the first match, and returns its text.
- `browser.close()` terminates the browser process.
Compared to requests, the new ideas are: launch a process, open a page, query with locators, close cleanly. That is the whole API surface at the top level.
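One practical use of the `Response` object that `goto` returns is failing fast on HTTP errors before parsing anything. A sketch, with a hypothetical `fetch_title` helper; the Playwright import is deferred into the function so the file can be imported on machines without Playwright installed:

```python
def is_http_error(status: int) -> bool:
    """4xx and 5xx responses count as errors worth aborting on."""
    return status >= 400


def fetch_title(url: str) -> str:
    """Navigate to url and return its title, raising on an HTTP error."""
    # Deferred import: lets the pure helpers above be used anywhere.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        resp = page.goto(url)
        # goto returns None only for same-document navigations (e.g. #anchors)
        if resp is not None and is_http_error(resp.status):
            browser.close()
            raise RuntimeError(f"HTTP {resp.status} for {url}")
        title = page.title()
        browser.close()
        return title


if __name__ == "__main__":
    print(fetch_title("https://practice.scrapingcentral.com/"))
```

Checking `resp.status` up front saves you from happily scraping a 404 page's "not found" markup.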
Headed vs headless
browser = p.chromium.launch(headless=False, slow_mo=500)
headless=False opens a visible browser window. slow_mo=500 delays every action by 500ms so you can see what the scraper is doing. Both are debugging aids; turn them off for production.
A common pattern is to read these from environment variables:
import os
headless = os.environ.get("HEADLESS", "1") == "1"
slow_mo = int(os.environ.get("SLOW_MO", "0"))
browser = p.chromium.launch(headless=headless, slow_mo=slow_mo)
Then HEADLESS=0 SLOW_MO=300 python scrape.py flips into debug mode without code changes.
Sync vs async, which to use
Playwright Python has two APIs:
# Sync
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    ...

# Async
from playwright.async_api import async_playwright

async with async_playwright() as p:
    ...
Use sync for:
- Scripts you run from the command line.
- Code inside a Jupyter notebook (sometimes, depends on the kernel).
- Anywhere that doesn't already have an event loop.
Use async for:
- Scrapers that drive multiple pages concurrently inside one Python process (Lesson 2.26).
- Integration with async frameworks (FastAPI, aiohttp, Scrapy with asyncio reactor).
- Anywhere you need to interleave Playwright calls with other async I/O.
For learning purposes, start with sync. It's strictly simpler. You can swap to async later when concurrency demands it.
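To make the contrast concrete, here is the first script rewritten against the async API as a sketch. Every Playwright call gains an `await`, but the shape is otherwise identical; the import is deferred inside `main` so the file parses without Playwright installed:

```python
import asyncio


async def main() -> None:
    # Async twin of the sync script above: same calls, each awaited.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://practice.scrapingcentral.com/")
        print(await page.title())
        print(await page.locator("h1").first.inner_text())
        await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```

Note that `page.locator(...)` itself stays synchronous (it just builds a query); only the calls that talk to the browser are awaited.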
Pulling more data
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://practice.scrapingcentral.com/")

    # Every link on the page
    for a in page.locator("a[href]").all():
        text = a.inner_text().strip()
        href = a.get_attribute("href")
        print(f"{text!r:30} → {href}")

    # Take a screenshot
    page.screenshot(path="home.png", full_page=True)

    browser.close()
Three new things:
- `page.locator(...).all()` returns a list of all matching elements you can iterate.
- `get_attribute("href")` reads an attribute (vs `inner_text()` for the rendered text).
- `page.screenshot(...)` saves a PNG. `full_page=True` captures the whole scrollable area, not just the viewport.
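One wrinkle worth knowing: `get_attribute("href")` returns the raw attribute value, so relative links come back relative. A small standard-library sketch (the sample hrefs are made up) resolves them against the page URL before you fetch them:

```python
from urllib.parse import urljoin


def absolutize(base_url: str, hrefs: list[str]) -> list[str]:
    """Resolve raw href values against the page they were scraped from."""
    return [urljoin(base_url, h) for h in hrefs]


# Hypothetical hrefs pulled from a listing page:
links = absolutize(
    "https://practice.scrapingcentral.com/products/",
    ["/login", "item?id=3", "https://example.com/x"],
)
print(links)
# → ['https://practice.scrapingcentral.com/login',
#    'https://practice.scrapingcentral.com/products/item?id=3',
#    'https://example.com/x']
```

`urljoin` handles all three cases correctly: root-relative paths, page-relative paths, and already-absolute URLs, which pass through unchanged.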
When goto returns
page.goto(url) waits for the load event by default: the browser has fired DOMContentLoaded and most resources have finished loading. You can change it:
page.goto(url, wait_until="domcontentloaded") # earliest: HTML parsed
page.goto(url, wait_until="load") # default: most resources loaded
page.goto(url, wait_until="networkidle") # latest: 500ms with no network activity
page.goto(url, wait_until="commit") # earliest possible: response received
networkidle is tempting but unreliable on sites with long-poll connections, analytics beacons, or live-update streams; those connections never go idle. Prefer domcontentloaded plus an explicit wait for the element you actually need. Lesson 2.9 covers the full waiting strategy.
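That combination, sketched as a helper of my own (`grab_heading`; the timeout value is arbitrary, and the Playwright import is deferred so the file parses without the package):

```python
def grab_heading(url: str, timeout_ms: int = 10_000) -> str:
    """domcontentloaded navigation + an explicit wait for one element."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Return as soon as the HTML is parsed...
        page.goto(url, wait_until="domcontentloaded")
        # ...then wait only for the one element this scraper needs.
        h1 = page.locator("h1").first
        h1.wait_for(state="visible", timeout=timeout_ms)
        text = h1.inner_text()
        browser.close()
        return text
```

This is robust on chatty pages because it never cares whether the network goes idle, only whether the `<h1>` is actually rendered.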
Cleanup matters
The with block ensures the browser closes even if your code throws. Without it:
p = sync_playwright().start()
browser = p.chromium.launch()
# ... if anything below raises, browser stays alive ...
browser.close()
p.stop()
You will leak Chromium processes. They will eat your RAM. Use the context manager.
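If you genuinely can't use the context manager (say, the browser's lifetime spans several functions), a try/finally sketch gives the same guarantee. The `scrape_title` name is hypothetical, and the import is deferred into the function:

```python
def scrape_title(url: str) -> str:
    """Manual start/stop with the same crash-safety as the with block."""
    from playwright.sync_api import sync_playwright

    p = sync_playwright().start()
    browser = None
    try:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        return page.title()
    finally:
        # Runs even if launch/goto raises, so no Chromium process is leaked.
        if browser is not None:
            browser.close()
        p.stop()
```

The `browser = None` sentinel matters: if `launch` itself raises, the finally block still runs, and it must not call `close()` on a browser that never existed.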
Hands-on lab
Install Playwright and run the eight-line script against https://practice.scrapingcentral.com/. Confirm you get the page title and an h1. Then switch to headless=False, slow_mo=400, re-run, and watch the browser actually do the work. That visual feedback is invaluable while you're learning.