Persistent Contexts and Browser Profiles
Save a logged-in session once, replay it forever. The pattern that turns five-minute auth flows into 50-millisecond cookie injections.
What you’ll learn
- Distinguish `storage_state` (serialised cookies/storage) from persistent contexts (full profile on disk).
- Capture a session from an interactive login and reuse it in headless production runs.
- Bootstrap a session via Playwright, then transfer its cookies to plain `requests`.
- Choose the right strategy for short-lived vs long-lived auth tokens.
Authentication is expensive. A login flow involves typing credentials, sometimes a 2FA prompt, sometimes a CAPTCHA. Doing it on every scraper run wastes minutes and can get you flagged for too-frequent logins. The right move is to authenticate once, persist the session, and replay it until it expires. Playwright gives you two tools.
Two persistence shapes
| | `storage_state` | Persistent context |
|---|---|---|
| What it stores | Cookies + localStorage as JSON | Entire browser profile on disk (history, cache, extensions) |
| Lifecycle | Saved/loaded explicitly | Persists automatically between runs |
| Size | Tens of KB | Hundreds of MB |
| Use case | Most scraping: small, portable, version-controllable | Heavy automation needing a full profile |
| Sharing | Easy: a JSON file | Hard: tied to a disk path |
`storage_state` is what you want for production scraping. Persistent contexts are heavier; they're useful for browser extensions, complex login state, or specific anti-bot evasion (some systems trust profiles that have history).
Saving storage_state
Run a one-time interactive script:
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://practice.scrapingcentral.com/account/login")

    # Fill credentials manually OR programmatically
    page.locator("#email").fill("demo@example.com")
    page.locator("#password").fill("password")
    page.locator("button[type=submit]").click()
    page.wait_for_url("**/account/dashboard")

    # Save the state
    context.storage_state(path="auth.json")
    browser.close()
```
`auth.json` now contains every cookie and localStorage entry for the session, typically 5-50 KB of JSON. Check it into a private repo or store it in a secrets manager.
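For orientation, here's a quick way to peek inside the file. A minimal sketch, assuming `auth.json` was saved by the script above; the top-level keys are Playwright's standard storage-state shape:

```python
import json

# Inspect the saved storage state: a "cookies" list and an "origins"
# list (the latter carries localStorage entries per origin).
with open("auth.json") as f:
    state = json.load(f)

print(state.keys())  # expected: dict_keys(['cookies', 'origins'])
for c in state["cookies"]:
    print(c["name"], c["domain"], c.get("httpOnly"))
```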
Loading storage_state
In every production scraper:
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context(storage_state="auth.json")
    page = context.new_page()
    page.goto("https://practice.scrapingcentral.com/account/dashboard")

    # Already logged in, no auth flow needed.
    print(page.locator("h1").inner_text())
    browser.close()
```
`storage_state` is passed to `new_context()`. The context starts with the cookies pre-loaded; the first request to the site is already authenticated. No login flow, no credentials in your scraper code.
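In practice you often combine the two scripts: reuse `auth.json` when it exists, fall back to a live login when it doesn't. A minimal sketch, reusing the demo selectors and credentials from the capture script above:

```python
import os
from playwright.sync_api import sync_playwright

STATE_PATH = "auth.json"  # same file the capture script writes

with sync_playwright() as p:
    browser = p.chromium.launch()
    if os.path.exists(STATE_PATH):
        # Fast path: start pre-authenticated from the saved state.
        context = browser.new_context(storage_state=STATE_PATH)
        page = context.new_page()
    else:
        # Slow path: log in once, then save the state for next time.
        context = browser.new_context()
        page = context.new_page()
        page.goto("https://practice.scrapingcentral.com/account/login")
        page.locator("#email").fill("demo@example.com")
        page.locator("#password").fill("password")
        page.locator("button[type=submit]").click()
        page.wait_for_url("**/account/dashboard")
        context.storage_state(path=STATE_PATH)

    page.goto("https://practice.scrapingcentral.com/account/dashboard")
    print(page.locator("h1").inner_text())
    browser.close()
```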
Persistent contexts
When you need more than just cookies (extension state, full history, persisted custom Chrome flags), use `launch_persistent_context`:
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    context = p.chromium.launch_persistent_context(
        user_data_dir="./chrome-profile",
        headless=False,
    )
    page = context.new_page()
    page.goto("https://practice.scrapingcentral.com/account/dashboard")
    # If you logged in once in this profile, you're still logged in.
    context.close()
```
The profile directory accumulates everything Chrome would normally store in `~/.config/google-chrome/Default/`. Subsequent launches against the same `user_data_dir` pick up exactly where you left off.
Caveat: a persistent context replaces the browser launch; there's no separate `browser` object. You get one context (multiple pages allowed) tied to the profile.
Transferring cookies to plain requests
The hybrid pattern from Lesson 2.3: log in with the browser, scrape the bulk with requests.
```python
from playwright.sync_api import sync_playwright
import requests

# Step 1: get an authenticated context
with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context(storage_state="auth.json")
    cookies = context.cookies()
    browser.close()

# Step 2: feed the cookies into a requests Session
session = requests.Session()
for c in cookies:
    session.cookies.set(c["name"], c["value"], domain=c["domain"], path=c["path"])

# Step 3: scrape at HTTP speed
r = session.get("https://practice.scrapingcentral.com/api/account/orders")
orders = r.json()
print(orders)
```
The one-time browser session sets up the auth state in `auth.json`; the cookies transfer cleanly to `requests`. From here, your scraper runs 10-50× faster than the equivalent browser-only version.
Caveats:

- `HttpOnly` cookies are usable here because we go through `context.cookies()`, not through JS.
- Some APIs check `Origin`/`Referer`/`User-Agent` headers. Set them on the `requests.Session` to match the browser:

```python
session.headers.update({
    "User-Agent": "Mozilla/5.0 ...",
    "Origin": "https://practice.scrapingcentral.com",
    "Referer": "https://practice.scrapingcentral.com/account/dashboard",
})
```
Detecting an expired session
Sessions expire. Your scraper needs to detect this and re-authenticate:
```python
def scrape_with_retry(url):
    r = session.get(url)
    if r.status_code in {401, 403} or "Sign in" in r.text:
        print("Session expired, re-authenticating...")
        refresh_auth()  # must also reload the fresh cookies into `session`
        r = session.get(url)
    return r
```
`refresh_auth()` re-runs the interactive flow (or, in a CI context, runs a headless login with credentials from a secrets manager) and updates `auth.json`. The next scraper run starts fresh.
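A minimal headless sketch of that helper; the selectors carry over from the capture script above, and the env-var names are assumptions:

```python
import os
from playwright.sync_api import sync_playwright

def refresh_auth():
    """Re-run the login headlessly and overwrite auth.json.

    Assumes SCRAPER_EMAIL / SCRAPER_PASSWORD are injected by your
    CI secrets manager (hypothetical names).
    """
    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        context = browser.new_context()
        page = context.new_page()
        page.goto("https://practice.scrapingcentral.com/account/login")
        page.locator("#email").fill(os.environ["SCRAPER_EMAIL"])
        page.locator("#password").fill(os.environ["SCRAPER_PASSWORD"])
        page.locator("button[type=submit]").click()
        page.wait_for_url("**/account/dashboard")
        context.storage_state(path="auth.json")
        browser.close()
```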
For very long-running scrapers, rotate proactively: re-authenticate every N hours regardless of whether the current session still works.
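One simple way to implement that rotation is to key off the age of the state file. A sketch, assuming the `refresh_auth()` helper above and a hypothetical 12-hour budget:

```python
import os
import time

MAX_AGE_HOURS = 12  # assumption: rotate twice a day

def auth_is_stale(path="auth.json"):
    # Rotate based on file age, regardless of whether the
    # current session would still be accepted by the site.
    if not os.path.exists(path):
        return True
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds > MAX_AGE_HOURS * 3600

if auth_is_stale():
    refresh_auth()
```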
Storage state for multiple accounts
Persist one state file per account:
```python
# Inside a running sync_playwright session with `browser` already launched:
for account in ["account_a.json", "account_b.json", "account_c.json"]:
    context = browser.new_context(storage_state=account)
    page = context.new_page()
    # ... scrape as this account ...
    context.close()
```
Each context is isolated. You can also run them in parallel (Lesson 2.26).
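To produce those per-account state files in the first place, here's a minimal capture sketch; the credential list is illustrative, and in practice you'd pull it from a secrets manager:

```python
from playwright.sync_api import sync_playwright

# Illustrative credentials only; never hardcode real ones.
ACCOUNTS = [
    ("account_a.json", "a@example.com", "password-a"),
    ("account_b.json", "b@example.com", "password-b"),
    ("account_c.json", "c@example.com", "password-c"),
]

with sync_playwright() as p:
    browser = p.chromium.launch()
    for state_file, email, password in ACCOUNTS:
        context = browser.new_context()
        page = context.new_page()
        page.goto("https://practice.scrapingcentral.com/account/login")
        page.locator("#email").fill(email)
        page.locator("#password").fill(password)
        page.locator("button[type=submit]").click()
        page.wait_for_url("**/account/dashboard")
        context.storage_state(path=state_file)  # one file per account
        context.close()
    browser.close()
```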
Security note
`auth.json` is the equivalent of a username and password. Treat it like a secret:
- Never commit to a public repo.
- Encrypt at rest (e.g., via SOPS, AWS KMS, or environment-injected at runtime).
- Rotate when leaked.
For team-shared scrapers, store the file in a secrets manager (AWS Secrets Manager, Vault, 1Password) and pull at runtime.
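A sketch of the pull-at-runtime step, assuming AWS Secrets Manager via boto3; the secret name is hypothetical:

```python
import boto3

def fetch_auth_state(path="auth.json"):
    # Download the stored storage_state JSON and write it where
    # new_context(storage_state=...) expects to find it.
    client = boto3.client("secretsmanager")
    secret = client.get_secret_value(SecretId="scraper/auth-state")
    with open(path, "w") as f:
        f.write(secret["SecretString"])
    return path
```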
When NOT to persist
A few scenarios where fresh sessions are better:
- Sites that fingerprint session age. A "logged in 6 hours ago" cookie can be suspicious if your scraper bursts a thousand requests in two minutes.
- Rotating proxies with session-tied auth. If your auth is IP-bound (rare but exists), persisted sessions break when the proxy IP changes.
- A/B test buckets. Some sites assign you to a bucket on first visit; reusing that across runs may bias your scrape.
For most cases, persistence is a clear win. The exceptions are narrow.
Hands-on lab
Open `/account/login`, log in manually with `headless=False`, and save `auth.json`. Quit. Then run a separate script that uses `storage_state="auth.json"` to visit `/account/dashboard`; it should land already logged in. Finally, extract the cookies and use `requests` to hit `/account/orders` directly. Note the speed difference: browser-driven vs HTTP-with-stolen-cookies.