
Lesson 2.25 · Intermediate · 5 min read

Persistent Contexts and Browser Profiles

Save a logged-in session once, replay it forever. The pattern that turns five-minute auth flows into 50-millisecond cookie injections.

What you’ll learn

  • Distinguish `storage_state` (serialised cookies/storage) from persistent contexts (full profile on disk).
  • Capture a session from an interactive login and reuse it in headless production runs.
  • Bootstrap a session via Playwright, then transfer cookies to plain `requests`.
  • Choose the right strategy for short-lived vs long-lived auth tokens.

Authentication is expensive. A login flow involves typing credentials, sometimes a 2FA prompt, sometimes a CAPTCHA. Doing it on every scraper run wastes minutes and can get you flagged for too-frequent logins. The right move is to authenticate once, persist the session, and replay it until it expires. Playwright gives you two tools for this.

Two persistence shapes

  • What it stores — storage_state: cookies + localStorage as JSON; persistent context: the entire browser profile on disk (history, cache, extensions).
  • Lifecycle — storage_state is saved and loaded explicitly; a persistent context persists automatically between runs.
  • Size — tens of KB vs. hundreds of MB.
  • Use case — storage_state covers most scraping (small, portable, version-controllable); persistent contexts suit heavy automation that needs a full profile.
  • Sharing — storage_state is easy to share (a JSON file); a persistent context is hard, tied to a disk path.

storage_state is what you want for production scraping. Persistent contexts are heavier, useful for browser extensions, complex login state, or specific anti-bot evasion (some systems trust profiles that have history).

Saving storage_state

Run a one-time interactive script:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
  browser = p.chromium.launch(headless=False)
  context = browser.new_context()
  page = context.new_page()

  page.goto("https://practice.scrapingcentral.com/account/login")

  # Fill credentials manually OR programmatically
  page.locator("#email").fill("demo@example.com")
  page.locator("#password").fill("password")
  page.locator("button[type=submit]").click()

  page.wait_for_url("**/account/dashboard")

  # Save the state
  context.storage_state(path="auth.json")
  browser.close()

auth.json now contains every cookie and localStorage entry for the session, typically 5-50 KB of JSON. Check it into a private repo or store in a secrets manager.
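If you open the file, you'll find two top-level keys: cookies and origins (localStorage entries grouped per origin). A trimmed sketch of the shape — the values here are illustrative, not real credentials:

```json
{
  "cookies": [
    {
      "name": "sessionid",
      "value": "abc123",
      "domain": "practice.scrapingcentral.com",
      "path": "/",
      "expires": 1767225600,
      "httpOnly": true,
      "secure": true,
      "sameSite": "Lax"
    }
  ],
  "origins": [
    {
      "origin": "https://practice.scrapingcentral.com",
      "localStorage": [{ "name": "theme", "value": "dark" }]
    }
  ]
}
```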

Loading storage_state

In every production scraper:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
  browser = p.chromium.launch()
  context = browser.new_context(storage_state="auth.json")
  page = context.new_page()

  page.goto("https://practice.scrapingcentral.com/account/dashboard")
  # Already logged in, no auth flow needed.
  print(page.locator("h1").inner_text())

  browser.close()

storage_state is passed to new_context(). The context starts up with cookies pre-loaded; the first request to the site is already authenticated. No login flow, no credentials in your scraper code.

Persistent contexts

When you need more than cookies — extension state, full history, custom Chrome flags persisted — use launch_persistent_context:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
  context = p.chromium.launch_persistent_context(
    user_data_dir="./chrome-profile",
    headless=False,
  )
  page = context.new_page()
  page.goto("https://practice.scrapingcentral.com/account/dashboard")
  # If you logged in once in this profile, you're still logged in.
  context.close()

The profile directory accumulates everything Chrome would normally store in ~/.config/google-chrome/Default/. Subsequent launches against the same user_data_dir pick up exactly where you left off.

Caveat: a persistent context replaces the browser launch; there is no separate browser object. You get one context (multiple pages allowed) tied to the profile.

Transferring cookies to plain requests

The hybrid pattern from Lesson 2.3: log in with the browser, scrape the bulk with requests.

from playwright.sync_api import sync_playwright
import requests

# Step 1: get an authenticated context
with sync_playwright() as p:
  browser = p.chromium.launch()
  context = browser.new_context(storage_state="auth.json")
  cookies = context.cookies()
  browser.close()

# Step 2: feed cookies into a requests Session
session = requests.Session()
for c in cookies:
  session.cookies.set(c["name"], c["value"], domain=c["domain"], path=c["path"])

# Step 3: scrape at HTTP speed
r = session.get("https://practice.scrapingcentral.com/api/account/orders")
orders = r.json()
print(orders)

The saved auth state in auth.json seeds the browser context, and its cookies transfer cleanly to requests. From here, your scraper runs 10-50× faster than the equivalent browser-only version.
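The cookie-copy loop generalises to a small helper. A minimal sketch — the `cookies_to_session` name and the sample cookie are illustrative, not part of any library API:

```python
import requests

def cookies_to_session(cookies):
  """Copy Playwright-style cookie dicts into a fresh requests.Session."""
  session = requests.Session()
  for c in cookies:
    session.cookies.set(
      c["name"], c["value"],
      domain=c.get("domain", ""), path=c.get("path", "/"),
    )
  return session

# A cookie shaped like one entry from context.cookies()
sample = [{"name": "sessionid", "value": "abc123",
           "domain": "practice.scrapingcentral.com", "path": "/"}]
session = cookies_to_session(sample)
print(session.cookies.get("sessionid"))  # → abc123
```

Keeping the conversion in one function means the Playwright and requests halves of the scraper only share a plain list of dicts.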

Caveats:

  • HttpOnly cookies are usable here because we go through context.cookies(), not through JS.

  • Some APIs check Origin / Referer / User-Agent headers. Set them on the requests.Session to match the browser:

    session.headers.update({
        "User-Agent": "Mozilla/5.0 ...",
        "Origin": "https://practice.scrapingcentral.com",
        "Referer": "https://practice.scrapingcentral.com/account/dashboard",
    })
    

Detecting an expired session

Sessions expire. Your scraper needs to detect this and re-authenticate:

def scrape_with_retry(url):
  r = session.get(url)
  if r.status_code in {401, 403} or "Sign in" in r.text:
    print("Session expired, re-authenticating...")
    refresh_auth()
    r = session.get(url)
  return r

refresh_auth() re-runs the interactive flow (or, in a CI context, runs a headless login with credentials from a secrets manager) and updates auth.json. The next scraper run starts fresh.

For very long sessions, rotate proactively: re-authenticate every N hours regardless of whether the current session still works.
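A cheap way to implement that rotation is to treat the state file's age as the trigger. A minimal sketch — the 12-hour window and the `auth_is_stale` name are assumptions, tune both per site:

```python
import os
import time

MAX_AGE_HOURS = 12  # assumed rotation window

def auth_is_stale(path="auth.json", max_age_hours=MAX_AGE_HOURS):
  """True if the state file is missing or older than the rotation window."""
  if not os.path.exists(path):
    return True
  age_seconds = time.time() - os.path.getmtime(path)
  return age_seconds > max_age_hours * 3600

# Before scraping: refresh proactively instead of waiting for a 401.
if auth_is_stale():
  print("auth.json is stale, re-authenticating...")
```

Using the file's mtime keeps the check stateless: whichever process last ran the login flow resets the clock for everyone.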

Storage state for multiple accounts

Persist one state file per account:

# Assumes an already-launched `browser`, as in the earlier examples.
for account in ["account_a.json", "account_b.json", "account_c.json"]:
  context = browser.new_context(storage_state=account)
  page = context.new_page()
  # scrape as this account
  context.close()

Each context is isolated. You can also run them in parallel (Lesson 2.26).

Security note

auth.json is the equivalent of a username and password. Treat it like a secret:

  • Never commit to a public repo.
  • Encrypt at rest (e.g., via SOPS, AWS KMS, or environment-injected at runtime).
  • Rotate when leaked.

For team-shared scrapers, store the file in a secrets manager (AWS Secrets Manager, Vault, 1Password) and pull at runtime.
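One way to wire that up is to inject the JSON through an environment variable and materialise it to a temp file at startup. A sketch under the assumption that your secrets manager can export to env — the `AUTH_STATE_JSON` variable and the function name are made up for illustration:

```python
import json
import os
import tempfile

def storage_state_from_env(var="AUTH_STATE_JSON"):
  """Write storage_state JSON from an env var to a temp file, return its path."""
  raw = os.environ[var]
  json.loads(raw)  # fail fast if the secret is corrupt
  f = tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False)
  f.write(raw)
  f.close()
  return f.name

# Usage: context = browser.new_context(storage_state=storage_state_from_env())
```

This keeps auth.json off disk in your repo entirely; it only exists for the lifetime of the scraper process.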

When NOT to persist

A few scenarios where fresh sessions are better:

  • Sites that fingerprint session age. A "logged in 6 hours ago" cookie can be suspicious if your scraper bursts a thousand requests in two minutes.
  • Rotating proxies with session-tied auth. If your auth is IP-bound (rare but exists), persisted sessions break when the proxy IP changes.
  • A/B test buckets. Some sites assign you to a bucket on first visit; reusing that across runs may bias your scrape.

For most cases, persistence is a clear win. The exceptions are narrow.

Hands-on lab

Open /account/login, log in manually with headless=False, and save auth.json. Quit. Then run a separate script that uses storage_state="auth.json" to visit /account/dashboard; it should land already logged in. Finally, extract the cookies and use requests to hit /account/orders directly. Note the speed difference between the browser-driven run and HTTP-with-stolen-cookies.


Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.
