
Lesson 3.16 · Intermediate · 5 min read

Cookie-Based Session Replication

The oldest scraper auth pattern: log in, capture the session cookie, replay it. Still the most common in 2026, and full of subtle traps.

What you’ll learn

  • Execute a login flow programmatically and capture the session cookie.
  • Re-use the captured session across many requests with Session / CookieJar.
  • Detect session expiry and re-authenticate automatically.
  • Avoid the four classic cookie-handling bugs.

The simplest authenticated-scraping pattern: log in like a browser would, capture the Set-Cookie response, replay it on every subsequent request. Works against any site whose login flow doesn't add fingerprinting or CAPTCHAs.

It's also the pattern most riddled with subtle bugs.

The flow on Catalog108

/api/auth/login accepts a POST with {email, password} and sets a session cookie:

HTTP/1.1 200 OK
Set-Cookie: session=eyJ...; HttpOnly; Secure; Path=/; SameSite=Lax
Content-Type: application/json

{"access_token": "...", "user": {"email": "..."}}

After that, any request that includes the Cookie: session=eyJ... header is treated as logged in.

Python, requests.Session() handles it for you

import requests

s = requests.Session()

# 1. Log in (response sets the cookie)
r = s.post(
  "https://practice.scrapingcentral.com/api/auth/login",
  json={
    "email": "student@practice.scrapingcentral.com",
    "password": "practice123",
  },
)
r.raise_for_status()

# 2. Subsequent calls automatically include the cookie
r = s.get("https://practice.scrapingcentral.com/api/auth/me")
print(r.json())  # → {'email': '...', 'role': 'student'...}

r = s.get("https://practice.scrapingcentral.com/account/orders")
print(r.json())

The Session object stores cookies in s.cookies (a RequestsCookieJar). Print it to debug:

print(s.cookies.get_dict())  # → {'session': 'eyJ...'}

PHP, Guzzle CookieJar

use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;

$jar = new CookieJar();

$client = new Client([
  'base_uri' => 'https://practice.scrapingcentral.com',
  'cookies'  => $jar,
]);

$client->post('/api/auth/login', [
  'json' => [
    'email' => 'student@practice.scrapingcentral.com',
    'password' => 'practice123',
  ],
]);

$me = json_decode($client->get('/api/auth/me')->getBody()->getContents(), true);
print_r($me);

Guzzle's CookieJar plays the same role as the cookie jar inside requests.Session: it captures the Set-Cookie from the login response and attaches the cookie to every later request through the same client.

Persisting the session across runs

For long-running scrapers, you don't want to re-login on every script invocation. Serialize the cookies:

# Save after first login
import pickle
with open("session.pkl", "wb") as f:
  pickle.dump(s.cookies, f)

# Reload at the next run
with open("session.pkl", "rb") as f:
  s.cookies.update(pickle.load(f))
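Putting save and reload together, a small load-or-login helper keeps the re-login decision out of your main script. This is a sketch; `SESSION_FILE` and `login_fn` are names invented here, not part of any library:

```python
import os
import pickle

import requests

SESSION_FILE = "session.pkl"  # hypothetical path; pick your own

def get_session(login_fn):
  """Return a Session with cookies loaded from disk if a saved jar
  exists; otherwise call login_fn (any callable that performs the
  login POST on the session) and save the fresh cookies."""
  s = requests.Session()
  if os.path.exists(SESSION_FILE):
    with open(SESSION_FILE, "rb") as f:
      s.cookies.update(pickle.load(f))
  else:
    login_fn(s)
    with open(SESSION_FILE, "wb") as f:
      pickle.dump(s.cookies, f)
  return s
```

On the second run the file exists, so `login_fn` is never called; combine this with the expiry detection below so a stale saved jar triggers a fresh login.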

PHP equivalent: FileCookieJar:

use GuzzleHttp\Cookie\FileCookieJar;

$jar = new FileCookieJar(__DIR__ . '/cookies.json', true);
$client = new Client(['base_uri' => '...', 'cookies' => $jar]);

// Cookies are persisted automatically on script exit

Detecting session expiry

Sessions die. Cookie expiry, server-side timeout, a deploy that invalidates everyone. Your scraper must detect and re-login.

Pattern: wrap the call, check for 401 (or a redirect to /login), re-auth, retry once.

def authed_get(s, url, **kwargs):
  r = s.get(url, **kwargs)
  if r.status_code == 401 or "/login" in r.url:
    # Re-authenticate and retry once
    login(s)
    r = s.get(url, **kwargs)
  return r

def login(s):
  s.post(
    "https://practice.scrapingcentral.com/api/auth/login",
    json={"email": EMAIL, "password": PASSWORD},
  )
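To see the control flow without a live server, here is an offline sketch with a fake session object standing in for requests.Session (the Fake* classes are invented for illustration):

```python
class FakeResponse:
  def __init__(self, status_code, url):
    self.status_code = status_code
    self.url = url

class FakeSession:
  """Pretends every request is 401 until a login POST happens."""
  def __init__(self):
    self.authed = False

  def get(self, url, **kwargs):
    return FakeResponse(200 if self.authed else 401, url)

  def post(self, url, **kwargs):
    self.authed = True

def authed_get(s, url, **kwargs):
  # Same retry-once shape: call, re-auth on 401, call again
  r = s.get(url, **kwargs)
  if r.status_code == 401 or "/login" in r.url:
    s.post("/api/auth/login")
    r = s.get(url, **kwargs)
  return r

s = FakeSession()
r = authed_get(s, "/account/orders")
print(r.status_code)  # 200: first try hit 401, re-auth, retry succeeded
```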

Cookie attributes and what they mean

A typical Set-Cookie header:

Set-Cookie: session=eyJ...; Domain=.example.com; Path=/; Expires=Wed, 21 Oct 2025 07:28:00 GMT; HttpOnly; Secure; SameSite=Lax

Attributes:

  • Domain, the cookie applies to this domain and all subdomains. If absent, only the exact host that set it.
  • Path, only sent for requests under this path.
  • Expires / Max-Age, when the cookie dies. Session cookies (no Expires/Max-Age attribute) die when the browser closes; since your scraper never "closes", it has to decide their lifetime itself.
  • HttpOnly, JS can't read this cookie. Doesn't affect HTTP libraries; they still send it.
  • Secure, only sent over HTTPS.
  • SameSite, restricts cross-origin sending. Lax/Strict/None. Mostly irrelevant for scrapers.

HttpOnly is a frequent misconception: it doesn't block your scraper. It only blocks browser-side JS.
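You can poke at these attributes with the standard library. A sketch using http.cookies.SimpleCookie to parse a Set-Cookie value (the token value here is made up):

```python
from http.cookies import SimpleCookie

raw = ("session=eyJabc; Domain=.example.com; Path=/; "
       "HttpOnly; Secure; SameSite=Lax")

jar = SimpleCookie()
jar.load(raw)
morsel = jar["session"]

print(morsel.value)        # eyJabc
print(morsel["domain"])    # .example.com
print(morsel["samesite"])  # Lax
# HttpOnly parses as a flag: present, but only browsers enforce it
print(bool(morsel["httponly"]))  # True
```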

Four classic bugs

  1. Hard-coding a captured cookie. Works today, expires tomorrow. Always automate the login.

  2. Bare requests.get() instead of Session(). Each call is cookie-less. The session-after-login pattern only works through a Session/CookieJar.

  3. Two sessions in the same script. You create s1 = Session() for the login and s2 = Session() for the fetches; s2 never saw the Set-Cookie, so it has no cookies. Always reuse the same Session.

  4. Cookies set on a different domain than your fetch. Login on auth.example.com, scrape on api.example.com. If the Set-Cookie's Domain attribute is auth.example.com, the cookie won't travel to api.example.com. Check the Domain attribute; sometimes you need to manually copy the cookie or hit a /exchange endpoint.
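For bug #4, the manual copy can look like this with requests (the hosts here are illustrative; inspect the real Domain attribute in your own jar first):

```python
import requests

s = requests.Session()

# Suppose login left the cookie scoped to the auth host only:
s.cookies.set("session", "eyJabc", domain="auth.example.com", path="/")

# Requests to api.example.com won't carry it. Re-set the same value
# scoped to the API host so both domains get the cookie:
s.cookies.set("session", "eyJabc", domain="api.example.com", path="/")

print(sorted(s.cookies.list_domains()))
# ['api.example.com', 'auth.example.com']
```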

Combining with the client class

The cleanest pattern: bake login + auto-refresh into the API client.

class Catalog108Client:
  BASE_URL = "https://practice.scrapingcentral.com"

  def __init__(self, email: str, password: str):
    self.email, self.password = email, password
    self.s = requests.Session()
    self.s.headers.update({"Accept": "application/json"})

  def _ensure_authed(self):
    r = self.s.get(f"{self.BASE_URL}/api/auth/me")
    if r.status_code == 401:
      self._login()

  def _login(self):
    self.s.post(
      f"{self.BASE_URL}/api/auth/login",
      json={"email": self.email, "password": self.password},
    )

  def _call_with_retry(self, method, path, **kw):
    for attempt in range(2):
      r = self.s.request(method, f"{self.BASE_URL}{path}", **kw)
      if r.status_code == 401 and attempt == 0:
        self._login()
        continue
      r.raise_for_status()
      return r.json()

  def orders(self):
    return self._call_with_retry("GET", "/account/orders")

Now the caller writes client.orders() and never thinks about expiry.

When cookie auth isn't enough

Cookie auth stops being enough when you hit:

  • Sites with CSRF protection on POSTs. You also need an X-CSRF-Token (lesson 3.19).
  • Sites that fingerprint TLS or HTTP/2. Your cookie is valid but the request fingerprint is wrong (lesson 3.49, 3.50).
  • Sites that issue short-lived access tokens via JWT. Cookie auth gives way to JWT auth (lesson 3.17).

But for the long tail of regular SaaS dashboards, retail sites, and partner portals, cookie auth is enough. It's still the single most common pattern in 2026.

Hands-on lab

Log in to Catalog108 via /api/auth/login, then use the same session to GET /account/orders and /api/auth/me. Save the cookies to disk, restart your script, reload them, and confirm the session still works (until the cookie expires). Wrap the whole thing in a class that auto-logs-in on 401, so calls succeed regardless of session state.

