Cookie-Based Session Replication
The oldest scraper auth pattern: log in, capture the session cookie, replay it. Still the most common in 2026, and full of subtle traps.
What you’ll learn
- Execute a login flow programmatically and capture the session cookie.
- Re-use the captured session across many requests with Session / CookieJar.
- Detect session expiry and re-authenticate automatically.
- Avoid the four classic cookie-handling bugs.
The simplest authenticated-scraping pattern: log in like a browser would, capture the Set-Cookie response, replay it on every subsequent request. Works against any site whose login flow doesn't add fingerprinting or CAPTCHAs.
It's also the pattern most riddled with subtle bugs.
The flow on Catalog108
/api/auth/login accepts a POST with {email, password} and sets a session cookie:
HTTP/1.1 200 OK
Set-Cookie: session=eyJ...; HttpOnly; Secure; Path=/; SameSite=Lax
Content-Type: application/json
{"access_token": "...", "user": {"email": "..."}}
After that, any request that includes the Cookie: session=eyJ... header is treated as logged in.
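Before reaching for a library, it helps to see how little is actually in play. The stdlib can parse a Set-Cookie header like the one above; the session value here is a made-up placeholder, not a real token:

```python
from http.cookies import SimpleCookie

# A Set-Cookie header shaped like the login response's (value is fake)
raw = "session=eyJhbGciOiJIUzI1NiJ9; HttpOnly; Secure; Path=/; SameSite=Lax"

jar = SimpleCookie()
jar.load(raw)
morsel = jar["session"]

print(morsel.value)              # the opaque session value
print(morsel["path"])            # /
print(bool(morsel["httponly"]))  # True: flag is present, but it only binds browsers

# All that gets replayed on later requests is the bare name=value pair:
cookie_header = f"session={morsel.value}"
print(cookie_header)
```

Everything after the first `;` is metadata for the client; the server only ever sees `session=<value>` coming back.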
Python, requests.Session() handles it for you
import requests
s = requests.Session()
# 1. Log in (response sets the cookie)
r = s.post(
    "https://practice.scrapingcentral.com/api/auth/login",
    json={
        "email": "student@practice.scrapingcentral.com",
        "password": "practice123",
    },
)
r.raise_for_status()
# 2. Subsequent calls automatically include the cookie
r = s.get("https://practice.scrapingcentral.com/api/auth/me")
print(r.json()) # → {'email': '...', 'role': 'student'...}
r = s.get("https://practice.scrapingcentral.com/account/orders")
print(r.json())
The Session object stores cookies in s.cookies (a RequestsCookieJar). Print it to debug:
print(s.cookies.get_dict()) # → {'session': 'eyJ...'}
PHP, Guzzle CookieJar
use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;
$jar = new CookieJar();
$client = new Client([
    'base_uri' => 'https://practice.scrapingcentral.com',
    'cookies' => $jar,
]);
$client->post('/api/auth/login', [
    'json' => [
        'email' => 'student@practice.scrapingcentral.com',
        'password' => 'practice123',
    ],
]);
$me = json_decode($client->get('/api/auth/me')->getBody()->getContents(), true);
print_r($me);
The CookieJar is the equivalent abstraction.
Persisting the session across runs
For long-running scrapers, you don't want to re-login on every script invocation. Serialize the cookies:
# Save after first login
import pickle
with open("session.pkl", "wb") as f:
    pickle.dump(s.cookies, f)
# Reload at the next run
with open("session.pkl", "rb") as f:
    s.cookies.update(pickle.load(f))
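Pickle works, but the file is Python-only and unpickling executes arbitrary code, so never load a cookie file you didn't write yourself. A sketch of an alternative using the stdlib's LWPCookieJar, which writes a human-readable file; the cookie is hand-built here to stand in for one a real login would set, and the value is fake:

```python
import os
import tempfile
import time
from http.cookiejar import Cookie, LWPCookieJar

path = os.path.join(tempfile.gettempdir(), "catalog108_cookies.txt")

jar = LWPCookieJar(path)
# Hand-built stand-in for the session cookie the login response sets
jar.set_cookie(Cookie(
    version=0, name="session", value="eyJ-fake",
    port=None, port_specified=False,
    domain="practice.scrapingcentral.com", domain_specified=True,
    domain_initial_dot=False, path="/", path_specified=True,
    secure=True, expires=int(time.time()) + 3600, discard=False,
    comment=None, comment_url=None, rest={},
))
jar.save(ignore_discard=True)       # plain-text file you can inspect in an editor

reloaded = LWPCookieJar(path)
reloaded.load(ignore_discard=True)  # next run: restore the session
print([(c.name, c.value) for c in reloaded])  # → [('session', 'eyJ-fake')]
```

A requests Session can absorb any stdlib jar with `s.cookies.update(reloaded)`, so the two approaches are interchangeable.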
PHP equivalent: FileCookieJar:
use GuzzleHttp\Cookie\FileCookieJar;
$jar = new FileCookieJar(__DIR__ . '/cookies.json', true);
$client = new Client(['base_uri' => '...', 'cookies' => $jar]);
// Cookies are persisted automatically on script exit
Detecting session expiry
Sessions die. Cookie expiry, server-side timeout, a deploy that invalidates everyone. Your scraper must detect and re-login.
Pattern: wrap the call, check for 401 (or a redirect to /login), re-auth, retry once.
def authed_get(s, url, **kwargs):
    r = s.get(url, **kwargs)
    if r.status_code == 401 or "/login" in r.url:
        # Re-authenticate and retry once
        login(s)
        r = s.get(url, **kwargs)
    return r

def login(s):
    s.post(
        "https://practice.scrapingcentral.com/api/auth/login",
        json={"email": EMAIL, "password": PASSWORD},
    )
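You can exercise the retry-once pattern without touching the network by pointing it at a stub session. Everything below is hypothetical test scaffolding, not a real client; the stub pretends the first request hits an expired session:

```python
class StubSession:
    """Fake session: returns 401 until post() (the 'login') has been called."""
    def __init__(self):
        self.logged_in = False

    def post(self, url, **kw):  # stands in for the login request
        self.logged_in = True

    def get(self, url, **kw):
        class Response:
            pass
        r = Response()
        r.url = url
        r.status_code = 200 if self.logged_in else 401
        return r

def authed_get(s, url, **kwargs):
    r = s.get(url, **kwargs)
    if r.status_code == 401 or "/login" in r.url:
        s.post("/api/auth/login")  # re-authenticate
        r = s.get(url, **kwargs)   # retry once
    return r

s = StubSession()
r = authed_get(s, "https://practice.scrapingcentral.com/account/orders")
print(r.status_code)  # → 200: the 401 was absorbed by a transparent re-login
```

The same stub trick is worth keeping around as a unit test, so a refactor of the retry logic can't silently break re-authentication.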
Cookie attributes and what they mean
A typical Set-Cookie header:
Set-Cookie: session=eyJ...; Domain=.example.com; Path=/; Expires=Wed, 21 Oct 2025 07:28:00 GMT; HttpOnly; Secure; SameSite=Lax
Attributes:
- Domain: the cookie applies to this domain and all subdomains. If absent, only the exact host that set it.
- Path: only sent for requests under this path.
- Expires/Max-Age: when the cookie dies. Session cookies (no Expires/Max-Age) die when the browser closes; your scraper has to manage that.
- HttpOnly: JS can't read this cookie. Doesn't affect HTTP libraries; they still send it.
- Secure: only sent over HTTPS.
- SameSite: restricts cross-origin sending (Lax/Strict/None). Mostly irrelevant for scrapers.
HttpOnly is a frequent misconception: it doesn't block your scraper. It only blocks browser-side JS.
Four classic bugs
1. Hard-coding a captured cookie. Works today, expires tomorrow. Always automate the login.
2. Bare requests.get() instead of Session(). Each call is cookie-less. The session-after-login pattern only works through a Session/CookieJar.
3. Two sessions in the same script. You created s1 = Session() for login and s2 = Session() for fetches; s2 has no cookies. Always reuse the same Session.
4. Cookies set on a different domain than your fetch. Login on auth.example.com, scrape on api.example.com. If the Set-Cookie's Domain attribute is auth.example.com, the cookie won't travel to api.example.com. Check the Domain attribute; sometimes you need to manually copy the cookie or hit a /exchange endpoint.
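Bug 4 can be reproduced entirely offline with the stdlib cookie jar, which applies the same domain-matching rules as a browser. The example.com hosts are illustrative, and make_cookie is a local helper, not a library function:

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar

def make_cookie(name, value, domain):
    # Minimal hand-built cookie standing in for one captured at login
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )

jar = CookieJar()
jar.set_cookie(make_cookie("session", "abc", "auth.example.com"))

# Same host the cookie was scoped to: the header is attached
req_auth = urllib.request.Request("https://auth.example.com/me")
jar.add_cookie_header(req_auth)
print(req_auth.get_header("Cookie"))  # session=abc

# Sibling subdomain: domain doesn't match, no Cookie header at all
req_api = urllib.request.Request("https://api.example.com/orders")
jar.add_cookie_header(req_api)
print(req_api.get_header("Cookie"))   # None
```

When you see an authenticated request going out with no Cookie header, this silent domain mismatch is the first thing to check.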
Combining with the client class
The cleanest pattern: bake login + auto-refresh into the API client.
class Catalog108Client:
    BASE_URL = "https://practice.scrapingcentral.com"

    def __init__(self, email: str, password: str):
        self.email, self.password = email, password
        self.s = requests.Session()
        self.s.headers.update({"Accept": "application/json"})

    def _ensure_authed(self):
        # Optional eager check; _call_with_retry also recovers lazily on 401
        r = self.s.get(f"{self.BASE_URL}/api/auth/me")
        if r.status_code == 401:
            self._login()

    def _login(self):
        r = self.s.post(
            f"{self.BASE_URL}/api/auth/login",
            json={"email": self.email, "password": self.password},
        )
        r.raise_for_status()  # fail loudly on bad credentials

    def _call_with_retry(self, method, path, **kw):
        for attempt in range(2):
            r = self.s.request(method, f"{self.BASE_URL}{path}", **kw)
            if r.status_code == 401 and attempt == 0:
                self._login()  # session expired: re-auth, then retry once
                continue
            r.raise_for_status()
            return r.json()

    def orders(self):
        return self._call_with_retry("GET", "/account/orders")
Now the caller writes client.orders() and never thinks about expiry.
When cookie auth isn't enough
Cookie auth tops out when you hit:
- Sites with CSRF protection on POSTs: you also need an X-CSRF-Token header (lesson 3.19).
- Sites that fingerprint TLS or HTTP/2: your cookie is valid but the request fingerprint is wrong (lessons 3.49, 3.50).
- Sites that issue short-lived access tokens via JWT: cookie auth gives way to JWT auth (lesson 3.17).
But for the long tail of regular SaaS dashboards, retail sites, and partner portals, cookie auth is enough. It's still the single most common pattern in 2026.
Hands-on lab
Log in to Catalog108 via /api/auth/login, then use the same session to GET /account/orders and /api/auth/me. Save the cookies to disk, restart your script, reload them, and confirm the session still works (until the cookie expires). Wrap the whole thing in a class that auto-logs-in on 401, so every call succeeds regardless of session state.