
Lesson 1.4 · Beginner · 5 min read

Sessions, Cookies, and Persistent State

Use `requests.Session` to persist cookies, default headers, and connection pools across many requests. It's the right way to scrape any site that tracks state.

What you’ll learn

  • Understand what a cookie is and how the browser stores and sends them.
  • Use `requests.Session()` to maintain state across multiple requests.
  • Inspect, set, and clear cookies programmatically.
  • Recognise when a site requires a session cookie before serving real content.

HTTP is stateless. Every request stands on its own. So how do sites remember who's logged in, what's in your cart, or which A/B test bucket you're in? Cookies. And on the scraper side, the abstraction that makes them painless is the Session.

What a cookie is

A cookie is a small key/value pair the server hands you and you hand back on every subsequent request:

Server  →  Client:  Set-Cookie: session_id=abc123; Path=/; HttpOnly
Client  →  Server:  Cookie: session_id=abc123

The server writes the cookie via the Set-Cookie response header. The client (browser or scraper) stores it and sends it back via the Cookie request header on future requests to that domain. That's the entire mechanism.
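You can watch that round trip offline with the standard library's `http.cookies` module, which parses a Set-Cookie header the same way a client does (a sketch for illustration; `requests` does all of this for you automatically):

```python
from http.cookies import SimpleCookie

# Parse a Set-Cookie header exactly as a client would
raw = "session_id=abc123; Path=/; HttpOnly"
jar = SimpleCookie()
jar.load(raw)

morsel = jar["session_id"]
print(morsel.value)    # abc123
print(morsel["path"])  # /

# What the client sends back: just the name=value pair, no attributes
print(jar.output(attrs=[], header="Cookie:"))  # Cookie: session_id=abc123
```

Note that the attributes (Path, HttpOnly, and so on) stay on the client side; only the bare `name=value` pair travels back to the server.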

Common cookie attributes:

Attribute                        What it does
-------------------------------  ----------------------------------------------------------
Path=/                           Scope: only sent for URLs under this path
Domain=.example.com              Scope: sent to this domain and subdomains
Expires=... / Max-Age=...        When the cookie should be discarded
HttpOnly                         JavaScript can't read it (irrelevant for HTTP-level scrapers)
Secure                           Only sent over HTTPS
SameSite=Lax                     Controls cross-site sending

For scrapers, you mostly care about Path, Domain, and expiration. HttpOnly and SameSite are browser concerns; requests ignores them and sends every applicable cookie regardless.
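You can see domain and path scoping at work in requests' own cookie jar without touching the network (a sketch using `requests.cookies.RequestsCookieJar`; the names and domains are made up):

```python
from requests.cookies import RequestsCookieJar

jar = RequestsCookieJar()
# Same cookie name, two different scopes -- the jar keeps both
jar.set("token", "aaa", domain="example.com", path="/app")
jar.set("token", "bbb", domain="api.example.com", path="/")

# Retrieval can be narrowed by the same attributes
print(jar.get("token", domain="example.com", path="/app"))   # aaa
print(jar.get("token", domain="api.example.com", path="/"))  # bbb
```

When requests builds a request, it uses these attributes to decide which stored cookies apply to the target URL.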

The wrong way: passing cookies manually

Beginners often do this:

# Painful
import requests

r1 = requests.get(login_url)
cookies = r1.cookies
r2 = requests.post(login_url, data=creds, cookies=cookies)
cookies.update(r2.cookies)
r3 = requests.get(dashboard_url, cookies=cookies)

This works but it's error-prone. There's a better way.

The right way: requests.Session()

import requests

s = requests.Session()

# Cookies set by the server are automatically stored and sent
s.get("https://practice.scrapingcentral.com/")
s.post(
  "https://practice.scrapingcentral.com/account/login",
  data={"username": "student@practice.scrapingcentral.com", "password": "practice123"},
)
r = s.get("https://practice.scrapingcentral.com/dashboard")
print("Logged in?", "Welcome" in r.text)

A Session object does three things automatically:

  1. Persists cookies: every Set-Cookie is stored, and every subsequent request includes applicable cookies.
  2. Reuses TCP connections: connection pooling makes repeated requests to the same host much faster.
  3. Applies default headers: set headers once on the session and they apply to every request.

You should be using requests.Session() for nearly every scraper. The top-level requests.get(...) is fine for one-off scripts but suboptimal for anything that hits the same site more than once.

Default headers on a session

s = requests.Session()
s.headers.update({
  "User-Agent": "Mozilla/5.0 ...",
  "Accept-Language": "en-US,en;q=0.9",
})
# All future s.get/s.post inherit these headers automatically

This is dramatically cleaner than passing headers={...} on every call.
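Per-request headers merge with the session defaults, and on a conflict the per-request value wins. You can verify the merge without sending anything by preparing a request (a sketch; the URL is just a placeholder):

```python
import requests

s = requests.Session()
s.headers.update({
    "User-Agent": "demo/1.0",
    "Accept-Language": "en-US,en;q=0.9",
})

req = requests.Request(
    "GET",
    "https://practice.scrapingcentral.com/",
    headers={"Accept-Language": "de-DE"},  # per-request override
)
prepped = s.prepare_request(req)

print(prepped.headers["User-Agent"])       # demo/1.0 (session default kept)
print(prepped.headers["Accept-Language"])  # de-DE (override wins)
```

`prepare_request` is also a handy debugging tool: it shows you exactly which headers and cookies would go out, before anything hits the wire.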

Inspecting cookies

s = requests.Session()
s.get("https://practice.scrapingcentral.com/")
print(s.cookies)
for cookie in s.cookies:
  print(cookie.name, cookie.value, cookie.domain, cookie.path)

You can also access them as a dict:

print(s.cookies.get_dict())
# {'session_id': 'abc123', 'csrftoken': 'xyz'}

Setting cookies manually

Sometimes you have a session ID from elsewhere (e.g. you logged in via a browser, copied the cookie out of DevTools):

s = requests.Session()
s.cookies.set("session_id", "abc123", domain="practice.scrapingcentral.com")

Useful for one-off scripts where re-implementing the full login is more work than just borrowing a known-good session.
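If what you copied from DevTools is the whole Cookie header rather than a single value, a small loop can load it into the jar (a sketch; the cookie names, values, and domain are made up for illustration):

```python
import requests

raw = "session_id=abc123; csrftoken=xyz"  # pasted from DevTools
s = requests.Session()
for pair in raw.split("; "):
    name, _, value = pair.partition("=")
    s.cookies.set(name, value, domain="practice.scrapingcentral.com")

print(s.cookies.get_dict())  # {'session_id': 'abc123', 'csrftoken': 'xyz'}
```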

Clearing cookies

s.cookies.clear()  # all of them
s.cookies.clear(domain="example.com")  # just this domain

A site that requires a session cookie

The lab at /challenges/static/cookies/required rejects your request unless you've first visited the homepage to pick up a session cookie. The pattern looks like this:

s = requests.Session()

# 1. Visit the site root, server issues Set-Cookie: session_id=...
s.get("https://practice.scrapingcentral.com/")

# 2. Now the protected endpoint accepts you
r = s.get("https://practice.scrapingcentral.com/challenges/static/cookies/required")
print(r.status_code, r.text[:200])

If you skip step 1, step 2 fails. This is a common anti-bot tactic: serve a real page only to clients that have proven they can store and replay cookies. Most bare-bones bots don't bother. Sessions handle this for you for free.

Cookie persistence across runs

Session() cookies live in memory and vanish when the process exits. To persist across runs:

import pickle

# Save
with open("cookies.pkl", "wb") as f:
  pickle.dump(s.cookies, f)

# Load
with open("cookies.pkl", "rb") as f:
  s.cookies.update(pickle.load(f))

Or use http.cookiejar.MozillaCookieJar for a text format compatible with browser cookie files. For most scrapers, though, just log in fresh each run; it's simpler and avoids stale-session bugs.
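The MozillaCookieJar route looks like this (a sketch; it relies on the fact that `s.cookies` accepts any `http.cookiejar`-compatible jar, and writes `cookies.txt` to the current directory):

```python
import requests
from http.cookiejar import MozillaCookieJar

s = requests.Session()
s.cookies = MozillaCookieJar("cookies.txt")  # drop-in jar with file I/O

# ... log in, scrape; cookies accumulate in the jar ...

# ignore_discard=True also keeps session cookies that have no expiry
s.cookies.save(ignore_discard=True)

# Next run: load before making requests
s2 = requests.Session()
s2.cookies = MozillaCookieJar("cookies.txt")
s2.cookies.load(ignore_discard=True)
```

The resulting file is the classic Netscape cookies.txt format, which many browser extensions can export and import.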

Connection pooling: the silent speedup

A session reuses the underlying TCP/TLS connection for repeated requests to the same host:

import time, requests

# Without session, new connection each time
t0 = time.time()
for _ in range(20):
  requests.get("https://practice.scrapingcentral.com/")
print("Without session:", time.time() - t0)

# With session, pooled connection
t0 = time.time()
s = requests.Session()
for _ in range(20):
  s.get("https://practice.scrapingcentral.com/")
print("With session:", time.time() - t0)

The session version is typically 2-3x faster purely from skipping repeated TLS handshakes. For larger scrapers, this difference compounds dramatically.
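If you want to tune the pool, or add automatic retries, mount an `HTTPAdapter` on the session (a sketch; the pool sizes and retry count here are arbitrary):

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
adapter = HTTPAdapter(
    pool_connections=10,  # number of distinct hosts to keep pools for
    pool_maxsize=20,      # connections kept per host
    max_retries=3,        # retry failed connection attempts
)
s.mount("https://", adapter)
s.mount("http://", adapter)

# The session now routes every request through this adapter
print(s.get_adapter("https://practice.scrapingcentral.com/") is adapter)  # True
```

The defaults are fine for a single-threaded scraper; raising `pool_maxsize` mainly matters once you share a session across worker threads.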

Hands-on lab

Open /challenges/static/cookies/required directly (no homepage visit first). Note what you get. Now use a requests.Session() to first hit /, then hit /challenges/static/cookies/required. Confirm the difference: the second approach should succeed where the first failed. Print s.cookies.get_dict() to see what was carried between requests.


Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.


Which header does a server use to issue a new cookie to the client?
