
Lesson 1.4 · Beginner · 5 min read

Sessions, Cookies, and Persistent State

Use `requests.Session` to persist cookies, default headers, and connection pools across many requests. It's the right way to scrape any site that tracks state.

What you’ll learn

  • Understand what a cookie is and how the browser stores and sends them.
  • Use `requests.Session()` to maintain state across multiple requests.
  • Inspect, set, and clear cookies programmatically.
  • Recognise when a site requires a session cookie before serving real content.

HTTP is stateless. Every request stands on its own. So how do sites remember who's logged in, what's in your cart, or which A/B test bucket you're in? Cookies. And on the scraper side, the abstraction that makes them painless is the Session.

What a cookie is

A cookie is a small key/value pair the server hands you and you hand back on every subsequent request:

Server  →  Client:  Set-Cookie: session_id=abc123; Path=/; HttpOnly
Client  →  Server:  Cookie: session_id=abc123

The server writes the cookie via the Set-Cookie response header. The client (browser or scraper) stores it and sends it back via the Cookie request header on future requests to that domain. That's the entire mechanism.
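You can watch that round trip offline with the standard library's `http.cookies` module, which parses a Set-Cookie header the same way a client does (a sketch for illustration; `requests` does all of this for you automatically):

```python
from http.cookies import SimpleCookie

# Parse a Set-Cookie header exactly as a client would
raw = "session_id=abc123; Path=/; HttpOnly"
jar = SimpleCookie()
jar.load(raw)

morsel = jar["session_id"]
print(morsel.value)    # abc123
print(morsel["path"])  # /

# What the client sends back: just the name=value pair, no attributes
print(jar.output(attrs=[], header="Cookie:"))  # Cookie: session_id=abc123
```

Note that the attributes (Path, HttpOnly, and so on) stay on the client side; only the bare `name=value` pair travels back to the server.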

Common cookie attributes:

Attribute                        What it does
-------------------------------  ----------------------------------------------------------
Path=/                           Scope: only sent for URLs under this path
Domain=.example.com              Scope: sent to this domain and subdomains
Expires=... / Max-Age=...        When the cookie should be discarded
HttpOnly                         JavaScript can't read it (irrelevant for HTTP-level scrapers)
Secure                           Only sent over HTTPS
SameSite=Lax                     Controls cross-site sending

For scrapers, you mostly care about Path, Domain, and expiration. HttpOnly and SameSite are browser concerns; requests ignores them and sends every applicable cookie regardless.
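You can see domain and path scoping at work in requests' own cookie jar without touching the network (a sketch using `requests.cookies.RequestsCookieJar`; the names and domains are made up):

```python
from requests.cookies import RequestsCookieJar

jar = RequestsCookieJar()
# Same cookie name, two different scopes -- the jar keeps both
jar.set("token", "aaa", domain="example.com", path="/app")
jar.set("token", "bbb", domain="api.example.com", path="/")

# Retrieval can be narrowed by the same attributes
print(jar.get("token", domain="example.com", path="/app"))   # aaa
print(jar.get("token", domain="api.example.com", path="/"))  # bbb
```

When requests builds a request, it uses these attributes to decide which stored cookies apply to the target URL.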

The wrong way: passing cookies manually

Beginners often do this:

# Painful
import requests

r1 = requests.get(login_url)
cookies = r1.cookies
r2 = requests.post(login_url, data=creds, cookies=cookies)
cookies.update(r2.cookies)
r3 = requests.get(dashboard_url, cookies=cookies)

This works but it's error-prone. There's a better way.

The right way: requests.Session()

import requests

s = requests.Session()

# Cookies set by the server are automatically stored and sent
s.get("https://practice.scrapingcentral.com/")
s.post(
  "https://practice.scrapingcentral.com/account/login",
  data={"username": "student@practice.scrapingcentral.com", "password": "practice123"},
)
r = s.get("https://practice.scrapingcentral.com/dashboard")
print("Logged in?", "Welcome" in r.text)

A Session object does three things automatically:

  1. Persists cookies: every Set-Cookie is stored, and every subsequent request includes applicable cookies.
  2. Reuses TCP connections: connection pooling makes repeated requests to the same host much faster.
  3. Applies default headers: set headers once on the session and they apply to every request.

You should be using requests.Session() for nearly every scraper. The top-level requests.get(...) is fine for one-off scripts but suboptimal for anything that hits the same site more than once.

Default headers on a session

s = requests.Session()
s.headers.update({
  "User-Agent": "Mozilla/5.0 ...",
  "Accept-Language": "en-US,en;q=0.9",
})
# All future s.get/s.post inherit these headers automatically

This is dramatically cleaner than passing headers={...} on every call.
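Per-request headers merge with the session defaults, and on a conflict the per-request value wins. You can verify the merge without sending anything by preparing a request (a sketch; the URL is just a placeholder):

```python
import requests

s = requests.Session()
s.headers.update({
    "User-Agent": "demo/1.0",
    "Accept-Language": "en-US,en;q=0.9",
})

req = requests.Request(
    "GET",
    "https://practice.scrapingcentral.com/",
    headers={"Accept-Language": "de-DE"},  # per-request override
)
prepped = s.prepare_request(req)

print(prepped.headers["User-Agent"])       # demo/1.0 (session default kept)
print(prepped.headers["Accept-Language"])  # de-DE (override wins)
```

`prepare_request` is also a handy debugging tool: it shows you exactly which headers and cookies would go out, before anything hits the wire.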

Inspecting cookies

s = requests.Session()
s.get("https://practice.scrapingcentral.com/")
print(s.cookies)
for cookie in s.cookies:
  print(cookie.name, cookie.value, cookie.domain, cookie.path)

You can also access them as a dict:

print(s.cookies.get_dict())
# {'session_id': 'abc123', 'csrftoken': 'xyz'}

Setting cookies manually

Sometimes you have a session ID from elsewhere (e.g. you logged in via a browser, copied the cookie out of DevTools):

s = requests.Session()
s.cookies.set("session_id", "abc123", domain="practice.scrapingcentral.com")

Useful for one-off scripts where re-implementing the full login is more work than just borrowing a known-good session.
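If what you copied from DevTools is the whole Cookie header rather than a single value, a small loop can load it into the jar (a sketch; the cookie names, values, and domain are made up for illustration):

```python
import requests

raw = "session_id=abc123; csrftoken=xyz"  # pasted from DevTools
s = requests.Session()
for pair in raw.split("; "):
    name, _, value = pair.partition("=")
    s.cookies.set(name, value, domain="practice.scrapingcentral.com")

print(s.cookies.get_dict())  # {'session_id': 'abc123', 'csrftoken': 'xyz'}
```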

Clearing cookies

s.cookies.clear()  # all of them
s.cookies.clear(domain="example.com")  # just this domain

A site that requires a session cookie

The lab at /challenges/static/cookies/required rejects your request unless you've first visited the homepage to pick up a session cookie. The pattern looks like this:

s = requests.Session()

# 1. Visit the site root, server issues Set-Cookie: session_id=...
s.get("https://practice.scrapingcentral.com/")

# 2. Now the protected endpoint accepts you
r = s.get("https://practice.scrapingcentral.com/challenges/static/cookies/required")
print(r.status_code, r.text[:200])

If you skip step 1, step 2 fails. This is a common anti-bot tactic: serve a real page only to clients that have proven they can store and replay cookies. Most bare-bones bots don't bother. Sessions handle this for you for free.

Cookie persistence across runs

Session() cookies live in memory and vanish when the process exits. To persist across runs:

import pickle

# Save
with open("cookies.pkl", "wb") as f:
  pickle.dump(s.cookies, f)

# Load
with open("cookies.pkl", "rb") as f:
  s.cookies.update(pickle.load(f))

Or use http.cookiejar.MozillaCookieJar for a text format compatible with browser cookie files. For most scrapers, though, just log in fresh each run; it's simpler and avoids stale-session bugs.
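The MozillaCookieJar route looks like this (a sketch; it relies on the fact that `s.cookies` accepts any `http.cookiejar`-compatible jar, and writes `cookies.txt` to the current directory):

```python
import requests
from http.cookiejar import MozillaCookieJar

s = requests.Session()
s.cookies = MozillaCookieJar("cookies.txt")  # drop-in jar with file I/O

# ... log in, scrape; cookies accumulate in the jar ...

# ignore_discard=True also keeps session cookies that have no expiry
s.cookies.save(ignore_discard=True)

# Next run: load before making requests
s2 = requests.Session()
s2.cookies = MozillaCookieJar("cookies.txt")
s2.cookies.load(ignore_discard=True)
```

The resulting file is the classic Netscape cookies.txt format, which many browser extensions can export and import.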

Connection pooling: the silent speedup

A session reuses the underlying TCP/TLS connection for repeated requests to the same host:

import time, requests

# Without session, new connection each time
t0 = time.time()
for _ in range(20):
  requests.get("https://practice.scrapingcentral.com/")
print("Without session:", time.time() - t0)

# With session, pooled connection
t0 = time.time()
s = requests.Session()
for _ in range(20):
  s.get("https://practice.scrapingcentral.com/")
print("With session:", time.time() - t0)

The session version is typically 2-3x faster purely from skipping repeated TLS handshakes. For larger scrapers, this difference compounds dramatically.
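If you want to tune the pool, or add automatic retries, mount an `HTTPAdapter` on the session (a sketch; the pool sizes and retry count here are arbitrary):

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
adapter = HTTPAdapter(
    pool_connections=10,  # number of distinct hosts to keep pools for
    pool_maxsize=20,      # connections kept per host
    max_retries=3,        # retry failed connection attempts
)
s.mount("https://", adapter)
s.mount("http://", adapter)

# The session now routes every request through this adapter
print(s.get_adapter("https://practice.scrapingcentral.com/") is adapter)  # True
```

The defaults are fine for a single-threaded scraper; raising `pool_maxsize` mainly matters once you share a session across worker threads.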

Hands-on lab

Open /challenges/static/cookies/required directly (no homepage visit first). Note what you get. Now use a requests.Session() to first hit /, then hit /challenges/static/cookies/required. Confirm the difference: the second approach should succeed where the first failed. Print s.cookies.get_dict() to see what was carried between requests.


Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.


Which header does a server use to issue a new cookie to the client?
