Scraping APIs That Require Cookies

Learn how to handle cookie-based authentication and session management when scraping APIs that rely on browser cookies.

API Scraping · #11 · Intermediate · 3 min read

Many websites use cookies to track sessions, enforce authentication, and apply anti-bot measures. If you call their API without the right cookies, you typically get 403 errors or empty responses.

How Cookie-Based APIs Work

  1. You visit the website, and it sets initial cookies (session ID, CSRF token).
  2. You log in, and the server updates the cookies to mark the session as authenticated.
  3. In a browser, every API call includes these cookies automatically.
  4. Your scraper must replicate this cookie flow.
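
The flow above can be sketched offline with Python's standard library. The Set-Cookie values here are made-up examples, and cookie_header_from_set_cookie is a hypothetical helper: it ignores attributes such as Path, Domain, and expiry, which a real client also honors.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_headers):
    """Build the Cookie request header a client would send back.

    Later Set-Cookie headers overwrite earlier cookies with the same name.
    """
    jar = {}
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        for name, morsel in parsed.items():
            jar[name] = morsel.value
    return "; ".join(f"{name}={value}" for name, value in jar.items())


# Hypothetical Set-Cookie headers: first visit, then login
initial = ["session_id=abc123; Path=/; HttpOnly", "csrf_token=x9y8z7; Path=/"]
after_login = ["session_id=def456; Path=/; HttpOnly"]

header = cookie_header_from_set_cookie(initial + after_login)
print(header)  # prints: session_id=def456; csrf_token=x9y8z7
```

The login response replaces session_id while csrf_token carries over, which is the bookkeeping a session object does for you.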

Using Sessions to Manage Cookies

The requests.Session object automatically stores and sends cookies:

import requests

session = requests.Session()

# Step 1: Visit the homepage to get initial cookies
session.get("https://quotes.toscrape.com/", timeout=15)
print(f"Cookies after homepage: {dict(session.cookies)}")

# Step 2: Log in; the session captures the auth cookies
login_response = session.post(
    "https://quotes.toscrape.com/login",
    data={"username": "admin", "password": "admin"},
    timeout=15,
)
print(f"Cookies after login: {dict(session.cookies)}")

# Step 3: Access protected pages with the session
response = session.get("https://quotes.toscrape.com/", timeout=15)
print(f"Logged in: {'Logout' in response.text}")

Extracting Cookies from Your Browser

When automating the login is impractical (2FA, CAPTCHA), log in manually in your browser and copy the cookies into your scraper:

import requests

# Copy cookies from DevTools > Application > Cookies
cookies = {
    "session_id": "abc123def456",
    "auth_token": "eyJhbGciOi...",
    "_csrf": "x9y8z7w6",
}

session = requests.Session()
session.cookies.update(cookies)
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
})

response = session.get(
    "https://www.example.com/api/dashboard",
    timeout=15,
)
print(response.json())
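
DevTools also shows the full Cookie request header in the Network tab, and copying that one string can be quicker than copying each value. Below is a minimal sketch of turning such a string into the dict used above. parse_cookie_header is a hypothetical helper, not part of requests, and the sample values are the placeholders from the example above.

```python
def parse_cookie_header(raw_header):
    """Split a raw 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for part in raw_header.split(";"):
        # Split on the first '=' only, since values may contain '=' themselves
        name, separator, value = part.strip().partition("=")
        if separator and name:
            cookies[name] = value
    return cookies


# Placeholder header string, as copied from a request in the Network tab
raw = "session_id=abc123def456; auth_token=eyJhbGciOi...; _csrf=x9y8z7w6"
cookies = parse_cookie_header(raw)
print(cookies)
```

The resulting dict can be passed to session.cookies.update(cookies) exactly as in the example above.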

Using browser_cookie3 to Auto-Extract Cookies

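browser_cookie3 reads cookies directly from your browser's local cookie store, so you don't have to copy them by hand. Install it first:

pip install browser-cookie3

import browser_cookie3
import requests

# Grab cookies from Chrome for a specific domain
cookies = browser_cookie3.chrome(domain_name=".example.com")

session = requests.Session()
# Merge the browser cookies into the session's own cookie jar
session.cookies.update(cookies)

response = session.get("https://www.example.com/api/profile", timeout=15)
print(response.json())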

Handling CSRF Tokens

Some APIs require a CSRF token from the HTML page to be sent with each request:

import requests
from bs4 import BeautifulSoup

session = requests.Session()

# Fetch the page to get the CSRF token
page = session.get("https://www.example.com/login", timeout=15)
soup = BeautifulSoup(page.text, "html.parser")
csrf_token = soup.find("input", {"name": "csrf_token"})["value"]

# Include CSRF token in the login request
session.post(
    "https://www.example.com/login",
    data={
        "username": "user",
        "password": "pass",
        "csrf_token": csrf_token,
    },
    timeout=15,
)

# Now API calls work with proper session cookies
data = session.get("https://www.example.com/api/orders", timeout=15)
print(data.json())
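
The token lookup above raises an unhelpful TypeError if the field is missing or has a different name. Here is a sketch of a more defensive extraction, using the standard library's html.parser instead of BeautifulSoup so that it runs without extra dependencies. extract_csrf_token and HiddenInputParser are hypothetical names, and the field name csrf_token is an assumption carried over from the example above; check the real form for the name it uses.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect the name/value pairs of hidden <input> fields."""

    def __init__(self):
        super().__init__()
        self.inputs = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name and attributes.get("type") == "hidden":
            self.inputs[name] = attributes.get("value") or ""


def extract_csrf_token(html, field_name="csrf_token"):
    """Return the hidden field's value, or raise a descriptive error."""
    parser = HiddenInputParser()
    parser.feed(html)
    if field_name not in parser.inputs:
        raise ValueError(f"No hidden input named {field_name!r} found")
    return parser.inputs[field_name]


# Made-up login form for illustration
sample_html = """
<form method="post" action="/login">
  <input type="hidden" name="csrf_token" value="x9y8z7w6"/>
  <input type="text" name="username">
</form>
"""
print(extract_csrf_token(sample_html))  # prints: x9y8z7w6
```

Failing loudly here makes it clear that the page layout changed, rather than letting a later login request fail for no obvious reason.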

Cookie Troubleshooting

Problem           | Solution
403 after login   | Check whether the CSRF token or Referer header is missing
Cookies expire    | Re-authenticate periodically
HttpOnly cookies  | Use a Session object; HttpOnly only blocks JavaScript access in a browser, so the session stores and sends these cookies normally
SameSite cookies  | Ensure the Referer and Origin headers match the site's domain
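
The "re-authenticate periodically" advice can also be applied on demand: retry once after logging in again when a response looks like an expired session. The sketch below assumes that a 401 or 403 status signals expiry, which varies by site. fetch_with_reauth and its fetch and login callables are hypothetical names, and the fake functions stand in for real session calls so the example runs without a network.

```python
from types import SimpleNamespace


def fetch_with_reauth(fetch, login, expired_statuses=(401, 403)):
    """Call fetch(); if the status suggests an expired session, log in and retry once."""
    response = fetch()
    if response.status_code in expired_statuses:
        login()
        response = fetch()
    return response


# Stand-ins for a real session: the "server" rejects calls until login happens
state = {"logged_in": False, "logins": 0}


def fake_login():
    state["logged_in"] = True
    state["logins"] += 1


def fake_fetch():
    return SimpleNamespace(status_code=200 if state["logged_in"] else 403)


result = fetch_with_reauth(fake_fetch, fake_login)
print(result.status_code, state["logins"])  # prints: 200 1
```

With a real session, fetch could be something like lambda: session.get(url, timeout=15), and login a function that repeats the login POST shown earlier.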

When dealing with complex cookie flows behind Cloudflare or similar protections, ScrapingAnt handles the full browser session including cookies, JavaScript execution, and CAPTCHA solving.

Next Steps

  • Explore APIs with Postman for easier debugging
  • Handle token-based auth alongside cookies
  • Build a persistent session manager for long-running scrapers
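
As a starting point for the persistent session manager, here is a minimal sketch that saves cookies to a JSON file and loads them on the next run. save_cookies and load_cookies are hypothetical helpers, and the cookie value is a placeholder. Storing cookies as a plain dict is an assumption that drops the domain, path, and expiry information.

```python
import json
import os
import tempfile


def save_cookies(cookies, path):
    """Write a name -> value cookie dict to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(cookies, handle)


def load_cookies(path):
    """Read cookies back; return an empty dict if nothing was saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# A temporary location for the demo; a real scraper would use a fixed path
cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.json")
save_cookies({"session_id": "abc123def456"}, cookie_file)
restored = load_cookies(cookie_file)
print(restored)
```

With requests, requests.utils.dict_from_cookiejar(session.cookies) produces a dict you can pass to save_cookies, and session.cookies.update(load_cookies(path)) restores it into a new session.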