Web Scraping Headers and Cookies - Complete Guide
Master HTTP headers and cookies for web scraping. Learn which headers to set, how to manage cookies, and how to avoid detection.
Proper HTTP headers and cookie management are fundamental to successful web scraping. They determine whether a website treats your scraper as a legitimate browser or blocks it as a bot.
Essential HTTP Headers
User-Agent
The most important header. If you don't set it, requests identifies itself as python-requests/<version>, which many sites block on sight.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
}
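To make the difference concrete, here is a minimal sketch that compares the identifier requests would send on its own with an overridden one. It only prepares the request and never sends it, so no network access is needed. The URL is a placeholder and the variable names are illustrative.

```python
import requests

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
)

# What requests calls itself when no User-Agent is set,
# in the form "python-requests/<version>".
default_ua = requests.utils.default_user_agent()

# Prepare (but do not send) a request through a session so we can
# inspect the headers that would go on the wire.
session = requests.Session()
request = requests.Request(
    "GET", "https://example.com", headers={"User-Agent": BROWSER_UA}
)
prepared = session.prepare_request(request)

print("default: ", default_ua)
print("override:", prepared.headers["User-Agent"])
```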
Full Browser-Like Headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    # Keep "br" only if the brotli package is installed; without it requests cannot decode Brotli responses
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "DNT": "1",
}
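One way to apply a header set like this is to install it once as session defaults and add request-specific headers on top. The sketch below assumes a shortened header set and a hypothetical Referer value; it prepares the request without sending it so the merged result can be inspected.

```python
import requests

session = requests.Session()

# Session-level defaults: sent with every request made through this session.
session.headers.update({
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
})

# Per-request headers are merged on top of the session defaults.
request = requests.Request(
    "GET",
    "https://example.com/data",
    headers={"Referer": "https://example.com/"},  # illustrative value
)
prepared = session.prepare_request(request)

# Print the final header set that would be sent.
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

Setting the defaults on the session keeps individual calls short and makes it harder to forget a header on one request.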
Rotating User Agents
Rotate between several realistic user agents so that all of your traffic is not tied to a single identifier.
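import random

# Use complete user-agent strings; truncated ones are themselves a detection signal
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
]

# headers is the dict defined in the previous section
headers["User-Agent"] = random.choice(user_agents)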
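Mutating a shared headers dict can leak one request's choice into the next. A small helper that returns a fresh dict per request avoids that. This is a minimal sketch: build_headers, BASE_HEADERS, and the rng parameter are hypothetical names introduced here for illustration.

```python
import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
]

# Headers that stay the same regardless of which user agent is picked.
BASE_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_headers(rng=random):
    """Return a fresh header dict with a randomly chosen User-Agent."""
    headers = dict(BASE_HEADERS)  # copy so the shared base is never mutated
    headers["User-Agent"] = rng.choice(USER_AGENTS)
    return headers


if __name__ == "__main__":
    for _ in range(3):
        print(build_headers()["User-Agent"])
```

Passing a seeded random.Random instance as rng makes the selection reproducible, which helps when debugging a block that only happens with one particular user agent.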
Cookie Management
Using Sessions
import requests
session = requests.Session()
# First request sets cookies
session.get("https://example.com")
# Subsequent requests include cookies automatically
resp = session.get("https://example.com/data")
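A session only keeps its cookies for the lifetime of the process. If a scraper needs to resume later, one possible approach is to write the cookie names and values to disk and load them into a new session. The sketch below uses the helper functions in requests.utils; save_cookies, load_cookies, and the file path are hypothetical names. Note that this simple form keeps only names and values, so domain, path, and expiry information is lost.

```python
import json

import requests
from requests.utils import cookiejar_from_dict, dict_from_cookiejar


def save_cookies(session, path):
    """Write the session's cookie names and values to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(dict_from_cookiejar(session.cookies), fh)


def load_cookies(session, path):
    """Load cookie names and values from a JSON file into the session."""
    with open(path, encoding="utf-8") as fh:
        session.cookies.update(cookiejar_from_dict(json.load(fh)))


if __name__ == "__main__":
    first = requests.Session()
    first.cookies.set("session_id", "abc123")  # stands in for a server-set cookie
    save_cookies(first, "cookies.json")

    second = requests.Session()
    load_cookies(second, "cookies.json")
    print(second.cookies.get_dict())
```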
Manual Cookie Handling
cookies = {
    "session_id": "abc123",
    "consent": "accepted",
}
resp = requests.get("https://example.com", cookies=cookies)
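It can help to see what the cookies dict turns into on the wire. The sketch below prepares the same request without sending it and reads back the resulting Cookie header. The cookie values are the placeholders used above.

```python
import requests

cookies = {
    "session_id": "abc123",
    "consent": "accepted",
}

# Prepare the request without sending it so the headers can be inspected.
prepared = requests.Request(
    "GET", "https://example.com", cookies=cookies
).prepare()

# The dict is serialized into a single Cookie header of name=value pairs.
cookie_header = prepared.headers["Cookie"]
print(cookie_header)
```

Cookies passed this way apply to that one request only. They are not stored on a session, so use session.cookies when they should persist across requests.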
Headers That Get You Blocked
| Header Issue | Risk Level | Effect |
|---|---|---|
| No User-Agent | Critical | Instant block |
| No Accept-Language | Medium | Flagged as suspicious |
| Wrong Referer | Medium | May get redirected |
| No Sec-Fetch headers | Low-Medium | Detected by advanced systems |
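The checks in the table can be turned into a quick pre-flight audit of a headers dict. This is a rough sketch: audit_headers is a hypothetical helper, the risk labels simply mirror the table, and a wrong Referer is left out because it cannot be judged without knowing the target site.

```python
def audit_headers(headers):
    """Return (header, risk) pairs for the gaps listed in the table above."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    present = {name.lower() for name in headers}
    findings = []
    if "user-agent" not in present:
        findings.append(("User-Agent", "critical"))
    if "accept-language" not in present:
        findings.append(("Accept-Language", "medium"))
    if not any(name.startswith("sec-fetch-") for name in present):
        findings.append(("Sec-Fetch-*", "low-medium"))
    return findings


if __name__ == "__main__":
    candidate = {"User-Agent": "Mozilla/5.0", "Accept": "text/html"}
    for header, risk in audit_headers(candidate):
        print(f"missing {header} (risk: {risk})")
```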
The Easy Way: Let ScraperAPI Handle It
ScraperAPI automatically sets appropriate headers, manages cookies, and rotates fingerprints.
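# No need to manage headers or cookies manually
API_KEY = "YOUR_API_KEY"  # placeholder: replace with your own key
resp = requests.get(
    "http://api.scraperapi.com",
    # params= URL-encodes the target URL so its own query string survives
    params={"api_key": API_KEY, "url": "https://example.com"},
)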
Best Practices
- Always set a User-Agent: the bare minimum for any scraper
- Use requests.Session(): it maintains cookies across requests automatically
- Match header order: some sites check header ordering
- Use ScrapingAnt or ScraperAPI to automate header management
- Check response codes: a 403 often means your headers are wrong
- Copy headers from your browser: use DevTools to see exactly which headers your browser sends
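The advice to check response codes can be sketched as a small retry loop that builds fresh headers whenever a request looks blocked. This is an illustration under assumptions, not a definitive implementation: fetch_with_retry, header_factory, and the delay values are hypothetical, and treating 429 as a block signal alongside 403 is an addition not covered above.

```python
import time

# Status codes treated as "blocked". 403 comes from the list above;
# 429 (Too Many Requests) is an added assumption.
BLOCK_STATUSES = {403, 429}


def fetch_with_retry(session, url, header_factory, max_attempts=3, delay=1.0):
    """Call session.get until the status is not in BLOCK_STATUSES.

    header_factory is any callable returning a headers dict, so each
    attempt can use a different header set. Returns the last response.
    """
    response = None
    for attempt in range(max_attempts):
        response = session.get(url, headers=header_factory(), timeout=10)
        if response.status_code not in BLOCK_STATUSES:
            break
        if attempt < max_attempts - 1:
            # Wait a little longer after each blocked attempt.
            time.sleep(delay * (attempt + 1))
    return response
```

With requests, session would be a requests.Session() and header_factory a function that returns a newly built headers dict. The caller should still inspect the returned status code, since the last attempt may also have been blocked.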