
Handling API Authentication (API Keys, OAuth, Bearer Tokens)

Learn how to handle API keys, Bearer tokens, OAuth flows, and cookie-based auth when scraping protected APIs.


Most production APIs require some form of authentication. Understanding each method lets you access data that unauthenticated requests cannot reach.

API Key in Query Parameters

The simplest approach passes the key as a URL query parameter:

import requests

api_key = "your_api_key_here"
url = "https://api.openweathermap.org/data/2.5/weather"
params = {
    "q": "London",
    "appid": api_key,
    "units": "metric"
}

response = requests.get(url, params=params, timeout=15)
response.raise_for_status()  # fail fast on 401 (bad key) and other HTTP errors
data = response.json()
print(f"Temperature: {data['main']['temp']}°C")
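One caveat with query-string keys is that the full URL, key included, tends to end up in logs and error messages. The sketch below masks the key before a URL is logged. It uses only the standard library; the helper name `redact_query_param` and the placeholder text are illustrative, and `appid` is simply the parameter name from the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_query_param(url, param="appid", placeholder="REDACTED"):
    """Return url with the value of `param` replaced, for safe logging."""
    parts = urlsplit(url)
    # keep_blank_values so parameters such as "?flag=" survive the round trip
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    cleaned = [(k, placeholder if k == param else v) for k, v in pairs]
    return urlunsplit(parts._replace(query=urlencode(cleaned)))


logged_url = redact_query_param(
    "https://api.openweathermap.org/data/2.5/weather?q=London&appid=abc123&units=metric"
)
print(logged_url)
```

In practice you would pass `response.url` through a helper like this before writing it to a log.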

API Key in Headers

Many APIs expect the key in a custom header:

import requests

headers = {
    "X-API-Key": "your_api_key_here",
    "Accept": "application/json",
}

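response = requests.get(
    "https://api.example.com/v1/products",
    headers=headers,
    timeout=15
)
response.raise_for_status()  # a 401 or 403 here usually means a bad or missing key
print(response.json())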
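Hard-coding the key in the script makes it easy to leak through version control. A minimal sketch of reading it from the environment instead, assuming a hypothetical variable name `EXAMPLE_API_KEY` and helper name `build_api_headers` (neither comes from any particular API):

```python
import os


def build_api_headers(env_var="EXAMPLE_API_KEY"):
    """Build request headers with the API key read from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"X-API-Key": api_key, "Accept": "application/json"}
```

The result can be passed straight to `requests.get(url, headers=build_api_headers(), timeout=15)`.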

Bearer Token Authentication

OAuth 2.0 APIs typically expect a Bearer token, often a JWT, in the Authorization header:

import requests

token = "eyJhbGciOiJIUzI1NiIsInR5..."

session = requests.Session()
session.headers.update({
    "Authorization": f"Bearer {token}",
    "Accept": "application/json",
})

response = session.get("https://api.example.com/me", timeout=15)
response.raise_for_status()  # 401 means the token is missing, invalid, or expired
profile = response.json()
print(profile["email"])
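An alternative to setting the header by hand is a small auth class built on `requests.auth.AuthBase`, which requests calls for every outgoing request. The class name `BearerAuth` and the token value are placeholders for illustration; the request is only prepared, not sent, so the header can be inspected.

```python
import requests
from requests.auth import AuthBase


class BearerAuth(AuthBase):
    """Attach a Bearer token to each request this auth object is used with."""

    def __init__(self, token):
        self.token = token

    def __call__(self, request):
        request.headers["Authorization"] = f"Bearer {self.token}"
        return request


# Prepare (but do not send) a request to inspect the resulting header.
prepared = requests.Request(
    "GET", "https://api.example.com/me", auth=BearerAuth("example-token")
).prepare()
print(prepared.headers["Authorization"])
```

Assigning `session.auth = BearerAuth(token)` applies it to every request made through that session.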

Login Flow to Obtain a Token

When you need to log in first, capture the token from the login response:

import requests

session = requests.Session()

# Step 1: Log in
login_data = {"username": "user@example.com", "password": "s3cret"}
auth_response = session.post(
    "https://api.example.com/auth/login",
    json=login_data,
    timeout=15
)
auth_response.raise_for_status()
token = auth_response.json()["access_token"]

# Step 2: Use the token for subsequent requests
session.headers["Authorization"] = f"Bearer {token}"

response = session.get("https://api.example.com/dashboard", timeout=15)
response.raise_for_status()
print(response.json())

Handling Token Expiry

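Access tokens usually expire after a fixed lifetime. Wrap your requests in logic that re-authenticates when the API returns 401 Unauthorized, then retries once: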

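def make_request(session, url, refresh_url, credentials):
    """GET url, re-authenticating and retrying once if the token has expired."""
    response = session.get(url, timeout=15)
    if response.status_code == 401:
        # Token expired or rejected: request a new one
        auth = session.post(refresh_url, json=credentials, timeout=15)
        auth.raise_for_status()  # stop here if re-authentication itself fails
        session.headers["Authorization"] = f"Bearer {auth.json()['access_token']}"
        # Retry the original request once with the fresh token
        response = session.get(url, timeout=15)
    response.raise_for_status()
    return response.json()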
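Reacting to a 401 costs one failed request per expiry. A complementary approach is to refresh proactively, shortly before the token runs out. The sketch below is one way to do that under stated assumptions: `TokenManager` is a hypothetical class, and `fetch_token` is a callable you supply that performs the login and returns the token plus its lifetime in seconds (for example from an `expires_in` field, if the API provides one). A fake fetcher and clock stand in for the network here.

```python
import time


class TokenManager:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, skew=30, clock=time.monotonic):
        self._fetch_token = fetch_token  # returns (token, lifetime_seconds)
        self._skew = skew                # refresh this many seconds early
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, fetching a new one when needed."""
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            self._token, lifetime = self._fetch_token()
            self._expires_at = self._clock() + lifetime
        return self._token


# Demo with a fake fetcher and a controllable clock (no network involved)
calls = []
now = [0.0]


def fake_fetch():
    calls.append(1)
    return f"token-{len(calls)}", 3600


manager = TokenManager(fake_fetch, clock=lambda: now[0])
first = manager.get()    # no token yet, so this fetches one
second = manager.get()   # still fresh, served from cache
now[0] = 3590            # within 30 seconds of expiry
third = manager.get()    # refreshed early
print(first, second, third)
```

Before each request you would set `session.headers["Authorization"] = f"Bearer {manager.get()}"`, keeping the 401 retry as a fallback for tokens revoked early.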

Authentication Methods Summary

| Method | Where | Example |
| --- | --- | --- |
| API Key (param) | URL query string | ?api_key=abc123 |
| API Key (header) | Custom header | X-API-Key: abc123 |
| Bearer Token | Authorization header | Bearer eyJ... |
| Basic Auth | Authorization header | Basic base64(user:pass) |
| Cookie | Cookie header | session=abc123 |
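The last two rows of the summary are not demonstrated elsewhere in this article, so here is a minimal sketch of what those header values look like on the wire. The helper names are illustrative and the code uses only the standard library.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value for HTTP Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def cookie_header(cookies):
    """Serialize a dict of cookies into a Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


print(basic_auth_header("user", "pass"))     # Basic dXNlcjpwYXNz
print(cookie_header({"session": "abc123"}))  # session=abc123
```

With requests you rarely need to build these by hand: `requests.get(url, auth=("user", "pass"), timeout=15)` sends Basic credentials, and `cookies={"session": "abc123"}` sends the cookie.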

For APIs behind Cloudflare or similar protections, ScraperAPI and ScrapingAnt can proxy your authenticated requests while handling CAPTCHAs and IP rotation.

Next Steps

  • Scrape paginated API endpoints
  • Handle rate limiting gracefully
  • Work with cookies and session tokens