Handling API Authentication (API Keys, OAuth, Bearer Tokens)
Learn how to handle API keys, Bearer tokens, OAuth flows, and cookie-based auth when scraping protected APIs.
Most production APIs require some form of authentication. Understanding each method lets you access data that unauthenticated requests cannot reach.
API Key in Query Parameters
The simplest approach passes the key as a URL query parameter:
import requests
api_key = "your_api_key_here"
url = "https://api.openweathermap.org/data/2.5/weather"
params = {
    "q": "London",
    "appid": api_key,
    "units": "metric",
}
response = requests.get(url, params=params, timeout=15)
response.raise_for_status()  # stop here on 401 (bad key) or other HTTP errors
data = response.json()
print(f"Temperature: {data['main']['temp']} °C")
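Because the key travels in the URL, it can end up in logs, proxy records, and error reports. The following is a minimal sketch of a helper that masks the key before a URL is logged. `redact_query_param` is a hypothetical name for illustration, not part of `requests`, and it relies only on the standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_query_param(url: str, name: str, mask: str = "REDACTED") -> str:
    """Return url with the value of query parameter `name` replaced by `mask`."""
    parts = urlsplit(url)
    # keep_blank_values so parameters such as "?flag=" survive the round trip
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    safe_pairs = [(key, mask if key == name else value) for key, value in pairs]
    return urlunsplit(parts._replace(query=urlencode(safe_pairs)))


full_url = (
    "https://api.openweathermap.org/data/2.5/weather"
    "?q=London&appid=your_api_key_here&units=metric"
)
print(redact_query_param(full_url, "appid"))
```

With `requests`, the final URL of a call is available as `response.url`, so a log line could pass that value through the helper first.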
API Key in Headers
Many APIs expect the key in a custom header:
import requests
headers = {
    "X-API-Key": "your_api_key_here",
    "Accept": "application/json",
}
response = requests.get(
    "https://api.example.com/v1/products",
    headers=headers,
    timeout=15,
)
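Hardcoding the key in the script makes it easy to leak through version control. One common alternative is to read it from an environment variable, sketched below. The variable name `EXAMPLE_API_KEY` and the function `build_api_key_headers` are assumptions for illustration, and the header name should match whatever your API documents.

```python
import os


def build_api_key_headers(env_var: str = "EXAMPLE_API_KEY") -> dict:
    """Build request headers using an API key read from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"X-API-Key": api_key, "Accept": "application/json"}
```

The returned dictionary can be passed as `headers=` to `requests.get` in place of the literal dictionary shown above.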
Bearer Token Authentication
OAuth 2.0 APIs typically expect a Bearer token, often a JWT, in the Authorization header:
import requests
token = "eyJhbGciOiJIUzI1NiIsInR5..."
session = requests.Session()
session.headers.update({
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
})
response = session.get("https://api.example.com/me", timeout=15)
profile = response.json()
print(profile["email"])
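When the Bearer token is a JWT, its middle segment is base64url-encoded JSON, so you can inspect claims such as `exp` (expiry, as a Unix timestamp) without calling the API. The sketch below only reads the payload and does not verify the signature, so it is suitable for debugging, not for trusting the contents. Not every Bearer token is a JWT, and `decode_jwt_payload` and the sample token built here are illustrative assumptions.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    segments = token.split(".")
    if len(segments) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload_b64 = segments[1]
    # base64url segments drop their padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def _b64url(data: dict) -> str:
    """Encode a dict the way JWT segments are encoded (for the demo only)."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Build a throwaway, unsigned sample token purely to exercise the decoder
sample_token = ".".join([
    _b64url({"alg": "HS256", "typ": "JWT"}),
    _b64url({"sub": "42", "exp": 1700000000}),
    "signature",
])
print(decode_jwt_payload(sample_token)["exp"])  # 1700000000
```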
Login Flow to Obtain a Token
When you need to log in first, capture the token from the login response:
import requests
session = requests.Session()
# Step 1: Log in
login_data = {"username": "user@example.com", "password": "s3cret"}
auth_response = session.post(
    "https://api.example.com/auth/login",
    json=login_data,
    timeout=15,
)
auth_response.raise_for_status()
token = auth_response.json()["access_token"]
# Step 2: Use the token for subsequent requests
session.headers["Authorization"] = f"Bearer {token}"
data = session.get("https://api.example.com/dashboard", timeout=15)
print(data.json())
Handling Token Expiry
Tokens often expire. Wrap your requests with refresh logic:
def make_request(session, url, refresh_url, credentials):
    """GET url, re-authenticating once if the server answers 401."""
    response = session.get(url, timeout=15)
    if response.status_code == 401:
        # Token expired or rejected: request a new one, then retry once
        auth = session.post(refresh_url, json=credentials, timeout=15)
        auth.raise_for_status()
        token = auth.json()["access_token"]
        session.headers["Authorization"] = f"Bearer {token}"
        response = session.get(url, timeout=15)
    response.raise_for_status()
    return response.json()
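The 401-based retry reacts only after a request has failed. If the login response also reports a token lifetime, you can refresh slightly early instead. The sketch below assumes the response body carries an `expires_in` field in seconds, as OAuth 2.0 token responses commonly do; the example API here may use a different field or none at all. `Token` and `token_from_login` are hypothetical names.

```python
import time
from dataclasses import dataclass


@dataclass
class Token:
    value: str
    expires_at: float  # Unix timestamp; float("inf") if no lifetime was given

    def is_expired(self, now=None, leeway: float = 30.0) -> bool:
        """True if the token has expired or will within `leeway` seconds."""
        current = time.time() if now is None else now
        return current >= self.expires_at - leeway


def token_from_login(payload: dict, now=None) -> Token:
    """Build a Token from a decoded login response body."""
    issued_at = time.time() if now is None else now
    lifetime = payload.get("expires_in")  # assumed field name, in seconds
    if lifetime is None:
        expires_at = float("inf")  # unknown lifetime: rely on 401 handling
    else:
        expires_at = issued_at + float(lifetime)
    return Token(value=payload["access_token"], expires_at=expires_at)
```

Before each request, a caller could check `token.is_expired()` and log in again when it returns `True`, keeping the 401 retry as a fallback for tokens revoked early.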
Authentication Methods Summary
| Method | Where | Example |
|---|---|---|
| API Key (param) | URL query string | ?api_key=abc123 |
| API Key (header) | Custom header | X-API-Key: abc123 |
| Bearer Token | Authorization header | Bearer eyJ... |
| Basic Auth | Authorization header | Basic base64(user:pass) |
| Cookie | Cookie header | session=abc123 |
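The Basic Auth row is the only header value in the table that has to be computed. A minimal sketch of how it is built, using only the standard library (`basic_auth_header` is a hypothetical helper name):

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value for an HTTP Basic `Authorization` header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


headers = {"Authorization": basic_auth_header("user", "pass")}
print(headers["Authorization"])  # Basic dXNlcjpwYXNz
```

In practice `requests` builds this header for you when you pass `auth=("user", "pass")` to a request. Base64 is an encoding, not encryption, so Basic Auth should only be sent over HTTPS.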
For APIs behind Cloudflare or similar protections, ScraperAPI and ScrapingAnt can proxy your authenticated requests while handling CAPTCHAs and IP rotation.
Next Steps
- Scrape paginated API endpoints
- Handle rate limiting gracefully
- Work with cookies and session tokens