OAuth 2.0 Flows for Scrapers
OAuth isn't a single flow; it's a family: Authorization Code, Client Credentials, Refresh Token. Here's which one applies to your scraper and how to execute it end-to-end.
What you’ll learn
- Distinguish the OAuth 2.0 flows by use case.
- Execute Client Credentials (machine-to-machine) and Authorization Code flows.
- Refresh access tokens with a refresh_token grant.
- Pick the right flow for a given target.
OAuth 2.0 is everywhere: Google, Spotify, Reddit, partner APIs, internal admin APIs. As a scraper, you'll meet two flows most often:
- Client Credentials (machine-to-machine): your scraper has a `client_id` + `client_secret` and swaps them for an access token. No user involved.
- Authorization Code (user-delegated): a user grants your app access via a browser redirect, and the scraper redeems a code for tokens.

A third, the Refresh Token grant, applies to both: it renews expired access tokens.
This lesson walks all three against Catalog108's `/oauth/authorize` and `/oauth/token` endpoints.
The Catalog108 OAuth setup
For the lab:
- `client_id=catalog108-demo-client`
- `client_secret=demo-secret`
- `redirect_uri=https://practice.scrapingcentral.com/challenges/api/auth/oauth2`
- Authorize URL: `/oauth/authorize`
- Token URL: `/oauth/token`
Real-world creds are issued by visiting the target's "Developer" or "Apps" page and registering an application.
Flow 1, Client Credentials (machine-to-machine)
Your scraper IS the client. Simplest flow, no user, no browser.
```python
import requests

r = requests.post(
    "https://practice.scrapingcentral.com/oauth/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "catalog108-demo-client",
        "client_secret": "demo-secret",
        "scope": "read:products",
    },
)
r.raise_for_status()
data = r.json()
# {'access_token': '...', 'token_type': 'Bearer', 'expires_in': 3600}
access = data["access_token"]

# Use it on calls
r = requests.get(
    "https://practice.scrapingcentral.com/api/products",
    headers={"Authorization": f"Bearer {access}"},
)
```
Notice:

- The POST is form-encoded (`data=`, not `json=`). OAuth 2 mandates `application/x-www-form-urlencoded`.
- `grant_type=client_credentials` tells the server "I'm a machine, here are my keys."
- `scope` is the requested permissions. Servers may reject if you ask for too much.
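You can see the form encoding for yourself without touching the network: `requests` lets you prepare a request locally and inspect the exact bytes it would send.

```python
import requests

# Build the token request but don't send it; .prepare() shows the wire format.
req = requests.Request(
    "POST",
    "https://practice.scrapingcentral.com/oauth/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "catalog108-demo-client",
    },
).prepare()

print(req.headers["Content-Type"])
# application/x-www-form-urlencoded
print(req.body)
# grant_type=client_credentials&client_id=catalog108-demo-client
```

Swap `data=` for `json=` and the Content-Type becomes `application/json`, which most OAuth token endpoints will reject.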
This is the flow for B2B integrations; Stripe, Twilio, and SendGrid all support it.
Flow 2, Authorization Code (user-delegated)
A user has data on the target site. Your scraper wants access. The user must consent via a browser.
The flow:

1. Send the user to the authorize URL with `client_id`, `redirect_uri`, `scope`, and a random `state`.
2. The user logs in and clicks "Allow."
3. The target redirects to `redirect_uri?code=...&state=...`.
4. Your scraper POSTs the `code` to the token endpoint and gets back `access_token` + `refresh_token`.
```python
import requests, secrets, webbrowser
from urllib.parse import urlencode

client_id = "catalog108-demo-client"
client_secret = "demo-secret"
redirect_uri = "https://practice.scrapingcentral.com/challenges/api/auth/oauth2"
state = secrets.token_urlsafe(16)

# Step 1: send the user to the authorize URL
authorize_url = "https://practice.scrapingcentral.com/oauth/authorize?" + urlencode({
    "response_type": "code",
    "client_id": client_id,
    "redirect_uri": redirect_uri,
    "scope": "read:profile read:products",
    "state": state,
})
print("Visit this URL in your browser:")
print(authorize_url)
# webbrowser.open(authorize_url)  # uncomment to auto-open

# Step 2: after the user authorizes, copy the `code` from the redirect URL
code = input("Paste the `code` parameter from the redirect: ").strip()

# Step 3: exchange the code for tokens
r = requests.post(
    "https://practice.scrapingcentral.com/oauth/token",
    data={
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    },
)
r.raise_for_status()
data = r.json()
print(data)
# {'access_token': '...', 'refresh_token': '...', 'expires_in': 3600, ...}
```
The `state` parameter prevents CSRF on the redirect: the server echoes it back, and you check it matches what you sent.
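That check can be a few lines of standard-library URL parsing. This is a sketch; `extract_code` is a helper name invented here, not part of any library.

```python
import secrets
from urllib.parse import urlparse, parse_qs

sent_state = secrets.token_urlsafe(16)  # the state you put in the authorize URL

def extract_code(redirect_url, expected_state):
    """Refuse to use the code unless the echoed state matches what we sent."""
    params = parse_qs(urlparse(redirect_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible CSRF, discard this code")
    return params["code"][0]

code = extract_code(
    f"https://practice.scrapingcentral.com/challenges/api/auth/oauth2"
    f"?code=abc123&state={sent_state}",
    sent_state,
)
```

If the states differ, someone may have injected their own authorization code into your flow; dropping it is the only safe move.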
For unattended scrapers, you do the auth dance once manually, save the refresh token, and from then on use the Refresh grant.
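Saving the refresh token can be as simple as a JSON file written atomically, so a crash mid-write never loses it. A sketch, assuming a local `tokens.json` path (a real deployment should use a proper secrets store):

```python
import json, os, tempfile

TOKEN_FILE = "tokens.json"  # hypothetical path for this example

def save_tokens(tokens, path=TOKEN_FILE):
    """Write the token response atomically: write to a temp file, then rename."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(tokens, f)
    os.replace(tmp, path)  # atomic on POSIX and Windows

def load_refresh_token(path=TOKEN_FILE):
    """Return the stored refresh token, or None on first run."""
    try:
        with open(path) as f:
            return json.load(f).get("refresh_token")
    except FileNotFoundError:
        return None
```

On startup the scraper calls `load_refresh_token()`; if it gets `None`, it falls back to the one-time manual Authorization Code dance.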
Flow 3, Refresh Token grant
When the access token expires:
```python
r = requests.post(
    "https://practice.scrapingcentral.com/oauth/token",
    data={
        "grant_type": "refresh_token",
        "refresh_token": stored_refresh_token,
        "client_id": "catalog108-demo-client",
        "client_secret": "demo-secret",
    },
)
r.raise_for_status()
new_tokens = r.json()
```
Some servers rotate refresh tokens; always store whatever comes back.
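A small helper makes that rule explicit: take the new access token unconditionally, and take the new refresh token only if the server actually sent one (the `merge_token_response` name is invented here):

```python
def merge_token_response(stored, response):
    """Merge a token-endpoint response into the stored token set.

    Rotating servers include a fresh refresh_token; non-rotating
    servers omit it, so we keep the old one in that case.
    """
    updated = dict(stored)
    updated["access_token"] = response["access_token"]
    if "refresh_token" in response:
        updated["refresh_token"] = response["refresh_token"]
    return updated

# Rotating server: the new refresh token replaces the old one
rotated = merge_token_response(
    {"access_token": "old", "refresh_token": "rt-1"},
    {"access_token": "new", "refresh_token": "rt-2"},
)

# Non-rotating server: the stored refresh token survives
kept = merge_token_response(
    {"access_token": "old", "refresh_token": "rt-1"},
    {"access_token": "new"},
)
```

Persist the merged dict immediately; a rotated refresh token you never saved is an account you have to re-authorize by hand.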
PHP version (Client Credentials)
```php
require 'vendor/autoload.php';

use GuzzleHttp\Client;

$client = new Client(['base_uri' => 'https://practice.scrapingcentral.com']);
$res = $client->post('/oauth/token', [
    'form_params' => [
        'grant_type' => 'client_credentials',
        'client_id' => 'catalog108-demo-client',
        'client_secret' => 'demo-secret',
        'scope' => 'read:products',
    ],
]);
$data = json_decode($res->getBody()->getContents(), true);
$access = $data['access_token'];

$products = json_decode(
    $client->get('/api/products', [
        'headers' => ['Authorization' => "Bearer $access"],
    ])->getBody()->getContents(),
    true
);
```
The other flows (for completeness)
You'll rarely scrape with these, but recognize them:
- Implicit flow, deprecated. Browser-side SPAs used to redirect with `#access_token=...` in the URL fragment. Replaced by Authorization Code + PKCE.
- Resource Owner Password Credentials: the user gives their password to your app, which trades it for tokens. Heavily discouraged, and dropped in OAuth 2.1.
- Device Code, for input-constrained devices (TVs, CLIs). Sometimes useful for scraper auth on headless servers.
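The Device Code flow's second half is a polling loop against the token endpoint. Here is an outline, assuming you've already obtained a `device_code` and wrapped the token-endpoint POST in a `poll` callable; the error strings follow RFC 8628, but treat this as a sketch, not a drop-in client.

```python
import time

def poll_for_token(poll, interval=5, timeout=300):
    """Poll until the user approves on their other device.

    `poll` is any callable returning the token endpoint's JSON as a dict.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = poll()
        if "access_token" in resp:
            return resp
        error = resp.get("error")
        if error == "authorization_pending":
            time.sleep(interval)           # user hasn't approved yet; keep waiting
        elif error == "slow_down":
            interval += 5                  # RFC 8628: back off by 5 seconds
            time.sleep(interval)
        else:
            raise RuntimeError(f"device flow failed: {resp}")
    raise TimeoutError("user never approved the device code")

# A real poll callable would look something like:
# poll = lambda: requests.post(token_url, data={
#     "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
#     "device_code": device_code,
#     "client_id": client_id,
# }).json()
```

Injecting `poll` as a callable keeps the loop testable without a live server.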
If you see PKCE (Proof Key for Code Exchange) in the docs: it's an extension to Authorization Code that adds a `code_verifier` / `code_challenge` pair. It's required for public clients (no `client_secret`). The mechanics are easy: generate a random verifier, hash it for the challenge, send the challenge in step 1, and send the verifier in step 3.
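Generating the pair is a few lines with the S256 method from RFC 7636 (the helper name here is ours):

```python
import base64, hashlib, secrets

def make_pkce_pair():
    """Return (code_verifier, code_challenge) using the S256 method."""
    verifier = secrets.token_urlsafe(64)  # ~86 chars, within the 43-128 allowed
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # base64url-encode the hash and strip the '=' padding, per RFC 7636
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge

verifier, challenge = make_pkce_pair()
# Step 1: add code_challenge=<challenge>&code_challenge_method=S256 to /oauth/authorize
# Step 3: add code_verifier=<verifier> to the /oauth/token exchange
```

The server hashes the verifier you send in step 3 and checks it against the challenge from step 1, proving the same client performed both steps.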
When to use what
| Situation | Flow |
|---|---|
| Scraping your own org's API with a service account | Client Credentials |
| Scraping a user's data with their permission | Authorization Code (+ Refresh) |
| Public client (no client_secret possible) | Authorization Code + PKCE |
| Renewing an expired token | Refresh Token |
| Anything else | Read the docs |
A reusable token manager
Cache the token, refresh proactively:
```python
import time, requests

class OAuthTokenManager:
    def __init__(self, token_url, client_id, client_secret, scope=None):
        self.token_url, self.client_id, self.client_secret = token_url, client_id, client_secret
        self.scope = scope
        self.access = None
        self.expires_at = 0

    def get(self):
        if self.access and time.time() < self.expires_at - 30:
            return self.access
        self._fetch()
        return self.access

    def _fetch(self):
        r = requests.post(self.token_url, data={
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret,
            **({"scope": self.scope} if self.scope else {}),
        })
        r.raise_for_status()
        d = r.json()
        self.access = d["access_token"]
        self.expires_at = time.time() + d.get("expires_in", 3600)

tokens = OAuthTokenManager(
    "https://practice.scrapingcentral.com/oauth/token",
    "catalog108-demo-client", "demo-secret",
    scope="read:products",
)

# Use freely; refresh happens transparently
r = requests.get(
    "https://practice.scrapingcentral.com/api/products",
    headers={"Authorization": f"Bearer {tokens.get()}"},
)
```
Hands-on lab
Catalog108's `/challenges/api/auth/oauth2` walks both flows. Execute Client Credentials end-to-end: post to `/oauth/token`, get an `access_token`, and use it on `/api/products`. Then do Authorization Code: visit `/oauth/authorize` in a browser, grab the `code` from the redirect, and exchange it at `/oauth/token`. Notice the difference in scopes returned. Wrap Client Credentials in the `OAuthTokenManager` class so refresh is automatic.