OAuth 2.0 Flows for Scrapers
OAuth isn't a single flow; it's a family: Authorization Code, Client Credentials, Refresh Token. Here's which one applies to your scraper and how to execute it end-to-end.
What you’ll learn
- Distinguish the OAuth 2.0 flows by use case.
- Execute Client Credentials (machine-to-machine) and Authorization Code flows.
- Refresh access tokens with a refresh_token grant.
- Pick the right flow for a given target.
OAuth 2.0 is everywhere: Google, Spotify, Reddit, partner APIs, internal admin APIs. As a scraper, you'll meet two flows most often:
- Client Credentials (machine-to-machine): your scraper has a `client_id` + `client_secret` and swaps them for an access token. No user involved.
- Authorization Code (user-delegated): a user grants your app access via a browser redirect, and the scraper redeems a code for tokens.

A third, the Refresh Token grant, applies to both: it renews expired access tokens.
This lesson walks all three against Catalog108's `/oauth/authorize` and `/oauth/token` endpoints.
The Catalog108 OAuth setup
For the lab:
- `client_id=catalog108-demo-client`
- `client_secret=demo-secret`
- `redirect_uri=https://practice.scrapingcentral.com/challenges/api/auth/oauth2`
- Authorize URL: `/oauth/authorize`
- Token URL: `/oauth/token`
Real-world creds are issued by visiting the target's "Developer" or "Apps" page and registering an application.
Flow 1, Client Credentials (machine-to-machine)
Your scraper IS the client. Simplest flow, no user, no browser.
```python
import requests

r = requests.post(
    "https://practice.scrapingcentral.com/oauth/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "catalog108-demo-client",
        "client_secret": "demo-secret",
        "scope": "read:products",
    },
)
r.raise_for_status()
data = r.json()
# {'access_token': '...', 'token_type': 'Bearer', 'expires_in': 3600}
access = data["access_token"]

# Use it on calls
r = requests.get(
    "https://practice.scrapingcentral.com/api/products",
    headers={"Authorization": f"Bearer {access}"},
)
```
Notice:

- The POST is form-encoded (`data=`, not `json=`). OAuth 2 mandates `application/x-www-form-urlencoded`.
- `grant_type=client_credentials` tells the server "I'm a machine, here are my keys."
- `scope` is the requested permissions. Servers may reject if you ask for too much.
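You can see the form encoding for yourself without touching the network: `requests` lets you prepare a request locally and inspect the exact bytes it would send.

```python
import requests

# Build the token request but don't send it; .prepare() shows the wire format.
req = requests.Request(
    "POST",
    "https://practice.scrapingcentral.com/oauth/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "catalog108-demo-client",
    },
).prepare()

print(req.headers["Content-Type"])
# application/x-www-form-urlencoded
print(req.body)
# grant_type=client_credentials&client_id=catalog108-demo-client
```

Swap `data=` for `json=` and the Content-Type becomes `application/json`, which most OAuth token endpoints will reject.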
This is the flow for B2B integrations; Stripe, Twilio, and SendGrid all support it.
Flow 2, Authorization Code (user-delegated)
A user has data on the target site. Your scraper wants access. The user must consent via a browser.
The flow:

1. Send the user to the authorize URL with `client_id`, `redirect_uri`, `scope`, and a random `state`.
2. The user logs in and clicks "Allow."
3. The target redirects to `redirect_uri?code=...&state=...`.
4. Your scraper POSTs the `code` to the token endpoint and gets back `access_token` + `refresh_token`.
```python
import requests, secrets, webbrowser
from urllib.parse import urlencode

client_id = "catalog108-demo-client"
client_secret = "demo-secret"
redirect_uri = "https://practice.scrapingcentral.com/challenges/api/auth/oauth2"
state = secrets.token_urlsafe(16)

# Step 1: send the user to the authorize URL
authorize_url = "https://practice.scrapingcentral.com/oauth/authorize?" + urlencode({
    "response_type": "code",
    "client_id": client_id,
    "redirect_uri": redirect_uri,
    "scope": "read:profile read:products",
    "state": state,
})
print("Visit this URL in your browser:")
print(authorize_url)
# webbrowser.open(authorize_url)  # uncomment to auto-open

# Step 2: after the user authorizes, copy the `code` from the redirect URL
code = input("Paste the `code` parameter from the redirect: ").strip()

# Step 3: exchange the code for tokens
r = requests.post(
    "https://practice.scrapingcentral.com/oauth/token",
    data={
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    },
)
r.raise_for_status()
data = r.json()
print(data)
# {'access_token': '...', 'refresh_token': '...', 'expires_in': 3600, ...}
```
The `state` parameter prevents CSRF on the redirect: the server echoes it back, and you check it matches what you sent.
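That check can be a few lines of standard-library URL parsing. This is a sketch; `extract_code` is a helper name invented here, not part of any library.

```python
import secrets
from urllib.parse import urlparse, parse_qs

sent_state = secrets.token_urlsafe(16)  # the state you put in the authorize URL

def extract_code(redirect_url, expected_state):
    """Refuse to use the code unless the echoed state matches what we sent."""
    params = parse_qs(urlparse(redirect_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible CSRF, discard this code")
    return params["code"][0]

code = extract_code(
    f"https://practice.scrapingcentral.com/challenges/api/auth/oauth2"
    f"?code=abc123&state={sent_state}",
    sent_state,
)
```

If the states differ, someone may have injected their own authorization code into your flow; dropping it is the only safe move.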
For unattended scrapers, you do the auth dance once manually, save the refresh token, and from then on use the Refresh grant.
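Saving the refresh token can be as simple as a JSON file written atomically, so a crash mid-write never loses it. A sketch, assuming a local `tokens.json` path (a real deployment should use a proper secrets store):

```python
import json, os, tempfile

TOKEN_FILE = "tokens.json"  # hypothetical path for this example

def save_tokens(tokens, path=TOKEN_FILE):
    """Write the token response atomically: write to a temp file, then rename."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(tokens, f)
    os.replace(tmp, path)  # atomic on POSIX and Windows

def load_refresh_token(path=TOKEN_FILE):
    """Return the stored refresh token, or None on first run."""
    try:
        with open(path) as f:
            return json.load(f).get("refresh_token")
    except FileNotFoundError:
        return None
```

On startup the scraper calls `load_refresh_token()`; if it gets `None`, it falls back to the one-time manual Authorization Code dance.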
Flow 3, Refresh Token grant
When the access token expires:
```python
r = requests.post(
    "https://practice.scrapingcentral.com/oauth/token",
    data={
        "grant_type": "refresh_token",
        "refresh_token": stored_refresh_token,
        "client_id": "catalog108-demo-client",
        "client_secret": "demo-secret",
    },
)
r.raise_for_status()
new_tokens = r.json()
```
Some servers rotate refresh tokens; always store whatever comes back.
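A small helper makes that rule explicit: take the new access token unconditionally, and take the new refresh token only if the server actually sent one (the `merge_token_response` name is invented here):

```python
def merge_token_response(stored, response):
    """Merge a token-endpoint response into the stored token set.

    Rotating servers include a fresh refresh_token; non-rotating
    servers omit it, so we keep the old one in that case.
    """
    updated = dict(stored)
    updated["access_token"] = response["access_token"]
    if "refresh_token" in response:
        updated["refresh_token"] = response["refresh_token"]
    return updated

# Rotating server: the new refresh token replaces the old one
rotated = merge_token_response(
    {"access_token": "old", "refresh_token": "rt-1"},
    {"access_token": "new", "refresh_token": "rt-2"},
)

# Non-rotating server: the stored refresh token survives
kept = merge_token_response(
    {"access_token": "old", "refresh_token": "rt-1"},
    {"access_token": "new"},
)
```

Persist the merged dict immediately; a rotated refresh token you never saved is an account you have to re-authorize by hand.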
PHP version (Client Credentials)
```php
require 'vendor/autoload.php';

use GuzzleHttp\Client;

$client = new Client(['base_uri' => 'https://practice.scrapingcentral.com']);
$res = $client->post('/oauth/token', [
    'form_params' => [
        'grant_type' => 'client_credentials',
        'client_id' => 'catalog108-demo-client',
        'client_secret' => 'demo-secret',
        'scope' => 'read:products',
    ],
]);
$data = json_decode($res->getBody()->getContents(), true);
$access = $data['access_token'];

$products = json_decode(
    $client->get('/api/products', [
        'headers' => ['Authorization' => "Bearer $access"],
    ])->getBody()->getContents(),
    true
);
```
The other flows (for completeness)
You'll rarely scrape with these, but recognize them:
- Implicit flow, deprecated. Browser-side SPAs used to redirect with `#access_token=...` in the URL fragment. Replaced by Authorization Code + PKCE.
- Resource Owner Password Credentials: the user gives their password to your app, which trades it for tokens. Heavily discouraged, and dropped in OAuth 2.1.
- Device Code, for input-constrained devices (TVs, CLIs). Sometimes useful for scraper auth on headless servers.
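The Device Code flow's second half is a polling loop against the token endpoint. Here is an outline, assuming you've already obtained a `device_code` and wrapped the token-endpoint POST in a `poll` callable; the error strings follow RFC 8628, but treat this as a sketch, not a drop-in client.

```python
import time

def poll_for_token(poll, interval=5, timeout=300):
    """Poll until the user approves on their other device.

    `poll` is any callable returning the token endpoint's JSON as a dict.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = poll()
        if "access_token" in resp:
            return resp
        error = resp.get("error")
        if error == "authorization_pending":
            time.sleep(interval)           # user hasn't approved yet; keep waiting
        elif error == "slow_down":
            interval += 5                  # RFC 8628: back off by 5 seconds
            time.sleep(interval)
        else:
            raise RuntimeError(f"device flow failed: {resp}")
    raise TimeoutError("user never approved the device code")

# A real poll callable would look something like:
# poll = lambda: requests.post(token_url, data={
#     "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
#     "device_code": device_code,
#     "client_id": client_id,
# }).json()
```

Injecting `poll` as a callable keeps the loop testable without a live server.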
If you see PKCE (Proof Key for Code Exchange) in the docs: it's an extension to Authorization Code that adds a `code_verifier` / `code_challenge` pair. It's required for public clients (no `client_secret`). The mechanics are easy: generate a random verifier, hash it for the challenge, send the challenge in step 1, and send the verifier in step 3.
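Generating the pair is a few lines with the S256 method from RFC 7636 (the helper name here is ours):

```python
import base64, hashlib, secrets

def make_pkce_pair():
    """Return (code_verifier, code_challenge) using the S256 method."""
    verifier = secrets.token_urlsafe(64)  # ~86 chars, within the 43-128 allowed
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # base64url-encode the hash and strip the '=' padding, per RFC 7636
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge

verifier, challenge = make_pkce_pair()
# Step 1: add code_challenge=<challenge>&code_challenge_method=S256 to /oauth/authorize
# Step 3: add code_verifier=<verifier> to the /oauth/token exchange
```

The server hashes the verifier you send in step 3 and checks it against the challenge from step 1, proving the same client performed both steps.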
When to use what
| Situation | Flow |
|---|---|
| Scraping your own org's API with a service account | Client Credentials |
| Scraping a user's data with their permission | Authorization Code (+ Refresh) |
| Public client (no client_secret possible) | Authorization Code + PKCE |
| Renewing an expired token | Refresh Token |
| Anything else | Read the docs |
A reusable token manager
Cache the token, refresh proactively:
```python
import time, requests

class OAuthTokenManager:
    def __init__(self, token_url, client_id, client_secret, scope=None):
        self.token_url, self.client_id, self.client_secret = token_url, client_id, client_secret
        self.scope = scope
        self.access = None
        self.expires_at = 0

    def get(self):
        if self.access and time.time() < self.expires_at - 30:
            return self.access
        self._fetch()
        return self.access

    def _fetch(self):
        r = requests.post(self.token_url, data={
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret,
            **({"scope": self.scope} if self.scope else {}),
        })
        r.raise_for_status()
        d = r.json()
        self.access = d["access_token"]
        self.expires_at = time.time() + d.get("expires_in", 3600)

tokens = OAuthTokenManager(
    "https://practice.scrapingcentral.com/oauth/token",
    "catalog108-demo-client", "demo-secret",
    scope="read:products",
)

# Use freely; refresh happens transparently
r = requests.get(
    "https://practice.scrapingcentral.com/api/products",
    headers={"Authorization": f"Bearer {tokens.get()}"},
)
```
Hands-on lab
Catalog108's `/challenges/api/auth/oauth2` walks both flows. Execute Client Credentials end-to-end: post to `/oauth/token`, get an `access_token`, and use it on `/api/products`. Then do Authorization Code: visit `/oauth/authorize` in a browser, grab the `code` from the redirect, and exchange it at `/oauth/token`. Notice the difference in scopes returned. Wrap Client Credentials in the `OAuthTokenManager` class so refresh is automatic.