
Lesson 3.9 · Intermediate · 4 min read

Building a Clean Python API Client (Class Design)

Stop writing inline requests calls. Wrap each target in a class: base URL, session, auth, retries, typed methods. It's the shape every senior Python scraper uses.

What you’ll learn

  • Structure an API client as a small class with a Session, base URL, and typed methods.
  • Centralise auth and headers so scraping code stays readable.
  • Add timeouts, JSON shortcuts, and basic error handling.
  • Recognise when to escalate to httpx, async, or a real SDK.

Inline requests.get(...) calls are fine for a one-off script. The moment you have more than two endpoints, you want a client class. It centralises the base URL, the session, the auth, the headers, and the error handling, so your scraping logic stays focused on the data.

The shape

from __future__ import annotations
import requests
from typing import Any

class Catalog108Client:
    BASE_URL = "https://practice.scrapingcentral.com"

    def __init__(self, timeout: float = 10.0):
        self.session = requests.Session()
        self.session.headers.update({"Accept": "application/json"})
        self.timeout = timeout
        self.token: str | None = None

    def _request(self, method: str, path: str, **kwargs) -> Any:
        # Single choke point: URL join, default timeout, auth header, errors.
        url = f"{self.BASE_URL}{path}"
        kwargs.setdefault("timeout", self.timeout)
        if self.token:
            kwargs.setdefault("headers", {})
            kwargs["headers"].setdefault("Authorization", f"Bearer {self.token}")
        r = self.session.request(method, url, **kwargs)
        r.raise_for_status()
        return r.json() if r.content else None

    def login(self, email: str, password: str) -> None:
        data = self._request("POST", "/api/auth/login",
                             json={"email": email, "password": password})
        self.token = data["access_token"]

    def me(self) -> dict:
        return self._request("GET", "/api/auth/me")

    def products(self, page: int = 1, per_page: int = 12,
                 category: str | None = None) -> dict:
        params: dict[str, Any] = {"page": page, "per_page": per_page}
        if category:
            params["category"] = category
        return self._request("GET", "/api/products", params=params)

    def product(self, product_id: int) -> dict:
        return self._request("GET", f"/api/products/{product_id}")

    def reviews(self, product_id: int) -> list[dict]:
        return self._request("GET", f"/api/products/{product_id}/reviews")

Why each piece is there

  • BASE_URL as a class constant: change it once and the whole client moves.
  • requests.Session(): connection reuse (HTTP keep-alive) and cookie persistence. Reusing the connection skips a fresh TCP/TLS handshake per call, which can be several times faster than calling requests.get repeatedly; see the timing sketch after this list.
  • session.headers.update(...): defaults that apply to every call. No more remembering to pass Accept.
  • self.token: stored after login, automatically added to every authenticated call via _request.
  • _request(method, path, **kwargs): a single choke point for cross-cutting concerns (timeout, auth header, error handling). Easy to add retries here later (lesson 3.10).
  • r.raise_for_status(): turns 4xx/5xx into exceptions, so your calling code stops checking status codes.
  • Typed methods (products, product, reviews): readable, discoverable, IDE-completable.
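
If you want to verify the Session claim on your own machine, here is a rough timing sketch (the harness and the 20-request count are mine, not part of the lesson; absolute numbers depend on latency and TLS cost):

import time
import requests

URL = "https://practice.scrapingcentral.com/api/products"

start = time.perf_counter()
for _ in range(20):
    requests.get(URL, timeout=10)  # fresh connection (TCP + TLS) every call
print(f"bare requests: {time.perf_counter() - start:.2f}s")

session = requests.Session()
start = time.perf_counter()
for _ in range(20):
    session.get(URL, timeout=10)   # connection reused via HTTP keep-alive
print(f"session:       {time.perf_counter() - start:.2f}s")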

Usage

client = Catalog108Client()
client.login("student@practice.scrapingcentral.com", "practice123")
print(client.me())

page1 = client.products(page=1, per_page=50)
for p in page1["products"]:
    print(p["id"], p["name"], p["price"])

reviews = client.reviews(product_id=1)
for r in reviews:
    print(r["rating"], r["author"], r["text"])

Read that out loud. It's almost prose. Compare to inline:

# old style: brittle, repetitive
r = requests.post("https://practice.scrapingcentral.com/api/auth/login",
  json={"email": "...", "password": "..."})
token = r.json()["access_token"]
r = requests.get("https://practice.scrapingcentral.com/api/auth/me",
  headers={"Authorization": f"Bearer {token}"})
me = r.json()
# ... 50 more lines, all repeating the URL and the header

Adding retries, the next layer

The _request choke point makes retries trivial:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class Catalog108Client:
    def __init__(self, timeout: float = 10.0):
        self.session = requests.Session()
        retry = Retry(
            total=5,
            backoff_factor=1.0,  # sleeps of roughly 1s, 2s, 4s, 8s, 16s
            status_forcelist=[429, 500, 502, 503, 504],
            allowed_methods=["GET", "POST"],  # beware: retried POSTs can repeat non-idempotent writes
        )
        adapter = HTTPAdapter(max_retries=retry)
        self.session.mount("https://", adapter)
        self.session.mount("http://", adapter)
        self.session.headers.update({"Accept": "application/json"})
        self.timeout = timeout
        self.token = None

    # _request and the typed methods are unchanged
Now the client automatically retries 429 and 5xx responses with exponential backoff. urllib3's Retry also honours a server's Retry-After header by default, so an explicit wait on a 429 or 503 is respected. Lesson 3.10 dives deeper.

Iterators for pagination

A nice ergonomic touch: have the client fetch pages internally and yield individual products, so callers loop without bookkeeping:

def iter_products(self, per_page: int = 50, category: str | None = None):
    page = 1
    while True:
        data = self.products(page=page, per_page=per_page, category=category)
        for p in data["products"]:
            yield p
        if page * per_page >= data["pagination"]["total"]:
            break
        page += 1

# Usage
for product in client.iter_products(per_page=50, category="mugs"):
    print(product["name"])

The caller writes one for loop. The client does pagination.
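
Because iter_products is a plain generator, callers can also stop early without paying for pages they never read. A small usage sketch with itertools.islice (my example, not from the lesson):

from itertools import islice

# Take the first 10 mugs; the generator stops fetching pages once the slice is full.
for product in islice(client.iter_products(category="mugs"), 10):
    print(product["name"])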

Error handling

Three error tiers worth raising explicitly:

class APIError(Exception):
    pass

class AuthError(APIError):
    pass

class RateLimited(APIError):
    pass

def _request(self, method, path, **kwargs):
    # ... same setup as before ...
    r = self.session.request(method, url, **kwargs)
    if r.status_code == 401:
        raise AuthError(f"401 at {path}; token may be expired")
    if r.status_code == 429:
        raise RateLimited(f"429 at {path}; retry-after={r.headers.get('Retry-After')}")
    if not r.ok:
        raise APIError(f"{r.status_code} at {path}: {r.text[:200]}")
    return r.json() if r.content else None

Now callers can write try: ... except AuthError: client.login(...), and the recovery logic is clear.
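
A hedged sketch of what that recovery looks like in calling code (the 30-second wait is my choice, not from the lesson):

import time

try:
    reviews = client.reviews(product_id=1)
except AuthError:
    # Token expired: log in again and retry once.
    client.login("student@practice.scrapingcentral.com", "practice123")
    reviews = client.reviews(product_id=1)
except RateLimited:
    # Server asked us to slow down: wait, then retry.
    time.sleep(30)
    reviews = client.reviews(product_id=1)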

When to scale up

The hand-rolled class is right for one target, one or two scrapers, and fewer than 50 endpoints.

Outgrow it when:

  • Async needed → switch to httpx.AsyncClient (lesson 3.11); see the sketch after this list.
  • Many endpoints (50+) → OpenAPI codegen (openapi-python-client).
  • Distributable package → wrap it in a real SDK (the lesson-3.14/3.15 pattern is shown in PHP, but the Python equivalents are mature).
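
For a taste of the async shape before lesson 3.11, here is a minimal httpx sketch (the class and method names are mine, not an established SDK):

import asyncio
import httpx

class AsyncCatalogClient:
    BASE_URL = "https://practice.scrapingcentral.com"

    def __init__(self, timeout: float = 10.0):
        # httpx joins base_url with relative paths for us.
        self._client = httpx.AsyncClient(base_url=self.BASE_URL, timeout=timeout)

    async def products(self, page: int = 1, per_page: int = 12) -> dict:
        r = await self._client.get("/api/products",
                                   params={"page": page, "per_page": per_page})
        r.raise_for_status()
        return r.json()

    async def aclose(self) -> None:
        await self._client.aclose()

async def main():
    client = AsyncCatalogClient()
    try:
        data = await client.products(page=1)
        print(len(data["products"]))
    finally:
        await client.aclose()

asyncio.run(main())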

Hands-on lab

Build the Catalog108Client above end-to-end. Add a search(query: str) method that hits /api/products?search=.... Add error handling that distinguishes 401 from 429 from other failures. Then drive your existing scrapers through this client instead of inline requests.get calls; you should notice the scraping code shrink and read far better.
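
If you want a starting point for the search method, a minimal sketch (assuming the endpoint accepts a search query parameter, as the prompt suggests):

def search(self, query: str, page: int = 1, per_page: int = 12) -> dict:
    # Assumes /api/products accepts a `search` query parameter.
    return self._request("GET", "/api/products",
                         params={"search": query, "page": page, "per_page": per_page})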

