Sub-path 4 of 6

APIs, SERPs & Reverse Engineering

Skip the HTML. Hit JSON directly.

The pro path: REST, GraphQL, auth flows (cookie, JWT, OAuth, HMAC), reverse-engineering minified JS, and a complete tour of SERP-scraping APIs. The deepest, highest-leverage sub-path.

~8 weeks part-time · 50 lessons

Lessons

  1. 3.1

    Scrape the Data Source, Not the HTML

    Most modern sites render from JSON. Hitting the API directly is faster, more reliable, and structurally closer to what the site itself sees.

    Lab: /api/products

    beginner
  2. 3.2

    The Three Layers of Modern Web Data (HTML, XHR, Mobile)

    Every site exposes data through up to three layers. Knowing which layer to target, and the trade-offs of each, is the senior scraper's first decision.

    Lab: /products

    beginner
  3. 3.3

    Decision Framework: Browser vs API vs SERP-API

    Three tools, three cost models, three failure modes. Picking the wrong one is the single most expensive mistake in scraping. Here's the framework.

    beginner
  4. 3.4

    Network Tab Deep Dive, Every Filter and Why

    DevTools' Network panel is the scraper's microscope. Every filter, every column, every right-click action matters. Here's all of them, in priority order.

    Lab: /products/1-white-wooden-vase

    beginner
  5. 3.5

    Identifying the "Main" Data Endpoint

    A page makes 30 requests. Three contain the data you want. Here's how to spot them in seconds, not minutes.

    Lab: /search?q=mug

    beginner
  6. 3.6

    Copy as cURL → Working Python Request

    Take a captured browser request and translate it to clean Python in under 60 seconds. The single most-used micro-skill of API scraping.

    Lab: /api/products

    beginner
  7. 3.7

    Copy as cURL → Working PHP Request (Guzzle)

    Same captured curl, translated to idiomatic PHP with Guzzle. The minimum-viable client a senior PHP scraper writes.

    Lab: /api/products

    beginner
  8. 3.8

    Required vs Optional Headers, Minimum Viable Request

    A captured curl has 15 headers. Production code needs three. How to find and keep only what's load-bearing.

    Lab: /api/products

    beginner
  9. 3.9

    Building a Clean Python API Client (Class Design)

    Stop writing inline `requests` calls. Wrap a target in a class: base URL, session, auth, retries, typed methods. The shape every senior Python scraper uses.

    Lab: /api/products

    intermediate
  10. 3.10

    Retry, Backoff, and Rate Limit Handling

    Production scrapers fail. The right retry policy and the right respect for 429/Retry-After are what separate a scraper that runs for months from one that runs for an afternoon.

    Lab: /challenges/api/rest/rate-limited

    intermediate
  11. 3.11

    Async API Consumption with `httpx`

    Synchronous scraping leaves performance on the table. `httpx.AsyncClient` plus a semaphore plus `asyncio.gather` is the modern Python pattern.

    Lab: /api/products

    intermediate
  12. 3.12

    Building a Clean PHP API Client (PSR-18, Guzzle)

    The PHP version of the senior client pattern. Class-based, base URI, middleware, JSON helpers, and PSR-18 compatible so you can swap the transport later.

    Lab: /api/products

    intermediate
  13. 3.13

    Symfony HttpClient for API Consumption

    Guzzle isn't the only first-class PHP option. Symfony HttpClient is async-capable, PSR-18-compatible, and ships with Symfony out of the box.

    Lab: /api/products

    intermediate
  14. 3.14

    Building a Reusable PHP SDK, Composer Package Structure

    Turn your one-off scraper into a publishable SDK: composer.json, namespace, autoload, version constraints, tests. The full package skeleton.

    Lab: /api/products

    intermediate
  15. 3.15

    Publishing Your SDK to Packagist

    From local Composer package to public install in five steps. GitHub, Packagist, webhooks, and the credibility your README needs.

    intermediate
  16. 3.16

    Cookie-Based Session Replication

    The oldest scraper auth pattern: log in, capture the session cookie, replay it. Still the most common in 2026, and full of subtle traps.

    Lab: /account/orders

    intermediate
  17. 3.17

    JWT Tokens: Structure, Capture, Refresh

    The modern API-auth standard. Three dot-separated base64url chunks, an access token, a refresh token, a 15-minute expiry. Here's how scrapers handle all of it.

    Lab: /challenges/api/auth/jwt-with-refresh

    intermediate
  18. 3.18

    OAuth 2.0 Flows for Scrapers

    OAuth isn't a single flow; it's a family: Authorization Code, Client Credentials, Refresh Token. Here's which one applies to your scraper and how to execute it end-to-end.

    Lab: /challenges/api/auth/oauth2

    intermediate
  19. 3.19

    CSRF Tokens, Capturing Dynamically

    POST endpoints often require a one-time CSRF token. Static capture breaks immediately; you must fetch the token, then use it, in a single flow.

    Lab: /challenges/api/auth/csrf-form

    intermediate
  20. 3.20

    API Keys Hidden in JS Bundles

    Many sites embed API keys in their minified JavaScript. Find them with the right grep, the right DevTools workflow, and the right respect for what you're allowed to use them for.

    Lab: /challenges/api/auth/api-key-in-js

    advanced
  21. 3.21

    Signed Requests: Reverse-Engineering HMAC

    Each request carries a signature computed from its body and a secret. Replaying a captured request with a modified body fails; scraping is still possible, but only if you can read the signing algorithm.

    Lab: /challenges/api/auth/hmac-signed

    advanced
  22. 3.22

    What Is a SERP? Anatomy of a Modern Results Page

    A modern Google results page isn't 10 blue links; it's a dozen feature blocks, each with its own data shape. Here's the map.

    Lab: /search?q=phone

    beginner
  23. 3.23

    Organic Results vs Paid Ads vs Shopping Results

    Three result types share the SERP. Each is a different data shape, a different scrape target, and a different question your scraper answers.

    Lab: /search?q=phone

    beginner
  24. 3.24

    Knowledge Graph, Featured Snippets, People Also Ask

    Three high-value SERP blocks that answer the query directly. Each has its own data shape and scraping pattern.

    Lab: /search?q=catalog108

    intermediate
  25. 3.25

    Local Pack, Map Results, and the Local SEO World

    Three businesses with stars and a map. Behind it: a separate Google Maps API, Google Business Profiles (formerly GMB), and a whole industry.

    Lab: /search?q=stores+near+me

    intermediate
  26. 3.26

    Image Pack, Video Carousel, Top Stories

    Three media-shaped SERP blocks. Each has its own data shape and its own scraping mechanics, distinct from text results.

    Lab: /search?q=phone&tab=images

    intermediate
  27. 3.27

    AI Overviews and AI Mode, The 2025–2026 Shift

    Generative AI is now embedded in the SERP. The block at the top changes click-through, citation flow, and what 'rank 1' even means.

    intermediate
  28. 3.28

    Mobile vs Desktop SERPs

    Same query, two different SERPs. Mobile has different blocks, different order, sometimes different organic results entirely.

    Lab: /search

    intermediate
  29. 3.29

    Location & Language Targeting (`gl`, `hl`, `location` Parameters)

    The single most-misunderstood parameter trio in SERP scraping. Get them right, get accurate geo-localized data.

    Lab: /search?gl=in&hl=en

    intermediate
  30. 3.30

    Why Scraping SERPs Directly Is Hard

    Captchas, IP bans, randomized markup, geo-IP mismatches, and an arms race that goes back two decades. Here's why nobody serious scrapes Google directly anymore.

    intermediate
  31. 3.31

    The SERP API Category, Why It Exists, Who Uses It

    A whole industry built around 'we scrape Google so you don't have to.' How the category emerged, who its customers are, and what they're paying for.

    beginner
  32. 3.32

    Comparing Major Providers: SerpApi, Bright Data, ScraperAPI, Zyte, ScrapingBee, ScrapingAnt, ZenRows

    Seven of the biggest SERP-API and scraping-API providers. Their positioning, coverage, and trade-offs, without endorsement.

    intermediate
  33. 3.33

    Evaluation Framework: Coverage, Reliability, Price, Latency

    Six dimensions to score any SERP-API on. Run a real test against each provider, then decide.

    intermediate
  34. 3.34

    Hands-On: SERP API in Python, A Complete Walkthrough

    Sign up, get a key, run your first query, parse the JSON, persist to a database. A complete production-shaped Python walkthrough.

    intermediate
  35. 3.35

    Hands-On: SERP API in PHP, Composer, SDK, Real Queries

    The PHP version of the SERP-API walkthrough. Composer, Guzzle, dotenv, SQLite, real queries: same shape, PHP idioms.

    intermediate
  36. 3.36

    Beyond Google: Bing, DuckDuckGo, Yandex, Baidu, Naver, Brave APIs

    Six non-Google engines that matter, for regional reach, AI training data, and audiences Google doesn't serve well.

    intermediate
  37. 3.37

    Beyond Search Engines: Amazon, Walmart, App Store, eBay, YouTube, Tripadvisor, Yelp APIs

    SERP-API providers also scrape major marketplaces and platforms. Same provider, different engine parameter, and a different data game.

    intermediate
  38. 3.38

    SERP-API-Specific Features: Async Searches, Search Archives, Location Lookups

    Beyond the basic search call, providers ship features that change what's possible. Async batching, history archives, location helpers, and more.

    intermediate
  39. 3.39

    Cost Optimization: Caching, Result Reuse, Selective Field Extraction

    At $1–$5 per 1k calls, every redundant search is real money. The four mechanical patterns that cut a SERP-API bill in half.

    intermediate
  40. 3.40

    Building Your Own Thin Wrapper Around a SERP API

    Don't ship provider-specific calls across your codebase. A 100-line wrapper isolates provider quirks, makes switching trivial, and adds caching/retries in one place.

    intermediate
  41. 3.41

    GraphQL Scraping: Queries and Endpoints

    GraphQL is a single POST endpoint, a typed schema, and a query language. Different from REST in every direction, and increasingly common.

    Lab: /challenges/api/graphql/playground

    intermediate
  42. 3.42

    Persisted Queries, The Modern GraphQL Trap

    Many production GraphQL servers don't accept arbitrary queries, only known hashes. Scrapers must extract the precomputed hashes from the bundle or capture them live.

    Lab: /challenges/api/graphql/persisted

    advanced
  43. 3.43

    WebSocket Scraping for Real-Time Data

    When the data updates faster than HTTP polling can keep up. WebSockets are bidirectional, persistent, and surprisingly easy to scrape.

    Lab: /challenges/api/websocket/live-prices

    advanced
  44. 3.44

    Socket.IO and SignalR Protocols

    Not all real-time is raw WebSockets. Socket.IO and SignalR are higher-level protocols with their own handshakes, message framing, and quirks.

    Lab: /challenges/api/websocket/socketio

    advanced
  45. 3.45

    Reading Minified JavaScript Like a Detective

    Webpack output looks like noise. Read it like code anyway: pretty-printing, source maps, search patterns, and the workflow that turns 50,000-char one-liners into discoverable systems.

    Lab: /challenges/api/auth/api-key-in-js

    advanced
  46. 3.46

    DevTools Breakpoints for Capturing Runtime Values

    When static analysis fails, breakpoints win. Five breakpoint types, when to use each, and the disciplined workflow of pause-inspect-step.

    Lab: /challenges/api/auth/hmac-signed

    advanced
  47. 3.47

    Mobile App API Capture with mitmproxy

    Phones talk to APIs the web doesn't expose. mitmproxy intercepts the traffic; with a CA cert on the device, you see everything decrypted.

    advanced
  48. 3.48

    SSL Pinning, Concepts and Approaches

    When the app hardcodes its server's certificate, mitmproxy alone isn't enough. The bypass landscape: Frida, objection, custom builds, and emulator-only realities.

    expert
  49. 3.49

    TLS Fingerprinting (JA3/JA4), `curl-cffi` and `tls-client`

    Anti-bot systems profile your TLS handshake. Python's `requests` looks nothing like Chrome. Two libraries fix this, at the price of a different dependency.

    Lab: /challenges/antibot/tls-fingerprint

    expert
  50. 3.50

    HTTP/2 Fingerprinting Evasion

    Beyond TLS, the HTTP/2 layer is fingerprinted too. SETTINGS frames, window sizes, pseudo-header order, and priorities are all distinguishable. Here's the modern arms race.

    Lab: /challenges/antibot/tls-fingerprint

    expert

Every lesson has a hands-on lab target on Catalog108, our first-party practice scraping sandbox. Each lab page has a /grade endpoint that returns pass/fail on your scraper output.
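
For a flavor of the material, the token structure described in lesson 3.17 (three dot-separated chunks) can be inspected with the standard library alone. This is a minimal sketch, not lesson code: the function names and the demo token are made up for illustration, and the signature is neither checked nor trusted.

```python
import base64
import json


def decode_jwt_segment(segment: str) -> dict:
    """Decode one base64url-encoded JSON segment of a JWT."""
    # JWTs strip the '=' padding, so restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def inspect_jwt(token: str) -> tuple[dict, dict]:
    """Return (header, payload) WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return decode_jwt_segment(header_b64), decode_jwt_segment(payload_b64)


def _encode(obj: dict) -> str:
    """Helper used only to build a demo token for this example."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical token: the claims and the signature chunk are placeholders.
demo_token = ".".join([
    _encode({"alg": "HS256", "typ": "JWT"}),
    _encode({"sub": "demo-user", "exp": 1700000900}),
    "placeholder-signature",
])

header, payload = inspect_jwt(demo_token)
print(header["alg"], payload["exp"])
```

Reading the `exp` claim this way lets a scraper decide when to refresh before a request fails, rather than reacting to a 401 afterwards.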