Sub-path 4 of 6

APIs, SERPs & Reverse Engineering

Skip the HTML. Hit JSON directly.

The pro path: REST, GraphQL, auth flows (cookie, JWT, OAuth, HMAC), reverse-engineering minified JS, and a complete tour of SERP-scraping APIs. The deepest, highest-leverage sub-path.

~8 weeks part-time · 50 lessons

Lessons

3.1
Scrape the Data Source, Not the HTML
Most modern sites render from JSON. Hitting the API directly is faster, more reliable, and structurally closer to what the site itself sees.
Lab: /api/products
beginner
3.2
The Three Layers of Modern Web Data (HTML, XHR, Mobile)
Every site exposes data through up to three layers. Knowing which layer to target, and the trade-offs of each, is the senior scraper's first decision.
Lab: /products
beginner
3.3
Decision Framework: Browser vs API vs SERP-API
Three tools, three cost models, three failure modes. Picking the wrong one is the single most expensive mistake in scraping. Here's the framework.
beginner
3.4
Network Tab Deep Dive, Every Filter and Why
DevTools' Network panel is the scraper's microscope. Every filter, every column, every right-click action matters. Here's all of them, in priority order.
Lab: /products/1-white-wooden-vase
beginner
3.5
Identifying the "Main" Data Endpoint
A page makes 30 requests. Three contain the data you want. Here's how to spot them in seconds, not minutes.
Lab: /search?q=mug
beginner
3.6
Copy as cURL → Working Python Request
Take a captured browser request and translate it to clean Python in under 60 seconds. The single most-used micro-skill of API scraping.
Lab: /api/products
beginner
3.7
Copy as cURL → Working PHP Request (Guzzle)
Same captured curl, translated to idiomatic PHP with Guzzle. The minimum-viable client a senior PHP scraper writes.
Lab: /api/products
beginner
3.8
Required vs Optional Headers, Minimum Viable Request
A captured curl has 15 headers. Production code needs three. How to find and keep only what's load-bearing.
Lab: /api/products
beginner
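One way to find the load-bearing headers is simple elimination: remove one header at a time and keep it only if the request stops working without it. The sketch below shows that loop. The `works` callback, the `fake_works` stand-in, and the sample header set are all hypothetical; in real use `works` would send the request and check the response.

```python
from typing import Callable, Dict


def minimal_headers(
    headers: Dict[str, str],
    works: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Drop headers one at a time; keep only those whose removal breaks the request."""
    kept = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in kept.items() if k != name}
        if works(trial):
            kept = trial  # still succeeds without this header, so it was optional
    return kept


# Stand-in for a real HTTP call: this pretend server checks only two headers.
def fake_works(h: Dict[str, str]) -> bool:
    return "Accept" in h and "X-Requested-With" in h


captured = {
    "Accept": "application/json",
    "Accept-Language": "en-US",
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",
    "Sec-Fetch-Mode": "cors",
}
required = minimal_headers(captured, fake_works)
```

Each trial costs one request, so a 15-header capture needs at most 15 probes. Headers that only matter in combination, or only after many requests, will not show up in a single pass like this.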
3.9
Building a Clean Python API Client (Class Design)
Stop writing inline requests calls. Wrap a target in a class: base URL, session, auth, retries, typed methods. The shape every senior Python scraper uses.
Lab: /api/products
intermediate
3.10
Retry, Backoff, and Rate Limit Handling
Production scrapers fail. The right retry policy and the right respect for 429/Retry-After is what separates a scraper that runs for months from one that runs for an afternoon.
Lab: /challenges/api/rest/rate-limited
intermediate
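A retry policy of this kind usually comes down to one question: how long to wait before the next attempt. The sketch below honors a numeric Retry-After value when the server sends one and otherwise falls back to exponential backoff with random jitter. The function name, the defaults, and the cap are assumptions chosen for illustration, and the HTTP-date form of Retry-After is deliberately not parsed here.

```python
import random
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Delta-seconds form, e.g. "Retry-After: 5"; clamp to [0, cap].
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch; use backoff instead
    ceiling = min(cap, base * (2 ** attempt))
    # Random jitter keeps many workers from retrying at the same instant.
    return random.uniform(0, ceiling)
```

A caller would typically sleep for `retry_delay(attempt, response_headers.get("Retry-After"))` after a 429 or a 5xx, and give up after a fixed number of attempts.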
3.11
Async API Consumption with `httpx`
Synchronous scraping leaves performance on the table. `httpx.AsyncClient` plus a semaphore plus `asyncio.gather` is the modern Python pattern.
Lab: /api/products
intermediate
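The semaphore-plus-gather part of that pattern can be sketched with the standard library alone. Here the HTTP call is passed in as a `fetch` coroutine so the example runs offline; with `httpx` it would wrap `client.get(url)` on one shared `httpx.AsyncClient`. The names `fetch_all` and `fake_fetch` and the limit of 3 are assumptions for the example.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[Any]],
    limit: int = 5,
) -> List[Any]:
    """Run `fetch` over all URLs with at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def bounded(url: str) -> Any:
        async with sem:  # waits here while `limit` requests are already running
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


# Offline stand-in for an HTTP call that also records peak concurrency.
state = {"in_flight": 0, "peak": 0}


async def fake_fetch(url: str) -> str:
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)  # simulated network latency
    state["in_flight"] -= 1
    return f"body of {url}"


urls = [f"/api/products?page={n}" for n in range(1, 11)]
results = asyncio.run(fetch_all(urls, fake_fetch, limit=3))
```

The semaphore bounds concurrency, not request rate; a target with a strict per-second limit still needs a delay or a rate limiter on top.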
3.12
Building a Clean PHP API Client (PSR-18, Guzzle)
The PHP version of the senior client pattern. Class-based, base URI, middleware, JSON helpers, and PSR-18 compatible so you can swap the transport later.
Lab: /api/products
intermediate
3.13
Symfony HttpClient for API Consumption
Guzzle isn't the only first-class PHP option. Symfony HttpClient is async-capable, PSR-18-compatible, and ships with Symfony out of the box.
Lab: /api/products
intermediate
3.14
Building a Reusable PHP SDK, Composer Package Structure
Turn your one-off scraper into a publishable SDK: composer.json, namespace, autoload, version constraints, tests. The full package skeleton.
Lab: /api/products
intermediate
3.15
Publishing Your SDK to Packagist
From local Composer package to public install in five steps. GitHub, Packagist, webhooks, and the credibility your README needs.
intermediate
3.16
Cookie-Based Session Replication
The oldest scraper auth pattern: log in, capture the session cookie, replay it. Still the most common in 2026, and full of subtle traps.
Lab: /account/orders
intermediate
3.17
JWT Tokens: Structure, Capture, Refresh
The modern API-auth standard. Three dot-separated base64url chunks, an access token, a refresh token, a 15-minute expiry. Here's how scrapers handle all of it.
Lab: /challenges/api/auth/jwt-with-refresh
intermediate
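Reading a token's expiry needs nothing beyond the standard library, because the payload chunk is just base64url-encoded JSON. The sketch below decodes the claims and decides whether a refresh is due. It does not verify the signature, which a scraper normally cannot do without the server's key. The helper names, the 30-second leeway, and the demo token are assumptions for illustration.

```python
import base64
import json


def b64url_decode(chunk: str) -> bytes:
    """Decode base64url, restoring the '=' padding that JWTs strip."""
    padding = "=" * (-len(chunk) % 4)
    return base64.urlsafe_b64decode(chunk + padding)


def jwt_claims(token: str) -> dict:
    """Return the payload claims. The signature is NOT checked."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


def needs_refresh(token: str, now: float, leeway: int = 30) -> bool:
    """True when the token expires within `leeway` seconds of `now`."""
    return jwt_claims(token)["exp"] - now <= leeway


# Build a demo token locally so the example runs without a server.
def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


demo = ".".join([
    b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode()),
    b64url_encode(json.dumps({"sub": "user-1", "exp": 1_700_000_900}).encode()),
    "signature-not-checked",
])
```

In a client, `needs_refresh(access_token, time.time())` would run before each request, with the refresh call made only when it returns true.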
3.18
OAuth 2.0 Flows for Scrapers
OAuth isn't a single flow, it's a family. Authorization Code, Client Credentials, Refresh. Here's which one applies to your scraper and how to execute it end-to-end.
Lab: /challenges/api/auth/oauth2
intermediate
3.19
CSRF Tokens, Capturing Dynamically
POST endpoints often require a one-time CSRF token. Static capture breaks immediately; you must fetch the token, then use it, in a single flow.
Lab: /challenges/api/auth/csrf-form
intermediate
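The "fetch the token" half of that flow is usually an HTML parse of the form page. A minimal sketch with the standard-library parser follows, assuming the token lives in a hidden input; the field name `csrf_token` and the sample markup are hypothetical, since each site names and places its token differently (a meta tag or a cookie are also common).

```python
from html.parser import HTMLParser
from typing import Optional


class HiddenInputFinder(HTMLParser):
    """Records the value of the first <input> whose name matches."""

    def __init__(self, field_name: str) -> None:
        super().__init__()
        self.field_name = field_name
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "input" and attr.get("name") == self.field_name and self.value is None:
            self.value = attr.get("value")


def extract_csrf(html: str, field_name: str = "csrf_token") -> Optional[str]:
    finder = HiddenInputFinder(field_name)
    finder.feed(html)
    return finder.value


# Sample form markup standing in for the fetched page.
page = (
    '<form method="post" action="/challenges/api/auth/csrf-form">'
    '<input type="hidden" name="csrf_token" value="abc123">'
    '<input name="email">'
    "</form>"
)
token = extract_csrf(page)
form_data = {"csrf_token": token, "email": "a@example.com"}  # body for the follow-up POST
```

The GET that returns the form and the POST that submits it should share one session, because the token is typically bound to the session cookie set by the first response.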
3.20
API Keys Hidden in JS Bundles
Many sites embed API keys in their minified JavaScript. Find them with the right grep, the right DevTools workflow, and the right respect for what you're allowed to use them for.
Lab: /challenges/api/auth/api-key-in-js
advanced
3.21
Signed Requests: Reverse-Engineering HMAC
Each request carries a signature computed from its body and a secret. A captured request can't simply be edited and replayed; scraping is still possible, but only if you can read the signing algorithm.
Lab: /challenges/api/auth/hmac-signed
advanced
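Once the signing algorithm has been read out of the client code, reproducing it is usually a few lines. The sketch below shows one plausible shape: HMAC-SHA256 over the method, path, timestamp, and body, hex-encoded. Every detail here (the field order, the newline separator, the hash, the encoding, the sample secret) is an assumption; a real target defines its own recipe, and the signature only matches if each detail is copied exactly.

```python
import hashlib
import hmac
import json


def sign_request(secret: bytes, method: str, path: str, body: bytes, timestamp: str) -> str:
    """Hex HMAC-SHA256 over a canonical string built from the request parts."""
    message = b"\n".join([
        method.upper().encode(),
        path.encode(),
        timestamp.encode(),
        body,
    ])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


# Hypothetical values; the secret would come from the reverse-engineered client.
secret = b"demo-secret"
# Serialize once and send these exact bytes, since any difference changes the signature.
body = json.dumps({"page": 1}, separators=(",", ":")).encode()
signature = sign_request(secret, "post", "/challenges/api/auth/hmac-signed", body, "1700000000")
```

Signing the exact bytes that go on the wire matters: re-serializing the JSON with different spacing or key order produces a different signature and a rejected request.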
3.22
What Is a SERP? Anatomy of a Modern Results Page
A modern Google results page isn't 10 blue links, it's a dozen feature blocks, each with its own data shape. Here's the map.
Lab: /search?q=phone
beginner
3.23
Organic Results vs Paid Ads vs Shopping Results
Three result types share the SERP. Each is a different data shape, a different scrape target, and a different question your scraper answers.
Lab: /search?q=phone
beginner
3.24
Knowledge Graph, Featured Snippets, People Also Ask
Three high-value SERP blocks that answer the query directly. Each has its own data shape and scraping pattern.
Lab: /search?q=catalog108
intermediate
3.25
Local Pack, Map Results, and the Local SEO World
Three businesses with stars and a map. Behind it: a separate Google Maps API, GMB profiles, and a whole industry.
Lab: /search?q=stores+near+me
intermediate
3.26
Image Pack, Video Carousel, Top Stories
Three media-shaped SERP blocks. Each has its own data shape and its own scraping mechanics, distinct from text results.
Lab: /search?q=phone&tab=images
intermediate
3.27
AI Overviews and AI Mode, The 2025–2026 Shift
Generative AI is now embedded in the SERP. The block at the top changes click-through, citation flow, and what 'rank 1' even means.
intermediate
3.28
Mobile vs Desktop SERPs
Same query, two different SERPs. Mobile has different blocks, different order, sometimes different organic results entirely.
Lab: /search
intermediate
3.29
Location & Language Targeting (`gl`, `hl`, `location` Parameters)
The single most-misunderstood parameter trio in SERP scraping. Get them right, get accurate geo-localized data.
Lab: /search?gl=in&hl=en
intermediate
3.30
Why Scraping SERPs Directly Is Hard
Captchas, IP bans, randomized markup, geo-IP mismatches, and an arms race that goes back two decades. Here's why nobody serious scrapes Google directly anymore.
intermediate
3.31
The SERP API Category, Why It Exists, Who Uses It
A whole industry built around 'we scrape Google so you don't have to.' How the category emerged, who its customers are, and what they're paying for.
beginner
3.32
Comparing Major Providers: SerpApi, Bright Data, ScraperAPI, Zyte, ScrapingBee, ScrapingAnt, ZenRows
Seven of the biggest SERP-API and scraping-API providers. Their positioning, coverage, and trade-offs, without endorsement.
intermediate
3.33
Evaluation Framework: Coverage, Reliability, Price, Latency
Six dimensions to score any SERP-API on. Run a real test against each provider, then decide.
intermediate
3.34
Hands-On: SERP API in Python, A Complete Walkthrough
Sign up, get a key, run your first query, parse the JSON, persist to a database. A complete production-shaped Python walkthrough.
intermediate
3.35
Hands-On: SERP API in PHP, Composer, SDK, Real Queries
The PHP version of the SERP-API walkthrough. Composer, Guzzle, dotenv, SQLite, real queries, same shape, PHP idioms.
intermediate
3.36
Beyond Google: Bing, DuckDuckGo, Yandex, Baidu, Naver, Brave APIs
Six non-Google engines that matter, for regional reach, AI training data, and audiences Google doesn't serve well.
intermediate
3.37
Beyond Search Engines: Amazon, Walmart, App Store, eBay, YouTube, Tripadvisor, Yelp APIs
SERP-API providers also scrape major marketplaces and platforms. Same provider, different engine parameter, and a different data game.
intermediate
3.38
SERP-API-Specific Features: Async Searches, Search Archives, Location Lookups
Beyond the basic search call, providers ship features that change what's possible. Async batching, history archives, location helpers, and more.
intermediate
3.39
Cost Optimization: Caching, Result Reuse, Selective Field Extraction
At $1–$5 per 1k calls, every redundant search is real money. The four mechanical patterns that cut a SERP-API bill in half.
intermediate
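Caching is the most mechanical of these patterns: normalize the search parameters into a key, and pay for a call only when no fresh entry exists. A minimal in-memory sketch follows. The class name, the one-hour TTL, and the decision to treat queries as case-insensitive are assumptions; the wrapped `search` callable stands in for whatever provider call is being billed.

```python
import time
from typing import Any, Callable, Dict, Tuple


def cache_key(params: dict) -> tuple:
    """Normalize so equivalent searches share one entry (assumes case-insensitive values)."""
    return tuple(sorted((k, str(v).strip().lower()) for k, v in params.items()))


class CachedSearch:
    """Wraps a paid search function with a TTL cache and a call counter."""

    def __init__(self, search: Callable[..., Any], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._search = search
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[tuple, Tuple[float, Any]] = {}
        self.paid_calls = 0

    def __call__(self, **params: Any) -> Any:
        key = cache_key(params)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: no billable request
        self.paid_calls += 1
        result = self._search(**params)
        self._store[key] = (now, result)
        return result


# Offline stand-in for a billable provider call.
def fake_search(**params: Any) -> dict:
    return {"query": params["q"], "organic_results": []}


now = [0.0]  # controllable clock so expiry can be demonstrated
search = CachedSearch(fake_search, ttl=3600.0, clock=lambda: now[0])
search(q="Phone", gl="in")
search(gl="in", q="phone ")   # same key after normalization: served from cache
calls_before_expiry = search.paid_calls
now[0] = 4000.0               # move past the TTL
search(q="phone", gl="in")    # entry is stale, so this one is paid again
```

The right TTL depends on how quickly the results change; rank tracking may tolerate hours, while news or price monitoring may not.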
3.40
Building Your Own Thin Wrapper Around a SERP API
Don't ship provider-specific calls across your codebase. A 100-line wrapper isolates provider quirks, makes switching trivial, and adds caching/retries in one place.
intermediate
3.41
GraphQL Scraping: Queries and Endpoints
GraphQL is a single POST endpoint, a typed schema, and a query language. Different from REST in every direction, and increasingly common.
Lab: /challenges/api/graphql/playground
intermediate
3.42
Persisted Queries, The Modern GraphQL Trap
Production GraphQL doesn't accept arbitrary queries, only known hashes. Scrapers must extract the precomputed hashes from the bundle or capture them live.
Lab: /challenges/api/graphql/persisted
advanced
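When a site follows the Apollo-style automatic persisted queries convention, the hash is the SHA-256 of the query text and the request body carries the hash in place of the query. The sketch below builds such a body under that assumption; the query, operation name, and variables are hypothetical. Sites with their own scheme may use opaque IDs that cannot be computed and have to be captured from the bundle or from live traffic.

```python
import hashlib
import json


def persisted_query_body(query: str, variables: dict, operation_name: str) -> dict:
    """Build a request body that sends only the query's hash (Apollo-style convention)."""
    sha = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return {
        "operationName": operation_name,
        "variables": variables,
        "extensions": {"persistedQuery": {"version": 1, "sha256Hash": sha}},
    }


# Hypothetical query; the hash only matches if the text is byte-for-byte what the site uses.
query = "query Products($first: Int!) { products(first: $first) { id name } }"
body = persisted_query_body(query, {"first": 10}, "Products")
payload = json.dumps(body)  # what would be POSTed to the GraphQL endpoint
```

Because the hash covers the exact string, reformatting or pretty-printing a query pulled from the bundle changes the hash and the server will no longer recognize it.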
3.43
WebSocket Scraping for Real-Time Data
When the data updates faster than HTTP polling can keep up. WebSockets are bidirectional, persistent, and surprisingly easy to scrape.
Lab: /challenges/api/websocket/live-prices
advanced
3.44
Socket.IO and SignalR Protocols
Not all real-time is raw WebSockets. Socket.IO and SignalR are higher-level protocols with their own handshakes, message framing, and quirks.
Lab: /challenges/api/websocket/socketio
advanced
3.45
Reading Minified JavaScript Like a Detective
Webpack output looks like noise. Read it like code anyway: pretty-print, source maps, search patterns, and the workflow that turns 50,000-char one-liners into discoverable systems.
Lab: /challenges/api/auth/api-key-in-js
advanced
3.46
DevTools Breakpoints for Capturing Runtime Values
When static analysis fails, breakpoints win. Five breakpoint types, when to use each, and the disciplined workflow of pause-inspect-step.
Lab: /challenges/api/auth/hmac-signed
advanced
3.47
Mobile App API Capture with mitmproxy
Phones talk to APIs the web doesn't expose. mitmproxy intercepts the traffic; with a CA cert on the device, you see everything decrypted.
advanced
3.48
SSL Pinning, Concepts and Approaches
When the app hardcodes its server's certificate, mitmproxy alone isn't enough. The bypass landscape: Frida, objection, custom builds, and emulator-only realities.
expert
3.49
TLS Fingerprinting (JA3/JA4), `curl-cffi` and `tls-client`
Anti-bot systems profile your TLS handshake. Python's `requests` looks nothing like Chrome. Two libraries fix this, at the price of a different dependency.
Lab: /challenges/antibot/tls-fingerprint
expert
3.50
HTTP/2 Fingerprinting Evasion
Beyond TLS, the HTTP/2 layer also fingerprints. SETTINGS frames, window sizes, pseudo-header order, priorities: all distinguishable. Here's the modern arms race.
Lab: /challenges/antibot/tls-fingerprint
expert

Every lesson has a hands-on lab target on Catalog108, our first-party practice scraping sandbox. Each lab page has a /grade endpoint that returns pass/fail on your scraper output.
