Sub-path 4 of 6
APIs, SERPs & Reverse Engineering
Skip the HTML. Hit JSON directly.
The pro path: REST, GraphQL, auth flows (cookie, JWT, OAuth, HMAC), reverse-engineering minified JS, and a complete tour of SERP-scraping APIs. The deepest, highest-leverage sub-path.
~8 weeks part-time · 50 lessons
Lessons
- 3.1 beginner
Scrape the Data Source, Not the HTML
Most modern sites render from JSON. Hitting the API directly is faster, more reliable, and structurally closer to what the site itself sees.
Lab:
/api/products
- 3.2 beginner
The Three Layers of Modern Web Data (HTML, XHR, Mobile)
Every site exposes data through up to three layers. Knowing which layer to target, and the trade-offs of each, is the senior scraper's first decision.
Lab:
/products
- 3.3 beginner
Decision Framework: Browser vs API vs SERP-API
Three tools, three cost models, three failure modes. Picking the wrong one is the single most expensive mistake in scraping. Here's the framework.
- 3.4 beginner
Network Tab Deep Dive, Every Filter and Why
DevTools' Network panel is the scraper's microscope. Every filter, every column, every right-click action matters. Here's all of them, in priority order.
Lab:
/products/1-white-wooden-vase
- 3.5 beginner
Identifying the "Main" Data Endpoint
A page makes 30 requests. Three contain the data you want. Here's how to spot them in seconds, not minutes.
Lab:
/search?q=mug
- 3.6 beginner
Copy as cURL → Working Python Request
Take a captured browser request and translate it to clean Python in under 60 seconds. The single most-used micro-skill of API scraping.
Lab:
/api/products
- 3.7 beginner
Copy as cURL → Working PHP Request (Guzzle)
Same captured curl, translated to idiomatic PHP with Guzzle. The minimum-viable client a senior PHP scraper writes.
Lab:
/api/products
- 3.8 beginner
Required vs Optional Headers, Minimum Viable Request
A captured curl has 15 headers. Production code needs three. How to find and keep only what's load-bearing.
Lab:
/api/products
- 3.9 intermediate
Building a Clean Python API Client (Class Design)
Stop writing inline requests calls. Wrap a target in a class: base URL, session, auth, retries, typed methods. The shape every senior Python scraper uses.
Lab:
/api/products
- 3.10 intermediate
Retry, Backoff, and Rate Limit Handling
Production scrapers fail. The right retry policy and the right respect for 429/Retry-After is what separates a scraper that runs for months from one that runs for an afternoon.
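A minimal sketch of such a retry policy, using only the standard library. The names `compute_delay` and `call_with_retries` are illustrative, the retryable status set is an assumption, and only the numeric (seconds) form of Retry-After is handled here. The `send` callable stands in for whatever HTTP call your client makes.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# (status code, response headers, body)
Response = Tuple[int, Mapping[str, str], bytes]

# Assumed set of statuses worth retrying; tune per target.
RETRYABLE = {429, 500, 502, 503, 504}


def compute_delay(attempt: int, retry_after: Optional[str],
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Honor a numeric Retry-After header when the server sends one.
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    # Otherwise: exponential backoff with full jitter, capped.
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until the status is not retryable or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for attempt in range(max_attempts - 1):
        status, headers, _body = response
        if status not in RETRYABLE:
            break
        # Plain dict lookup is case-sensitive; real clients usually
        # expose case-insensitive headers.
        sleep(compute_delay(attempt, headers.get("Retry-After")))
        response = send()
    return response
```

Passing `sleep` and `send` as parameters keeps the policy testable without a network or real waiting.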
Lab:
/challenges/api/rest/rate-limited
- 3.11 intermediate
Async API Consumption with `httpx`
Synchronous scraping leaves performance on the table. `httpx.AsyncClient` plus a semaphore plus `asyncio.gather` is the modern Python pattern.
Lab:
/api/products
- 3.12 intermediate
Building a Clean PHP API Client (PSR-18, Guzzle)
The PHP version of the senior client pattern. Class-based, base URI, middleware, JSON helpers, and PSR-18 compatible so you can swap the transport later.
Lab:
/api/products
- 3.13 intermediate
Symfony HttpClient for API Consumption
Guzzle isn't the only first-class PHP option. Symfony HttpClient is async-capable, PSR-18-compatible, and ships with Symfony out of the box.
Lab:
/api/products
- 3.14 intermediate
Building a Reusable PHP SDK, Composer Package Structure
Turn your one-off scraper into a publishable SDK: composer.json, namespace, autoload, version constraints, tests. The full package skeleton.
Lab:
/api/products
- 3.15 intermediate
Publishing Your SDK to Packagist
From local Composer package to public install in five steps. GitHub, Packagist, webhooks, and the credibility your README needs.
- 3.16 intermediate
Cookie-Based Session Replication
The oldest scraper auth pattern: log in, capture the session cookie, replay it. Still the most common in 2026, and full of subtle traps.
Lab:
/account/orders
- 3.17 intermediate
JWT Tokens: Structure, Capture, Refresh
The modern API-auth standard. Three dot-separated base64url chunks (header, payload, signature), an access token, a refresh token, a 15-minute expiry. Here's how scrapers handle all of it.
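To make the three-chunk structure concrete, here is a small sketch that splits a token, decodes the header and payload without verifying the signature, and checks the standard `exp` claim to decide when to refresh. The function names and the 60-second leeway are assumptions for illustration.

```python
import base64
import json
import time
from typing import Any, Dict, Optional


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def decode_jwt_unverified(token: str) -> Dict[str, Any]:
    """Read header and payload. The signature is NOT checked here."""
    header_b64, payload_b64, _signature = token.split(".")
    return {
        "header": json.loads(b64url_decode(header_b64)),
        "payload": json.loads(b64url_decode(payload_b64)),
    }


def needs_refresh(token: str, leeway: int = 60,
                  now: Optional[float] = None) -> bool:
    """True when the token expires within `leeway` seconds."""
    exp = decode_jwt_unverified(token)["payload"].get("exp")
    if exp is None:
        return False  # no expiry claim: nothing to schedule against
    current = time.time() if now is None else now
    return current >= exp - leeway
```

A scraper would typically call `needs_refresh` before each request and swap in a new access token via the refresh flow when it returns True.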
Lab:
/challenges/api/auth/jwt-with-refresh
- 3.18 intermediate
OAuth 2.0 Flows for Scrapers
OAuth isn't a single flow; it's a family: Authorization Code, Client Credentials, Refresh. Here's which one applies to your scraper and how to execute it end-to-end.
Lab:
/challenges/api/auth/oauth2
- 3.19 intermediate
CSRF Tokens, Capturing Dynamically
POST endpoints often require a one-time CSRF token. Static capture breaks immediately; you must fetch the token, then use it, in a single flow.
Lab:
/challenges/api/auth/csrf-form
- 3.20 advanced
API Keys Hidden in JS Bundles
Many sites embed API keys in their minified JavaScript. Find them with the right grep, the right DevTools workflow, and the right respect for what you're allowed to use them for.
Lab:
/challenges/api/auth/api-key-in-js
- 3.21 advanced
Signed Requests: Reverse-Engineering HMAC
Each request carries a signature computed from its body and a secret. Tampering with a captured request fails, and replay fails too when the signature covers a timestamp or nonce. Scraping is still possible, but only if you can read the signing algorithm.
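As an illustration of what a recovered signing routine often looks like, here is a sketch of one possible HMAC-SHA256 scheme. Everything site-specific is an assumption: the canonical string layout, the compact sorted JSON body, and the `X-Timestamp` / `X-Signature` header names are hypothetical, and a real target will differ in each of those details.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def sign_request(secret: bytes, method: str, path: str,
                 body: Dict[str, Any], timestamp: int) -> Dict[str, str]:
    """Return the (hypothetical) auth headers for one request."""
    # Serialize the body deterministically so client and server hash
    # identical bytes.
    body_bytes = json.dumps(body, separators=(",", ":"),
                            sort_keys=True).encode()
    # Assumed canonical string: METHOD \n path \n timestamp \n body
    message = b"\n".join([
        method.upper().encode(),
        path.encode(),
        str(timestamp).encode(),
        body_bytes,
    ])
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {"X-Timestamp": str(timestamp), "X-Signature": signature}
```

The reverse-engineering work is finding which fields go into `message`, in what order, and where the secret comes from; once those are known, reproducing the signature is a few lines.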
Lab:
/challenges/api/auth/hmac-signed
- 3.22 beginner
What Is a SERP? Anatomy of a Modern Results Page
A modern Google results page isn't 10 blue links; it's a dozen feature blocks, each with its own data shape. Here's the map.
Lab:
/search?q=phone
- 3.23 beginner
Organic Results vs Paid Ads vs Shopping Results
Three result types share the SERP. Each is a different data shape, a different scrape target, and a different question your scraper answers.
Lab:
/search?q=phone
- 3.24 intermediate
Knowledge Graph, Featured Snippets, People Also Ask
Three high-value SERP blocks that answer the query directly. Each has its own data shape and scraping pattern.
Lab:
/search?q=catalog108
- 3.25 intermediate
Local Pack, Map Results, and the Local SEO World
Three businesses with stars and a map. Behind it: a separate Google Maps API, Google Business Profile listings (formerly GMB), and a whole industry.
Lab:
/search?q=stores+near+me
- 3.26 intermediate
Image Pack, Video Carousel, Top Stories
Three media-shaped SERP blocks. Each has its own data shape and its own scraping mechanics, distinct from text results.
Lab:
/search?q=phone&tab=images
- 3.27 intermediate
AI Overviews and AI Mode, The 2025–2026 Shift
Generative AI is now embedded in the SERP. The block at the top changes click-through, citation flow, and what 'rank 1' even means.
- 3.28 intermediate
Mobile vs Desktop SERPs
Same query, two different SERPs. Mobile has different blocks, different order, sometimes different organic results entirely.
Lab:
/search
- 3.29 intermediate
Location & Language Targeting (`gl`, `hl`, `location` Parameters)
The single most-misunderstood parameter trio in SERP scraping. Get them right, get accurate geo-localized data.
Lab:
/search?gl=in&hl=en
- 3.30 intermediate
Why Scraping SERPs Directly Is Hard
Captchas, IP bans, randomized markup, geo-IP mismatches, and an arms race that goes back two decades. Here's why nobody serious scrapes Google directly anymore.
- 3.31 beginner
The SERP API Category, Why It Exists, Who Uses It
A whole industry built around 'we scrape Google so you don't have to.' How the category emerged, who its customers are, and what they're paying for.
- 3.32 intermediate
Comparing Major Providers: SerpApi, Bright Data, ScraperAPI, Zyte, ScrapingBee, ScrapingAnt, ZenRows
Seven of the biggest SERP-API and scraping-API providers. Their positioning, coverage, and trade-offs, without endorsement.
- 3.33 intermediate
Evaluation Framework: Coverage, Reliability, Price, Latency
Six dimensions to score any SERP-API on. Run a real test against each provider, then decide.
- 3.34 intermediate
Hands-On: SERP API in Python, A Complete Walkthrough
Sign up, get a key, run your first query, parse the JSON, persist to a database. A complete production-shaped Python walkthrough.
- 3.35 intermediate
Hands-On: SERP API in PHP, Composer, SDK, Real Queries
The PHP version of the SERP-API walkthrough. Composer, Guzzle, dotenv, SQLite, real queries: same shape, PHP idioms.
- 3.36 intermediate
Beyond Google: Bing, DuckDuckGo, Yandex, Baidu, Naver, Brave APIs
Six non-Google engines that matter, for regional reach, AI training data, and audiences Google doesn't serve well.
- 3.37 intermediate
Beyond Search Engines: Amazon, Walmart, App Store, eBay, YouTube, Tripadvisor, Yelp APIs
SERP-API providers also scrape major marketplaces and platforms. Same provider, different engine parameter, and a different data game.
- 3.38 intermediate
SERP-API-Specific Features: Async Searches, Search Archives, Location Lookups
Beyond the basic search call, providers ship features that change what's possible. Async batching, history archives, location helpers, and more.
- 3.39 intermediate
Cost Optimization: Caching, Result Reuse, Selective Field Extraction
At $1–$5 per 1k calls, every redundant search is real money. The four mechanical patterns that cut a SERP-API bill in half.
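A rough sketch of the ideas named in the lesson title: a TTL cache keyed on normalized query parameters (caching and result reuse), plus client-side selection of only the fields you need. The class name `CachedSearch`, the in-memory store, and the normalization rule are assumptions for illustration. `fetch` stands in for whichever paid provider call you use.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

Params = Dict[str, Any]
Result = Dict[str, Any]


class CachedSearch:
    """TTL cache in front of a paid search call (in-memory, one process)."""

    def __init__(self, fetch: Callable[[Params], Result],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Result]] = {}
        self.paid_calls = 0  # how many times we actually hit the provider

    @staticmethod
    def cache_key(params: Params) -> str:
        # Normalize so "Phone " and "phone" share one entry. Assumption:
        # the engine treats them as the same query.
        normalized = {k: str(v).strip().lower() for k, v in params.items()}
        blob = json.dumps(normalized, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def search(self, params: Params,
               fields: Optional[Iterable[str]] = None) -> Result:
        key = self.cache_key(params)
        entry = self._store.get(key)
        # Miss or stale entry: pay for one fresh call and remember it.
        if entry is None or self._clock() - entry[0] >= self._ttl:
            self.paid_calls += 1
            entry = (self._clock(), self._fetch(params))
            self._store[key] = entry
        result = entry[1]
        if fields is None:
            return result
        # Hand back only the blocks the caller asked for.
        return {name: result[name] for name in fields if name in result}
```

The right TTL depends on how fast the results you track actually change; a rank tracker and a news monitor will want very different values.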
- 3.40 intermediate
Building Your Own Thin Wrapper Around a SERP API
Don't ship provider-specific calls across your codebase. A 100-line wrapper isolates provider quirks, makes switching trivial, and adds caching/retries in one place.
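One way such a wrapper might be shaped: a small provider interface, one adapter per provider, and a client that the rest of the codebase talks to. `ProviderA` and `ProviderB` are hypothetical, and their request parameters and JSON shapes are invented for illustration, not taken from any real provider's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Protocol

# A transport takes request params and returns the provider's parsed JSON.
Transport = Callable[[Dict[str, str]], Dict[str, Any]]


@dataclass(frozen=True)
class OrganicResult:
    """The one result shape the rest of the codebase sees."""
    position: int
    title: str
    url: str


class Provider(Protocol):
    def search(self, query: str, country: str) -> List[OrganicResult]: ...


class ProviderA:
    """Hypothetical provider: results under 'organic_results', URL in 'link'."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def search(self, query: str, country: str) -> List[OrganicResult]:
        raw = self._transport({"q": query, "gl": country})
        rows = raw.get("organic_results", [])
        return [OrganicResult(i, r["title"], r["link"])
                for i, r in enumerate(rows, start=1)]


class ProviderB:
    """Hypothetical provider: results under 'results', title in 'name'."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def search(self, query: str, country: str) -> List[OrganicResult]:
        raw = self._transport({"query": query, "country": country})
        rows = raw.get("results", [])
        return [OrganicResult(i, r["name"], r["url"])
                for i, r in enumerate(rows, start=1)]


class SerpClient:
    """Single entry point; swapping providers is a one-line change."""

    def __init__(self, provider: Provider) -> None:
        self._provider = provider

    def organic(self, query: str, country: str = "us") -> List[OrganicResult]:
        return self._provider.search(query, country)
```

Caching and retries would sit inside `SerpClient.organic`, so every call site gets them without knowing which provider is behind the wrapper.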
- 3.41 intermediate
GraphQL Scraping: Queries and Endpoints
GraphQL is a single POST endpoint, a typed schema, and a query language. Different from REST in every direction, and increasingly common.
Lab:
/challenges/api/graphql/playground
- 3.42 advanced
Persisted Queries, The Modern GraphQL Trap
Production GraphQL doesn't accept arbitrary queries, only known hashes. Scrapers must extract the precomputed hashes from the bundle or capture them live.
Lab:
/challenges/api/graphql/persisted
- 3.43 advanced
WebSocket Scraping for Real-Time Data
When the data updates faster than HTTP polling can keep up. WebSockets are bidirectional, persistent, and surprisingly easy to scrape.
Lab:
/challenges/api/websocket/live-prices
- 3.44 advanced
Socket.IO and SignalR Protocols
Not all real-time is raw WebSockets. Socket.IO and SignalR are higher-level protocols with their own handshakes, message framing, and quirks.
Lab:
/challenges/api/websocket/socketio
- 3.45 advanced
Reading Minified JavaScript Like a Detective
Webpack output looks like noise. Read it like code anyway: pretty-printing, source maps, search patterns, and the workflow that turns 50,000-char one-liners into discoverable systems.
Lab:
/challenges/api/auth/api-key-in-js
- 3.46 advanced
DevTools Breakpoints for Capturing Runtime Values
When static analysis fails, breakpoints win. Five breakpoint types, when to use each, and the disciplined workflow of pause-inspect-step.
Lab:
/challenges/api/auth/hmac-signed
- 3.47 advanced
Mobile App API Capture with mitmproxy
Phones talk to APIs the web doesn't expose. mitmproxy intercepts the traffic; with a CA cert on the device, you see everything decrypted.
- 3.48 expert
SSL Pinning, Concepts and Approaches
When the app hardcodes its server's certificate, mitmproxy alone isn't enough. The bypass landscape: Frida, objection, custom builds, and emulator-only realities.
- 3.49 expert
TLS Fingerprinting (JA3/JA4), `curl-cffi` and `tls-client`
Anti-bot systems profile your TLS handshake. Python's `requests` looks nothing like Chrome. Two libraries fix this, at the price of a different dependency.
Lab:
/challenges/antibot/tls-fingerprint
- 3.50 expert
HTTP/2 Fingerprinting Evasion
Beyond TLS, the HTTP/2 layer is fingerprinted too. SETTINGS frames, window sizes, pseudo-header order, and priorities are all distinguishable. Here's the modern arms race.
Lab:
/challenges/antibot/tls-fingerprint
Most lessons have a hands-on lab target on Catalog108, our first-party practice scraping sandbox. Each lab page has a /grade endpoint that returns pass/fail on your scraper output.