Free curriculum
Web Scraping: Complete Learning Path
A structured, hands-on curriculum that takes you from “what is HTTP” to running production scrapers at scale. Every lesson comes with a quiz and a real lab target on Catalog108, our first-party practice sandbox.
The path
Five sub-paths plus a final mastery project. Each sub-path is shippable on its own: you can stop after Static Scraping and already be employable.
- 1
Foundations
~4 weeks part-time · 20 lessons
Prerequisites before any sub-path
How the web actually works under the hood: HTTP, HTML, CSS, XPath, DevTools, plus the Python and PHP setup that the rest of the curriculum builds on. Skip nothing here; every later lesson assumes this.
20 lessons published →
- 2
Static Scraping
~6 weeks part-time · 34 lessons
HTTP + HTML. Fast, lightweight. Python and PHP.
Send requests, parse HTML, follow pagination, submit forms, store results. Taught in Python (requests + BeautifulSoup + lxml) and PHP (Guzzle + DomCrawler), equally first-class. Every lesson lands on a stable lab target at Catalog108.
34 lessons published →
- 3
Dynamic Web & Browser Automation
~4 weeks part-time · 30 lessons
When static fails, drive a real browser.
For JS-rendered sites, SPAs, infinite scroll, modals, iframes, and Shadow DOM. Playwright is the main tool, with Selenium and Puppeteer for completeness. Each lesson runs against the dynamic challenges at Catalog108.
30 lessons published →
- 4
APIs, SERPs & Reverse Engineering
~8 weeks part-time · 50 lessons
Skip the HTML. Hit JSON directly.
The pro path: REST, GraphQL, auth flows (cookie, JWT, OAuth, HMAC), reverse-engineering minified JS, and a complete tour of SERP-scraping APIs. The deepest, highest-leverage sub-path.
50 lessons published →
- 5
Production, Scale & Career
~10 weeks part-time · 85 lessons
Run everything at scale, reliably.
Scrapy and Symfony for production scrapers. Async, proxies, fingerprinting, CAPTCHAs, distributed crawling, monitoring, deployment, and the legal/career framing that turns this into a livelihood.
85 lessons published →
- 6
Final Mastery Project
~4 weeks part-time · 1 project
Ship the one project that proves it.
Pick a multi-source data product, build it end-to-end, deploy it, document it. Five suggested capstones: price intelligence, jobs analytics, real estate, public data, or a SERP rank tracker.
7 lessons published →
Why this curriculum exists
- Two languages, equally first-class. Most courses pick Python or PHP and ignore the other.
- First-party labs at Catalog108. No dependency on external sandboxes that disappear or rate-limit you.
- Auto-graded labs. Submit your scraper’s output and get a pass/fail result instead of just reading passively.
- Reverse engineering, taught explicitly. Almost no other free course covers this.