Sub-path 1 of 6
Foundations
Prerequisites before any sub-path
How the web actually works under the hood: HTTP, HTML, CSS, XPath, DevTools, plus the Python and PHP setup that the rest of the curriculum builds on. Skip nothing here; every later lesson assumes this material.
~4 weeks part-time · 20 lessons
Lessons
- F1 · beginner
Client, Server, DNS, IP: the Architecture
The four moving parts behind every web request, what each does, and why scrapers need to understand all of them.
Lab: /
- F2 · beginner
HTTP Protocol: Methods, Status Codes, Headers
The actual on-the-wire format your scraper speaks. Methods you'll use, status codes you'll interpret, headers that change behaviour.
Lab: /
- F3 · beginner
HTTPS, TLS, and Why It Matters for Scraping
What TLS actually does, the certificate verification you'll be tempted to disable but shouldn't, and the TLS fingerprint that can get your scraper blocked before it sends a single HTTP request.
Lab: /
- F4 · beginner
Cookies, Sessions, and Authentication Basics
How servers remember who you are between requests, and how scrapers persist that state correctly without re-logging-in every time.
Lab: /challenges/static/cookies/set-on-visit
- F5 · beginner
HTML Structure and the DOM
What HTML actually is, what a parser turns it into, and the tree-shaped mental model your scraper needs to navigate.
Lab: /products
- F6 · beginner
CSS Selectors, Complete Reference
Every CSS selector you need to know, organised by what you'll actually use them for in scrapers.
Lab: /challenges/static/lists/cards
- F7 · beginner
XPath, Complete Reference
The more powerful, less familiar query language for DOM nodes. When CSS runs out, XPath keeps going.
Lab: /challenges/static/tables/nested
- F8 · beginner
Choosing Between CSS Selectors and XPath
When to use which. A short decision framework with concrete examples, not 'XPath is more powerful so always use XPath.'
Lab: /products/1-yellow-ceramic-mug
- F9 · beginner
Elements Panel: Inspect, Edit, Copy Selectors
Your eyes on a live page. How to use the Elements panel to plan a scrape before writing a single line of code.
Lab: /products
- F10 · beginner
Network Panel, The Scraper's Most Important Tool
Where the data actually lives. Use Network to find the JSON endpoint the page is hitting and skip HTML scraping entirely.
Lab: /products/1-yellow-ceramic-mug
- F11 · beginner
Console, Sources, Application Tabs
The three other DevTools panels every scraper should know, for interactive prototyping, reading minified JS, and inspecting client-side storage.
Lab: /account/login
- F12 · beginner
Copy as cURL and Why It's a Superpower
Right-click → Copy → Copy as cURL. The single most time-saving habit in web scraping. Here's how to use it, what to strip, and how to translate to Python or PHP.
Lab: /products
- F13 · beginner
Python, pip, venv, uv: the Modern Toolchain
Install Python correctly, isolate every project, and meet the tooling that actually makes Python pleasant in 2026.
- F14 · beginner
Python Crash-Course for Scrapers
The 5% of Python you'll use 95% of the time when writing scrapers. Strings, lists, dicts, comprehensions, file I/O, error handling, and the f-string.
Lab: /
- F15 · beginner
JSON, CSV, and Regex Essentials in Python
The three data-handling skills you'll use in every scraper: parsing JSON responses, writing structured output, and reaching for regex without abusing it.
Lab: /api/products
- F16 · beginner
PHP, Composer, and a Modern Dev Environment
PHP 8.x is a serious language. Here's how to set it up cleanly and use Composer the way modern PHP projects expect.
- F17 · beginner
PHP Crash-Course for Scrapers
The PHP you'll actually use to write scrapers. Arrays, strings, file I/O, error handling, JSON, and the modern PHP 8 features that make it pleasant.
Lab: /
- F18 · beginner
Why PHP Is a Legitimate Scraping Language (and When to Pick It)
PHP is not just for WordPress. It has a serious scraping ecosystem and three concrete advantages over Python in specific situations. Here's when to reach for which.
- F19 · beginner
Git and GitHub for Scraper Projects
The minimum Git you need to keep scraper projects sane, collaborate, and ship to production. Plus the patterns specific to scraper repos.
- F20 · beginner
Legal & Ethical Scraping, Your Compass
robots.txt, Terms of Service, GDPR, the CFAA, and the hiQ vs LinkedIn ruling: the legal and ethical scaffolding every scraping project should consider before any code is written.
Lab: /robots.txt
Most lessons have a hands-on lab target on Catalog108, our first-party practice scraping sandbox. Each lab page has a /grade endpoint that returns pass/fail on your scraper output.
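
As a rough illustration of how the lab targets listed above might be addressed from a script, the sketch below joins a few of the lab paths onto a base URL with Python's standard `urllib.parse.urljoin`. The host `https://catalog108.example` is a placeholder, not the sandbox's real address, and appending `/grade` to a lab path is only a guess at how the grade endpoint is laid out. The helper names `lab_url` and `grade_url` are hypothetical. Check the lab page itself for the real location and request format.

```python
from urllib.parse import urljoin

# ASSUMPTION: placeholder host, not the sandbox's real address.
BASE_URL = "https://catalog108.example"

# Lab paths copied from the lesson list above (a subset, keyed by lesson ID).
LAB_PATHS = {
    "F4": "/challenges/static/cookies/set-on-visit",
    "F6": "/challenges/static/lists/cards",
    "F7": "/challenges/static/tables/nested",
}


def lab_url(lesson_id: str, base_url: str = BASE_URL) -> str:
    """Return the absolute URL of a lesson's lab page."""
    return urljoin(base_url, LAB_PATHS[lesson_id])


def grade_url(lesson_id: str, base_url: str = BASE_URL) -> str:
    """Return a guessed grade URL: the lab URL with '/grade' appended.

    ASSUMPTION: the text only says each lab page has a /grade endpoint;
    the exact layout is not specified, so this is illustrative.
    """
    return lab_url(lesson_id, base_url).rstrip("/") + "/grade"


if __name__ == "__main__":
    # Print each lesson's lab URL next to its guessed grade URL.
    for lesson_id in LAB_PATHS:
        print(lesson_id, lab_url(lesson_id), grade_url(lesson_id))
```

Keeping the paths in one mapping means a scraper project can switch hosts (for example, a local mirror) by changing a single constant.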