Pick One Project, Ship It Publicly
How the capstone works, what counts as done, and how to pick between the five project options without overthinking it.
What you’ll learn
- Understand what the capstone is for, beyond the obvious.
- Pick one of the five project options confidently.
- Know exactly what 'done' looks like before you start.
- Plan a realistic 4-week build timeline.
You've spent four sub-paths learning. The capstone is the single artefact that proves you can do it: to yourself, to a hiring manager, to a future client.
What the capstone is for
Three audiences:
- You. Building a real thing that runs daily forces you to face the parts of scraping you've handwaved: proxy budgets, error rates, schema drift, the 3am cron failure. You won't truly understand any of it until you've shipped something that breaks and you've fixed it.
- A hiring manager. A scraping engineer with a deployed GitHub project + a blog post explaining the architecture beats a CV with "5 years experience" for every job posting that doesn't go through HR keyword filters.
- A future client. When someone asks "can you build me X?", linking to a live instance you already built ends the conversation faster than any pitch deck.
The capstone is not a test. It's an audition. Treat it like one.
What "done" looks like
Per the curriculum spec, your project must include:
| Requirement | Why |
|---|---|
| Working scrapers in both Python and PHP | Proves the two-language fluency the curriculum taught |
| SERP API integration for at least one source | Demonstrates you know when to buy vs build |
| Proxy rotation | Real production scrapers can't run on one IP (see the sketch after this table) |
| Browser automation for at least one source | Shows Sub-Path 2 fluency, not just static-page scraping |
| Symfony or Scrapy as the backend | Picks one production-grade framework, not a hand-rolled loop |
| Public GitHub repo | The artefact a hiring manager / client can inspect |
| Deployed live (VPS, Apify, Zyte, AWS, your choice) | Proves you operationalised it |
| Detailed blog post on scrapingcentral.com/blogs (or your own) | Proves you can explain what you built |
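Of those rows, proxy rotation is the one most first-time builders hand-wave. Here is a minimal sketch in Python, assuming a hand-rolled pool over `requests`; the proxy URLs are placeholders for whatever provider you choose, and in a real Scrapy build this logic belongs in a downloader middleware (or `request.meta['proxy']`) rather than a helper function:

```python
import random
import requests

# Placeholder pool -- swap in your provider's real endpoints.
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]

def fetch(url: str, retries: int = 3) -> requests.Response:
    """Fetch a URL, rotating to a different proxy on each failed attempt."""
    last_error = None
    for _ in range(retries):
        proxy = random.choice(PROXY_POOL)
        try:
            resp = requests.get(
                url,
                proxies={"http": proxy, "https": proxy},
                timeout=10,
            )
            resp.raise_for_status()
            return resp
        except requests.RequestException as err:
            last_error = err  # this proxy failed; rotate and retry
    raise RuntimeError(f"all {retries} attempts failed for {url}") from last_error
```

The same idea ports to PHP via Guzzle's `proxy` request option, which covers the two-language requirement without inventing a second design.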
Optional but recommended:
- Monitoring dashboard (Grafana / a tiny status page; see the sketch after this list)
- README with architecture diagram (text-mode is fine; mermaid renders on GitHub)
- Cost breakdown: VPS cost, proxy cost, SERP API spend. Real numbers, not handwaving.
- Failure log: three things that broke during the build and how you fixed them
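The tiny status page really can be tiny. A minimal sketch, assuming your cron job touches a heartbeat file after each successful run; Flask, the file path, and the 26-hour staleness window are all assumptions, not part of the spec:

```python
import time
from pathlib import Path

from flask import Flask

HEARTBEAT = Path("/var/run/scraper/heartbeat")  # assumed: each successful run touches this file
MAX_AGE_SECONDS = 26 * 3600                     # daily cron plus two hours of slack

app = Flask(__name__)

@app.route("/status")
def status():
    """200 if the last successful run is recent enough, 503 otherwise."""
    if not HEARTBEAT.exists():
        return "no successful runs recorded", 503
    age = time.time() - HEARTBEAT.stat().st_mtime
    if age > MAX_AGE_SECONDS:
        return f"stale: last successful run {age / 3600:.1f}h ago", 503
    return f"ok: last run {age / 3600:.1f}h ago", 200

if __name__ == "__main__":
    app.run(port=8000)
```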
How to pick between the five projects
The five options are deliberately different in shape, not difficulty. Pick by what data interests you plus what you can actually deploy without spending money you don't have.
Quick filter:
Do you have $50/month for proxies + SERP API budget over 4 weeks?
├── Yes → any of A, B, E
└── No → C or D (work mostly against free public data + Catalog108)
Do you care about commercial / pricing data?
├── Yes → A
└── No → continue
Do you care about the job market / your own next role?
├── Yes → B
└── No → continue
Do you care about housing in your city?
├── Yes → C
└── No → continue
Do you want to publish something useful to the public?
├── Yes → D
└── No → E (rank tracker: most commercial, hardest to monetize without traffic)
If two feel equally interesting, pick the one whose first data source is already in your DevTools within an hour of starting. The project you can't get the first scrape working on by day 1 is the wrong project.
Effort and timeline
The capstones are scoped to 4 weeks part-time (10–15 hours / week = 40–60 total hours). That maps roughly to:
| Week | Focus |
|---|---|
| 1 | Scrape one data source end-to-end; persist to a real database; deploy a static dashboard |
| 2 | Add second + third sources; introduce proxy rotation; standardise the schema |
| 3 | Add browser-automation source; SERP API source; daily cron; monitoring |
| 4 | Polish, blog post, README, public launch |
If you blow past 6 weeks, you've over-scoped. Cut features, ship sooner.
Where each project's lab targets are
Catalog108 covers the first scraper for every option:
- A (Price intelligence): `practice.scrapingcentral.com/products` + `/deals/live`
- B (Job market): `/jobs` + `/jobs/companies/{slug}`
- C (Real estate): no direct Catalog108 target; use external public listings
- D (Public-data aggregator): depends on the dataset you pick
- E (SERP rank tracker): `/search` for the SERP-shape practice, then SERP API for the real engines
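A day-1 first scrape against the Catalog108 products listing (option A) can be this small; the `.product` selector and the https scheme are guesses, so check the real markup in DevTools before trusting either:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://practice.scrapingcentral.com/products"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# ".product" is an assumed card class -- verify it in DevTools first.
for card in soup.select(".product"):
    print(card.get_text(" ", strip=True))
```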
External data sources are listed in each project brief. Pick targets where:
- The robots.txt allows the path you want to hit (scriptable; see the sketch after this list).
- The site has structured data (not just JS rendering into canvas).
- The volume is reasonable (≤10k records, not millions).
- You can defend the use case to a non-technical friend.
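The robots.txt check is scriptable with the standard library, so run it before you commit to a target. A minimal sketch; the base URL, path, and user-agent string are placeholders:

```python
from urllib.robotparser import RobotFileParser

def path_allowed(base_url: str, path: str, agent: str = "capstone-bot") -> bool:
    """Return True if base_url's robots.txt lets `agent` fetch `path`."""
    rp = RobotFileParser()
    rp.set_url(base_url.rstrip("/") + "/robots.txt")
    rp.read()  # fetches and parses the live robots.txt
    return rp.can_fetch(agent, base_url.rstrip("/") + path)

# Hypothetical usage:
# path_allowed("https://example.com", "/listings")
```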
What "shipped" really means
When the dust settles, you should have three things to show:
- A public GitHub repo with the working code.
- A live deployment URL (anything reachable on the public internet).
- A blog post documenting what you built (700+ words, on your blog or scrapingcentral.com/blogs).
That's the trio a hiring manager or potential client clicks through. Don't build a fourth thing until those three are solid.
What to do after you ship
The instinct is to start a second project. Resist for two weeks. Instead:
- Tweet / post on LinkedIn about the project. One screenshot, one paragraph, the GitHub link. Even with zero followers, this trains you to write about your work.
- Apply for three scraping jobs / freelance projects. Link the capstone in your application. Note the response rate.
- Maintain the deployed instance for 30 days. Things will break. Fix them. The "things that broke and how I fixed them" section of your blog post is what proves you know production.
- Then start a second project.
The cycle repeats. Each project is shorter and better than the last.
What this lesson is, exactly
This isn't a tutorial. It's a brief for a brief. The next five lessons are the five project options in detail: read all five, pick one, then go and build.
Hands-on lab
Pick your project. Open a new GitHub repo today, even if it's empty. Name it. Write a 200-word README explaining what you intend to build. Commit. Push. That's day 1.
Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.