Pick One Project, Ship It Publicly
How the capstone works, what counts as done, and how to pick between the five project options without overthinking it.
What you’ll learn
- Understand what the capstone is for, beyond the obvious.
- Pick one of the five project options confidently.
- Know exactly what 'done' looks like before you start.
- Plan a realistic 4-week build timeline.
You've spent four sub-paths learning. The capstone is the single artefact that proves you can do it: to yourself, to a hiring manager, to a future client.
What the capstone is for
Three audiences:
- You. Building a real thing that runs daily forces you to face the parts of scraping you've handwaved: proxy budgets, error rates, schema drift, the 3am cron failure. You won't truly understand any of it until you've shipped something that breaks and you've fixed it.
- A hiring manager. A scraping engineer with a deployed GitHub project + a blog post explaining the architecture beats a CV with "5 years experience" for every job posting that doesn't go through HR keyword filters.
- A future client. When someone asks "can you build me X?", linking to a live instance you already built ends the conversation faster than any pitch deck.
The capstone is not a test. It's an audition. Treat it like one.
What "done" looks like
Per the curriculum spec, your project must include:
| Requirement | Why |
|---|---|
| Working scrapers in both Python and PHP | Proves the two-language fluency the curriculum taught |
| SERP API integration for at least one source | Demonstrates you know when to buy vs build |
| Proxy rotation | Real production scrapers can't run on one IP (see the sketch after this table) |
| Browser automation for at least one source | Shows Sub-Path 2 fluency, not just static-page scraping |
| Symfony or Scrapy as the backend | Picks one production-grade framework, not a hand-rolled loop |
| Public GitHub repo | The artefact a hiring manager / client can inspect |
| Deployed live (VPS, Apify, Zyte, AWS, your choice) | Proves you operationalised it |
| Detailed blog post on scrapingcentral.com/blogs (or your own) | Proves you can explain what you built |
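Of those rows, proxy rotation is the one most first-time builders hand-wave. Here is a minimal sketch in Python, assuming a hand-rolled pool over `requests`; the proxy URLs are placeholders for whatever provider you choose, and in a real Scrapy build this logic belongs in a downloader middleware (or `request.meta['proxy']`) rather than a helper function:

```python
import random
import requests

# Placeholder pool -- swap in your provider's real endpoints.
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]

def fetch(url: str, retries: int = 3) -> requests.Response:
    """Fetch a URL, rotating to a different proxy on each failed attempt."""
    last_error = None
    for _ in range(retries):
        proxy = random.choice(PROXY_POOL)
        try:
            resp = requests.get(
                url,
                proxies={"http": proxy, "https": proxy},
                timeout=10,
            )
            resp.raise_for_status()
            return resp
        except requests.RequestException as err:
            last_error = err  # this proxy failed; rotate and retry
    raise RuntimeError(f"all {retries} attempts failed for {url}") from last_error
```

The same idea ports to PHP via Guzzle's `proxy` request option, which covers the two-language requirement without inventing a second design.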
Optional but recommended:
- Monitoring dashboard (Grafana / a tiny status page; see the sketch after this list)
- README with architecture diagram (text-mode is fine; mermaid renders on GitHub)
- Cost breakdown: VPS cost, proxy cost, SERP API spend. Real numbers, not handwaving.
- Failure log: three things that broke during the build and how you fixed them
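The tiny status page really can be tiny. A minimal sketch, assuming your cron job touches a heartbeat file after each successful run; Flask, the file path, and the 26-hour staleness window are all assumptions, not part of the spec:

```python
import time
from pathlib import Path

from flask import Flask

HEARTBEAT = Path("/var/run/scraper/heartbeat")  # assumed: each successful run touches this file
MAX_AGE_SECONDS = 26 * 3600                     # daily cron plus two hours of slack

app = Flask(__name__)

@app.route("/status")
def status():
    """200 if the last successful run is recent enough, 503 otherwise."""
    if not HEARTBEAT.exists():
        return "no successful runs recorded", 503
    age = time.time() - HEARTBEAT.stat().st_mtime
    if age > MAX_AGE_SECONDS:
        return f"stale: last successful run {age / 3600:.1f}h ago", 503
    return f"ok: last run {age / 3600:.1f}h ago", 200

if __name__ == "__main__":
    app.run(port=8000)
```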
How to pick between the five projects
The five options are deliberately different in shape, not difficulty. Pick by what data interests you plus what you can actually deploy without spending money you don't have.
Quick filter:
Do you have $50/month for proxies + SERP API budget over 4 weeks?
├── Yes → any of A, B, E
└── No → C or D (work mostly against free public data + Catalog108)
Do you care about commercial / pricing data?
├── Yes → A
└── No → continue
Do you care about the job market / your own next role?
├── Yes → B
└── No → continue
Do you care about housing in your city?
├── Yes → C
└── No → continue
Do you want to publish something useful to the public?
├── Yes → D
└── No → E (rank tracker: most commercial, hardest to monetize without traffic)
If two feel equally interesting, pick the one whose first data source is already in your DevTools within an hour of starting. The project you can't get the first scrape working on by day 1 is the wrong project.
Effort and timeline
The capstones are scoped to 4 weeks part-time (10–15 hours / week = 40–60 total hours). That maps roughly to:
| Week | Focus |
|---|---|
| 1 | Scrape one data source end-to-end; persist to a real database; deploy a static dashboard |
| 2 | Add second + third sources; introduce proxy rotation; standardise the schema |
| 3 | Add browser-automation source; SERP API source; daily cron; monitoring |
| 4 | Polish, blog post, README, public launch |
If you blow past 6 weeks, you've over-scoped. Cut features, ship sooner.
Where each project's lab targets are
Catalog108 covers the first scraper for every option:
- A (Price intelligence): `practice.scrapingcentral.com/products` + `/deals/live`
- B (Job market): `/jobs` + `/jobs/companies/{slug}`
- C (Real estate): no direct Catalog108 target; use external public listings
- D (Public-data aggregator): depends on the dataset you pick
- E (SERP rank tracker): `/search` for the SERP-shape practice, then SERP API for the real engines
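A day-1 first scrape against the Catalog108 products listing (option A) can be this small; the `.product` selector and the https scheme are guesses, so check the real markup in DevTools before trusting either:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://practice.scrapingcentral.com/products"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# ".product" is an assumed card class -- verify it in DevTools first.
for card in soup.select(".product"):
    print(card.get_text(" ", strip=True))
```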
External data sources are listed in each project brief. Pick targets where:
- The robots.txt allows the path you want to hit (scriptable; see the sketch after this list).
- The site has structured data (not just JS rendering into canvas).
- The volume is reasonable (≤10k records, not millions).
- You can defend the use case to a non-technical friend.
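The robots.txt check is scriptable with the standard library, so run it before you commit to a target. A minimal sketch; the base URL, path, and user-agent string are placeholders:

```python
from urllib.robotparser import RobotFileParser

def path_allowed(base_url: str, path: str, agent: str = "capstone-bot") -> bool:
    """Return True if base_url's robots.txt lets `agent` fetch `path`."""
    rp = RobotFileParser()
    rp.set_url(base_url.rstrip("/") + "/robots.txt")
    rp.read()  # fetches and parses the live robots.txt
    return rp.can_fetch(agent, base_url.rstrip("/") + path)

# Hypothetical usage:
# path_allowed("https://example.com", "/listings")
```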
What "shipped" really means
When the dust settles, you should have three things to show:
- A public GitHub repo with the working code.
- A live deployment URL (anything reachable on the public internet).
- A blog post documenting what you built (700+ words, on your blog or scrapingcentral.com/blogs).
That's the trio a hiring manager or potential client clicks through. Don't build a fourth thing until those three are solid.
What to do after you ship
The instinct is to start a second project. Resist for two weeks. Instead:
- Tweet / post on LinkedIn about the project. One screenshot, one paragraph, the GitHub link. Even with zero followers, this trains you to write about your work.
- Apply for three scraping jobs / freelance projects. Link the capstone in your application. Note the response rate.
- Maintain the deployed instance for 30 days. Things will break. Fix them. The "things that broke and how I fixed them" section of your blog post is what proves you know production.
- Then start a second project.
The cycle repeats. Each project is shorter and better than the last.
What this lesson is, exactly
This isn't a tutorial. It's a brief for a brief. The next five lessons are the five project options in detail: read all five, pick one, then go and build.
Hands-on lab
Pick your project. Open a new GitHub repo today, even if it's empty. Name it. Write a 200-word README explaining what you intend to build. Commit. Push. That's day 1.
Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.