Building a Scraping SaaS (Real Examples and Margins)
The shape, economics, and risks of running a scraping SaaS, drawn from the public histories of companies in the space.
What you’ll learn
- Distinguish DaaS, scraping-API, and proxy-SaaS business models.
- Understand the structural costs that determine SaaS margins.
- Recognize the moats and risks specific to scraping SaaS.
A scraping SaaS is a company whose product is built on scraped data or scraping infrastructure. The space has several distinct business models with quite different economics. This lesson is the map, not financial advice, and not promises of specific revenue.
The three main business models
| Model | What you sell | Examples | Margin profile |
|---|---|---|---|
| Data-as-a-Service (DaaS) | Curated datasets via API/dashboard | SimilarWeb, ZoomInfo, Apify Datasets | High; software margins after scrape costs |
| Scraping API | "Send a URL, get HTML/JSON", abstracted anti-bot | ScrapingBee, ScraperAPI, Zyte API, ScrapeHero | Medium; high proxy costs |
| Proxy / unblocking SaaS | Proxies, anti-bot bypass tooling | Bright Data, Oxylabs, Smartproxy | Variable; bandwidth-heavy |
There are also vertical SaaS players (SEMrush, SimilarWeb, Phantombuster) whose product is the analysis built on top of scraped data; those are closer to traditional SaaS but built on scraping.
DaaS economics
A DaaS company:
- Invests once in scraper infrastructure and data curation.
- Charges $X/mo per seat or API call.
- Has low marginal cost per customer: mostly compute, bandwidth, and support.
The math: $50/mo × 200 customers = $10k MRR, against maybe $1–2k/mo in infrastructure and proxies. Gross margins can be 70–90%.
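The arithmetic above can be sketched as a small calculator. All figures are the illustrative ones from the text, not real benchmarks:

```python
# Illustrative DaaS unit economics using the figures from the text.
# Every number here is an assumption for the sake of the worked example.

def gross_margin(mrr: float, monthly_costs: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (mrr - monthly_costs) / mrr

price_per_seat = 50                   # $/mo per customer
customers = 200
mrr = price_per_seat * customers      # $10,000 MRR
infra_low, infra_high = 1_000, 2_000  # $/mo infrastructure + proxies

print(f"MRR: ${mrr:,}")
print(f"Gross margin: {gross_margin(mrr, infra_high):.0%}-{gross_margin(mrr, infra_low):.0%}")
```

With these numbers the margin comes out at 80–90%, consistent with the 70–90% range quoted above; the point is how little the cost side moves as customers are added.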
Caveats:
- Sales cycles for B2B DaaS are long (1–3 months from first contact to conversion).
- Churn at lower tiers can be brutal (5–10%/mo).
- The data has to be reliably fresh; backsliding on freshness kills retention.
Scraping API economics
A scraping API company:
- Builds abstracted infrastructure: proxies, browser farms, anti-bot bypass.
- Customers send URLs; service returns rendered HTML or parsed JSON.
- Charges per request (e.g. $0.50–2.00 per 1000 requests).
Costs are dominated by:
- Residential/mobile proxy bandwidth: often the biggest line item.
- Headless browser CPU/RAM: expensive at scale.
- Anti-bot R&D: a perpetual catch-up game.
Margins after proxy costs are 30–60%, narrower than DaaS. Scale matters more: a profitable scraping API typically needs $1M+ ARR to absorb the fixed engineering costs.
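To see how those line items interact, here is a hypothetical per-request margin calculation. The price, proxy cost, page weight, and compute cost are all assumptions chosen to fall inside the ranges quoted above, not real vendor figures:

```python
# Back-of-the-envelope per-request margin for a scraping API.
# All inputs are illustrative assumptions.

def per_request_margin(price_per_1k: float, proxy_cost_per_gb: float,
                       mb_per_request: float, compute_cost_per_1k: float) -> float:
    """Margin fraction per request after proxy bandwidth and compute costs."""
    revenue = price_per_1k / 1000
    bandwidth_cost = proxy_cost_per_gb * (mb_per_request / 1024)
    compute_cost = compute_cost_per_1k / 1000
    return (revenue - bandwidth_cost - compute_cost) / revenue

# e.g. $1.50 per 1k requests, $10/GB residential proxies,
# 0.05 MB per request (HTML only), $0.30 per 1k requests of browser compute
m = per_request_margin(1.50, 10.0, 0.05, 0.30)
print(f"{m:.0%}")
```

With these inputs the margin lands around 47%, inside the 30–60% band; note how sensitive it is to `mb_per_request`, which is why rendering full pages through headless browsers erodes margins so quickly.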
Proxy SaaS economics
Proxy providers (Bright Data, Oxylabs, Smartproxy, IPRoyal):
- Source IPs from residential users, mobile carriers, datacenters.
- Multiplex across many customers.
- Charge per GB of traffic or per IP/hour.
Capital-intensive: you must contract with bandwidth providers, build the proxy network, and handle abuse. Margins vary wildly; commodity datacenter proxies have low margins, while specialized residential/mobile proxies command higher ones.
Vertical SaaS using scraping
The big winners in this space are often not "scraping companies" but vertical SaaS that happens to scrape:
- SEO tools (SEMrush, Ahrefs): scrape SERPs, sell rank-tracking software.
- Competitive intelligence (SimilarWeb): scrape and aggregate web traffic estimates.
- Job aggregators (Indeed, and historically Glassdoor).
- Real estate aggregators (Redfin, and Zillow at various points).
These are SaaS with scraping as the back-end. Pricing follows traditional SaaS: $10s to $1000s per seat per month. Customers often don't know (or care) that the data comes from scraping.
This is arguably the highest-leverage path: build a vertical-specific application, and treat scraping as an implementation detail.
What makes scraping SaaS hard
1. Anti-bot escalation is permanent. Targets fight back. You're in a perpetual cat-and-mouse with sites that don't want to be scraped. Each major target's protection upgrade can hit your service.
2. Legal posture grows with revenue. A $1k/mo side project rarely attracts lawyers. A $1M/yr company scraping a Fortune-500 target absolutely does. hiQ v. LinkedIn (covered in lesson 82) is the canonical reference.
3. Concentration risk. If 60% of your customers care about one target site, and that site implements a new anti-bot scheme overnight, your business has a bad week, or a bad quarter.
4. Customer support is technical. Customers debug their own scrapes against your API. You're often supporting their integration, not just your service.
5. Cost variance. Proxy bandwidth on residential IPs varies $5–20/GB. A poorly-engineered scrape can blow your margins.
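The cost-variance point can be made concrete with a hedged sketch of residential bandwidth cost per thousand requests. The page weights are assumptions: 0.1 MB for an HTML-only fetch versus 2 MB for a scrape that loads every asset:

```python
# How residential proxy price variance ($5-20/GB, per the text) combines
# with scrape engineering quality to move per-request bandwidth cost.
# Page weights are illustrative assumptions.

def bandwidth_cost_per_1k(price_per_gb: float, mb_per_request: float) -> float:
    """Bandwidth cost in dollars per 1,000 requests."""
    return price_per_gb * (mb_per_request / 1024) * 1000

for price in (5, 20):
    lean = bandwidth_cost_per_1k(price, 0.1)   # HTML-only fetch
    heavy = bandwidth_cost_per_1k(price, 2.0)  # loads every asset
    print(f"${price}/GB: lean ${lean:.2f}/1k reqs, heavy ${heavy:.2f}/1k reqs")
```

At $20/GB the heavy scrape costs about $39 per thousand requests in bandwidth alone, versus roughly $0.49 for the lean scrape at $5/GB: an ~80x swing from engineering and proxy pricing combined.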
What makes it work
1. Vertical focus. "Scraping API for everything" is harder to compete in than "scraping API for e-commerce price intelligence." A niche narrows both the competition and the anti-bot surface you have to maintain.
2. Defensible data assets. If your DaaS has built a multi-source aggregated dataset over years, new entrants can't replicate quickly.
3. Strong customer relationships. B2B retention compounds. Long-tenured customers buy expansion, refer peers, ride out service issues.
4. Engineering excellence. Badly engineered scrapers cost orders of magnitude more to run. A team that can maintain a 99% success rate beats one running at 80%.
5. Honest legal posture. Avoid the obvious-trouble targets (personal data, paywalled content). The successful long-running scraping SaaS firms typically scrape public, factual data.
Honest revenue ranges (very rough)
These are wildly variable and depend on niche, sales effort, retention, and luck. Treat as orientation:
- A solo or two-person DaaS / vertical SaaS focused on a niche: $50k–500k ARR is plausible within a few years.
- 5–20 person scraping API: $1M–10M ARR plausible if positioned well.
- Top-tier proxy/unblocking (Bright Data, etc.): $100M+ revenue companies exist; rare.
Most attempts don't reach the highest tier. The middle is where many find sustainable businesses.
Funding considerations
Most scraping SaaS companies bootstrap. VC interest is mixed:
- Legal uncertainty makes some investors nervous.
- Markets are perceived as smaller than mainstream SaaS.
- Successful exits and raises (Bright Data, Apify) exist but aren't the default trajectory.
Bootstrapping is the modal path: slow growth, customer-funded, profitable from early on.
Exit options
- Acquisition by data-aggregator companies (LinkedIn-style, real estate data, B2B sales tools).
- Acquisition by adjacent SaaS (SEO tools, market intelligence).
- Strategic acquisition by a customer (rare but happens).
- Lifestyle-business profitability (the most common "exit": just keep running it).
What to learn from established players
Look at companies in the space and study their:
- Pricing pages. What they sell, what tiers exist, what they emphasize.
- API documentation. Quality and clarity of docs is a moat in itself.
- Marketing. What problems they describe, what customer profiles they speak to.
- Job postings. Reveals what they're investing in.
To make it concrete: read the public materials of Apify, Zyte, Bright Data, and Smartproxy, and the pricing pages of SEMrush and SimilarWeb. You'll see clear patterns.
What to try
Pick three scraping-related companies whose business you find interesting. For each:
- What exactly are they selling?
- Who is their customer? (Be specific.)
- What's their pricing structure?
- What public moat do they have?
- What's their biggest risk?
Synthesize: which model would you build, given your skills and risk tolerance?
That exercise is the start of a real business plan, or the realization that you'd rather stay freelance, which is also valid.
Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.