Key Scraper Metrics: Success Rate, Items/Sec, Ban Rate, Proxy Health
The specific metrics that matter for a scraper, what they tell you, how to compute them, and the alert thresholds that catch real problems.
What you’ll learn
- Define the five most important scraper KPIs.
- Compute each from Prometheus counters and histograms.
- Read /admin/stats on Catalog108 as a real-world metrics dashboard.
Generic system metrics (CPU, memory) tell you if the host is alive. They don't tell you if the scraper is doing useful work. Scrapers need scraper-specific KPIs.
The Catalog108 /admin/stats endpoint exposes the kind of request analytics a target site collects about your scraper: bans, rate-limit triggers, request volume. It's worth checking while you build the metrics on your side.
The five that matter
| Metric | Healthy | Concerning | Critical |
|---|---|---|---|
| HTTP success rate | > 95% | 80–95% | < 80% |
| Items per second | within ±15% of expected | 30%+ deviation | trending to zero |
| Ban / 403 / 429 rate | < 1% | 1–5% | > 5% |
| Proxy success rate | > 90% | 70–90% | < 70% |
| End-to-end freshness | within SLA | exceeding SLA | many hours stale |
These five catch the vast majority of real scraper incidents.
1. HTTP success rate
sum(rate(scraper_requests_total{status=~"2..|3.."}[5m]))
/
sum(rate(scraper_requests_total[5m]))
Success here means 2xx/3xx. 4xx is sometimes expected (404 for delisted products); be specific:
# Real failure rate, excluding 404
sum(rate(scraper_requests_total{status=~"5..|429"}[5m]))
/
sum(rate(scraper_requests_total[5m]))
Drops in this rate tell you the target is breaking, the network is degrading, or you're being rate-limited. Investigate by drilling into status and spider dimensions.
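The PromQL above can be mirrored locally for a quick sanity check. A minimal sketch (with made-up status counts) of the same computation, excluding expected 404s from the denominator:

```python
from collections import Counter

def success_rate(status_counts, exclude=("404",)):
    """Fraction of 2xx/3xx responses; statuses in `exclude` are
    dropped from the denominator (expected failures like 404)."""
    total = sum(n for s, n in status_counts.items() if s not in exclude)
    ok = sum(n for s, n in status_counts.items() if s.startswith(("2", "3")))
    return ok / total if total else 0.0

# Hypothetical 5-minute window of status counts
counts = Counter({"200": 940, "301": 20, "404": 15, "500": 25, "429": 15})
rate = success_rate(counts)  # 960 ok / 1000 counted = 0.96
```

The same idea generalizes: pick the numerator and denominator deliberately, rather than treating every non-2xx as a failure.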
2. Items per second (throughput)
A scraper can have great HTTP success and still extract zero items if a selector is broken. Track:
from prometheus_client import Counter

ITEMS_SCRAPED = Counter("scraper_items_total", "Items emitted.", ["spider"])
Increment on every yield/emit. Then:
sum by (spider) (rate(scraper_items_total[5m]))
A sudden drop with stable request rate = parser broken. A drop with reduced request rate = upstream bottleneck. Both warrant action, but the diagnoses differ.
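The per-yield increment can be sketched without the Prometheus client; here a plain dict stands in for the counter, and `emit_items` is a hypothetical wrapper around a parser's output:

```python
ITEM_COUNTS = {}  # spider -> count; stand-in for the Prometheus counter

def emit_items(spider, parsed_rows):
    """Increment the items counter on every yield, not once per page --
    that's what makes a broken selector visible as a throughput drop."""
    for row in parsed_rows:
        ITEM_COUNTS[spider] = ITEM_COUNTS.get(spider, 0) + 1
        yield row

rows = list(emit_items("products", [{"sku": "a1"}, {"sku": "a2"}]))
```

If the selector breaks and `parsed_rows` comes back empty, the counter stops moving even though every HTTP request still succeeds.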
3. Ban / 403 / 429 rate
The most direct signal of an anti-bot system pushing back:
sum(rate(scraper_requests_total{status=~"403|429"}[5m]))
/
sum(rate(scraper_requests_total[5m]))
Slice by proxy region or user-agent to find which configuration is being banned:
sum by (proxy_region) (rate(scraper_requests_total{status="403"}[5m]))
If one region's ban rate spikes, rotate that pool out. If all regions are banned, the site detected your scraper fingerprint, not your IPs.
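The rotate-or-fingerprint decision above can be expressed as a small check. This is a sketch with hypothetical counts, not a real proxy manager:

```python
def regions_to_rotate(ban_counts, request_counts, threshold=0.05):
    """Flag proxy regions whose 403/429 rate exceeds `threshold`.
    If *every* region is flagged, suspect the fingerprint, not the IPs."""
    flagged = [r for r, total in request_counts.items()
               if ban_counts.get(r, 0) / total > threshold]
    return flagged, len(flagged) == len(request_counts)

flagged, fingerprint_suspected = regions_to_rotate(
    {"us-east": 60, "eu-west": 2},     # 403/429 counts per region
    {"us-east": 1000, "eu-west": 1000},  # total requests per region
)
# flagged == ["us-east"]; only one region is hot, so rotate that pool
```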
4. Proxy success rate
from prometheus_client import Counter

PROXY_REQUESTS = Counter(
    "proxy_requests_total",
    "Requests routed through a proxy.",
    ["proxy_pool", "outcome"]  # outcome: success, timeout, banned, refused
)
sum by (proxy_pool) (rate(proxy_requests_total{outcome="success"}[5m]))
/
sum by (proxy_pool) (rate(proxy_requests_total[5m]))
Different proxy pools (datacenter, residential, mobile, region-specific) get their own series. When residential drops below 70%, the provider is probably congested; fail over to a backup.
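A failover decision based on those per-pool rates might look like this sketch; `pick_pool` and the 70% floor are illustrative, not a prescribed policy:

```python
def pick_pool(pool_stats, floor=0.70,
              preference=("residential", "datacenter", "mobile")):
    """Return the first pool in preference order whose recent success
    rate clears `floor`; otherwise fall back to the best available."""
    rates = {p: s["success"] / s["total"] for p, s in pool_stats.items()}
    for pool in preference:
        if rates.get(pool, 0.0) >= floor:
            return pool
    return max(rates, key=rates.get)

# Hypothetical 5-minute window: residential is degraded
stats = {"residential": {"success": 60, "total": 100},
         "datacenter": {"success": 85, "total": 100}}
chosen = pick_pool(stats)  # datacenter clears the floor, residential doesn't
```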
5. End-to-end freshness
Throughput metrics don't catch "the queue is backed up." Freshness measures the gap between "when was this URL last successfully scraped" and now.
from prometheus_client import Gauge

FRESHNESS_LAG = Gauge(
    "scraper_freshness_lag_seconds",
    "Age of the stalest successfully scraped record.",
    ["spider"]
)

# Updated periodically by a metric exporter. db.query is assumed to
# return a single scalar: seconds since the oldest stale scraped_at.
oldest = db.query("SELECT EXTRACT(EPOCH FROM (now() - min(scraped_at))) FROM products WHERE scraped_at < now() - interval '1 day'")
FRESHNESS_LAG.labels(spider="products").set(oldest or 0)
Alert when freshness exceeds your data SLA (e.g. all products scraped within 24h).
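The SLA check itself is simple arithmetic on timestamps. A minimal sketch, with a hypothetical 24-hour SLA and in-memory timestamps standing in for the database query:

```python
import datetime as dt

SLA = dt.timedelta(hours=24)  # "all products scraped within 24h"

def freshness_breach(scraped_at_times, now):
    """Lag of the stalest record, and whether it breaches the SLA."""
    lag = now - min(scraped_at_times)
    return lag, lag > SLA

now = dt.datetime(2024, 1, 2, 12, 0)
lag, breached = freshness_breach(
    [dt.datetime(2024, 1, 1, 6, 0),   # 30h old -> breach
     dt.datetime(2024, 1, 2, 11, 0)], # 1h old
    now,
)
```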
Anti-pattern: averaging away the spike
A "95% success rate" averaged over 24 hours can hide a 30-minute window where it was 40%. Always:
- Compute success rate over short windows (1m, 5m, 15m).
- Look at the time series, not just the current number.
- Set alerts on short-window rates, not 24-hour rates.
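Why short windows matter can be shown with synthetic data: the long-run average looks acceptable while one window is an incident. A sketch:

```python
def windowed_rates(success_flags, window):
    """Success rate per consecutive window of requests -- the spike a
    long-run average hides shows up as one bad window."""
    return [sum(success_flags[i:i + window]) / window
            for i in range(0, len(success_flags), window)]

# Healthy traffic, then a 40%-success incident, then healthy again
flags = [1] * 10 + ([1] * 4 + [0] * 6) + [1] * 10
per_window = windowed_rates(flags, 10)  # [1.0, 0.4, 1.0]
overall = sum(flags) / len(flags)       # 0.8 -- the incident is smoothed away
```

An alert on the 24-hour average never fires here; an alert on the 5-minute window does.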
Per-target dimensions
For scrapers hitting multiple targets:
from prometheus_client import Counter

REQUESTS = Counter(
    "scraper_requests_total", "...",
    ["spider", "target", "status"]
)
target could be a domain or a "product feed source." Now you can answer "is amazon failing while ebay is fine?" with one query. Caveat: keep cardinality bounded by restricting target to a known list, never the raw URL.
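Bounding the label is usually a small mapping function. A sketch; `KNOWN_TARGETS` and the domains in it are illustrative:

```python
from urllib.parse import urlparse

KNOWN_TARGETS = {"amazon.com", "ebay.com", "catalog108.example"}

def target_label(url):
    """Map a URL to a bounded `target` label value: a known domain or
    'other'. Never use the raw URL -- that explodes series cardinality."""
    host = urlparse(url).netloc.removeprefix("www.")
    return host if host in KNOWN_TARGETS else "other"

label = target_label("https://www.ebay.com/itm/123")  # "ebay.com"
```

Anything outside the allow-list collapses into a single "other" series instead of minting a new one per URL.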
Reading Catalog108's /admin/stats
Catalog108 has an admin-only /admin/stats endpoint that returns aggregated request analytics: what the target site sees from your scraper. It includes:
- Total requests in last N hours, by IP/UA cluster.
- 4xx/5xx response counts.
- Banned client identifiers.
When building a scraper against Catalog108, check /admin/stats while scraping. The numbers there should approximately match your local metrics. Big divergences mean you're double-counting, or the target is seeing requests you didn't expect (a runaway worker, a stuck retry).
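The divergence check can be automated once you pull both numbers. A sketch; the function name, totals, and 10% tolerance are illustrative choices:

```python
def count_divergence(local_total, target_total, tolerance=0.10):
    """Relative gap between your scraper_requests_total and the request
    count the target reports. Large gaps suggest retries inflating the
    local count, local cache hits, or a runaway worker."""
    gap = abs(local_total - target_total) / max(target_total, 1)
    return gap, gap > tolerance

# Hypothetical totals: local counter well above what the target saw
gap, suspicious = count_divergence(local_total=1380, target_total=1000)
```

A `suspicious` result is a prompt to check retry middleware and worker counts, not an alert on its own.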
Dashboards: the four panels
A scraper Grafana dashboard with these four panels covers most on-call needs:
- Requests/sec by status. Color 5xx and 429 red. At a glance, you see traffic shape and failure types.
- Items/sec by spider. Reveals parser breakage independent of HTTP success.
- Ban rate (403+429) by proxy_region. Shows which proxies are blown.
- Freshness lag. Shows whether the scraper is keeping up.
If you can answer "is the scraper healthy right now?" in under 30 seconds from this dashboard, your observability is in good shape.
Hands-on lab
Add the five counters/gauges to your Catalog108 scraper. Run a scrape, then:
- Use Catalog108's /admin/stats to see what the target observed.
- Compare to your Prometheus metrics.
- If your scraper_requests_total is much higher than /admin/stats shows, you have retries inflating local counts, or you've been served from local cache.