Blocking Resources for 3–5× Speedup
Most page weight is images, fonts, ads, and analytics. Blocking them at the browser level slashes scrape time without losing the data you actually want.
What you’ll learn
- Block images, fonts, stylesheets, and trackers via Playwright's `route()` API.
- Measure the bandwidth and time saved per block category.
- Distinguish blockable resources (cosmetic) from required ones (the page won't render).
- Apply category-blocking, domain-blocking, and pattern-blocking strategies appropriately.
A modern page is 80% cruft your scraper doesn't need: images, fonts, ads, analytics, chat widgets, video preloads. Blocking them at the browser network layer cuts page load by 3–5×, drops memory use, and reduces your detection surface (the more requests you make, the more fingerprintable you become). This is one of the highest-ROI optimisations available.
What page weight looks like
A typical e-commerce listing page:
| Resource type | Size | Time | Required for data? |
|---|---|---|---|
| HTML | 50 KB | 100 ms | Yes |
| CSS | 200 KB | 200 ms | No (you don't render) |
| JS bundles | 800 KB | 500 ms | Usually yes (runs the SPA) |
| Images | 2-5 MB | 1-3 s | No |
| Fonts | 300 KB | 300 ms | No |
| Ads/analytics | 500 KB | varies | No |
| Total | 4-7 MB | 2-5 s | |
Block images, fonts, and ads, and you've cut 60-80% of the bytes and roughly half the time. Block CSS too (the page often still works) and you save another second.
Playwright's route API
```python
def blocker(route):
    if route.request.resource_type in {"image", "font", "media"}:
        route.abort()
    else:
        route.continue_()

page.route("**/*", blocker)
```
`page.route(pattern, handler)` intercepts requests matching the pattern. Your handler either aborts (the request never goes out) or continues (the request proceeds normally). The pattern `**/*` catches everything; you filter inside the handler by resource type, URL, or anything else.
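If you open many pages, you don't have to repeat the `route()` call on each one: Playwright also lets you register the same handler once on the browser context via `context.route()` (same pattern-and-handler signature), and every page created from that context inherits it. A minimal sketch, assuming `browser` and the `blocker` above:

```python
# Register once on the context; all pages (and popups) inherit the routing.
context = browser.new_context()
context.route("**/*", blocker)
page = context.new_page()   # no per-page page.route() call needed
```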
Resource types
Playwright categorises every request:
| `resource_type` | What it includes |
|---|---|
| `document` | The main HTML |
| `stylesheet` | CSS |
| `script` | JS files |
| `image` | All image formats |
| `font` | Web fonts |
| `media` | Video/audio |
| `xhr` | XMLHttpRequest calls (legacy AJAX) |
| `fetch` | Fetch API requests |
| `websocket` | WebSocket connections |
| `manifest` | Web app manifests |
| `other` | Everything else |
For most scrapes, block `image`, `font`, and `media`. Keep `document`, `script`, `fetch`, and `xhr`. Stylesheets are often safe to block too, but they occasionally break layout-dependent JS; test on your target.
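Not sure what a given target is made of? One way to find out before choosing what to block is to count requests per `resource_type` via Playwright's `request` event. A sketch, using the practice URL from later in this lesson:

```python
from collections import Counter

counts = Counter()
page.on("request", lambda request: counts.update([request.resource_type]))

page.goto("https://practice.scrapingcentral.com/products")
page.wait_for_load_state("networkidle")

# Sorted from most to fewest requests per category
for rtype, n in counts.most_common():
    print(f"{rtype:12} {n:4d} requests")
```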
Blocking by domain
Ads, analytics, and tracking pixels usually come from third-party domains:
```python
BLOCKED_DOMAINS = {
    "google-analytics.com",
    "googletagmanager.com",
    "doubleclick.net",
    "facebook.com",
    "facebook.net",
    "scorecardresearch.com",
    "hotjar.com",
    "intercom.io",
    "segment.com",
    "amplitude.com",
}

def blocker(route):
    host = route.request.url.split("/")[2]  # crude but effective
    if any(b in host for b in BLOCKED_DOMAINS):
        route.abort()
    elif route.request.resource_type in {"image", "font", "media"}:
        route.abort()
    else:
        route.continue_()

page.route("**/*", blocker)
```
This combined filter blocks (a) any third-party tracker and (b) any image/font/media request. Most pages deliver the data you want in roughly 60% less time.
Measuring the savings
Before / after benchmark:
```python
import time

# Without blocking
t0 = time.perf_counter()
page.goto("https://practice.scrapingcentral.com/products")
page.wait_for_selector(".product-card")
unblocked = time.perf_counter() - t0

# With blocking: same code in a new context with route() registered
# ...
t0 = time.perf_counter()   # reset the clock before timing the second run
# ... (same goto + wait_for_selector as above) ...
blocked = time.perf_counter() - t0

print(f"Saved {unblocked - blocked:.2f}s per page")
```
Run it. Typical savings on a real e-commerce listing: 1.5–4 seconds per page. Over 1000 pages, that's hours.
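The same benchmark can tally bytes as well as seconds. A rough sketch that sums `Content-Length` response headers (this is compressed transfer size, and not every response carries the header, so treat the total as a lower bound):

```python
total_bytes = 0

def track(response):
    global total_bytes
    total_bytes += int(response.headers.get("content-length", 0))

page.on("response", track)
page.goto("https://practice.scrapingcentral.com/products")
page.wait_for_load_state("networkidle")
print(f"Downloaded ~{total_bytes / 1024:.0f} KB")
```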
When blocking BREAKS the page
Not all resources are dispensable. Three risks:
1. JS that requires CSS to "see" elements. Some SPAs measure element sizes after CSS applies. Block CSS and `getBoundingClientRect()` returns zeros, breaking layout-dependent code.
2. Fonts that block JS. Rarely, JS waits for `document.fonts.ready`. Blocking fonts can hang the page.
3. Images that page logic checks. Sometimes a missing image triggers an "error" state in JS that redirects you away. Diagnose by checking what the page does without images.
If blocking causes failures, narrow it: block only third-party domains, or only specific URL patterns, instead of broad resource-type blocks.
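One way to narrow it down is to block one category at a time and check whether the data still renders. A sketch, assuming an existing `context` and using the `.product-card` selector from the benchmark as the health check:

```python
CANDIDATES = ["image", "font", "media", "stylesheet"]

def survives_blocking(context, rtype):
    """Load the page with one resource type blocked; report whether data renders."""
    page = context.new_page()
    page.route("**/*", lambda route: route.abort()
               if route.request.resource_type == rtype
               else route.continue_())
    try:
        page.goto("https://practice.scrapingcentral.com/products")
        page.wait_for_selector(".product-card", timeout=10_000)
        return True
    except Exception:
        return False
    finally:
        page.close()

for rtype in CANDIDATES:
    status = "OK" if survives_blocking(context, rtype) else "BREAKS the page"
    print(f"blocking {rtype:10} -> {status}")
```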
Pattern-based blocking
# Block only video preview thumbnails (specific URL pattern)
page.route("**/*-thumb.jpg", lambda r: r.abort())
# Block a particular CDN
page.route("https://cdn.example.com/**", lambda r: r.abort())
# Block a particular file
page.route("**/heavy-analytics.js", lambda r: r.abort())
Multiple route calls compose, but Playwright checks handlers in reverse registration order: the most recently registered matching handler runs first, and it decides the request's fate unless it calls `route.fallback()` to defer to the next match.
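A sketch of that ordering: the tracker filter registered second runs first, and `route.fallback()` defers anything it doesn't abort to the handler registered before it:

```python
# Registered first, so it runs LAST: the final decision for anything deferred.
page.route("**/*", lambda route: route.continue_())

# Registered second, so it runs FIRST: aborts trackers, defers the rest.
def tracker_filter(route):
    if "analytics" in route.request.url:
        route.abort()
    else:
        route.fallback()   # hand off to the next matching handler

page.route("**/*", tracker_filter)
```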
The opposite: allow-listing
For aggressive optimisation, flip the default:
```python
ALLOWED_RESOURCE_TYPES = {"document", "script", "fetch", "xhr"}
ALLOWED_DOMAINS = {"practice.scrapingcentral.com"}

def blocker(route):
    host = route.request.url.split("/")[2]
    if (route.request.resource_type in ALLOWED_RESOURCE_TYPES
            and any(d in host for d in ALLOWED_DOMAINS)):
        route.continue_()
    else:
        route.abort()

page.route("**/*", blocker)
```
Block everything except known-good types from known-good hosts. Faster, but you'll occasionally block something the page actually needs; be prepared to debug.
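A logging variant makes that debugging quick; the sketch below reuses the allow-lists above and prints every aborted request so the missing dependency is obvious:

```python
def logging_blocker(route):
    request = route.request
    host = request.url.split("/")[2]
    if (request.resource_type in ALLOWED_RESOURCE_TYPES
            and any(d in host for d in ALLOWED_DOMAINS)):
        route.continue_()
    else:
        print(f"BLOCKED {request.resource_type:10} {request.url[:80]}")
        route.abort()

page.route("**/*", logging_blocker)
```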
Stealth implications
Resource blocking changes your fingerprint. A real browser fetches `favicon.ico`, makes analytics calls, downloads fonts. A scraper that doesn't fetch any of these looks suspicious to fingerprinting systems. Some considerations:
- Blocking by domain (third-party trackers) is generally safer than blocking by type (all images).
- For anti-bot-protected sites, partial blocking (letting first-party resources through and only nuking third parties) is the right balance.
- Sub-Path 5 covers fingerprint preservation in depth.
A practical recipe
```python
TRACKER_DOMAINS = (
    "google-analytics", "googletagmanager", "doubleclick",
    "facebook.com", "facebook.net", "scorecardresearch",
    "hotjar", "intercom.io", "segment", "amplitude",
    "mixpanel", "fullstory", "newrelic",
)

BLOCK_TYPES = {"image", "font", "media"}

def make_blocker(allow_images=False):
    def handler(route):
        url = route.request.url
        rtype = route.request.resource_type
        if any(t in url for t in TRACKER_DOMAINS):
            return route.abort()
        if rtype in BLOCK_TYPES and not (rtype == "image" and allow_images):
            return route.abort()
        return route.continue_()
    return handler

page.route("**/*", make_blocker())
```
Plug this into every scraper. Default-safe, easy to tune.
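End to end, the wiring looks like this (a sketch using Playwright's sync API and the practice URL; `make_blocker` is the recipe above):

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.route("**/*", make_blocker())   # default: trackers + images/fonts/media blocked
    page.goto("https://practice.scrapingcentral.com/products")
    print(page.locator(".product-card").count())   # the data survives blocking
    browser.close()
```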
Hands-on lab
Open /products. Measure the baseline scrape time (load the page, count products). Add the blocker recipe. Measure again. You should see a 50–70% reduction in load time and a comparable drop in bytes downloaded (the DevTools Network panel will confirm both). Then try blocking JS too and note how the page falls apart; that demonstrates which categories are truly optional and which are required.
Practice this lesson on Catalog108, our first-party scraping sandbox.