Blocking Resources for 3–5× Speedup
Most page weight is images, fonts, ads, and analytics. Blocking them at the browser level slashes scrape time without losing the data you actually want.
What you’ll learn
- Block images, fonts, stylesheets, and trackers via Playwright's `route()` API.
- Measure the bandwidth and time saved per block category.
- Distinguish blockable resources (cosmetic) from required ones (the page won't render).
- Apply category-blocking, domain-blocking, and pattern-blocking strategies appropriately.
A modern page is 80% cruft your scraper doesn't need: images, fonts, ads, analytics, chat widgets, video preloads. Blocking them at the browser network layer cuts page load by 3–5×, drops memory use, and reduces your detection surface (the more requests you make, the more fingerprintable you become). This is one of the highest-ROI optimisations available.
What page weight looks like
A typical e-commerce listing page:
| Resource type | Size | Time | Required for data? |
|---|---|---|---|
| HTML | 50 KB | 100 ms | Yes |
| CSS | 200 KB | 200 ms | No (you don't render) |
| JS bundles | 800 KB | 500 ms | Usually yes (runs the SPA) |
| Images | 2-5 MB | 1-3 s | No |
| Fonts | 300 KB | 300 ms | No |
| Ads/analytics | 500 KB | varies | No |
| Total | 4-7 MB | 2-5 s | |
Block images, fonts, and ads, and you've cut 60-80% of the bytes and roughly half the time. Block CSS too (the page often still works) and you save another second.
Playwright's route API
```python
def blocker(route):
    if route.request.resource_type in {"image", "font", "media"}:
        route.abort()
    else:
        route.continue_()

page.route("**/*", blocker)
```
`page.route(pattern, handler)` intercepts requests matching the pattern. Your handler either aborts (the request never goes out) or continues (the request proceeds normally). The pattern `**/*` catches everything; you filter inside the handler by resource type, URL, or anything else.
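If you open many pages, you don't have to repeat the `route()` call on each one: Playwright also lets you register the same handler once on the browser context via `context.route()` (same pattern-and-handler signature), and every page created from that context inherits it. A minimal sketch, assuming `browser` and the `blocker` above:

```python
# Register once on the context; all pages (and popups) inherit the routing.
context = browser.new_context()
context.route("**/*", blocker)
page = context.new_page()   # no per-page page.route() call needed
```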
Resource types
Playwright categorises every request:
| `resource_type` | What it includes |
|---|---|
| `document` | The main HTML |
| `stylesheet` | CSS |
| `script` | JS files |
| `image` | All image formats |
| `font` | Web fonts |
| `media` | Video/audio |
| `xhr` | XMLHttpRequest calls (legacy AJAX) |
| `fetch` | Fetch API requests |
| `websocket` | WebSocket connections |
| `manifest` | Web app manifests |
| `other` | Everything else |
For most scrapes, block `image`, `font`, and `media`. Keep `document`, `script`, `fetch`, and `xhr`. Stylesheets are often safe to block too, but they occasionally break layout-dependent JS; test on your target.
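Not sure what a given target is made of? One way to find out before choosing what to block is to count requests per `resource_type` via Playwright's `request` event. A sketch, using the practice URL from later in this lesson:

```python
from collections import Counter

counts = Counter()
page.on("request", lambda request: counts.update([request.resource_type]))

page.goto("https://practice.scrapingcentral.com/products")
page.wait_for_load_state("networkidle")

# Sorted from most to fewest requests per category
for rtype, n in counts.most_common():
    print(f"{rtype:12} {n:4d} requests")
```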
Blocking by domain
Ads, analytics, and tracking pixels usually come from third-party domains:
```python
BLOCKED_DOMAINS = {
    "google-analytics.com",
    "googletagmanager.com",
    "doubleclick.net",
    "facebook.com",
    "facebook.net",
    "scorecardresearch.com",
    "hotjar.com",
    "intercom.io",
    "segment.com",
    "amplitude.com",
}

def blocker(route):
    host = route.request.url.split("/")[2]  # crude but effective
    if any(b in host for b in BLOCKED_DOMAINS):
        route.abort()
    elif route.request.resource_type in {"image", "font", "media"}:
        route.abort()
    else:
        route.continue_()

page.route("**/*", blocker)
```
This combined filter blocks (a) any third-party tracker and (b) any image/font/media request. Most pages deliver the data you want in roughly 60% less time.
Measuring the savings
Before / after benchmark:
```python
import time

# Without blocking
t0 = time.perf_counter()
page.goto("https://practice.scrapingcentral.com/products")
page.wait_for_selector(".product-card")
unblocked = time.perf_counter() - t0

# With blocking: same code in a new context with route() registered
# ...
t0 = time.perf_counter()   # reset the clock before timing the second run
# ... (same goto + wait_for_selector as above) ...
blocked = time.perf_counter() - t0

print(f"Saved {unblocked - blocked:.2f}s per page")
```
Run it. Typical savings on a real e-commerce listing: 1.5–4 seconds per page. Over 1000 pages, that's hours.
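The same benchmark can tally bytes as well as seconds. A rough sketch that sums `Content-Length` response headers (this is compressed transfer size, and not every response carries the header, so treat the total as a lower bound):

```python
total_bytes = 0

def track(response):
    global total_bytes
    total_bytes += int(response.headers.get("content-length", 0))

page.on("response", track)
page.goto("https://practice.scrapingcentral.com/products")
page.wait_for_load_state("networkidle")
print(f"Downloaded ~{total_bytes / 1024:.0f} KB")
```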
When blocking BREAKS the page
Not all resources are dispensable. Three risks:
1. JS that requires CSS to "see" elements. Some SPAs measure element sizes after CSS applies. Block CSS and `getBoundingClientRect()` returns zeros, breaking layout-dependent code.
2. Fonts that block JS. Rarely, JS waits for `document.fonts.ready`. Blocking fonts can hang the page.
3. Images that page logic checks. Sometimes a missing image triggers an "error" state in JS that redirects you away. Diagnose by checking what the page does without images.
If blocking causes failures, narrow it: block only third-party domains, or only specific URL patterns, instead of broad resource-type blocks.
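One way to narrow it down is to block one category at a time and check whether the data still renders. A sketch, assuming an existing `context` and using the `.product-card` selector from the benchmark as the health check:

```python
CANDIDATES = ["image", "font", "media", "stylesheet"]

def survives_blocking(context, rtype):
    """Load the page with one resource type blocked; report whether data renders."""
    page = context.new_page()
    page.route("**/*", lambda route: route.abort()
               if route.request.resource_type == rtype
               else route.continue_())
    try:
        page.goto("https://practice.scrapingcentral.com/products")
        page.wait_for_selector(".product-card", timeout=10_000)
        return True
    except Exception:
        return False
    finally:
        page.close()

for rtype in CANDIDATES:
    status = "OK" if survives_blocking(context, rtype) else "BREAKS the page"
    print(f"blocking {rtype:10} -> {status}")
```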
Pattern-based blocking
# Block only video preview thumbnails (specific URL pattern)
page.route("**/*-thumb.jpg", lambda r: r.abort())
# Block a particular CDN
page.route("https://cdn.example.com/**", lambda r: r.abort())
# Block a particular file
page.route("**/heavy-analytics.js", lambda r: r.abort())
Multiple route calls compose, but Playwright checks handlers in reverse registration order: the most recently registered matching handler runs first, and it decides the request's fate unless it calls `route.fallback()` to defer to the next match.
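A sketch of that ordering: the tracker filter registered second runs first, and `route.fallback()` defers anything it doesn't abort to the handler registered before it:

```python
# Registered first, so it runs LAST: the final decision for anything deferred.
page.route("**/*", lambda route: route.continue_())

# Registered second, so it runs FIRST: aborts trackers, defers the rest.
def tracker_filter(route):
    if "analytics" in route.request.url:
        route.abort()
    else:
        route.fallback()   # hand off to the next matching handler

page.route("**/*", tracker_filter)
```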
The opposite: allow-listing
For aggressive optimisation, flip the default:
```python
ALLOWED_RESOURCE_TYPES = {"document", "script", "fetch", "xhr"}
ALLOWED_DOMAINS = {"practice.scrapingcentral.com"}

def blocker(route):
    host = route.request.url.split("/")[2]
    if (route.request.resource_type in ALLOWED_RESOURCE_TYPES
            and any(d in host for d in ALLOWED_DOMAINS)):
        route.continue_()
    else:
        route.abort()

page.route("**/*", blocker)
```
Block everything except known-good types from known-good hosts. Faster, but you'll occasionally block something the page actually needs; be prepared to debug.
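A logging variant makes that debugging quick; the sketch below reuses the allow-lists above and prints every aborted request so the missing dependency is obvious:

```python
def logging_blocker(route):
    request = route.request
    host = request.url.split("/")[2]
    if (request.resource_type in ALLOWED_RESOURCE_TYPES
            and any(d in host for d in ALLOWED_DOMAINS)):
        route.continue_()
    else:
        print(f"BLOCKED {request.resource_type:10} {request.url[:80]}")
        route.abort()

page.route("**/*", logging_blocker)
```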
Stealth implications
Resource blocking changes your fingerprint. A real browser fetches `favicon.ico`, makes analytics calls, downloads fonts. A scraper that doesn't fetch any of these looks suspicious to fingerprinting systems. Some considerations:
- Blocking by domain (third-party trackers) is generally safer than blocking by type (all images).
- For anti-bot-protected sites, partial blocking (letting first-party resources through and only nuking third parties) is the right balance.
- Sub-Path 5 covers fingerprint preservation in depth.
A practical recipe
```python
TRACKER_DOMAINS = (
    "google-analytics", "googletagmanager", "doubleclick",
    "facebook.com", "facebook.net", "scorecardresearch",
    "hotjar", "intercom.io", "segment", "amplitude",
    "mixpanel", "fullstory", "newrelic",
)

BLOCK_TYPES = {"image", "font", "media"}

def make_blocker(allow_images=False):
    def handler(route):
        url = route.request.url
        rtype = route.request.resource_type
        if any(t in url for t in TRACKER_DOMAINS):
            return route.abort()
        if rtype in BLOCK_TYPES and not (rtype == "image" and allow_images):
            return route.abort()
        return route.continue_()
    return handler

page.route("**/*", make_blocker())
```
Plug this into every scraper. Default-safe, easy to tune.
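End to end, the wiring looks like this (a sketch using Playwright's sync API and the practice URL; `make_blocker` is the recipe above):

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.route("**/*", make_blocker())   # default: trackers + images/fonts/media blocked
    page.goto("https://practice.scrapingcentral.com/products")
    print(page.locator(".product-card").count())   # the data survives blocking
    browser.close()
```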
Hands-on lab
Open /products. Measure the baseline scrape time (load the page, count products). Add the blocker recipe. Measure again. You should see a 50–70% reduction in load time and a comparable drop in bytes downloaded (the DevTools Network panel will confirm both). Then try blocking JS too and note how the page falls apart; that demonstrates which categories are truly optional and which are required.
Practice this lesson on Catalog108, our first-party scraping sandbox.