How Sites Detect Headless Browsers
Forty signals that distinguish a Playwright/Selenium browser from a real one. Knowing them is the prerequisite to evading them.
What you’ll learn
- List the dozen most-checked headless fingerprint signals.
- Verify each signal manually in DevTools so you can debug stealth failures.
- Categorise signals as cheap (string match) vs expensive (behavioural).
- Recognise the limits of stealth: some signals are hard to forge convincingly.
Anti-bot systems don't care about your scraper's intent. They check a few dozen browser signals and compare them against the distribution of real users. Where you stand out, you get rate-limited, served fake data, or blocked. This lesson is the catalog of signals; the next two lessons cover the tools that patch them.
The signal hierarchy
Signals fall into three tiers:
- Cheap string checks. navigator.webdriver === true, "HeadlessChrome" in the User-Agent. Anyone can detect these.
- JS environment checks. Plugins, languages, fonts, screen size, timing. Mid-tier anti-bot vendors check these.
- Behavioural / timing fingerprints. Mouse movement entropy, key-press intervals, request-pattern entropy. Premium anti-bot (DataDome, Akamai Bot Manager, Kasada) lives here.
Tier 1 is fixable by anyone. Tier 2 needs playwright-stealth or similar. Tier 3 requires patched browsers (Camoufox, rebrowser) or behavioural emulation.
The flagship signal: navigator.webdriver
WebDriver-driven browsers expose window.navigator.webdriver === true. This is the single most-checked signal:
if (navigator.webdriver) {
// it's a bot
}
Both Selenium and Playwright (by default) leave this true. Stealth plugins set it to false via JS injection. Some anti-bot vendors check whether the property can be set (a getter that returns true is a different shape than a simple boolean), which is harder to forge.
Test:
page.goto("about:blank")
print(page.evaluate("navigator.webdriver"))
# → True without stealth, False with stealth (correctly applied)
User-Agent: "HeadlessChrome"
Default headless Chrome's UA contains "HeadlessChrome":
Mozilla/5.0 (...) HeadlessChrome/127.0.0.0 Safari/537.36
Trivially detected. Fix: override at launch.
context = browser.new_context(
user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
)
Keep the version number consistent with the actual Chrome you're driving; version mismatches are themselves a signal.
navigator.plugins and navigator.mimeTypes
Real browsers list ~3 plugins (PDF Viewer, etc.) and a handful of mime types. Headless Chrome shows empty arrays by default. Stealth fakes them:
// Naive fake: plain objects, not a real PluginArray (no item()/namedItem()),
// which deeper shape checks can spot.
Object.defineProperty(navigator, 'plugins', {
get: () => [{ name: 'PDF Viewer' }, { name: 'Chrome PDF Viewer' }, { name: 'Chromium PDF Viewer' }]
});
navigator.languages
Real Chrome returns something like ["en-US", "en"]. Headless returns ["en-US"] or empty. Fix:
context = browser.new_context(locale="en-US", extra_http_headers={"Accept-Language": "en-US,en;q=0.9"})
But navigator.languages is a JS getter; override it via an init script:
page.add_init_script("Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] })")
Screen dimensions
Headless reports a default screen size that's often 0×0 or matches the viewport exactly. Real browsers have screen.width > window.innerWidth (the screen includes taskbars and chrome). Fix: set both viewport and screen dimensions and ensure they differ:
context = browser.new_context(
viewport={"width": 1280, "height": 720},
screen={"width": 1920, "height": 1080},
)
window.chrome
Real Chrome exposes a window.chrome object with runtime, app, loadTimes properties. Headless Chrome's window.chrome is thinner; non-Chrome browsers don't have it at all. Anti-bot looks for the right shape.
WebGL fingerprint
const canvas = document.createElement('canvas');
const gl = canvas.getContext('webgl');
const debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
const renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
// "ANGLE (Intel, Mesa Intel(R) HD Graphics ...)" in real browsers
// "Google Inc. (Google)" or empty in headless
The WebGL renderer string differs between headless and real GPUs. Catalog108's /challenges/antibot/canvas-fingerprint covers this.
Canvas fingerprint
Same Canvas operations produce slightly different pixel outputs on different GPU/OS combos. Real browsers have a stable, identifiable fingerprint. Headless browsers (no GPU, software rasterizer) have a different fingerprint. Anti-bot computes a hash of canvas output and compares against known-headless hashes. Lesson 2.28.
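The server-side comparison boils down to hashing pixel output and looking the hash up in a table of known rasterizers. A minimal sketch in plain Python (the two pixel dumps are made up for illustration; real anti-bot hashes the bytes returned by canvas.toDataURL() or getImageData()):

```python
import hashlib

def canvas_hash(pixel_bytes: bytes) -> str:
    """Hash raw canvas pixel output; the result is compared against
    hashes known to come from headless/software rasterizers."""
    return hashlib.sha256(pixel_bytes).hexdigest()

# Hypothetical dumps: the same draw calls rendered on two stacks.
real_gpu = bytes([200, 30, 41, 255] * 4)
software = bytes([199, 30, 41, 255] * 4)  # one channel off by one

# Even a one-bit difference in rendering yields a different hash.
print(canvas_hash(real_gpu) == canvas_hash(software))  # False
```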
AudioContext fingerprint
As with canvas, AudioContext output is device-specific. Some sites use it as a stable identifier.
Timing and behaviour
These are the hardest to fake:
- Mouse movement entropy. Real users move the mouse along continuous, organic paths. Scrapers either don't move it at all or jump straight to elements. Anti-bot tracks mousemove events and checks their distribution.
- Keystroke intervals. Real typing has variable inter-key intervals (40-150ms with outliers). page.type(text, delay=50) produces uniform 50ms intervals, which is detectable.
- Page-load timing. Real browsers spend variable time on each page (read, scroll, hover). Scrapers blast through. Per-IP or per-session page-rate is a tell.
- Touchscreen / pointer events. Real users on touch devices generate pointer/touch events. Headless desktop never does.
- Performance API. performance.timing shows different patterns under headless vs headed.
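One mitigation for the keystroke signal is to draw per-key delays from a distribution instead of passing a fixed delay. A sketch, assuming nothing beyond the standard library (human_delays is a hypothetical helper, and the mean/spread values are illustrative, not calibrated):

```python
import random

def human_delays(n: int, mean_ms: float = 90.0, sd_ms: float = 30.0,
                 floor_ms: float = 25.0) -> list[float]:
    """Inter-key delays (ms) drawn from a floored Gaussian, instead of
    the perfectly uniform interval a fixed delay= produces."""
    return [max(floor_ms, random.gauss(mean_ms, sd_ms)) for _ in range(n)]

def spread(xs: list[float]) -> float:
    """Crude dispersion measure: max minus min."""
    return max(xs) - min(xs)

uniform = [50.0] * 20        # the page.type(text, delay=50) pattern
varied = human_delays(20)

print(spread(uniform))       # 0.0, a dead giveaway
print(round(spread(varied), 1))
```

To use this with Playwright you would call page.keyboard.type one character at a time, sleeping for each drawn delay, rather than relying on the uniform delay= parameter.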
TLS fingerprint (JA3 / JA4)
The TLS handshake (cipher suites, extensions, signature algorithms) is browser-specific. Python requests has a distinctive TLS fingerprint that doesn't match Chrome's. Even with a perfect Chrome User-Agent, the JA3 hash gives you away.
Playwright drives a real Chromium, so its TLS fingerprint matches Chrome. But Python HTTP scrapers that try to look like Chrome get caught here. Sub-Path 5 has a full lesson; for browser automation, this is mostly free.
HTTP/2 fingerprint
HTTP/2 frame ordering and settings differ per browser. Like JA3, Playwright matches Chrome here automatically.
Hardware concurrency
navigator.hardwareConcurrency reports the CPU core count. Headless containers often report 1-2; real machines report 4-16. A value of 1 is suspicious on its own.
Notification permissions
Notification.permission === "default" // real Chrome
Notification.permission === "denied" // headless Chrome
A trivial check, but it works.
The full detection cocktail
Production anti-bot scores every visitor across 30-50 signals. Each signal contributes a few percent. A user with webdriver=true AND default UA AND no plugins is unambiguously a bot. A user with one suspicious signal and 30 normal ones gets through.
Stealth is statistics: minimise your score across as many signals as possible.
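That scoring idea can be sketched as a weighted sum. Everything below (the signal names, the weights, the threshold) is invented for illustration; real vendors use far more signals and proprietary weighting:

```python
def bot_score(signals: dict[str, bool], weights: dict[str, float]) -> float:
    """Sum the weights of the signals that fired; higher = more bot-like."""
    return sum(weights[name] for name, fired in signals.items() if fired)

weights = {"webdriver": 0.4, "headless_ua": 0.3,
           "no_plugins": 0.2, "single_core": 0.1}

# webdriver=true AND default UA AND no plugins: unambiguous.
obvious = {"webdriver": True, "headless_ua": True,
           "no_plugins": True, "single_core": False}
# One suspicious signal among otherwise-normal ones: gets through.
subtle = {"webdriver": False, "headless_ua": False,
          "no_plugins": False, "single_core": True}

THRESHOLD = 0.5
print(bot_score(obvious, weights) > THRESHOLD)  # True -> blocked
print(bot_score(subtle, weights) > THRESHOLD)   # False -> passes
```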
Testing your own scraper
https://bot.sannysoft.com/ is the canonical "show me what you look like" page, green for human-like, red for headless. Catalog108's /challenges/antibot/webdriver-detected is a focused version.
Or in DevTools console:
console.log("webdriver:", navigator.webdriver);
console.log("plugins:", navigator.plugins.length);
console.log("languages:", navigator.languages);
console.log("ua:", navigator.userAgent);
console.log("hardwareConcurrency:", navigator.hardwareConcurrency);
Run that in your scraper via page.evaluate("...") and check the output. Each red flag is a fix away.
The limits of stealth
You can fake static signals (webdriver, plugins, UA). Behavioural signals (mouse paths, timing distributions) require actual emulation, which is slow and imperfect. For premium anti-bot (Akamai, Kasada, DataDome), pure stealth often isn't enough; you also need realistic interaction patterns or paid SERP/proxy services. Sub-Path 5 covers the broader picture.
Hands-on lab
Open /challenges/antibot/webdriver-detected in a vanilla Playwright session; the page should flag you as a bot. Note which signal it caught (the page tells you). Then apply the fix manually (set the property via add_init_script or override the user agent). Re-run and see whether you pass. This is the diagnostic skill; the next two lessons automate it with plugins.