Understanding and Bypassing DataDome - Anti-Detection

Learn how DataDome's anti-bot system works and strategies to scrape DataDome-protected websites.

DataDome is a commercial anti-bot service used by major e-commerce and media sites. It is generally considered harder to bypass than Cloudflare because it relies heavily on client-side signals collected in the browser.

How DataDome Works

DataDome injects a JavaScript tag that collects hundreds of signals from the browser, and combines them with server-side analysis of each request:

Device fingerprint: screen resolution, GPU, installed fonts, audio context
TLS fingerprint: the JA3 hash of your TLS handshake (computed server-side from the ClientHello)
Behavioral signals: mouse movement patterns, keystroke dynamics, touch events
Cookie validation: DataDome sets a datadome cookie that must be present on subsequent requests
Request patterns: timing, ordering, and header consistency
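To make the TLS fingerprint item concrete, here is a minimal sketch of how a JA3 hash is derived. It assumes the published JA3 scheme: five ClientHello fields written as decimal values, joined with "-" inside a field and "," between fields, then hashed with MD5. The function name ja3_hash and the sample field values are hypothetical and only illustrate the format; how DataDome itself computes or weighs fingerprints is not public.

```python
import hashlib


def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """Return (ja3_string, md5_hex) for ClientHello fields given as decimal ints."""
    fields = [
        str(tls_version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(g) for g in curves),
        "-".join(str(f) for f in point_formats),
    ]
    ja3_string = ",".join(fields)
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Hypothetical ClientHello values, for illustration only
ja3_string, digest = ja3_hash(771, [4865, 4866, 49195], [0, 10, 11], [29, 23], [0])
print(ja3_string)  # 771,4865-4866-49195,0-10-11,29-23,0
print(digest)
```

Because the order of ciphers and extensions is part of the string, two clients that support the same algorithms but list them differently produce different hashes. This is why an HTTP library can be recognized even when its headers look like a browser's.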

Identifying DataDome Protection

Look for these signs:

# Response headers
Server: DataDome

# Cookie
Set-Cookie: datadome=...

# Blocked page URL pattern
https://geo.captcha-delivery.com/captcha/...
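The checks above can be sketched as a small helper. This is an illustrative example, not an official detection method: the function name detect_datadome and the indicator labels are hypothetical, and it only looks for the three signs listed in this section.

```python
def detect_datadome(headers, body=""):
    """Return a list of DataDome indicators found in an HTTP response."""
    # Header names are case-insensitive, so normalise them first
    lowered = {name.lower(): value for name, value in headers.items()}
    signs = []
    if lowered.get("server", "").lower() == "datadome":
        signs.append("server-header")
    if "datadome=" in lowered.get("set-cookie", "").lower():
        signs.append("datadome-cookie")
    if "geo.captcha-delivery.com" in body:
        signs.append("captcha-url")
    return signs


# Sample values made up for illustration
sample_headers = {"Server": "DataDome", "Set-Cookie": "datadome=abc123; Path=/"}
print(detect_datadome(sample_headers))  # ['server-header', 'datadome-cookie']
```

The helper takes a plain mapping, so it can be fed response.headers and response.text from whichever HTTP client is in use.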

Why Simple Approaches Fail

Plain HTTP requests fail because DataDome requires:

A valid datadome cookie generated by JavaScript execution
A browser-like TLS fingerprint that is consistent with the request headers (Python's requests library has a recognizable TLS signature)
Behavioral data collected by the injected JS

Method 1: curl_cffi for TLS Fingerprint Impersonation

from curl_cffi import requests

# curl_cffi can impersonate real browser TLS fingerprints
session = requests.Session(impersonate="chrome124")

response = session.get("https://datadome-protected-site.com")
print(response.status_code)

# The session preserves cookies including the datadome cookie
response2 = session.get("https://datadome-protected-site.com/page2")
print(response2.status_code)

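Install with pip install curl_cffi. The library wraps curl-impersonate, a patched libcurl build that reproduces a real browser's TLS and HTTP/2 handshake instead of Python's default. Note that it does not execute JavaScript, so it only addresses the TLS fingerprint check; pages that require the JavaScript-generated datadome cookie or behavioral data may still be blocked.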

Method 2: Playwright with Careful Configuration

from playwright.sync_api import sync_playwright
import time
import random

def scrape_datadome_site(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = context.new_page()

        # Navigate and wait for DataDome JS to execute
        page.goto(url, wait_until="networkidle")
        time.sleep(random.uniform(2, 4))

        # Move the pointer in many small steps rather than one jump
        page.mouse.move(
            random.randint(100, 800), random.randint(100, 600), steps=random.randint(10, 25)
        )
        time.sleep(random.uniform(0.5, 1.5))

        content = page.content()
        browser.close()
        return content

Recommended: Use a Scraping API

DataDome is one of the hardest anti-bot systems to bypass reliably. Services like ScraperAPI and ScrapingAnt maintain dedicated infrastructure for bypassing DataDome at scale, saving you significant development time.

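import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://datadome-protected-site.com"

# Pass the parameters as a dict so requests percent-encodes the target URL
resp = requests.get(
    "https://api.scraperapi.com/",
    params={"api_key": API_KEY, "url": url, "render": "true"},
    timeout=90,
)
print(resp.status_code)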
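Because the target URL travels as a query parameter, it has to be percent-encoded; otherwise any query string of its own is merged into the API call. The sketch below shows the encoding step in isolation. The helper name build_api_url and the sample target URL are hypothetical, and the endpoint constant is an assumption based on the snippet above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint, taken from the example in this section
API_ENDPOINT = "https://api.scraperapi.com/"


def build_api_url(api_key, target_url, render=True):
    """Build the API request URL with the target URL safely percent-encoded."""
    query = urlencode(
        {"api_key": api_key, "url": target_url, "render": str(render).lower()}
    )
    return f"{API_ENDPOINT}?{query}"


# Hypothetical target with its own query string
target = "https://datadome-protected-site.com/search?q=shoes&page=2"
api_url = build_api_url("YOUR_SCRAPERAPI_KEY", target)

# Round-trip check: the target URL survives intact as a single parameter
params = parse_qs(urlsplit(api_url).query)
print(params["url"][0] == target)  # True
print(sorted(params))              # ['api_key', 'render', 'url']
```

With plain string interpolation, the &page=2 part of the target would be read as a separate parameter of the API call rather than as part of the page being requested.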

Key Takeaways

DataDome is generally harder to bypass than Cloudflare; expect lower success rates with DIY solutions
TLS fingerprint impersonation (curl_cffi) is essential for HTTP-level approaches, but it does not execute the JavaScript challenge on its own
A real browser with behavioral simulation gives the best DIY results
For production scraping, a managed API service is often the more cost-effective option once development and maintenance time are counted