Scraping Central is reader-supported. When you buy through links on our site, we may earn an affiliate commission.

Understanding and Bypassing DataDome

Learn how DataDome's anti-bot system works and explore strategies for scraping DataDome-protected websites.

Anti-Detection · #4 · Advanced · 3 min read

DataDome is a premium anti-bot service used by major e-commerce and media sites. It is generally harder to bypass than Cloudflare's standard bot protection because it relies heavily on client-side signals gathered in the browser.

How DataDome Works

DataDome combines a JavaScript tag, which collects hundreds of signals from the browser, with server-side analysis of each request. The main signal categories are:

  • Device fingerprint: screen resolution, GPU, installed fonts, audio context
  • TLS fingerprint: the JA3 hash of your TLS handshake (observed by the server, not by the JS tag)
  • Behavioral signals: mouse movement patterns, keystroke dynamics, touch events
  • Cookie validation: DataDome sets a datadome cookie that must be present on subsequent requests
  • Request patterns: timing, ordering, and header consistency
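The TLS bullet refers to JA3, an open fingerprinting method published by Salesforce. It takes five fields from the TLS ClientHello (version, cipher suites, extensions, elliptic curves, point formats), writes each field as decimal numbers joined by dashes, joins the fields with commas, and MD5-hashes the result. The sketch below computes it from hypothetical ClientHello values and drops GREASE placeholders (RFC 8701), as JA3 does. DataDome's exact TLS checks are proprietary, so treat this as an illustration of the idea rather than of DataDome's implementation.

```python
import hashlib

# GREASE values (RFC 8701): 0x0a0a, 0x1a1a, ..., 0xfafa. JA3 ignores them.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string from ClientHello fields given as integers."""

    def field(values):
        # Decimal values joined by "-", skipping GREASE placeholders
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join(
        [str(version), field(ciphers), field(extensions), field(curves), field(point_formats)]
    )


def ja3_hash(*fields):
    """MD5 of the JA3 string, as a 32-character hex digest."""
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()


# Hypothetical ClientHello values (not captured from a real browser)
hello = (
    771,                                # 0x0303, the legacy TLS 1.2 version field
    [0x1A1A, 4865, 4866, 4867, 49195],  # cipher suites, with one GREASE entry
    [0x2A2A, 0, 23, 65281, 10, 11],     # extension types
    [0x3A3A, 29, 23, 24],               # supported groups (elliptic curves)
    [0],                                # EC point formats
)

print(ja3_string(*hello))  # 771,4865-4866-4867-49195,0-23-65281-10-11,29-23-24,0
print(ja3_hash(*hello))
```

Because the hash covers the exact cipher and extension lists, two clients that differ in a single offered cipher produce different JA3 values.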

Identifying DataDome Protection

Look for these signs:

```text
# Response headers
Server: DataDome

# Cookie
Set-Cookie: datadome=...

# Blocked page URL pattern
https://geo.captcha-delivery.com/captcha/...
```
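These signs can be folded into a quick heuristic check. The sketch below uses a hypothetical helper, looks_like_datadome, that takes the response headers (any mapping with .items()) and the response body. It tests only for the three markers listed above, so a negative result does not prove a site is unprotected.

```python
from typing import Mapping

# Hypothetical helper: checks only the three signs listed above.
CAPTCHA_HOST = "geo.captcha-delivery.com"


def looks_like_datadome(headers: Mapping[str, str], body: str = "") -> bool:
    """Return True if a response shows any of the listed DataDome markers."""
    set_cookies = []
    for key, value in headers.items():
        name = key.lower()  # header names are case-insensitive
        if name == "server" and "datadome" in value.lower():
            return True
        if name == "set-cookie":
            set_cookies.append(value)
    # Look for a datadome= cookie in any Set-Cookie value (merged or separate)
    if any("datadome=" in c.lower() for c in set_cookies):
        return True
    # Blocked pages reference the CAPTCHA host
    return CAPTCHA_HOST in body
```

With requests, for example, you would call it as looks_like_datadome(resp.headers, resp.text).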

Why Simple Approaches Fail

Plain HTTP requests fail because DataDome expects:

  1. A valid datadome cookie, obtained through JavaScript execution
  2. A TLS fingerprint that matches a real browser (Python's requests library has a recognizable TLS signature)
  3. Behavioral data collected by the injected JavaScript, which an HTTP client that never runs the script cannot produce
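To see one ingredient of that TLS signature, you can list the cipher suites Python's ssl module enables by default; requests builds its connections on this module through urllib3, though urllib3 may adjust the context slightly. The list comes from the OpenSSL build Python is linked against, not from any browser, so the output is machine-specific.

```python
import ssl

# Default client context, similar to what urllib3 (and so requests) starts from
ctx = ssl.create_default_context()

# Names of the cipher suites this Python/OpenSSL build has enabled
cipher_names = [c["name"] for c in ctx.get_ciphers()]

print(ssl.OPENSSL_VERSION)
print(f"{len(cipher_names)} cipher suites enabled")
for name in cipher_names[:5]:
    print(" ", name)
```

A browser such as Chrome advertises its own list and ordering, which is why a default Python client's fingerprint does not match any real browser.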

Method 1: curl_cffi for TLS Fingerprint Impersonation

```python
from curl_cffi import requests

# impersonate="chrome124" makes curl_cffi send Chrome 124's TLS and HTTP/2 fingerprint
session = requests.Session(impersonate="chrome124")

response = session.get("https://datadome-protected-site.com", timeout=30)
print(response.status_code)

# The session stores cookies between requests, so a datadome cookie set by
# the first response is sent back automatically on the next one
response2 = session.get("https://datadome-protected-site.com/page2", timeout=30)
print(response2.status_code)
```

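Install with pip install curl_cffi. This works because curl_cffi wraps curl-impersonate, a curl build that uses BoringSSL (Chrome's own TLS library) for Chrome targets and reproduces the browser's TLS ClientHello and HTTP/2 settings, instead of the handshake Python's default stack produces. It does not execute JavaScript, however, so it cannot supply the behavioral data DataDome's tag collects.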
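DataDome also scores request timing, so it makes sense not to retry instantly after a block. Below is a sketch of a generic retry helper with exponential backoff and "full jitter" (a random wait between zero and a doubling ceiling). fetch_with_backoff is a hypothetical name, and deciding which responses count as blocked is left to the caller.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    is_blocked: Callable[[T], bool],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until is_blocked() is False or max_attempts calls are used.

    Between attempts, waits uniform(0, min(max_delay, base_delay * 2**attempt))
    seconds. Returns the last response, which may still be blocked.
    """
    response = fetch()
    for attempt in range(1, max_attempts):
        if not is_blocked(response):
            break
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, ceiling))  # randomized pause spreads retries out
        response = fetch()
    return response
```

With the curl_cffi session above, you might call fetch_with_backoff(lambda: session.get(url), lambda r: r.status_code in (403, 429)). Treating 403 and 429 as block signals is an assumption; adjust it to what the target actually returns. The sleep parameter exists so the helper can be tested without real waiting.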

Method 2: Playwright with Careful Configuration

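```python
from playwright.sync_api import sync_playwright
import random


def scrape_datadome_site(url):
    with sync_playwright() as p:
        # Headed mode needs a display (or a virtual one such as Xvfb on a server)
        browser = p.chromium.launch(headless=False)
        try:
            context = browser.new_context(
                viewport={"width": 1920, "height": 1080},
                locale="en-US",
            )
            page = context.new_page()

            # Navigate and give DataDome's JS time to run
            page.goto(url, wait_until="networkidle")
            # Playwright's own wait (milliseconds) instead of time.sleep,
            # so the sync API keeps processing browser events during the pause
            page.wait_for_timeout(random.uniform(2000, 4000))

            # Move the cursor in several small steps instead of one instant jump
            page.mouse.move(
                random.randint(100, 800),
                random.randint(100, 600),
                steps=random.randint(10, 25),
            )
            page.wait_for_timeout(random.uniform(500, 1500))

            return page.content()
        finally:
            # Close the browser even if navigation fails
            browser.close()
```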
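Since DataDome expects the datadome cookie on subsequent requests, it can be worth confirming the browser context actually holds one before navigating further. The sketch assumes Playwright's BrowserContext.cookies(), which returns a list of cookie dicts with a "name" key; wait_for_cookie itself is a hypothetical helper.

```python
import time
from typing import Callable


def wait_for_cookie(
    context,
    name: str = "datadome",
    timeout_s: float = 10.0,
    poll_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll context.cookies() until a cookie called `name` appears.

    Returns True if it appears, False once timeout_s seconds have passed.
    """
    deadline = clock() + timeout_s
    while True:
        if any(c["name"] == name for c in context.cookies()):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Inside scrape_datadome_site you could call wait_for_cookie(context, sleep=lambda s: page.wait_for_timeout(s * 1000)) right after page.goto, so the polling pauses use Playwright's wait rather than time.sleep.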

Recommended: Use a Scraping API

DataDome is one of the hardest anti-bot systems to bypass reliably. Services like ScraperAPI and ScrapingAnt maintain dedicated infrastructure for bypassing DataDome at scale, saving you significant development time.

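```python
import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://datadome-protected-site.com"

# Pass parameters via params= so requests URL-encodes the target URL
resp = requests.get(
    "https://api.scraperapi.com/",
    params={"api_key": API_KEY, "url": url, "render": "true"},
    timeout=60,  # rendered requests can be slow
)
print(resp.status_code)
```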
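Building the API URL with an f-string such as ...&url={url} only works while the target URL has no query string of its own. The sketch below, using only urllib.parse, shows the query parameters the API server would receive in each case; the target URL and the KEY placeholder are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_BASE = "https://api.scraperapi.com/"
# Hypothetical target whose own query string contains "&"
target = "https://datadome-protected-site.com/search?q=shoes&page=2"

# Raw f-string: the target's "&page=2" escapes into the API's own parameters
naive = f"{API_BASE}?api_key=KEY&url={target}&render=true"

# urlencode percent-encodes each value, so the target stays one "url" parameter
encoded = API_BASE + "?" + urlencode({"api_key": "KEY", "url": target, "render": "true"})


def api_params(full_url):
    """Decode the query string the API server would see."""
    return parse_qs(urlsplit(full_url).query)


print(api_params(naive)["url"])    # ['https://datadome-protected-site.com/search?q=shoes']
print(api_params(encoded)["url"])  # ['https://datadome-protected-site.com/search?q=shoes&page=2']
```

requests applies the same encoding automatically when the parameters are passed as a dict via params=.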

Key Takeaways

  • DataDome is harder to bypass than Cloudflare; expect lower success rates with DIY solutions
  • TLS fingerprint impersonation (curl_cffi) is essential for HTTP-level approaches
  • A real browser with behavioral simulation gives the best DIY results
  • For production scraping, a managed API service is often the most cost-effective option