Understanding and Bypassing DataDome
Learn how DataDome's anti-bot system works and strategies to scrape DataDome-protected websites.
DataDome is a commercial anti-bot service used by major e-commerce and media sites. It is generally harder to bypass than Cloudflare because it relies heavily on client-side signals collected in the browser, which a plain HTTP client cannot produce.
How DataDome Works
DataDome injects a JavaScript tag that collects hundreds of signals from the browser:
- Device fingerprint: screen resolution, GPU, installed fonts, audio context
- TLS fingerprint: the JA3 hash of your TLS handshake
- Behavioral signals: mouse movement patterns, keystroke dynamics, touch events
- Cookie validation: DataDome sets a datadome cookie that must be present on subsequent requests
- Request patterns: timing, ordering, and header consistency
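To make the TLS fingerprint item concrete: a JA3 hash is the MD5 of five comma-separated ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats), with the values inside each field joined by hyphens and GREASE placeholders dropped. The sketch below is a minimal illustration of that format only. The function names are hypothetical, the field values in the usage note are made up, and how DataDome consumes the fingerprint internally is not public.

```python
import hashlib

# GREASE placeholder values (0x0A0A, 0x1A1A, ..., 0xFAFA) are skipped by JA3
# so that a browser's randomized placeholders do not change the hash.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string from decimal ClientHello values."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()
```

For example, ja3_string(769, [47, 53], [0, 10, 11], [23, 24], [0]) yields "769,47-53,0-10-11,23-24,0". Two clients that offer different cipher or extension lists hash differently, which is why an HTTP library's handshake can be told apart from a browser's.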
Identifying DataDome Protection
Look for these signs:
# Response headers
Server: DataDome
# Cookie
Set-Cookie: datadome=...
# Blocked page URL pattern
https://geo.captcha-delivery.com/captcha/...
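The three signs above can be folded into a small heuristic check. This is a sketch, not an official detection method: the function name looks_like_datadome is hypothetical, and it only tests for the Server header, the datadome cookie, and the captcha-delivery.com URL listed here.

```python
def looks_like_datadome(headers, body=""):
    """Return True if a response shows any of the DataDome signs listed above.

    headers: mapping of response header names to values
    body:    response body text (may contain the captcha page URL)
    """
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {k.lower(): str(v).lower() for k, v in headers.items()}

    server_is_datadome = "datadome" in lowered.get("server", "")
    sets_datadome_cookie = "datadome=" in lowered.get("set-cookie", "")
    has_captcha_url = "captcha-delivery.com" in body

    return server_is_datadome or sets_datadome_cookie or has_captcha_url
```

With an HTTP client whose response exposes headers and text, a call would look like looks_like_datadome(response.headers, response.text). A False result does not prove a site is unprotected; it only means none of these three markers appeared in that response.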
Why Simple Approaches Fail
Plain HTTP requests fail because DataDome requires:
- A valid datadome cookie generated by JavaScript execution
- Consistent TLS fingerprinting (Python's requests library has a recognizable TLS signature)
- Behavioral data collected by the injected JS
Method 1: curl_cffi for TLS Fingerprint Impersonation
from curl_cffi import requests
# curl_cffi can impersonate real browser TLS fingerprints
session = requests.Session(impersonate="chrome124")
response = session.get("https://datadome-protected-site.com")
print(response.status_code)
# The session preserves cookies including the datadome cookie
response2 = session.get("https://datadome-protected-site.com/page2")
print(response2.status_code)
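Install with pip install curl_cffi. This helps because curl_cffi is built on curl-impersonate, which reproduces a real browser's TLS handshake instead of Python's default one. It does not execute JavaScript, so it only addresses the TLS fingerprint check, not the behavioral data or a cookie that has to be generated in the browser.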
Method 2: Playwright with Careful Configuration
from playwright.sync_api import sync_playwright
import time
import random

def scrape_datadome_site(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = context.new_page()
        # Navigate and wait for DataDome JS to execute
        page.goto(url, wait_until="networkidle")
        time.sleep(random.uniform(2, 4))
        # Simulate human-like mouse movement
        page.mouse.move(random.randint(100, 800), random.randint(100, 600))
        time.sleep(random.uniform(0.5, 1.5))
        content = page.content()
        browser.close()
        return content
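
A single page.mouse.move call jumps the pointer to its target in one step. One way to get closer to the human-like movement described above is to feed Playwright a series of intermediate points. The helper below is a sketch under that assumption: mouse_path, its easing curve, and the jitter value are all illustrative choices, and nothing here is known to match what DataDome's behavioral model expects.

```python
import random


def mouse_path(start, end, points=20, jitter=3.0, rng=None):
    """Return a list of (x, y) waypoints from start to end.

    Waypoints follow an ease-in-out curve (slow, fast, slow) with a small
    random offset on every point except the last, which lands on `end`.
    """
    rng = rng or random.Random()
    (x0, y0), (x1, y1) = start, end
    path = []
    for i in range(1, points + 1):
        t = i / points
        eased = t * t * (3 - 2 * t)  # smoothstep easing
        x = x0 + (x1 - x0) * eased
        y = y0 + (y1 - y0) * eased
        if i < points:
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        path.append((round(x, 1), round(y, 1)))
    return path
```

Inside scrape_datadome_site, the single move could then be replaced by a loop that calls page.mouse.move(x, y) for each waypoint with a short random sleep between calls. Playwright's mouse.move also accepts a steps argument for plain linear interpolation if the easing and jitter are not needed.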
Recommended: Use a Scraping API
DataDome is one of the hardest anti-bot systems to bypass reliably. Services like ScraperAPI and ScrapingAnt maintain dedicated infrastructure for bypassing DataDome at scale, saving you significant development time.
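import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://datadome-protected-site.com"

# Pass values through params= so the target URL is percent-encoded correctly,
# even when it carries its own query string.
resp = requests.get(
    "https://api.scraperapi.com/",
    params={"api_key": API_KEY, "url": url, "render": "true"},
    timeout=70,  # rendered requests can be slow
)
print(resp.status_code)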
Key Takeaways
- DataDome is harder to bypass than Cloudflare; expect lower success rates with DIY solutions
- TLS fingerprint impersonation (curl_cffi) is essential for HTTP-level approaches
- A real browser with behavioral simulation gives the best DIY results
- For production scraping, a managed API service is often the most cost-effective option once development and maintenance time are counted