WebSocket Bot Detection Techniques and How to Bypass Them
Learn how anti-bot systems use WebSocket connections for real-time bot detection and how to handle WebSocket-based challenges when scraping.
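Some anti-bot systems use WebSocket connections to continuously monitor browser behavior in real time. This is harder to bypass than traditional HTTP-based detection: because the connection persists for the whole session, passing a single check is not enough, and the client has to keep producing plausible signals for as long as the page is open.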
How WebSocket Bot Detection Works
Instead of one-time challenge-response checks, WebSocket-based detection:
- Opens a persistent connection: the detection script connects to a WebSocket server on page load
- Streams behavioral data: mouse movements, scroll events, and keystrokes are sent continuously
- Receives instructions: the server can request additional checks or trigger challenges in real time
- Validates consistency: the stream of events is analyzed for bot-like patterns (perfectly even intervals, no idle periods, impossible speeds)
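The consistency check in the last bullet can be illustrated from the server's point of view. The sketch below is an assumption about what such a check might look like, not any vendor's actual logic: it measures how regular the gaps between event timestamps are using the coefficient of variation. The function names and the `min_cv` threshold are hypothetical.

```python
from statistics import mean, pstdev


def interval_cv(timestamps_ms):
    """Coefficient of variation of the gaps between consecutive timestamps."""
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    if len(gaps) < 2:
        return None  # not enough data to judge
    avg = mean(gaps)
    if avg <= 0:
        return 0.0  # non-increasing timestamps: treat as maximally suspicious
    return pstdev(gaps) / avg


def looks_scripted(timestamps_ms, min_cv=0.1):
    """Flag streams whose timing is suspiciously regular (illustrative threshold)."""
    cv = interval_cv(timestamps_ms)
    return cv is not None and cv < min_cv


scripted = [0, 100, 200, 300, 400, 500]   # perfectly even intervals
human = [0, 80, 310, 350, 900, 1020]      # irregular gaps with a pause
print(looks_scripted(scripted))  # True
print(looks_scripted(human))     # False
```

A real system would likely combine many more signals (idle periods, pointer speed, event ordering), so a scraper that merely randomizes delays may still stand out on the other dimensions.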
Identifying WebSocket Detection
Check for WebSocket connections in your browser DevTools (Network tab, filter by WS), or log them programmatically with Playwright:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()

    # Monitor WebSocket connections
    page.on("websocket", lambda ws: print(f"WebSocket opened: {ws.url}"))

    page.goto("https://target-site.com")
    page.wait_for_timeout(5000)
    browser.close()
Handling WebSocket Detection with Playwright
The key is to let the WebSocket connection function normally while providing realistic behavioral signals.
from playwright.sync_api import sync_playwright
import random

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://target-site.com")

    # Simulate human-like behavior that WebSocket monitors report
    for _ in range(10):
        # Random mouse movements; steps > 1 makes the pointer travel
        # through intermediate positions instead of jumping
        x = random.randint(100, 1200)
        y = random.randint(100, 700)
        page.mouse.move(x, y, steps=random.randint(10, 25))
        # wait_for_timeout (milliseconds) keeps Playwright processing events,
        # unlike time.sleep, which blocks the sync API
        page.wait_for_timeout(random.uniform(100, 500))

        # Simulate scrolling
        page.mouse.wheel(0, random.randint(200, 500))
        page.wait_for_timeout(random.uniform(500, 1500))

    content = page.content()
    print(content[:500])
    browser.close()
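The loop above moves between uniformly random targets, which is itself a fairly unnatural pattern. As a minimal sketch of one alternative, the hypothetical helper below generates an eased path between two points, slow at the ends and faster in the middle, with a little jitter. The name `eased_path` and its parameters are illustrative and not part of Playwright.

```python
import random


def eased_path(start, end, steps=25, jitter=2.0, rng=None):
    """Return `steps` points from start to end along a smoothstep-eased line."""
    rng = rng or random.Random()
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(1, steps + 1):
        t = i / steps
        eased = t * t * (3 - 2 * t)  # smoothstep: slow start, slow finish
        x = x0 + (x1 - x0) * eased
        y = y0 + (y1 - y0) * eased
        if i < steps:
            # Jitter every point except the last so the path ends on target
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        points.append((round(x, 1), round(y, 1)))
    return points


path = eased_path((100, 100), (500, 300), steps=20, rng=random.Random(0))
print(len(path), path[-1])
```

Each point can then be passed to `page.mouse.move(x, y)` with a short randomized `page.wait_for_timeout(...)` between calls. Whether this is enough to satisfy a particular detector is not something this sketch can guarantee.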
Intercepting WebSocket Messages
For debugging, you can intercept WebSocket frames:
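def handle_ws(ws):
    # payload is a str for text frames and bytes for binary frames
    ws.on("framesent", lambda payload: print(f"SENT: {payload[:100]!r}"))
    ws.on("framereceived", lambda payload: print(f"RECV: {payload[:100]!r}"))

# Register before page.goto() so sockets opened during page load are captured
page.on("websocket", handle_ws)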
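Printing the first 100 characters gets noisy once a page streams many frames. A small helper, sketched below under the assumption that many detection scripts send JSON text frames (which may not hold for a given site), can reduce each frame to a one-line summary. The name `summarize_frame` and the field names in the example are hypothetical.

```python
import json


def summarize_frame(payload, limit=100):
    """Describe a WebSocket frame payload (str or bytes) in one short line."""
    if isinstance(payload, bytes):
        return f"binary frame, {len(payload)} bytes"
    try:
        data = json.loads(payload)
    except ValueError:
        # Not JSON: fall back to a truncated preview of the text
        return f"text: {payload[:limit]}"
    if isinstance(data, dict):
        return "json keys: " + ", ".join(sorted(data))
    return f"json {type(data).__name__}"


print(summarize_frame('{"type": "mousemove", "x": 10}'))
print(summarize_frame(b"\x00\x01\x02"))
print(summarize_frame("ping"))
```

Inside `handle_ws`, the lambdas could call `summarize_frame(payload)` instead of slicing the payload directly.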
The Simplest Solution
WebSocket-based detection is complex to bypass manually because you need to maintain realistic behavioral streams throughout the session. ScraperAPI with render=true handles these sites by running real browser sessions that naturally generate valid WebSocket traffic.
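import requests

response = requests.get(
    "https://api.scraperapi.com",  # HTTPS so the API key is not sent in cleartext
    params={
        "api_key": "YOUR_SCRAPERAPI_KEY",
        "url": "https://ws-protected-site.com",
        "render": "true",
    },
    timeout=70,  # rendered requests can be slow; adjust as needed
)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.text[:500])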
Key Points
- WebSocket detection is becoming more common on high-value sites
- It requires continuous behavioral data, not just a one-time check
- Full browser automation with human-like behavior simulation is necessary
- Managed APIs are the most practical solution for WebSocket-protected sites