Scraping Central is reader-supported. When you buy through links on our site, we may earn an affiliate commission.


WebSocket Bot Detection Techniques and How to Bypass Them

Learn how anti-bot systems use WebSocket connections for real-time bot detection and how to handle WebSocket-based challenges when scraping.

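Some anti-bot systems use WebSocket connections to monitor browser behavior continuously, in real time. This is harder to bypass than traditional HTTP-based detection. Instead of passing a single check at page load, the browser keeps reporting over a persistent connection for the whole session, so the signals it sends have to stay realistic the entire time.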

How WebSocket Bot Detection Works

Instead of one-time challenge-response checks, WebSocket-based detection:

  1. Opens a Persistent Connection: The detection script connects to a WebSocket server on page load
  2. Streams Behavioral Data: Mouse movements, scroll events, and keystrokes are sent continuously
  3. Receives Instructions: The server can request additional checks or trigger challenges in real time
  4. Validates Consistency: The stream of events is analyzed for bot-like patterns (perfectly even intervals, no idle periods, impossible speeds)
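
The "Validates Consistency" step can be illustrated with a few simple heuristics, one per pattern listed above. The sketch below is a hypothetical server-side checker: the `PointerEvent` type, the `bot_signals` function, and every threshold are assumptions chosen for illustration, not how any particular anti-bot vendor actually scores traffic.

```python
import statistics
from dataclasses import dataclass


@dataclass
class PointerEvent:
    t: float  # timestamp in seconds
    x: float  # pixels
    y: float  # pixels


def bot_signals(events, max_speed=10_000.0, idle_gap=1.0, min_jitter=0.01):
    """Return heuristic red flags for a stream of pointer events.

    Hypothetical: thresholds are illustrative assumptions only.
    """
    flags = []
    if len(events) < 3:
        return flags
    intervals = [b.t - a.t for a, b in zip(events, events[1:])]

    # Perfectly even intervals: almost no spread relative to the mean gap
    mean = statistics.mean(intervals)
    if mean > 0 and statistics.pstdev(intervals) / mean < min_jitter:
        flags.append("even_intervals")

    # No idle periods: humans pause, so a stream with no long gap stands out
    if max(intervals) < idle_gap:
        flags.append("no_idle_periods")

    # Impossible speeds: pixels per second between consecutive events
    for a, b, dt in zip(events, events[1:], intervals):
        dist = ((b.x - a.x) ** 2 + (b.y - a.y) ** 2) ** 0.5
        if dt > 0:
            speed = dist / dt
        else:
            speed = float("inf") if dist > 0 else 0.0
        if speed > max_speed:
            flags.append("impossible_speed")
            break
    return flags
```

A scripted stream that moves 10 px every 50 ms with no pauses would trip both the even-interval and the no-idle checks, while a stream with irregular gaps and at least one pause would not.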

Identifying WebSocket Detection

Check for WebSocket connections in your browser's DevTools (Network tab, filter by WS), or log them from a Playwright script:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()

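    # Monitor WebSocket connections (register before navigating so
    # sockets opened during page load are caught)
    page.on("websocket", lambda ws: print(f"WebSocket opened: {ws.url}"))

    page.goto("https://target-site.com")
    # Give detection scripts a few seconds to open their connections
    page.wait_for_timeout(5000)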

    browser.close()
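
Once you have a list of socket URLs (for example by appending `ws.url` to a list inside the `page.on("websocket", ...)` callback instead of printing it), a quick first pass is to separate sockets served from the page's own domain from those going to other hosts. The `third_party_sockets` helper below is a hypothetical sketch; a cross-origin socket is only a hint, since chat widgets, analytics, and live-update feeds also use WebSockets.

```python
from urllib.parse import urlsplit


def third_party_sockets(page_url, ws_urls):
    """Return WebSocket URLs whose host is not the page's host or a subdomain of it.

    Hypothetical helper. Limitation: it compares hostnames only, so a page on
    www.example.com will treat api.example.com as third-party.
    """
    page_host = urlsplit(page_url).hostname or ""
    suspects = []
    for url in ws_urls:
        parts = urlsplit(url)
        # Only ws:// and wss:// URLs are WebSocket endpoints
        if parts.scheme not in ("ws", "wss"):
            continue
        host = parts.hostname or ""
        first_party = host == page_host or host.endswith("." + page_host)
        if not first_party:
            suspects.append(url)
    return suspects
```

Any URL this returns is worth opening in DevTools to see what it sends.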

Handling WebSocket Detection with Playwright

The key is to let the WebSocket connection function normally while providing realistic behavioral signals.

from playwright.sync_api import sync_playwright
import random
import time

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://target-site.com")

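    # Simulate human-like behavior for the WebSocket monitor to report
    for _ in range(10):
        # Random mouse movements; steps > 1 makes Playwright emit intermediate
        # mousemove events instead of jumping straight to the target
        x = random.randint(100, 1200)
        y = random.randint(100, 700)
        page.mouse.move(x, y, steps=random.randint(5, 20))
        # Uneven pauses so the event stream has no fixed rhythm
        time.sleep(random.uniform(0.1, 0.5))

    # Simulate scrolling (mouse.wheel takes delta_x, delta_y in pixels)
    page.mouse.wheel(0, random.randint(200, 500))
    time.sleep(random.uniform(0.5, 1.5))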

    content = page.content()
    print(content[:500])
    browser.close()
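
Playwright's `mouse.move` always travels in a straight line, even when `steps` spreads the move over several events, and those intermediate events are evenly spaced. One common way to make a path look less mechanical is to follow a curve with uneven timing. The sketch below is an assumption, not a model any detector is known to accept: `human_mouse_path` is a hypothetical helper that samples a quadratic Bezier curve with a random control point, uses ease-in-out spacing, and jitters each delay.

```python
import math
import random


def human_mouse_path(start, end, steps=25, rng=random):
    """Return (x, y, delay_seconds) tuples from start to end, excluding start.

    Hypothetical helper: a quadratic Bezier curve with ease-in-out spacing
    is only a rough stand-in for real hand movement.
    """
    (x0, y0), (x2, y2) = start, end
    # Control point: midpoint pushed sideways, perpendicular to the straight line
    mx, my = (x0 + x2) / 2, (y0 + y2) / 2
    dx, dy = x2 - x0, y2 - y0
    length = math.hypot(dx, dy) or 1.0
    bend = rng.uniform(-0.3, 0.3) * length
    cx, cy = mx - dy / length * bend, my + dx / length * bend

    points = []
    for i in range(1, steps + 1):
        # Ease-in-out: short hops near both ends, longer hops in the middle
        t = (1 - math.cos(math.pi * i / steps)) / 2
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y2
        # Jittered delay so intervals are never perfectly even
        delay = rng.uniform(0.005, 0.025)
        points.append((x, y, delay))
    return points


# Possible use inside the Playwright script above (not run here):
# for x, y, delay in human_mouse_path((100, 100), (800, 450)):
#     page.mouse.move(x, y)
#     time.sleep(delay)
```

The last point always lands exactly on `end`, so the helper can be chained: pass each target as the next call's `start`.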

Intercepting WebSocket Messages

For debugging, you can log the WebSocket frames the page sends and receives:

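# Attach to the `page` from the examples above, before calling page.goto()
def handle_ws(ws):
    print(f"WebSocket opened: {ws.url}")
    # Payloads are str for text frames and bytes for binary frames
    ws.on("framesent", lambda payload: print(f"SENT: {payload[:100]}"))
    ws.on("framereceived", lambda payload: print(f"RECV: {payload[:100]}"))

page.on("websocket", handle_ws)
page.goto("https://target-site.com")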
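
Printing frames works for a quick look, but over a longer session it helps to collect them and summarize. `FrameLog` below is a hypothetical helper, not part of Playwright: it counts frames per direction, estimates the frame rate, and, for text frames that parse as JSON objects, tallies the field names, which is often the fastest way to see which behavioral data the detection script reports.

```python
import json
import time
from collections import Counter


class FrameLog:
    """Hypothetical collector for WebSocket frames captured with Playwright."""

    def __init__(self):
        self.frames = []  # (timestamp, direction, payload)

    def record(self, direction, payload, ts=None):
        # Use a monotonic clock unless a timestamp is supplied explicitly
        self.frames.append((time.monotonic() if ts is None else ts, direction, payload))

    def summary(self):
        counts = Counter(direction for _, direction, _ in self.frames)
        keys = Counter()
        for _, _, payload in self.frames:
            # Only text frames can be JSON; binary frames arrive as bytes
            if isinstance(payload, str):
                try:
                    data = json.loads(payload)
                except ValueError:
                    continue
                if isinstance(data, dict):
                    keys.update(data.keys())
        times = [ts for ts, _, _ in self.frames]
        duration = max(times) - min(times) if len(times) > 1 else 0.0
        rate = len(times) / duration if duration > 0 else 0.0
        return {"counts": dict(counts), "json_keys": dict(keys), "frames_per_sec": rate}


# Possible wiring with Playwright (not run here):
# log = FrameLog()
# def handle_ws(ws):
#     ws.on("framesent", lambda p: log.record("sent", p))
#     ws.on("framereceived", lambda p: log.record("recv", p))
# page.on("websocket", handle_ws)
```

A high sent-frame rate with keys resembling coordinates or event types is a strong hint that the socket is streaming behavioral data rather than page content.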

The Simplest Solution

WebSocket-based detection is hard to bypass by hand because the behavioral stream has to stay realistic for the entire session, not just at page load. ScraperAPI with render=true handles these sites by running real browser sessions that naturally generate valid WebSocket traffic.

import requests

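response = requests.get(
    "http://api.scraperapi.com",
    params={
        "api_key": "YOUR_SCRAPERAPI_KEY",
        "url": "https://ws-protected-site.com",
        "render": "true",  # fetch the page with a real browser
    },
    timeout=70,  # rendered requests can take a while to complete
)
response.raise_for_status()
print(response.text[:500])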

Key Points

  • WebSocket detection is becoming more common on high-value sites
  • It requires continuous behavioral data, not just a one-time check
  • Full browser automation with human-like behavior simulation is necessary
  • Managed APIs are the most practical solution for WebSocket-protected sites