
3.44 · advanced · 4 min read

Socket.IO and SignalR Protocols

Not all real-time is raw WebSockets. Socket.IO and SignalR are higher-level protocols with their own handshakes, message framing, and quirks.

What you’ll learn

  • Recognise Socket.IO and SignalR handshake patterns.
  • Connect using protocol-aware libraries (python-socketio, signalrcore).
  • Subscribe to events and consume messages.
  • Handle reconnection, ack, and namespace patterns.

Raw WebSockets are the transport. Socket.IO (Node.js / browsers) and SignalR (.NET) are higher-level libraries that ride on top of WebSockets (and sometimes long-polling fallbacks) to provide events, rooms, reconnection, and serialization. They're widely deployed; scrapers must recognize and speak them.

This lesson is the protocol-aware playbook.

Socket.IO

Used by: Node.js apps, many JS SPAs, chat applications, real-time dashboards.

Signs in DevTools:

  • WebSocket URLs containing /socket.io/?EIO=4&transport=websocket.
  • Messages prefixed with digits: 0{…}, 2, 3, 40, 41, 42[…], etc.
  • First message often 0{"sid":"abc","upgrades":[],"pingInterval":25000,"pingTimeout":20000}.

The first digit of every frame is the Engine.IO packet type:

  • 0, open
  • 1, close
  • 2, ping
  • 3, pong
  • 4, message
  • On message frames (4), the next digit is the Socket.IO packet type: 0 connect, 1 disconnect, 2 event, 3 ack, etc.

So 42["hello","world"] is "engine.io message, socket.io event, named 'hello' with arg 'world'."
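The prefix codes listed above can be turned into a small lookup. This is a minimal sketch for reading captured frames by eye; the name `describe_prefix` and the label strings are illustrative and not part of any library.

```python
# Engine.IO packet types (first digit of every frame).
ENGINE_IO = {"0": "open", "1": "close", "2": "ping", "3": "pong", "4": "message"}
# Socket.IO packet types (second digit, only present on Engine.IO "message" frames).
SOCKET_IO = {"0": "connect", "1": "disconnect", "2": "event", "3": "ack"}


def describe_prefix(frame: str) -> str:
    """Return a human-readable label for the numeric prefix of a text frame."""
    if not frame or frame[0] not in ENGINE_IO:
        return "unknown"
    label = "engine.io " + ENGINE_IO[frame[0]]
    # Only "message" frames carry a Socket.IO packet type as the next character.
    if frame[0] == "4" and len(frame) > 1 and frame[1] in SOCKET_IO:
        label += ", socket.io " + SOCKET_IO[frame[1]]
    return label


for sample in ('0{"sid":"abc"}', "2", "40", '42["hello","world"]'):
    print(sample, "->", describe_prefix(sample))
```

Pasting a few frames from the DevTools Messages tab into a helper like this is a quick way to confirm you are looking at Socket.IO before reaching for a client library.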

Catalog108 Socket.IO shim

/api/ws/socketio and /challenges/api/websocket/socketio simulate Socket.IO endpoints via polling so you can practice without WebSocket infrastructure.

Python, python-socketio

import socketio

sio = socketio.Client()

@sio.event
def connect():
  print("connected")
  sio.emit("subscribe", {"room": "prices"})

@sio.on("price_tick")
def on_tick(data):
  print("tick:", data)

@sio.event
def disconnect():
  print("disconnected")

sio.connect("https://example.com")
sio.wait()

The library handles the Engine.IO/Socket.IO framing, reconnection, and ack management. You write event handlers; it takes care of the protocol mechanics.

For a real-world target it's almost always the right tool, far cheaper than reverse-engineering the framing manually.

Async variant

import asyncio, socketio

sio = socketio.AsyncClient()

@sio.event
async def connect():
  print("connected")
  await sio.emit("subscribe", {"channel": "prices"})

@sio.on("price_tick")
async def tick(data):
  print(data)

async def main():
  await sio.connect("https://example.com")
  await sio.wait()

asyncio.run(main())

Auth on Socket.IO

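token = "abc123"  # placeholder; copy the real value from a captured session

# Socket.IO auth payload, sent with the connect packet
sio.connect("https://example.com", auth={"token": f"Bearer {token}"})
# or an HTTP header on the handshake request
sio.connect("https://example.com", headers={"Authorization": f"Bearer {token}"})
# or pass via query
sio.connect(f"https://example.com?token={token}")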

Which variant applies depends on the server implementation. Capture a real connection in DevTools and mirror whatever the browser sends.

Namespaces and rooms

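Socket.IO supports namespaces (think paths: /admin, /chat) and rooms within them. Rooms are a server-side concept: a client lands in one only by emitting whatever event the server expects. Namespaces are chosen by the client, so specify them in connect: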

sio.connect("https://example.com", namespaces=["/chat"])

@sio.on("message", namespace="/chat")
def msg(data):
  print(data)

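Capture in DevTools: the namespace does not appear in the WebSocket URL. It appears inside the frames, between the packet type and the payload: 40/chat, for the namespace connect and 42/chat,["message",…] for events.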

SignalR

Used by: .NET/ASP.NET apps, Microsoft products, many enterprise dashboards.

Signs in DevTools:

  • Endpoints under /hubs/<hubname> or /signalr/.
  • Initial POST to /hubs/<name>/negotiate?negotiateVersion=1 (ASP.NET Core SignalR; classic ASP.NET SignalR uses a GET to /signalr/negotiate).
  • Subsequent WebSocket to /hubs/<name>?id=<token>.
  • Messages framed with \x1e (RecordSeparator) as delimiter.

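The negotiate response returns connection metadata (a connection ID or token) and the transports the server supports. The client then opens the WebSocket, exchanges a one-message JSON handshake, and hub messages start flowing.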
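To make the negotiate step concrete, here is a minimal sketch that turns a negotiate response into the WebSocket URL. The field names (`connectionToken`, `connectionId`, `availableTransports`) follow what ASP.NET Core SignalR servers typically return, but treat them as assumptions and check them against a response captured from your target; `websocket_url` is a hypothetical helper and the sample values are made up.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def websocket_url(hub_url: str, negotiate_response: dict) -> str:
    """Build the WebSocket URL for a hub from its negotiate response."""
    transports = {t["transport"] for t in negotiate_response.get("availableTransports", [])}
    if "WebSockets" not in transports:
        raise ValueError("server did not offer the WebSockets transport")
    # negotiateVersion 1 returns connectionToken; fall back to connectionId otherwise.
    token = negotiate_response.get("connectionToken") or negotiate_response["connectionId"]
    parts = urlsplit(hub_url)
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, parts.path, urlencode({"id": token}), ""))


# Sample response shape; the values are invented for illustration.
sample = {
    "negotiateVersion": 1,
    "connectionId": "conn-123",
    "connectionToken": "tok-456",
    "availableTransports": [
        {"transport": "WebSockets", "transferFormats": ["Text", "Binary"]},
        {"transport": "LongPolling", "transferFormats": ["Text", "Binary"]},
    ],
}
print(websocket_url("https://example.com/hubs/prices", sample))
```

In practice a library does this for you; the sketch is mainly useful for matching what you see in DevTools to the two requests involved.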

Python, signalrcore

import time

from signalrcore.hub_connection_builder import HubConnectionBuilder

token = "abc123"  # placeholder; use the token captured from a real session

hub = (
  HubConnectionBuilder()
  .with_url("https://example.com/hubs/prices", options={
    "access_token_factory": lambda: token,
  })
  .with_automatic_reconnect({"type": "raw", "keep_alive_interval": 10, "reconnect_interval": 5, "max_attempts": 5})
  .build()
)

def on_open():
  print("connected")
  hub.send("subscribe", ["BTC-USD"])  # invoke server method once the connection is open

hub.on("priceTick", lambda args: print("tick:", args))
hub.on_open(on_open)
hub.start()

# Keep the main thread alive so the handlers keep receiving messages.
while True:
  time.sleep(1)

signalrcore handles the negotiation, WebSocket, message framing, and reconnection.

Manual decoding (advanced)

If no library exists for your language:

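import json
import re

# Engine.IO type, optional Socket.IO type, optional "/namespace,", optional ack id, optional JSON
FRAME_RE = re.compile(r"^(\d)(\d)?(?:(/[^,]*),)?(\d+)?([\[{].*)?$", re.S)

def decode_socketio(frame: str):
  m = FRAME_RE.match(frame)
  if not m:
    return None
  eio_type, sio_type, namespace, ack_id, payload = m.groups()
  return {
    "code": eio_type + (sio_type or ""),
    "namespace": namespace or "/",
    "ack_id": int(ack_id) if ack_id else None,
    "payload": json.loads(payload) if payload else None,
  }

print(decode_socketio('42["price_tick",{"symbol":"BTC","price":42000}]'))
# → {'code': '42', 'namespace': '/', 'ack_id': None, 'payload': ['price_tick', {'symbol': 'BTC', 'price': 42000}]}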

For SignalR you split on \x1e and JSON-parse each chunk.
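The SignalR side of manual decoding can be sketched in a few lines, assuming the JSON hub protocol over text frames. The helper names `decode_signalr` and `encode_signalr` are hypothetical, and the sample frame is invented; in the JSON protocol, type 1 is an invocation and type 6 is a ping.

```python
import json

RECORD_SEPARATOR = "\x1e"


def decode_signalr(data: str) -> list:
    """Split a text frame on the record separator and JSON-parse each message."""
    return [json.loads(chunk) for chunk in data.split(RECORD_SEPARATOR) if chunk]


def encode_signalr(message: dict) -> str:
    """Serialize one message and terminate it with the record separator."""
    return json.dumps(message, separators=(",", ":")) + RECORD_SEPARATOR


# One WebSocket frame can carry several messages back to back.
frame = '{"type":6}\x1e{"type":1,"target":"priceTick","arguments":[{"symbol":"BTC","price":42000}]}\x1e'
for message in decode_signalr(frame):
    print(message)

# The handshake a JSON-protocol client sends right after the socket opens.
print(repr(encode_signalr({"protocol": "json", "version": 1})))
```

The trailing separator matters in both directions: a message without it is incomplete, and a frame may end with one, which is why empty chunks are skipped.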

This is the protocol-stripping approach. Use it only when no library is available.

Reconnection semantics

Both protocols support reconnection:

  • Socket.IO: auto-reconnect with backoff is built into the client libraries, but resubscribing is NOT automatic. You must re-emit your subscribes each time the connect event fires.
  • SignalR: reconnection is opt-in through the with_automatic_reconnect configuration in signalrcore. Hub state (subscriptions, group membership) must be re-established after each reconnect.

Treat every connect event as a fresh start: re-subscribe, re-authenticate (if first-message auth), re-join rooms.
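One way to apply the "fresh start" rule is to keep subscriptions in a small registry and replay them from the connect handler. The sketch below is library-agnostic: `SubscriptionRegistry` is a hypothetical helper, and a plain list stands in for the client's emit function so the example runs without a server.

```python
class SubscriptionRegistry:
    """Remembers subscribe calls so they can be replayed on every connect."""

    def __init__(self):
        self._subscriptions = []

    def add(self, event: str, payload) -> None:
        entry = (event, payload)
        if entry not in self._subscriptions:
            self._subscriptions.append(entry)

    def replay(self, emit) -> int:
        """Call emit(event, payload) for each stored subscription."""
        for event, payload in self._subscriptions:
            emit(event, payload)
        return len(self._subscriptions)


registry = SubscriptionRegistry()
registry.add("subscribe", {"room": "prices"})
registry.add("subscribe", {"room": "trades"})

# Stand-in for the client's emit so the sketch runs offline.
sent = []
registry.replay(lambda event, payload: sent.append((event, payload)))  # first connect
registry.replay(lambda event, payload: sent.append((event, payload)))  # after a reconnect
print(sent)
```

With python-socketio, the same idea would be calling `registry.replay(sio.emit)` inside the connect handler, since that handler runs again after every successful reconnect.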

Acks

Socket.IO has acks:

# emit with ack callback
sio.emit("get_price", "BTC-USD", callback=lambda data: print("ack:", data))

The server sends back an ack response. Useful for request-response patterns over WebSocket (much like HTTP, but on the persistent connection).
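On the wire, an ack is just an event frame with a numeric id after the packet type, answered by a 43 frame carrying the same id. The following is a rough sketch of that pairing for the default namespace only; `AckTracker` is a hypothetical class, not how python-socketio is implemented, and the sample reply is invented.

```python
import json
import re

ACK_RE = re.compile(r"^43(\d+)(\[.*\])$", re.S)


class AckTracker:
    """Pairs outgoing events with the ack frames that answer them."""

    def __init__(self):
        self._next_id = 0
        self._pending = {}

    def encode_event(self, event: str, *args, callback) -> str:
        ack_id = self._next_id
        self._next_id += 1
        self._pending[ack_id] = callback
        # "42" = Engine.IO message + Socket.IO event, followed by the ack id.
        return "42" + str(ack_id) + json.dumps([event, *args], separators=(",", ":"))

    def handle_frame(self, frame: str) -> bool:
        """Run the stored callback if the frame is an ack we are waiting for."""
        m = ACK_RE.match(frame)
        if not m or int(m.group(1)) not in self._pending:
            return False
        callback = self._pending.pop(int(m.group(1)))
        callback(*json.loads(m.group(2)))
        return True


tracker = AckTracker()
replies = []
print(tracker.encode_event("get_price", "BTC-USD", callback=replies.append))
tracker.handle_frame('430[{"symbol":"BTC-USD","price":42000}]')
print(replies)
```

Seeing frames like 420[…] followed by 430[…] in DevTools is a sign the site uses acks for request-response calls, which the callback argument of emit handles for you.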

When to use these vs raw WS

Always use the protocol-aware library if the server uses Socket.IO or SignalR:

  • It handles framing.
  • It handles reconnection and ack.
  • It handles transport fallback (long-polling if WS fails).
  • It produces cleaner, more maintainable scraper code.

Manual decoding is a last resort.
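Choosing the right library starts with recognising the protocol from a captured URL. Here is a heuristic sketch built on the DevTools signs described earlier; `detect_protocol` is a hypothetical helper, and the /hubs/ path is only a common convention, so treat a match as a hint to verify against the actual frames.

```python
from urllib.parse import parse_qs, urlsplit


def detect_protocol(url: str) -> str:
    """Guess the real-time protocol from a WebSocket or negotiate URL seen in DevTools."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    path = parts.path.lower()
    if "EIO" in query or "/socket.io/" in path:
        return "socket.io"
    if path.endswith("/negotiate") or "/signalr/" in path or "/hubs/" in path:
        return "signalr"
    return "raw-or-unknown"


for captured in (
    "wss://example.com/socket.io/?EIO=4&transport=websocket",
    "https://example.com/hubs/prices/negotiate?negotiateVersion=1",
    "wss://example.com/hubs/prices?id=abc",
    "wss://example.com/stream",
):
    print(detect_protocol(captured), captured)
```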

Hands-on lab

On Catalog108: hit /challenges/api/websocket/socketio and inspect the responses. Note the Socket.IO-style framing. Use python-socketio (in client mode) to connect to a real Socket.IO server like https://socket.io (the project's own demo). Subscribe to an event, watch messages flow. The protocol mechanics are now muscle memory.

