Socket.IO and SignalR Protocols
Not all real-time is raw WebSockets. Socket.IO and SignalR are higher-level protocols with their own handshakes, message framing, and quirks.
What you’ll learn
- Recognise Socket.IO and SignalR handshake patterns.
- Connect using protocol-aware libraries (python-socketio, signalrcore).
- Subscribe to events and consume messages.
- Handle reconnection, ack, and namespace patterns.
Raw WebSockets are the transport. Socket.IO (Node.js / browsers) and SignalR (.NET) are higher-level libraries that ride on top of WebSockets (and sometimes long-polling fallbacks) to provide events, rooms, reconnection, and serialization. They're widely deployed; scrapers must recognize and speak them.
This lesson is the protocol-aware playbook.
Socket.IO
Used by: Node.js apps, many JS SPAs, chat applications, real-time dashboards.
Signs in DevTools:
- WebSocket URLs containing /socket.io/?EIO=4&transport=websocket.
- Messages prefixed with digits: 0, 2[…], 42[…], 40, 41, etc.
- First message often 0{"sid":"abc","upgrades":[],"pingInterval":25000,"pingTimeout":20000}.
The first digit is the Engine.IO frame code:
- 0 open
- 1 close
- 2 ping
- 3 pong
- 4 message
Inside an Engine.IO message, the next digit is the Socket.IO packet type:
- 0 connect
- 1 disconnect
- 2 event
- 3 ack
- etc.
So 42["hello","world"] is "engine.io message, socket.io event, named 'hello' with arg 'world'."
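To make the two-layer prefix concrete, here is a minimal sketch of a frame labeller. The function name describe_frame and the lookup tables are illustrative, not part of any library. It also assumes, based on the Socket.IO protocol, that a packet may carry an optional namespace (/chat,) and an optional ack id between the type digit and the JSON payload. Binary attachments are not handled.

```python
import json
import re

# Engine.IO frame codes (first digit) and Socket.IO packet types (second digit).
ENGINE_TYPES = {"0": "open", "1": "close", "2": "ping", "3": "pong",
                "4": "message", "5": "upgrade", "6": "noop"}
SOCKET_TYPES = {"0": "connect", "1": "disconnect", "2": "event", "3": "ack",
                "4": "connect_error", "5": "binary_event", "6": "binary_ack"}

# Socket.IO packet: type digit, optional "/namespace,", optional ack id, optional JSON.
_PACKET = re.compile(r"^(\d)(?:(/[^,]*),)?(\d+)?(.*)$", re.DOTALL)


def describe_frame(frame: str) -> dict:
    """Label a text frame with its Engine.IO and Socket.IO meaning."""
    engine = ENGINE_TYPES.get(frame[:1], "unknown")
    info = {"engine": engine}
    body = frame[1:]
    if engine != "message":
        # The open frame carries a JSON handshake; ping/pong may carry plain text.
        info["data"] = json.loads(body) if body.startswith("{") else (body or None)
        return info
    match = _PACKET.match(body)
    if match is None:
        info["data"] = body or None
        return info
    packet_type, namespace, ack_id, payload = match.groups()
    info.update(
        packet=SOCKET_TYPES.get(packet_type, "unknown"),
        namespace=namespace or "/",
        ack_id=int(ack_id) if ack_id else None,
        data=json.loads(payload) if payload else None,
    )
    return info


print(describe_frame('42["hello","world"]'))
```

Running it over frames copied from the DevTools Messages tab is a quick way to confirm you are looking at Socket.IO before reaching for a client library.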
Catalog108 Socket.IO shim
/api/ws/socketio and /challenges/api/websocket/socketio simulate Socket.IO endpoints via polling so you can practice without WebSocket infrastructure.
Python, python-socketio
import socketio

sio = socketio.Client()

@sio.event
def connect():
    print("connected")
    sio.emit("subscribe", {"room": "prices"})

@sio.on("price_tick")
def on_tick(data):
    print("tick:", data)

@sio.event
def disconnect():
    print("disconnected")

sio.connect("https://example.com")
sio.wait()
The library handles the Engine.IO/Socket.IO framing, reconnection, and ack management. You write event handlers; it takes care of the protocol mechanics.
For a real-world target it's almost always the right tool, far cheaper than reverse-engineering the framing manually.
Async variant
import asyncio
import socketio

sio = socketio.AsyncClient()

@sio.event
async def connect():
    print("connected")
    await sio.emit("subscribe", {"channel": "prices"})

@sio.on("price_tick")
async def tick(data):
    print(data)

async def main():
    await sio.connect("https://example.com")
    await sio.wait()

asyncio.run(main())
Auth on Socket.IO
sio.connect("https://example.com", auth={"token": "Bearer abc123"})
# or
sio.connect("https://example.com", headers={"Authorization": f"Bearer {token}"})
# or pass via query
sio.connect("https://example.com?token=abc123")
Which of these applies depends on the server implementation. Capture a real connection in DevTools and mirror what the browser sends.
Namespaces and rooms
Socket.IO supports namespaces (think paths: /admin, /chat) and rooms within them. Specify in connect:
sio.connect("https://example.com", namespaces=["/chat"])

@sio.on("message", namespace="/chat")
def msg(data):
    print(data)
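In DevTools, namespaces show up in the frames rather than in the URL: the client joins with the connect packet 40/chat, (trailing comma included), and events on that namespace arrive as 42/chat,["message",…].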
SignalR
Used by: .NET/ASP.NET apps, Microsoft products, many enterprise dashboards.
Signs in DevTools:
- Endpoints under /hubs/<hubname> or /signalr/.
- Initial POST to /hubs/<name>/negotiate?negotiateVersion=1 (classic ASP.NET SignalR negotiates with a GET under /signalr/ instead).
- Subsequent WebSocket to /hubs/<name>?id=<token>.
- Messages framed with \x1e (RecordSeparator) as delimiter.
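The negotiation returns connection metadata and the supported transports. The client then opens the WebSocket and sends a handshake frame, {"protocol":"json","version":1}\x1e; the server answers {}\x1e, and after that invocation messages flow in both directions.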
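A manual client needs two things from this exchange: the WebSocket URL derived from the negotiate response, and the first frame to send once the socket is open. The sketch below shows both. It assumes the ASP.NET Core response shape (connectionToken, connectionId, availableTransports) and the JSON hub protocol; the helper names websocket_url and handshake_frame are illustrative.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

RECORD_SEPARATOR = "\x1e"


def websocket_url(hub_url: str, negotiate: dict) -> str:
    """Build the ws(s):// URL from the hub URL and a parsed negotiate response."""
    offered = {t["transport"] for t in negotiate.get("availableTransports", [])}
    if "WebSockets" not in offered:
        raise ValueError("server did not offer the WebSockets transport")
    # negotiateVersion=1 servers return connectionToken; older ones only connectionId.
    token = negotiate.get("connectionToken") or negotiate["connectionId"]
    parts = urlsplit(hub_url)
    scheme = "wss" if parts.scheme == "https" else "ws"
    query = parse_qsl(parts.query) + [("id", token)]
    return urlunsplit((scheme, parts.netloc, parts.path, urlencode(query), ""))


def handshake_frame(protocol: str = "json", version: int = 1) -> str:
    """First frame the client sends after the WebSocket opens."""
    return json.dumps({"protocol": protocol, "version": version}) + RECORD_SEPARATOR


# Example negotiate response (values are made up for illustration).
sample = {
    "negotiateVersion": 1,
    "connectionId": "conn-1",
    "connectionToken": "tok-1",
    "availableTransports": [
        {"transport": "WebSockets", "transferFormats": ["Text", "Binary"]},
    ],
}
print(websocket_url("https://example.com/hubs/prices", sample))
print(repr(handshake_frame()))
```

In practice signalrcore performs these steps for you; the sketch is mainly useful for reading a DevTools capture.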
Python, signalrcore
import time

from signalrcore.hub_connection_builder import HubConnectionBuilder

token = "abc123"  # placeholder; use the token captured from a real session

hub = (
    HubConnectionBuilder()
    .with_url("https://example.com/hubs/prices", options={
        "access_token_factory": lambda: token,
    })
    .with_automatic_reconnect({"type": "raw", "keep_alive_interval": 10, "reconnect_interval": 5, "max_attempts": 5})
    .build()
)

def on_open():
    print("connected")
    # Send only once the connection is open; sending right after start() can race the handshake.
    hub.send("subscribe", ["BTC-USD"])  # invoke server method

hub.on("priceTick", lambda args: print("tick:", args))
hub.on_open(on_open)
hub.start()

# Keep the main thread alive while the client works in the background.
while True:
    time.sleep(1)
signalrcore handles the negotiation, WebSocket, message framing, and reconnection.
Manual decoding (advanced)
If no library exists for your language:
import json
import re

def decode_socketio(frame: str):
    # Frame: digit prefix, then an optional JSON array or object
    m = re.match(r"^(\d+)([\[{].*)?$", frame, re.DOTALL)
    if not m:
        return None
    code, payload = m.group(1), m.group(2)
    return {"code": code, "payload": json.loads(payload) if payload else None}

print(decode_socketio('42["price_tick",{"symbol":"BTC","price":42000}]'))
# → {'code': '42', 'payload': ['price_tick', {'symbol': 'BTC', 'price': 42000}]}
For SignalR you split on \x1e and JSON-parse each chunk.
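A minimal sketch of that SignalR split, assuming the JSON hub protocol. The function name decode_signalr and the kind labels are illustrative; the numeric type values follow the SignalR hub protocol (1 invocation, 6 ping, 7 close, and so on). Any trailing text without a terminating \x1e is returned separately so it can be prepended to the next chunk.

```python
import json

RECORD_SEPARATOR = "\x1e"

SIGNALR_TYPES = {
    1: "invocation", 2: "stream_item", 3: "completion",
    4: "stream_invocation", 5: "cancel_invocation", 6: "ping", 7: "close",
}


def decode_signalr(buffer: str):
    """Split a text buffer on \\x1e and JSON-parse every complete message.

    Returns (messages, remainder), where messages is a list of (kind, dict)
    pairs and remainder is any incomplete tail.
    """
    *complete, remainder = buffer.split(RECORD_SEPARATOR)
    messages = []
    for chunk in complete:
        if not chunk:
            continue
        msg = json.loads(chunk)
        if "type" not in msg:
            kind = "handshake"  # the handshake response is {} (or {"error": ...})
        else:
            kind = SIGNALR_TYPES.get(msg["type"], "unknown")
        messages.append((kind, msg))
    return messages, remainder


raw = '{}\x1e{"type":1,"target":"priceTick","arguments":[{"symbol":"BTC"}]}\x1e{"type":6}\x1e'
for kind, msg in decode_signalr(raw)[0]:
    print(kind, msg)
```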
This is the protocol-stripping approach. Use it only when no library is available.
Reconnection semantics
Both protocols support reconnection:
- Socket.IO: auto-reconnect with backoff is built into client libraries; resubscribes are NOT automatic, so you must re-emit subscribes after the connect event fires.
- SignalR: explicit with_automatic_reconnect configuration in signalrcore. Hub state must be re-established post-reconnect.
Treat every connect event as a fresh start: re-subscribe, re-authenticate (if first-message auth), re-join rooms.
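One way to apply the fresh-start rule is to record every subscription once and replay the whole list from the connect handler. The sketch below is library-agnostic: SubscriptionRegistry is a hypothetical helper, and emit is any callable taking an event name and a payload (for python-socketio that could be sio.emit).

```python
from typing import Any, Callable


class SubscriptionRegistry:
    """Remember subscriptions so they can be re-sent after every (re)connect."""

    def __init__(self, emit: Callable[[str, Any], None]):
        self._emit = emit
        self._subs: list[tuple[str, Any]] = []

    def add(self, event: str, payload: Any) -> None:
        # Record only; nothing is sent until replay() runs on connect.
        if (event, payload) not in self._subs:
            self._subs.append((event, payload))

    def replay(self) -> int:
        """Emit every recorded subscription; call this from the connect handler."""
        for event, payload in self._subs:
            self._emit(event, payload)
        return len(self._subs)


# Stand-in emit function so the sketch runs without a server.
sent = []
registry = SubscriptionRegistry(lambda event, payload: sent.append((event, payload)))
registry.add("subscribe", {"room": "prices"})
registry.add("subscribe", {"room": "trades"})
registry.replay()  # first connect
registry.replay()  # after a reconnect
print(len(sent))

# Hypothetical wiring with python-socketio:
#   registry = SubscriptionRegistry(sio.emit)
#   @sio.event
#   def connect():
#       registry.replay()
```

Keeping the subscription list outside the handler means initial connects and reconnects share one code path.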
Acks
Socket.IO has acks:
# emit with ack callback
sio.emit("get_price", "BTC-USD", callback=lambda data: print("ack:", data))
The server sends back an ack response. Useful for request-response patterns over WebSocket (much like HTTP, but on the persistent connection).
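On the wire, an ack is correlated by a numeric id placed between the packet-type digits and the JSON payload: an event sent as 42<id>[…] is answered with 43<id>[…]. The sketch below shows that bookkeeping for the default namespace only. AckTracker is a hypothetical helper written for illustration; python-socketio does the equivalent internally when you pass callback=.

```python
import itertools
import json
import re

_ACK = re.compile(r"^43(\d+)(\[.*\])?$", re.DOTALL)


class AckTracker:
    """Assign ack ids to outgoing events and dispatch matching ack frames."""

    def __init__(self):
        self._ids = itertools.count()
        self._pending = {}

    def event_frame(self, event, *args, callback):
        """Return the text frame for an event that expects an ack."""
        ack_id = next(self._ids)
        self._pending[ack_id] = callback
        body = json.dumps([event, *args], separators=(",", ":"))
        return f"42{ack_id}{body}"

    def handle_frame(self, frame: str) -> bool:
        """If frame is an ack we are waiting for, run its callback and return True."""
        match = _ACK.match(frame)
        if match is None:
            return False
        callback = self._pending.pop(int(match.group(1)), None)
        if callback is None:
            return False
        args = json.loads(match.group(2)) if match.group(2) else []
        callback(*args)
        return True


tracker = AckTracker()
results = []
frame = tracker.event_frame("get_price", "BTC-USD", callback=results.append)
print(frame)
tracker.handle_frame("430[42000]")  # simulated server ack for id 0
print(results)
```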
When to use these vs raw WS
Always use the protocol-aware library if the server uses Socket.IO or SignalR:
- It handles framing.
- It handles reconnection and ack.
- It handles transport fallback (long-polling if WS fails).
- It produces cleaner, more maintainable scraper code.
Manual decoding is a last resort.
Hands-on lab
On Catalog108: hit /challenges/api/websocket/socketio and inspect the responses. Note the Socket.IO-style framing. Use python-socketio (in client mode) to connect to a real Socket.IO server like https://socket.io (the project's own demo). Subscribe to an event, watch messages flow. The protocol mechanics are now muscle memory.