RabbitMQ for Complex Routing
When Redis lists aren't enough: RabbitMQ's exchanges, routing keys, fanout, and dead-letter queues. The heavyweight broker for complex pipelines.
What you’ll learn
- Distinguish RabbitMQ's exchange types and when each fits.
- Set up routing with topic/direct exchanges.
- Use dead-letter queues for failure handling.
RabbitMQ is the canonical AMQP broker: more features than Redis lists, and more operational weight. For complex scraping pipelines with branching, conditional routing, and multiple consumers, it's worth the cost.
When to reach for RabbitMQ
- Multi-stage pipelines with branching. Items routed by category, region, or quality go to different handlers.
- Fanout to multiple consumers. Same event triggers multiple downstream actions (scrape → analytics, scrape → search index, scrape → alert).
- Dead-letter queues with re-routing. Failed messages auto-routed for manual inspection.
- Cross-team or cross-service messaging. RabbitMQ supports many languages well.
For straightforward "queue of URLs, pop and fetch," Redis or Symfony Messenger over Redis is simpler.
The four moving parts
| Concept | What it does |
|---|---|
| Producer | Sends a message with a routing key |
| Exchange | Receives the message, decides which queue(s) get it |
| Queue | Holds messages until consumed |
| Consumer | Pulls and processes |
The exchange-routing-queue split is what makes RabbitMQ powerful; Redis lists give you only the queue part.
Exchange types
- Direct. Routes by exact match on routing key. `routing_key="us"` goes to queues bound with `"us"`.
- Topic. Routes by pattern matching. `routing_key="us.products.update"` matches binding `"us.*.update"` or `"#.update"`.
- Fanout. Routes to all bound queues, ignoring the routing key. Pure broadcast.
- Headers. Routes by header values (less common).
For scraping pipelines, topic exchanges are usually the right choice: flexible patterns like `region.category.action` route to specialized consumers.
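To make the matching rules concrete, here's a minimal sketch; the exchange and queue names (`events`, `updates`, `us_products`) are illustrative:

```python
import asyncio

import aio_pika


async def demo():
    conn = await aio_pika.connect_robust("amqp://guest:guest@localhost/")
    channel = await conn.channel()
    ex = await channel.declare_exchange("events", type="topic", durable=True)

    updates = await channel.declare_queue("updates", durable=True)
    await updates.bind(ex, routing_key="#.update")  # any key ending in .update

    us_products = await channel.declare_queue("us_products", durable=True)
    await us_products.bind(ex, routing_key="us.products.*")  # exactly one word after us.products

    # "us.products.update" matches both bindings, so each queue gets a copy
    await ex.publish(aio_pika.Message(b"{}"), routing_key="us.products.update")

asyncio.run(demo())
```

Both queues receive the message independently; overlapping bindings are how a topic exchange gives you fanout-like behavior for free.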
A multi-stage scraping pipeline
```
Producer → [scrape exchange] → fetch_queue → fetch workers
                             → parse_queue → parse workers (after fetch)
                             → store_queue → store workers (after parse)
```
Connecting stages via exchanges:
```python
# Python with aio-pika
import asyncio

import aio_pika


def process(body: bytes) -> None:
    ...  # your handler


async def main():
    conn = await aio_pika.connect_robust("amqp://guest:guest@localhost/")
    channel = await conn.channel()

    # Declare exchange + queues
    scrape_ex = await channel.declare_exchange("scrape", type="topic", durable=True)
    fetch_q = await channel.declare_queue("fetch", durable=True)
    await fetch_q.bind(scrape_ex, routing_key="scrape.fetch.*")

    # Publish
    msg = aio_pika.Message(b'{"url": "..."}', delivery_mode=aio_pika.DeliveryMode.PERSISTENT)
    await scrape_ex.publish(msg, routing_key="scrape.fetch.product")

    # Consume
    queue = await channel.get_queue("fetch")
    async with queue.iterator() as q:
        async for message in q:
            async with message.process():  # auto-ack on success; reject (no requeue) on exception
                process(message.body)


asyncio.run(main())
```
`message.process()` is the idiomatic ack pattern in aio-pika; on exception it rejects without requeue by default, which is exactly what routes failures to a dead-letter queue once one is configured (next section).
Dead-letter queues
For messages that fail repeatedly, route them to a "dead" queue for inspection:
```python
fetch_q = await channel.declare_queue(
    "fetch",
    durable=True,
    arguments={
        "x-dead-letter-exchange": "scrape_dlx",
        "x-dead-letter-routing-key": "fetch.dead",
        "x-message-ttl": 60_000,  # 60s; after this, an unconsumed message is "dead"
    },
)

dead_ex = await channel.declare_exchange("scrape_dlx", type="direct", durable=True)
dead_q = await channel.declare_queue("dead", durable=True)
await dead_q.bind(dead_ex, routing_key="fetch.dead")
```
Messages rejected without requeue, or sitting unconsumed past TTL, flow to the dead queue. Operators inspect, fix, re-publish.
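What the operator loop can look like in code: a sketch that drains the dead queue, logs each payload, and re-publishes to the original exchange. `drain_dead_letters` is an illustrative helper, and the "fix" step here is just a print:

```python
async def drain_dead_letters(channel: aio_pika.Channel) -> None:
    dead_q = await channel.get_queue("dead")
    scrape_ex = await channel.get_exchange("scrape")
    while True:
        message = await dead_q.get(fail=False)  # returns None when the queue is empty
        if message is None:
            break
        print("dead letter:", message.body)  # inspect / fix the payload here
        await scrape_ex.publish(
            aio_pika.Message(message.body, delivery_mode=aio_pika.DeliveryMode.PERSISTENT),
            routing_key="scrape.fetch.product",  # re-route after fixing
        )
        await message.ack()
```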
Symfony Messenger with RabbitMQ
```yaml
framework:
    messenger:
        transports:
            scrape: '%env(MESSENGER_TRANSPORT_DSN)%' # amqp://guest:guest@rabbitmq:5672/%2f/messages
            scrape_failed:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
                options:
                    exchange:
                        name: failed
                        type: direct
        routing:
            App\Message\FetchMessage: scrape
        failure_transport: scrape_failed
```
Symfony Messenger abstracts most of the AMQP details: declare a transport, route messages by class, consume. The exchange/queue ergonomics happen under the hood.
Manual ACK and requeue
For finer control, manual ack lets you decide:
```python
# TransientError / FatalError stand in for your app's own exception types
async with queue.iterator(no_ack=False) as q:
    async for message in q:
        try:
            process(message.body)
            await message.ack()
        except TransientError:
            await message.nack(requeue=True)     # back to the queue
        except FatalError:
            await message.reject(requeue=False)  # dead-letter
```
Useful when failure types matter: transient → retry, permanent → dead-letter.
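One caveat with `nack(requeue=True)`: RabbitMQ doesn't count redeliveries for you, so a transient failure can loop forever. A common pattern, sketched here with an illustrative `x-retries` header, is to re-publish a copy with an incremented counter and give up past a threshold:

```python
MAX_RETRIES = 3

async def handle(message: aio_pika.IncomingMessage, exchange: aio_pika.Exchange) -> None:
    retries = int((message.headers or {}).get("x-retries", 0))
    try:
        process(message.body)
        await message.ack()
    except TransientError:
        if retries >= MAX_RETRIES:
            await message.reject(requeue=False)  # give up -> dead-letter
            return
        await exchange.publish(
            aio_pika.Message(
                message.body,
                headers={"x-retries": retries + 1},
                delivery_mode=aio_pika.DeliveryMode.PERSISTENT,
            ),
            routing_key=message.routing_key,
        )
        await message.ack()  # the retry copy is in flight; drop the original
```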
Persistence and reliability
```python
msg = aio_pika.Message(
    b'...',
    delivery_mode=aio_pika.DeliveryMode.PERSISTENT,
)
```
PERSISTENT writes the message to disk, so it survives broker restarts. Combine durable queues, persistent messages, and publisher confirms for data-that-survives-a-crash guarantees.
Without persistence, a RabbitMQ crash loses in-memory messages. For production scraping where every URL matters, always publish persistent.
Publisher confirms
For "did my publish succeed?" certainty:
```python
# confirms are enabled per channel (and on by default in aio-pika)
channel = await conn.channel(publisher_confirms=True)

await exchange.publish(msg, routing_key="...")
# publish awaits the broker's confirm and raises if it wasn't acked
```
Without confirms, publish returns immediately; if the broker is down, the message is lost silently. Confirms are slower (per-message round-trip) but make the producer side reliable.
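With confirms on, a failed publish surfaces as an exception the producer can catch. A minimal sketch; `DeliveryError` is what aio-pika raises when the broker nacks or returns a message:

```python
from aio_pika.exceptions import DeliveryError

try:
    await exchange.publish(msg, routing_key="...")
except DeliveryError:
    # nacked or returned: log it, retry, or persist the payload locally
    ...
```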
Monitoring
RabbitMQ ships a management UI (the `rabbitmq_management` plugin) at http://broker:15672/. It shows:
- Queue depths and rates.
- Connections, channels, consumers.
- Exchange bindings.
- Dead-letter activity.
For production, the management UI is essential. Combine with Prometheus exporter for time-series.
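The same numbers are available over the management HTTP API, which is handy for scripted health checks. A minimal sketch polling one queue's depth; the host and credentials are illustrative:

```python
import requests

# GET /api/queues/<vhost>/<queue>; %2f is the URL-encoded default vhost "/"
resp = requests.get(
    "http://broker:15672/api/queues/%2f/fetch",
    auth=("guest", "guest"),
    timeout=5,
)
resp.raise_for_status()
stats = resp.json()
print(stats["messages"], "messages,", stats.get("consumers", 0), "consumers")
```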
Operational considerations
RabbitMQ is more demanding than Redis:
- Disk I/O. Persistent messages hit disk; SSD strongly recommended.
- Memory pressure. Large queues trigger flow control. Cap queue length (`x-max-length`) or use lazy queues that page messages to disk.
- Cluster topology. Multi-node clusters add complexity for replication, leader election.
For single-node scraping use, RabbitMQ is fine on a $20/month VPS. For multi-region clusters, ops gets real.
Hands-on lab
Set up a topic-exchange-based pipeline:
- Producer publishes `scrape.fetch.product` and `scrape.fetch.review` messages.
- Two queues: one bound to `scrape.fetch.product`, another to `scrape.fetch.review`.
- Different worker fleets per queue.
- Add a dead-letter queue for failures.
Watch the management UI as you push messages. Within an hour you've built complex routing that would take many Redis-list hacks to replicate.
Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.