

Symfony Messenger, Async Jobs and Queues

Push scraping work off the main process into queue workers. Messenger is the Symfony component that makes distributed scraping straightforward.

What you’ll learn

  • Define a Message and a MessageHandler.
  • Configure transports for Redis, Doctrine, and in-memory.
  • Dispatch from a Console command, consume in a worker.

A real scraper rarely runs as one synchronous loop. You discover URLs and queue them; workers pick them up, fetch and parse the pages, push results to another queue for enrichment, and store them. Messenger is how Symfony does this.

The three pieces

Piece      Role
Message    A plain DTO carrying job data
Handler    The class that does the work
Transport  Where messages live in transit (Redis, AMQP, Doctrine, in-memory)

A minimal message

<?php
// src/Message/ScrapeProductMessage.php
namespace App\Message;

final readonly class ScrapeProductMessage
{
  public function __construct(public string $url) {}
}

Just a DTO. No base class. No interface required. Plain PHP.
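
Because the message travels through a transport, it has to survive serialization; by default Messenger relies on PHP's native serialize() for this. The sketch below simulates that round trip. The class is redeclared only so the snippet runs on its own, the example.com URL is a placeholder, and readonly classes assume PHP 8.2 or newer.

```php
<?php
// Standalone sketch: the class is redeclared so the snippet runs on its own.
namespace App\Message;

final readonly class ScrapeProductMessage
{
  public function __construct(public string $url) {}
}

// Simulate what happens between dispatch and handling: the message is
// serialized to a string, stored in the transport, and rebuilt in the worker.
$original = new ScrapeProductMessage('https://example.com/products/1'); // placeholder URL
$payload  = serialize($original);
$restored = unserialize($payload);

var_dump($restored->url === $original->url); // bool(true)
```

Keeping the message to scalars and small value objects makes this round trip cheap and predictable; services, open connections, and closures do not belong in a message.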

A minimal handler

<?php
// src/MessageHandler/ScrapeProductHandler.php
namespace App\MessageHandler;

use App\Message\ScrapeProductMessage;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;
use Symfony\Contracts\HttpClient\HttpClientInterface;

#[AsMessageHandler]
final class ScrapeProductHandler
{
  public function __construct(
  private readonly HttpClientInterface $catalog108Client,
  ) {}

  public function __invoke(ScrapeProductMessage $msg): void
  {
  $resp = $this->catalog108Client->request('GET', $msg->url);
  // parse, persist, etc.
  }
}

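#[AsMessageHandler] registers the class as a handler. The type of the __invoke parameter tells Messenger which message it handles. This lesson uses one handler per message class, although Messenger also accepts several handlers for the same message.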

Transport configuration

# config/packages/messenger.yaml
framework:
  messenger:
    transports:
      scrape: '%env(MESSENGER_TRANSPORT_DSN)%'
      scrape_failed: 'doctrine://default?queue_name=failed'

    routing:
      App\Message\ScrapeProductMessage: scrape

    failure_transport: scrape_failed

Then in .env:

MESSENGER_TRANSPORT_DSN=redis://localhost:6379/messages
# or doctrine://default
# or amqp://guest:guest@localhost:5672/%2f/messages

Common DSNs:

  • doctrine://default: backed by Postgres/MySQL; zero new infrastructure if you already have Doctrine. Slower but reliable.
  • redis://...: fast, low-latency. Good default for medium scale.
  • amqp://...: RabbitMQ. Complex routing, fanout, retry exchanges.
  • in-memory://: for tests.

Dispatching from a command

// Inside a class that extends Symfony\Component\Console\Command\Command.
public function __construct(
  private readonly MessageBusInterface $bus,
  private readonly HttpClientInterface $catalog108Client,
) {
  // Command's own constructor must run, or the command cannot be registered.
  parent::__construct();
}

protected function execute(InputInterface $i, OutputInterface $o): int
{
  // Fetch the listing page and queue one message per product link.
  $resp = $this->catalog108Client->request('GET', '/products');
  $crawler = new Crawler($resp->getContent());
  foreach ($crawler->filter('.product-card a') as $a) {
    $url = $a->getAttribute('href');
    $this->bus->dispatch(new ScrapeProductMessage($url));
  }
  return Command::SUCCESS;
}

The listing command discovers URLs and queues one fetch per URL. The handler does the actual scraping. Each runs in a separate process: the discovery command finishes quickly, and the workers process the queue in parallel.

Running workers

php bin/console messenger:consume scrape --limit=1000 --time-limit=3600

Useful flags:

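  • --limit=N: handle N messages, then exit.
  • --time-limit=N: stop after N seconds.
  • --memory-limit=128M: exit once memory usage exceeds the limit, so the supervisor can start a fresh worker.
  • --queues=high --queues=low: consume only from the named queues (the option is repeated once per queue).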

Workers should be restartable. Run them under systemd, supervisord, or Docker, and set --limit and --time-limit so they exit periodically, which keeps slow memory leaks from accumulating. The supervisor then restarts them. This is the standard production setup.

Retries and failure

Configure per-transport retries:

transports:
  scrape:
    dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
    retry_strategy:
      max_retries: 3
      delay: 1000
      multiplier: 2
      max_delay: 60000
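
To see what these numbers mean in practice, here is a rough sketch of a multiplier-based backoff. It is not Symfony's own class: the function name retryDelayMs is hypothetical, and it assumes the base delay is multiplied once per retry already attempted, then capped at max_delay, ignoring any jitter that newer Symfony versions may add.

```php
<?php
// Sketch of a multiplier-based backoff; not Symfony's actual implementation.
// $retryCount is the number of retries already attempted (0 before the first retry).
function retryDelayMs(int $retryCount, int $delay = 1000, float $multiplier = 2.0, int $maxDelay = 60000): int
{
  $computed = (int) round($delay * $multiplier ** $retryCount);

  // A max delay of 0 is treated here as "no cap".
  return $maxDelay > 0 ? min($computed, $maxDelay) : $computed;
}

foreach ([0, 1, 2] as $retryCount) {
  printf("retry %d waits %d ms\n", $retryCount + 1, retryDelayMs($retryCount));
}
// retry 1 waits 1000 ms
// retry 2 waits 2000 ms
// retry 3 waits 4000 ms
```

With max_retries: 3 the 60-second cap is never reached; it only starts to matter if you raise the retry count or the multiplier.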

Failed messages (after retries exhausted) go to the failure_transport. Inspect with:

php bin/console messenger:failed:show
php bin/console messenger:failed:retry
php bin/console messenger:failed:remove 42

Retries are manual and can target one message or all of them. Investigate before retrying: if all 50,000 failures share the same exception, fix the bug first, then retry in bulk.
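
Not every failure deserves the full retry cycle. As a sketch, assuming you treat 404 and 410 responses as permanent (that policy is an illustration, not part of the lesson's handler), the handler can throw Messenger's UnrecoverableMessageHandlingException, which skips the retry strategy and sends the message straight to the failure transport:

```php
<?php
// Variant of the handler above; the 404/410 policy is an illustrative assumption.
namespace App\MessageHandler;

use App\Message\ScrapeProductMessage;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;
use Symfony\Component\Messenger\Exception\UnrecoverableMessageHandlingException;
use Symfony\Contracts\HttpClient\HttpClientInterface;

#[AsMessageHandler]
final class ScrapeProductHandler
{
  public function __construct(
    private readonly HttpClientInterface $catalog108Client,
  ) {}

  public function __invoke(ScrapeProductMessage $msg): void
  {
    $resp = $this->catalog108Client->request('GET', $msg->url);
    $status = $resp->getStatusCode();

    // A missing page will not reappear on retry, so skip the retry cycle.
    if (404 === $status || 410 === $status) {
      throw new UnrecoverableMessageHandlingException(
        sprintf('Product page gone (%d): %s', $status, $msg->url)
      );
    }

    // For other error statuses getContent() throws, and the message
    // goes through the configured retry_strategy as usual.
    $html = $resp->getContent();
    // parse, persist, etc.
  }
}
```

This keeps the failure transport meaningful: transient errors get their retries, while dead URLs arrive there immediately instead of occupying a worker several times.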

Middleware in Messenger

Messenger has its own middleware (different from HttpClient's). Common built-ins:

  • DoctrineTransactionMiddleware: wraps each handler in a transaction.
  • RouterContextMiddleware: preserves URL generation context across async handlers.
  • ValidationMiddleware: runs Validator constraints on the message.

Custom middleware can log every message, record metrics, or implement application-specific behavior.
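
A minimal sketch of such a logging middleware follows. The class name, file path, and log fields are hypothetical; it assumes a PSR-3 logger is available for injection, and the class would still need to be listed under the bus's middleware option in messenger.yaml before it takes effect.

```php
<?php
// src/Messenger/LogMessageMiddleware.php (hypothetical path and class name)
namespace App\Messenger;

use Psr\Log\LoggerInterface;
use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Middleware\MiddlewareInterface;
use Symfony\Component\Messenger\Middleware\StackInterface;

final class LogMessageMiddleware implements MiddlewareInterface
{
  public function __construct(private readonly LoggerInterface $logger) {}

  public function handle(Envelope $envelope, StackInterface $stack): Envelope
  {
    $class = get_class($envelope->getMessage());
    $start = microtime(true);

    try {
      // Pass the envelope on to the next middleware in the stack.
      return $stack->next()->handle($envelope, $stack);
    } finally {
      // Runs whether the rest of the stack succeeded or threw.
      $this->logger->info('Message passed through the bus', [
        'message' => $class,
        'duration_ms' => (int) round((microtime(true) - $start) * 1000),
      ]);
    }
  }
}
```

Note that a middleware runs both when a message is dispatched and when a worker consumes it, so with an async transport you would expect two log entries per message.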

Idempotency

Workers can retry, so handlers MUST be idempotent. Concretely: writing to Postgres with ON CONFLICT DO UPDATE is safe; appending to a log file twice is not. Design handlers to be safely re-runnable.
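
The upsert idea can be sketched with plain PDO. To keep the snippet runnable on its own it uses an in-memory SQLite database (3.24 or newer) rather than Postgres; the table, the saveProduct helper, and the example.com URL are all hypothetical, but the ON CONFLICT clause has the same shape in both databases.

```php
<?php
// Sketch using in-memory SQLite (an assumption, chosen so the snippet runs
// standalone); the ON CONFLICT clause has the same shape in Postgres.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE product (url TEXT PRIMARY KEY, title TEXT NOT NULL)');

// Hypothetical persistence helper a handler could call.
function saveProduct(PDO $pdo, string $url, string $title): void
{
  $stmt = $pdo->prepare(
    'INSERT INTO product (url, title) VALUES (:url, :title)
     ON CONFLICT (url) DO UPDATE SET title = excluded.title'
  );
  $stmt->execute(['url' => $url, 'title' => $title]);
}

// Simulate the same message being handled twice after a retry.
saveProduct($pdo, 'https://example.com/products/1', 'Widget');
saveProduct($pdo, 'https://example.com/products/1', 'Widget');

echo $pdo->query('SELECT COUNT(*) FROM product')->fetchColumn(), "\n"; // 1
```

Keying the row on the URL is what makes the second run harmless: the retry overwrites the same row instead of inserting a duplicate.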

Sync vs async

For testing, route messages to the synchronous transport:

framework:
  messenger:
    transports:
      scrape: 'sync://'

Or override the transport per environment so that only the test environment uses sync://. Tests then run end-to-end (dispatch → handler) in one process, while production routes to Redis/AMQP.

When NOT to use Messenger

  • Single-threaded one-off scripts: don't add the complexity.
  • Scraping one site nightly with a single Console command: cron the command directly.
  • Very high-throughput pure I/O without DB hits: a Scrapy/asyncio process might be leaner.

Messenger shines when:

  • You need to fan out work across multiple machines.
  • Different stages of the pipeline have different throughput.
  • Some jobs are slow (image downloads) and some are fast (URL discovery).

Hands-on lab

Build a two-stage scraper for Catalog108 /products:

  1. The scrape:discover Console command fetches /products and dispatches one ScrapeProductMessage per product URL.
  2. ScrapeProductHandler handles each message: it fetches the product page and persists it via Doctrine.
  3. Run a worker: messenger:consume scrape --limit=100.

Watch the queue drain. Add a deliberate delay to the handler to feel the difference between dispatch and consumption.

