4.16 · intermediate · 4 min read

Symfony Lock and Rate Limiter for Polite Scraping

Two Symfony components that turn 'be polite to the target' from intention into enforcement. Distributed locks for one-scraper-per-domain; rate limiters for request-per-second caps.

What you’ll learn

  • Use Symfony Lock to ensure only one scraper instance runs at a time.
  • Apply RateLimiter to throttle requests per domain.
  • Honour robots.txt programmatically alongside these controls.

Politeness is not a feeling. It's enforcement. Two Symfony components turn "we should be polite" into code: Lock ensures only one scraper runs at a time per target; RateLimiter caps requests per second.

The Lock component

A lock prevents concurrent execution. Two flavours matter for scraping:

  1. Per-machine locks: flock over a file. Cheap, simple.
  2. Distributed locks: Redis, Postgres, or Memcached. Survive across machines.

use Symfony\Component\Lock\LockFactory;
use Symfony\Component\Lock\Store\RedisStore;

$store = new RedisStore($redis);
$factory = new LockFactory($store);

$lock = $factory->createLock('scrape-catalog108', ttl: 3600);
if (!$lock->acquire()) {
  return; // another instance is running
}
try {
  // do the scrape
} finally {
  $lock->release();
}

ttl: 3600 is the safety net: if the lock holder crashes, Redis expires the key after an hour and another worker can take over. Choose a TTL longer than the expected scrape duration.

Symfony configuration

# config/packages/lock.yaml
framework:
  lock:
    scraping: '%env(REDIS_URL)%'

Inject:

public function __construct(
  private LockFactory $scrapingLockFactory,  // named autowiring for the 'scraping' store
) {}

When to use which lock

Lock type                Use when
flock (file)             Single machine; same Unix host
Redis                    Multiple machines, same Redis cluster
Doctrine (Postgres)      You already have Postgres; no extra infra
Combined (lock.factory)  Auto-pick based on configured stores

For distributed scraping, Redis is the default. Postgres works fine if Redis isn't around.
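
If one Redis node is itself a single point of failure, the Lock component also ships a CombinedStore that spans several stores. A minimal sketch, assuming two separate Redis connections ($redis1, $redis2) already exist:

```php
use Symfony\Component\Lock\LockFactory;
use Symfony\Component\Lock\Store\CombinedStore;
use Symfony\Component\Lock\Store\RedisStore;
use Symfony\Component\Lock\Strategy\ConsensusStrategy;

// With ConsensusStrategy, the lock counts as acquired only when a
// majority of the underlying stores grant it.
$store = new CombinedStore(
  [new RedisStore($redis1), new RedisStore($redis2)],
  new ConsensusStrategy(),
);
$factory = new LockFactory($store);
$lock = $factory->createLock('scrape-catalog108', ttl: 3600);
```

Swap in UnanimousStrategy if every store must agree before the lock is considered held.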

The RateLimiter component

Caps how often an operation can happen. Two algorithms ship out of the box:

  1. Fixed window, N per period. Simple. Allows bursts at window edges.
  2. Token bucket, N tokens replenished at rate R. Smoother.
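
To see why the token bucket is smoother, here is a minimal plain-PHP sketch of the algorithm. This is not Symfony's implementation; the class and names are ours:

```php
// Token bucket: holds up to $capacity tokens, refilled at $rate per second.
final class TokenBucket
{
  private float $tokens;
  private float $last;

  public function __construct(
    private readonly int $capacity,
    private readonly float $rate,  // tokens per second
    float $now = 0.0,
  ) {
    $this->tokens = $capacity;
    $this->last = $now;
  }

  public function allow(float $now): bool
  {
    // Refill proportionally to elapsed time, capped at capacity.
    $this->tokens = min($this->capacity, $this->tokens + ($now - $this->last) * $this->rate);
    $this->last = $now;
    if ($this->tokens >= 1.0) {
      $this->tokens -= 1.0;
      return true;
    }
    return false;
  }
}

$bucket = new TokenBucket(capacity: 2, rate: 1.0);
var_dump($bucket->allow(0.0));  // true  (burst token 1)
var_dump($bucket->allow(0.0));  // true  (burst token 2)
var_dump($bucket->allow(0.0));  // false (bucket empty)
var_dump($bucket->allow(1.0));  // true  (one token refilled after 1s)
```

A fixed window would have allowed another full burst the instant the window rolled over; the bucket instead admits requests at the replenish rate.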

Configure:

framework:
  rate_limiter:
    catalog108_scrape:
      policy: 'token_bucket'
      limit: 60
      rate: { interval: '1 minute', amount: 60 }

Inject:

public function __construct(
  private RateLimiterFactory $catalog108ScrapeLimiter,
) {}

public function fetch(string $url): string
{
  $limiter = $this->catalog108ScrapeLimiter->create('catalog108');
  $limiter->consume()->wait();  // blocks until a token is available
  return $this->http->request('GET', $url)->getContent();
}

consume()->wait() blocks for as long as needed to acquire a token. For non-blocking, check isAccepted() and either retry later or re-queue.

Per-domain limiters

Different domains, different limits:

$limiter = $this->factory->create($domain);

The limiter key namespaces tokens by domain: one bucket per host, all sharing the same configured policy from the factory.
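
Each named limiter in config has one fixed policy, so hosts that need different budgets need different factories. A hypothetical picker service, assuming two limiters ('fast' and 'slow') are configured and injected by name:

```php
use Symfony\Component\RateLimiter\LimiterInterface;
use Symfony\Component\RateLimiter\RateLimiterFactory;

// Hypothetical: routes each host to a configured factory; the create($host)
// key still gives every host its own bucket within that policy.
final class HostLimiterPicker
{
  public function __construct(
    private readonly RateLimiterFactory $fastLimiter,  // e.g. 60 req/min
    private readonly RateLimiterFactory $slowLimiter,  // e.g. 10 req/min
  ) {}

  public function pick(string $host): LimiterInterface
  {
    $factory = match ($host) {
      'practice.scrapingcentral.com' => $this->fastLimiter,
      default => $this->slowLimiter,
    };
    return $factory->create($host);
  }
}
```
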

Wait vs queue

For Messenger handlers, blocking via wait() is usually wrong: you're tying up a worker. Better: check whether a token is available; if not, re-dispatch the message with a delay.

public function __invoke(ScrapeProductMessage $msg): void
{
  $limiter = $this->factory->create($this->host($msg->url));
  if (!$limiter->consume()->isAccepted()) {
    $this->bus->dispatch(
      $msg,
      [new DelayStamp(2000)],  // try again in 2 seconds
    );
    return;
  }
  // do the fetch
}

This pattern keeps workers free for other work while honouring the limit.

robots.txt, read it, enforce it

There's no first-party Symfony component for robots.txt, but it's two lines with spatie/robots-txt:

use Spatie\Robots\Robots;

$url = 'https://practice.scrapingcentral.com/account/dashboard';
$robots = Robots::create('CatalogScraper/1.0');
if (!$robots->mayIndex($url)) {
  $this->logger->info('robots.txt disallows; skipping', ['url' => $url]);
  return;
}

Cache the robots.txt fetch per domain (it doesn't change often). A small middleware-style decorator on HttpClient can enforce it automatically.
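
Such a decorator might look like this. The class and names are ours, and it assumes spatie's RobotsTxt::readFrom() and allows() API; check the package README before relying on it:

```php
use Spatie\Robots\RobotsTxt;
use Symfony\Contracts\HttpClient\HttpClientInterface;
use Symfony\Contracts\HttpClient\ResponseInterface;

// Hypothetical decorator: caches one parsed robots.txt per host and
// refuses disallowed URLs before they hit the wire.
final class RobotsAwareClient
{
  /** @var array<string, RobotsTxt> */
  private array $robots = [];

  public function __construct(
    private readonly HttpClientInterface $inner,
    private readonly string $userAgent = 'CatalogScraper/1.0',
  ) {}

  public function request(string $method, string $url, array $options = []): ResponseInterface
  {
    $host = parse_url($url, PHP_URL_HOST);
    // One robots.txt fetch per host, cached for the process lifetime.
    $this->robots[$host] ??= RobotsTxt::readFrom("https://$host/robots.txt");

    if (!$this->robots[$host]->allows($url, $this->userAgent)) {
      throw new \RuntimeException("robots.txt disallows $url");
    }
    return $this->inner->request($method, $url, $options);
  }
}
```

In a long-running worker you would add a TTL to the cache so rule changes are eventually picked up.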

Combining the three

A polite scrape combines all three:

public function scrape(string $url): ?string
{
  $host = parse_url($url, PHP_URL_HOST);

  // 1. robots.txt
  if (!$this->robots->mayIndex($url)) return null;

  // 2. per-host rate limit
  $limiter = $this->limiterFactory->create($host);
  $limiter->consume()->wait();

  // 3. per-host lock (one scraper at a time)
  $lock = $this->lockFactory->createLock("scrape-$host", ttl: 3600);
  if (!$lock->acquire()) return null;  // another instance holds this host

  try {
    return $this->http->request('GET', $url)->getContent();
  } finally {
    $lock->release();
  }
}

This wrapper is the contract: it respects robots.txt, respects the rate limit, and ensures only one scraper of this host runs at a time. A small abstraction with serious safety properties.

Rate limit signaling, the 429 loop

If a target returns 429 with a Retry-After header, that's the source of truth. Override your limiter for that window:

if ($response->getStatusCode() === 429) {
  // getHeaders(false) disables HttpClient's exception-on-4xx behaviour
  $retryAfter = (int) ($response->getHeaders(false)['retry-after'][0] ?? 60);
  sleep($retryAfter);  // or re-queue with delay
}

The target is telling you to slow down. Listen.
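
In a Messenger handler, the same signal can be honoured without blocking the worker. A sketch, assuming the $msg/$this->bus wiring from the handler above, and accounting for Retry-After being either seconds or an HTTP-date (RFC 9110):

```php
use Symfony\Component\Messenger\Stamp\DelayStamp;

if ($response->getStatusCode() === 429) {
  $retryAfter = $response->getHeaders(false)['retry-after'][0] ?? '60';
  // Normalise to seconds: a bare number is delay-seconds, otherwise an HTTP-date.
  $seconds = ctype_digit($retryAfter)
    ? (int) $retryAfter
    : max(0, strtotime($retryAfter) - time());

  // Re-queue with a delay instead of sleeping, so the worker stays free.
  $this->bus->dispatch($msg, [new DelayStamp(max(1, $seconds) * 1000)]);
  return;
}
```
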

When not to use these

For one-off scripts: skip. Lock and RateLimiter are infrastructure. A 50-line script doesn't need them.

For low-volume scrapers running once a day: rate-limiting is overkill. Just sleep a second between requests.

For systems that absolutely cannot block on locks: design for at-least-once delivery and use UPSERT for idempotency instead of mutual exclusion.
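
That UPSERT route, sketched with Doctrine DBAL on Postgres. Table and column names are illustrative; $this->connection is a DBAL connection:

```php
// Idempotent write: re-scraping the same SKU overwrites instead of duplicating,
// so two workers racing on the same product is harmless without a lock.
$this->connection->executeStatement(
  'INSERT INTO products (sku, title, price, scraped_at)
   VALUES (:sku, :title, :price, now())
   ON CONFLICT (sku) DO UPDATE
     SET title = EXCLUDED.title, price = EXCLUDED.price, scraped_at = now()',
  ['sku' => $sku, 'title' => $title, 'price' => $price],
);
```
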

Hands-on lab

In your Symfony project:

  1. Configure a catalog108_scrape rate limiter at 30 req/min.
  2. Configure a Redis-backed lock store.
  3. Wrap HttpClient calls in a service that: checks robots.txt, consumes a rate-limit token, acquires a per-host lock.
  4. Run two messenger:consume workers in parallel. Watch them coordinate, only one should fetch from the target at a time.

That coordination, achieved without any scraper-specific code, is what Symfony's infrastructure components give you.

