Python Concurrency Control: Semaphores and Rate Limits
Bound concurrency, enforce request rates, honour 429 backoff. Three primitives that turn an async scraper into a polite, well-behaved one.
What you’ll learn
- Use asyncio.Semaphore to cap parallelism.
- Apply a token-bucket rate limit with aiolimiter.
- Implement exponential backoff with jitter on 429/5xx.
Async makes parallel fetches easy. Too easy. A naive asyncio.gather fires every request at once, which overwhelms the target, your network stack, and your patience when the bans start.
Three primitives turn a free-for-all into politeness: semaphores (concurrency cap), rate limiters (requests/sec), and backoff (retry strategy).
Semaphore, concurrency cap
import asyncio
import httpx

sem = asyncio.Semaphore(8)

async def fetch(client, url):
    async with sem:
        r = await client.get(url)
        return r.json()
Semaphore(8) allows 8 concurrent acquirers. When all are held, the next async with sem: waits. This is the cheapest, most effective concurrency control.
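A usage sketch (the URL list and timeout are illustrative): asyncio.gather starts every coroutine at once, but the semaphore keeps at most 8 requests in flight.
import asyncio
import httpx

async def main(urls):
    # One shared client; the semaphore inside fetch() bounds concurrency.
    async with httpx.AsyncClient(timeout=10) as client:
        return await asyncio.gather(*(fetch(client, u) for u in urls))

results = asyncio.run(main([f"https://example.com/items/{i}" for i in range(100)]))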
Per-host semaphores are also common when different targets need different limits:
from urllib.parse import urlparse

class HostLimiter:
    def __init__(self):
        self.sems = {}

    def get(self, host, limit=5):
        if host not in self.sems:
            self.sems[host] = asyncio.Semaphore(limit)
        return self.sems[host]

limiter = HostLimiter()

async def fetch(client, url):
    host = urlparse(url).netloc
    async with limiter.get(host):
        return await client.get(url)
Rate limit, requests per second
Semaphore caps parallel in-flight; rate limit caps requests per unit time. They're different. A scraper with semaphore=20 but each request taking 100ms still hammers a target at 200 req/sec.
aiolimiter provides a token-bucket limiter:
from aiolimiter import AsyncLimiter

limiter = AsyncLimiter(max_rate=30, time_period=60)  # 30 requests per minute

async def fetch(client, url):
    async with limiter:
        return await client.get(url)
Tokens accrue at max_rate / time_period. Each async with limiter: consumes one. When empty, the coroutine waits.
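A quick standalone sketch to see the waiting behaviour, using a deliberately tiny limiter (the numbers are illustrative): the first two acquisitions go through immediately, the third blocks until a token accrues.
import asyncio
import time

from aiolimiter import AsyncLimiter

async def main():
    limiter = AsyncLimiter(max_rate=2, time_period=1)
    start = time.monotonic()
    for i in range(3):
        async with limiter:
            print(f"acquired #{i + 1} at {time.monotonic() - start:.2f}s")

asyncio.run(main())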
For per-host rate limits, keep a dict of limiters:
class HostRateLimiter:
    def __init__(self, rate_per_min=30):
        self.rate = rate_per_min
        self.limiters = {}

    def get(self, host):
        if host not in self.limiters:
            self.limiters[host] = AsyncLimiter(self.rate, 60)
        return self.limiters[host]
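Usage mirrors the earlier HostLimiter sketch: urlparse picks the host, and each host gets its own token bucket.
from urllib.parse import urlparse

host_rates = HostRateLimiter(rate_per_min=30)

async def fetch(client, url):
    host = urlparse(url).netloc
    async with host_rates.get(host):
        return await client.get(url)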
Backoff on 429/5xx
import asyncio
import random

import httpx
from httpx import HTTPStatusError

async def fetch_with_retry(client, url, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            r = await client.get(url)
            if r.status_code == 429:
                if attempt == max_retries:
                    r.raise_for_status()  # out of retries: surface the 429
                # Honour the server's hint (delta-seconds form); fall back to 60s
                retry_after = int(r.headers.get("Retry-After", 60))
                await asyncio.sleep(retry_after)
                continue
            if 500 <= r.status_code < 600:
                raise HTTPStatusError(f"server {r.status_code}", request=r.request, response=r)
            r.raise_for_status()
            return r
        except (HTTPStatusError, httpx.TransportError):
            if attempt == max_retries:
                raise
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            delay = (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(delay)
    raise RuntimeError("unreachable")  # loop always returns or raises
Three behaviors layered:
- 429 with Retry-After. Honour the server's hint. No exponential backoff needed.
- 5xx. Server failure; retry with exponential backoff + jitter.
- Transport error. Network glitch; same retry strategy.
Jitter (random.uniform) prevents a thundering herd; without it, every concurrent failed request retries at the same instant.
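To see the spread, print a few sample delays (illustrative only): each attempt's backoff lands somewhere in a one-second window rather than on an exact boundary.
import random

for attempt in range(4):
    delay = (2 ** attempt) + random.uniform(0, 1)
    print(f"attempt {attempt}: sleep {delay:.2f}s")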
Combining all three
import asyncio
from urllib.parse import urlparse

from aiolimiter import AsyncLimiter

sem = asyncio.Semaphore(20)
rate_limiters: dict[str, AsyncLimiter] = {}

def limiter_for(host: str) -> AsyncLimiter:
    if host not in rate_limiters:
        rate_limiters[host] = AsyncLimiter(30, 60)  # 30 requests/minute per host
    return rate_limiters[host]

async def polite_fetch(client, url):
    host = urlparse(url).netloc
    async with sem, limiter_for(host):
        return await fetch_with_retry(client, url)
Concurrency-bounded, rate-limited, retry-capable. The base pattern for production async scrapers.
Tenacity, a higher-level retry library
For more elaborate retry policies, tenacity is excellent:
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=60),
    retry=retry_if_exception_type(httpx.TransportError),
    reraise=True,
)
async def fetch(client, url):
    r = await client.get(url)
    r.raise_for_status()
    return r
Composable retry rules, async support, clean syntax. For multi-condition retries (different policies per exception type), tenacity beats hand-rolled.
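One way to express "retry transport errors and retryable HTTP statuses, nothing else" is to combine tenacity conditions with |. A sketch; the is_retryable_status predicate is a hypothetical helper, not part of tenacity.
import httpx
from tenacity import (
    retry,
    retry_if_exception,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

def is_retryable_status(exc: BaseException) -> bool:
    # Hypothetical helper: retry HTTPStatusError only for 429 and 5xx responses.
    return isinstance(exc, httpx.HTTPStatusError) and (
        exc.response.status_code == 429 or exc.response.status_code >= 500
    )

@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=1, max=60),
    retry=retry_if_exception_type(httpx.TransportError) | retry_if_exception(is_retryable_status),
    reraise=True,
)
async def fetch(client, url):
    r = await client.get(url)
    r.raise_for_status()
    return r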
Honouring server hints
Two response headers are gold:
- Retry-After: wait this long before retrying. Can be delta-seconds or an HTTP date.
- X-RateLimit-Remaining / X-RateLimit-Reset: your current quota. Slow down before you hit zero.
import time

remaining = int(r.headers.get("X-RateLimit-Remaining", 999))
if remaining < 5:
    reset = int(r.headers.get("X-RateLimit-Reset", 0))
    sleep_for = max(reset - time.time(), 1)
    await asyncio.sleep(sleep_for)
This kind of proactive backoff keeps you in good standing instead of constantly getting 429'd.
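Retry-After itself can arrive as either delta-seconds or an HTTP date. A small parsing helper, sketched with the standard library (the function name is mine):
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value, default=60.0):
    """Parse a Retry-After header that may be delta-seconds or an HTTP date."""
    if value is None:
        return default
    try:
        return max(float(value), 0.0)          # "120" -> 120.0
    except ValueError:
        pass
    try:
        dt = parsedate_to_datetime(value)       # "Wed, 21 Oct 2025 07:28:00 GMT"
        return max((dt - datetime.now(timezone.utc)).total_seconds(), 0.0)
    except (TypeError, ValueError):
        return default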
Avoid concurrency-via-sleep
Anti-pattern:
# DON'T
async def fetch_all(urls):
    results = []
    for url in urls:
        results.append(await fetch(url))
        await asyncio.sleep(0.1)
    return results
This is sequential with a delay, not concurrent. Replace with gather + semaphore + rate limiter.
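A concurrent replacement, assuming the polite_fetch from the combined example above (the client timeout and return_exceptions choices are illustrative):
import asyncio
import httpx

async def fetch_all(urls):
    async with httpx.AsyncClient(timeout=10) as client:
        # return_exceptions=True keeps one failed URL from cancelling the batch.
        return await asyncio.gather(
            *(polite_fetch(client, url) for url in urls),
            return_exceptions=True,
        )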
Hands-on lab
Against /api/products on Catalog108, our first-party scraping sandbox:
- Build an async fetcher that retries on 429/5xx with exponential backoff + jitter.
- Cap parallelism at 8 with a Semaphore.
- Rate-limit to 30 requests/minute with aiolimiter.
- Hammer 200 URLs.
- Watch the logs: most requests should succeed first try, a few should retry.
Vary the rate to 100/min and the concurrency to 50. Watch the success rate drop as you push the target harder. That's the calibration loop every production scraper goes through.