Python Concurrency Control: Semaphores and Rate Limits
Bound concurrency, enforce request rates, honour 429 backoff. Three primitives that turn an async scraper into a polite, well-behaved one.
What you’ll learn
- Use asyncio.Semaphore to cap parallelism.
- Apply a token-bucket rate limit with aiolimiter.
- Implement exponential backoff with jitter on 429/5xx.
Async makes parallel fetches easy. Too easy. A naive asyncio.gather fires every request at once, which overwhelms the target, your network stack, and your patience when the bans start.
Three primitives turn a free-for-all into politeness: semaphores (concurrency cap), rate limiters (requests/sec), and backoff (retry strategy).
Semaphore, concurrency cap
import asyncio
import httpx

sem = asyncio.Semaphore(8)

async def fetch(client, url):
    async with sem:
        r = await client.get(url)
        return r.json()
Semaphore(8) allows 8 concurrent acquirers. When all are held, the next async with sem: waits. This is the cheapest, most effective concurrency control.
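A usage sketch (the URL list and timeout are illustrative): asyncio.gather starts every coroutine at once, but the semaphore keeps at most 8 requests in flight.
import asyncio
import httpx

async def main(urls):
    # One shared client; the semaphore inside fetch() bounds concurrency.
    async with httpx.AsyncClient(timeout=10) as client:
        return await asyncio.gather(*(fetch(client, u) for u in urls))

results = asyncio.run(main([f"https://example.com/items/{i}" for i in range(100)]))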
Per-host semaphores are also common when different targets need different limits:
from urllib.parse import urlparse

class HostLimiter:
    def __init__(self):
        self.sems = {}

    def get(self, host, limit=5):
        if host not in self.sems:
            self.sems[host] = asyncio.Semaphore(limit)
        return self.sems[host]

limiter = HostLimiter()

async def fetch(client, url):
    host = urlparse(url).netloc
    async with limiter.get(host):
        return await client.get(url)
Rate limit, requests per second
Semaphore caps parallel in-flight; rate limit caps requests per unit time. They're different. A scraper with semaphore=20 but each request taking 100ms still hammers a target at 200 req/sec.
aiolimiter provides a token-bucket limiter:
from aiolimiter import AsyncLimiter

limiter = AsyncLimiter(max_rate=30, time_period=60)  # 30 requests per minute

async def fetch(client, url):
    async with limiter:
        return await client.get(url)
Tokens accrue at max_rate / time_period. Each async with limiter: consumes one. When empty, the coroutine waits.
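A quick standalone sketch to see the waiting behaviour, using a deliberately tiny limiter (the numbers are illustrative): the first two acquisitions go through immediately, the third blocks until a token accrues.
import asyncio
import time

from aiolimiter import AsyncLimiter

async def main():
    limiter = AsyncLimiter(max_rate=2, time_period=1)
    start = time.monotonic()
    for i in range(3):
        async with limiter:
            print(f"acquired #{i + 1} at {time.monotonic() - start:.2f}s")

asyncio.run(main())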
For per-host rate limits, keep a dict of limiters:
class HostRateLimiter:
    def __init__(self, rate_per_min=30):
        self.rate = rate_per_min
        self.limiters = {}

    def get(self, host):
        if host not in self.limiters:
            self.limiters[host] = AsyncLimiter(self.rate, 60)
        return self.limiters[host]
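Usage mirrors the earlier HostLimiter sketch: urlparse picks the host, and each host gets its own token bucket.
from urllib.parse import urlparse

host_rates = HostRateLimiter(rate_per_min=30)

async def fetch(client, url):
    host = urlparse(url).netloc
    async with host_rates.get(host):
        return await client.get(url)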
Backoff on 429/5xx
import asyncio
import random

import httpx
from httpx import HTTPStatusError

async def fetch_with_retry(client, url, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            r = await client.get(url)
            if r.status_code == 429:
                if attempt == max_retries:
                    r.raise_for_status()  # out of retries: surface the 429
                # Honour the server's hint (delta-seconds form); fall back to 60s
                retry_after = int(r.headers.get("Retry-After", 60))
                await asyncio.sleep(retry_after)
                continue
            if 500 <= r.status_code < 600:
                raise HTTPStatusError(f"server {r.status_code}", request=r.request, response=r)
            r.raise_for_status()
            return r
        except (HTTPStatusError, httpx.TransportError):
            if attempt == max_retries:
                raise
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            delay = (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(delay)
    raise RuntimeError("unreachable")  # loop always returns or raises
Three behaviors layered:
- 429 with Retry-After. Honour the server's hint. No exponential backoff needed.
- 5xx. Server failure; retry with exponential backoff + jitter.
- Transport error. Network glitch; same retry strategy.
Jitter (random.uniform) prevents a thundering herd; without it, every concurrent failed request retries at the same instant.
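To see the spread, print a few sample delays (illustrative only): each attempt's backoff lands somewhere in a one-second window rather than on an exact boundary.
import random

for attempt in range(4):
    delay = (2 ** attempt) + random.uniform(0, 1)
    print(f"attempt {attempt}: sleep {delay:.2f}s")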
Combining all three
import asyncio
from urllib.parse import urlparse

from aiolimiter import AsyncLimiter

sem = asyncio.Semaphore(20)
rate_limiters: dict[str, AsyncLimiter] = {}

def limiter_for(host: str) -> AsyncLimiter:
    if host not in rate_limiters:
        rate_limiters[host] = AsyncLimiter(30, 60)  # 30 requests/minute per host
    return rate_limiters[host]

async def polite_fetch(client, url):
    host = urlparse(url).netloc
    async with sem, limiter_for(host):
        return await fetch_with_retry(client, url)
Concurrency-bounded, rate-limited, retry-capable. The base pattern for production async scrapers.
Tenacity, a higher-level retry library
For more elaborate retry policies, tenacity is excellent:
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=60),
    retry=retry_if_exception_type(httpx.TransportError),
    reraise=True,
)
async def fetch(client, url):
    r = await client.get(url)
    r.raise_for_status()
    return r
Composable retry rules, async support, clean syntax. For multi-condition retries (different policies per exception type), tenacity beats hand-rolled.
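One way to express "retry transport errors and retryable HTTP statuses, nothing else" is to combine tenacity conditions with |. A sketch; the is_retryable_status predicate is a hypothetical helper, not part of tenacity.
import httpx
from tenacity import (
    retry,
    retry_if_exception,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

def is_retryable_status(exc: BaseException) -> bool:
    # Hypothetical helper: retry HTTPStatusError only for 429 and 5xx responses.
    return isinstance(exc, httpx.HTTPStatusError) and (
        exc.response.status_code == 429 or exc.response.status_code >= 500
    )

@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=1, max=60),
    retry=retry_if_exception_type(httpx.TransportError) | retry_if_exception(is_retryable_status),
    reraise=True,
)
async def fetch(client, url):
    r = await client.get(url)
    r.raise_for_status()
    return r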
Honouring server hints
Two response headers are gold:
- Retry-After: wait this long before retrying. Can be delta-seconds or an HTTP date.
- X-RateLimit-Remaining / X-RateLimit-Reset: your current quota. Slow down before you hit zero.
import time

remaining = int(r.headers.get("X-RateLimit-Remaining", 999))
if remaining < 5:
    reset = int(r.headers.get("X-RateLimit-Reset", 0))
    sleep_for = max(reset - time.time(), 1)
    await asyncio.sleep(sleep_for)
This kind of proactive backoff keeps you in good standing instead of constantly getting 429'd.
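Retry-After itself can arrive as either delta-seconds or an HTTP date. A small parsing helper, sketched with the standard library (the function name is mine):
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value, default=60.0):
    """Parse a Retry-After header that may be delta-seconds or an HTTP date."""
    if value is None:
        return default
    try:
        return max(float(value), 0.0)          # "120" -> 120.0
    except ValueError:
        pass
    try:
        dt = parsedate_to_datetime(value)       # "Wed, 21 Oct 2025 07:28:00 GMT"
        return max((dt - datetime.now(timezone.utc)).total_seconds(), 0.0)
    except (TypeError, ValueError):
        return default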
Avoid concurrency-via-sleep
Anti-pattern:
# DON'T
async def fetch_all(urls):
    results = []
    for url in urls:
        results.append(await fetch(url))
        await asyncio.sleep(0.1)
    return results
This is sequential with a delay, not concurrent. Replace with gather + semaphore + rate limiter.
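A concurrent replacement, assuming the polite_fetch from the combined example above (the client timeout and return_exceptions choices are illustrative):
import asyncio
import httpx

async def fetch_all(urls):
    async with httpx.AsyncClient(timeout=10) as client:
        # return_exceptions=True keeps one failed URL from cancelling the batch.
        return await asyncio.gather(
            *(polite_fetch(client, url) for url in urls),
            return_exceptions=True,
        )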
Hands-on lab
Against /api/products on Catalog108, our first-party scraping sandbox:
- Build an async fetcher that retries on 429/5xx with exponential backoff + jitter.
- Cap parallelism at 8 with a Semaphore.
- Rate-limit to 30 requests/minute with aiolimiter.
- Hammer 200 URLs.
- Watch the logs: most requests should succeed first try, a few should retry.
Vary the rate to 100/min and the concurrency to 50. Watch the success rate drop as you push the target harder. That's the calibration loop every production scraper goes through.