Timeouts, Retries, Exponential Backoff
Real networks fail. Real scrapers handle it. Learn to set timeouts properly, retry transient failures, and back off exponentially without hammering a struggling server.
What you’ll learn
- Set connect and read timeouts independently.
- Distinguish transient errors (retry) from permanent ones (don't).
- Implement exponential backoff with jitter.
- Use `urllib3.Retry` adapters on a `requests.Session` for production-grade resilience.
Networks are flaky. Servers crash, DNS slows, packets get lost. A scraper that doesn't anticipate failure isn't a scraper, it's a script that worked once in dev. This lesson is about making your scraper robust without making it abusive.
Always set a timeout
By default, requests has no timeout. A non-responsive server will block your process forever. Every production call must have a timeout:
import requests
r = requests.get(url, timeout=10)
The timeout argument supports a tuple for finer control:
r = requests.get(url, timeout=(5, 30))
# (connect_timeout, read_timeout)
- Connect timeout: how long to wait for the TCP/TLS handshake. 5 seconds is plenty.
- Read timeout: how long to wait for the server to send each chunk of the response body. 30 seconds is reasonable for most pages; tune up for slow APIs.
A single value applies to both. Use the tuple form when you have wildly different expectations (fast connect, slow query).
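Putting the two together, a minimal sketch of handling a timeout (the URL here is just a placeholder):

import requests

try:
    r = requests.get("https://example.com/slow-page", timeout=(5, 30))
    print(r.status_code)
except requests.Timeout:
    # Covers both connect and read timeouts
    print("server took too long; give up or retry this attempt")
except requests.ConnectionError:
    print("couldn't reach the server at all")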
What can go wrong
Common exceptions you should know:
| Exception | When | Should you retry? |
|---|---|---|
| `requests.ConnectionError` | DNS failure, refused connection, network dead | Yes (transient) |
| `requests.Timeout` | Connect or read timeout exceeded | Yes |
| `requests.HTTPError` | 4xx / 5xx (after `raise_for_status()`) | Depends, see below |
| `requests.TooManyRedirects` | Redirect loop | No (permanent) |
| `JSONDecodeError` | Body wasn't JSON | No |
For HTTP errors, the rule of thumb:
- 5xx (server errors): retry. The server is temporarily struggling.
- 429 (Too Many Requests): retry, but only after a longer backoff; the server often sends a `Retry-After` header (see the sketch below).
- 408 (Request Timeout): retry.
- 4xx (other client errors): don't retry. 404 means the page doesn't exist; 403 means you're not allowed. Retrying won't change that.
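If you handle 429s by hand rather than through urllib3 (covered below), note that Retry-After can hold either a number of seconds or an HTTP date. A small sketch of parsing it; the helper name and default value are ours:

from email.utils import parsedate_to_datetime
from datetime import datetime, timezone

def retry_after_seconds(response, default=60):
    """Best-effort parse of a Retry-After header (integer seconds or HTTP date)."""
    value = response.headers.get("Retry-After")
    if value is None:
        return default
    if value.isdigit():
        return int(value)
    try:
        dt = parsedate_to_datetime(value)
        return max(0.0, (dt - datetime.now(timezone.utc)).total_seconds())
    except (TypeError, ValueError):
        return default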
A hand-rolled retry loop
import time, random, requests

def fetch(url, max_attempts=4):
    for attempt in range(max_attempts):
        try:
            r = requests.get(url, timeout=10)
            if r.status_code < 400:
                return r
            if r.status_code in (429, 500, 502, 503, 504):
                # Transient, wait and retry
                wait = 2 ** attempt + random.uniform(0, 1)
                print(f"attempt {attempt+1}: {r.status_code}, retrying in {wait:.1f}s")
                time.sleep(wait)
                continue
            # Permanent, don't retry
            r.raise_for_status()
        except (requests.ConnectionError, requests.Timeout) as e:
            wait = 2 ** attempt + random.uniform(0, 1)
            print(f"attempt {attempt+1}: {e}, retrying in {wait:.1f}s")
            time.sleep(wait)
    raise RuntimeError(f"failed after {max_attempts} attempts: {url}")
Three things to notice:
- Exponential backoff: the wait grows as `2^attempt`: 1s, 2s, 4s, 8s. This prevents hammering a struggling server.
- Jitter: a small random addition. If a thousand clients all back off in lockstep, they reconverge in lockstep too. Jitter spreads the herd out.
- Different policy by status: 5xx and 429 retry; 4xx (other than 408/429) bail.
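Using it looks like any other GET (the URL is a placeholder):

# Hypothetical usage of the fetch() helper above
page = fetch("https://example.com/products?page=1")
print(page.status_code, len(page.text))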
The right way: urllib3.Retry on a Session adapter
requests ships with retry support via urllib3; you just have to wire it up:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(
    total=4,
    backoff_factor=1.0,  # sleeps of roughly 0s, 2s, 4s, 8s (urllib3 applies no backoff before the first retry)
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["GET", "HEAD"],  # don't auto-retry POSTs by default
    respect_retry_after_header=True,
)

s = requests.Session()
adapter = HTTPAdapter(max_retries=retry)
s.mount("https://", adapter)
s.mount("http://", adapter)

# Now every s.get() automatically retries on transient errors
r = s.get("https://practice.scrapingcentral.com/challenges/api/rest/flaky", timeout=10)
This is the production pattern. It handles retries before your code ever sees the failure: you call s.get() once and either get a successful response or a final exception. The Retry object also honours the server's Retry-After: N header on 429 / 503 responses.
Why POST retries need care
The allowed_methods=["GET", "HEAD"] argument is deliberate. Retrying a POST risks duplicate side effects (Lesson 1.3). If you DO want to retry POSTs, ensure the operation is idempotent on the server side, or use an idempotency key.
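As a sketch of the idempotency-key pattern, assuming the target API accepts an Idempotency-Key header (the endpoint, header name, and helper are hypothetical; check your API's documentation):

import time
import uuid
import requests

def create_order(session, payload, max_attempts=3):
    # One key per logical operation: if a retry re-sends the same key,
    # the server can deduplicate and avoid creating two orders.
    key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            r = session.post(
                "https://api.example.com/orders",   # hypothetical endpoint
                json=payload,
                headers={"Idempotency-Key": key},
                timeout=10,
            )
            if r.status_code < 500:
                return r   # success or a permanent 4xx; either way, stop retrying
        except (requests.ConnectionError, requests.Timeout):
            pass
        time.sleep(2 ** attempt)   # same exponential backoff as before
    raise RuntimeError("order creation failed after retries")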
Backoff math: how much is enough?
attempt 1 fails → wait 1s
attempt 2 fails → wait 2s
attempt 3 fails → wait 4s
attempt 4 fails → wait 8s
total wait: 15s before final failure
For most public sites, 3-4 attempts spread over 5-30 seconds is the right curve. Much longer and you're holding a worker on a doomed request; much shorter and you're hammering a server that's already in trouble.
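You can sanity-check a policy before running it by printing its schedule; a tiny sketch using the same 2 ** attempt curve as the hand-rolled loop (the function name is ours):

def backoff_schedule(max_attempts=4, factor=1.0):
    # Nominal waits without jitter, plus the total time spent backing off
    waits = [factor * (2 ** attempt) for attempt in range(max_attempts)]
    return waits, sum(waits)

print(backoff_schedule())   # ([1.0, 2.0, 4.0, 8.0], 15.0)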
Add jitter, always
Without jitter, distributed scrapers retry in synchronised waves. The server gets quiet → many clients retry simultaneously → server crashes again → repeat. Jitter smooths this out:
import random

# Example values; in practice backoff_factor and attempt come from your retry loop
backoff_factor, attempt = 1.0, 2
wait = backoff_factor * (2 ** attempt) * (0.5 + random.random())
# somewhere between 0.5x and 1.5x the nominal backoff
Multiplying the backoff by a random factor like this is one of several jitter strategies ("full", "equal", and "decorrelated" jitter are the variants popularised by AWS's backoff guidance). urllib3.Retry does not add jitter by default, but recent versions accept a jitter setting; see the sketch below.
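If your installed urllib3 is 2.0 or newer, Retry accepts a backoff_jitter argument that adds up to that many seconds of random noise to each sleep; older versions don't have it, so treat this as a version-dependent sketch:

from urllib3.util.retry import Retry

retry = Retry(
    total=4,
    backoff_factor=1.0,
    backoff_jitter=0.5,   # adds random.random() * 0.5 seconds to each backoff (urllib3 >= 2.0)
    status_forcelist=[429, 500, 502, 503, 504],
)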
Per-request timeout vs per-session
Retry controls retry behaviour, but you still pass timeout= on every call. They're orthogonal:
- `timeout`: how long to wait for ONE attempt.
- `Retry`: how many attempts to make and how to space them.
Pass both. Always.
A complete robust fetch helper
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def build_session():
    s = requests.Session()
    s.headers["User-Agent"] = "Mozilla/5.0 ..."
    retry = Retry(
        total=4,
        backoff_factor=1.0,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "HEAD"],
        respect_retry_after_header=True,
    )
    adapter = HTTPAdapter(max_retries=retry, pool_connections=10, pool_maxsize=10)
    s.mount("https://", adapter)
    s.mount("http://", adapter)
    return s

s = build_session()
r = s.get("https://practice.scrapingcentral.com/challenges/api/rest/flaky", timeout=10)
r.raise_for_status()
This is the boilerplate that should open every non-trivial scraper.
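To make the "pass both, always" rule hard to forget, you can wrap the session in a small helper that supplies a default timeout; the helper name and default values here are ours:

DEFAULT_TIMEOUT = (5, 30)   # (connect, read)

def safe_get(session, url, **kwargs):
    # Fill in a timeout unless the caller explicitly overrides it
    kwargs.setdefault("timeout", DEFAULT_TIMEOUT)
    r = session.get(url, **kwargs)
    r.raise_for_status()
    return r

r = safe_get(s, "https://practice.scrapingcentral.com/challenges/api/rest/flaky")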
Hands-on lab
The /challenges/api/rest/flaky endpoint fails ~50% of requests with random 5xx codes. Hit it 20 times without retries, count successes. Then hit it 20 times with a 4-attempt retry policy, count successes. Notice how much more reliable the second loop is, and how much variation in total time per request you see.
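One possible scaffold for the comparison, reusing build_session() from above (success counts will vary run to run because the endpoint fails randomly):

import requests

URL = "https://practice.scrapingcentral.com/challenges/api/rest/flaky"

plain_ok = 0
for _ in range(20):
    try:
        if requests.get(URL, timeout=10).status_code == 200:
            plain_ok += 1
    except requests.RequestException:
        pass

s = build_session()   # the 4-attempt retry policy from this lesson
retry_ok = 0
for _ in range(20):
    try:
        if s.get(URL, timeout=10).status_code == 200:
            retry_ok += 1
    except requests.RequestException:
        pass

print(f"without retries: {plain_ok}/20   with retries: {retry_ok}/20")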
Practice this lesson on Catalog108, our first-party scraping sandbox.
Open lab target → /challenges/api/rest/flaky

Quiz, check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.