SSL Verification, Proxies, Authentication
Three production concerns: TLS certificate handling, routing requests through proxies, and authenticating to protected endpoints with Basic, Bearer, and Digest schemes.
What you’ll learn
- Configure SSL verification correctly, and know when to disable it safely.
- Route Python `requests` traffic through HTTP, HTTPS, and SOCKS proxies.
- Use HTTP Basic, Digest, and Bearer-token authentication.
- Combine all three on a single `Session`.
Three things you'll touch in any production scraper: TLS, proxies, and auth. They're independent concerns, but they share the same Session object, so it's worth covering them together.
SSL/TLS verification
By default, requests verifies the server's TLS certificate against the Mozilla-derived CA bundle shipped with the certifi package (not the OS trust store). This is correct and you should leave it alone:
r = requests.get("https://practice.scrapingcentral.com/", verify=True) # default
When something fails (SSLError: certificate verify failed), the temptation is verify=False. Resist it. Three better options first:
- Update your CA bundle. An outdated `certifi` or stale system roots cause spurious failures: `pip install --upgrade certifi`.
- Pass a custom CA bundle. If the site uses a private CA (corporate intranet), point `verify` at its PEM file: `verify="/path/to/ca.pem"` (see the sketch below).
- Use SNI correctly. Some servers require Server Name Indication; `requests` sends it by default, but some old proxies strip it.
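For the private-CA case, here's a minimal sketch; the intranet host and bundle path are hypothetical placeholders for your own:
import requests

# Internal host whose certificate is signed by a corporate CA.
INTERNAL_URL = "https://intranet.example.com/api/status"

# Point verify at the CA's PEM file instead of disabling verification.
r = requests.get(INTERNAL_URL, verify="/etc/ssl/corp/ca.pem", timeout=10)
r.raise_for_status()
You can also set it once for a whole session: s.verify = "/etc/ssl/corp/ca.pem".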
If you truly must disable verification (one-off internal test against a self-signed cert), silence the resulting warning explicitly:
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
r = requests.get(url, verify=False)
Never use verify=False against a production target. You lose all protection against man-in-the-middle attacks, and any credentials you send are at risk.
Proxies
A proxy is an intermediate server that forwards your requests. Scrapers use them for:
- Geographic targeting: appear to come from a specific country.
- IP rotation: distribute requests across many IPs to avoid rate limits.
- Anonymity: hide your origin IP.
Configuring proxies
proxies = {
    "http": "http://user:pass@proxy.example.com:8000",
    "https": "http://user:pass@proxy.example.com:8000",
}
r = requests.get("https://practice.scrapingcentral.com/products", proxies=proxies)
Note the slightly counterintuitive shape: the keys are the destination protocol (the scheme of the URL you're fetching), and the values are the proxy URL itself (which is usually http:// even when proxying HTTPS traffic). Both keys should be set unless you only ever fetch one protocol.
To set proxies once for an entire session:
s = requests.Session()
s.proxies.update({"http": "...", "https": "..."})
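requests also honors the standard proxy environment variables, because Session.trust_env defaults to True. A quick sketch, useful when the proxy is configured outside your code; the endpoint is a placeholder:
import os
import requests

# Picked up automatically by any new Session (trust_env=True by default).
os.environ["HTTP_PROXY"] = "http://user:pass@proxy.example.com:8000"
os.environ["HTTPS_PROXY"] = "http://user:pass@proxy.example.com:8000"

s = requests.Session()
r = s.get("https://practice.scrapingcentral.com/products")  # routed via the proxy
An explicit proxies= argument or s.proxies entry takes precedence over the environment.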
SOCKS proxies
Requires the PySocks extra:
pip install "requests[socks]"
proxies = {
    "http": "socks5h://user:pass@proxy.example.com:1080",
    "https": "socks5h://user:pass@proxy.example.com:1080",
}
The socks5h:// scheme (note the h) means "resolve DNS through the proxy too". That's the right choice for most scraping, because it prevents DNS lookups from leaking your origin IP.
Proxy rotation
For high-volume scraping, you want a pool:
import random
import requests

PROXIES = [
    "http://user:pass@p1.example.com:8000",
    "http://user:pass@p2.example.com:8000",
    "http://user:pass@p3.example.com:8000",
]

def fetch(url):
    # Pick a random proxy from the pool for each request.
    proxy = random.choice(PROXIES)
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=15)
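One dead proxy shouldn't kill the request. A way to harden fetch() is to retry through a different proxy on connection errors; fetch_with_retry is our own helper (it reuses the PROXIES pool above), not a requests API:
import random
import requests

def fetch_with_retry(url, attempts=3):
    last_exc = None
    for _ in range(attempts):
        proxy = random.choice(PROXIES)
        try:
            return requests.get(url, proxies={"http": proxy, "https": proxy},
                                timeout=15)
        except (requests.exceptions.ProxyError,
                requests.exceptions.ConnectTimeout) as exc:
            last_exc = exc  # this proxy is down or slow; try another
    raise last_exc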
Commercial proxy services (Bright Data, Oxylabs, Smartproxy, etc.) usually give you a single endpoint and rotate IPs server-side. That's covered in the Production sub-path.
Verifying the proxy works
A quick sanity check: httpbin.org/ip returns the source IP the server sees:
r = requests.get("https://httpbin.org/ip", proxies=proxies)
print(r.json()) # {"origin": "203.0.113.42"}
If origin is your real IP, the proxy isn't being used; if it's the proxy's IP, you're good.
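To make that check mechanical, compare the origin with and without the proxy; a small sketch reusing the proxies dict from above:
import requests

direct = requests.get("https://httpbin.org/ip", timeout=10).json()["origin"]
proxied = requests.get("https://httpbin.org/ip", proxies=proxies,
                       timeout=10).json()["origin"]
print("proxy active:", direct != proxied)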
Authentication
Three HTTP auth schemes you'll meet on static-scraping targets:
HTTP Basic Auth
Username and password, base64-encoded in the Authorization header:
r = requests.get("https://practice.scrapingcentral.com/challenges/api/auth/basic",
auth=("student", "practice123"))
auth=(user, pass) is shorthand for HTTPBasicAuth(user, pass). The credentials are base64-encoded, not encrypted; Basic over plain HTTP is effectively plaintext. Always use HTTPS.
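To see exactly what goes over the wire, you can build the header by hand; this sketch reproduces what requests does internally for auth=("student", "practice123"):
import base64

token = base64.b64encode(b"student:practice123").decode()
print(f"Authorization: Basic {token}")
# Authorization: Basic c3R1ZGVudDpwcmFjdGljZTEyMw==
# Trivially reversible: base64.b64decode(token) recovers the credentials.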
HTTP Digest Auth
An older challenge-response scheme, cryptographically somewhat stronger than Basic but mostly historical:
from requests.auth import HTTPDigestAuth
r = requests.get(url, auth=HTTPDigestAuth(user, password))
Used by some legacy systems, certain camera/IoT firmwares, and a few enterprise APIs. Rare in modern web scraping.
Bearer tokens
The dominant pattern in modern APIs:
headers = {"Authorization": "Bearer eyJhbGciOiJIUzI1NiIs..."}
r = requests.get("https://practice.scrapingcentral.com/api/products", headers=headers)
There's no built-in HTTPBearerAuth because it's just a header (though you can roll your own; see below). Where the token comes from is API-specific: login response JSON, an OAuth flow, a manual paste from DevTools. The API sub-path covers token flows (OAuth, JWT, refresh) exhaustively.
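If you'd rather pass bearer tokens through the same auth= parameter as Basic and Digest, requests' pluggable AuthBase makes that a few lines. BearerAuth here is our own class, not part of the library:
import requests

class BearerAuth(requests.auth.AuthBase):
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # Called on each outgoing PreparedRequest just before it's sent.
        r.headers["Authorization"] = f"Bearer {self.token}"
        return r

r = requests.get("https://practice.scrapingcentral.com/api/products",
                 auth=BearerAuth("eyJhbGciOiJIUzI1NiIs..."))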
Combining all three
Production scrapers usually have all three concerns on the same Session:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
s = requests.Session()
s.headers["User-Agent"] = "Mozilla/5.0 ..."
s.headers["Authorization"] = f"Bearer {api_token}"
s.proxies = {
    "http": "http://user:pass@proxy.example.com:8000",
    "https": "http://user:pass@proxy.example.com:8000",
}
s.verify = True # default, keep it on
retry = Retry(total=4, backoff_factor=1.0, status_forcelist=[429, 500, 502, 503, 504])
s.mount("https://", HTTPAdapter(max_retries=retry))
r = s.get("https://practice.scrapingcentral.com/api/products", timeout=10)
That's a production-grade scraper Session: realistic UA, bearer auth, proxy routing, retry policy, TLS verification, sensible timeouts. Every later technique just adds to this base.
Hands-on lab
The /challenges/api/auth/basic endpoint returns 401 without credentials. Hit it without auth, confirm 401. Then pass auth=("student", "practice123"), confirm 200 and inspect the JSON body. If you have a proxy available (free public ones are unreliable, so this is optional), retry through the proxy and verify your source IP changed via /api/auth/me or httpbin.org/ip.
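One way to script the first two steps; the assertions mirror the expected status codes:
import requests

URL = "https://practice.scrapingcentral.com/challenges/api/auth/basic"

r = requests.get(URL)
assert r.status_code == 401  # no credentials -> rejected

r = requests.get(URL, auth=("student", "practice123"))
assert r.status_code == 200
print(r.json())  # inspect the body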