1.7 · Intermediate · 4 min read

SSL Verification, Proxies, Authentication

Three production concerns: TLS certificate handling, routing requests through proxies, and authenticating to protected endpoints with Basic, Bearer, and Digest schemes.

What you’ll learn

  • Configure SSL verification correctly, and know when to disable it safely.
  • Route Python `requests` traffic through HTTP, HTTPS, and SOCKS proxies.
  • Use HTTP Basic, Digest, and Bearer-token authentication.
  • Combine all three on a single `Session`.

Three things you'll touch in any production scraper: TLS, proxies, and auth. They're independent concerns, but they share the same Session object, so it's worth covering them together.

SSL/TLS verification

By default, requests verifies the server's TLS certificate against the system trust store. This is correct and you should leave it alone:

import requests

r = requests.get("https://practice.scrapingcentral.com/", verify=True)  # default

When a request fails with SSLError: certificate verify failed, the temptation is to reach for verify=False. Resist it. Try three better options first:

  1. Update your CA bundle. Outdated certifi or system roots cause spurious failures: pip install --upgrade certifi.
  2. Pass a custom CA bundle. If the site uses a private CA (corporate intranet), point verify at its PEM file: verify="/path/to/ca.pem". Both options are sketched below.
  3. Use SNI correctly. Some servers require Server Name Indication (SNI); requests sends it by default, but some old proxies strip it.
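
A minimal sketch of options 1 and 2 (the intranet URL and PEM path are placeholders):

import certifi
import requests

# Option 1: verify against an up-to-date certifi bundle explicitly.
r = requests.get("https://practice.scrapingcentral.com/", verify=certifi.where())

# Option 2: verify against a private CA's PEM bundle (placeholder path).
r = requests.get("https://intranet.example.com/", verify="/path/to/ca.pem")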

If you truly must disable verification (one-off internal test against a self-signed cert), silence the resulting warning explicitly:

import requests
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
r = requests.get(url, verify=False)

Never set verify=False against a production target. You lose all protection against man-in-the-middle attacks, and any credentials you send are at risk.
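
If you want that rule enforced in code, a small sketch that fails loudly on certificate errors instead of downgrading:

import requests

try:
  r = requests.get("https://practice.scrapingcentral.com/", timeout=10)
except requests.exceptions.SSLError as exc:
  # Surface the failure; never retry the same request with verify=False.
  raise SystemExit(f"TLS verification failed, fix the CA bundle: {exc}")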

Proxies

A proxy is an intermediate server that forwards your requests. Scrapers use them for:

  • Geographic targeting: appear to come from a specific country.
  • IP rotation: distribute requests across many IPs to avoid rate limits.
  • Anonymity: hide your origin IP.

Configuring proxies

proxies = {
  "http":  "http://user:pass@proxy.example.com:8000",
  "https": "http://user:pass@proxy.example.com:8000",
}
r = requests.get("https://practice.scrapingcentral.com/products", proxies=proxies)

Note the slightly counterintuitive shape: the keys are the destination protocol (the scheme of the URL you're fetching), and the values are the proxy URLs themselves (usually http:// even when proxying HTTPS traffic). Set both keys unless you only ever fetch one protocol.

To set proxies once for an entire session:

s = requests.Session()
s.proxies.update({"http": "...", "https": "..."})
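
Sessions also honor the standard proxy environment variables (HTTP_PROXY, HTTPS_PROXY, NO_PROXY) unless you set trust_env=False. A minimal sketch, setting the variable in-process purely for illustration:

import os
import requests

# Normally set in the shell or deploy config, not in code.
os.environ["HTTPS_PROXY"] = "http://user:pass@proxy.example.com:8000"

s = requests.Session()  # trust_env=True by default, so the variable applies
r = s.get("https://httpbin.org/ip", timeout=10)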

SOCKS proxies

Requires the PySocks extra:

pip install "requests[socks]"

proxies = {
  "http":  "socks5h://user:pass@proxy.example.com:1080",
  "https": "socks5h://user:pass@proxy.example.com:1080",
}

The socks5h:// scheme (note the h) means "resolve DNS through the proxy too". It's the right choice for most scraping, because it prevents DNS lookups from leaking and revealing your origin.

Proxy rotation

For high-volume scraping, you want a pool:

import random
import requests

PROXIES = [
  "http://user:pass@p1.example.com:8000",
  "http://user:pass@p2.example.com:8000",
  "http://user:pass@p3.example.com:8000",
]

def fetch(url):
  proxy = random.choice(PROXIES)
  return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=15)
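
random.choice will happily keep picking a dead proxy. A sketch of simple failover (fetch_with_failover is our own helper, not part of requests):

import random
import requests

def fetch_with_failover(url, pool, attempts=3):
  # Try a different randomly chosen proxy after each connection failure.
  last_exc = None
  for _ in range(attempts):
    proxy = random.choice(pool)
    try:
      return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=15)
    except (requests.exceptions.ProxyError, requests.exceptions.ConnectTimeout) as exc:
      last_exc = exc
  raise last_exc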

Commercial proxy services (Bright Data, Oxylabs, Smartproxy, etc.) usually give you a single endpoint and rotate IPs server-side. That's covered in the Production sub-path.

Verifying the proxy works

A quick sanity check: httpbin.org/ip returns the source IP the server sees:

r = requests.get("https://httpbin.org/ip", proxies=proxies)
print(r.json())  # {"origin": "203.0.113.42"}

If origin is your real IP, the proxy isn't being used; if it's the proxy's IP, you're good.
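
To make that check mechanical, compare the direct and proxied origins (reusing the proxies dict from above):

import requests

direct = requests.get("https://httpbin.org/ip", timeout=10).json()["origin"]
proxied = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10).json()["origin"]
assert direct != proxied, "proxy is not being used"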

Authentication

Three HTTP auth schemes you'll meet on static-scraping targets:

HTTP Basic Auth

Username and password, base64-encoded in the Authorization header:

r = requests.get("https://practice.scrapingcentral.com/challenges/api/auth/basic",
  auth=("student", "practice123"))

auth=(user, pass) is shorthand for HTTPBasicAuth(user, pass). The credentials are base64-encoded, not encrypted; Basic over plain HTTP is plaintext-equivalent. Always use HTTPS.
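
To see exactly what that shorthand puts on the wire, you can build the header by hand; a sketch for illustration only (use auth= in real code):

import base64
import requests

token = base64.b64encode(b"student:practice123").decode()
print(f"Authorization: Basic {token}")  # the header auth=("student", "practice123") sends

# Equivalent to the auth= shorthand:
r = requests.get(
  "https://practice.scrapingcentral.com/challenges/api/auth/basic",
  headers={"Authorization": f"Basic {token}"},
)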

HTTP Digest Auth

An older challenge-response scheme; cryptographically somewhat stronger than Basic, but mostly historical:

from requests.auth import HTTPDigestAuth
r = requests.get(url, auth=HTTPDigestAuth(user, password))

Used by some legacy systems, certain camera/IoT firmwares, and a few enterprise APIs. Rare in modern web scraping.

Bearer tokens

The dominant pattern in modern APIs:

headers = {"Authorization": "Bearer eyJhbGciOiJIUzI1NiIs..."}
r = requests.get("https://practice.scrapingcentral.com/api/products", headers=headers)

There's no built-in HTTPBearerAuth because it's just a header. Where the token comes from is API-specific: login response JSON, an OAuth flow, a manual paste from DevTools. The API sub-path covers token flows (OAuth, JWT, refresh) exhaustively.
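
If you'd rather keep using the auth= parameter, requests lets you subclass requests.auth.AuthBase. A minimal sketch (BearerAuth is our own class, not part of the library):

import requests
from requests.auth import AuthBase

class BearerAuth(AuthBase):
  def __init__(self, token):
    self.token = token

  def __call__(self, r):
    # Called once per prepared request: attach the header and return it.
    r.headers["Authorization"] = f"Bearer {self.token}"
    return r

r = requests.get("https://practice.scrapingcentral.com/api/products",
  auth=BearerAuth("eyJhbGciOiJIUzI1NiIs..."))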

Combining all three

Production scrapers usually have all three concerns on the same Session:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

s = requests.Session()
s.headers["User-Agent"] = "Mozilla/5.0 ..."
s.headers["Authorization"] = f"Bearer {api_token}"

s.proxies = {
  "http":  "http://user:pass@proxy.example.com:8000",
  "https": "http://user:pass@proxy.example.com:8000",
}

s.verify = True  # default, keep it on

retry = Retry(total=4, backoff_factor=1.0, status_forcelist=[429, 500, 502, 503, 504])
s.mount("https://", HTTPAdapter(max_retries=retry))

r = s.get("https://practice.scrapingcentral.com/api/products", timeout=10)

That's a production-grade scraper Session: realistic UA, bearer auth, proxy routing, retry policy, TLS verification, sensible timeouts. Every later technique just adds to this base.

Hands-on lab

The /challenges/api/auth/basic endpoint returns 401 without credentials. Hit it without auth and confirm the 401. Then pass auth=("student", "practice123"), confirm the 200, and inspect the JSON body. If you have a proxy available (free public ones are unreliable, so this is optional), retry through the proxy and verify that your source IP changed via /api/auth/me or httpbin.org/ip.
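
A starting point for the lab, assuming only the endpoint and credentials from this lesson:

import requests

URL = "https://practice.scrapingcentral.com/challenges/api/auth/basic"

r = requests.get(URL, timeout=10)
print(r.status_code)  # expect 401 without credentials

r = requests.get(URL, auth=("student", "practice123"), timeout=10)
print(r.status_code)  # expect 200
print(r.json())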
