GET Requests, Query Parameters, Headers
Anatomy of an HTTP GET request: URLs, query strings, headers, and how to control them precisely in Python's requests library.
What you’ll learn
- Distinguish path, query string, and fragment in a URL.
- Pass query parameters via the `params=` dict instead of string concatenation.
- Set request headers (User-Agent, Accept, Referer) explicitly.
- Inspect outgoing requests and incoming responses for debugging.
GET is the verb your scraper will send most. It says "give me this resource." Everything that distinguishes one GET from another lives in the URL itself and in the request headers.
URL anatomy
A URL is more structured than it looks:
```
https://practice.scrapingcentral.com/products?page=2&category=kitchen#top
```

- Scheme: `https://` or `http://`.
- Hostname: `practice.scrapingcentral.com`, what DNS resolves.
- Path: `/products`, which resource on the server.
- Query string: `?page=2&category=kitchen`, `key=value` pairs passed to the server.
- Fragment: `#top`, browser-only, never sent to the server.
Your scraper controls all of these. The query string is where most scraping variations live: page numbers, search terms, filter values, sort orders.
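You can take a URL apart programmatically with the standard library's `urllib.parse`, which is handy when you need to read or rewrite a query string you scraped off a page:

```python
from urllib.parse import urlsplit, parse_qs

# Split the example URL into its named parts
parts = urlsplit("https://practice.scrapingcentral.com/products?page=2&category=kitchen#top")
print(parts.scheme)    # https
print(parts.netloc)    # practice.scrapingcentral.com
print(parts.path)      # /products
print(parts.query)     # page=2&category=kitchen
print(parts.fragment)  # top

# parse_qs turns the query string into a dict of lists
print(parse_qs(parts.query))  # {'page': ['2'], 'category': ['kitchen']}
```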
Build query strings the right way
Don't do this:

```python
# Wrong
url = "https://practice.scrapingcentral.com/products?page=" + str(page)
```

String concatenation breaks on special characters (`&`, `=`, spaces, non-ASCII). Use the `params=` argument:
```python
import requests

params = {"page": 2, "category": "kitchen"}
r = requests.get("https://practice.scrapingcentral.com/products", params=params)
print(r.url)
# → https://practice.scrapingcentral.com/products?page=2&category=kitchen
```
`requests` URL-encodes values for you: `params={"q": "yellow mug"}` becomes `?q=yellow+mug` correctly.
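To see the encoding without making a network call, you can build a `PreparedRequest` directly. The URL and parameter values here are illustrative:

```python
from requests.models import PreparedRequest

# prepare_url applies the same encoding requests.get() would
req = PreparedRequest()
req.prepare_url(
    "https://practice.scrapingcentral.com/products",
    {"q": "yellow mug", "brand": "A&B"},
)
print(req.url)
# → https://practice.scrapingcentral.com/products?q=yellow+mug&brand=A%26B
```

The space becomes `+` and the literal `&` in the value becomes `%26`, so it can't be confused with a parameter separator.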
Lists and `None` are handled too:

```python
params = {"tag": ["ceramic", "kitchen"], "color": None}
# → ?tag=ceramic&tag=kitchen (None values are dropped)
```
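Under the hood this is ordinary form encoding; the standard library's `urlencode` shows the same list expansion (requests additionally drops `None` values before encoding):

```python
from urllib.parse import urlencode

# doseq=True expands list values into repeated keys, as requests does
print(urlencode({"tag": ["ceramic", "kitchen"]}, doseq=True))
# → tag=ceramic&tag=kitchen
```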
Headers: what your request says about itself
Headers are key-value pairs sent before the body. They tell the server who you are, what you can accept, and where you came from:
```python
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://practice.scrapingcentral.com/",
}
r = requests.get("https://practice.scrapingcentral.com/products", headers=headers)
```
The headers that matter for scrapers:
| Header | Purpose | Why scrapers care |
|---|---|---|
| `User-Agent` | Identifies the client | Some sites block the default Python UA; many serve different HTML by UA |
| `Accept` | What content types you'll accept | Forces JSON vs HTML on content-negotiated endpoints |
| `Accept-Language` | Preferred language | Many sites serve translated content based on this |
| `Referer` | The page you came from | Some sites reject requests without a Referer |
| `Cookie` | Session/auth state | Covered in Lesson 1.4 |
By default, requests sends a User-Agent like `python-requests/2.31.0`. That's an instant tell. Lesson 1.5 covers User-Agent strategy in depth, but for now: set a realistic browser UA on every scraper.
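You can check exactly what requests sends by default; `requests.utils.default_headers()` returns the stock header set (the version number in the UA depends on your installed release):

```python
import requests

# The headers every plain requests call starts with
defaults = requests.utils.default_headers()
print(defaults["User-Agent"])  # e.g. python-requests/2.31.0
print(dict(defaults))
```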
What requests actually sent
Before debugging weird responses, check what you sent:
```python
r = requests.get(url, params=params, headers=headers)
print(r.request.url)
print(r.request.headers)
```
`r.request` is the `PreparedRequest` object: the request exactly as it was serialized and sent on the wire. If the URL looks wrong, your params are wrong. If a header is missing, your headers dict is wrong. This is the first place to look when a scraper returns surprising output.
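You can also build and inspect a prepared request without sending it, which is handy for checking URLs and headers offline. The header value here is illustrative:

```python
import requests

# Build the request by hand, then prepare it without sending
req = requests.Request(
    "GET",
    "https://practice.scrapingcentral.com/products",
    params={"page": 2},
    headers={"User-Agent": "example-scraper/1.0"},
)
prepared = req.prepare()
print(prepared.method)                 # GET
print(prepared.url)                    # https://practice.scrapingcentral.com/products?page=2
print(prepared.headers["User-Agent"])  # example-scraper/1.0
```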
Inspect the response too
```python
print(r.status_code)              # 200
print(r.headers)                  # dict-like, response headers
print(r.headers["Content-Type"])
print(r.encoding)                 # the encoding requests is using to decode .text
print(len(r.content))             # body length in bytes
print(r.elapsed)                  # how long the round-trip took
```
`r.elapsed` is gold for performance debugging. If one request takes 5 seconds and the others take 200 ms, you have a slow path to investigate.
Following redirects
By default, requests follows redirects automatically:
```python
r = requests.get("https://practice.scrapingcentral.com/products")
print(r.history)  # list of intermediate responses (e.g. 301, 302)
print(r.url)      # final URL after redirects
```
To see them in action, disable auto-follow:
```python
r = requests.get(url, allow_redirects=False)
print(r.status_code, r.headers.get("Location"))
```
Useful when you want to detect redirect chains, capture cookies set mid-redirect, or stop at the first hop.
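A sketch of walking a redirect chain one hop at a time, assuming a cooperative server; the helper name and hop limit are our own:

```python
import requests
from urllib.parse import urljoin

def redirect_chain(url, max_hops=10):
    """Follow redirects manually, recording each (status_code, url) hop."""
    hops = []
    for _ in range(max_hops):
        r = requests.get(url, allow_redirects=False, timeout=10)
        hops.append((r.status_code, url))
        if not r.is_redirect:
            break
        # Location may be relative; resolve it against the current URL
        url = urljoin(url, r.headers["Location"])
    return hops
```

Each hop is a full round trip, so for ordinary scraping you'll usually leave `allow_redirects=True` and just read `r.history`.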
A realistic GET scraper
```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

for page in range(1, 6):
    r = requests.get(
        "https://practice.scrapingcentral.com/products",
        params={"page": page},
        headers=headers,
        timeout=10,
    )
    r.raise_for_status()
    print(f"page {page}: {len(r.content)} bytes, took {r.elapsed.total_seconds():.2f}s")
```
That's a polite, observable, debuggable scraper loop. Add parsing and you're done.
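When the loop grows beyond a few pages, a `requests.Session` keeps the TCP connection alive between requests to the same host and lets you set shared headers once. A minimal sketch, reusing the header values above:

```python
import requests

# A Session applies these headers to every request it sends
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
})

# Consecutive session.get() calls to one host reuse a single connection:
# session.get("https://practice.scrapingcentral.com/products",
#             params={"page": 1}, timeout=10)
```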
Hands-on lab
Hit `https://practice.scrapingcentral.com/products` with three different `?page=` values (1, 2, 3) and confirm the HTML differs each time. Then add a `?category=kitchen` filter and verify the response changes. Print `r.request.url` for each call so you can see exactly what was sent.
Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.