GET Requests, Query Parameters, Headers
Anatomy of an HTTP GET request: URLs, query strings, headers, and how to control them precisely in Python's requests library.
What you’ll learn
- Distinguish path, query string, and fragment in a URL.
- Pass query parameters via the `params=` dict instead of string concatenation.
- Set request headers (User-Agent, Accept, Referer) explicitly.
- Inspect outgoing requests and incoming responses for debugging.
GET is the verb your scraper will send most. It says "give me this resource." Everything that distinguishes one GET from another lives in the URL itself and in the request headers.
URL anatomy
A URL is more structured than it looks:
```
https://practice.scrapingcentral.com/products?page=2&category=kitchen#top
```

- Scheme: `https://` or `http://`.
- Hostname: `practice.scrapingcentral.com`, what DNS resolves.
- Path: `/products`, which resource on the server.
- Query string: `?page=2&category=kitchen`, `key=value` pairs passed to the server.
- Fragment: `#top`, browser-only, never sent to the server.
Your scraper controls all of these. The query string is where most scraping variations live: page numbers, search terms, filter values, sort orders.
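You can take a URL apart programmatically with the standard library's `urllib.parse`, which is handy when you need to read or rewrite a query string you scraped off a page:

```python
from urllib.parse import urlsplit, parse_qs

# Split the example URL into its named parts
parts = urlsplit("https://practice.scrapingcentral.com/products?page=2&category=kitchen#top")
print(parts.scheme)    # https
print(parts.netloc)    # practice.scrapingcentral.com
print(parts.path)      # /products
print(parts.query)     # page=2&category=kitchen
print(parts.fragment)  # top

# parse_qs turns the query string into a dict of lists
print(parse_qs(parts.query))  # {'page': ['2'], 'category': ['kitchen']}
```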
Build query strings the right way
Don't do this:

```python
# Wrong
url = "https://practice.scrapingcentral.com/products?page=" + str(page)
```

String concatenation breaks on special characters (`&`, `=`, spaces, non-ASCII). Use the `params=` argument:
```python
import requests

params = {"page": 2, "category": "kitchen"}
r = requests.get("https://practice.scrapingcentral.com/products", params=params)
print(r.url)
# → https://practice.scrapingcentral.com/products?page=2&category=kitchen
```
`requests` URL-encodes values for you: `params={"q": "yellow mug"}` becomes `?q=yellow+mug` correctly.
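To see the encoding without making a network call, you can build a `PreparedRequest` directly. The URL and parameter values here are illustrative:

```python
from requests.models import PreparedRequest

# prepare_url applies the same encoding requests.get() would
req = PreparedRequest()
req.prepare_url(
    "https://practice.scrapingcentral.com/products",
    {"q": "yellow mug", "brand": "A&B"},
)
print(req.url)
# → https://practice.scrapingcentral.com/products?q=yellow+mug&brand=A%26B
```

The space becomes `+` and the literal `&` in the value becomes `%26`, so it can't be confused with a parameter separator.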
Lists and `None` are handled too:

```python
params = {"tag": ["ceramic", "kitchen"], "color": None}
# → ?tag=ceramic&tag=kitchen (None values are dropped)
```
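Under the hood this is ordinary form encoding; the standard library's `urlencode` shows the same list expansion (requests additionally drops `None` values before encoding):

```python
from urllib.parse import urlencode

# doseq=True expands list values into repeated keys, as requests does
print(urlencode({"tag": ["ceramic", "kitchen"]}, doseq=True))
# → tag=ceramic&tag=kitchen
```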
Headers: what your request says about itself
Headers are key-value pairs sent before the body. They tell the server who you are, what you can accept, and where you came from:
```python
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://practice.scrapingcentral.com/",
}
r = requests.get("https://practice.scrapingcentral.com/products", headers=headers)
```
The headers that matter for scrapers:
| Header | Purpose | Why scrapers care |
|---|---|---|
| `User-Agent` | Identifies the client | Some sites block the default Python UA; many serve different HTML by UA |
| `Accept` | What content types you'll accept | Forces JSON vs HTML on content-negotiated endpoints |
| `Accept-Language` | Preferred language | Many sites serve translated content based on this |
| `Referer` | The page you came from | Some sites reject requests without a Referer |
| `Cookie` | Session/auth state | Covered in Lesson 1.4 |
By default, requests sends a User-Agent like `python-requests/2.31.0`. That's an instant tell. Lesson 1.5 covers User-Agent strategy in depth, but for now: set a realistic browser UA on every scraper.
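You can check exactly what requests sends by default; `requests.utils.default_headers()` returns the stock header set (the version number in the UA depends on your installed release):

```python
import requests

# The headers every plain requests call starts with
defaults = requests.utils.default_headers()
print(defaults["User-Agent"])  # e.g. python-requests/2.31.0
print(dict(defaults))
```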
What requests actually sent
Before debugging weird responses, check what you sent:
```python
r = requests.get(url, params=params, headers=headers)
print(r.request.url)
print(r.request.headers)
```
`r.request` is the `PreparedRequest` object: the request exactly as it was serialized and sent on the wire. If the URL looks wrong, your params are wrong. If a header is missing, your headers dict is wrong. This is the first place to look when a scraper returns surprising output.
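You can also build and inspect a prepared request without sending it, which is handy for checking URLs and headers offline. The header value here is illustrative:

```python
import requests

# Build the request by hand, then prepare it without sending
req = requests.Request(
    "GET",
    "https://practice.scrapingcentral.com/products",
    params={"page": 2},
    headers={"User-Agent": "example-scraper/1.0"},
)
prepared = req.prepare()
print(prepared.method)                 # GET
print(prepared.url)                    # https://practice.scrapingcentral.com/products?page=2
print(prepared.headers["User-Agent"])  # example-scraper/1.0
```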
Inspect the response too
```python
print(r.status_code)              # 200
print(r.headers)                  # dict-like, response headers
print(r.headers["Content-Type"])
print(r.encoding)                 # the encoding requests is using to decode .text
print(len(r.content))             # body length in bytes
print(r.elapsed)                  # how long the round-trip took
```
`r.elapsed` is gold for performance debugging. If one request takes 5 seconds and the others take 200 ms, you have a slow path to investigate.
Following redirects
By default, requests follows redirects automatically:
```python
r = requests.get("https://practice.scrapingcentral.com/products")
print(r.history)  # list of intermediate responses (e.g. 301, 302)
print(r.url)      # final URL after redirects
```
To see them in action, disable auto-follow:
```python
r = requests.get(url, allow_redirects=False)
print(r.status_code, r.headers.get("Location"))
```
Useful when you want to detect redirect chains, capture cookies set mid-redirect, or stop at the first hop.
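A sketch of walking a redirect chain one hop at a time, assuming a cooperative server; the helper name and hop limit are our own:

```python
import requests
from urllib.parse import urljoin

def redirect_chain(url, max_hops=10):
    """Follow redirects manually, recording each (status_code, url) hop."""
    hops = []
    for _ in range(max_hops):
        r = requests.get(url, allow_redirects=False, timeout=10)
        hops.append((r.status_code, url))
        if not r.is_redirect:
            break
        # Location may be relative; resolve it against the current URL
        url = urljoin(url, r.headers["Location"])
    return hops
```

Each hop is a full round trip, so for ordinary scraping you'll usually leave `allow_redirects=True` and just read `r.history`.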
A realistic GET scraper
```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

for page in range(1, 6):
    r = requests.get(
        "https://practice.scrapingcentral.com/products",
        params={"page": page},
        headers=headers,
        timeout=10,
    )
    r.raise_for_status()
    print(f"page {page}: {len(r.content)} bytes, took {r.elapsed.total_seconds():.2f}s")
```
That's a polite, observable, debuggable scraper loop. Add parsing and you're done.
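When the loop grows beyond a few pages, a `requests.Session` keeps the TCP connection alive between requests to the same host and lets you set shared headers once. A minimal sketch, reusing the header values above:

```python
import requests

# A Session applies these headers to every request it sends
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
})

# Consecutive session.get() calls to one host reuse a single connection:
# session.get("https://practice.scrapingcentral.com/products",
#             params={"page": 1}, timeout=10)
```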
Hands-on lab
Hit `https://practice.scrapingcentral.com/products` with three different `?page=` values (1, 2, 3) and confirm the HTML differs each time. Then add a `?category=kitchen` filter and verify the response changes. Print `r.request.url` for each call so you can see exactly what was sent.
Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.