
Lesson 1.2 · Beginner · 4 min read

GET Requests, Query Parameters, Headers

Anatomy of an HTTP GET request: URLs, query strings, headers, and how to control them precisely in Python's requests library.

What you’ll learn

  • Distinguish path, query string, and fragment in a URL.
  • Pass query parameters via the `params=` dict instead of string concatenation.
  • Set request headers (User-Agent, Accept, Referer) explicitly.
  • Inspect outgoing requests and incoming responses for debugging.

GET is the verb your scraper will send most. It says "give me this resource." Everything that distinguishes one GET from another lives in the URL itself and in the request headers.

URL anatomy

A URL is more structured than it looks:

https://practice.scrapingcentral.com/products?page=2&category=kitchen#top
└──┬───┘└─────────────┬────────────┘└───┬───┘└───────────┬──────────┘└┬─┘
scheme           hostname             path         query string    fragment
  • Scheme: https:// or http://.
  • Hostname: the name DNS resolves to a server.
  • Path: which resource on the server you want.
  • Query string: ?key=value&key=value pairs passed to the server.
  • Fragment: the #anchor is browser-only and is never sent to the server.

Your scraper controls all of these. The query string is where most scraping variations live: page numbers, search terms, filter values, sort orders.
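The pieces above can be pulled apart programmatically with the standard library's urllib.parse; a quick sketch using the example URL:

```python
from urllib.parse import urlsplit, parse_qs

url = "https://practice.scrapingcentral.com/products?page=2&category=kitchen#top"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.netloc)    # practice.scrapingcentral.com
print(parts.path)      # /products
print(parts.query)     # page=2&category=kitchen
print(parts.fragment)  # top

# parse_qs turns the query string into a dict of lists
print(parse_qs(parts.query))  # {'page': ['2'], 'category': ['kitchen']}
```

Note that parse_qs returns lists because a key can legally repeat (?tag=a&tag=b), which matters later when you pass lists back through params=.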

Build query strings the right way

Don't do this:

# Wrong
url = "https://practice.scrapingcentral.com/products?page=" + str(page)

String concatenation breaks on special characters (&, =, spaces, non-ASCII). Use the params= argument:

import requests

params = {"page": 2, "category": "kitchen"}
r = requests.get("https://practice.scrapingcentral.com/products", params=params)
print(r.url)
# → https://practice.scrapingcentral.com/products?page=2&category=kitchen

requests URL-encodes values for you. params={"q": "yellow mug"} becomes ?q=yellow+mug correctly.
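You can verify the encoding without sending anything over the network by building a PreparedRequest locally. A sketch (the "mugs & cups" search term is just an illustrative value with characters that need escaping):

```python
import requests

# Request(...).prepare() builds the final URL without sending it
req = requests.Request(
    "GET",
    "https://practice.scrapingcentral.com/products",
    params={"q": "mugs & cups", "page": 1},
).prepare()

print(req.url)
# → https://practice.scrapingcentral.com/products?q=mugs+%26+cups&page=1
```

The space became + and the literal & became %26, so the server sees one q value instead of a broken extra parameter.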

Lists and None are handled too:

params = {"tag": ["ceramic", "kitchen"], "color": None}
# → ?tag=ceramic&tag=kitchen  (None values are dropped)

Headers: what your request says about itself

Headers are key-value pairs sent before the body. They tell the server who you are, what you can accept, and where you came from:

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://practice.scrapingcentral.com/",
}
r = requests.get("https://practice.scrapingcentral.com/products", headers=headers)
r = requests.get("https://practice.scrapingcentral.com/products", headers=headers)

The headers that matter for scrapers:

  • User-Agent: identifies the client. Some sites block the default Python UA; many serve different HTML depending on it.
  • Accept: which content types you'll accept. Can force JSON vs HTML on content-negotiated endpoints.
  • Accept-Language: preferred language. Many sites serve translated content based on this.
  • Referer: the page you came from. Some sites reject requests without one.
  • Cookie: session/auth state. Covered in Lesson 1.4.

By default, requests sends a User-Agent like python-requests/2.31.0. That's an instant tell. Lesson 1.5 covers User-Agent strategy in depth, but for now: set a realistic browser UA on every scraper.
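You can see exactly which User-Agent would go out, again without a network call, by preparing the request through a Session (a sketch; example.com is just a placeholder host):

```python
import requests

s = requests.Session()

# Default: requests advertises itself, e.g. "python-requests/2.31.0"
prepared = s.prepare_request(requests.Request("GET", "https://example.com/"))
print(prepared.headers["User-Agent"])

# Override once on the session; every request prepared through it inherits the UA
s.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
prepared = s.prepare_request(requests.Request("GET", "https://example.com/"))
print(prepared.headers["User-Agent"])
```

Setting the header on the Session rather than per call is usually the cleaner pattern once a scraper makes more than one request.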

What requests actually sent

Before debugging weird responses, check what you sent:

r = requests.get(url, params=params, headers=headers)
print(r.request.url)
print(r.request.headers)

r.request is the PreparedRequest object, reflecting exactly what went on the wire. If the URL looks wrong, your params are wrong. If a header is missing, your headers dict is. This is the first place to look when a scraper returns surprising output.

Inspect the response too

print(r.status_code)  # 200
print(r.headers)  # dict-like, response headers
print(r.headers["Content-Type"])
print(r.encoding)  # the encoding requests is using to decode .text
print(len(r.content))  # body length in bytes
print(r.elapsed)  # how long the round-trip took

r.elapsed is gold for performance debugging. If one request takes 5 seconds and others take 200ms, you have a slow path to investigate.

Following redirects

By default, requests follows redirects automatically:

r = requests.get("https://practice.scrapingcentral.com/products")
print(r.history)  # list of intermediate responses (e.g. 301, 302)
print(r.url)  # final URL after redirects

To see them in action, disable auto-follow:

r = requests.get(url, allow_redirects=False)
print(r.status_code, r.headers.get("Location"))

Useful when you want to detect redirect chains, capture cookies set mid-redirect, or stop at the first hop.
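That one-hop pattern generalizes into a manual follow loop. A sketch (follow_chain and max_hops are illustrative names, not part of requests):

```python
import requests

def follow_chain(url, max_hops=10, get=requests.get):
    """Follow redirects one hop at a time, returning the list of URLs visited."""
    chain = [url]
    for _ in range(max_hops):
        r = get(url, allow_redirects=False)
        if r.status_code not in (301, 302, 303, 307, 308):
            break  # not a redirect: we've reached the final resource
        # Location may be relative; resolve it against the current URL
        url = requests.compat.urljoin(url, r.headers["Location"])
        chain.append(url)
    return chain
```

Because each hop is explicit, you can log status codes, capture Set-Cookie headers mid-chain, or bail out as soon as the chain leaves the host you expect.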

A realistic GET scraper

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

for page in range(1, 6):
    r = requests.get(
        "https://practice.scrapingcentral.com/products",
        params={"page": page},
        headers=headers,
        timeout=10,
    )
    r.raise_for_status()
    print(f"page {page}: {len(r.content)} bytes, took {r.elapsed.total_seconds():.2f}s")

That's a polite, observable, debuggable scraper loop. Add parsing and you're done.

Hands-on lab

Hit https://practice.scrapingcentral.com/products with three different ?page= values (1, 2, 3) and confirm the HTML differs each time. Then add a ?category=kitchen filter and verify the response changes. Print r.request.url for each call so you can see exactly what was sent.


Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.

Question 1 of 8

In the URL `https://example.com/search?q=mug&page=2#results`, which part is NEVER sent to the server?
