Pagination: The 5 Common Patterns and How to Detect Them
Every paginated site uses one of five patterns: numbered, offset, cursor, load-more, or unknown-end. Identify which, scrape it correctly, stop at the right time.
What you’ll learn
- Recognise the 5 pagination patterns from URL and DOM evidence.
- Implement a paging loop for each, with correct stop conditions.
- Handle 'unknown total' gracefully without infinite loops.
- Choose between following the rendered Next link and constructing URLs.
Pagination is where scrapers most often go wrong: missing pages, infinite loops, duplicate items, off-by-one bugs. The good news: there are only five common patterns. Once you can spot which one a site uses, the scraping is mechanical.
The five patterns
| Pattern | URL shape | Stop condition | Catalog108 lab |
|---|---|---|---|
| Numbered | `?page=N` | Last page known or empty list | /challenges/static/pagination/numbered |
| Offset / limit | `?offset=20&limit=20` | `total` field or empty page | /challenges/static/pagination/offset |
| Cursor | `?cursor=abc123` | `next_cursor` is empty/null | /challenges/static/pagination/cursor |
| Load-more (HTTP) | "Load more" link triggers a GET | No more "next" link | /challenges/static/pagination/load-more-http |
| Unknown end | Any of above, but total not exposed | Empty page or duplicate detection | /challenges/static/pagination/unknown-end |
Pattern 1: Numbered pagination
The classic. ?page=1, ?page=2, etc. The page renders a list of page numbers OR a "Next" link.
```python
import requests
from bs4 import BeautifulSoup

BASE = "https://practice.scrapingcentral.com"

all_items = []
for page in range(1, 1000):  # 1000 is a safety upper bound, not the real limit
    r = requests.get(
        f"{BASE}/challenges/static/pagination/numbered",
        params={"page": page},
        timeout=10,
    )
    r.raise_for_status()
    soup = BeautifulSoup(r.content, "lxml")
    items = soup.select(".item")
    if not items:
        break
    all_items.extend(item.get_text(strip=True) for item in items)
    # Optional faster exit: check for a visible "Next" link
    if not soup.select_one("a.next"):
        break

print(len(all_items))
```
Two stop conditions:
- An empty item list: the clearest signal that we've gone past the end.
- No "Next" link: an anti-overshoot check, useful when the number of items per page is inconsistent.

Always keep an outer upper bound on the loop too (`range(1, 1000)`). If both stop signals fail (a bug, a layout change), at least your loop terminates.
Pattern 2: Offset / limit
Often used in API-style URLs:
```python
all_items = []
offset = 0
limit = 20
while True:
    r = requests.get(
        f"{BASE}/challenges/static/pagination/offset",
        params={"offset": offset, "limit": limit},
        timeout=10,
    )
    r.raise_for_status()
    items = r.json()["items"]
    if not items:
        break
    all_items.extend(items)
    offset += limit
```
If the response includes a total count, use it for early termination and a progress bar:
```python
data = r.json()
total = data["total"]
print(f"{offset + len(items)}/{total}")
if offset + len(items) >= total:
    break
```
Pattern 3: Cursor-based
The server gives you an opaque next_cursor token. You send it back on the next request; you stop when you get an empty/null cursor.
```python
all_items = []
cursor = None
while True:
    params = {}
    if cursor is not None:
        params["cursor"] = cursor
    r = requests.get(f"{BASE}/challenges/static/pagination/cursor", params=params)
    data = r.json()
    all_items.extend(data["items"])
    cursor = data.get("next_cursor")
    if not cursor:
        break
```
Don't parse or guess the cursor format. Treat it as opaque. The server is the source of truth on "what comes next."
Cursor pagination is the most reliable for huge or rapidly changing datasets: adding or removing items mid-scrape doesn't shift your position the way offset pagination would.
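That drift is easy to demonstrate with a toy in-memory list standing in for the server (a sketch only, no real requests involved):

```python
# Toy in-memory "server": just a list and slicing, no network.
data = [f"item-{i}" for i in range(10)]

def fetch_offset(offset, limit=3):
    return data[offset:offset + limit]

collected = []
collected += fetch_offset(0)    # first page: item-0, item-1, item-2
data.insert(0, "item-new")      # something is published mid-scrape
collected += fetch_offset(3)    # everything shifted down: item-2 repeats

# A deletion shifts the other way and silently skips an item instead.
```

A cursor pinned to "the item after item-2" would be unaffected by either change, which is exactly why servers hand out opaque cursors.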
Pattern 4: Load-more (HTTP)
Some pages have a "Load more" button that does a regular HTTP GET (no JS) for the next chunk. The trick is finding that URL:
```python
from urllib.parse import urljoin

url = f"{BASE}/challenges/static/pagination/load-more-http"
while url:
    r = requests.get(url, timeout=10)
    soup = BeautifulSoup(r.content, "lxml")
    all_items.extend(it.get_text(strip=True) for it in soup.select(".item"))
    next_btn = soup.select_one("a.load-more")
    # Resolve relative hrefs against the current URL
    url = urljoin(url, next_btn["href"]) if next_btn else None
```
You follow the rendered "Load more" link until it disappears. The URL often contains an offset, cursor, or session ID; let the server set it rather than constructing it yourself.
If the button is client-side JS-only (no <a href>), it's not actually static pagination; it's an XHR. Open DevTools → Network → click "Load more" once, find the request, and replicate it. That's API-scraping territory (Sub-Path 4).
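Replicating such an XHR usually reduces to the same empty-page loop. A minimal sketch, with the fetch step injectable so the loop can be exercised without a network; the endpoint URL and the `page`/`items` names are assumptions, so copy the real ones from the Network tab:

```python
import requests

def fetch_all(api_url, fetch=None, max_pages=1000):
    """Page through a JSON endpoint until it returns an empty list.

    `fetch` takes a page number and returns a list of items; the
    default does a real GET with assumed parameter and key names.
    """
    if fetch is None:
        def fetch(page):
            r = requests.get(api_url, params={"page": page}, timeout=10)
            r.raise_for_status()
            return r.json().get("items", [])
    items = []
    for page in range(1, max_pages):
        chunk = fetch(page)
        if not chunk:
            break
        items.extend(chunk)
    return items
```

The injectable `fetch` also makes the stop logic unit-testable with a canned dict of pages.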
Pattern 5: Unknown end
No page count, no total, no cursor that explicitly says "this is the last." You only know you're done when the page returns nothing new.
Two failure modes to avoid:
- An infinite loop if the server returns the same data on overshoot (e.g. `?page=999` returns page 1 content).
- Missing the last page if your stop signal triggers prematurely.
The robust approach: track a fingerprint of what you've seen:
```python
seen_first_item = None
page = 1
while page < 10000:  # outer safety bound
    r = requests.get(url, params={"page": page})
    soup = BeautifulSoup(r.content, "lxml")
    items = soup.select(".item")
    if not items:
        break
    first = items[0].get_text(strip=True)
    if first == seen_first_item:
        break  # server is looping
    seen_first_item = first
    all_items.extend(it.get_text(strip=True) for it in items)
    page += 1
```
For really paranoid scrapes, dedupe at the data level (Lesson 1.33). Two same-content pages in a row is your stop signal.
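A minimal data-level dedupe sketch along those lines: keep a set of item fingerprints and treat a page of all-repeats as the stop signal (the helper name is hypothetical, not part of the lab):

```python
seen = set()
all_items = []

def add_page(items):
    """Record one page of items; return False when every item was a repeat."""
    new = [it for it in items if it not in seen]
    seen.update(new)
    all_items.extend(new)
    return bool(new)

# Inside a paging loop:
#     if not add_page(page_items):
#         break  # the whole page was duplicates: server is looping
```

This is stronger than comparing only the first item, because it also catches partial overlaps between pages.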
How to identify which pattern from the page
Open DevTools, look at the URL when you click "Next" or scroll:
- URL contains `?page=N` → numbered.
- URL contains `?offset=N` → offset.
- URL contains a random-looking `cursor=value` → cursor.
- A button on the page does a GET to a deeper URL → load-more.
- None of the above visible → look at the API the JS calls (DevTools → Network).
Follow rendered links vs. construct URLs
Two philosophies:
- Construct: predict the URL pattern (`?page=N`) and increment. Faster, but assumes you know the pattern.
- Follow: parse the "Next" link from the page itself. Slower (one extra parse per page) but works on weird formats, signed URLs, or session-tied cursors.
For unknown sites, follow first. Once you've confirmed the pattern, switch to construct for speed if needed.
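The core step of the follow philosophy is extracting the absolute Next URL from the page itself. A small sketch; the `a.next` selector is an assumption, so adjust it to the site's markup:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def next_url(html, current_url, selector="a.next"):
    """Return the absolute URL of the Next link, or None on the last page."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one(selector)
    if link is None or not link.get("href"):
        return None
    # Next links are often relative; resolve against the current URL
    return urljoin(current_url, link["href"])
```

Drive the loop with `while url:` fetching, extracting items, then `url = next_url(r.text, url)`.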
Polite paging
Add a small delay between page requests; Lesson 1.28 covers polite scraping in depth. For now:
```python
import time

for page in range(1, 100):
    ...
    time.sleep(0.5)  # half a second between page fetches
```
Even a 0.2s delay is enough to avoid hammering most servers. Total time on a 100-page scrape: 20 seconds added. Worth it.
A unified paging helper
```python
import time
import requests

def paginate_numbered(url, params=None, max_pages=10000, sleep=0.5):
    page = 1
    params = dict(params or {})
    while page < max_pages:
        params["page"] = page
        r = requests.get(url, params=params, timeout=10)
        r.raise_for_status()
        yield r
        page += 1
        time.sleep(sleep)
```
A generator that yields each response. The caller checks for emptiness and breaks. Reusable across most numbered-pagination sites.
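The caller's side of that contract can be sketched as a generic consumer that stops at the first empty page (a hypothetical helper; it works with any response-yielding generator):

```python
def collect(pages, extract):
    """Consume a generator of responses, stopping at the first empty page.

    `extract` maps one response to a list of items, so the same consumer
    works for HTML pages (parse, then select) or JSON APIs.
    """
    out = []
    for resp in pages:
        items = extract(resp)
        if not items:
            break
        out.extend(items)
    return out
```

For example: `collect(paginate_numbered(url), lambda r: r.json()["items"])` for a JSON endpoint, or an `extract` that runs BeautifulSoup over `r.content` for HTML.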
Hands-on lab
All five pagination challenges have their own URLs under /challenges/static/pagination/. Pick one (start with numbered), identify the pattern from the URL when you click pagination links in your browser, then write the paging loop. Repeat with offset, cursor, and load-more-http. Finally, test your loop against unknown-end; it should terminate cleanly without an infinite loop.
Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.