Scraping Paginated APIs

Learn how to handle offset-based, page-based, and cursor-based pagination when scraping APIs with Python.

APIs rarely return all results at once. Instead, they split the data across pages, and you need to request each page in turn to collect the complete dataset.

Page-Number Pagination

The simplest pattern: increment a page parameter until the API returns an empty page.

import requests

all_posts = []
page = 1

while True:
    response = requests.get(
        "https://jsonplaceholder.typicode.com/posts",
        params={"_page": page, "_limit": 20},
        timeout=15,
    )
    response.raise_for_status()
    data = response.json()

    if not data:
        break

    all_posts.extend(data)
    print(f"Page {page}: got {len(data)} posts")
    page += 1

print(f"Total: {len(all_posts)} posts")
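
The loop above spends one extra request to discover the empty page, and it would never stop against an API that keeps returning data for any page number. A minimal sketch of a guarded variant follows. `fetch_page` is a hypothetical callable (not part of `requests`) that returns the parsed list for one page, and `max_pages` is an assumed safety cap. Stopping on a short page assumes the API always fills every page except the last.

```python
from typing import Callable


def collect_pages(
    fetch_page: Callable[[int, int], list],
    limit: int = 20,
    max_pages: int = 1000,
) -> list:
    """Collect items page by page until a short or empty page appears.

    fetch_page(page, limit) is a hypothetical helper that returns the
    parsed JSON list for one page.
    """
    items = []
    for page in range(1, max_pages + 1):
        data = fetch_page(page, limit)
        items.extend(data)
        # A page shorter than the limit (including an empty one) is the last.
        if len(data) < limit:
            break
    return items
```

With `requests`, `fetch_page` could be a small function that performs the `requests.get(...)` call from the example above and returns `response.json()`.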

Offset-Based Pagination

Some APIs use offset and limit instead of page numbers. The offset is the number of items to skip, and the limit is the number of items to return:

import requests

all_items = []
offset = 0
limit = 50

while True:
    response = requests.get(
        "https://api.example.com/products",
        params={"offset": offset, "limit": limit},
        timeout=15,
    )
    response.raise_for_status()
    data = response.json()

    items = data.get("results", [])
    if not items:
        break

    all_items.extend(items)
    offset += limit

    # Stop if we've reached the total
    if offset >= data.get("total", float("inf")):
        break

print(f"Collected {len(all_items)} items")
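
When the first response reports a `total`, the remaining offsets can be computed up front instead of being discovered one request at a time. A small sketch of that arithmetic, assuming the API reports an accurate `total` (the function name `remaining_offsets` is illustrative only):

```python
def remaining_offsets(total: int, limit: int, first_offset: int = 0) -> list[int]:
    """Return the offsets still to request after the page at first_offset."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return list(range(first_offset + limit, total, limit))


# 120 items at 50 per request: the first call already covered offset 0.
print(remaining_offsets(total=120, limit=50))  # [50, 100]
```

Keep in mind that offset pagination can skip or repeat rows if the dataset changes between requests, so a precomputed list is only as reliable as the `total` it was built from.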

Cursor-Based Pagination

Many modern APIs (Twitter, Shopify, Stripe) use cursors. Each response includes an opaque token pointing to the next batch, and you send that token back with the next request:

import requests

all_items = []
cursor = None

while True:
    params = {"limit": 100}
    if cursor:
        params["cursor"] = cursor

    response = requests.get(
        "https://api.example.com/orders",
        params=params,
        timeout=15,
    )
    response.raise_for_status()
    data = response.json()

    all_items.extend(data["items"])
    cursor = data.get("next_cursor")

    if not cursor:
        break

print(f"Total orders: {len(all_items)}")
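
The same loop can be wrapped in a generator so callers process orders as they arrive rather than after the last page. This is a sketch under assumptions: `fetch` is a hypothetical callable that takes a params dict and returns the parsed JSON, and the `items` and `next_cursor` field names come from the example above, not from any specific API. It also stops if the server hands back a cursor it has already issued, which guards against an endless loop.

```python
from typing import Callable, Iterator, Optional


def iter_cursor(fetch: Callable[[dict], dict], limit: int = 100) -> Iterator[dict]:
    """Yield items from a cursor-paginated endpoint one at a time.

    fetch(params) is a hypothetical helper that performs the HTTP request
    and returns the decoded JSON body as a dict.
    """
    cursor: Optional[str] = None
    seen = set()  # cursors already followed, to detect a repeating cursor
    while True:
        params = {"limit": limit}
        if cursor:
            params["cursor"] = cursor
        data = fetch(params)
        yield from data.get("items", [])
        cursor = data.get("next_cursor")
        if not cursor or cursor in seen:
            break
        seen.add(cursor)
```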

Link-Header Pagination

Some APIs put the next URL in the Link HTTP header (GitHub does this):

import requests

url = "https://api.github.com/users/octocat/repos"
params = {"per_page": 30}  # only needed on the first request
all_repos = []

while url:
    response = requests.get(url, params=params, timeout=15)
    response.raise_for_status()
    all_repos.extend(response.json())

    # The "next" URL already carries per_page and page, so send no extra params
    params = None

    # Parse Link header for next page (requests also exposes this as response.links)
    link_header = response.headers.get("Link", "")
    url = None
    for part in link_header.split(","):
        if 'rel="next"' in part:
            url = part.split(";")[0].strip(" <>")
            break

print(f"Total repos: {len(all_repos)}")
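
A Link header can carry several relations (`next`, `prev`, `first`, `last`), not only `next`. The following is a rough sketch of a parser that returns all of them as a dict. The function name `parse_link_header` is illustrative, and the regular expression covers the common `<url>; rel="name"` shape rather than every form the header syntax allows.

```python
import re


def parse_link_header(header: str) -> dict[str, str]:
    """Map each rel value in a Link header to its URL."""
    links = {}
    # Each entry looks like: <https://example.com/?page=2>; rel="next"
    for match in re.finditer(r'<([^>]+)>\s*((?:;\s*[^;,]+)*)', header):
        url, params = match.group(1), match.group(2)
        rel = re.search(r'rel="?([^";,]+)"?', params)
        if rel:
            links[rel.group(1)] = url
    return links


header = (
    '<https://api.example.com/repos?per_page=30&page=2>; rel="next", '
    '<https://api.example.com/repos?per_page=30&page=5>; rel="last"'
)
print(parse_link_header(header).get("next"))
```

When the `next` key is missing from the result, the current page is the last one.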

Pagination Patterns at a Glance

Type          Parameter             Stop Condition
Page-number   page=1,2,3...         Empty response
Offset        offset=0,50,100...    offset >= total
Cursor        cursor=abc123         No next_cursor
Link header   URL in header         No rel="next"

When scraping paginated APIs at scale, a proxy service such as ScraperAPI can handle proxy rotation and reduce the chance of hitting per-IP rate limits across thousands of paginated requests.
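
Whatever proxy setup is in place, a client that gets rate-limited still needs to decide how long to wait before retrying. A minimal sketch of that decision, assuming the API answers with HTTP 429 and may send a `Retry-After` header in seconds; the function name `backoff_delay` and its defaults are illustrative:

```python
from typing import Mapping


def backoff_delay(
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Seconds to wait before retrying a rate-limited request.

    Honors Retry-After when it is a whole number of seconds; otherwise
    (missing header or HTTP-date form) falls back to capped exponential
    backoff: base * 2**attempt.
    """
    # Note: response.headers in requests is case-insensitive; a plain dict is not.
    retry_after = headers.get("Retry-After", "").strip()
    if retry_after.isdigit():
        return float(retry_after)
    return min(base * (2 ** attempt), cap)
```

Inside a pagination loop, this would typically be used as `time.sleep(backoff_delay(response.headers, attempt))` before repeating the same page request.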

Next Steps

  • Handle rate limiting between pagination requests
  • Process responses with async HTTPX for speed
  • Store paginated results incrementally to avoid data loss
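
As a starting point for the first and last items, here is a sketch that appends each page to a JSON Lines file as soon as it arrives and pauses between pages. The function name `save_pages`, the file format, and the fixed `delay` are assumptions for illustration; `pages` can be any iterable that yields one list of items per page.

```python
import json
import time
from typing import Iterable


def save_pages(pages: Iterable[list], path: str, delay: float = 0.0) -> int:
    """Append every item to a JSON Lines file, one page at a time.

    Returns the number of items written. Because the file is flushed
    after each page, a crash loses at most the page in progress.
    """
    count = 0
    with open(path, "a", encoding="utf-8") as f:
        for items in pages:
            for item in items:
                f.write(json.dumps(item) + "\n")
                count += 1
            f.flush()
            # Fixed pause between pages; a real API may need a longer one.
            time.sleep(delay)
    return count
```

Because the file is opened in append mode, rerunning the script adds to the existing file rather than overwriting it, so a resumed run should start from the last page or cursor it saved.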