SERP-API-Specific Features: Async Searches, Search Archives, Location Lookups
Beyond the basic search call, providers ship features that change what's possible. Async batching, history archives, location helpers, and more.
What you’ll learn
- Use async batch submission for high-volume workloads.
- Query a provider's search archive for historical data.
- Look up valid location strings via a location-discovery endpoint.
- Take advantage of screenshots, HTML capture, and other provider extras.
The basic "submit a query, get JSON" is table stakes. Modern SERP-APIs ship additional features that change what kinds of scrapers you can build. This lesson tours the most useful ones.
Specifics vary by provider; the concepts are universal.
Async batch searches
Synchronous calls block until results arrive, typically 2–10 seconds each; at that rate, 10k searches take roughly 5–25 hours sequentially. Async batching changes the model:
- Submit many queries at once via a /batch endpoint.
- Provider returns an immediate batch_id.
- Provider runs queries in parallel internally.
- You poll /batch/{id}/status or get a webhook callback when done.
- Download results via /batch/{id}/results.
```python
import time

import requests

API_URL = "https://api.example-serp.com"  # your provider's base URL
API_KEY = "YOUR_API_KEY"

def submit_batch(queries: list[dict]) -> str:
    """Submit a batch of searches and return the provider's batch ID."""
    r = requests.post(f"{API_URL}/batch", json={
        "searches": queries,
        "api_key": API_KEY,
    })
    r.raise_for_status()
    return r.json()["batch_id"]

def wait_for_batch(batch_id: str) -> list[dict]:
    """Poll until the batch completes, then download all results."""
    while True:
        status = requests.get(f"{API_URL}/batch/{batch_id}/status",
                              params={"api_key": API_KEY}).json()
        if status["state"] == "completed":
            break
        time.sleep(5)
    return requests.get(f"{API_URL}/batch/{batch_id}/results",
                        params={"api_key": API_KEY}).json()["searches"]

batch_id = submit_batch([
    {"q": "iphone 15", "gl": "us"},
    {"q": "samsung galaxy", "gl": "us"},
    # ... 10k more
])
results = wait_for_batch(batch_id)
```
Batches typically complete 10–100x faster than sequential submission. They are sometimes priced slightly differently from synchronous calls, so confirm with your provider.
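Fixed-interval polling works, but exponential backoff with a timeout is gentler on both sides for long-running batches. A minimal sketch under the same assumptions as above (API_URL, API_KEY, and the status route are placeholders; the "failed" state is an assumption, check your provider's status values):

```python
import time

import requests

API_URL = "https://api.example-serp.com"  # hypothetical base URL
API_KEY = "YOUR_API_KEY"

def backoff_delays(initial: float = 2.0, cap: float = 60.0):
    """Yield exponentially growing poll delays, capped at `cap` seconds."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, cap)

def wait_with_backoff(batch_id: str, timeout: float = 3600.0) -> dict:
    """Poll the batch status endpoint with exponential backoff and a deadline."""
    deadline = time.monotonic() + timeout
    for delay in backoff_delays():
        status = requests.get(f"{API_URL}/batch/{batch_id}/status",
                              params={"api_key": API_KEY}).json()
        if status["state"] in ("completed", "failed"):
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"batch {batch_id} not done after {timeout}s")
        time.sleep(delay)
```

The backoff keeps early polls snappy for small batches while avoiding hammering the status endpoint for hours on large ones.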
Search archives
Some providers persist every result indefinitely (or for N days/months) and let you re-fetch:
```python
# Re-fetch a previous result by search ID
def get_archived(search_id: str) -> dict:
    return requests.get(f"{API_URL}/searches/{search_id}",
                        params={"api_key": API_KEY}).json()
```
Why it matters:
- Replay analyses. If you discover a new field you should have captured, re-read past results.
- Audit / compliance. Prove your scraper saw X on date Y.
- Cost optimization. Sometimes free or cheaper to re-fetch from archive than re-run the search.
- Comparison studies. "What did this query look like 6 months ago?"
Support varies by provider: some retain results for only 14 days, others indefinitely (often with archive-specific pricing).
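The "replay analyses" and "comparison studies" use cases boil down to diffing an archived result against a fresh one. A sketch, assuming each organic result carries the link and position fields seen in typical SERP-API responses:

```python
def rank_changes(old: dict, new: dict) -> dict:
    """Compare organic-result ranks between an archived result and a fresh one.

    Returns {link: (old_rank, new_rank)} for links whose position changed;
    None means the link was absent from that snapshot."""
    old_ranks = {r["link"]: r["position"] for r in old.get("organic_results", [])}
    new_ranks = {r["link"]: r["position"] for r in new.get("organic_results", [])}
    changes = {}
    for link in old_ranks.keys() | new_ranks.keys():
        if old_ranks.get(link) != new_ranks.get(link):
            changes[link] = (old_ranks.get(link), new_ranks.get(link))
    return changes
```

Feed it the output of get_archived and a fresh search for the same query to answer "what moved in the last six months?" in one call.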
Location lookup endpoints
Trial-and-error on location= strings is painful. Many providers expose a location-discovery endpoint:
```python
def lookup_locations(query: str) -> list[dict]:
    return requests.get(f"{API_URL}/locations", params={
        "q": query,
        "api_key": API_KEY,
    }).json()

print(lookup_locations("Chicago"))
# → [{"name": "Chicago, IL, United States", "canonical_name": "...",
#     "google_id": "...", "country_code": "US", ...}]
```
Use the canonical_name directly in subsequent search calls. Avoids typos and ensures the provider's internal location-resolution succeeds.
Screenshots and HTML capture
Beyond JSON, some providers offer:
- Screenshot of the SERP, PNG/JPEG, useful for audits or report screenshots.
- Full HTML, the raw page (sometimes JS-rendered). Useful when the JSON misses a niche feature.
```python
r = requests.get(API_URL, params={
    "q": "iphone 15",
    "engine": "google",
    "api_key": API_KEY,
    "screenshot": "true",
    "html": "true",
})
data = r.json()
# data["screenshot_url"] = "https://...png"
# data["html"] = "<html>...</html>"
```
These are typically premium features that cost extra.
Schema inspection
Some providers offer a self-documenting JSON schema endpoint:
```python
schema = requests.get(f"{API_URL}/schema/google").json()
# Returns the expected shape: field names, types, descriptions
```
Useful for generating typed clients (Pydantic, attrs) or documentation.
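Even without a code generator, a fetched schema can drive a lightweight sanity check on responses. A sketch assuming a hypothetical flat schema of field name to type name; real schema endpoints usually return something richer, such as full JSON Schema:

```python
def check_against_schema(result: dict, schema: dict) -> list:
    """Return a list of mismatches between a result and a flat
    {field_name: type_name} schema. Empty list means the result conforms."""
    type_map = {"list": list, "dict": dict, "str": str, "int": int}
    problems = []
    for field, type_name in schema.items():
        if field not in result:
            problems.append(f"missing field: {field}")
        elif not isinstance(result[field], type_map[type_name]):
            problems.append(f"wrong type for {field}: expected {type_name}")
    return problems
```

Running this over a sample of responses is a cheap way to catch silent schema drift before it corrupts a pipeline.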
Bulk export and webhooks
For pipelines that need ongoing data flow:
- Webhooks, provider POSTs to your endpoint when batches complete.
- S3 / GCS export, large batches dropped into your storage bucket.
- Streaming endpoints, some providers offer SSE or similar for ongoing search streams.
Typical for enterprise tier; less common on starter plans.
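The webhook pattern replaces polling entirely: the provider POSTs to your endpoint when a batch finishes. A minimal stdlib sketch; the payload fields (state, batch_id) are assumptions, so check your provider's webhook docs for the actual event shape:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_batch_event(payload: dict):
    """Return the batch_id if the event signals completion, else None."""
    if payload.get("state") == "completed":
        return payload["batch_id"]
    return None

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        batch_id = handle_batch_event(payload)
        if batch_id:
            # Kick off the results download, e.g. fetch /batch/{batch_id}/results
            print(f"batch {batch_id} ready")
        self.send_response(200)  # always ack so the provider stops retrying
        self.end_headers()

# HTTPServer(("", 8080), WebhookHandler).serve_forever()
# In production: run behind HTTPS and verify the provider's webhook signature.
```

Keeping the parsing logic in a plain function (handle_batch_event) makes it easy to unit-test without standing up a server.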
Combined-engine searches
A few providers let you submit "search across multiple engines in one call":
```python
data = requests.get(API_URL, params={
    "q": "best vpn",
    "engines": "google,bing,duckduckgo",  # combined
    "api_key": API_KEY,
}).json()
# data["searches"] = [{"engine": "google", "organic_results": ...}, ...]
```
Saves orchestration overhead but bills per engine.
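The per-engine results list invites cross-engine comparison. A sketch assuming the response shape in the comment above (a searches list with engine and organic_results keys, each result carrying a link):

```python
from urllib.parse import urlparse

def by_engine(searches: list) -> dict:
    """Index a combined response's searches list by engine name."""
    return {s["engine"]: s for s in searches}

def shared_domains(searches: list) -> set:
    """Domains that appear in the organic results of every engine."""
    per_engine = [
        {urlparse(r["link"]).netloc for r in s.get("organic_results", [])}
        for s in searches
    ]
    return set.intersection(*per_engine) if per_engine else set()
```

Domains every engine agrees on are a rough consensus-relevance signal; domains only one engine surfaces often reveal that engine's ranking quirks.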
Other useful extras
- Cached results. Some providers cache for N minutes; cheap re-fetches return the same data.
- include_html=true for SERP HTML if you want to do your own parsing.
- safe_search parameter, filter explicit content.
- device_user_agent, supply your own UA for ultra-specific device emulation.
- uule precision, pass an exact location string.
Read your provider's docs end-to-end at least once. The features you don't know about are the ones you can't use.
When to use which feature
| Need | Feature |
|---|---|
| 10k+ queries per run | Async batches |
| Re-analyze old data | Search archive |
| Hyper-precise location targeting | Lat-lng or location lookup endpoint |
| Audit screenshots | Screenshot capture |
| Custom parsing | HTML capture |
| Continuous pipeline | Webhooks + S3 export |
A combined example
A nightly batch of 5k keywords with archive-backed re-analysis:
```python
import time

import requests

# load_keywords, persist, stored_search_ids, and update_with_new_field
# are your own pipeline pieces; API_URL and API_KEY are defined as before.

# 1. Submit batch
batch_id = requests.post(f"{API_URL}/batch", json={
    "searches": [{"q": kw, "gl": "us", "hl": "en"} for kw in load_keywords()],
    "api_key": API_KEY,
}).json()["batch_id"]

# 2. Wait
while True:
    s = requests.get(f"{API_URL}/batch/{batch_id}/status",
                     params={"api_key": API_KEY}).json()
    if s["state"] == "completed":
        break
    time.sleep(15)

# 3. Process
results = requests.get(f"{API_URL}/batch/{batch_id}/results",
                       params={"api_key": API_KEY}).json()["searches"]
for r in results:
    persist(r)

# 4. Later, re-fetch from archive to capture a new field
for search_id in stored_search_ids:
    data = requests.get(f"{API_URL}/searches/{search_id}",
                        params={"api_key": API_KEY}).json()
    update_with_new_field(data)
```
Hands-on lab
This is a conceptual lesson; feature availability depends on your provider. Action: read your provider's API reference end-to-end. Note three features you haven't used, then try one of them on a small batch. Most teams use only 10–20% of what their provider offers; expanding even slightly can unlock significant capability.
Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.