Scraping Dynamic Content Without a Browser
Extract data from JavaScript-heavy websites without using a browser. Discover hidden APIs, intercept XHR requests, and parse JSON responses.
Many websites load data dynamically using JavaScript. When you fetch such a page with requests, the HTML often lacks the data you want because it is only filled in after scripts run in the browser. But you often do not need a browser: finding the underlying API is faster and more reliable.
Finding Hidden APIs
Modern websites typically load data from JSON APIs via XHR or Fetch requests. Here is how to find them:
- Open Chrome DevTools (F12)
- Go to the Network tab
- Filter by Fetch/XHR
- Interact with the page (scroll, click, search)
- Look for JSON responses containing the data you need
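When a page fires many requests, scanning the Network tab by eye gets tedious. One option, not covered above and offered here only as a sketch, is to export the Network log as a HAR file from DevTools and filter it in Python. The function name `find_json_endpoints` and the sample data are illustrative; the field names (`log.entries`, `response.content.mimeType`) follow the HAR format.

```python
def find_json_endpoints(har):
    """Return (method, url) pairs for HAR entries whose response is JSON."""
    endpoints = []
    for entry in har.get("log", {}).get("entries", []):
        mime = entry.get("response", {}).get("content", {}).get("mimeType", "")
        if "json" in mime.lower():
            request = entry.get("request", {})
            endpoints.append((request.get("method", "GET"), request.get("url", "")))
    return endpoints


# A tiny in-memory stand-in for a real HAR export (hypothetical URLs)
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"method": "GET", "url": "https://example.com/static/app.js"},
                "response": {"content": {"mimeType": "application/javascript"}},
            },
            {
                "request": {"method": "GET", "url": "https://example.com/api/search?q=python"},
                "response": {"content": {"mimeType": "application/json; charset=utf-8"}},
            },
        ]
    }
}

json_endpoints = find_json_endpoints(sample_har)
for method, url in json_endpoints:
    print(method, url)
```

With a real export, you would load the file with `json.load` and pass the resulting dictionary to the same function.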
Scraping a JSON API Directly
Once you find the API endpoint, you can call it directly. No HTML parsing is needed.
import requests

# This is the API endpoint discovered via DevTools
url = "https://quotes.toscrape.com/api/quotes"
params = {"page": 1}
all_quotes = []
has_next = True

while has_next:
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    data = response.json()

    for quote in data["quotes"]:
        all_quotes.append({
            "text": quote["text"],
            "author": quote["author"]["name"],
            "tags": quote["tags"],
        })

    # The API reports whether another page exists
    has_next = data["has_next"]
    params["page"] += 1

print(f"Fetched {len(all_quotes)} quotes from the API")
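The pagination loop above can be factored into a reusable generator. The following is a minimal sketch, assuming the response has the same shape as the example (`quotes` and `has_next` keys). The names `iter_pages`, `fetch_page`, and `max_pages` are illustrative, and the fetch function is passed in so the logic can be tried without network access.

```python
def iter_pages(fetch_page, start=1, max_pages=100):
    """Yield items page by page until the API reports no next page.

    fetch_page is any callable that takes a page number and returns the
    decoded JSON for that page. max_pages is a safety cap in case
    has_next never becomes false.
    """
    page = start
    for _ in range(max_pages):
        data = fetch_page(page)
        yield from data.get("quotes", [])
        if not data.get("has_next"):
            break
        page += 1


# Fake two-page API used only to demonstrate the control flow
FAKE_PAGES = {
    1: {"quotes": [{"text": "a"}, {"text": "b"}], "has_next": True},
    2: {"quotes": [{"text": "c"}], "has_next": False},
}


def fake_fetch(page):
    return FAKE_PAGES[page]


quotes = list(iter_pages(fake_fetch))
print(f"Fetched {len(quotes)} quotes")
```

For a real endpoint, `fetch_page` could be a small function that wraps `requests.get(url, params={"page": page}).json()`.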
Replicating XHR Requests
Sometimes API calls require specific headers or parameters. Copy them from DevTools.
import requests

# A session reuses the same headers (and cookies) for every request
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "application/json",
    "Referer": "https://example.com/search",
    "X-Requested-With": "XMLHttpRequest",
})

# Replicate the exact API call from DevTools
response = session.get(
    "https://example.com/api/search",
    params={"q": "python", "page": 1, "sort": "relevance"},
    timeout=10,
)
response.raise_for_status()
data = response.json()

for item in data.get("results", []):
    print(item["title"])
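Typing headers out by hand is error-prone. As a rough sketch, the raw request headers copied as text from DevTools can be turned into a dictionary with a few lines of Python. The helper name `parse_raw_headers` and the sample header block are hypothetical. Lines starting with a colon (HTTP/2 pseudo-headers such as `:authority`) are skipped because they are not regular headers.

```python
def parse_raw_headers(raw):
    """Convert 'Name: value' lines copied from DevTools into a dict."""
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority"
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so URLs in values stay intact
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


# Example header block as it might be pasted from the Network tab
RAW_HEADERS = """
:authority: example.com
accept: application/json
referer: https://example.com/search
x-requested-with: XMLHttpRequest
"""

headers = parse_raw_headers(RAW_HEADERS)
print(headers)
```

The result can be passed to `session.headers.update(headers)` in the example above.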
Extracting Data from Embedded JSON
Some sites embed JSON data directly in the HTML inside <script> tags.
import requests
import json
import re
from bs4 import BeautifulSoup

response = requests.get("https://example.com/product/123", timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

# Find script tags containing JSON data
for script in soup.select("script"):
    text = script.string or ""
    if "productData" in text:
        # Extract the JSON object using regex.
        # The non-greedy match stops at the first "};" it sees.
        match = re.search(r"productData\s*=\s*(\{.*?\});", text, re.DOTALL)
        if match:
            product = json.loads(match.group(1))
            print(f"Name: {product['name']}")
            print(f"Price: {product['price']}")
            break
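The regex above can cut the object short if a string value happens to contain `};`, and it finds nothing if the statement lacks a trailing semicolon. A possible alternative, sketched here under the assumption that the embedded object is strict JSON (quoted keys, no trailing commas), is to locate the assignment with a regex and let `json.JSONDecoder.raw_decode` find where the object ends. The function name `extract_js_object` and the sample script text are illustrative.

```python
import json
import re


def extract_js_object(script_text, variable_name):
    """Return the JSON object assigned to variable_name, or None."""
    # Find "variable_name = " and start decoding right after it
    match = re.search(re.escape(variable_name) + r"\s*=\s*", script_text)
    if not match:
        return None
    try:
        # raw_decode parses one JSON value and ignores whatever follows it
        obj, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    except json.JSONDecodeError:
        return None
    return obj


# Sample script body; the "note" value contains "};" on purpose
sample_script = (
    'var productData = {"name": "Widget", "price": 9.99, '
    '"dims": {"w": 1, "note": "a};b"}}; init();'
)

product = extract_js_object(sample_script, "productData")
print(product["name"], product["price"])
```

If the site uses a JavaScript object literal that is not valid JSON, this approach returns None and a different parser would be needed.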
Checking for __NEXT_DATA__ (Next.js Sites)
Next.js sites built with the Pages Router embed the page's data as JSON in a script tag with the id __NEXT_DATA__.
import requests
import json
from bs4 import BeautifulSoup

response = requests.get("https://example-nextjs-site.com/products", timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

# The script tag is absent on sites that do not use this mechanism
next_data = soup.select_one("script#__NEXT_DATA__")
if next_data:
    data = json.loads(next_data.string)
    props = data["props"]["pageProps"]
    # Print only the beginning; the full payload can be large
    print(json.dumps(props, indent=2)[:500])
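The chained lookup `data["props"]["pageProps"]` raises a KeyError if a page is shaped differently. A small helper for walking nested JSON defensively might look like the sketch below. The name `dig` and the `products` key in the sample are hypothetical, since the actual structure of pageProps depends on the site.

```python
def dig(data, *keys, default=None):
    """Follow keys (dict keys or list indexes) through nested JSON.

    Returns default as soon as any step is missing.
    """
    current = data
    for key in keys:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif (
            isinstance(current, list)
            and isinstance(key, int)
            and -len(current) <= key < len(current)
        ):
            current = current[key]
        else:
            return default
    return current


# Stand-in for a decoded __NEXT_DATA__ payload
sample = {"props": {"pageProps": {"products": [{"name": "Widget"}]}}}

first_name = dig(sample, "props", "pageProps", "products", 0, "name")
missing = dig(sample, "props", "pageProps", "reviews", default=[])
print(first_name, missing)
```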
When You Really Need a Browser
Sometimes there is no hidden API and the content is rendered entirely client-side. In those cases:
- Use ScraperAPI with the render=true parameter to get rendered HTML without running your own browser.
- Use ScrapingAnt, which renders pages in a real browser and returns the final HTML.
- As a last resort, use Playwright or Selenium locally.
Tips
- Always check for hidden APIs first. They are faster and return cleaner data than scraping HTML.
- Look for application/json responses in the Network tab.
- Check the page source for <script type="application/ld+json"> tags. These contain structured data (schema.org) that is easy to parse.
- The __NEXT_DATA__ trick works on Next.js sites that use the Pages Router and gives you the page's data as JSON.
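The JSON-LD tip can be sketched with the standard library alone. The example below collects every application/ld+json script from an HTML string using `html.parser`. The class name `JsonLdParser` and the sample markup are illustrative, and BeautifulSoup would work equally well here.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the decoded contents of all JSON-LD script tags."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed blocks instead of failing the page


# Sample page source with one JSON-LD block and one ordinary script
sample_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Widget",
 "offers": {"price": "9.99"}}
</script>
<script>var x = 1;</script>
</head><body></body></html>
"""

parser = JsonLdParser()
parser.feed(sample_html)
items = parser.items
for item in items:
    print(item.get("@type"), item.get("name"))
```

In practice, `sample_html` would be replaced by `response.text` from a fetched page.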
Next Steps
- Learn to use ScraperAPI for JavaScript rendering at scale
- Explore ScrapingAnt for browser-based scraping without infrastructure