Scraping with Zyte API
Use Zyte API for intelligent web scraping with automatic extraction, browser rendering, and anti-bot bypass.
Zyte (formerly Scrapinghub) is the company behind Scrapy. Their Zyte API provides intelligent web scraping with automatic data extraction, browser rendering, and anti-ban technology. It is particularly powerful when combined with Scrapy projects.
Installation
pip install zyte-api requests beautifulsoup4
Basic Usage
import base64

import requests
from bs4 import BeautifulSoup

ZYTE_API_KEY = "YOUR_ZYTE_API_KEY"

# The API key is the HTTP Basic auth username; the password stays empty
response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=(ZYTE_API_KEY, ""),
    json={
        "url": "https://quotes.toscrape.com/",
        "httpResponseBody": True,
    },
)
response.raise_for_status()  # Fail early on HTTP errors
data = response.json()

# The response body is base64-encoded
html = base64.b64decode(data["httpResponseBody"]).decode("utf-8")

soup = BeautifulSoup(html, "html.parser")
for quote in soup.select("div.quote"):
    text = quote.select_one("span.text").get_text()
    author = quote.select_one("small.author").get_text()
    print(f"{author}: {text[:50]}...")
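The decoding step above assumes the page is valid UTF-8. A minimal sketch of a reusable helper follows; the name decode_http_body and the sample payload are hypothetical, and the choice to replace undecodable bytes instead of raising is an assumption, not something the API prescribes.

```python
import base64


def decode_http_body(payload: dict, encoding: str = "utf-8") -> str:
    """Decode the base64 `httpResponseBody` field of a Zyte API response dict."""
    raw = base64.b64decode(payload["httpResponseBody"])
    # errors="replace" substitutes U+FFFD for bytes that are invalid in `encoding`
    return raw.decode(encoding, errors="replace")


# Hypothetical payload shaped like the API response, for illustration only
sample = {
    "httpResponseBody": base64.b64encode(b"<html><title>Hi</title></html>").decode("ascii")
}
print(decode_http_body(sample))
```

In the example above, the inline base64.b64decode(...).decode("utf-8") line could be swapped for decode_http_body(data).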
Using the Python Client
The official client simplifies authentication and response handling.
import asyncio
import base64

from bs4 import BeautifulSoup
from zyte_api import AsyncZyteAPI


async def main():
    client = AsyncZyteAPI(api_key="YOUR_ZYTE_API_KEY")
    # client.get() sends one request and returns the response as a dict
    result = await client.get({
        "url": "https://quotes.toscrape.com/",
        "httpResponseBody": True,
    })
    html = base64.b64decode(result["httpResponseBody"]).decode()
    soup = BeautifulSoup(html, "html.parser")
    # Print the first five quotes, truncated to 60 characters
    for quote in soup.select("div.quote")[:5]:
        print(quote.select_one("span.text").get_text()[:60])


asyncio.run(main())
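Because the client is asynchronous, several pages can be requested concurrently. The sketch below is one way to cap concurrency with a semaphore; fetch_many and FakeClient are hypothetical names, and FakeClient only mimics the client.get() call used above so the code runs without network access. The official client may ship its own concurrency controls, so check its documentation before relying on this pattern.

```python
import asyncio


async def fetch_many(client, urls, max_concurrency=5):
    """Fetch several URLs through `client.get`, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url):
        async with semaphore:
            return await client.get({"url": url, "httpResponseBody": True})

    # gather() returns results in the same order as `urls`
    return await asyncio.gather(*(fetch_one(url) for url in urls))


class FakeClient:
    """Stand-in for AsyncZyteAPI so the sketch runs offline."""

    async def get(self, query):
        await asyncio.sleep(0)  # yield control, as a real network call would
        return {"url": query["url"], "httpResponseBody": ""}


urls = [f"https://quotes.toscrape.com/page/{n}/" for n in range(1, 4)]
results = asyncio.run(fetch_many(FakeClient(), urls))
print([r["url"] for r in results])
```

With a real client, pass the AsyncZyteAPI instance in place of FakeClient() and call fetch_many from inside your own coroutine.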
Browser Rendering
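For JavaScript-heavy pages, request browserHtml instead of httpResponseBody. Zyte API loads the page in a browser and returns the rendered HTML as a plain string, so no base64 decoding is needed.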
import requests
from bs4 import BeautifulSoup

ZYTE_API_KEY = "YOUR_ZYTE_API_KEY"

response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=(ZYTE_API_KEY, ""),
    json={
        "url": "https://example.com/spa-page",
        "browserHtml": True,  # Renders JavaScript
    },
)
response.raise_for_status()  # Fail early on HTTP errors
data = response.json()

# browserHtml is a plain string, not base64
rendered_html = data["browserHtml"]

soup = BeautifulSoup(rendered_html, "html.parser")
print(soup.select_one("title").get_text())
Automatic Data Extraction
Zyte API can automatically extract structured product or article data without writing selectors.
import json

import requests

ZYTE_API_KEY = "YOUR_ZYTE_API_KEY"

# Auto-extract product data
response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=(ZYTE_API_KEY, ""),
    json={
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "product": True,  # Auto-extract product fields
    },
)
response.raise_for_status()  # Fail early on HTTP errors

product = response.json().get("product", {})
print(json.dumps(product, indent=2))
# Returns structured data: name, price, description, images, etc.
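Extracted fields are not guaranteed to be present on every page, so downstream code should tolerate gaps. The following is a minimal sketch that picks a few of the fields named above; summarize_product is a hypothetical helper, and the sample dict and its value types are assumptions made for illustration, not real API output.

```python
def summarize_product(product: dict, max_description: int = 80) -> dict:
    """Pick a few fields from an extracted product dict, tolerating missing keys."""
    description = product.get("description") or ""
    return {
        "name": product.get("name"),
        "price": product.get("price"),
        # Truncate long descriptions so the summary stays compact
        "description": description[:max_description],
    }


# Hypothetical extraction result with the description missing
sample_product = {"name": "A Light in the Attic", "price": "51.77"}
print(summarize_product(sample_product))
```

Using dict.get() throughout means a page without a price or description yields None or an empty string instead of a KeyError.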
Integrating Zyte API with Scrapy
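Install the Scrapy plugin, then route Scrapy's downloads through Zyte API in your project settings.
pip install scrapy-zyte-api
# settings.py
# Recent scrapy-zyte-api releases may require additional settings;
# check the plugin's documentation for the version you install.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
    "https": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
ZYTE_API_KEY = "YOUR_ZYTE_API_KEY"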
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes_zyte"
    start_urls = ["https://quotes.toscrape.com/"]
    # Transparent mode sends every request through Zyte API automatically
    custom_settings = {
        "ZYTE_API_TRANSPARENT_MODE": True,
    }

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow pagination until there is no "next" link
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, self.parse)
Zyte API Features
| Feature | Description |
|---|---|
| httpResponseBody | Raw HTTP response |
| browserHtml | JavaScript-rendered HTML |
| product | Auto-extract product data |
| article | Auto-extract article data |
| screenshot | Page screenshots |
| Geo-targeting | Route through specific countries |
| Session management | Sticky sessions for multi-page flows |
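As an illustration of one feature from the table, the sketch below saves a screenshot to disk. It assumes the screenshot field comes back base64-encoded, like httpResponseBody; verify that against the API reference. save_screenshot, the fake response, and the output file name are all hypothetical.

```python
import base64
import tempfile
from pathlib import Path


def save_screenshot(data: dict, path: Path) -> int:
    """Decode the `screenshot` field of a response dict and write it to `path`."""
    image_bytes = base64.b64decode(data["screenshot"])
    # write_bytes() returns the number of bytes written
    return path.write_bytes(image_bytes)


# Request body that asks for a screenshot (field name taken from the table above)
request_body = {"url": "https://quotes.toscrape.com/", "screenshot": True}

# Hypothetical response with fake image bytes, so the sketch runs offline
fake_response = {"screenshot": base64.b64encode(b"fake-image-bytes").decode("ascii")}
target = Path(tempfile.gettempdir()) / "zyte_screenshot_demo.bin"
written = save_screenshot(fake_response, target)
print(f"wrote {written} bytes to {target}")
```

With a real response, send request_body to the extract endpoint as in the earlier examples and pass response.json() to save_screenshot.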
Zyte vs ScraperAPI vs ScrapingAnt
| Feature | Zyte | ScraperAPI | ScrapingAnt |
|---|---|---|---|
| Scrapy integration | Native | Via middleware | Via middleware |
| Auto-extraction | Yes | No | No |
| Browser rendering | Yes | Yes | Yes (default) |
| Free tier | 1,000 URLs | 5,000 calls | 10,000 credits |
Tips
- Use auto-extraction (product: True, article: True) when possible; it saves you from writing and maintaining selectors.
- browserHtml costs more than httpResponseBody, so only use it when the page requires JavaScript rendering.
- For simpler use cases, ScraperAPI and ScrapingAnt offer faster setup with straightforward proxy APIs.
- Zyte's Scrapy integration is the most seamless if you are already using the Scrapy framework.
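One way to act on the cost tip is to try the plain HTTP body first and fall back to browser rendering only when the expected content is missing. This is a sketch under stated assumptions: fetch_html_cheapest_first and the two fake fetchers are hypothetical, and a simple substring check stands in for whatever completeness test suits your page.

```python
def fetch_html_cheapest_first(url, fetch_raw, fetch_rendered, marker):
    """Return (html, mode), using browser rendering only if `marker` is absent from the raw body."""
    html = fetch_raw(url)
    if marker in html:
        return html, "httpResponseBody"
    # The raw body lacks the expected content, so pay for a rendered fetch
    return fetch_rendered(url), "browserHtml"


# Fake fetchers standing in for real Zyte API calls
def fake_raw(url):
    return "<html><body><div id='app'></div></body></html>"


def fake_rendered(url):
    return "<html><body><div id='app'><div class='quote'>Hi</div></div></body></html>"


html, mode = fetch_html_cheapest_first(
    "https://example.com/spa-page", fake_raw, fake_rendered, "class='quote'"
)
print(mode)
```

Note that a fallback means paying for both requests on that URL, so this pattern helps most when only a minority of pages need JavaScript.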
Next Steps
- Learn web scraping best practices and patterns to build reliable, maintainable scrapers