
Scraping with Zyte API

Use Zyte API for web scraping with automatic extraction, browser rendering, and anti-bot bypass.

Python Scraping · #26 · intermediate · 3 min read

Zyte (formerly Scrapinghub) is the company behind Scrapy. Their Zyte API provides intelligent web scraping with automatic data extraction, browser rendering, and anti-ban technology. It is particularly powerful when combined with Scrapy projects.

Installation

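# zyte-api is the official client; requests and beautifulsoup4 are used by the examples below
pip install zyte-api requests beautifulsoup4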

Basic Usage

import base64

import requests
from bs4 import BeautifulSoup

ZYTE_API_KEY = "YOUR_ZYTE_API_KEY"

# The API key is sent as the HTTP basic-auth username; the password stays empty
response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=(ZYTE_API_KEY, ""),
    json={
        "url": "https://quotes.toscrape.com/",
        "httpResponseBody": True,
    },
)
response.raise_for_status()  # Fail early on HTTP errors instead of parsing an error payload

data = response.json()

# The response body is base64-encoded
html = base64.b64decode(data["httpResponseBody"]).decode("utf-8")

soup = BeautifulSoup(html, "html.parser")

for quote in soup.select("div.quote"):
    text = quote.select_one("span.text").get_text()
    author = quote.select_one("small.author").get_text()
    print(f"{author}: {text[:50]}...")
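
The base64 step is easy to get wrong, so it can help to isolate it in a small helper. The sketch below is one possible way to do that, not part of the Zyte client; the name `decode_body` and the choice to replace undecodable bytes are assumptions made for illustration. It runs offline against a hand-built response dict.

```python
import base64


def decode_body(data: dict, encoding: str = "utf-8") -> str:
    """Decode the base64 `httpResponseBody` field of a response dict (hypothetical helper)."""
    if "httpResponseBody" not in data:
        raise KeyError("response has no httpResponseBody field")
    raw = base64.b64decode(data["httpResponseBody"])
    # errors="replace" keeps going on pages that are not valid in the given encoding
    return raw.decode(encoding, errors="replace")


# Hand-built sample standing in for response.json(); no network call is made
sample = {
    "httpResponseBody": base64.b64encode("<p>café</p>".encode("utf-8")).decode("ascii")
}
print(decode_body(sample))
```

With a real response, `decode_body(response.json())` would take the place of the inline `base64.b64decode(...)` call.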

Using the Python Client

The official client simplifies authentication and response handling.

import asyncio
import base64

from bs4 import BeautifulSoup
from zyte_api import AsyncZyteAPI


async def main():
    client = AsyncZyteAPI(api_key="YOUR_ZYTE_API_KEY")

    # client.get() sends one request and returns the parsed JSON response as a dict
    result = await client.get({
        "url": "https://quotes.toscrape.com/",
        "httpResponseBody": True,
    })

    html = base64.b64decode(result["httpResponseBody"]).decode()
    soup = BeautifulSoup(html, "html.parser")

    # Print the first five quotes, truncated to 60 characters
    for quote in soup.select("div.quote")[:5]:
        print(quote.select_one("span.text").get_text()[:60])


asyncio.run(main())
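
Because the client is asynchronous, several URLs can be requested concurrently. The following is a minimal sketch of that pattern using only `asyncio`; `fetch_all` and `fake_get` are hypothetical names, and `fake_get` stands in for `client.get` so the example runs without an API key. The real client may offer its own concurrency controls, which this sketch does not rely on.

```python
import asyncio


async def fetch_all(get, urls, max_concurrency=5):
    """Call `get` once per URL, at most `max_concurrency` at a time.

    `get` is any coroutine function that accepts a query dict, such as the
    `client.get` method used above. Results come back in input order.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url):
        async with semaphore:
            return await get({"url": url, "httpResponseBody": True})

    return await asyncio.gather(*(fetch_one(url) for url in urls))


async def fake_get(query):
    # Offline stand-in for client.get: echoes the requested URL back
    await asyncio.sleep(0)
    return {"url": query["url"]}


results = asyncio.run(
    fetch_all(fake_get, ["https://a.example/", "https://b.example/"], max_concurrency=2)
)
print([r["url"] for r in results])
```

To use it against the API, pass `client.get` instead of `fake_get` from inside an `async` function.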

Browser Rendering

For JavaScript-heavy pages, enable browser rendering.

import requests
from bs4 import BeautifulSoup

ZYTE_API_KEY = "YOUR_ZYTE_API_KEY"

response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=(ZYTE_API_KEY, ""),
    json={
        "url": "https://example.com/spa-page",
        "browserHtml": True,  # Renders JavaScript
    },
)
response.raise_for_status()

data = response.json()
# Unlike httpResponseBody, browserHtml is returned as plain text, so no base64 decoding is needed
rendered_html = data["browserHtml"]

soup = BeautifulSoup(rendered_html, "html.parser")
print(soup.select_one("title").get_text())
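
The two modes return HTML in different shapes: `httpResponseBody` is base64-encoded, while `browserHtml` is already a string. A small pair of helpers can hide that difference from the rest of a scraper. This is a sketch under that assumption; `build_query` and `html_from_response` are hypothetical names, and the sample dicts are hand-built rather than real API output.

```python
import base64


def build_query(url: str, render_js: bool = False) -> dict:
    """Build the request payload, asking for browser rendering only when needed."""
    field = "browserHtml" if render_js else "httpResponseBody"
    return {"url": url, field: True}


def html_from_response(data: dict) -> str:
    """Return HTML text from either response shape."""
    if "browserHtml" in data:
        return data["browserHtml"]  # already text
    raw = base64.b64decode(data["httpResponseBody"])  # raw bytes of the HTTP body
    return raw.decode("utf-8", errors="replace")


# Offline demonstration with hand-built response dicts
rendered = {"browserHtml": "<title>Rendered</title>"}
plain = {"httpResponseBody": base64.b64encode(b"<title>Plain</title>").decode("ascii")}
print(build_query("https://example.com/spa-page", render_js=True))
print(html_from_response(rendered), html_from_response(plain))
```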

Automatic Data Extraction

Zyte API can automatically extract structured product or article data without writing selectors.

import requests
import json

ZYTE_API_KEY = "YOUR_ZYTE_API_KEY"

# Auto-extract product data
response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=(ZYTE_API_KEY, ""),
    json={
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "product": True,  # Auto-extract product fields
    },
)

product = response.json().get("product", {})
print(json.dumps(product, indent=2))
# Returns structured data: name, price, description, images, etc.
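
Auto-extracted fields are not guaranteed to be present on every page, so downstream code should tolerate gaps. The sketch below shows one way to reduce the `product` dict to a flat summary. It assumes only the field names mentioned in the comment above (name, price, description, images); `summarize_product` is a hypothetical helper, and the sample values are made up for illustration.

```python
def summarize_product(product: dict) -> dict:
    """Pick a few fields from an auto-extracted product, tolerating missing ones."""
    return {
        "name": product.get("name"),
        "price": product.get("price"),
        # Truncate long descriptions; treat a missing one as empty
        "description": (product.get("description") or "")[:80],
        "image_count": len(product.get("images") or []),
    }


# Made-up sample standing in for response.json().get("product", {})
sample_product = {
    "name": "Example Book",
    "price": "9.99",
    "images": [{"url": "https://example.com/cover.jpg"}],
}
print(summarize_product(sample_product))
print(summarize_product({}))  # an empty product still yields a complete summary
```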

Integrating Zyte API with Scrapy

pip install scrapy-zyte-api

# settings.py
DOWNLOAD_HANDLERS = {
    "http": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
    "https": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
ZYTE_API_KEY = "YOUR_ZYTE_API_KEY"

# Spider
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes_zyte"
    start_urls = ["https://quotes.toscrape.com/"]

    custom_settings = {
        "ZYTE_API_TRANSPARENT_MODE": True,
    }

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }

        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, self.parse)

Zyte API Features

| Feature | Description |
| --- | --- |
| httpResponseBody | Raw HTTP response |
| browserHtml | JavaScript-rendered HTML |
| product | Auto-extract product data |
| article | Auto-extract article data |
| screenshot | Page screenshots |
| Geo-targeting | Route through specific countries |
| Session management | Sticky sessions for multi-page flows |
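
Several of these features can be combined in one request. The sketch below assumes that `screenshot` comes back base64-encoded like `httpResponseBody`, that the default image format is JPEG, and that geo-targeting is requested through a `geolocation` field holding a two-letter country code. All three are assumptions to check against the API reference, and `save_screenshot` is a hypothetical helper. The demonstration uses fake bytes and a temporary directory, so no request is sent.

```python
import base64
import tempfile
from pathlib import Path

# Example payload combining features (field names other than `url`,
# `browserHtml`, and `screenshot` are assumptions, see the text above)
query = {
    "url": "https://quotes.toscrape.com/",
    "browserHtml": True,
    "screenshot": True,
    "geolocation": "DE",  # assumed field name for geo-targeting
}


def save_screenshot(data: dict, path) -> int:
    """Decode the `screenshot` field and write it to `path`; returns bytes written."""
    image_bytes = base64.b64decode(data["screenshot"])
    Path(path).write_bytes(image_bytes)
    return len(image_bytes)


# Offline demonstration: fake bytes stand in for a real image
fake_response = {"screenshot": base64.b64encode(b"not-a-real-image").decode("ascii")}
with tempfile.TemporaryDirectory() as tmp:
    size = save_screenshot(fake_response, Path(tmp) / "page.jpg")
print(size)
```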

Zyte vs ScraperAPI vs ScrapingAnt

| Feature | Zyte | ScraperAPI | ScrapingAnt |
| --- | --- | --- | --- |
| Scrapy integration | Native | Via middleware | Via middleware |
| Auto-extraction | Yes | No | No |
| Browser rendering | Yes | Yes | Yes (default) |
| Free tier | 1,000 URLs | 5,000 calls | 10,000 credits |

Tips

  • Use auto-extraction ("product": True or "article": True) when the page type fits; it saves you from writing and maintaining selectors.
  • browserHtml costs more than httpResponseBody, so only use it when the page requires JavaScript rendering.
  • For simpler use cases, ScraperAPI and ScrapingAnt offer faster setup with straightforward proxy APIs.
  • Zyte's Scrapy integration is the most seamless if you are already using the Scrapy framework.
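
The cost tip above suggests a simple strategy: request the cheaper `httpResponseBody` first and fall back to `browserHtml` only when the plain HTML lacks the content you need. Here is a minimal sketch of that idea. `fetch_html`, `send`, and `looks_complete` are hypothetical names; `send` is any function that takes a query dict and returns the parsed response, and the fake version below keeps the example offline.

```python
import base64


def fetch_html(send, url, looks_complete):
    """Try the plain HTTP body first; use browser rendering only as a fallback.

    send(query) -> response dict; looks_complete(html) -> bool.
    Returns (html, mode) where mode names the field that produced the HTML.
    """
    data = send({"url": url, "httpResponseBody": True})
    html = base64.b64decode(data["httpResponseBody"]).decode("utf-8", errors="replace")
    if looks_complete(html):
        return html, "httpResponseBody"
    # Plain HTML was missing the expected content, so pay for rendering
    data = send({"url": url, "browserHtml": True})
    return data["browserHtml"], "browserHtml"


calls = []


def fake_send(query):
    # Offline stand-in: the plain body is an empty app shell, the rendered one has content
    calls.append(query)
    if query.get("browserHtml"):
        return {"browserHtml": "<div class='quote'>hi</div>"}
    return {"httpResponseBody": base64.b64encode(b"<div id='app'></div>").decode("ascii")}


html, mode = fetch_html(fake_send, "https://example.com/", lambda h: "quote" in h)
print(mode, len(calls))
```

In real use, `send` could wrap the `requests.post` call from Basic Usage, and `looks_complete` could check for a selector your parser depends on.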

Next Steps

  • Learn web scraping best practices and patterns to build reliable, maintainable scrapers