Scraping with Zyte API - Python Scraping

Use Zyte API for intelligent web scraping with automatic extraction, browser rendering, and anti-bot bypass.

Zyte (formerly Scrapinghub) is the company behind Scrapy. Zyte API is its hosted scraping service: one HTTP endpoint that can return raw responses, browser-rendered HTML, or automatically extracted data, with ban handling managed on Zyte's side. It pairs especially well with existing Scrapy projects.

Installation

pip install zyte-api requests beautifulsoup4

Basic Usage

import base64

import requests
from bs4 import BeautifulSoup

ZYTE_API_KEY = "YOUR_ZYTE_API_KEY"

# The API key is the Basic-auth username; the password stays empty
response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=(ZYTE_API_KEY, ""),
    json={
        "url": "https://quotes.toscrape.com/",
        "httpResponseBody": True,
    },
)
response.raise_for_status()  # Fail early on HTTP errors

data = response.json()

# The response body is base64-encoded
html = base64.b64decode(data["httpResponseBody"]).decode("utf-8")

soup = BeautifulSoup(html, "html.parser")

for quote in soup.select("div.quote"):
    text = quote.select_one("span.text").get_text()
    author = quote.select_one("small.author").get_text()
    print(f"{author}: {text[:50]}...")
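Because `httpResponseBody` arrives base64-encoded, the decoding step tends to get repeated in every script. A minimal sketch of a helper follows. `decode_http_body` is a hypothetical name, not part of any Zyte library, and defaulting to UTF-8 while replacing undecodable bytes is an assumption that will not suit every site.

```python
import base64


def decode_http_body(data: dict, encoding: str = "utf-8") -> str:
    """Decode the base64 `httpResponseBody` field of a response dict.

    Hypothetical helper; raises KeyError if the field is missing.
    """
    raw = base64.b64decode(data["httpResponseBody"])
    # errors="replace" substitutes bytes that are invalid in `encoding`
    return raw.decode(encoding, errors="replace")


# Offline check with a hand-built payload shaped like an API response
sample = {
    "httpResponseBody": base64.b64encode(b"<html><title>Hi</title></html>").decode("ascii")
}
print(decode_http_body(sample))
```

With a real response, `decode_http_body(response.json())` would stand in for the inline `base64.b64decode(...)` call.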

Using the Python Client

The official client simplifies authentication and response handling.

import asyncio
import base64

from bs4 import BeautifulSoup
from zyte_api import AsyncZyteAPI


async def main():
    client = AsyncZyteAPI(api_key="YOUR_ZYTE_API_KEY")

    result = await client.get({
        "url": "https://quotes.toscrape.com/",
        "httpResponseBody": True,
    })

    # The response body is base64-encoded
    html = base64.b64decode(result["httpResponseBody"]).decode()
    soup = BeautifulSoup(html, "html.parser")

    # Print the first five quotes, truncated to 60 characters
    for quote in soup.select("div.quote")[:5]:
        print(quote.select_one("span.text").get_text()[:60])


asyncio.run(main())
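An async client is most useful when several pages are fetched at once. The sketch below shows one way to do that with `asyncio.gather` and a semaphore. It assumes only that the client exposes an awaitable `get(query)` method, as in the example above; `fetch_all`, `max_concurrency`, and `EchoClient` are hypothetical names, and `EchoClient` is a stand-in so the code runs without network access. The official client may offer its own concurrency controls, so check its documentation before relying on this pattern.

```python
import asyncio


async def fetch_all(client, urls, max_concurrency=5):
    """Fetch several URLs concurrently with any client exposing `async get(query)`."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url):
        # The semaphore caps how many requests are in flight at once
        async with semaphore:
            return await client.get({"url": url, "httpResponseBody": True})

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(fetch_one(u) for u in urls))


class EchoClient:
    """Stand-in for a real client; echoes the requested URL back."""

    async def get(self, query):
        await asyncio.sleep(0)
        return {"url": query["url"]}


results = asyncio.run(
    fetch_all(EchoClient(), ["https://a.example/", "https://b.example/"])
)
print([r["url"] for r in results])
```

Passing an `AsyncZyteAPI` instance in place of `EchoClient()` should work the same way, provided its `get` method behaves as shown earlier.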

Browser Rendering

For JavaScript-heavy pages, enable browser rendering.

import requests
from bs4 import BeautifulSoup

ZYTE_API_KEY = "YOUR_ZYTE_API_KEY"

response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=(ZYTE_API_KEY, ""),
    json={
        "url": "https://example.com/spa-page",
        "browserHtml": True,  # Renders JavaScript
    },
)
response.raise_for_status()  # Fail early on HTTP errors

data = response.json()

# Unlike httpResponseBody, browserHtml is a plain string, so no base64 decoding
rendered_html = data["browserHtml"]

soup = BeautifulSoup(rendered_html, "html.parser")
print(soup.select_one("title").get_text())

Automatic Data Extraction

Zyte API can automatically extract structured product or article data without writing selectors.

import requests
import json

ZYTE_API_KEY = "YOUR_ZYTE_API_KEY"

# Auto-extract product data
response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=(ZYTE_API_KEY, ""),
    json={
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "product": True,  # Auto-extract product fields
    },
)

product = response.json().get("product", {})
print(json.dumps(product, indent=2))
# Returns structured data: name, price, description, images, etc.
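Extracted records will not always contain every field, so downstream code should tolerate gaps. The sketch below is one way to read a product dict defensively. It assumes the keys `name`, `price`, and `description` mentioned above; `summarize_product` is a hypothetical helper and the sample dict is hand-written, not real API output.

```python
def summarize_product(product: dict) -> str:
    """Build a one-line summary, tolerating missing or empty fields."""
    name = product.get("name") or "(no name)"
    price = product.get("price") or "(no price)"
    description = (product.get("description") or "").strip()

    parts = [name, str(price)]
    if description:
        # Truncate long descriptions to keep the summary on one line
        short = description[:40] + ("..." if len(description) > 40 else "")
        parts.append(short)
    return " | ".join(parts)


# Hand-written sample data shaped like an extracted product
sample_product = {"name": "Example Book", "price": "9.99"}
print(summarize_product(sample_product))
print(summarize_product({}))
```

Using `.get()` with fallbacks keeps the pipeline running when a page lacks a field, at the cost of hiding extraction gaps; logging the missing keys is a reasonable addition.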

Integrating Zyte API with Scrapy

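pip install scrapy-zyte-api

# settings.py
# Route all HTTP and HTTPS downloads through Zyte API
DOWNLOAD_HANDLERS = {
    "http": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
    "https": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
}
# The download handler needs Twisted's asyncio reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
ZYTE_API_KEY = "YOUR_ZYTE_API_KEY"
# Newer plugin releases may require or recommend further settings;
# check the scrapy-zyte-api documentation for your version.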

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes_zyte"
    start_urls = ["https://quotes.toscrape.com/"]

    custom_settings = {
        "ZYTE_API_TRANSPARENT_MODE": True,
    }

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }

        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, self.parse)

Zyte API Features

Feature	Description
`httpResponseBody`	Raw HTTP response
`browserHtml`	JavaScript-rendered HTML
`product`	Auto-extract product data
`article`	Auto-extract article data
`screenshot`	Page screenshots
Geo-targeting	Route through specific countries
Session management	Sticky sessions for multi-page flows
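The features above are switched on through keys in the request JSON. The sketch below assembles such a payload from a few options. The keys `httpResponseBody`, `browserHtml`, and `screenshot` come from the table; the `geolocation` key and its country-code value are assumptions about how geo-targeting is requested, so verify them against the API reference. `build_query` is a hypothetical helper.

```python
from typing import Optional


def build_query(
    url: str,
    render: bool = False,
    screenshot: bool = False,
    country: Optional[str] = None,
) -> dict:
    """Assemble a request payload for the extract endpoint (illustrative only)."""
    query: dict = {"url": url}
    if render:
        query["browserHtml"] = True
    elif not screenshot:
        # Default to the cheaper raw response when no browser feature is needed
        query["httpResponseBody"] = True
    if screenshot:
        query["screenshot"] = True
    if country:
        query["geolocation"] = country  # assumed field name, e.g. "US"
    return query


print(build_query("https://quotes.toscrape.com/"))
print(build_query("https://quotes.toscrape.com/", render=True, country="US"))
```

The resulting dict can be passed as the `json=` argument in the `requests.post` calls shown earlier.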

Zyte vs ScraperAPI vs ScrapingAnt

Feature	Zyte	ScraperAPI	ScrapingAnt
Scrapy integration	Native	Via middleware	Via middleware
Auto-extraction	Yes	No	No
Browser rendering	Yes	Yes	Yes (default)
Free tier	1,000 URLs	5,000 calls	10,000 credits

Tips

Use auto-extraction ("product": True or "article": True) when possible; it saves you from writing and maintaining selectors.
browserHtml costs more than httpResponseBody, so only use it when the page requires JavaScript rendering.
For simpler use cases, ScraperAPI and ScrapingAnt offer faster setup with straightforward proxy APIs.
Zyte's Scrapy integration is the most seamless if you are already using the Scrapy framework.
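The cost tip above suggests a fallback pattern: request the raw response first and pay for browser rendering only when the expected content is absent. The sketch below illustrates the idea. `fetch_html` and `fake_fetch` are hypothetical names; `fetch` is any callable that maps a request dict to a response dict (for example a thin wrapper around `requests.post`), and `fake_fetch` simulates a JavaScript-only page so the code runs offline.

```python
import base64
from typing import Callable, Tuple


def fetch_html(fetch: Callable[[dict], dict], url: str, marker: str) -> Tuple[str, str]:
    """Return (html, mode), trying the raw response before browser rendering."""
    data = fetch({"url": url, "httpResponseBody": True})
    html = base64.b64decode(data["httpResponseBody"]).decode("utf-8", errors="replace")
    if marker in html:
        return html, "httpResponseBody"

    # Marker missing from the raw HTML: assume the page needs JavaScript
    data = fetch({"url": url, "browserHtml": True})
    return data["browserHtml"], "browserHtml"


def fake_fetch(query: dict) -> dict:
    """Simulated API: raw HTML is an empty shell, rendered HTML has content."""
    if query.get("browserHtml"):
        return {"browserHtml": '<div class="quote">rendered</div>'}
    body = b"<div id='app'></div>"
    return {"httpResponseBody": base64.b64encode(body).decode("ascii")}


html, mode = fetch_html(fake_fetch, "https://example.com/spa-page", 'class="quote"')
print(mode)
```

A plain substring check is a crude signal; a CSS selector test with BeautifulSoup would be more robust for real pages.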

Next Steps

Learn web scraping best practices and patterns to build reliable, maintainable scrapers.