How to Scrape Trustpilot Reviews

A complete guide to scraping Trustpilot reviews and ratings using Python. Extract business reviews for sentiment analysis and reputation monitoring.

Trustpilot hosts millions of business reviews, making it a key source for reputation monitoring and competitive analysis. Here is how to scrape it.

What Data to Extract

Star ratings: overall and per-review scores (1-5)
Review text: full written reviews
Reviewer info: name, location, review count
Review date: when the review was posted
Company response: whether the business replied
Verification status: whether the review is verified
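
One way to keep these fields together is a small record type. The sketch below is only an assumption about how a review could be modeled in your own code; the class name Review and every field name are hypothetical and do not come from Trustpilot.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Review:
    """One scraped review. Field names are illustrative, not Trustpilot's."""

    rating: int                                  # per-review star score, 1-5
    text: str                                    # full written review
    reviewer_name: str
    reviewer_location: Optional[str] = None
    reviewer_review_count: Optional[int] = None
    posted_at: Optional[str] = None              # date the review was posted
    has_company_reply: bool = False
    is_verified: bool = False

    def __post_init__(self) -> None:
        # Reject scores outside the 1-5 star range as early as possible
        if not 1 <= self.rating <= 5:
            raise ValueError(f"rating must be 1-5, got {self.rating}")


example = Review(rating=4, text="Fast delivery.", reviewer_name="A. Customer")
print(asdict(example))
```

Converting each record with asdict makes it straightforward to write the results to CSV or JSON later.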

Basic Scraping Approach

Trustpilot pages are relatively well structured, and much of the review data is present in the initial HTML, so a plain HTTP request plus an HTML parser is enough to get started.

import requests
from bs4 import BeautifulSoup

url = "https://www.trustpilot.com/review/example.com"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

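# Fetch the page; fail fast on timeouts and on HTTP errors such as 403 or 404
resp = requests.get(url, headers=headers, timeout=30)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# Each review card is an <article>; these data attributes may change without notice
reviews = soup.select("article[data-service-review-card-paper]")
for review in reviews:
    rating = review.select_one("div[data-service-review-rating]")
    text = review.select_one("p[data-service-review-text-typography]")
    # Fall back to "N/A" when a card lacks a rating or a text body
    print(rating.get("data-service-review-rating") if rating else "N/A")
    print(text.get_text(strip=True) if text else "N/A")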

Scaling with ScraperAPI

When scraping multiple businesses or thousands of reviews, route the requests through ScraperAPI to reduce the chance of being blocked.

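API_KEY = "YOUR_SCRAPERAPI_KEY"

for page in range(1, 11):
    url = f"https://www.trustpilot.com/review/example.com?page={page}"
    # Pass the target URL via params so requests percent-encodes it;
    # pasting it raw into the query string can corrupt its own parameters
    resp = requests.get(
        "http://api.scraperapi.com",
        params={"api_key": API_KEY, "url": url},
        timeout=60,
    )
    resp.raise_for_status()
    # Parse reviews from each page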
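
Because the target URL travels as a query parameter of the API request, it has to be percent-encoded, otherwise any "&" in it would be read as a separator in the outer URL. The sketch below shows the encoding with the standard library only. The helper name build_proxy_url is hypothetical, and the extra stars=5 parameter on the target URL is an assumption added purely to demonstrate the problem.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SCRAPERAPI_ENDPOINT = "http://api.scraperapi.com"


def build_proxy_url(api_key: str, target_url: str) -> str:
    """Return the API URL with the target URL percent-encoded as a parameter."""
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{SCRAPERAPI_ENDPOINT}?{query}"


# Hypothetical target with two query parameters of its own
target = "https://www.trustpilot.com/review/example.com?page=2&stars=5"
proxied = build_proxy_url("YOUR_SCRAPERAPI_KEY", target)
print(proxied)

# Decoding the outer query string gives back the exact target URL
decoded = parse_qs(urlsplit(proxied).query)
print(decoded["url"][0] == target)
```

Passing a params dictionary to requests.get performs the same encoding automatically.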

Handling Pagination

Trustpilot uses simple page-number pagination, with 20 reviews per page. To walk through a company's reviews, increment the page query parameter (?page=2, ?page=3, and so on) until no further reviews are returned.
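
The loop can be sketched as a generator that stops once a page comes back short. This is a minimal sketch under stated assumptions: iter_reviews, page_url and fake_fetch are hypothetical names, and fake_fetch stands in for whatever function downloads and parses one page, so the example runs without network access.

```python
from typing import Callable, Iterator, List

PAGE_SIZE = 20  # reviews per page, as stated in the text


def page_url(domain: str, page: int) -> str:
    """Build the review URL for a given page number."""
    return f"https://www.trustpilot.com/review/{domain}?page={page}"


def iter_reviews(
    domain: str,
    fetch_page: Callable[[str], List[dict]],
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield reviews page by page until a short or empty page is seen."""
    for page in range(1, max_pages + 1):
        batch = fetch_page(page_url(domain, page))
        yield from batch
        if len(batch) < PAGE_SIZE:
            break  # fewer than a full page, so this was the last one


def fake_fetch(url: str) -> List[dict]:
    """Stand-in fetcher: pretends the company has 45 reviews in total."""
    page = int(url.rsplit("=", 1)[1])
    start = (page - 1) * PAGE_SIZE
    return [{"id": i} for i in range(start, min(start + PAGE_SIZE, 45))]


all_reviews = list(iter_reviews("example.com", fake_fetch))
print(len(all_reviews))
```

The max_pages cap is a safety net so that a parsing bug cannot turn the loop into an endless crawl.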

JSON-LD Data

Trustpilot also embeds structured data as JSON-LD inside script tags of type application/ld+json, which is easier to parse than the surrounding HTML.

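import json

# A page can hold several JSON-LD blocks, so check each one
for script in soup.find_all("script", type="application/ld+json"):
    if not script.string:
        continue
    data = json.loads(script.string)
    # A block may be one object, a list of objects, or an "@graph" wrapper
    items = data.get("@graph", [data]) if isinstance(data, dict) else data
    for item in items:
        if isinstance(item, dict) and "aggregateRating" in item:
            print(item["aggregateRating"])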
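
To try the JSON-LD approach offline, the sketch below does the same extraction with the standard library only and runs it against a small sample document. The sample HTML is invented for illustration and is an assumption about the general shape of schema.org markup, not a copy of what Trustpilot serves; JsonLdCollector and find_aggregate_ratings are hypothetical names.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every application/ld+json script element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_aggregate_ratings(html: str) -> list:
    """Return every aggregateRating object found in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    found = []
    for raw in collector.blocks:
        data = json.loads(raw)
        # A block may be one object, a list of objects, or an "@graph" wrapper
        items = data.get("@graph", [data]) if isinstance(data, dict) else data
        for item in items:
            if isinstance(item, dict) and "aggregateRating" in item:
                found.append(item["aggregateRating"])
    return found


# Invented sample page, for illustration only
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [
  {"@type": "BreadcrumbList"},
  {"@type": "LocalBusiness", "name": "Example",
   "aggregateRating": {"@type": "AggregateRating",
                       "ratingValue": "4.3", "reviewCount": "128"}}
]}
</script>
</head><body></body></html>
"""

ratings = find_aggregate_ratings(SAMPLE_HTML)
print(ratings[0]["ratingValue"], ratings[0]["reviewCount"])
```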

Best Practices

Use JSON-LD first: it contains clean, structured review data
Paginate through all reviews: do not just scrape the first page
Track review dates: useful for sentiment trends over time
Use ScraperAPI for volume: residential proxies help avoid detection
Respect rate limits: add 2-3 second delays between requests
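
The delay between requests can be sketched as a small wrapper around whatever fetch function you use. This is an illustration under assumptions, not a prescribed implementation: polite_fetch is a hypothetical helper, and the sleep function is injectable so the example below records the delays instead of actually waiting.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Tuple


def polite_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], object],
    min_delay: float = 2.0,
    max_delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, object]]:
    """Fetch URLs one at a time, pausing 2-3 seconds between requests."""
    for index, url in enumerate(urls):
        if index > 0:
            # Randomize the pause so requests are not evenly spaced
            sleep(random.uniform(min_delay, max_delay))
        yield url, fetch(url)


# Demo: a dummy fetcher, with delays recorded rather than slept
recorded = []
urls = [
    f"https://www.trustpilot.com/review/example.com?page={p}"
    for p in range(1, 4)
]
results = list(polite_fetch(urls, fetch=lambda u: len(u), sleep=recorded.append))
print(len(results), len(recorded))
```

In real use, leave sleep at its default and pass a fetch function that performs the HTTP request and returns the parsed reviews.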