Guide
How to Scrape Trustpilot Reviews
A complete guide to scraping Trustpilot reviews and ratings using Python. Extract business reviews for sentiment analysis and reputation monitoring.
Trustpilot hosts millions of business reviews, making it a key source for reputation monitoring and competitive analysis. Here is how to scrape it.
What Data to Extract
- Star ratings, Overall and per-review scores (1-5)
- Review text, Full written reviews
- Reviewer info, Name, location, review count
- Review date, When the review was posted
- Company response, Whether the business replied
- Verification status, Whether the review is verified
Basic Scraping Approach
Trustpilot pages are relatively well-structured and much of the data is in the initial HTML.
import requests
from bs4 import BeautifulSoup
url = "https://www.trustpilot.com/review/example.com"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
resp = requests.get(url, headers=headers)
soup = BeautifulSoup(resp.text, "html.parser")
reviews = soup.select("article[data-service-review-card-paper]")
for review in reviews:
rating = review.select_one("div[data-service-review-rating]")
text = review.select_one("p[data-service-review-text-typography]")
print(rating.get("data-service-review-rating") if rating else "N/A")
print(text.text.strip() if text else "N/A")
Scaling with ScraperAPI
For scraping multiple businesses or thousands of reviews, use ScraperAPI to avoid blocks.
API_KEY = "YOUR_SCRAPERAPI_KEY"
for page in range(1, 11):
url = f"https://www.trustpilot.com/review/example.com?page={page}"
resp = requests.get(
f"http://api.scraperapi.com?api_key={API_KEY}&url={url}"
)
# Parse reviews from each page
Handling Pagination
Trustpilot uses simple page-number pagination. Each page shows 20 reviews. Just increment the page query parameter.
JSON-LD Data
Trustpilot embeds structured data in JSON-LD format, which is easier to parse than HTML.
import json
script = soup.find("script", type="application/ld+json")
if script:
data = json.loads(script.string)
print(data.get("aggregateRating"))
Best Practices
- Use JSON-LD first, It contains clean, structured review data
- Paginate through all reviews, Do not just scrape the first page
- Track review dates, Useful for sentiment trends over time
- Use ScrapingAnt for volume, Residential proxies help avoid detection
- Respect rate limits, Add 2-3 second delays between requests