How to Scrape Glassdoor Reviews and Salaries
Learn how to scrape Glassdoor company reviews, salary data, and interview questions using Python and scraping APIs.
Glassdoor is a valuable source of company reviews, salary benchmarks, and interview data. Scraping it effectively requires dealing with login walls and dynamic content.
Useful Data on Glassdoor
- Company reviews: employee ratings and written reviews
- Salary reports: pay ranges by role and location
- Interview questions: common questions and difficulty ratings
- Company info: size, revenue, industry, headquarters
- Job listings: open positions and requirements
The Challenge
Glassdoor is one of the harder sites to scrape:
- Login required for most content
- Heavy JavaScript rendering: data loads dynamically
- Aggressive bot detection: CAPTCHAs and IP blocking
- Rate limiting: strict request limits
Recommended Approach: ScraperAPI
Given these challenges, a hosted scraping API such as ScraperAPI is usually the most practical approach: it handles JavaScript rendering, proxy rotation, and anti-bot bypass for you.
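import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_SCRAPERAPI_KEY"
company_url = "https://www.glassdoor.com/Reviews/Google-Reviews-E9079.htm"

# Pass the target URL via params so requests URL-encodes it correctly
resp = requests.get(
    "https://api.scraperapi.com/",
    params={"api_key": API_KEY, "url": company_url, "render": "true"},
    timeout=70,  # rendered requests can be slow
)
resp.raise_for_status()  # fail early on HTTP errors
soup = BeautifulSoup(resp.text, "html.parser")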
Extracting Review Data
Each review typically contains an overall rating, a title, and separate pros and cons text:
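# Example structure (actual selectors may vary)
reviews = soup.select("[data-test='reviewsList'] li")
for review in reviews:
    # select_one returns None when a selector does not match
    rating = review.select_one(".ratingNumber")
    title = review.select_one(".reviewLink")
    pros = review.select_one("[data-test='pros']")
    cons = review.select_one("[data-test='cons']")
    fields = {"rating": rating, "title": title, "pros": pros, "cons": cons}
    print({k: v.get_text(strip=True) if v else None for k, v in fields.items()})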
Salary Data Extraction
Glassdoor salary pages show pay ranges by role. The data is often embedded in JSON within the page source.
| Data Point | Notes |
|---|---|
| Base pay range | Median, low, high |
| Total compensation | Including bonuses, stock |
| Pay by experience | Entry, mid, senior |
| Location adjustment | Pay varies by city |
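Because salary figures are often embedded as JSON in the page source, one way to reach them is to collect the contents of JSON-bearing script tags and parse each one. The following is a minimal sketch using only the standard library. The script `type` values it looks for and the sample payload (`basePay`, `low`, `median`, `high`) are assumptions for illustration, not Glassdoor's actual schema, so inspect the real page source first.

```python
import json
from html.parser import HTMLParser

# Script types assumed to carry JSON; adjust after inspecting the real page.
JSON_SCRIPT_TYPES = {"application/json", "application/ld+json"}


class JsonScriptCollector(HTMLParser):
    """Collects the raw text of <script> tags whose type looks like JSON."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._buffer = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") in JSON_SCRIPT_TYPES:
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self.payloads.append("".join(self._buffer))
            self._in_json_script = False


def extract_embedded_json(html):
    """Return every JSON document found in JSON-typed script tags."""
    collector = JsonScriptCollector()
    collector.feed(html)
    collector.close()
    documents = []
    for raw in collector.payloads:
        try:
            documents.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip scripts that are not valid JSON
    return documents


# Hypothetical page fragment; the field names are made up for this example.
sample_html = """
<html><body>
<script type="application/json">
{"basePay": {"low": 98000, "median": 120000, "high": 150000}}
</script>
<script>console.log("not json");</script>
</body></html>
"""

salary_docs = extract_embedded_json(sample_html)
print(salary_docs[0]["basePay"]["median"])
```

On a real page the parsed documents are usually deeply nested, so expect to walk the structure to find the salary fields you need.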
Alternative: ScrapingAnt
ScrapingAnt is a comparable option for Glassdoor, with built-in JavaScript rendering and residential proxy support.
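For comparison, a request through ScrapingAnt could be built as below. This sketch assumes ScrapingAnt's v2 `general` endpoint and its `x-api-key`, `browser`, and `proxy_type` query parameters; verify these against the current ScrapingAnt documentation before relying on them. `build_scrapingant_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the ScrapingAnt docs.
SCRAPINGANT_ENDPOINT = "https://api.scrapingant.com/v2/general"


def build_scrapingant_url(api_key, target_url, residential=True):
    """Build a GET URL asking ScrapingAnt to render target_url in a browser."""
    params = {
        "url": target_url,  # urlencode percent-encodes the target URL
        "x-api-key": api_key,
        "browser": "true",  # assumed flag for JavaScript rendering
        "proxy_type": "residential" if residential else "datacenter",
    }
    return f"{SCRAPINGANT_ENDPOINT}?{urlencode(params)}"


request_url = build_scrapingant_url(
    "YOUR_SCRAPINGANT_KEY",
    "https://www.glassdoor.com/Reviews/Google-Reviews-E9079.htm",
)
print(request_url)
```

The resulting URL can be fetched with `requests.get(request_url, timeout=60)` and the response parsed with BeautifulSoup in the same way as the ScraperAPI response.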
Best Practices
- Use rendered scraping: Glassdoor content is JavaScript-heavy
- Handle pagination: reviews span many pages
- Aggregate data: individual reviews are noisy, so look for patterns across many reviews
- Cache responses: avoid re-scraping the same pages
- Stay ethical: do not use scraped data to identify individual reviewers
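The pagination, caching, and aggregation practices above can be sketched together as follows. The `_P{n}.htm` page suffix is an assumption about Glassdoor's URL scheme that should be checked in a browser, and `page_url`, `cached_fetch`, and `average_rating` are hypothetical helpers. The fetch function is passed in as a parameter so the same code works with any scraping API.

```python
import hashlib
import tempfile
from pathlib import Path
from statistics import mean


def page_url(base_url, page):
    """Return the URL for a given review page (assumed `_P{n}.htm` scheme)."""
    if page <= 1:
        return base_url
    stem = base_url.removesuffix(".htm")
    return f"{stem}_P{page}.htm"


def cached_fetch(url, fetch, cache_dir):
    """Return the page body for url, calling fetch(url) only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.html"
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    body = fetch(url)
    cache_file.write_text(body, encoding="utf-8")
    return body


def average_rating(ratings):
    """Average the ratings that parsed successfully, or None if there are none."""
    valid = [r for r in ratings if r is not None]
    return round(mean(valid), 2) if valid else None


# Demonstration with a stand-in fetcher instead of a real network call.
calls = []


def fake_fetch(url):
    calls.append(url)
    return f"<html>{url}</html>"


base = "https://www.glassdoor.com/Reviews/Google-Reviews-E9079.htm"
with tempfile.TemporaryDirectory() as tmp:
    for _ in range(2):  # the second pass is served entirely from the cache
        pages = [cached_fetch(page_url(base, n), fake_fetch, tmp) for n in (1, 2, 3)]

print(len(calls))
print(average_rating([4.0, 5.0, None, 3.0]))
```

In real use, `fetch` would wrap the scraping API request, and a short delay between uncached requests helps stay within rate limits.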