How to Scrape Glassdoor Reviews and Salaries

Learn how to scrape Glassdoor company reviews, salary data, and interview questions using Python and scraping APIs.

Glassdoor is a valuable source of company reviews, salary benchmarks, and interview data. Scraping it effectively requires dealing with login walls and dynamic content.

Useful Data on Glassdoor

Company reviews: employee ratings and written reviews
Salary reports: pay ranges by role and location
Interview questions: common questions and difficulty ratings
Company info: size, revenue, industry, headquarters
Job listings: open positions and requirements

The Challenge

Glassdoor is one of the harder sites to scrape:

Login wall: login required for most content
Heavy JavaScript rendering: data loads dynamically
Aggressive bot detection: CAPTCHAs and IP blocking
Rate limiting: strict request limits

Recommended Approach: ScraperAPI

Given these challenges, routing requests through ScraperAPI is the most practical approach. It handles JavaScript rendering, proxies, and anti-bot bypass, leaving your code to parse the returned HTML.

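import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_SCRAPERAPI_KEY"
company_url = "https://www.glassdoor.com/Reviews/Google-Reviews-E9079.htm"

# Pass the query via `params` so requests URL-encodes the target URL.
resp = requests.get(
    "https://api.scraperapi.com/",
    params={"api_key": API_KEY, "url": company_url, "render": "true"},
    timeout=70,  # rendered requests can be slow
)
resp.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(resp.text, "html.parser")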
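
Even through a scraping API, individual requests can fail when bot detection or rate limits kick in, so it can help to retry with a growing delay. The following is a minimal sketch rather than anything ScraperAPI provides: `fetch_with_retries` and its parameters are hypothetical names, and `fetch` stands for any function that returns the page text or raises on failure.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) until it succeeds or the attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    `sleep` is injectable so the function can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as exc:  # narrow this to your HTTP client's errors
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

In practice, `fetch` could be a small function that wraps the `requests.get` call shown earlier and returns `resp.text`.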

Extracting Review Data

Each review typically contains a rating, a title, and separate pros and cons text:

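# Example structure (actual selectors may vary)
def text_of(node):
    # select_one returns None when a selector matches nothing
    return node.get_text(strip=True) if node else None

reviews = soup.select("[data-test='reviewsList'] li")
parsed = []
for review in reviews:
    parsed.append({
        "rating": text_of(review.select_one(".ratingNumber")),
        "title": text_of(review.select_one(".reviewLink")),
        "pros": text_of(review.select_one("[data-test='pros']")),
        "cons": text_of(review.select_one("[data-test='cons']")),
    })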

Salary Data Extraction

Glassdoor salary pages show pay ranges by role. The data is often embedded in JSON within the page source.

Data Point	Notes
Base pay range	Median, low, high
Total compensation	Including bonuses, stock
Pay by experience	Entry, mid, senior
Location adjustment	Pay varies by city
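
Because the exact JSON layout is not documented here and may change, one way to start is to collect every JSON-bearing script tag and search it for the fields you need. The sketch below uses only the standard library. It assumes the data sits in `<script>` tags typed `application/json` or `application/ld+json`, and the key names in the usage example are made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonScriptCollector(HTMLParser):
    """Collect the parsed contents of <script> tags that hold JSON."""

    JSON_TYPES = {"application/json", "application/ld+json"}  # assumed types

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._buffer = []
        self.blobs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") in self.JSON_TYPES:
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            try:
                self.blobs.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip scripts that are not valid JSON


def extract_json_blobs(html):
    """Return a list of parsed JSON objects found in the page source."""
    collector = JsonScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.blobs


def find_values(obj, key):
    """Yield every value stored under `key` anywhere in nested JSON."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)
```

With a page source in `html`, `list(find_values(extract_json_blobs(html), "median"))` would return every value stored under a (hypothetical) `median` key, which is a quick way to discover where the pay figures live before writing a stricter parser.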

Alternative: ScrapingAnt

ScrapingAnt also handles Glassdoor well, with built-in JavaScript rendering and residential proxy support.
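
As a rough sketch of what the equivalent request might look like, the helper below builds a ScrapingAnt request URL with browser rendering and a residential proxy. The endpoint and the parameter names (`x-api-key`, `browser`, `proxy_type`) are written from memory of ScrapingAnt's v2 API and should be checked against the current documentation; `scrapingant_request_url` itself is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against ScrapingAnt's current API docs.
SCRAPINGANT_ENDPOINT = "https://api.scrapingant.com/v2/general"


def scrapingant_request_url(target_url, api_key, residential=True):
    """Build a request URL with JavaScript rendering enabled."""
    params = {
        "url": target_url,      # urlencode percent-encodes the target URL
        "x-api-key": api_key,
        "browser": "true",      # render the page in a headless browser
    }
    if residential:
        params["proxy_type"] = "residential"
    return f"{SCRAPINGANT_ENDPOINT}?{urlencode(params)}"
```

The resulting URL can be fetched with `requests.get(...)` and parsed with BeautifulSoup in the same way as the ScraperAPI response.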

Best Practices

Use rendered scraping: Glassdoor content is JavaScript-heavy
Handle pagination: reviews span many pages
Aggregate data: individual reviews are noisy; look for patterns across many reviews
Cache responses: avoid re-scraping the same pages
Stay ethical: do not use scraped data to identify individual reviewers
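
The pagination, caching, and aggregation points above can be sketched with a few small helpers. This is an illustration under stated assumptions, not a finished pipeline: the `_P<n>.htm` page suffix is an assumed URL pattern that should be confirmed in a browser, and `page_urls`, `fetch_cached`, and `average_rating` are hypothetical names. `fetch` is any function that takes a URL and returns HTML.

```python
import hashlib
from pathlib import Path
from statistics import mean


def page_urls(first_page_url, pages):
    """Build review page URLs, assuming pages 2+ use an `_P<n>.htm` suffix."""
    if not first_page_url.endswith(".htm"):
        raise ValueError("expected a URL ending in .htm")
    stem = first_page_url[: -len(".htm")]
    return [first_page_url] + [f"{stem}_P{n}.htm" for n in range(2, pages + 1)]


def fetch_cached(url, fetch, cache_dir="cache"):
    """Return the page for `url`, calling fetch(url) only on a cache miss."""
    folder = Path(cache_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Hash the URL to get a filesystem-safe cache file name.
    path = folder / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html


def average_rating(reviews):
    """Average the numeric ratings, skipping reviews with no usable rating."""
    values = []
    for review in reviews:
        try:
            values.append(float(review["rating"]))
        except (KeyError, TypeError, ValueError):
            continue
    return round(mean(values), 2) if values else None
```

Reporting only aggregates such as the average rating also fits the last point: no individual review needs to be stored alongside anything that could identify its author.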