How to Scrape Indeed Job Listings
A practical guide to scraping Indeed job listings with Python, covering techniques for extracting job titles, salaries, descriptions, and company data.
Indeed is one of the largest job search engines, making it a valuable data source for recruitment analytics, salary research, and market intelligence. Here is how to scrape it effectively.
Challenges
Indeed has moderate anti-bot protections:
- Rate limiting on repeated requests
- CAPTCHA challenges for suspicious traffic
- Dynamic JavaScript rendering for some content
- IP-based blocking for heavy scrapers
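The rate limiting and CAPTCHA challenges listed above are usually handled by retrying with a growing delay. The following is a minimal sketch of that idea, not a definitive implementation: the retryable status codes, the `looks_blocked` heuristic, and the function names are all assumptions made for illustration, and `fetch` stands in for whatever function performs the HTTP request.

```python
import random
import time

RETRYABLE_STATUS = {429, 500, 502, 503}  # assumption: status codes worth retrying


def backoff_delay(attempt, base=2.0, cap=60.0):
    """Exponential backoff: base * 2**attempt seconds (capped), plus up to 1s of jitter."""
    return min(cap, base * 2 ** attempt) + random.uniform(0, 1)


def looks_blocked(html):
    """Rough heuristic (assumed): the page mentions a CAPTCHA and has no job cards."""
    lowered = html.lower()
    return "captcha" in lowered and "job_seen_beacon" not in lowered


def fetch_with_retries(fetch, max_attempts=4, sleep=time.sleep):
    """Call fetch() until it returns a page that is neither throttled nor blocked.

    fetch is any zero-argument callable that returns (status_code, html).
    """
    for attempt in range(max_attempts):
        status, html = fetch()
        if status not in RETRYABLE_STATUS and not looks_blocked(html):
            return html
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))  # wait longer after each failure
    raise RuntimeError(f"Still blocked after {max_attempts} attempts")
```

In practice, `fetch` could wrap a `requests.get` call and return `(response.status_code, response.text)`. Passing `sleep` as a parameter keeps the retry logic easy to test without real waiting.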
Setting Up
pip install requests beautifulsoup4
Scraping Indeed with ScraperAPI
import time
from urllib.parse import urlencode

import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_SCRAPERAPI_KEY"


def scrape_indeed(query, location, pages=3):
    """Collect job cards from the first `pages` pages of search results."""
    all_jobs = []
    # The `start` offset advances in steps of 10 results per page.
    for start in range(0, pages * 10, 10):
        # URL-encode the search terms so spaces and commas survive intact.
        search = urlencode({"q": query, "l": location, "start": start})
        url = f"https://www.indeed.com/jobs?{search}"
        response = requests.get("https://api.scraperapi.com", params={
            "api_key": API_KEY,
            "url": url,
            "render": "true",  # ask ScraperAPI to execute JavaScript
        }, timeout=70)  # rendered requests can be slow
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        job_cards = soup.select(".job_seen_beacon")
        for card in job_cards:
            title_el = card.select_one("h2.jobTitle a span")
            company_el = card.select_one("[data-testid='company-name']")
            location_el = card.select_one("[data-testid='text-location']")
            salary_el = card.select_one(".salary-snippet-container")
            job = {
                "title": title_el.text.strip() if title_el else "N/A",
                "company": company_el.text.strip() if company_el else "N/A",
                "location": location_el.text.strip() if location_el else "N/A",
                "salary": salary_el.text.strip() if salary_el else "Not listed",
            }
            all_jobs.append(job)
        time.sleep(2)  # Respectful delay between pages
    return all_jobs


jobs = scrape_indeed("python developer", "New York, NY")
for job in jobs:
    print(f"{job['title']} at {job['company']} - {job['location']}")
    print(f"  Salary: {job['salary']}")
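The `salary` field above is free text, which is awkward for salary research. The sketch below shows one way to turn it into numbers. It assumes salary strings look like "$80,000 - $120,000 a year" or "From $25.50 an hour"; the `parse_salary` helper and those formats are illustrative assumptions, and real listings may need more patterns.

```python
import re

# Dollar amounts such as "$80,000" or "$25.50" (assumed format).
_AMOUNT = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)")
# Pay period phrases such as "a year", "an hour", "per month" (assumed format).
_PERIOD = re.compile(r"\b(?:an?|per)\s+(hour|day|week|month|year)\b", re.IGNORECASE)


def parse_salary(text):
    """Return {'min', 'max', 'period'} for a salary string, or None if no amount is found."""
    amounts = [float(value.replace(",", "")) for value in _AMOUNT.findall(text)]
    if not amounts:
        return None
    period = _PERIOD.search(text)
    return {
        "min": min(amounts),
        "max": max(amounts),
        "period": period.group(1).lower() if period else None,
    }
```

A single quoted amount yields the same value for `min` and `max`, and strings such as "Not listed" return `None`, so they can be filtered out before aggregation.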
Scraping Job Descriptions
To get full job descriptions, you need to visit each individual job page:
def get_job_description(job_url):
    """Fetch a single job page and return its description text."""
    response = requests.get("https://api.scraperapi.com", params={
        "api_key": API_KEY,
        "url": job_url,
        "render": "true",
    }, timeout=70)  # rendered requests can be slow
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    description = soup.select_one("#jobDescriptionText")
    if description:
        return description.text.strip()
    return "Description not available"
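`get_job_description` needs an absolute job URL, which the listing scraper does not collect. The sketch below shows one way to build it from the link inside a job card. It assumes the title link's `href` is a relative path and that the job is identified by a `jk` query parameter; both are assumptions about Indeed's markup that may change, and the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urljoin, urlparse

INDEED_BASE = "https://www.indeed.com"


def job_url_from_href(href):
    """Turn a (possibly relative) link from a job card into an absolute URL."""
    return urljoin(INDEED_BASE, href)


def job_key_from_url(url):
    """Return the value of the `jk` query parameter (assumed job key), or None."""
    values = parse_qs(urlparse(url).query).get("jk")
    return values[0] if values else None
```

Inside the card loop, the `href` could come from `card.select_one("h2.jobTitle a")`. The job key is useful for skipping listings that appear on more than one results page.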
Using ScrapingAnt as an Alternative
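import requests
from bs4 import BeautifulSoup

response = requests.get("https://api.scrapingant.com/v2/general", params={
    "x-api-key": "YOUR_SCRAPINGANT_KEY",
    "url": "https://www.indeed.com/jobs?q=data+engineer&l=Remote",
    "browser": "true",  # render the page in a headless browser
}, timeout=70)  # rendered requests can be slow
response.raise_for_status()

# Read the HTML from a JSON "content" field when the response is JSON;
# otherwise treat the response body itself as the page HTML.
if "application/json" in response.headers.get("Content-Type", ""):
    html = response.json()["content"]
else:
    html = response.text
soup = BeautifulSoup(html, "html.parser")
# Parse job cards with the same selectors as in the ScraperAPI example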
Best Practices
- Add delays between requests (2-5 seconds minimum)
- Use a scraping API for proxy rotation and anti-bot bypass
- Cache results to minimize redundant requests
- Respect Indeed's robots.txt directives
- Consider Indeed's Publisher API for legitimate affiliate use cases
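The caching advice above can be sketched with a small file-based cache keyed by URL. This is a minimal illustration under stated assumptions: the `indeed_cache` directory, the one-day freshness window, and the `cached_fetch` helper are hypothetical choices, and `fetch` stands in for whatever function downloads a page.

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path("indeed_cache")  # hypothetical cache location
MAX_AGE_SECONDS = 24 * 60 * 60    # assumed freshness window: one day


def cached_fetch(url, fetch, cache_dir=CACHE_DIR, max_age=MAX_AGE_SECONDS):
    """Return cached HTML for url if it is fresh; otherwise call fetch(url) and store the result."""
    cache_dir = Path(cache_dir)
    # Hash the URL so it becomes a safe, fixed-length file name.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{digest}.json"
    if path.exists():
        entry = json.loads(path.read_text(encoding="utf-8"))
        if time.time() - entry["fetched_at"] < max_age:
            return entry["html"]  # cache hit: no network request
    html = fetch(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = {"fetched_at": time.time(), "html": html}
    path.write_text(json.dumps(entry), encoding="utf-8")
    return html
```

Re-running a scrape within the freshness window then reuses stored pages instead of repeating requests, which also saves scraping API credits.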
Verdict
Scraping Indeed is straightforward with the right tools. ScraperAPI with JavaScript rendering handles Indeed's protections well, and ScrapingAnt is a capable alternative. Always scrape responsibly and consider official APIs first.