How to Scrape Craigslist Listings
Step-by-step guide to scraping Craigslist listings for housing, jobs, and sales data. Includes Python code and anti-blocking techniques.
Craigslist remains one of the largest classified ad platforms. Scraping it is useful for market research, rental analysis, and deal hunting.
What You Can Scrape
- Housing listings: rent prices, locations, square footage
- Job postings: titles, salaries, company info
- For-sale items: prices, descriptions, images
- Services: local service provider listings
- Community posts: events and activities
Basic Scraping with Python
Craigslist pages are relatively simple HTML, making them easy to parse.
import requests
from bs4 import BeautifulSoup

# Apartment search results for New York
url = "https://newyork.craigslist.org/search/apa"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

resp = requests.get(url, headers=headers, timeout=30)
resp.raise_for_status()  # Stop early on HTTP errors such as 403
soup = BeautifulSoup(resp.text, "html.parser")

# Select every result card in the returned HTML
listings = soup.select(".cl-static-search-result")
for listing in listings:
    title = listing.select_one(".title")
    price = listing.select_one(".price")
    print(
        title.text.strip() if title else "N/A",
        price.text.strip() if price else "N/A",
    )
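The loop above prints raw text, so a price arrives as a string such as "$2,500". For rental analysis it usually helps to convert that text to a number first. The following is a minimal sketch, not part of the original example: the helper name `parse_price` is hypothetical, and it assumes prices are written in whole dollars with optional thousands separators.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[int]:
    """Convert price text such as "$2,500" to an integer number of dollars.

    Returns None when the text contains no dollar amount (for example "N/A").
    Hypothetical helper: assumes whole-dollar prices with optional commas.
    """
    match = re.search(r"\$\s*([\d,]*\d)", text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    for raw in ["$2,500", " $950 ", "N/A"]:
        print(repr(raw), "->", parse_price(raw))
```

Returning `None` instead of 0 for a missing price keeps listings without a price from skewing averages later on.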
Scaling Up with ScraperAPI
To scrape multiple cities or categories, route requests through ScraperAPI, which handles proxy rotation and reduces the risk of IP bans.
import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"
cities = ["newyork", "losangeles", "chicago", "houston"]

for city in cities:
    url = f"https://{city}.craigslist.org/search/apa"
    # Pass the target as a query parameter so requests URL-encodes it
    resp = requests.get(
        "http://api.scraperapi.com",
        params={"api_key": API_KEY, "url": url},
        timeout=60,
    )
    # Parse each city's results
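When both cities and categories vary, it can be easier to generate the full list of target URLs up front and wrap each one for the proxy. The sketch below uses only the standard library. The function names are hypothetical, the proxy host is taken from the example above, and any category code other than `apa` (such as `jjj` here) is an assumption to verify against the site before use.

```python
from itertools import product
from urllib.parse import urlencode

# Host taken from the example above; the constant name is hypothetical.
PROXY_ENDPOINT = "http://api.scraperapi.com"


def build_search_urls(cities, categories):
    """Return one Craigslist search URL per (city, category) pair."""
    return [
        f"https://{city}.craigslist.org/search/{category}"
        for city, category in product(cities, categories)
    ]


def build_proxy_url(api_key, target_url):
    """Wrap a target URL in a proxy request URL, percent-encoding the target."""
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{PROXY_ENDPOINT}?{query}"


if __name__ == "__main__":
    # "apa" comes from the examples above; "jjj" is an assumed category code.
    for target in build_search_urls(["newyork", "chicago"], ["apa", "jjj"]):
        print(build_proxy_url("YOUR_SCRAPERAPI_KEY", target))
```

Percent-encoding the target matters once it carries its own query string (for example `?s=120`); left unencoded, those parameters would be read as parameters of the proxy request instead.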
Handling Pagination
Craigslist uses offset-based pagination. Increment the s parameter by 120 for each page.
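# Reuses requests and headers from the basic example
for offset in range(0, 480, 120):  # Offsets 0, 120, 240, 360 (four pages)
    url = f"https://newyork.craigslist.org/search/apa?s={offset}"
    resp = requests.get(url, headers=headers, timeout=30)
    # Parse each page as in the basic example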
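A fixed range like the one above assumes you already know how many pages exist. One alternative, sketched here, is to keep advancing the offset until a page comes back empty. The names `iter_pages` and `fetch_page` are hypothetical; `fetch_page` stands for any function that takes an offset and returns the parsed listings for that page, and the `max_pages` cap is an assumed safety limit.

```python
from typing import Callable, Iterator, List

PAGE_SIZE = 120  # Offset step described above


def iter_pages(
    fetch_page: Callable[[int], List[dict]],
    max_pages: int = 10,
) -> Iterator[List[dict]]:
    """Yield pages of listings until a page is empty or max_pages is reached."""
    for page_number in range(max_pages):
        listings = fetch_page(page_number * PAGE_SIZE)
        if not listings:
            break  # An empty page means there are no more results
        yield listings


if __name__ == "__main__":
    # Stand-in for a real fetcher: pretends the search has 250 results.
    def fake_fetch(offset: int) -> List[dict]:
        total = 250
        return [{"id": i} for i in range(offset, min(offset + PAGE_SIZE, total))]

    for page in iter_pages(fake_fetch):
        print(len(page), "listings")
```

Passing the fetcher in as an argument keeps the paging logic separate from the HTTP code, so it can be exercised without touching the network.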
Challenges and Solutions
| Challenge | Solution |
|---|---|
| IP blocking | Use rotating proxies via a service such as ScraperAPI or ScrapingAnt |
| Geo-restrictions | Access city-specific subdomains |
| Stale listings | Filter by posting date and check whether the listing has been removed |
| Phone number images | Use OCR tools to read image-based contact info |
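
As an illustration of the "filter by date" row, the sketch below drops listings older than a chosen number of days. It assumes each listing has already been parsed into a dict with a `posted` key holding a `datetime`; that key name, the `url` values, and the function name `filter_recent` are hypothetical, and how the posting date is extracted from the page is not covered here.

```python
from datetime import datetime, timedelta


def filter_recent(listings, max_age_days, now=None):
    """Keep listings whose "posted" datetime is within max_age_days of now."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    return [item for item in listings if item["posted"] >= cutoff]


if __name__ == "__main__":
    # Illustrative data only; real listings would come from the scraper.
    sample = [
        {"url": "https://example.org/a", "posted": datetime(2024, 1, 30)},
        {"url": "https://example.org/b", "posted": datetime(2024, 1, 1)},
    ]
    fresh = filter_recent(sample, max_age_days=7, now=datetime(2024, 1, 31))
    print([item["url"] for item in fresh])
```

Accepting `now` as a parameter makes the cutoff reproducible, which helps when re-running an analysis on saved data.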
Best Practices
- Add delays between requests: Craigslist blocks aggressive scrapers quickly
- Scrape during off-peak hours: lower chance of rate limiting
- Deduplicate listings: the same item may appear in multiple searches
- Respect Craigslist's ToS: do not republish their content or scrape for spam purposes
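
Two of the practices above, delays and deduplication, can be sketched with the standard library alone. The function names are hypothetical, the 2 to 5 second range is an assumed starting point rather than a documented limit, and the sketch assumes each listing is a dict with a `url` key that identifies it.

```python
import random
import time


def polite_sleep(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Pause for a random interval so requests are not evenly spaced.

    Returns the delay used. The sleep argument exists so the pause
    can be replaced in tests.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def dedupe(listings, key="url"):
    """Return listings with duplicates removed, keeping the first occurrence."""
    seen = set()
    unique = []
    for item in listings:
        if item[key] not in seen:
            seen.add(item[key])
            unique.append(item)
    return unique


if __name__ == "__main__":
    # Illustrative data: the same listing found by two different searches.
    found = [
        {"url": "https://example.org/1", "title": "Sunny studio"},
        {"url": "https://example.org/2", "title": "2BR near park"},
        {"url": "https://example.org/1", "title": "Sunny studio"},
    ]
    print(len(dedupe(found)), "unique listings")
```

In a scraping loop, `polite_sleep()` would be called once after each request, and `dedupe()` once over the combined results from all searches.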