How to Scrape Reddit Posts and Comments
Learn how to scrape Reddit posts, comments, and subreddit data using Python. Covers the official API, old.reddit.com, and third-party tools.
Reddit is a goldmine for sentiment analysis, market research, and trend monitoring. Here is how to extract Reddit data effectively.
Method 1: Reddit's Official API (via PRAW)
The simplest and most reliable approach uses Reddit's official API through the PRAW library.
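import praw
# Read-only client; the credentials come from an app registered in your Reddit account preferences
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="scraper:v1.0 (by /u/yourusername)",
)
# Print the title, score, and comment count of the 25 hottest posts
subreddit = reddit.subreddit("webdev")
for post in subreddit.hot(limit=25):
    print(post.title, post.score, post.num_comments)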
Pros: Official, stable, well-documented. Cons: Rate limited to 100 requests per minute on the free tier, and API access policies have tightened significantly since 2023.
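To illustrate what staying under a 100-requests-per-minute budget can look like when you issue requests yourself, here is a minimal sliding-window throttle. It is a sketch, not part of PRAW or of Reddit's API: the class name `SlidingWindowLimiter` and its parameters are hypothetical, and the injectable `clock` and `sleep` arguments exist only to make it easy to test.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window (illustrative helper)."""

    def __init__(self, max_calls=100, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have already left the window
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Window is full: sleep until the oldest call expires, then re-check
            self._sleep(self.period - (now - self._calls[0]))
```

Calling `limiter.wait()` immediately before each HTTP request keeps the request rate at or below the configured budget.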
Method 2: JSON Endpoints
Reddit serves JSON data when you append .json to most page URLs, such as a subreddit listing or a post's permalink.
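import requests
url = "https://www.reddit.com/r/python/hot.json"
# Reddit's API rules ask for a unique, descriptive User-Agent rather than a generic browser string
headers = {"User-Agent": "scraper:v1.0 (by /u/yourusername)"}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on errors such as 429 (rate limited)
data = response.json()
# Each child wraps one post under its "data" key
for post in data["data"]["children"]:
    print(post["data"]["title"])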
This method is simple but rate-limited and may require proxy rotation for large-scale collection.
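A listing response carries more than titles: each child holds the post's fields under `data`, and the listing's `after` value is the cursor for the next page. The sketch below flattens one page into plain dictionaries and builds the query parameters for the following request. The helper names `parse_listing` and `next_page_params` are hypothetical, and the field selection is only one reasonable choice.

```python
def parse_listing(payload):
    """Return (posts, after_cursor) from one page of a Reddit listing payload."""
    listing = payload["data"]
    posts = [
        {
            "id": child["data"]["id"],
            "title": child["data"]["title"],
            "score": child["data"]["score"],
            "num_comments": child["data"]["num_comments"],
        }
        for child in listing["children"]
        if child.get("kind") == "t3"  # "t3" is Reddit's type prefix for posts
    ]
    return posts, listing.get("after")


def next_page_params(after, limit=100):
    """Build query parameters for the next page; `after` is None on the first request."""
    params = {"limit": limit}
    if after:
        params["after"] = after
    return params
```

A collection loop would pass `params=next_page_params(after)` to `requests.get` and stop once `parse_listing` returns `None` as the cursor.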
Method 3: Old Reddit + Scraping API
For bulk scraping without API limitations, combine old.reddit.com (which is lighter and easier to parse) with a scraping service like ScraperAPI.
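import requests
from bs4 import BeautifulSoup
API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://old.reddit.com/r/datascience/"
# Pass the target URL through params so requests URL-encodes it inside the query string
resp = requests.get("http://api.scraperapi.com", params={"api_key": API_KEY, "url": url}, timeout=60)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")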
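Once the HTML is fetched, the post titles still have to be pulled out of it. The sketch below assumes that old.reddit.com marks each post's title link as an `<a>` tag whose class list includes `title`; that is an assumption about the current markup and may change. It uses only the standard library's `html.parser` so it runs without extra installs, and the names `TitleLinkParser` and `extract_posts` are hypothetical. With BeautifulSoup, the equivalent selector would be `soup.select("a.title")`.

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collect (title, href) pairs from <a> tags whose class list includes "title"."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._href = None  # href of the title link currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "a" and "title" in classes:
            self._href = attr.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.posts.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_posts(html):
    """Return a list of (title, href) tuples found in the page HTML."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.posts
```

With the request above, `extract_posts(resp.text)` would return the title and link of each post on the page, provided the markup assumption holds.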
What Data to Extract
| Data Point | Source |
|---|---|
| Post title and body | Post page or API |
| Comments and threads | Comment API endpoint |
| Upvotes and scores | JSON data |
| User profiles | Profile pages |
| Subreddit metadata | About page |
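Comments are the least flat item in the table: in Reddit's JSON, each comment (kind `t1`) can carry its replies as a nested listing, and unloaded branches appear as `more` placeholders. The following sketch walks such a listing depth-first and yields one flat record per comment. The function name `flatten_comments` and the chosen fields are illustrative, and `more` placeholders are simply skipped rather than expanded.

```python
def flatten_comments(listing, depth=0):
    """Yield one flat dict per comment, depth-first, from a comments listing."""
    for child in listing["data"]["children"]:
        if child["kind"] != "t1":  # skip "more" placeholders for unloaded comments
            continue
        comment = child["data"]
        yield {
            "id": comment["id"],
            "author": comment.get("author"),
            "body": comment.get("body", ""),
            "score": comment.get("score", 0),
            "depth": depth,
        }
        replies = comment.get("replies")
        if isinstance(replies, dict):  # an empty string means there are no replies
            yield from flatten_comments(replies, depth + 1)
```

The JSON for a post's permalink is a two-element list, with the post listing first and the comments listing second, so the usual call is `list(flatten_comments(data[1]))`.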
Best Practices
- Prefer the official API when it meets your needs
- Use old.reddit.com for HTML scraping because its markup is much simpler to parse
- Rotate proxies with a service such as ScraperAPI or ScrapingAnt for large-scale jobs
- Store data incrementally, since Reddit threads grow over time
- Respect robots.txt and rate limits
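Incremental storage can be sketched with the standard library's `sqlite3` module: key each post by its Reddit ID and overwrite the row on every re-fetch, so scores and comment counts stay current without creating duplicates. The table layout and the helper names `open_store` and `upsert_posts` are assumptions made for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with one row per post."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               id TEXT PRIMARY KEY,
               title TEXT,
               score INTEGER,
               num_comments INTEGER,
               fetched_at REAL
           )"""
    )
    return conn


def upsert_posts(conn, posts, fetched_at):
    """Insert new posts and overwrite previously stored ones with fresh values."""
    rows = [
        (p["id"], p["title"], p["score"], p["num_comments"], fetched_at)
        for p in posts
    ]
    # INSERT OR REPLACE swaps out any existing row that has the same primary key
    conn.executemany(
        "INSERT OR REPLACE INTO posts (id, title, score, num_comments, fetched_at) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
```

Running `upsert_posts` after each scrape, with `fetched_at=time.time()`, leaves one up-to-date row per post; keeping a full history instead would mean adding `fetched_at` to the primary key.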