How to Scrape Reddit Posts and Comments

Learn how to scrape Reddit posts, comments, and subreddit data using Python. Covers the official API, old.reddit.com, and third-party tools.

Reddit is a goldmine for sentiment analysis, market research, and trend monitoring. Here is how to extract Reddit data effectively.

Method 1: Reddit's Official API (via PRAW)

The simplest and most reliable approach uses Reddit's official API through the PRAW library.

import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="scraper:v1.0 (by /u/yourusername)"
)

subreddit = reddit.subreddit("webdev")
for post in subreddit.hot(limit=25):
    print(post.title, post.score, post.num_comments)

Pros: Official, stable, well-documented.
Cons: Rate limited to roughly 100 requests per minute per OAuth client. API access policies have tightened significantly since 2023.
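Comments can be collected through the same client. The following is a minimal sketch rather than a definitive implementation: `flatten_comments` is a hypothetical helper name, and it assumes a PRAW submission object obtained from a `praw.Reddit` instance like the one above. It relies on PRAW's `replace_more` and `list` methods on the comment forest.

```python
def flatten_comments(submission, more_limit=0):
    """Return every loaded comment of a submission as a flat list of dicts.

    `submission` is assumed to be a PRAW Submission, e.g.
    reddit.submission(id="POST_ID") with a placeholder post ID.
    """
    # With limit=0, PRAW drops the "load more comments" placeholders
    # instead of fetching them, which keeps the request count low.
    submission.comments.replace_more(limit=more_limit)

    rows = []
    for comment in submission.comments.list():
        rows.append({
            "id": comment.id,
            "parent_id": comment.parent_id,  # t3_* = the post, t1_* = another comment
            # author is None when the account was deleted
            "author": str(comment.author) if comment.author is not None else None,
            "score": comment.score,
            "body": comment.body,
        })
    return rows
```

Passing a higher `more_limit` (or `None`) makes PRAW fetch the collapsed branches as well, at the cost of one extra API request per placeholder.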

Method 2: JSON Endpoints

Reddit serves JSON data when you append .json to most subreddit, post, and user page URLs.

import requests

url = "https://www.reddit.com/r/python/hot.json"
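# Reddit's API rules ask for a unique, descriptive User-Agent rather than a spoofed browser string
headers = {"User-Agent": "scraper:v1.0 (by /u/yourusername)"}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # fail early on 429 (rate limited) or other HTTP errors
data = response.json()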

for post in data["data"]["children"]:
    print(post["data"]["title"])

This method needs no credentials, but unauthenticated requests are rate-limited more strictly than authenticated API calls, so large-scale collection may require proxy rotation.
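Listings returned by the .json endpoints are paginated: each response carries an `after` token that identifies the next page. The sketch below shows one way to walk several pages with a pause between requests. `iter_listing` and the `fetch_page` callable are hypothetical names introduced for illustration, and the default delay is an assumption, not a documented limit.

```python
import time
from typing import Callable, Iterator, Optional


def iter_listing(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 5,
    delay: float = 2.0,
) -> Iterator[dict]:
    """Yield post dicts from successive pages of a Reddit JSON listing.

    fetch_page(after) must return the decoded JSON for one page;
    `after` is None for the first page.
    """
    after: Optional[str] = None
    for page in range(max_pages):
        listing = fetch_page(after)["data"]
        for child in listing["children"]:
            yield child["data"]
        after = listing.get("after")
        if not after:
            break  # Reddit returns a null token on the last page
        if page < max_pages - 1:
            time.sleep(delay)  # pause between requests to stay under the rate limit
```

In practice `fetch_page` would wrap a `requests.get` call on the listing URL, passing `params={"after": after, "limit": 100}`; requests omits parameters whose value is `None`, so the first call needs no special handling.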

Method 3: Old Reddit + Scraping API

For bulk scraping without API limitations, combine old.reddit.com (which is lighter and easier to parse) with a scraping service like ScraperAPI.

import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://old.reddit.com/r/datascience/"

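# Pass the target URL via params so requests URL-encodes it
resp = requests.get("https://api.scraperapi.com/", params={"api_key": API_KEY, "url": url}, timeout=60)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# Post titles; the selector assumes old.reddit.com's current markup
for link in soup.select("div.thing a.title"):
    print(link.get_text(strip=True))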

What Data to Extract

Data Point	Source
Post title and body	Post page or API
Comments and threads	Comment API endpoint
Upvotes and scores	JSON data
User profiles	Profile pages
Subreddit metadata	About page
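
The post-level fields in this table correspond to keys in the post objects that the JSON endpoints return (`title`, `selftext`, `score`, `num_comments`, and so on). As a rough sketch, a helper like the hypothetical `normalize_post` below could flatten one such object into a record that is ready to store; the selection of fields is an illustrative assumption.

```python
from datetime import datetime, timezone


def normalize_post(raw: dict) -> dict:
    """Flatten one Reddit JSON post object into a storage-friendly record."""
    return {
        "id": raw["id"],
        "subreddit": raw.get("subreddit"),
        "title": raw.get("title", ""),
        "body": raw.get("selftext", ""),  # empty for link posts
        "author": raw.get("author"),
        "score": raw.get("score", 0),
        "num_comments": raw.get("num_comments", 0),
        # created_utc is a Unix timestamp; store it as ISO 8601 in UTC
        "created": datetime.fromtimestamp(
            raw["created_utc"], tz=timezone.utc
        ).isoformat(),
        "url": "https://www.reddit.com" + raw.get("permalink", ""),
    }
```

Each element of `data["data"]["children"]` from a listing carries such an object under its `"data"` key.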

Best Practices

Prefer the official API when it meets your needs
Use old.reddit.com for HTML scraping; its markup is much simpler to parse
Rotate proxies with ScrapingAnt for large-scale jobs
Store data incrementally, because Reddit threads grow over time
Respect robots.txt and rate limits
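
One way to store data incrementally is to key each post by its Reddit ID and overwrite the row on every re-fetch, so that scores and comment counts stay current. The sketch below uses SQLite from the Python standard library; the table layout and the helper names `open_store` and `save_posts` are assumptions made for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with a posts table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id TEXT PRIMARY KEY, title TEXT, score INTEGER, "
        "num_comments INTEGER, fetched_at REAL)"
    )
    return conn


def save_posts(conn, posts, fetched_at):
    """Insert new posts and overwrite rows for posts seen before."""
    rows = [
        {
            "id": p["id"],
            "title": p["title"],
            "score": p["score"],
            "num_comments": p["num_comments"],
            "fetched_at": fetched_at,
        }
        for p in posts
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO posts "
            "(id, title, score, num_comments, fetched_at) "
            "VALUES (:id, :title, :score, :num_comments, :fetched_at)",
            rows,
        )
```

Calling `save_posts(conn, posts, time.time())` after each crawl keeps one row per post. If the history of a thread matters, an append-only table keyed on `(id, fetched_at)` would be the alternative design.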