How to Scrape Reddit Posts and Comments

Learn how to scrape Reddit posts, comments, and subreddit data using Python. Covers the official API, old.reddit.com, and third-party tools.

Reddit is a goldmine for sentiment analysis, market research, and trend monitoring. Here is how to extract Reddit data effectively.

Method 1: Reddit's Official API (via PRAW)

The simplest and most reliable approach uses Reddit's official API through the PRAW library.

import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="scraper:v1.0 (by /u/yourusername)"
)

subreddit = reddit.subreddit("webdev")
for post in subreddit.hot(limit=25):
    print(post.title, post.score, post.num_comments)

Pros: Official, stable, and well-documented. Cons: Rate-limited to 100 requests per minute, and API access policies have tightened significantly since 2023.
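The loop above lists posts only. For comments, PRAW exposes each submission's comment forest, and the sketch below flattens one thread into plain dictionaries. It assumes the `reddit` instance created earlier; the helper name `fetch_comments` and its `expand_limit` parameter are illustrative and not part of PRAW.

```python
def fetch_comments(reddit, submission_id, expand_limit=0):
    """Return every loaded comment in one thread as a flat list of dicts.

    `reddit` is an authenticated praw.Reddit instance. `fetch_comments` and
    `expand_limit` are illustrative names, not part of PRAW itself.
    """
    submission = reddit.submission(id=submission_id)
    # limit=0 drops the "load more comments" placeholders without fetching
    # them; pass None instead to expand all of them (one request each).
    submission.comments.replace_more(limit=expand_limit)

    rows = []
    for comment in submission.comments.list():  # flattened, all depths
        rows.append({
            "id": comment.id,
            "parent_id": comment.parent_id,  # "t3_..." = post, "t1_..." = comment
            # author is None when the account has been deleted
            "author": comment.author.name if comment.author else None,
            "score": comment.score,
            "body": comment.body,
        })
    return rows
```

Keeping `parent_id` in each row makes it possible to rebuild the reply tree later without another request.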

Method 2: JSON Endpoints

Reddit serves JSON when you append .json to most listing, post, and user URLs.

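import requests

url = "https://www.reddit.com/r/python/hot.json"
# Reddit's API rules ask for a unique, descriptive User-Agent
# rather than a spoofed browser string.
headers = {"User-Agent": "scraper:v1.0 (by /u/yourusername)"}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop on 429/403 instead of parsing an error page
data = response.json()

# Each child wraps one post; its fields live under the "data" key.
for post in data["data"]["children"]:
    print(post["data"]["title"])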

This method is simple, but unauthenticated requests are rate-limited more strictly than authenticated API calls, so large-scale collection may require proxy rotation.
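A single request returns only one page of a listing. Listing endpoints accept a `limit` parameter (up to 100) and an `after` cursor, which each response supplies under `data.after`. The sketch below pages through a subreddit that way. The function name `iter_posts`, the `max_pages` cap, and the two-second pause are illustrative choices, and `session` is assumed to be a `requests.Session` carrying a descriptive User-Agent header.

```python
import time


def iter_posts(session, subreddit, max_pages=3, pause=2.0):
    """Yield post dicts from /r/<subreddit>/hot.json, one page at a time.

    `session` is expected to behave like requests.Session. The function name,
    `max_pages`, and the default pause are illustrative choices.
    """
    url = f"https://www.reddit.com/r/{subreddit}/hot.json"
    after = None
    for page in range(max_pages):
        params = {"limit": 100}  # 100 is the largest page a listing returns
        if after:
            params["after"] = after
        response = session.get(url, params=params, timeout=10)
        response.raise_for_status()
        listing = response.json()["data"]

        for child in listing["children"]:
            yield child["data"]

        after = listing.get("after")  # None once the listing is exhausted
        if not after or page == max_pages - 1:
            break
        time.sleep(pause)  # pause between pages to stay under the rate limit
```

With a session created by `requests.Session()` and its `User-Agent` header set, `for post in iter_posts(session, "python"): print(post["title"])` would walk up to three pages.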

Method 3: Old Reddit + Scraping API

For bulk scraping without API limitations, combine old.reddit.com (which is lighter and easier to parse) with a scraping service like ScraperAPI.

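import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_SCRAPERAPI_KEY"
url = "https://old.reddit.com/r/datascience/"

# Pass the target via params so requests percent-encodes it.
params = {"api_key": API_KEY, "url": url}
resp = requests.get("https://api.scraperapi.com/", params=params, timeout=60)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# Assumes old Reddit's markup, where each post is a div.thing.
for link in soup.select("div.thing a.title"):
    print(link.get_text(strip=True))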
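Because the target URL travels inside another URL's query string, any `?` or `&` it contains (as in a paginated listing URL) needs percent-encoding, or the scraping service may receive a truncated target. A small standard-library sketch of that step follows. `build_proxy_url` is a hypothetical helper, and the endpoint is the ScraperAPI address from the snippet above, written here with HTTPS.

```python
from urllib.parse import urlencode

# ScraperAPI endpoint as used in the article's snippet, here over HTTPS.
SCRAPER_ENDPOINT = "https://api.scraperapi.com/"


def build_proxy_url(api_key, target_url):
    """Return a request URL with the target safely percent-encoded."""
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{SCRAPER_ENDPOINT}?{query}"


# A paginated old.reddit.com URL carries its own query string.
target = "https://old.reddit.com/r/datascience/?count=25&after=t3_abc123"
print(build_proxy_url("YOUR_SCRAPERAPI_KEY", target))
```

In the printed URL the only literal `&` is the one separating `api_key` from `url`; the target's own separators appear as `%3F`, `%3D`, and `%26`.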

What Data to Extract

Data Point              Source
Post title and body     Post page or API
Comments and threads    Comment API endpoint
Upvotes and scores      JSON data
User profiles           Profile pages
Subreddit metadata      About page
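The post-level rows of this table map naturally onto a flat record. As a sketch, the dataclass below pulls a handful of fields from one post object as returned by the JSON endpoints. The class name `PostRecord` and the choice of fields are assumptions made for illustration, and optional fields are read with `.get` in case a response omits them.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PostRecord:
    """Illustrative flat record; the class and its field selection are assumptions."""
    id: str
    title: str
    body: str
    author: str
    score: int
    num_comments: int
    created: datetime
    url: str

    @classmethod
    def from_json(cls, post):
        """Build a record from one child's "data" dict in a listing response."""
        return cls(
            id=post["id"],
            title=post["title"],
            body=post.get("selftext", ""),  # empty for link posts
            author=post.get("author") or "[deleted]",
            score=post.get("score", 0),
            num_comments=post.get("num_comments", 0),
            # created_utc is a Unix timestamp in seconds
            created=datetime.fromtimestamp(post["created_utc"], tz=timezone.utc),
            url="https://www.reddit.com" + post["permalink"],
        )
```

Normalizing early like this keeps the storage and analysis steps independent of which collection method produced the data.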

Best Practices

  1. Prefer the official API when it meets your needs
  2. Use old.reddit.com for HTML scraping; it is much simpler to parse
  3. Rotate proxies with ScrapingAnt for large-scale jobs
  4. Store data incrementally, since Reddit threads grow over time
  5. Respect robots.txt and rate limits
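Storing incrementally (item 4) implies that a repeat crawl should update existing rows rather than duplicate them. One way to sketch this uses the standard-library `sqlite3` module with the post ID as the primary key. The table layout and the function names `open_store` and `save_posts` are hypothetical.

```python
import sqlite3
import time


def open_store(path="reddit.db"):
    """Open (or create) a small SQLite store keyed on the post ID."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               id TEXT PRIMARY KEY,
               title TEXT,
               score INTEGER,
               num_comments INTEGER,
               fetched_at REAL
           )"""
    )
    return conn


def save_posts(conn, posts):
    """Insert new posts and overwrite rows for posts seen on earlier runs."""
    rows = [
        (p["id"], p["title"], p["score"], p["num_comments"], time.time())
        for p in posts
    ]
    # INSERT OR REPLACE swaps in the fresh row when the ID already exists,
    # so scores and comment counts track the thread as it grows.
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?)", rows)
```

Running the scraper on a schedule and passing each batch to `save_posts` would then keep one current row per post.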