Scraping Social Media APIs

Learn techniques for extracting data from social media platforms using their official APIs and alternative approaches.

Social media platforms are rich data sources for market research, sentiment analysis, and trend monitoring. Each platform has different API access levels and restrictions.

Reddit API (Most Scraper-Friendly)

Reddit provides generous free API access. Use the .json suffix or the official API:

import requests

headers = {"User-Agent": "ScrapingCentral/1.0 (educational)"}

# Append .json to any Reddit URL
url = "https://www.reddit.com/r/python/hot.json"
params = {"limit": 10}

response = requests.get(url, headers=headers, params=params, timeout=15)
response.raise_for_status()

posts = response.json()["data"]["children"]
for post in posts:
    p = post["data"]
    print(f"[{p['score']:>5} pts] {p['title'][:60]}")
    print(f"          r/{p['subreddit']} | {p['num_comments']} comments")
    print()
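
A single request returns only one page of a listing. Reddit listings carry an "after" cursor in their "data" object, which can be passed back as a query parameter to request the next page. The sketch below shows one way to follow that cursor. The helper name iter_listing and the fetch_page callback are illustrative, not part of any Reddit library, and the page cap is an arbitrary safety limit.

```python
from typing import Callable, Iterator, Optional


def iter_listing(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 3,
) -> Iterator[dict]:
    """Yield post dicts from successive Reddit listing pages.

    fetch_page(after) is a caller-supplied function (hypothetical name)
    that returns the parsed JSON of one listing page.
    """
    after = None
    for _ in range(max_pages):
        listing = fetch_page(after)["data"]
        for child in listing["children"]:
            yield child["data"]
        after = listing.get("after")
        if not after:  # no cursor means the listing is exhausted
            break


# Possible usage with the request from the example above (network call):
# def fetch_page(after):
#     params = {"limit": 100, "after": after}  # requests drops None values
#     r = requests.get(url, headers=headers, params=params, timeout=15)
#     r.raise_for_status()
#     return r.json()
#
# for p in iter_listing(fetch_page):
#     print(p["title"])
```

Passing the fetch function in as an argument keeps the cursor logic separate from the HTTP call, so it can be exercised with canned pages before touching the live site.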

Reddit with PRAW (Python Reddit API Wrapper)

import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="ScrapingCentral/1.0",
)

subreddit = reddit.subreddit("webdev")
for post in subreddit.hot(limit=5):
    print(f"[{post.score}] {post.title}")
    # Access comments
    post.comments.replace_more(limit=0)
    for comment in post.comments[:3]:
        print(f"  -> {comment.body[:80]}")

Install PRAW before running the example:

pip install praw
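
Hardcoding client_id and client_secret is fine for a quick test, but it makes it easy to leak credentials into version control. A minimal sketch of one alternative is to read them from environment variables. The variable names REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, and REDDIT_USER_AGENT and the helper reddit_credentials are assumptions made for this example, not names that PRAW defines.

```python
import os


def reddit_credentials(env=None) -> dict:
    """Build keyword arguments for praw.Reddit from environment variables.

    The variable names below are illustrative choices, not a PRAW convention.
    """
    env = os.environ if env is None else env
    required = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env.get("REDDIT_USER_AGENT", "ScrapingCentral/1.0"),
    }


# Possible usage (requires praw and real credentials):
# reddit = praw.Reddit(**reddit_credentials())
```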

Twitter/X API v2

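The X (formerly Twitter) API v2 requires a developer account and a bearer token. The free tier is heavily restricted, so confirm that your access level includes the recent search endpoint before relying on this example: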

import requests

bearer_token = "YOUR_BEARER_TOKEN"
headers = {"Authorization": f"Bearer {bearer_token}"}

# Search recent tweets
url = "https://api.twitter.com/2/tweets/search/recent"
params = {
    "query": "web scraping python -is:retweet lang:en",
    "max_results": 10,
    "tweet.fields": "created_at,public_metrics",
}

response = requests.get(url, headers=headers, params=params, timeout=15)
response.raise_for_status()

for tweet in response.json().get("data", []):
    metrics = tweet["public_metrics"]
    print(f"[{metrics['like_count']} likes] {tweet['text'][:80]}...")
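
With quotas this tight, the request above can fail with HTTP 429 once the limit is used up, and raise_for_status() will turn that into an exception. The sketch below computes how long to wait before retrying, assuming the response carries an x-rate-limit-reset header holding a Unix timestamp. The helper name seconds_until_reset and the 60-second fallback are choices made for this example.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how many seconds to wait before retrying after an HTTP 429."""
    now = time.time() if now is None else now
    reset = headers.get("x-rate-limit-reset")
    if reset is None:
        return 60.0  # assumed fallback when the header is absent
    # The header is expected to hold the reset time as epoch seconds.
    return max(0.0, float(reset) - now)


# Possible usage with the request above (network call):
# response = requests.get(url, headers=headers, params=params, timeout=15)
# if response.status_code == 429:
#     time.sleep(seconds_until_reset(response.headers))
```

Check the status code before calling raise_for_status() so that the 429 can be handled instead of raised.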

YouTube Data API

import requests

api_key = "YOUR_YOUTUBE_API_KEY"
url = "https://www.googleapis.com/youtube/v3/search"
params = {
    "part": "snippet",
    "q": "python web scraping tutorial",
    "type": "video",
    "maxResults": 5,
    "key": api_key,
    "order": "viewCount",
}

response = requests.get(url, params=params, timeout=15)
response.raise_for_status()  # fail loudly on quota or key errors
videos = response.json().get("items", [])

for video in videos:
    title = video["snippet"]["title"]
    video_id = video["id"]["videoId"]
    print(f"{title}")
    print(f"  https://youtube.com/watch?v={video_id}\n")
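
For anything beyond printing, it helps to flatten the search response into plain records first. The sketch below does that under two assumptions: that titles in the snippet may contain HTML entities such as &amp;amp; and so benefit from unescaping, and that items without a videoId should be skipped. The helper name extract_videos is illustrative.

```python
import html


def extract_videos(payload: dict) -> list[dict]:
    """Flatten a search response into a list of title/URL records."""
    records = []
    for item in payload.get("items", []):
        video_id = item.get("id", {}).get("videoId")
        if not video_id:  # skip results that are not videos
            continue
        records.append({
            # Assumption: titles may arrive HTML-escaped, so decode entities.
            "title": html.unescape(item["snippet"]["title"]),
            "url": f"https://youtube.com/watch?v={video_id}",
        })
    return records


# Possible usage with the response from the example above:
# for record in extract_videos(response.json()):
#     print(record["title"], record["url"])
```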

API Access Comparison

| Platform  | Free Tier        | Rate Limits                | Auth Required           |
|-----------|------------------|----------------------------|-------------------------|
| Reddit    | Generous         | 60 req/min                 | User-Agent only (basic) |
| Twitter/X | Very limited     | 100 tweets/month (free)    | OAuth 2.0 Bearer        |
| YouTube   | 10,000 units/day | Per-endpoint quotas        | API Key                 |
| GitHub    | Generous         | 60/hr unauth, 5000/hr auth | Optional (token)        |
| LinkedIn  | Restricted       | Varies by product          | OAuth 2.0               |
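
YouTube's quota is counted in units rather than requests, so the daily figure is easier to reason about with a little arithmetic. The sketch below assumes a search request costs 100 units, which is the commonly documented cost; verify it against the current quota documentation before budgeting. The constant and function names are illustrative.

```python
DAILY_QUOTA_UNITS = 10_000  # daily allocation listed in the comparison table
SEARCH_COST_UNITS = 100     # assumed cost of one search request


def searches_per_day(reserved_units: int = 0) -> int:
    """Return how many search requests fit in the daily quota.

    reserved_units is quota set aside for other endpoints.
    """
    available = max(0, DAILY_QUOTA_UNITS - reserved_units)
    return available // SEARCH_COST_UNITS


print(searches_per_day())      # 100
print(searches_per_day(2500))  # 75
```

Under these assumptions, a script that searches once a minute would exhaust the quota in under two hours, so caching results is usually worth the effort.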

Ethical Considerations

  • Respect rate limits: social platforms actively ban scrapers that exceed them.
  • Check the Terms of Service: some platforms prohibit scraping outright.
  • Avoid personal data: handle user information with care, since GDPR and CCPA may apply.
  • Use official APIs when available instead of scraping the frontend.
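
Respecting a rate limit is easiest when the client enforces it itself. The sketch below is one simple approach: a throttle that spaces calls evenly, so a limit of 60 requests per minute becomes at most one call per second. The Throttle class is written for this example, and the injectable clock and sleep arguments exist only to make it testable.

```python
import time


class Throttle:
    """Space out calls so they never exceed max_per_minute."""

    def __init__(self, max_per_minute: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # the first call is never delayed

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


# Possible usage before each request (network call):
# throttle = Throttle(max_per_minute=60)
# for url in urls:
#     throttle.wait()
#     requests.get(url, headers=headers, timeout=15)
```

Even spacing is more conservative than a burst-then-wait strategy, which suits platforms that watch for sudden spikes in traffic.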

For social media sites that block direct API access, ScrapingAnt provides headless browser rendering that can load JavaScript-heavy social feeds.

Next Steps

  • Build a data pipeline for continuous social media monitoring
  • Compare API scraping vs HTML scraping approaches
  • Process and clean social media data with pandas