Scraping Social Media APIs
Learn techniques for extracting data from social media platforms using their official APIs and alternative approaches.
Social media platforms are rich data sources for market research, sentiment analysis, and trend monitoring. Each platform has different API access levels and restrictions.
Reddit API (Most Scraper-Friendly)
Reddit provides generous free API access. Use the .json suffix or the official API:
import requests
headers = {"User-Agent": "ScrapingCentral/1.0 (educational)"}
# Append .json to any Reddit URL
url = "https://www.reddit.com/r/python/hot.json"
params = {"limit": 10}
response = requests.get(url, headers=headers, params=params, timeout=15)
response.raise_for_status()
posts = response.json()["data"]["children"]
for post in posts:
    p = post["data"]
    print(f"[{p['score']:>5} pts] {p['title'][:60]}")
    print(f"    r/{p['subreddit']} | {p['num_comments']} comments")
    print()
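Listings are paginated: each response carries a cursor in `data.after` that can be sent back as the `after` query parameter to request the next page. The following is a minimal sketch of that loop, assuming the same `.json` endpoint as above. The names `fetch_page` and `iter_posts` are hypothetical helpers, not part of any Reddit library, and the fetcher is passed in as an argument so the cursor logic can be exercised without network access.

```python
import time

HEADERS = {"User-Agent": "ScrapingCentral/1.0 (educational)"}


def fetch_page(url, params):
    """Fetch one listing page and return the parsed JSON (hypothetical helper)."""
    import requests  # imported lazily so iter_posts can be tested offline

    response = requests.get(url, headers=HEADERS, params=params, timeout=15)
    response.raise_for_status()
    return response.json()


def iter_posts(url, max_pages=3, page_size=25, delay=1.0, fetch=fetch_page):
    """Yield post dicts across pages by following the listing's `after` cursor."""
    after = None
    for _ in range(max_pages):
        params = {"limit": page_size}
        if after:
            params["after"] = after
        listing = fetch(url, params)["data"]
        for child in listing["children"]:
            yield child["data"]
        after = listing.get("after")
        if not after:  # no cursor means the listing is exhausted
            break
        time.sleep(delay)  # pause between pages to stay under the rate limit
```

Calling `iter_posts("https://www.reddit.com/r/python/hot.json")` would then yield the same post dictionaries as the loop above, up to `max_pages` pages.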
Reddit with PRAW (Official Wrapper)
import praw
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="ScrapingCentral/1.0",
)
subreddit = reddit.subreddit("webdev")
for post in subreddit.hot(limit=5):
    print(f"[{post.score}] {post.title}")
    # Access comments
    post.comments.replace_more(limit=0)
    for comment in post.comments[:3]:
        print(f"    -> {comment.body[:80]}")
pip install praw
Twitter/X API v2
X (formerly Twitter) requires a developer account for all API access. The free tier is very limited, so confirm that your access level covers the endpoint you need:
import requests
bearer_token = "YOUR_BEARER_TOKEN"
headers = {"Authorization": f"Bearer {bearer_token}"}
# Search recent tweets
url = "https://api.twitter.com/2/tweets/search/recent"
params = {
    "query": "web scraping python -is:retweet lang:en",
    "max_results": 10,
    "tweet.fields": "created_at,public_metrics",
}
response = requests.get(url, headers=headers, params=params, timeout=15)
response.raise_for_status()
for tweet in response.json().get("data", []):
    metrics = tweet["public_metrics"]
    print(f"[{metrics['like_count']} likes] {tweet['text'][:80]}...")
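Because the quota is tight, a request can come back with HTTP 429 instead of data. The API reports the end of the current rate-limit window in the `x-rate-limit-reset` response header as a Unix timestamp. Below is a rough sketch of waiting for that reset before retrying. The function names and the `max_attempts` default are illustrative assumptions, not part of any official client.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before retrying, based on x-rate-limit-reset.

    `headers` is any mapping of response headers; the reset value is a Unix
    timestamp in seconds. Returns 0.0 when the header is absent or in the past.
    """
    now = time.time() if now is None else now
    reset = headers.get("x-rate-limit-reset")
    if reset is None:
        return 0.0
    return max(0.0, float(reset) - now)


def get_with_rate_limit(url, headers, params, max_attempts=3):
    """GET an endpoint, sleeping until the window resets on HTTP 429."""
    import requests  # imported lazily so the helper above has no dependencies

    for _ in range(max_attempts):
        response = requests.get(url, headers=headers, params=params, timeout=15)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Add one second of slack so the retry lands after the reset.
        time.sleep(seconds_until_reset(response.headers) + 1)
    raise RuntimeError("still rate limited after retries")
```

`get_with_rate_limit` takes the same `url`, `headers`, and `params` as the search request above and returns the parsed JSON body.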
YouTube Data API
import requests
api_key = "YOUR_YOUTUBE_API_KEY"
url = "https://www.googleapis.com/youtube/v3/search"
params = {
    "part": "snippet",
    "q": "python web scraping tutorial",
    "type": "video",
    "maxResults": 5,
    "key": api_key,
    "order": "viewCount",
}
response = requests.get(url, params=params, timeout=15)
response.raise_for_status()  # fail loudly on quota or key errors
videos = response.json().get("items", [])
for video in videos:
    title = video["snippet"]["title"]
    video_id = video["id"]["videoId"]
    print(title)
    print(f"    https://youtube.com/watch?v={video_id}\n")
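The search results are ordered by view count but do not include the counts themselves; those come from the `videos` endpoint with `part=statistics`. The sketch below shows one way to join the two, assuming the response shapes used above. The helper names `video_ids`, `view_counts`, and `fetch_view_counts` are hypothetical.

```python
def video_ids(search_items):
    """Collect video IDs from search items, skipping non-video results."""
    return [
        item["id"]["videoId"]
        for item in search_items
        if "videoId" in item.get("id", {})
    ]


def view_counts(video_items):
    """Map video ID to view count (the API returns counts as strings)."""
    return {
        item["id"]: int(item["statistics"].get("viewCount", 0))
        for item in video_items
    }


def fetch_view_counts(ids, api_key):
    """Look up statistics for several videos in a single request."""
    import requests  # imported lazily so the parsers above run offline

    url = "https://www.googleapis.com/youtube/v3/videos"
    params = {"part": "statistics", "id": ",".join(ids), "key": api_key}
    response = requests.get(url, params=params, timeout=15)
    response.raise_for_status()
    return view_counts(response.json().get("items", []))
```

With the `videos` list from the search above, `fetch_view_counts(video_ids(videos), api_key)` would return a dictionary keyed by video ID.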
API Access Comparison
| Platform | Free Tier | Rate Limits | Auth Required |
|---|---|---|---|
| Reddit | Generous | 60 req/min | User-Agent only (basic) |
| Twitter/X | Very limited | 100 tweets/month (free) | OAuth 2.0 Bearer |
| YouTube | 10,000 units/day | Per-endpoint quotas | API Key |
| GitHub | Generous | 60/hr unauth, 5000/hr auth | Optional (token) |
|  | Restricted | Varies by product | OAuth 2.0 |
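The limits in the table translate directly into request budgets. As a rough worked example, the sketch below converts a rate limit into a minimum spacing between requests and a daily unit quota into a number of calls. The 100-unit cost of a YouTube search request is taken from the YouTube quota documentation rather than from the table; the function names are illustrative.

```python
def min_interval(requests_allowed, window_seconds):
    """Seconds to leave between requests to stay within a rate limit."""
    return window_seconds / requests_allowed


def calls_per_day(daily_units, units_per_call):
    """Whole number of calls that fit into a daily unit quota."""
    return daily_units // units_per_call


# Figures from the table above
print(min_interval(60, 60))        # Reddit: 60 req/min
print(min_interval(60, 3600))      # GitHub, unauthenticated: 60 req/hr
print(min_interval(5000, 3600))    # GitHub, authenticated: 5000 req/hr
print(calls_per_day(10_000, 100))  # YouTube searches at 100 units each
```

Under these assumptions, the YouTube quota covers about 100 search requests per day, which is why results are usually cached rather than fetched again.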
Ethical Considerations
- Respect rate limits: social platforms actively ban scrapers.
- Check the Terms of Service: some platforms prohibit scraping.
- Avoid personal data: be cautious with user information (GDPR, CCPA).
- Use official APIs when available instead of scraping the frontend.
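Respecting a rate limit in practice usually means spacing requests out on the client side. The following is a minimal sketch of a throttle that enforces a fixed gap between calls. `Throttle` is a hypothetical helper, and the clock and sleep functions are injectable only so the behaviour can be checked without real waiting.

```python
import time


class Throttle:
    """Block so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A typical use is to create one `Throttle(1.0)` per platform and call `wait()` immediately before each request.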
For social media sites that block direct API access, ScrapingAnt provides headless browser rendering that can load JavaScript-heavy social feeds.
Next Steps
- Build a data pipeline for continuous social media monitoring
- Compare API scraping vs HTML scraping approaches
- Process and clean social media data with pandas