Guide
How to Scrape LinkedIn Profiles (Legally)
A guide to scraping LinkedIn data legally and ethically, covering API options, public data access, and the legal landscape around LinkedIn scraping.
LinkedIn scraping is one of the most requested yet legally complex scraping tasks. The Ninth Circuit's 2022 decision in hiQ Labs v. LinkedIn (issued after the Supreme Court vacated its earlier ruling and sent the case back) and subsequent rulings have clarified some boundaries, but caution is still essential.
Legal Status
The legal landscape around LinkedIn scraping has evolved:
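- Scraping publicly accessible profiles likely does not count as access "without authorization" under the CFAA (hiQ Labs v. LinkedIn), but that ruling does not protect you from other claims
- Data behind a login is treated differently, and scraping it may violate the CFAA
- LinkedIn's User Agreement prohibits automated access, which creates civil liability risk (the district court in hiQ later found that hiQ had breached it)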
- GDPR and privacy laws apply to personal data in the EU and other jurisdictions
Always consult legal counsel before scraping LinkedIn at scale.
Method 1: LinkedIn's Official API
The safest approach is LinkedIn's official API, though it has limited data access:
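import requests
# LinkedIn API requires OAuth 2.0 authentication
headers = {
    "Authorization": "Bearer YOUR_ACCESS_TOKEN",
    "Content-Type": "application/json",
}
# Get your own profile; the fields returned depend on the scopes your token was granted.
# Depending on the products enabled for your app, the OpenID Connect endpoint /v2/userinfo may be the one available instead.
response = requests.get(
    "https://api.linkedin.com/v2/me",
    headers=headers,
    timeout=30,
)
response.raise_for_status()  # fail loudly on an expired token or missing scope
print(response.json())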
LinkedIn's API returns only limited profile data, and only for members who explicitly authorize your app through OAuth. That makes it suitable for applications where users share their own data, not for collecting other people's profiles.
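The snippet above assumes you already hold an access token. Under OAuth 2.0's authorization code flow, you get one by sending the user to LinkedIn's authorization page and then exchanging the returned code for a token. The sketch below uses only the standard library; the client ID, client secret, and redirect URI come from your LinkedIn developer app, and the scopes shown (`openid`, `profile`, `email`) are an assumption, since which scopes you can request depends on the products enabled for your app.

```python
import json
import secrets
import urllib.parse
import urllib.request

# LinkedIn's OAuth 2.0 endpoints for the authorization code flow.
AUTH_URL = "https://www.linkedin.com/oauth/v2/authorization"
TOKEN_URL = "https://www.linkedin.com/oauth/v2/accessToken"


def build_authorization_url(client_id: str, redirect_uri: str, scopes: list[str]) -> tuple[str, str]:
    """Return (url, state). Send the user to url; keep state to verify the callback."""
    state = secrets.token_urlsafe(16)  # random value that guards against CSRF
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
        "scope": " ".join(scopes),  # scopes are space-separated
    }
    # quote (not quote_plus) encodes spaces as %20 and slashes as %2F
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{AUTH_URL}?{query}", state


def exchange_code_for_token(code: str, client_id: str, client_secret: str, redirect_uri: str) -> dict:
    """Trade the ?code=... from the redirect for an access token (makes a network call)."""
    body = urllib.parse.urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    request = urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)  # expected to include "access_token"


if __name__ == "__main__":
    url, state = build_authorization_url(
        "YOUR_CLIENT_ID", "https://example.com/callback", ["openid", "profile", "email"]
    )
    print(url)
```

Before calling `exchange_code_for_token`, check that the `state` parameter on the callback matches the one you generated; a mismatch means the request did not come from your authorization link.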
Method 2: Public Profile Scraping
For public profile data, you can scrape the publicly visible HTML using a scraping API:
import requests
from bs4 import BeautifulSoup
# Use ScraperAPI with rendering for LinkedIn's JS-heavy pages
response = requests.get(
    "https://api.scraperapi.com",
    params={
        "api_key": "YOUR_SCRAPERAPI_KEY",
        "url": "https://www.linkedin.com/in/public-profile-slug",
        "render": "true",
        "country_code": "us",
    },
    timeout=60,  # rendered requests can be slow to return
)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")
# LinkedIn changes its markup often; treat these selectors as a starting point
name = soup.select_one("h1")
headline = soup.select_one(".text-body-medium")
if name:
    print(f"Name: {name.text.strip()}")
if headline:
    print(f"Headline: {headline.text.strip()}")
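Rendered requests through a scraping API are slower than plain ones and can fail intermittently. Rather than retrying immediately, a common pattern is exponential backoff: wait longer after each failure and give up after a few attempts. This is a minimal sketch; `fetch_with_retries` and `TransientError` are hypothetical names, and which failures count as transient (for example timeouts or 5xx responses) is an assumption you should adjust.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Raised by a fetch function for failures worth retrying (e.g. timeouts, 5xx)."""


def fetch_with_retries(
    fetch: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until it succeeds, waiting base_delay * 2**n seconds between attempts."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ... with the defaults
    raise ValueError("max_attempts must be at least 1")
```

To use it with the request above, wrap the `requests.get(...)` call in a zero-argument function that raises `TransientError` on `requests.Timeout` or a 5xx status and returns `response.text` otherwise. Keeping the attempt count low also limits the load you put on the target site.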
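Method 3: Google Search Approach
Find public LinkedIn profiles through Google's search results, so your requests go to Google rather than to LinkedIn directly. This returns result titles and URLs, not full profile data: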
import requests
# Search Google for LinkedIn profiles via ScraperAPI's structured Google Search endpoint
response = requests.get(
    "https://api.scraperapi.com/structured/google/search",
    params={
        "api_key": "YOUR_SCRAPERAPI_KEY",
        "query": 'site:linkedin.com/in/ "software engineer" "San Francisco"',
        "num": "10",
    },
    timeout=60,
)
response.raise_for_status()
results = response.json()
for result in results.get("organic_results", []):
    print(f"{result.get('title')} - {result.get('link')}")
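Search results often include company pages, posts, and the same profile under regional subdomains. The following sketch, using only the standard library, filters the `organic_results` returned above down to unique `/in/` profile links; the `profile_links` helper is hypothetical, and treating any `*.linkedin.com` host as LinkedIn is an assumption.

```python
from urllib.parse import urlparse


def profile_links(results: dict) -> list[dict]:
    """Return unique /in/ profile links with their result titles, in ranking order."""
    seen = set()
    profiles = []
    for result in results.get("organic_results", []):
        link = result.get("link", "")
        parsed = urlparse(link)
        host = parsed.netloc.lower()
        # Accept linkedin.com and regional subdomains such as uk.linkedin.com
        if host != "linkedin.com" and not host.endswith(".linkedin.com"):
            continue
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) < 2 or parts[0] != "in":
            continue  # skip company pages, posts, and other non-profile URLs
        slug = parts[1]
        if slug in seen:
            continue  # same profile already seen (e.g. via another subdomain)
        seen.add(slug)
        profiles.append({"slug": slug, "title": result.get("title", ""), "link": link})
    return profiles
```

Calling `profile_links(results)` after the search keeps each profile once, which also reduces the amount of personal data you end up storing.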
Best Practices
- Only scrape public data visible without logging in
- Respect robots.txt and rate limits
- Do not store personal data without a legitimate purpose
- Comply with GDPR if processing EU residents' data
- Use a scraping API like ScraperAPI or ScrapingAnt to handle anti-bot protections
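Respecting robots.txt and rate limits can be partly automated. The sketch below uses Python's `urllib.robotparser` to check URLs against a site's robots.txt rules and a small fixed-interval throttle to space out requests. `make_robot_checker` and `RateLimiter` are hypothetical helpers, and the interval you pass is an assumption to set conservatively.

```python
import time
import urllib.robotparser
from typing import Callable


def make_robot_checker(robots_lines: list[str], user_agent: str) -> Callable[[str], bool]:
    """Return a function that says whether user_agent may fetch a URL under these rules."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return lambda url: parser.can_fetch(user_agent, url)


class RateLimiter:
    """Enforce a minimum interval between requests (a simple fixed-delay throttle)."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the limiter can be tested without real waits
        self.sleep = sleep
        self._last = None

    def wait(self) -> None:
        """Block until at least min_interval seconds have passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

To load a live file instead of a list of lines, create a `RobotFileParser`, call `set_url("https://www.linkedin.com/robots.txt")`, then `read()`. Call `limiter.wait()` immediately before each request.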
Verdict
LinkedIn scraping should be approached with caution. Use the official API when possible, limit scraping to public data, and always consult legal counsel. When you do scrape, ScraperAPI's rendering and proxy capabilities can make access to public profiles more reliable.