How to Scrape LinkedIn Profiles (Legally)

A guide to scraping LinkedIn data legally and ethically, covering API options, public data access, and the legal landscape around LinkedIn scraping.

LinkedIn scraping is one of the most requested yet legally complex scraping tasks. The Ninth Circuit's 2022 decision in hiQ Labs v. LinkedIn and subsequent rulings have clarified some boundaries, but caution is still essential.

Legal Status

The legal landscape around LinkedIn scraping has evolved:

Scraping publicly accessible profiles likely does not violate the CFAA (Ninth Circuit, hiQ Labs v. LinkedIn)
Private/logged-in data is protected, and scraping it may violate the CFAA
LinkedIn's Terms of Service prohibit automated access, so even public scraping carries breach-of-contract risk
GDPR and privacy laws apply to personal data in the EU and other jurisdictions

Always consult legal counsel before scraping LinkedIn at scale.

Method 1: LinkedIn's Official API

The safest approach is LinkedIn's official API, though it has limited data access:

import requests

# LinkedIn's API requires an OAuth 2.0 access token
headers = {
    "Authorization": "Bearer YOUR_ACCESS_TOKEN",
    "Content-Type": "application/json"
}

# Get the profile of the member who authorized the token
response = requests.get(
    "https://api.linkedin.com/v2/me",
    headers=headers,
    timeout=30
)
response.raise_for_status()  # Stop on 401/403 instead of printing an error body
print(response.json())

LinkedIn's API only provides limited profile data and requires explicit user consent via OAuth. This is suitable for applications where users share their own data.
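
The access token used above comes from LinkedIn's OAuth 2.0 authorization code flow. The following is a minimal sketch of the two requests involved, built without sending anything over the network. The function names are illustrative, and the scopes in the usage example are an assumption: which scopes your app may request depends on the products enabled for it in the LinkedIn developer portal.

```python
import secrets
from urllib.parse import urlencode

AUTH_URL = "https://www.linkedin.com/oauth/v2/authorization"
TOKEN_URL = "https://www.linkedin.com/oauth/v2/accessToken"


def build_authorization_url(client_id, redirect_uri, scopes, state=None):
    """Return (url, state) for the consent page the user is redirected to."""
    # A random state value lets you reject forged callbacks (CSRF protection).
    state = state or secrets.token_urlsafe(16)
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
        "scope": " ".join(scopes),
    }
    return f"{AUTH_URL}?{urlencode(params)}", state


def build_token_request(code, client_id, client_secret, redirect_uri):
    """Return (url, form_data) for exchanging the callback code for a token."""
    form_data = {
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    }
    return TOKEN_URL, form_data


# Usage (scopes are an assumption; use the ones granted to your app)
url, state = build_authorization_url(
    "YOUR_CLIENT_ID", "https://example.com/callback", ["openid", "profile"]
)
print(url)
```

After the user approves access, LinkedIn redirects to the redirect URI with a code and the state value. Check that the state matches, then send the form data as a POST, for example with requests.post(token_url, data=form_data, timeout=30), and read access_token from the JSON response.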

Method 2: Public Profile Scraping

For public profile data, you can scrape the publicly visible HTML using a scraping API:

import requests
from bs4 import BeautifulSoup

# Use ScraperAPI with rendering for LinkedIn's JS-heavy pages
response = requests.get("https://api.scraperapi.com", params={
    "api_key": "YOUR_SCRAPERAPI_KEY",
    "url": "https://www.linkedin.com/in/public-profile-slug",
    "render": "true",
    "country_code": "us"
}, timeout=90)
response.raise_for_status()  # Do not parse an error page as a profile

soup = BeautifulSoup(response.text, "html.parser")

# These selectors depend on LinkedIn's current markup and may need updating
name = soup.select_one("h1")
headline = soup.select_one(".text-body-medium")

if name:
    print(f"Name: {name.get_text(strip=True)}")
if headline:
    print(f"Headline: {headline.get_text(strip=True)}")

Method 3: Google Search Approach

Use Google's search index to discover public LinkedIn profile URLs without hitting LinkedIn directly:

import requests

# Search Google for LinkedIn profiles
response = requests.get("https://api.scraperapi.com/structured/google/search", params={
    "api_key": "YOUR_SCRAPERAPI_KEY",
    "query": 'site:linkedin.com/in/ "software engineer" "San Francisco"',
    "num": "10"
}, timeout=90)
response.raise_for_status()

results = response.json()
for result in results.get("organic_results", []):
    # Use .get() so a result missing a field does not raise KeyError
    print(f"{result.get('title', '')} - {result.get('link', '')}")
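
Search results can include company pages, tracking parameters, and the same profile under several country subdomains. The helper below is a minimal sketch of filtering the link values down to unique profile URLs. The function name is illustrative, and rewriting every subdomain to www.linkedin.com is an assumption about how you want to canonicalize the URLs.

```python
from urllib.parse import urlsplit


def extract_profile_urls(results):
    """Return de-duplicated linkedin.com/in/ URLs from a list of result dicts."""
    seen = set()
    profiles = []
    for result in results:
        parts = urlsplit(result.get("link", ""))
        host = parts.netloc.lower()
        # Accept linkedin.com and its subdomains, reject look-alike domains.
        if host != "linkedin.com" and not host.endswith(".linkedin.com"):
            continue
        segments = [s for s in parts.path.split("/") if s]
        # Profile paths look like /in/<slug>; skip /company/, /jobs/, etc.
        if len(segments) < 2 or segments[0] != "in":
            continue
        # Dropping the query string and fragment removes tracking parameters.
        canonical = f"https://www.linkedin.com/in/{segments[1]}"
        if canonical not in seen:
            seen.add(canonical)
            profiles.append(canonical)
    return profiles


# Usage with hand-written sample data (not real search output)
sample = [
    {"title": "Jane Doe", "link": "https://www.linkedin.com/in/jane-doe?trk=abc"},
    {"title": "Jane Doe", "link": "https://uk.linkedin.com/in/jane-doe/"},
    {"title": "Acme", "link": "https://www.linkedin.com/company/acme"},
]
print(extract_profile_urls(sample))
```

In the search script, the same function would be called as extract_profile_urls(results.get("organic_results", [])).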

Best Practices

Only scrape public data visible without logging in
Respect robots.txt and rate limits
Do not store personal data without a legitimate purpose
Comply with GDPR if processing EU residents' data
Use a scraping API like ScraperAPI or ScrapingAnt to handle anti-bot protections
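
The robots.txt and rate-limit points can be enforced in code rather than left to discipline. This is a minimal sketch using only the standard library. The names is_allowed and RateLimiter are illustrative, and the two-second interval in the usage note is an arbitrary assumption, not a documented limit for any site.

```python
import time
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Check a URL against the text of an already downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage with a hand-written robots.txt (not any real site's file)
rules = "User-agent: *\nDisallow: /private/\n"
print(is_allowed(rules, "my-bot", "https://example.com/private/page"))
print(is_allowed(rules, "my-bot", "https://example.com/public/page"))
```

In a scraping loop, create one limiter such as RateLimiter(2.0), call limiter.wait() before each request, and skip any URL for which is_allowed returns False.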

Verdict

LinkedIn scraping should be approached with caution. Use the official API when possible, limit scraping to public data, and always consult legal counsel. When you do scrape, ScraperAPI's rendering and proxy capabilities help ensure reliable access to public profiles.