The Legal Landscape: hiQ v. LinkedIn, CFAA, GDPR
The landmark cases and statutes that shape which scraping is and isn't legally OK in 2026. Not legal advice; a working compass.
What you’ll learn
- Summarize the hiQ v. LinkedIn ruling and its limits.
- Explain how Van Buren v. United States narrowed the CFAA.
- Recognize when GDPR makes a scrape risky.
This is not legal advice. It is an engineer's summary of the legal landscape for context. For any project with real exposure, consult a lawyer who practices in your jurisdiction.
The legal status of scraping in 2026 is clearer than it was a decade ago, but it's still layered, jurisdiction-dependent, and decided case by case. A scraping engineer should know the headline cases and frameworks well enough to spot trouble before code is written.
The four legal frameworks
Scraping touches four distinct legal areas, sometimes simultaneously:
| Framework | Roughly governs | Where |
|---|---|---|
| Computer-misuse statutes (CFAA, CMA, etc.) | Unauthorized access to computer systems | US, UK, most EU member states, and many other countries |
| Contract law / Terms of Service | Agreements you've assented to | Universal |
| Copyright | Original creative works | Universal (with national twists) |
| Privacy / data protection (GDPR, CCPA) | Personal data | EU, California, and increasingly elsewhere, often with extraterritorial reach |
A single scrape can implicate one, several, or all of these.
hiQ Labs v. LinkedIn: the modern landmark
Background: hiQ scraped publicly accessible LinkedIn profiles to build HR analytics. LinkedIn sent a cease-and-desist letter and blocked hiQ's IP addresses, citing the CFAA.
Ruling: The 9th Circuit (2019, reaffirmed 2022), ruling on a preliminary injunction, held that scraping publicly accessible data likely does not violate the CFAA. "Authorization" under the CFAA contemplates access controls; data on a public webpage isn't behind one.
The narrower point: scraping public data is not CFAA-actionable. The case did NOT bless:
- Scraping behind login.
- Bypassing access controls.
- Scraping in violation of contracts (LinkedIn eventually won on different grounds, including breach of contract via the ToS hiQ accepted by registering accounts, and California Penal Code 502).
What hiQ tells you: public data → CFAA usually inapplicable. Anything else → other claims (contract, state computer-misuse statutes, privacy) remain live.
CFAA after Van Buren
Van Buren v. United States (2021) narrowed the CFAA further. The Supreme Court held that "exceeds authorized access" means entering parts of a system that are off-limits to you, NOT misusing access you legitimately have, for example by violating a usage policy or ToS.
Translation: a police officer who runs a license-plate search for an improper purpose (a department policy violation; in Van Buren's case, in exchange for money) doesn't violate the CFAA, because he had legitimate access to the database.
For scraping:
- Scraping a public page you're allowed to see, even contrary to the site's ToS, is probably not a CFAA violation post-Van Buren.
- Bypassing login / paywall / IP-block / CAPTCHA-style access controls is where CFAA exposure remains: that IS likely access "without authorization" or in excess of it.
- The CFAA has been substantially narrowed, but it isn't dead. It is now aimed more precisely at its proper target: real circumvention of access controls.
GDPR: the privacy hammer
The EU General Data Protection Regulation (in force since 2018) applies to personal data: any information relating to an identified or identifiable individual. That includes names, emails, phone numbers, photos, behavior patterns, and device IDs.
For scrapers, GDPR matters when:
- You scrape personal data of EU residents.
- You commercially process personal data (and scraping for commercial purposes IS processing).
- You retain it.
Key requirements:
- Lawful basis (Article 6). "Legitimate interest" is most relevant for scraping but requires a balancing test: your interest vs the data subject's privacy interest.
- Minimization: collect only what you need.
- Transparency: in principle, you should inform the data subject. In practice this is often impractical for scraped data.
- Rectification & deletion rights: you must be able to honor requests.
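Minimization and rights requests translate fairly directly into code. The following is a minimal sketch, not a compliance implementation: the field names, the `minimize` helper, and the in-memory `RecordStore` are all hypothetical, and it assumes each scraped record can be tied to a stable data-subject identifier such as a profile URL.

```python
from typing import Any

# Hypothetical allowlist: only the fields this project actually needs.
ALLOWED_FIELDS = frozenset({"display_name", "job_title"})


def minimize(record: dict[str, Any],
             allowed: frozenset[str] = ALLOWED_FIELDS) -> dict[str, Any]:
    """Drop every field that is not on the allowlist before storing."""
    return {key: value for key, value in record.items() if key in allowed}


class RecordStore:
    """Toy in-memory store keyed by a data-subject identifier (an assumption)."""

    def __init__(self) -> None:
        self._by_subject: dict[str, list[dict[str, Any]]] = {}

    def add(self, subject_id: str, record: dict[str, Any]) -> None:
        self._by_subject.setdefault(subject_id, []).append(dict(record))

    def export(self, subject_id: str) -> list[dict[str, Any]]:
        """Return copies of everything held about one data subject."""
        return [dict(r) for r in self._by_subject.get(subject_id, [])]

    def rectify(self, subject_id: str, field: str, value: Any) -> int:
        """Set a field on all of a subject's records; return how many changed."""
        records = self._by_subject.get(subject_id, [])
        for record in records:
            record[field] = value
        return len(records)

    def erase(self, subject_id: str) -> int:
        """Delete all of a subject's records; return how many were removed."""
        return len(self._by_subject.pop(subject_id, []))
```

The design point is the index by data subject: if you cannot look records up per person, you cannot honor a rectification or deletion request, whatever the storage backend is.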
GDPR has extraterritorial reach: it can apply to a company based anywhere if that company offers goods or services to people in the EU or monitors their behavior (Article 3).
Fines can be substantial: up to €20M or 4% of global annual revenue, whichever is higher. They are real and have been levied for unconsented commercial data processing.
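The "whichever is higher" rule is easy to misread, so here is the arithmetic as a small sketch. The function name `gdpr_fine_cap_eur` is hypothetical, and it computes only the ceiling described above, not what a regulator would actually impose.

```python
from decimal import Decimal

FLAT_CAP_EUR = Decimal("20000000")   # the EUR 20M figure
REVENUE_RATE = Decimal("0.04")       # 4% of global annual revenue


def gdpr_fine_cap_eur(annual_revenue_eur: Decimal) -> Decimal:
    """Return the larger of the flat cap and 4% of annual revenue."""
    if annual_revenue_eur < 0:
        raise ValueError("revenue cannot be negative")
    return max(FLAT_CAP_EUR, REVENUE_RATE * annual_revenue_eur)
```

For a company with €100M in revenue, 4% is €4M, so the €20M figure is the ceiling. At €2B in revenue, 4% is €80M and that becomes the ceiling. The two figures cross at €500M.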
CCPA (California) and other privacy regimes
CCPA / CPRA adds similar requirements for California residents. Other US states (Virginia, Colorado, Connecticut, etc.) have followed with similar laws. India's DPDP Act, Brazil's LGPD, and Canada's PIPEDA follow the same general pattern.
The trend is clear: privacy regimes are spreading globally. Scraping personal data without a real plan to handle rights requests is increasingly risky.
State-level statutes (US)
Beyond federal CFAA:
- California Penal Code § 502: broader than the CFAA in some respects; LinkedIn won against hiQ partly on this.
- Various state computer-trespass statutes, which vary widely.
For US-targeted scraping at scale, state-level claims often matter more than federal ones.
UK and EU computer-misuse equivalents
- UK Computer Misuse Act 1990.
- Germany's § 202a StGB (Ausspähen von Daten).
- France's Loi Godfrain.
The core principle, "don't bypass technical access controls", is fairly universal across these. Variations exist in scope and penalty.
What "publicly accessible" actually means
The phrase looks simple, but courts probe it carefully:
- A page anyone can load with a GET request → clearly public.
- A page that requires a free account → arguably public, arguably not (depends on jurisdiction and ToS).
- A page behind a paywall → not public.
- An API requiring a key → not public.
- Data exposed in JS but not rendered → fuzzy.
The "public" label is most reliable where there is no login, no paywall, no obvious access control, and the content is served without authentication.
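One way to force this question early in a project is to record how each target is served and classify it before writing any fetch code. The sketch below simply encodes the rules of thumb from the list above; the names `AccessProfile`, `Publicness`, and `classify` are hypothetical, and the result is a triage label, not a legal conclusion.

```python
from dataclasses import dataclass
from enum import Enum


class Publicness(Enum):
    PUBLIC = "clearly public"
    GRAY = "arguable"
    NOT_PUBLIC = "not public"


@dataclass(frozen=True)
class AccessProfile:
    """How a target resource is served (all flags default to 'open')."""
    needs_free_account: bool = False
    behind_paywall: bool = False
    needs_api_key: bool = False
    only_in_unrendered_js: bool = False


def classify(profile: AccessProfile) -> Publicness:
    # Paywalls and API keys are access controls: treat as not public.
    if profile.behind_paywall or profile.needs_api_key:
        return Publicness.NOT_PUBLIC
    # Free accounts and data exposed in JS but not rendered are the fuzzy middle.
    if profile.needs_free_account or profile.only_in_unrendered_js:
        return Publicness.GRAY
    # A plain unauthenticated GET with no controls in the way.
    return Publicness.PUBLIC
```

Anything that comes back as `GRAY` or `NOT_PUBLIC` is a prompt to stop and get advice rather than a case to engineer around.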
Concrete risk levels (rough; not legal advice)
| Scenario | Approx. risk |
|---|---|
| Scraping a single public e-commerce site for personal price-comparison | Low |
| Same site at high volume causing detectable cost | Medium |
| Scraping behind a free login | Medium-high (ToS-driven) |
| Scraping personal data on EU residents at any volume | High |
| Scraping after circumventing CAPTCHA / paywall | High |
| Scraping for commercial gain in violation of stated ToS | Medium-high |
| Reselling scraped copyrighted content | High |
| Scraping facts and aggregating into a new product | Low-medium |
Practical principles
- Public data, no login, no circumvention → strongest position.
- Don't scrape personal data unless you have a lawful basis and a plan to honor rights requests.
- Don't bypass access controls, paywalls, CAPTCHAs, IP blocks. This is the line CFAA-equivalents are written to defend.
- Read robots.txt; honor it where reasonable.
- Disclose yourself in User-Agent. "MyScraper/1.0 (mailto:contact@example.com)" both improves your legal posture and lets target sites reach you instead of blocking.
- Rate-limit politely. Aggressive scraping looks like an attack.
- Don't redistribute copyrighted prose / images. Scrape facts, use them in your own expression.
- For anything commercial, talk to a lawyer once before scaling.
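Several of these principles can be enforced in code rather than left to good intentions. Below is a minimal sketch using only the Python standard library: it checks robots.txt with `urllib.robotparser`, carries the self-identifying User-Agent from the list above, enforces a minimum delay between requests, and treats 401/403 responses as a reason to stop. The helper names (`build_robots`, `may_fetch`, `RateLimiter`, `should_stop`) and the choice of status codes are assumptions made for illustration; the sketch deliberately does no network I/O, so you would plug it into whatever HTTP client you use.

```python
import time
import urllib.robotparser

# Self-identifying User-Agent, as suggested in the principles above.
USER_AGENT = "MyScraper/1.0 (mailto:contact@example.com)"

# Assumption: auth-required and forbidden responses mean "stop", not "work around".
STOP_STATUSES = frozenset({401, 403})


def build_robots(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: urllib.robotparser.RobotFileParser, url: str) -> bool:
    """True if robots.txt allows our User-Agent to fetch this URL."""
    return parser.can_fetch(USER_AGENT, url)


def should_stop(status_code: int) -> bool:
    """True if the response signals an access control we should not bypass."""
    return status_code in STOP_STATUSES


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval_s: float,
                 clock=time.monotonic, sleep=time.sleep) -> None:
        self.min_interval_s = min_interval_s
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self) -> float:
        """Sleep until the interval has elapsed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval_s - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

In a fetch loop, the order would be: check `may_fetch(parser, url)`, call `limiter.wait()`, send the request with `USER_AGENT` in the headers, and abort the crawl for that site if `should_stop(status)` is true.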
What's changing in 2026
A few trends:
- More state-level privacy laws in the US.
- AI-training scraping cases are reshaping copyright thinking (NYT v. OpenAI, etc.).
- Some sites are starting to spell out more carefully in their ToS what kinds of automated access they consider acceptable.
- EU AI Act intersects with scraping for training purposes.
- Continued ambiguity at the edges; courts haven't fully caught up.
What to try
Take a scraping project you've considered. For each of the four frameworks (CFAA / contract / copyright / privacy), write one sentence about your posture. If any sentence is "I have no idea," that's where you need to think harder or get advice.
Honest posture sounds like: "Public e-commerce data, no login, no PII, rate-limited politely, output is facts only. Likely low risk; would still get a legal review before commercializing."
Vague posture sounds like: "Should be fine." Don't ship from there.
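If you want to make the exercise a habit, the check can live next to the project as a tiny script. This is a sketch under stated assumptions: the framework keys, the list of vague phrases, and the `weak_spots` function are all hypothetical, and substring matching is a blunt instrument that only catches the most obvious hand-waving.

```python
FRAMEWORKS = ("computer misuse", "contract", "copyright", "privacy")

# Hypothetical markers of a non-answer; extend to taste.
VAGUE_PHRASES = ("should be fine", "no idea", "probably ok")


def weak_spots(posture: dict[str, str]) -> list[str]:
    """Return the frameworks whose posture sentence is missing or vague."""
    flagged = []
    for framework in FRAMEWORKS:
        sentence = posture.get(framework, "").strip().lower()
        if not sentence or any(phrase in sentence for phrase in VAGUE_PHRASES):
            flagged.append(framework)
    return flagged
```

A posture with a concrete sentence for computer misuse and copyright, "Should be fine." for contract, and nothing written for privacy would flag `contract` and `privacy` as the places to think harder or get advice.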
Quiz, check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.