Handling Honeypot Traps
Learn how to identify and avoid honeypot traps that websites use to detect and block web scrapers.
Anti-Detection · #12 · intermediate · 3 min read
Honeypots are hidden elements placed on web pages specifically to catch scrapers. A human user never sees or interacts with them, but a scraper that blindly follows every link or fills every field will fall right into the trap.
Types of Honeypot Traps
1. Hidden Links
Invisible links that only crawlers follow:
<!-- Hidden via CSS - humans never see this -->
<a href="/trap-page" style="display: none;">Click here</a>
<a href="/fake-data" class="hidden-link">More info</a>
2. Hidden Form Fields
Extra form fields invisible to users:
<form action="/search">
  <input type="text" name="query" />
  <!-- Honeypot field - should remain empty -->
  <input type="text" name="website" style="display: none;" />
  <button type="submit">Search</button>
</form>
3. Fake Data
Pages filled with fake product listings, prices, or contact info that only scrapers would collect.
Detecting Honeypots with BeautifulSoup
Before following links, check if they are hidden:
from bs4 import BeautifulSoup
import requests

response = requests.get("https://example.com/products", timeout=15)
soup = BeautifulSoup(response.text, "html.parser")

# Class names commonly used to hide elements (a heuristic, not an exhaustive list)
suspicious_classes = {"hidden", "invisible", "d-none", "hide", "noshow"}

safe_links = []
for link in soup.find_all("a", href=True):
    # Check for display:none in inline styles (ignoring spacing and case)
    style = link.get("style", "").replace(" ", "").lower()
    if "display:none" in style:
        print(f"HONEYPOT (inline style): {link['href']}")
        continue

    # Check for hidden classes
    classes = link.get("class", [])
    if suspicious_classes.intersection(classes):
        print(f"HONEYPOT (class): {link['href']}")
        continue

    # Check the direct parent element for a hidden inline style
    parent = link.parent
    parent_style = parent.get("style", "").replace(" ", "").lower() if parent else ""
    if "display:none" in parent_style:
        print(f"HONEYPOT (parent hidden): {link['href']}")
        continue

    safe_links.append(link["href"])

print(f"\nSafe links to follow: {len(safe_links)}")
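Substring checks on the style attribute only catch display: none. As a minimal sketch, the inline style can instead be parsed into property/value pairs so that visibility: hidden and opacity: 0 are caught as well. The helper names parse_inline_style and is_hidden_style are hypothetical, and the parser is deliberately naive (it assumes simple declarations with no semicolons inside values).

```python
def parse_inline_style(style: str) -> dict:
    """Turn 'Display : NONE; opacity:0' into {'display': 'none', 'opacity': '0'}."""
    declarations = {}
    for chunk in style.split(";"):
        if ":" not in chunk:
            continue  # Skip empty or malformed declarations
        prop, value = chunk.split(":", 1)
        # Normalise case, drop "!important", trim whitespace
        value = value.lower().replace("!important", "").strip()
        declarations[prop.strip().lower()] = value
    return declarations


def is_hidden_style(style: str) -> bool:
    """Return True if the inline style hides the element from a human visitor."""
    css = parse_inline_style(style)
    if css.get("display") == "none":
        return True
    if css.get("visibility") in ("hidden", "collapse"):
        return True
    try:
        # Fully transparent elements are invisible even though they take up space
        return float(css.get("opacity", "1")) == 0
    except ValueError:
        return False  # Unparseable opacity (e.g. a percentage): treat as visible


print(is_hidden_style("color: red; visibility: hidden;"))
print(is_hidden_style("opacity: 0.5; display: block"))
```

With BeautifulSoup, this could be called as is_hidden_style(link.get("style", "")). It still sees inline styles only; rules from external stylesheets need a rendered page.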
Detecting Honeypot Form Fields
from bs4 import BeautifulSoup

def get_safe_form_fields(html: str) -> dict:
    """Return name -> default value for inputs that do not look like honeypots."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    if form is None:
        return {}  # No form on the page

    fields = {}
    for inp in form.find_all("input"):
        name = inp.get("name")
        if not name:
            continue

        style = inp.get("style", "").replace(" ", "").lower()
        input_type = inp.get("type", "text").lower()

        # Skip honeypot fields hidden with inline CSS
        if "display:none" in style:
            print(f"Skipping honeypot field: {name}")
            continue

        # Skip hidden inputs unless they are known tokens; extend this allowlist per site
        if input_type == "hidden" and name not in ("csrf_token", "_token"):
            print(f"Skipping suspicious hidden field: {name}")
            continue

        fields[name] = inp.get("value", "")
    return fields
Using Playwright for Accurate Visibility Checks
For JavaScript-rendered pages, or when elements are hidden by external stylesheets rather than inline styles, use Playwright to check actual element visibility:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/products")

    links = page.query_selector_all("a[href]")
    safe_links = []
    for link in links:
        href = link.get_attribute("href")
        if link.is_visible():
            safe_links.append(href)
        else:
            print(f"Hidden link detected: {href}")

    browser.close()
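One caveat: under Playwright's documented visibility rules, an element with opacity: 0 still counts as visible, so is_visible() alone does not cover every case listed in the tips below. The following is a rough sketch of one way to add a computed-style check on top. The names is_effectively_hidden, COMPUTED_STYLE_JS, and collect_visible_links are hypothetical, and page is assumed to be a Playwright sync Page that has already navigated.

```python
def is_effectively_hidden(computed: dict) -> bool:
    """Decide from computed CSS values whether a human could see the element."""
    if computed.get("display") == "none":
        return True
    if computed.get("visibility") in ("hidden", "collapse"):
        return True
    try:
        return float(computed.get("opacity", "1")) == 0
    except (TypeError, ValueError):
        return False


# Runs in the browser and returns the three properties we care about
COMPUTED_STYLE_JS = """el => {
    const s = window.getComputedStyle(el);
    return {display: s.display, visibility: s.visibility, opacity: s.opacity};
}"""


def collect_visible_links(page) -> list:
    """Return hrefs of links that pass both is_visible() and the computed-style check."""
    safe = []
    for link in page.query_selector_all("a[href]"):
        if not link.is_visible():
            continue  # Zero size, display: none, or visibility: hidden
        if is_effectively_hidden(link.evaluate(COMPUTED_STYLE_JS)):
            continue  # Catches opacity: 0 set on the link itself
        safe.append(link.get_attribute("href"))
    return safe
```

This only inspects the link's own computed style. A transparent ancestor would not be detected, because opacity is not inherited into the child's computed value, so walking up the parent chain would be a natural extension.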
Prevention Tips
- Never follow links with display: none, visibility: hidden, or opacity: 0
- Do not fill in form fields that are hidden from view
- Be suspicious of links with trap-like paths (/trap, /honeypot, /fake)
- Only extract data from visible, rendered elements
- Services like ScraperAPI and ScrapingAnt render pages in real browsers, making it easier to check element visibility
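The tip about trap-like paths can be turned into a simple pre-filter. This is only a sketch: the word list in TRAP_WORDS and the function name looks_like_trap are illustrative assumptions, and a real site may use entirely different names for its traps.

```python
import re
from urllib.parse import urlparse

# Assumption: illustrative keywords taken from the tip above, not a complete list
TRAP_WORDS = {"trap", "honeypot", "fake"}


def looks_like_trap(url: str) -> bool:
    """Return True if any word in the URL path matches a known trap keyword."""
    path = urlparse(url).path.lower()
    # Split on anything that is not a letter or digit, so "/trap-page" yields "trap"
    tokens = set(re.split(r"[^a-z0-9]+", path))
    return bool(tokens & TRAP_WORDS)


for candidate in ["/trap-page", "/fake-data", "/products/trapezoid-shelf"]:
    print(candidate, looks_like_trap(candidate))
```

Matching whole words rather than substrings avoids flagging legitimate paths such as /products/trapezoid-shelf. A check like this is best used alongside the visibility checks, not in place of them.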