
Handling Honeypot Traps

Learn how to identify and avoid honeypot traps that websites use to detect and block web scrapers.

Anti-Detection · #12 · intermediate · 3 min read

Honeypots are hidden elements placed on web pages specifically to catch scrapers. A human user never sees or interacts with them, but a scraper that blindly follows every link or fills every field will fall right into the trap.

Types of Honeypot Traps

1. Hidden Links

Invisible links that only crawlers follow:

<!-- Hidden via CSS - humans never see this -->
<a href="/trap-page" style="display: none;">Click here</a>
<a href="/fake-data" class="hidden-link">More info</a>

2. Hidden Form Fields

Extra form fields that users cannot see. A submission that arrives with one of them filled in gives away an automated client:

<form action="/search">
    <input type="text" name="query" />
    <!-- Honeypot field - should remain empty -->
    <input type="text" name="website" style="display: none;" />
    <button type="submit">Search</button>
</form>

3. Fake Data

Pages filled with fake product listings, prices, or contact info that only scrapers would collect.

Detecting Honeypots with BeautifulSoup

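Before following a link, check whether it is hidden. This static check inspects only inline styles and class names; BeautifulSoup does not apply rules from external stylesheets: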

from bs4 import BeautifulSoup
import requests

response = requests.get("https://example.com/products", timeout=15)
response.raise_for_status()  # fail early on 4xx/5xx responses
soup = BeautifulSoup(response.text, "html.parser")

# Class names commonly used to hide elements, including "hidden-link"
# from the earlier example; extend this set for the site you are scraping
suspicious_classes = {"hidden", "hidden-link", "invisible", "d-none", "hide", "noshow"}

def is_display_none(style: str) -> bool:
    # Drop whitespace and lowercase so "display: none" and "DISPLAY:NONE" both match
    return "display:none" in "".join(style.split()).lower()

safe_links = []

for link in soup.find_all("a", href=True):
    # Check for display:none in inline styles
    if is_display_none(link.get("style", "")):
        print(f"HONEYPOT (inline style): {link['href']}")
        continue

    # Check for hidden classes
    classes = link.get("class", [])
    if suspicious_classes.intersection(classes):
        print(f"HONEYPOT (class): {link['href']}")
        continue

    # Check the immediate parent for a hidden inline style
    parent = link.parent
    if parent is not None and is_display_none(parent.get("style", "")):
        print(f"HONEYPOT (parent hidden): {link['href']}")
        continue

    safe_links.append(link["href"])

print(f"\nSafe links to follow: {len(safe_links)}")
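
The loop above inspects only the link itself and its immediate parent, but a honeypot can also sit inside a hidden container several levels up. As a minimal sketch, the hypothetical helpers `has_hidden_style` and `is_hidden_by_ancestor` below walk every ancestor through BeautifulSoup's `parents` iterator and look for `display: none` or `visibility: hidden` in inline styles. The helper names and the sample HTML are assumptions made for illustration, and this approach still cannot see rules defined in external stylesheets.

```python
from bs4 import BeautifulSoup

# Inline declarations treated as "hidden" (compared with whitespace removed)
HIDDEN_DECLARATIONS = ("display:none", "visibility:hidden")


def has_hidden_style(tag) -> bool:
    """Return True if the tag's own inline style hides it."""
    # Drop whitespace and lowercase so "display: none" and "DISPLAY:NONE" both match
    style = "".join(tag.get("style", "").split()).lower()
    return any(decl in style for decl in HIDDEN_DECLARATIONS)


def is_hidden_by_ancestor(tag) -> bool:
    """Return True if the tag or any of its ancestors is hidden inline."""
    if has_hidden_style(tag):
        return True
    return any(has_hidden_style(parent) for parent in tag.parents)


# Hypothetical sample page: the first link sits three levels below a hidden <div>
html = """
<div style="display: none">
  <ul><li><a href="/trap-page">Click here</a></li></ul>
</div>
<a href="/products/1">Product 1</a>
"""

soup = BeautifulSoup(html, "html.parser")
safe = [
    a["href"]
    for a in soup.find_all("a", href=True)
    if not is_hidden_by_ancestor(a)
]
print(safe)
```

Only `/products/1` survives the filter here, because the first link inherits the hidden state of its `<div>` ancestor even though its direct parent carries no style at all.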

Detecting Honeypot Form Fields

The same idea applies to forms: collect only the fields a user could see, and leave honeypot fields out of the submission:

from bs4 import BeautifulSoup

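def get_safe_form_fields(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    fields = {}

    # Nothing to collect if the page has no form
    if form is None:
        return fields

    # Hidden inputs treated as legitimate; extend this for the site you are scraping
    allowed_hidden = ("csrf_token", "_token")

    for inp in form.find_all("input"):
        name = inp.get("name")
        if not name:
            continue

        # Drop whitespace and lowercase so "display: none" and "DISPLAY:NONE" both match
        style = "".join(inp.get("style", "").split()).lower()
        input_type = inp.get("type", "text").lower()

        # Skip honeypot fields hidden with inline CSS
        if "display:none" in style:
            print(f"Skipping honeypot field: {name}")
            continue
        if input_type == "hidden" and name not in allowed_hidden:
            print(f"Skipping suspicious hidden field: {name}")
            continue

        fields[name] = inp.get("value", "")

    return fields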

Using Playwright for Accurate Visibility Checks

For JavaScript-rendered pages, use Playwright to check actual element visibility:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/products")

    links = page.query_selector_all("a[href]")
    safe_links = []

    for link in links:
        href = link.get_attribute("href")
        # is_visible() is False for display: none and visibility: hidden,
        # but Playwright still counts opacity: 0 elements as visible
        if link.is_visible():
            safe_links.append(href)
        else:
            print(f"Hidden link detected: {href}")

    browser.close()

Prevention Tips

  • Never follow links with display: none, visibility: hidden, or opacity: 0
  • Do not fill in form fields that are hidden from view
  • Be suspicious of links with trap-like paths (/trap, /honeypot, /fake)
  • Only extract data from visible, rendered elements
  • Services like ScraperAPI and ScrapingAnt render pages in real browsers, making it easier to check element visibility
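
The tip about trap-like paths can be turned into a simple pre-filter that runs before any request is made. The sketch below is one possible approach, not a complete defense: the function name `looks_like_trap` and the keyword list are assumptions taken from the example paths in the list above, and real sites can use any path they like.

```python
from urllib.parse import urlparse

# Hypothetical keyword list based on the example paths above; tune per site
TRAP_KEYWORDS = ("trap", "honeypot", "fake")


def looks_like_trap(url: str) -> bool:
    """Return True if any path segment contains a trap-like keyword."""
    # Only the path is inspected, so the hostname and query string are ignored
    path = urlparse(url).path.lower()
    segments = [seg for seg in path.split("/") if seg]
    return any(keyword in seg for seg in segments for keyword in TRAP_KEYWORDS)


urls = [
    "https://example.com/products/42",
    "https://example.com/trap-page",
    "/fake-data",
    "https://example.com/blog/honeypot-guide",
]

flagged = [u for u in urls if looks_like_trap(u)]
print(flagged)
```

The last URL shows the limit of keyword matching: a legitimate article about honeypots is flagged too. Treat a match as a reason to inspect the link more closely, for example by combining it with the visibility checks, instead of as proof of a trap.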