Handling Honeypot Traps - Anti-Detection

Learn how to identify and avoid honeypot traps that websites use to detect and block web scrapers.

Honeypots are hidden elements placed on web pages specifically to catch scrapers. A human user never sees or interacts with them, but a scraper that blindly follows every link or fills every field will fall right into the trap.

Types of Honeypot Traps

1. Hidden Links

Invisible links that only crawlers follow:

<!-- Hidden via CSS - humans never see this -->
<a href="/trap-page" style="display: none;">Click here</a>
<a href="/fake-data" class="hidden-link">More info</a>

2. Hidden Form Fields

Extra form fields invisible to users:

<form action="/search">
    <input type="text" name="query" />
    <!-- Honeypot field - should remain empty -->
    <input type="text" name="website" style="display: none;" />
    <button type="submit">Search</button>
</form>
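To see why the field must stay empty, it helps to picture the check a site might run when the form is submitted. The following is a minimal sketch of such a server-side check, not any particular site's implementation; the function name `is_probable_bot` and the default field name `website` (taken from the form above) are assumptions for illustration.

```python
def is_probable_bot(form_data: dict[str, str], honeypot_fields=("website",)) -> bool:
    """Flag a submission when any honeypot field arrives with a non-empty value.

    Hypothetical check: a human never sees the honeypot field, so a real
    browser submits it empty (or not at all).
    """
    return any(form_data.get(name, "").strip() for name in honeypot_fields)


# A human-like submission leaves the hidden field empty
human = {"query": "laptops", "website": ""}
# A scraper that fills every field gives itself away
scraper = {"query": "laptops", "website": "http://spam.example"}

print(is_probable_bot(human))
print(is_probable_bot(scraper))
```

Under this assumption, a scraper is only flagged when it writes something into the hidden field, which is why the detection code later in this article leaves such fields alone.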

3. Fake Data

Pages filled with fake product listings, prices, or contact info that only scrapers would collect.

Detecting Honeypots with BeautifulSoup

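Before following a link, check whether it is hidden. Static HTML parsing only sees inline styles and class names, so links hidden by rules in an external stylesheet will not be caught this way: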

from bs4 import BeautifulSoup
import requests

# Class names commonly used to hide elements; extend for the target site
SUSPICIOUS_CLASSES = {"hidden", "invisible", "d-none", "hide", "noshow"}


def has_display_none(tag) -> bool:
    """Return True if the tag's inline style contains display: none."""
    # Strip all whitespace so "display:none" and "display: none" both match
    style = "".join(tag.get("style", "").split()).lower()
    return "display:none" in style


response = requests.get("https://example.com/products", timeout=15)
response.raise_for_status()  # Stop early on 4xx/5xx responses
soup = BeautifulSoup(response.text, "html.parser")

safe_links = []

for link in soup.find_all("a", href=True):
    # Check for display: none in inline styles
    if has_display_none(link):
        print(f"HONEYPOT (inline style): {link['href']}")
        continue

    # Check for hidden classes
    if SUSPICIOUS_CLASSES.intersection(link.get("class", [])):
        print(f"HONEYPOT (class): {link['href']}")
        continue

    # Check the direct parent element for a hidden inline style
    parent = link.parent
    if parent is not None and has_display_none(parent):
        print(f"HONEYPOT (parent hidden): {link['href']}")
        continue

    safe_links.append(link["href"])

print(f"\nSafe links to follow: {len(safe_links)}")
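The check above only looks for display: none, while the tips at the end of this article also mention visibility: hidden and opacity: 0. The following is a minimal sketch of an inline-style check that covers all three. The helper names `parse_inline_style` and `inline_style_hides` are hypothetical, and the parser is deliberately naive: it splits on semicolons, so it would mis-handle values that themselves contain a semicolon (for example data URLs).

```python
def parse_inline_style(style: str) -> dict[str, str]:
    """Turn an inline style string into a {property: value} dict (lowercased)."""
    declarations = {}
    for chunk in style.split(";"):
        if ":" not in chunk:
            continue
        prop, _, value = chunk.partition(":")
        # Drop a trailing !important so "none !important" compares equal to "none"
        value = value.strip().lower().replace("!important", "").strip()
        declarations[prop.strip().lower()] = value
    return declarations


def inline_style_hides(style: str) -> bool:
    """Return True if the inline style hides the element from a human visitor."""
    css = parse_inline_style(style)
    if css.get("display") == "none":
        return True
    if css.get("visibility") in {"hidden", "collapse"}:
        return True
    opacity = css.get("opacity")
    if opacity is not None:
        try:
            return float(opacity) == 0.0
        except ValueError:
            # Non-numeric opacity (e.g. a percentage) is not interpreted here
            return False
    return False


for sample in ("display: none;", "color: red; visibility: hidden", "opacity: 0", "color: red"):
    print(f"{sample!r}: hidden={inline_style_hides(sample)}")
```

With BeautifulSoup, this could be called as `inline_style_hides(link.get("style", ""))` in place of the substring test.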

Detecting Honeypot Form Fields

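from bs4 import BeautifulSoup

# Hidden inputs that usually carry legitimate state; extend for the target site
ALLOWED_HIDDEN_FIELDS = {"csrf_token", "_token"}


def get_safe_form_fields(html: str) -> dict:
    """Return name -> default value for inputs in the first form that look safe."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    if form is None:
        # No form on the page, so there is nothing to fill in
        return {}

    fields = {}
    for inp in form.find_all("input"):
        name = inp.get("name")
        if not name:
            continue

        # Strip all whitespace so "display:none" and "display: none" both match
        style = "".join(inp.get("style", "").split()).lower()
        input_type = inp.get("type", "text").lower()

        # Skip honeypot fields
        if "display:none" in style:
            print(f"Skipping honeypot field: {name}")
            continue
        if input_type == "hidden" and name not in ALLOWED_HIDDEN_FIELDS:
            print(f"Skipping suspicious hidden field: {name}")
            continue

        fields[name] = inp.get("value", "")

    return fields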
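Once the safe fields are known, the remaining step is to fill in only those fields before submitting. The following is a minimal sketch of that step; `build_payload` is a hypothetical helper, and the `safe_fields` dictionary below stands in for what `get_safe_form_fields` would return for the search form shown earlier.

```python
def build_payload(safe_fields: dict[str, str], values: dict[str, str]) -> dict[str, str]:
    """Merge caller-supplied values into the defaults of fields judged safe."""
    payload = dict(safe_fields)  # Copy so the caller's dict is not modified
    for name, value in values.items():
        if name not in safe_fields:
            # The field was filtered out as a likely honeypot, or does not exist
            raise KeyError(f"refusing to fill unknown or hidden field: {name}")
        payload[name] = value
    return payload


# Stand-in for get_safe_form_fields(html) on the search form: only "query" survives
safe_fields = {"query": ""}

print(build_payload(safe_fields, {"query": "laptops"}))

try:
    build_payload(safe_fields, {"website": "https://example.com"})
except KeyError as exc:
    print(f"Rejected: {exc}")
```

Raising an error rather than silently dropping the value is a design choice here: it surfaces the mistake in the scraper's own code instead of hiding it.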

Using Playwright for Accurate Visibility Checks

For JavaScript-rendered pages, use Playwright to check actual element visibility:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/products")

    safe_links = []

    for link in page.query_selector_all("a[href]"):
        href = link.get_attribute("href")
        # is_visible() is False for display: none, visibility: hidden, and
        # zero-size elements, but it still treats opacity: 0 as visible
        if link.is_visible():
            safe_links.append(href)
        else:
            print(f"Hidden link detected: {href}")

    browser.close()

Prevention Tips

Never follow links with display: none, visibility: hidden, or opacity: 0
Do not fill in form fields that are hidden from view
Be suspicious of links with trap-like paths (/trap, /honeypot, /fake)
Only extract data from visible, rendered elements
Services like ScraperAPI and ScrapingAnt render pages in real browsers, making it easier to check element visibility
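The path check from the tips above can be sketched with the standard library alone. This is a rough heuristic rather than a reliable detector: the helper name `looks_like_trap_path` is hypothetical, and the marker words are just the examples from the list, so they would need tuning for a real site.

```python
import re
from urllib.parse import urlparse

# Marker words taken from the tips above; an assumption, not an exhaustive list
TRAP_MARKERS = {"trap", "honeypot", "fake"}


def looks_like_trap_path(href: str) -> bool:
    """Return True if any word in the URL path matches a trap marker."""
    path = urlparse(href).path.lower()
    # Split on anything that is not a letter or digit, so "/trap-page" yields
    # the words "trap" and "page" while "strapless" stays one word
    words = re.split(r"[^a-z0-9]+", path)
    return any(word in TRAP_MARKERS for word in words)


for href in ("/trap-page", "/fake-data", "/products/42", "/products/strapless-dress"):
    print(f"{href}: suspicious={looks_like_trap_path(href)}")
```

Matching whole words instead of substrings avoids flagging ordinary paths that merely contain a marker, such as "strapless".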