
Lesson 1.13 · Beginner · 5 min read

BeautifulSoup: find, find_all, select

The three workhorse selection methods of BeautifulSoup, when to use each, and the small idioms that separate beginner from comfortable.

What you’ll learn

  • Use `find` and `find_all` with tag, attribute, and class filters.
  • Use `select` and `select_one` with CSS selectors.
  • Mix methods cleanly when one is more concise than the other.
  • Avoid the common `NoneType has no attribute` trap.

You've used these methods in passing. This lesson is the rigorous tour: every option, every gotcha, every idiom. After this, parsing decisions become muscle memory.

The three methods

Method       Returns                                Selector style
find         First match (or None)                  Tag name + attribute kwargs
find_all     List of all matches (possibly empty)   Tag name + attribute kwargs
select       List of all matches (possibly empty)   CSS selector string
select_one   First match (or None)                  CSS selector string

find and find_all are BeautifulSoup's native API. select and select_one were added later, once CSS selectors became universal. Most working code uses both: find for simple cases, select when CSS is cleaner.
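
A minimal sketch of the two styles side by side (the markup here is illustrative):

from bs4 import BeautifulSoup

html = '<div class="card"><h2>Widget</h2></div>'
soup = BeautifulSoup(html, "html.parser")

soup.find("div", class_="card")  # native API
soup.select_one("div.card")      # CSS equivalent, returns the same Tag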

find and find_all: tag name + filters

soup.find("h1")  # first <h1>
soup.find("div", class_="product-card")  # first div with class product-card
soup.find("a", id="nav-home")  # first <a> with id="nav-home"
soup.find("input", attrs={"name": "csrf_token"})  # arbitrary attrs
soup.find_all("p")  # every <p>
soup.find_all("p", limit=3)  # first 3 <p>s
soup.find_all(["h1", "h2", "h3"])  # multiple tags
soup.find_all("div", class_="card", string="In stock")  # text content filter

Note the trailing underscore in class_: class is a reserved word in Python, so it can't be a keyword argument. class_ is special-cased by BeautifulSoup; for other reserved attribute names, such as for on a <label>, use the attrs={} form shown below.

The attrs={} form is also essential when the attribute name contains a hyphen (data-id, aria-label), since hyphens aren't legal in Python keyword arguments:

soup.find("div", attrs={"data-id": "42"})
soup.find("button", attrs={"aria-label": "Close"})

Class matching has a quirk

HTML classes are space-separated; BeautifulSoup matches against ANY of them by default:

# HTML: <div class="card featured large">
soup.find("div", class_="card")           # matches: "card" is one of the classes
soup.find("div", class_="featured")       # matches
soup.find("div", class_="card featured")  # NO match: a multi-word string is compared
                                          # against the exact attribute string, in order
soup.select_one("div.card.featured")      # matches: CSS means "has both classes, any order"

Single-name strings match whole class tokens only, so class_="card" won't match class="cardholder". To match several names, pass a list; to match by pattern, pass a compiled regex (tested against each class token):

import re
soup.find_all("div", class_=["card", "panel"])     # either class name
soup.find_all("div", class_=re.compile(r"^card"))  # any class starting with "card"

select and select_one: CSS selectors

If you know CSS, you already know this:

soup.select("article.product-card h2")  # descendant
soup.select("article.product-card > h2")  # direct child only
soup.select("a[href^='/products/']")  # attribute starts-with
soup.select("a[href$='.pdf']")  # attribute ends-with
soup.select("a[href*='kitchen']")  # attribute contains
soup.select("li:nth-of-type(2)")  # nth match
soup.select("div.card:not(.disabled)")  # exclusion
soup.select("p ~ a")  # general sibling
soup.select("h2 + p")  # adjacent sibling

For anything non-trivial, select is shorter than the equivalent find_all chain.
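
For instance, one selector can replace a two-step find_all chain; a sketch against illustrative markup:

from bs4 import BeautifulSoup

html = '<article class="product-card"><h2><a href="/products/1">A</a></h2></article>'
soup = BeautifulSoup(html, "html.parser")

# One CSS selector...
links = soup.select("article.product-card a[href^='/products/']")

# ...versus the equivalent find_all chain (an href filter receives the attribute value, possibly None)
links = [a for card in soup.find_all("article", class_="product-card")
           for a in card.find_all("a", href=lambda h: h and h.startswith("/products/"))]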

Element methods after selection

el.name  # tag name as string ("div")
el.get_text(strip=True)  # all descendant text, with whitespace cleaned
el.get_text(" ", strip=True)  # join descendant text with single space
el.string  # only if there's exactly ONE text child; else None
el["class"]  # list of classes (because HTML class is multi-valued)
el["href"]  # string for single-valued attrs
el.get("href")  # safe version, returns None if missing
el.attrs  # full dict of attributes

el["href"] raises KeyError if missing. el.get("href") returns None. Always prefer .get() unless you know the attribute exists.

.string vs .get_text(): pick the right one

from bs4 import BeautifulSoup

p = BeautifulSoup("<p>Hello <b>world</b></p>", "html.parser").p
p.string        # None: <p> has two children (a text node + <b>)
p.get_text()    # "Hello world"
p.b.string      # "world": <b> has exactly one text child
p.b.get_text()  # "world"

.string is finicky. For 95% of scraping, just use .get_text(strip=True).

Combining select and find cleanly

A common pattern: select cards, then find inside each card:

for card in soup.select("article.product-card"):
  name  = card.find("h2").get_text(strip=True)
  price = card.find(class_="price").get_text(strip=True)
  link  = card.find("a")["href"]

card.find(...) is scoped to that card's subtree, exactly as you'd want.
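
select_one is scoped to the subtree the same way, so this equivalent also works:

for card in soup.select("article.product-card"):
  name = card.select_one("h2").get_text(strip=True)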

The NoneType trap

The most common BeautifulSoup error:

title = soup.find("h1").get_text()
# AttributeError: 'NoneType' object has no attribute 'get_text'

find returned None because no <h1> exists, and you then called a method on None. Three defensive patterns:

# 1. Check first
h1 = soup.find("h1")
title = h1.get_text(strip=True) if h1 else None

# 2. Walrus operator (Python 3.8+)
title = h1.get_text(strip=True) if (h1 := soup.find("h1")) else None

# 3. Helper
def safe_text(el):
  return el.get_text(strip=True) if el else None

title = safe_text(soup.find("h1"))

For production scrapers, the helper approach scales best.
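
Reusing the helper, a sketch of the earlier card loop made None-safe (field names are illustrative):

rows = []
for card in soup.select("article.product-card"):
  link = card.find("a")
  rows.append({
    "name":  safe_text(card.find("h2")),
    "price": safe_text(card.find(class_="price")),
    "link":  link.get("href") if link else None,
  })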

Searching by text content

soup.find("a", string="Next page")  # exact string match
soup.find_all("a", string=re.compile(r"page"))  # regex
soup.find(string=re.compile(r"\$\d+"))  # find any text node matching

Useful when the structure is messy but you know the visible label.
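
A common use is pagination; a minimal sketch (the link text is an assumption about the page):

import re

next_link = soup.find("a", string=re.compile(r"Next"))
next_url = next_link.get("href") if next_link else None  # None on the last page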

Filtering by callable

The most powerful form: pass any function that takes a tag and returns a bool:

def is_external_link(tag):
  return (tag.name == "a"
          and tag.get("href", "").startswith("http")
          and "scrapingcentral.com" not in tag.get("href", ""))

external = soup.find_all(is_external_link)

Almost any custom matching logic fits in a callable. Use it when CSS gets convoluted.
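
Short conditions fit in a lambda; for example, a hypothetical audit for images missing alt text:

missing_alt = soup.find_all(lambda tag: tag.name == "img" and not tag.get("alt"))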

find_parent, find_next_sibling, find_previous

Navigate the tree from a known anchor:

price_label = soup.find(string="Price")
# Find the value next to it
price_value = price_label.find_next("span")  # next <span> in document order
price_row  = price_label.find_parent("tr")  # enclosing table row

This pattern, "find the label, then walk to the value", is endlessly useful on layouts where the data has no class hook.
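
A self-contained sketch of the idea (the markup is illustrative):

from bs4 import BeautifulSoup

html = "<table><tr><td>Price</td><td><span>$9.99</span></td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

label = soup.find(string="Price")
print(label.find_next("span").get_text())                 # $9.99
print(label.find_parent("tr").get_text(" ", strip=True))  # Price $9.99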

Performance tip

On a soup with thousands of nodes, find_all with strict filters is faster than a broad select followed by Python-level filtering, because the matching happens during the tree walk rather than in a second pass. If performance matters, write the most specific selector you can.
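
A rough way to check this on your own pages, using a synthetic soup (numbers will vary):

import timeit
from bs4 import BeautifulSoup

html = '<div class="price">$1</div><div class="other">x</div>' * 5000
soup = BeautifulSoup(html, "html.parser")

strict = lambda: soup.find_all("div", class_="price")
broad = lambda: [d for d in soup.select("div") if "price" in (d.get("class") or [])]

print(timeit.timeit(strict, number=10))  # specific filter, one pass
print(timeit.timeit(broad, number=10))   # broad select, then Python-level filtering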

Hands-on lab

Visit /challenges/static/lists/cards on Catalog108, our first-party scraping sandbox. Use select to find every card, then for each card use find to extract the title, subtitle, and any visible badge. Try the same task with find_all instead of select to feel the difference, and confirm both approaches yield the same data.


Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.
