BeautifulSoup: find, find_all, select
The three workhorse selection methods of BeautifulSoup, when to use each, and the small idioms that separate beginner from comfortable.
What you’ll learn
- Use `find` and `find_all` with tag, attribute, and class filters.
- Use `select` and `select_one` with CSS selectors.
- Mix methods cleanly when one is more concise than the other.
- Avoid the common `NoneType has no attribute` trap.
You've used these methods in passing. This lesson is the rigorous tour: every option, every gotcha, every idiom. After this, parsing decisions become muscle memory.
The three methods
| Method | Returns | Selector style |
|---|---|---|
| `find` | First match (or `None`) | Tag name + attribute kwargs |
| `find_all` | List of all matches (possibly empty) | Tag name + attribute kwargs |
| `select` | List of all matches (possibly empty) | CSS selector string |
| `select_one` | First match (or `None`) | CSS selector string |
find and find_all are BeautifulSoup's native API. select and select_one were added later, once CSS selector support became standard. Most working code uses both: find for simple cases, select when CSS is cleaner.
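Side by side, here is a minimal runnable sketch of all four calls against an invented two-product snippet (the markup and class names are illustrative, not the lab site's):

```python
from bs4 import BeautifulSoup

html = """
<div class="product"><h2>Kettle</h2><span class="price">$29</span></div>
<div class="product"><h2>Toaster</h2><span class="price">$45</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

soup.find("h2")                    # first <h2> Tag, or None if there were none
soup.find_all("h2")                # list of every <h2> Tag (possibly empty)
soup.select_one("div.product h2")  # first CSS-selector match, or None
soup.select("div.product h2")      # list of every CSS-selector match
```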
find and find_all, tag name + filters
soup.find("h1") # first <h1>
soup.find("div", class_="product-card") # first div with class product-card
soup.find("a", id="nav-home") # first <a> with id="nav-home"
soup.find("input", attrs={"name": "csrf_token"}) # arbitrary attrs
soup.find_all("p") # every <p>
soup.find_all("p", limit=3) # first 3 <p>s
soup.find_all(["h1", "h2", "h3"]) # multiple tags
soup.find_all("div", class_="card", string="In stock") # text content filter
Note class_ (with a trailing underscore): class is a Python keyword, so BeautifulSoup accepts class_ as a special case. Only class_ gets that treatment. When the attribute name has a hyphen (data-id, aria-label) or is itself a reserved word (a label's for), it can't be a Python keyword argument, so use the attrs={} form:
soup.find("div", attrs={"data-id": "42"})
soup.find("button", attrs={"aria-label": "Close"})
Class matching has a quirk
HTML classes are space-separated; a single-word class_ value matches if it equals ANY one of them:
# HTML: <div class="card featured large">
soup.find("div", class_="card") # matches
soup.find("div", class_="featured") # matches
soup.find("div", class_="card featured") # matches if both present (any order)
For exact-string class match, use a function or pass class_ with a regex/list:
import re
soup.find_all("div", class_=re.compile(r"^card$"))
select and select_one, CSS selectors
If you know CSS, you already know this:
soup.select("article.product-card h2") # descendant
soup.select("article.product-card > h2") # direct child only
soup.select("a[href^='/products/']") # attribute starts-with
soup.select("a[href$='.pdf']") # attribute ends-with
soup.select("a[href*='kitchen']") # attribute contains
soup.select("li:nth-of-type(2)") # nth match
soup.select("div.card:not(.disabled)") # exclusion
soup.select("p ~ a") # general sibling
soup.select("h2 + p") # adjacent sibling
For anything non-trivial, select is shorter than the equivalent find_all chain.
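A few of these in action against an invented product card (markup and paths are illustrative only):

```python
from bs4 import BeautifulSoup

html = """
<article class="product-card">
  <h2>Stand Mixer</h2>
  <a href="/products/mixers/stand">Details</a>
  <a href="/files/manual.pdf">Manual (PDF)</a>
</article>
"""
soup = BeautifulSoup(html, "html.parser")

[h2.get_text() for h2 in soup.select("article.product-card > h2")]  # ['Stand Mixer']
[a["href"] for a in soup.select("a[href^='/products/']")]           # ['/products/mixers/stand']
[a["href"] for a in soup.select("a[href$='.pdf']")]                 # ['/files/manual.pdf']
```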
Element methods after selection
el.name # tag name as string ("div")
el.get_text(strip=True) # all descendant text, with whitespace cleaned
el.get_text(" ", strip=True) # join descendant text with single space
el.string # only if there's exactly ONE text child; else None
el["class"] # list of classes (because HTML class is multi-valued)
el["href"] # string for single-valued attrs
el.get("href") # safe version, returns None if missing
el.attrs # full dict of attributes
el["href"] raises KeyError if missing. el.get("href") returns None. Always prefer .get() unless you know the attribute exists.
.string vs .get_text(), pick the right one
from bs4 import BeautifulSoup
p = BeautifulSoup("<p>Hello <b>world</b></p>", "html.parser").p

p.string        # None, p has multiple children (text + <b>)
p.get_text()    # "Hello world"
p.b.string      # "world", b has exactly one text child
p.b.get_text()  # "world"
.string is finicky. For 95% of scraping, just use .get_text(strip=True).
Combining select and find cleanly
A common pattern: select cards, then find inside each card:
for card in soup.select("article.product-card"):
    name = card.find("h2").get_text(strip=True)
    price = card.find(class_="price").get_text(strip=True)
    link = card.find("a")["href"]
card.find(...) is scoped to that card's subtree, exactly as you'd want.
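The same pattern as a self-contained sketch that collects the results into a list (the product markup is invented):

```python
from bs4 import BeautifulSoup

html = """
<article class="product-card">
  <h2>Chef's Knife</h2><span class="price">$89</span><a href="/products/knife">View</a>
</article>
<article class="product-card">
  <h2>Cutting Board</h2><span class="price">$25</span><a href="/products/board">View</a>
</article>
"""
soup = BeautifulSoup(html, "html.parser")

products = []
for card in soup.select("article.product-card"):
    products.append({
        "name": card.find("h2").get_text(strip=True),
        "price": card.find(class_="price").get_text(strip=True),
        "link": card.find("a")["href"],
    })
# [{'name': "Chef's Knife", 'price': '$89', 'link': '/products/knife'}, ...]
```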
The NoneType trap
The most common BeautifulSoup error:
title = soup.find("h1").get_text()
# AttributeError: 'NoneType' object has no attribute 'get_text'
find returned None because no <h1> exists, and then you called a method on it. Three defensive patterns:
# 1. Check first
h1 = soup.find("h1")
title = h1.get_text(strip=True) if h1 else None
# 2. Walrus operator (Python 3.8+)
title = h1.get_text(strip=True) if (h1 := soup.find("h1")) else None
# 3. Helper
def safe_text(el):
    return el.get_text(strip=True) if el else None
title = safe_text(soup.find("h1"))
For production scrapers, the helper approach scales best.
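If you go the helper route, here is one way it might grow; safe_attr, the default parameter, and the sample markup are illustrative extensions, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>No heading or nav link here</p>", "html.parser")

def safe_text(el, default=None):
    # Text of a found tag, or a default when find()/select_one() returned None.
    return el.get_text(strip=True) if el else default

def safe_attr(el, attr, default=None):
    # Attribute of a found tag, or a default when the tag or attribute is missing.
    return el.get(attr, default) if el else default

safe_text(soup.find("h1"), default="(no title)")  # '(no title)'
safe_attr(soup.find("a"), "href")                 # None
```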
Searching by text content
soup.find("a", string="Next page") # exact string match
soup.find_all("a", string=re.compile(r"page")) # regex
soup.find(string=re.compile(r"\$\d+")) # find any text node matching
Useful when the structure is messy but you know the visible label.
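A runnable version of the idea, pulling a value back out via its visible label (the snippet and amounts are invented):

```python
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<a href="/p/2">Next page</a> <span>Total: $1,234</span>',
    "html.parser",
)

soup.find("a", string="Next page")["href"]  # '/p/2'
soup.find(string=re.compile(r"\$[\d,]+"))   # 'Total: $1,234' (a NavigableString)
```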
Filtering by callable
The most powerful form, pass any function returning bool:
def is_external_link(tag):
    return (
        tag.name == "a"
        and tag.get("href", "").startswith("http")
        and "scrapingcentral.com" not in tag.get("href", "")
    )
external = soup.find_all(is_external_link)
Almost any custom matching logic fits in a callable. Use it when CSS gets convoluted.
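Here is that predicate in action against a tiny invented page (the scrapingcentral.com domain comes from the example above):

```python
from bs4 import BeautifulSoup

def is_external_link(tag):
    # Same predicate as above: an <a> whose href points off-site.
    return (
        tag.name == "a"
        and tag.get("href", "").startswith("http")
        and "scrapingcentral.com" not in tag.get("href", "")
    )

soup = BeautifulSoup(
    '<a href="/about">About</a>'
    '<a href="https://scrapingcentral.com/docs">Docs</a>'
    '<a href="https://example.org">Partner</a>',
    "html.parser",
)

[a["href"] for a in soup.find_all(is_external_link)]  # ['https://example.org']
```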
find_parent, find_next_sibling, find_previous
Navigate the tree from a known anchor:
price_label = soup.find(string="Price")
# Find the value next to it
price_value = price_label.find_next("span") # next <span> in document order
price_row = price_label.find_parent("tr") # enclosing table row
This pattern, "find the label, then walk to the value", is endlessly useful on layouts where the data has no class hook.
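A self-contained sketch of that label-to-value walk, using an invented spec table:

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr><th>SKU</th><td><span>KN-100</span></td></tr>
  <tr><th>Price</th><td><span>$89.00</span></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

price_label = soup.find(string="Price")                        # the text node inside the <th>
price_label.find_next("span").get_text()                       # '$89.00'
price_label.find_parent("tr").find("td").get_text(strip=True)  # '$89.00'
```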
Performance tip
For a soup with thousands of nodes, find_all with strict filters is faster than broad select followed by Python-level filtering. If perf matters, prefer the most specific selector you can write.
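As a small illustration, both of these return the same links, but the second pushes the filtering into the matcher (the links are invented; the payoff only shows on large documents):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<a href="/a.pdf">A</a><a href="/b.html">B</a><a href="/c.pdf">C</a>',
    "html.parser",
)

# Broad query, then filtering in Python
broad = [a for a in soup.find_all("a") if a.get("href", "").endswith(".pdf")]

# The same filter pushed into the selector, so less work happens in Python
narrow = soup.select("a[href$='.pdf']")

assert [a["href"] for a in broad] == [a["href"] for a in narrow] == ["/a.pdf", "/c.pdf"]
```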
Hands-on lab
Visit /challenges/static/lists/cards. Use select to find every card, then for each card use find to extract the title, subtitle, and any visible badge. Try doing the same task with find_all instead of select to feel the difference. Confirm both approaches yield the same data.
Practice this lesson on Catalog108, our first-party scraping sandbox; the lab target is /challenges/static/lists/cards.
Quiz, check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.