BeautifulSoup Tree Navigation
Once you've found one element, you can walk to any other: parents, children, siblings, next, previous. This navigation API handles layouts that offer no clean selectors.
What you’ll learn
- Walk up to ancestors with `parent` and `find_parent`.
- Walk down to children with `children`, `descendants`, and `contents`.
- Move sideways with `next_sibling`, `previous_sibling`.
- Use `find_next` / `find_previous` to skip whitespace siblings cleanly.
Sometimes a page has no clean selector for the data you want, just a label, a layout, and an implicit relationship. "The price is the next cell after the label cell." "The author is the link in the parent of the byline." These are tree-walks, and BeautifulSoup has a complete API for them.
The seven navigation directions
Once you have an element, you can move in seven directions:
| Direction | Property / Method |
|---|---|
| Up | `el.parent`, `el.parents`, `el.find_parent("tag")` |
| Down (immediate) | `el.children`, `el.contents` |
| Down (all) | `el.descendants` |
| Sideways forward | `el.next_sibling`, `el.next_siblings`, `el.find_next_sibling()` |
| Sideways back | `el.previous_sibling`, `el.previous_siblings`, `el.find_previous_sibling()` |
| Anywhere forward | `el.next_element`, `el.find_next()` |
| Anywhere back | `el.previous_element`, `el.find_previous()` |
For each direction, BeautifulSoup gives you a "raw" version (`next_sibling`) and a "skip-whitespace" version (`find_next_sibling`). Almost always pick the second; see the trap below.
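A quick tour of the main directions, using a tiny made-up fragment (the `id` values are invented for illustration):

```python
from bs4 import BeautifulSoup

html = "<div id='a'><p id='b'>one</p><p id='c'>two</p></div>"
soup = BeautifulSoup(html, "html.parser")

b = soup.find("p", id="b")
print(b.parent["id"])                    # up one level: 'a'
print([t.name for t in b.parents])       # all ancestors: ['div', '[document]']
print(b.find_next_sibling().get_text())  # sideways: 'two'
print(b.find_next("p")["id"])            # anywhere forward: 'c'
```

Note that `parents` keeps going past your markup all the way to the `[document]` root, which is the `BeautifulSoup` object itself.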
The whitespace-sibling trap
<ul>
<li>First</li>
<li>Second</li>
<li>Third</li>
</ul>
Looks like three `<li>` siblings. The DOM disagrees: between each `</li>` and the next `<li>` there's a text node containing `\n` plus indentation. So:
first = soup.find("li")
print(first.next_sibling) # '\n ' (whitespace text node!)
print(first.next_sibling.next_sibling) # the second <li>
This trips up nearly every beginner. Use the find_next_sibling variant:
print(first.find_next_sibling()) # <li>Second</li>
print(first.find_next_sibling("li")) # explicit tag filter
With no `string` filter, the `find_*` methods match only tags, so plain whitespace text nodes are skipped automatically.
Going up: parents
price = soup.find(string="$14.99")
container = price.find_parent("article")
print(container["data-id"])
`parent` jumps one level up. `parents` iterates over every ancestor up to the document root. `find_parent("tag")` walks up to the first ancestor matching the filter: exactly the "wrap me in a container" pattern.
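A minimal sketch of the ancestor chain, assuming hypothetical markup where a price string sits inside a wrapper `<div>` inside an `<article>`:

```python
from bs4 import BeautifulSoup

html = '<article data-id="42"><div class="price-box"><span>$14.99</span></div></article>'
soup = BeautifulSoup(html, "html.parser")

price = soup.find(string="$14.99")
# Ancestor chain, innermost first
print([a.name for a in price.parents])          # ['span', 'div', 'article', '[document]']
# Jump straight to the first matching ancestor
print(price.find_parent("article")["data-id"])  # '42'
```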
Useful when the data has no class but is anchored by a stable label:
label = soup.find("dt", string="Author")
value = label.find_next_sibling("dd").get_text(strip=True)
That handles <dl><dt>Author</dt><dd>Alice</dd></dl> reliably no matter what classes the page applies.
Going down: children, contents, descendants
ul = soup.find("ul")
# Direct children only (includes whitespace text nodes)
for child in ul.children:
print(repr(child))
# Same but as a Python list
print(ul.contents)
# All descendants, depth-first
for d in ul.descendants:
print(type(d).__name__, repr(d)[:50])
`children` is an iterator. `contents` is the same data as a list. `descendants` yields everything nested at any depth.
To filter children by tag, use `ul.find_all("li", recursive=False)`; the `recursive=False` restricts the search to direct children.
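The difference matters as soon as lists nest. A small sketch with an invented nested list:

```python
from bs4 import BeautifulSoup

html = "<ul><li>a<ul><li>a1</li></ul></li><li>b</li></ul>"
soup = BeautifulSoup(html, "html.parser")

ul = soup.find("ul")
print(len(ul.find_all("li")))                   # 3: includes the nested <li>
print(len(ul.find_all("li", recursive=False)))  # 2: direct children only
```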
Going sideways
header = soup.find("h2", string="Specifications")
# All sibling elements after the header, ignoring text whitespace
for sib in header.find_next_siblings():
print(sib.name, sib.get_text(strip=True)[:50])
find_next_siblings() (plural) gives you every following sibling. Useful for "collect all paragraphs after this heading until the next heading":
collected = []
for sib in header.find_next_siblings():
if sib.name in ("h1", "h2", "h3"):
break # next section
collected.append(sib.get_text(strip=True))
This is the canonical pattern for parsing semi-structured documentation.
find_next vs find_next_sibling
Subtle but important:
- `find_next_sibling` only looks at siblings (same parent).
- `find_next` looks at every later element in document order, regardless of nesting.
<div>
<p>label</p>
<div>
<span>value</span>
</div>
</div>
label = soup.find("p", string="label")
label.find_next_sibling("span") # None, span isn't a sibling
label.find_next("span") # the span, searches anywhere after
When the label and value are in different containers, find_next is the tool.
Real example: scraping a definition list
<dl class="product-attributes">
<dt>Brand</dt><dd>Acme</dd>
<dt>Color</dt><dd>Yellow</dd>
<dt>Weight</dt><dd>340g</dd>
</dl>
attrs = {}
dl = soup.find("dl", class_="product-attributes")
for dt in dl.find_all("dt"):
dd = dt.find_next_sibling("dd")
attrs[dt.get_text(strip=True)] = dd.get_text(strip=True)
print(attrs)
# {'Brand': 'Acme', 'Color': 'Yellow', 'Weight': '340g'}
A CSS selector can match the `<dd>` elements, but it can't express "pair each `<dt>` with the `<dd>` immediately after it"; that pairing needs sibling navigation.
Real example: nested table extraction
<table>
<tr>
<td>Apple</td>
<td>$1.20</td>
<td>
<table>
<tr><td>Granny Smith</td><td>Fuji</td></tr>
</table>
</td>
</tr>
</table>
`soup.find("table").find_all("tr")` returns BOTH the outer and the inner `<tr>`, because `find_all` is recursive by default. To restrict the search to direct children:
top_tr = soup.find("table").find("tr", recursive=False)
# or
top_tr = soup.find("table").find_all("tr", recursive=False)[0]
recursive=False is the unsung hero of nested-structure parsing.
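A runnable sketch of the table above, assuming a parser that keeps `<tr>` as a direct child of `<table>` (true for `html.parser`; `html5lib` inserts a `<tbody>` in between):

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr>
    <td>Apple</td><td>$1.20</td>
    <td><table><tr><td>Granny Smith</td><td>Fuji</td></tr></table></td>
  </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

outer = soup.find("table")
print(len(outer.find_all("tr")))           # 2: outer row plus nested row
outer_rows = outer.find_all("tr", recursive=False)
print(len(outer_rows))                     # 1: only the top-level row
# Direct cells of the outer row; the third cell holds the nested table
cells = outer_rows[0].find_all("td", recursive=False)
print([c.get_text(strip=True) for c in cells[:2]])  # ['Apple', '$1.20']
```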
NavigableString, the text-node type
Text nodes in BeautifulSoup are NavigableString instances, not plain str. They have most of the same navigation API as Tag:
price_label = soup.find(string="Price")
print(type(price_label)) # NavigableString
print(price_label.parent.name) # the element containing the text
print(price_label.find_next("span").get_text())
You can grab a known label text and walk from there, perfect when the layout has no classes but stable visible labels.
Comments and special nodes
from bs4 import Comment
comments = soup.find_all(string=lambda x: isinstance(x, Comment))
Useful only occasionally, e.g., legacy sites that hide canonical data in HTML comments. Worth knowing about.
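A sketch of pulling data out of a comment, with made-up markup (the `sku:` convention here is invented for illustration). `Comment` is a subclass of `NavigableString`, so the usual navigation methods work on it:

```python
from bs4 import BeautifulSoup, Comment

html = "<div><!-- sku:ABC-123 --><span>Widget</span></div>"
soup = BeautifulSoup(html, "html.parser")

comment = soup.find(string=lambda s: isinstance(s, Comment))
print(comment.strip())                       # 'sku:ABC-123'
# Comments navigate like any other node
print(comment.find_next("span").get_text())  # 'Widget'
```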
A combined recipe
Suppose product specs are rendered as:
<section class="specs">
<h3>Specifications</h3>
<p>Brand: Acme</p>
<p>Weight: 340g</p>
<p>Color: Yellow</p>
<h3>Description</h3>
<p>Lorem ipsum...</p>
</section>
You want only the spec <p>s, not the description ones:
header = soup.find("h3", string="Specifications")
spec_paragraphs = []
for sib in header.find_next_siblings():
if sib.name == "h3":
break
if sib.name == "p":
spec_paragraphs.append(sib.get_text(strip=True))
print(spec_paragraphs)
CSS can't do this. Tree-walking handles it cleanly.
Hands-on lab
The /challenges/static/lists/nested page contains deeply nested list-of-list structures with no flat selector for every leaf. Use descendants, find_all with and without recursive=False, and sibling navigation to extract the leaf items along with their full parent-chain path. Compare your output to the page's visible structure.
Practice this lesson on Catalog108, our first-party scraping sandbox, at /challenges/static/lists/nested.
Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you'll see the explanation right after.