CSS Selectors vs XPath - When to Use Which
Compare CSS selectors and XPath for web scraping. Learn the syntax, strengths, and best use cases for each approach.
CSS selectors and XPath are the two main ways to target HTML elements when scraping. Both can locate any element on a page, but they differ in syntax, power, and readability.
Quick Comparison
| Feature | CSS Selectors | XPath |
|---|---|---|
| Syntax | Compact, familiar | Verbose, powerful |
| Direction | Down the tree and across to later siblings (`+`, `~`); no parent axis | Any direction (up, down, sideways) |
| Text matching | Not in standard CSS | `contains(text(), "...")` |
| Library support | BeautifulSoup, Parsel, Scrapy | lxml, Parsel, Scrapy |
| Learning curve | Easy (web dev standard) | Moderate |
| Performance | Rarely the bottleneck; Parsel and Scrapy translate CSS to XPath internally | Rarely the bottleneck |
CSS Selectors Cheat Sheet
from bs4 import BeautifulSoup
html = """
<div class="container">
  <h2 id="title">Products</h2>
  <ul class="items">
    <li class="item active" data-price="29.99">
      <a href="/product/1">ScraperAPI</a>
      <span class="badge">Popular</span>
    </li>
    <li class="item" data-price="19.99">
      <a href="/product/2">ScrapingAnt</a>
    </li>
    <li class="item" data-price="39.99">
      <a href="/product/3">Bright Data</a>
    </li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "lxml")
# By class
soup.select(".item") # All items
# By ID
soup.select_one("#title") # The h2
# By attribute
soup.select("li[data-price]") # All li with data-price
# Attribute value
soup.select('a[href^="/product"]') # href starts with /product
# Child combinator
soup.select("ul > li") # Direct children only
# Descendant
soup.select(".container a") # Any nested <a>
# Nth child
soup.select("li:nth-child(2)") # Second <li>
# Multiple classes
soup.select("li.item.active") # Has both classes
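The selectors above all move downward through the tree. As a rough sketch of what CSS can do in other directions, the snippet below uses the sibling combinators `+` and `~` plus the `:has()` pseudo-class, which Soup Sieve (the engine behind `select()` in Beautiful Soup 4.7 and later) supports. The markup is a trimmed copy of the sample above, the variable names are illustrative, and the built-in `html.parser` is used only so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample markup (assumption: same structure as above)
sample = """
<ul class="items">
  <li class="item active"><a href="/product/1">ScraperAPI</a><span class="badge">Popular</span></li>
  <li class="item"><a href="/product/2">ScrapingAnt</a></li>
  <li class="item"><a href="/product/3">Bright Data</a></li>
</ul>
"""
soup = BeautifulSoup(sample, "html.parser")

# Adjacent sibling: the <li> immediately after the active one
next_item = [li.a.text for li in soup.select("li.active + li")]

# General sibling: every <li> that follows the active one
later_items = [li.a.text for li in soup.select("li.active ~ li")]

# :has() filters an element by what it contains, here a direct child badge
badged_links = [a.text for a in soup.select("li:has(> span.badge) > a")]

print(next_item)
print(later_items)
print(badged_links)
```

CSS still has no way to step from an element to its parent and continue from there, which is where the XPath axes become useful.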
XPath Equivalents
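from lxml import html as lxml_html
# Reuses the `html` string from the CSS cheat sheet above; the alias keeps
# the lxml module from shadowing that variable
tree = lxml_html.fromstring(html)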
# By class (substring match: would also match e.g. class="item-list")
tree.xpath('//li[contains(@class, "item")]')
# By ID
tree.xpath('//*[@id="title"]')
# By attribute
tree.xpath('//li[@data-price]')
# Attribute value
tree.xpath('//a[starts-with(@href, "/product")]')
# Child
tree.xpath('//ul/li')
# Descendant
tree.xpath('//div[@class="container"]//a')
# Position
tree.xpath('//li[2]') # Second <li> (1-indexed!)
# Text content (no standard CSS equivalent)
tree.xpath('//a[contains(text(), "Scraper")]')
# Parent traversal (CSS has no parent axis)
tree.xpath('//span[@class="badge"]/parent::li/a/text()')
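Two of the expressions above are not exact matches for their CSS counterparts. `contains(@class, "item")` is a substring test, and a positional predicate such as `[2]` is evaluated per parent rather than across the whole document. The sketch below illustrates both with lxml on a small made-up document; the markup and variable names are assumptions chosen for the demonstration.

```python
from lxml import html

# Made-up markup: two lists, one class name that merely contains "item"
doc = html.fromstring("""
<div>
  <ul class="items">
    <li class="item">A</li>
    <li class="item-disabled">B</li>
  </ul>
  <ul>
    <li class="item">C</li>
    <li class="item">D</li>
  </ul>
</div>
""")

# Substring test: also matches class="item-disabled"
loose = doc.xpath('//li[contains(@class, "item")]/text()')

# Token test: pad the class list with spaces and look for " item "
strict = doc.xpath(
    '//li[contains(concat(" ", normalize-space(@class), " "), " item ")]/text()'
)

# //li[2] is evaluated per parent: the second <li> of every <ul>
second_in_each_list = doc.xpath('//li[2]/text()')

# (//li)[2] indexes the combined result: the second <li> in the document
second_overall = doc.xpath('(//li)[2]/text()')

print(loose, strict, second_in_each_list, second_overall)
```

The token test is the closer equivalent of the CSS selector `li.item`. For position, note that `li:nth-child(2)` counts all sibling elements while `li[2]` counts only `<li>` siblings, so `li:nth-of-type(2)` is the nearer CSS match.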
When CSS Selectors Win
CSS selectors are best for straightforward selection:
from bs4 import BeautifulSoup
import requests
response = requests.get("https://quotes.toscrape.com/", timeout=15)
soup = BeautifulSoup(response.text, "lxml")
# Clean, readable selectors
for quote in soup.select("div.quote"):
    text = quote.select_one("span.text").text
    author = quote.select_one("small.author").text
    tags = [t.text for t in quote.select("a.tag")]
    print(f"{author}: {text[:50]}... [{', '.join(tags)}]")
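`select_one()` returns `None` when nothing matches, so the `.text` access in the loop above raises `AttributeError` for any block that lacks one of the elements. A minimal sketch of a guard follows, using a hypothetical helper named `text_of` and inline markup instead of a live request; neither is part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

def text_of(node, selector, default=""):
    """Return the stripped text of the first match, or `default` if none."""
    match = node.select_one(selector)
    return match.get_text(strip=True) if match is not None else default

# Made-up markup: the second quote has no author element
sample = """
<div class="quote"><span class="text">First quote</span><small class="author">Ada</small></div>
<div class="quote"><span class="text">Second quote</span></div>
"""
soup = BeautifulSoup(sample, "html.parser")

rows = [
    (text_of(q, "small.author", default="unknown"), text_of(q, "span.text"))
    for q in soup.select("div.quote")
]
print(rows)
```

Whether a missing element should fall back to a default or stop the scrape depends on the job; the helper only makes that choice explicit.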
When XPath Wins
XPath shines when you need to match on text content or navigate upward from an element:
from lxml import html
import requests
response = requests.get("https://quotes.toscrape.com/", timeout=15)
tree = html.fromstring(response.text)
# Find quotes containing the word "world"
world_quotes = tree.xpath(
    '//span[@class="text"][contains(text(), "world")]/text()'
)
for q in world_quotes:
    print(q[:80])
# Navigate UP: from each "love" tag to its enclosing quote, then down to the author
authors = tree.xpath(
    '//a[@class="tag"][text()="love"]/ancestor::div[@class="quote"]'
    '//small[@class="author"]/text()'
)
print(f"Authors who wrote about love: {authors}")
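The `contains(text(), ...)` pattern works when the element holds a single text node. In XPath 1.0, `contains()` converts a node set to the string value of its first node only, so text that sits after a nested tag is missed. The sketch below shows the difference on made-up markup; using `.` instead of `text()` tests the element's full string value.

```python
from lxml import html

# Made-up markup: the first span's text is split by a nested <em>
doc = html.fromstring(
    '<div>'
    '<span class="text">Hello <em>big</em> world</span>'
    '<span class="text">world peace</span>'
    '</div>'
)

# text() yields one node per text run; contains() inspects only the first
first_text_node_only = doc.xpath('//span[contains(text(), "world")]')

# "." is the element's full string value, including nested elements
full_string_value = doc.xpath('//span[contains(., "world")]')

print(len(first_text_node_only))
print(len(full_string_value))
```

Here the first expression finds only the second span, while the second expression finds both.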
Recommendation
Start with CSS selectors for most tasks. Switch to XPath when you need to match by text content, navigate to parent elements, or use complex conditional logic.
Next Steps
- Deep dive into XPath expressions
- Parse complex HTML structures
- Extract structured data from tables