CSS Selectors vs XPath: When to Use Which

Compare CSS selectors and XPath for web scraping. Learn the syntax, strengths, and best use cases for each approach.

CSS selectors and XPath are the two main ways to target HTML elements when scraping. Both can locate any element on a page, but they differ in syntax, power, and readability.

Quick Comparison

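Feature	CSS Selectors	XPath
Syntax	Compact, familiar	Verbose, more expressive
Direction	Downward and to following siblings (`:has()` can select by descendants)	Any axis: ancestors, descendants, preceding and following siblings
Text matching	Not in standard CSS (BeautifulSoup's soupsieve adds `:-soup-contains()`)	`contains(text(), "...")`
Library support	BeautifulSoup (via soupsieve), lxml (via cssselect), Parsel, Scrapy	lxml, Parsel, Scrapy
Learning curve	Easy (web dev standard)	Moderate
Performance	Converted to XPath by lxml and Parsel, so differences are negligible	Evaluated natively by lxml and Parsel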

CSS Selectors Cheat Sheet

from bs4 import BeautifulSoup

html = """
<div class="container">
  <h2 id="title">Products</h2>
  <ul class="items">
    <li class="item active" data-price="29.99">
      <a href="/product/1">ScraperAPI</a>
      <span class="badge">Popular</span>
    </li>
    <li class="item" data-price="19.99">
      <a href="/product/2">ScrapingAnt</a>
    </li>
    <li class="item" data-price="39.99">
      <a href="/product/3">Bright Data</a>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "lxml")

# By class
soup.select(".item")              # All items

# By ID
soup.select_one("#title")         # The h2

# By attribute
soup.select("li[data-price]")     # All li with data-price

# Attribute value
soup.select('a[href^="/product"]') # href starts with /product

# Child combinator
soup.select("ul > li")            # Direct children only

# Descendant
soup.select(".container a")       # Any nested <a>

# Nth child
soup.select("li:nth-child(2)")    # Second <li>

# Multiple classes
soup.select("li.item.active")     # Has both classes

XPath Equivalents

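from lxml import html as lxml_html

# `html` is the sample markup string from the CSS example above;
# importing the module as lxml_html avoids shadowing that variable
tree = lxml_html.fromstring(html)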

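# By class: contains() is a substring test, so it also matches "items" or "item-new"
tree.xpath('//li[contains(@class, "item")]')
# Whole-token class match, equivalent to CSS ".item"
tree.xpath('//li[contains(concat(" ", normalize-space(@class), " "), " item ")]')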

# By ID
tree.xpath('//*[@id="title"]')

# By attribute
tree.xpath('//li[@data-price]')

# Attribute value
tree.xpath('//a[starts-with(@href, "/product")]')

# Child
tree.xpath('//ul/li')

# Descendant
tree.xpath('//div[@class="container"]//a')

# Position (1-indexed!)
tree.xpath('//li[2]')             # Second <li> within each parent
tree.xpath('(//li)[2]')           # Second <li> in the whole document

# Text content (no standard CSS equivalent)
tree.xpath('//a[contains(text(), "Scraper")]')

# Parent traversal with the parent:: axis (CSS can only approximate it with :has())
tree.xpath('//span[@class="badge"]/parent::li/a/text()')
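Two of the XPath equivalents behave differently from their CSS counterparts: `contains(@class, "item")` is a plain substring test, and `//li[2]` applies the position within each parent rather than across the whole document. A short sketch on made-up markup showing both, with the whole-token class idiom (the same pattern cssselect generates for `.item`) and the parenthesized `(//li)[2]` form:

```python
from lxml import html as lxml_html

sample = """
<div>
  <ul class="items">
    <li class="item">A1</li>
    <li class="item-sold-out">A2</li>
  </ul>
  <ul class="items">
    <li class="item">B1</li>
    <li class="item">B2</li>
  </ul>
</div>
"""
tree = lxml_html.fromstring(sample)

# Substring match: also catches class="item-sold-out"
loose = tree.xpath('//li[contains(@class, "item")]/text()')

# Whole-token match: pad the normalized class list with spaces, then look for " item "
strict = tree.xpath(
    '//li[contains(concat(" ", normalize-space(@class), " "), " item ")]/text()'
)

# //li[2] means "the second <li> child of each parent", so it matches once per list
per_parent = tree.xpath("//li[2]/text()")

# Parentheses apply the position to the document-wide result set instead
document_wide = tree.xpath("(//li)[2]/text()")

print(loose, strict, per_parent, document_wide)
```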

When CSS Selectors Win

CSS selectors are the better choice for straightforward, structure-based selection by tag, class, ID, or attribute:

from bs4 import BeautifulSoup
import requests

response = requests.get("https://quotes.toscrape.com/", timeout=15)
soup = BeautifulSoup(response.text, "lxml")

# Clean, readable selectors
for quote in soup.select("div.quote"):
    text = quote.select_one("span.text").text
    author = quote.select_one("small.author").text
    tags = [t.text for t in quote.select("a.tag")]
    print(f"{author}: {text[:50]}... [{', '.join(tags)}]")
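`select_one()` returns `None` when nothing matches, so chaining `.text` raises `AttributeError` as soon as one quote lacks an element or the page layout changes. A small defensive sketch; the helper name `text_or_default` is hypothetical, and the markup is a simplified stand-in rather than the real quotes.toscrape.com HTML:

```python
from bs4 import BeautifulSoup


def text_or_default(parent, selector, default=""):
    """Hypothetical helper: stripped text of the first match, or `default` if none."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else default


# Simplified markup: the second quote has no author element
html_doc = """
<div class="quote"><span class="text">First</span><small class="author">A</small></div>
<div class="quote"><span class="text">Second</span></div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

rows = [
    (text_or_default(q, "small.author", "unknown"), text_or_default(q, "span.text"))
    for q in soup.select("div.quote")
]
print(rows)
```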

When XPath Wins

XPath shines when you need to match on text content or navigate upward to ancestor elements:

from lxml import html
import requests

response = requests.get("https://quotes.toscrape.com/", timeout=15)
tree = html.fromstring(response.text)

# Find quotes containing the word "world"
world_quotes = tree.xpath(
    '//span[@class="text"][contains(text(), "world")]/text()'
)
for q in world_quotes:
    print(q[:80])

# Navigate UP: from each "love" tag, climb to its quote and read the author
authors = tree.xpath(
    '//a[@class="tag"][text()="love"]/ancestor::div[@class="quote"]'
    '//small[@class="author"]/text()'
)
print(f"Authors who wrote about love: {authors}")
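One subtlety matters for text matching: `text()` is a node-set of text nodes, and XPath 1.0's `contains()` converts a node-set to the string value of its first node only. When an element mixes text with child tags, matching on `.` (the element's full string value) is usually what you want. A minimal sketch with made-up markup:

```python
from lxml import html as lxml_html

tree = lxml_html.fromstring("<div><p>Hello <b>big</b> world</p></div>")

# text() yields "Hello " and " world"; contains() only sees the first one
by_text_node = tree.xpath('//p[contains(text(), "world")]')

# "." is the full string value of <p>: "Hello big world"
by_string_value = tree.xpath('//p[contains(., "world")]')

print(len(by_text_node), len(by_string_value))
```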

Recommendation

Start with CSS selectors for most tasks. Switch to XPath when you need to match by text content, navigate to parent or ancestor elements, or combine conditions on position, attributes, and text in a single expression.

Next Steps

Deep dive into XPath expressions
Parse complex HTML structures
Extract structured data from tables