
CSS Selectors vs XPath - When to Use Which

Compare CSS selectors and XPath for web scraping. Learn the syntax, strengths, and best use cases for each approach.


CSS selectors and XPath are the two main ways to target HTML elements when scraping. Both can locate elements by tag, class, ID, attribute, and position, but they differ in syntax, expressive power, and readability.

Quick Comparison

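Feature | CSS Selectors | XPath
Syntax | Compact, familiar from web development | More verbose, more expressive
Direction | Down the tree and to following siblings (+, ~) | Any axis: children, descendants, parents, ancestors, siblings
Text matching | Not in standard CSS (some libraries add extensions) | contains(text(), "...")
Library support | BeautifulSoup, Parsel, Scrapy, lxml (via cssselect) | lxml, Parsel, Scrapy
Learning curve | Easy (web dev standard) | Moderate
Performance | Depends on the library; Parsel and lxml translate CSS to XPath internally | Depends on the library; rarely the bottleneck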

CSS Selectors Cheat Sheet

from bs4 import BeautifulSoup

html = """
<div class="container">
  <h2 id="title">Products</h2>
  <ul class="items">
    <li class="item active" data-price="29.99">
      <a href="/product/1">ScraperAPI</a>
      <span class="badge">Popular</span>
    </li>
    <li class="item" data-price="19.99">
      <a href="/product/2">ScrapingAnt</a>
    </li>
    <li class="item" data-price="39.99">
      <a href="/product/3">Bright Data</a>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "lxml")

# By class
soup.select(".item")              # All items

# By ID
soup.select_one("#title")         # The h2

# By attribute
soup.select("li[data-price]")     # All li with data-price

# Attribute value
soup.select('a[href^="/product"]') # href starts with /product

# Child combinator
soup.select("ul > li")            # Direct children only

# Descendant
soup.select(".container a")       # Any nested <a>

# Nth child
soup.select("li:nth-child(2)")    # Second <li>

# Multiple classes
soup.select("li.item.active")     # Has both classes
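Beyond the basics above, BeautifulSoup's selector engine (Soup Sieve) also handles sibling combinators, `:has()`, and a non-standard text pseudo-class. The sketch below is illustrative only: it assumes a reasonably recent `beautifulsoup4` with Soup Sieve 2.1 or later, and the `markup` string and variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical product list used only for this illustration.
markup = """
<ul class="items">
  <li class="item active"><a href="/product/1">ScraperAPI</a><span class="badge">Popular</span></li>
  <li class="item"><a href="/product/2">ScrapingAnt</a></li>
  <li class="item"><a href="/product/3">Bright Data</a></li>
</ul>
"""
soup = BeautifulSoup(markup, "html.parser")

# Sideways: the general sibling combinator selects later siblings of li.active.
later_items = [li.a.get_text() for li in soup.select("li.active ~ li")]

# Upward in effect: :has() picks the <li> that contains a badge, then its link.
badge_link = soup.select_one("li:has(> span.badge) > a")["href"]

# Text match: :-soup-contains() is a Soup Sieve extension, not standard CSS.
scraper_links = [a.get_text() for a in soup.select('a:-soup-contains("Scraper")')]

print(later_items, badge_link, scraper_links)
```

These extensions are specific to the selector engine, so the same selectors may not work in other CSS implementations.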

XPath Equivalents

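from lxml import html as lxml_html  # Aliased so it does not clash with the `html` string above

# `html` is the sample markup string defined in the CSS cheat sheet
tree = lxml_html.fromstring(html)

# By class (substring test: would also match a class such as "items";
# use contains(concat(" ", normalize-space(@class), " "), " item ") for an exact token)
tree.xpath('//li[contains(@class, "item")]')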

# By ID
tree.xpath('//*[@id="title"]')

# By attribute
tree.xpath('//li[@data-price]')

# Attribute value
tree.xpath('//a[starts-with(@href, "/product")]')

# Child
tree.xpath('//ul/li')

# Descendant
tree.xpath('//div[@class="container"]//a')

# Position
tree.xpath('//li[2]')             # Second <li> among its siblings (XPath counts from 1)

# Text content (no standard CSS equivalent)
tree.xpath('//a[contains(text(), "Scraper")]')

# Parent traversal (awkward or unavailable in CSS, depending on the library)
tree.xpath('//span[@class="badge"]/parent::li/a/text()')
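One difference worth seeing in action: `contains(@class, "item")` is a substring test, while the CSS selector `.item` matches a whole class token. The following is a minimal sketch, assuming `lxml` is installed; the `snippet` markup is hypothetical and chosen so that several class names share the substring "item".

```python
from lxml import html as lxml_html

# Hypothetical markup: "items", "item-list", and "item" all contain "item".
snippet = """
<ul class="items">
  <li class="item active">A</li>
  <li class="item-list">B</li>
  <li class="item">C</li>
</ul>
"""
tree = lxml_html.fromstring(snippet)

# Substring test: matches any class attribute that contains "item" anywhere.
loose = tree.xpath('//*[contains(@class, "item")]')

# Token test: pad the normalized class list with spaces, then look for " item ".
strict = tree.xpath(
    '//*[contains(concat(" ", normalize-space(@class), " "), " item ")]'
)

loose_tags = [(el.tag, el.get("class")) for el in loose]
strict_tags = [(el.tag, el.get("class")) for el in strict]
print(loose_tags)
print(strict_tags)
```

The token form is longer, but it behaves like the CSS class selector and avoids accidental matches on similarly named classes.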

When CSS Selectors Win

CSS selectors are best for straightforward selection:

from bs4 import BeautifulSoup
import requests

response = requests.get("https://quotes.toscrape.com/", timeout=15)
response.raise_for_status()  # Fail early on HTTP errors instead of parsing an error page
soup = BeautifulSoup(response.text, "lxml")

# Clean, readable selectors
for quote in soup.select("div.quote"):
    text = quote.select_one("span.text").text
    author = quote.select_one("small.author").text
    tags = [t.text for t in quote.select("a.tag")]
    print(f"{author}: {text[:50]}... [{', '.join(tags)}]")
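On real pages some elements are occasionally missing, and `select_one` returns `None` in that case, so calling `.text` on the result raises an `AttributeError`. Below is a small defensive sketch that runs offline; the `sample` markup and the `text_or_none` helper are hypothetical names introduced for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like a quotes page; the second quote has no author.
sample = """
<div class="quote"><span class="text">First quote</span>
  <small class="author">Ada</small><a class="tag">life</a><a class="tag">love</a></div>
<div class="quote"><span class="text">Second quote</span></div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(sample, "html.parser")
rows = [
    {
        "text": text_or_none(quote, "span.text"),
        "author": text_or_none(quote, "small.author"),
        "tags": [t.get_text(strip=True) for t in quote.select("a.tag")],
    }
    for quote in soup.select("div.quote")
]
print(rows)
```

Returning `None` for a missing field keeps one incomplete record from stopping the whole scrape; whether to skip or log such records is a design choice left to the caller.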

When XPath Wins

XPath shines when you need text-based matching or upward navigation:

from lxml import html
import requests

response = requests.get("https://quotes.toscrape.com/", timeout=15)
response.raise_for_status()  # Fail early on HTTP errors instead of parsing an error page
tree = html.fromstring(response.text)

# Find quotes containing the word "world"
world_quotes = tree.xpath(
    '//span[@class="text"][contains(text(), "world")]/text()'
)
for q in world_quotes:
    print(q[:80])

# Navigate UP: find the author of a specific tag
authors = tree.xpath(
    '//a[@class="tag"][text()="love"]/ancestor::div[@class="quote"]'
    '//small[@class="author"]/text()'
)
print(f"Authors who wrote about love: {authors}")
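The same axes can be tried without a network request. This is an offline sketch, assuming `lxml` is installed; the `sample` markup is hypothetical and only imitates the structure of a quotes page. It also shows sideways movement with `following-sibling`.

```python
from lxml import html as lxml_html

# Hypothetical markup imitating two quote blocks.
sample = """
<div>
  <div class="quote"><span class="text">Hello world</span>
    <small class="author">Ada</small><a class="tag">love</a></div>
  <div class="quote"><span class="text">Goodbye</span>
    <small class="author">Bob</small><a class="tag">life</a></div>
</div>
"""
tree = lxml_html.fromstring(sample)

# Match on text, then move sideways to a following sibling of the matched span.
world_authors = tree.xpath(
    '//span[@class="text"][contains(text(), "world")]'
    '/following-sibling::small[@class="author"]/text()'
)

# Start at a tag link, climb to the enclosing quote, then descend to its text.
love_quotes = tree.xpath(
    '//a[@class="tag"][text()="love"]/ancestor::div[@class="quote"]'
    '/span[@class="text"]/text()'
)

print(world_authors, love_quotes)
```

Note that `@class="quote"` is an exact string comparison, so it only works when the element carries that single class and nothing else.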

Recommendation

Start with CSS selectors for most tasks. Switch to XPath when you need to match by text content, navigate to parent elements, or use complex conditional logic.

Next Steps

  • Deep dive into XPath expressions
  • Parse complex HTML structures
  • Extract structured data from tables