Scraping Central is reader-supported. When you buy through links on our site, we may earn an affiliate commission.

F6 · beginner · 5 min read

CSS Selectors: Complete Reference

Every CSS selector you need to know, organised by what you'll actually use them for in scrapers.

What you’ll learn

  • Write selectors that target elements by tag, class, id, attribute, and combinations of all four.
  • Use combinators (descendant, child, sibling) correctly.
  • Use position pseudo-classes (`:nth-child`, `:first-of-type`, `:last-child`) to pick specific items in a list.
  • Write resilient selectors that survive minor markup changes.

CSS selectors are the universal language of "find me this element on the page." Nearly every scraping library supports them: BeautifulSoup's select(), lxml.cssselect, Symfony DomCrawler's filter(), Playwright's locators. Learn them once, use them everywhere.

The five basics

Selector          Matches                                        Example
tag               All elements with that tag name                a (every link)
.class            Elements with that class                       .product
#id               The element with that id                       #main-banner
[attr]            Elements that have the attribute               [data-id]
[attr="value"]    Elements with the attribute set exactly to value   [data-id="42"]
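To see the five basics side by side, here is a minimal sketch that counts matches for each one. It assumes BeautifulSoup 4 is installed (its select() runs on the soupsieve engine) and uses the stdlib "html.parser"; the markup is invented for illustration.

```python
from bs4 import BeautifulSoup

# Invented markup that exercises each of the five basic selectors.
HTML = """
<div id="main-banner">Sale!</div>
<div class="product" data-id="42"><a href="/p/42">Lamp</a></div>
<div class="product" data-id="7"><a href="/p/7">Desk</a></div>
<div class="product"><a href="/p/chair">Chair</a></div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def count(selector: str) -> int:
    """Return how many elements in the toy document match the selector."""
    return len(soup.select(selector))


for sel in ["a", ".product", "#main-banner", "[data-id]", '[data-id="42"]']:
    print(f"{sel:<16} {count(sel)}")
```

The counts come out as 3, 3, 1, 2 and 1: the third product has no data-id, so it drops out of both attribute selectors.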

Chain them with no spaces in between (a space would turn the selector into a descendant match):

a.btn.primary  /* an <a> that has both .btn and .primary */
input[type="email"]  /* an <input> with type="email" */
div#root.dark-theme  /* the <div> that has id="root" AND class .dark-theme */
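The space really does matter. This sketch (assuming BeautifulSoup 4 with "html.parser"; markup invented) contrasts the compound selector a.btn.primary with the descendant selector .btn .primary on the same document.

```python
from bs4 import BeautifulSoup

# Invented markup: one link carries both classes; another .primary link
# only sits *inside* a .btn element.
HTML = """
<a class="btn primary" href="/buy">Buy</a>
<a class="btn" href="/info">Info</a>
<span class="btn"><a class="primary" href="/nested">Nested</a></span>
"""
soup = BeautifulSoup(HTML, "html.parser")

# No space: a single <a> must satisfy every part of the compound selector.
compound = [a["href"] for a in soup.select("a.btn.primary")]

# With a space: any .primary element that has a .btn ancestor.
descendant = [el["href"] for el in soup.select(".btn .primary")]

print(compound)    # ['/buy']
print(descendant)  # ['/nested']
```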

The combinators

Combinator          Symbol     Meaning
Descendant          (space)    Any descendant, at any depth
Child               >          Direct child only
Adjacent sibling    +          The immediately following sibling
General sibling     ~          Any following sibling

Examples on a typical product page:

article.product .price  /* .price anywhere under article.product */
article.product > .price  /* .price ONLY if it's a direct child */
h2 + p  /* the <p> immediately after an <h2> */
h2 ~ p  /* any <p> after an <h2> under the same parent */

The descendant combinator (space) is the most common; child (>) is what you reach for when nested duplicates trip you up.
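Here is the "nested duplicates" problem in miniature: a product card that also contains a related-items widget with its own .price. A minimal sketch, assuming BeautifulSoup 4 and "html.parser"; the class names are invented.

```python
from bs4 import BeautifulSoup

# Invented card: the second .price belongs to a nested "related" widget.
HTML = """
<article class="product">
  <h2>Desk lamp</h2>
  <p>Warm light.</p>
  <p>Dimmable.</p>
  <p class="price">$25</p>
  <div class="related">
    <p class="price">$9</p>
  </div>
</article>
"""
soup = BeautifulSoup(HTML, "html.parser")


def texts(selector: str) -> list[str]:
    """Stripped text of every match, in document order."""
    return [el.get_text(strip=True) for el in soup.select(selector)]


print(texts("article.product .price"))    # ['$25', '$9']  any depth
print(texts("article.product > .price"))  # ['$25']        direct child only
print(texts("h2 + p"))                     # ['Warm light.']
print(texts("h2 ~ p"))                     # ['Warm light.', 'Dimmable.', '$25']
```

The general sibling combinator skips the $9 paragraph because it lives inside div.related, not alongside the <h2>.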

Attribute selectors with operators

Beyond plain [attr="value"], the operators give you partial matching:

Selector        Matches
[attr^="x"]     Attribute starts with "x"
[attr$="x"]     Attribute ends with "x"
[attr*="x"]     Attribute contains "x" anywhere
[attr~="x"]     Attribute is a space-separated list containing the word "x" (mostly for class)
[attr|="x"]     Attribute equals "x" or starts with "x-" (mostly for lang)
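The last two operators are the ones people misremember, so this sketch runs all five against invented markup. It assumes BeautifulSoup 4 with "html.parser"; note that ~= matches whole space-separated words only, and |= matches a language prefix only when followed by a hyphen.

```python
from bs4 import BeautifulSoup

# Invented markup: three links and three lang-tagged paragraphs.
HTML = """
<a href="https://example.com/a.pdf" class="doc download">A</a>
<a href="/local/b.pdf" class="doc">B</a>
<a href="https://cdn.example.com/c.png" class="download-link">C</a>
<p lang="en-GB">Colour</p>
<p lang="en">Color</p>
<p lang="english">English</p>
"""
soup = BeautifulSoup(HTML, "html.parser")


def texts(selector: str) -> list[str]:
    return [el.get_text() for el in soup.select(selector)]


print(texts('a[href^="https://"]'))   # ['A', 'C']
print(texts('a[href$=".pdf"]'))       # ['A', 'B']
print(texts('a[href*="cdn"]'))        # ['C']
print(texts('a[class~="download"]'))  # ['A']  not "download-link"
print(texts('p[lang|="en"]'))         # ['Colour', 'Color']  not "english"
```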

Genuinely useful:

a[href^="https://"]  /* absolute HTTPS links (usually, but not always, external) */
a[href$=".pdf"]  /* PDF downloads */
img[src*="cdn.example.com"]  /* images on a specific CDN */
input[name="csrf_token"]  /* the CSRF input, exact match */
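In practice you usually want an attribute's value, not the element itself. A minimal sketch, assuming BeautifulSoup 4 with "html.parser"; the form, token value, and file names are invented. select_one() returns None when nothing matches, so the token lookup guards against a missing input.

```python
from bs4 import BeautifulSoup

# Invented page with a login form and a few links.
HTML = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="email" name="email">
</form>
<a href="/files/report.pdf">Report</a>
<a href="https://other.example/brochure.pdf">Brochure</a>
<a href="/about">About</a>
"""
soup = BeautifulSoup(HTML, "html.parser")

# Every PDF download link, relative or absolute.
pdf_links = [a["href"] for a in soup.select('a[href$=".pdf"]')]

# The CSRF token value, or None if the input is missing.
token_input = soup.select_one('input[name="csrf_token"]')
token = token_input["value"] if token_input else None

print(pdf_links)  # ['/files/report.pdf', 'https://other.example/brochure.pdf']
print(token)      # abc123
```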

Pseudo-classes for position

The position pseudo-classes are what let you say "the third product" or "every other row":

Pseudo-class          Matches
:first-child          First child of its parent
:last-child           Last child of its parent
:nth-child(n)         The Nth child (1-indexed)
:nth-child(2n)        Every even-positioned child
:nth-child(2n+1)      Every odd-positioned child
:nth-last-child(n)    The Nth child, counting from the end
:first-of-type        First of that tag among siblings
:last-of-type         Last of that tag among siblings
:nth-of-type(n)       Nth of that tag among siblings (usually what you want)
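On a homogeneous list the position pseudo-classes behave exactly as the table says. This sketch (assuming BeautifulSoup 4 with "html.parser") builds an invented five-item list and picks items out of it.

```python
from bs4 import BeautifulSoup

# Invented list: <li>item 1</li> ... <li>item 5</li>
HTML = "<ul>" + "".join(f"<li>item {i}</li>" for i in range(1, 6)) + "</ul>"
soup = BeautifulSoup(HTML, "html.parser")


def texts(selector: str) -> list[str]:
    return [li.get_text() for li in soup.select(selector)]


print(texts("li:first-child"))        # ['item 1']
print(texts("li:last-child"))         # ['item 5']
print(texts("li:nth-child(3)"))       # ['item 3']
print(texts("li:nth-child(2n)"))      # ['item 2', 'item 4']
print(texts("li:nth-child(2n+1)"))    # ['item 1', 'item 3', 'item 5']
print(texts("li:nth-last-child(2)"))  # ['item 4']
```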

The :nth-child vs :nth-of-type distinction trips everyone up. :nth-child(1) means "the first child of the parent, if it happens to be this tag." :nth-of-type(1) means "the first sibling that IS this tag." When the parent has mixed children, you almost always want :nth-of-type.
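A card whose first child is a heading shows the difference. A minimal sketch, assuming BeautifulSoup 4 with "html.parser"; the markup is invented.

```python
from bs4 import BeautifulSoup

# Invented card whose first child is an <h3>, not a <p>.
HTML = """
<div class="card">
  <h3>Title</h3>
  <p>First paragraph</p>
  <p>Second paragraph</p>
</div>
"""
soup = BeautifulSoup(HTML, "html.parser")


def texts(selector: str) -> list[str]:
    return [el.get_text() for el in soup.select(selector)]


by_child_1 = texts("div.card p:nth-child(1)")   # []: child 1 is the <h3>
by_type_1 = texts("div.card p:nth-of-type(1)")  # ['First paragraph']
by_child_2 = texts("div.card p:nth-child(2)")   # ['First paragraph']
print(by_child_1, by_type_1, by_child_2)
```

p:nth-child(1) silently matches nothing here, which is exactly the kind of empty result that is easy to miss in a scraper.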

Pseudo-classes that aren't position

Pseudo-class        Matches                                        Scraping use
:not(selector)      Elements NOT matching the inner selector       Exclude promoted items: .product:not(.sponsored)
:has(selector)      Elements that contain a matching descendant    tr:has(td.in-stock) (newer; supported in BeautifulSoup, Playwright, and lxml with cssselect 1.2+)
:contains("text")   Elements containing text (non-standard jQuery extension; BeautifulSoup :-soup-contains() or string=, Playwright text=)    Brittle, use sparingly
:empty              Elements with no child elements and no text    Identify placeholder rows
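Here are the three standard ones from the table on an invented listing table. This is a sketch assuming BeautifulSoup 4 with "html.parser" (soupsieve supports :not(), :has() and :empty); the row classes are hypothetical.

```python
from bs4 import BeautifulSoup

# Invented listing: one sponsored row, one sold-out row, one placeholder row.
HTML = """
<table>
  <tr class="product"><td>Lamp</td><td class="in-stock">yes</td></tr>
  <tr class="product sponsored"><td>Ad lamp</td><td class="in-stock">yes</td></tr>
  <tr class="product"><td>Desk</td><td class="sold-out">no</td></tr>
  <tr class="placeholder"><td></td></tr>
</table>
"""
soup = BeautifulSoup(HTML, "html.parser")


def first_cells(selector: str) -> list[str]:
    """Text of the first <td> in each matching row."""
    return [row.td.get_text() for row in soup.select(selector)]


not_sponsored = first_cells("tr.product:not(.sponsored)")  # ['Lamp', 'Desk']
in_stock = first_cells("tr:has(td.in-stock)")               # ['Lamp', 'Ad lamp']
empty_cells = len(soup.select("td:empty"))                  # 1
print(not_sponsored, in_stock, empty_cells)
```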

:not() is the workhorse. :contains() is not standard CSS, so different libraries spell it differently:

# BeautifulSoup
soup.select('h2:-soup-contains("Free shipping")')

# Playwright
page.locator("h2", has_text="Free shipping")

# lxml.cssselect accepts :contains("...") as a non-standard cssselect extension; XPath's contains() is the portable fallback

Treat text-based matching as a last resort; prefer class, id, or data-*.
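The reason is that copy changes far more often than markup. This sketch (assuming BeautifulSoup 4 with "html.parser"; the shipping-promo class and both banner texts are invented) runs a text-based selector and a class-based selector against the same banner before and after a copy edit.

```python
from bs4 import BeautifulSoup

# Two invented versions of one banner: the wording changed, the class did not.
OLD = '<h2 class="shipping-promo">Free shipping on orders over $50</h2>'
NEW = '<h2 class="shipping-promo">Ships free on orders over $50</h2>'

results = {}
for label, html in [("old", OLD), ("new", NEW)]:
    soup = BeautifulSoup(html, "html.parser")
    by_text = soup.select('h2:-soup-contains("Free shipping")')
    by_class = soup.select("h2.shipping-promo")
    results[label] = (len(by_text), len(by_class))

print(results)  # {'old': (1, 1), 'new': (0, 1)}
```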

Putting it together: a real selector

Suppose you want to extract the price of every non-promoted product on a listing, but only those in stock:

article.product:not(.sponsored):has(.in-stock-badge) .price

Read it left to right:

  • article.product: every product card
  • :not(.sponsored): except the promoted ones
  • :has(.in-stock-badge): that contain an in-stock badge
  • .price: then the .price element anywhere inside each remaining card

One line, ~60 characters, replaces 15 lines of imperative DOM-walking.
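For comparison, here is the same filter written both ways against an invented listing. It is a sketch assuming BeautifulSoup 4 with "html.parser"; the loop is one plausible imperative equivalent, not the only one.

```python
from bs4 import BeautifulSoup

# Invented listing: one sponsored card, one card without a stock badge.
HTML = """
<article class="product"><span class="in-stock-badge">In stock</span><p class="price">$10</p></article>
<article class="product sponsored"><span class="in-stock-badge">In stock</span><p class="price">$99</p></article>
<article class="product"><p class="price">$20</p></article>
<article class="product"><span class="in-stock-badge">In stock</span><p class="price">$30</p></article>
"""
soup = BeautifulSoup(HTML, "html.parser")

# Declarative: one selector does all the filtering.
SELECTOR = "article.product:not(.sponsored):has(.in-stock-badge) .price"
declarative = [p.get_text() for p in soup.select(SELECTOR)]

# Imperative: walk the cards and apply each condition by hand.
imperative = []
for card in soup.find_all("article", class_="product"):
    if "sponsored" in card.get("class", []):
        continue  # skip promoted cards
    if card.find(class_="in-stock-badge") is None:
        continue  # skip cards without an in-stock badge
    imperative.extend(p.get_text() for p in card.find_all(class_="price"))

print(declarative)  # ['$10', '$30']
print(imperative)   # ['$10', '$30']
```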

Writing resilient selectors

A selector that breaks the first time the site redesigns is technical debt. Two rules:

  1. Prefer semantic anchors. article.product, [data-product-id], and h1 are stable. div.css-1xyz9 (auto-generated by a CSS-in-JS framework) is not: that class name can change on every deploy.

  2. Anchor short, not long. .product .price survives more changes than body > div.layout > main > section.products > div.row > article.product > div.body > p.price. Each extra layer is a brittleness point.
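A single added wrapper is enough to show rule 2 in action. This sketch (assuming BeautifulSoup 4 with "html.parser"; both page versions are invented) runs a long child-chain selector and a short anchored one against markup before and after a redesign that inserts a div.grid.

```python
from bs4 import BeautifulSoup

# Invented "before" and "after a redesign" markup: a wrapper div was added.
BEFORE = """
<main><section class="products">
  <article class="product"><p class="price">$10</p></article>
</section></main>
"""
AFTER = """
<main><section class="products"><div class="grid">
  <article class="product"><p class="price">$10</p></article>
</div></section></main>
"""

LONG = "main > section.products > article.product > p.price"
SHORT = ".product .price"

results = {}
for label, html in [("before", BEFORE), ("after", AFTER)]:
    soup = BeautifulSoup(html, "html.parser")
    results[label] = (len(soup.select(LONG)), len(soup.select(SHORT)))

print(results)  # {'before': (1, 1), 'after': (0, 1)}
```

The long selector breaks because section.products > article.product no longer holds; the short one never depended on that layer.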

The Catalog108 challenge pages deliberately use a mix of semantic and non-semantic markup so you can practise picking durable anchors.

Hands-on lab

Open practice.scrapingcentral.com/challenges/static/lists/cards and write a single CSS selector that grabs every card title. Then write a selector that picks only the cards marked "featured", and another that excludes them. Run all three with BeautifulSoup's select() and verify the counts. Then come back when you've read the XPath lesson and try the same in XPath: comparing the two against the same markup is the fastest way to internalize both.


Practice this lesson on Catalog108, our first-party scraping sandbox.


Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.


Which selector picks elements with BOTH the 'btn' and 'primary' classes?
