
Lesson 1.22 · Intermediate · 4 min read

Scraping Lists, Cards, Repeating Patterns

Card grids, list views, search results: the second-most-common HTML data pattern after tables. This lesson covers the systematic "find the container, iterate items, extract per-item" approach.

What you’ll learn

  • Identify the wrapping container element for repeating patterns.
  • Iterate items with scoped per-item extraction.
  • Detect missing or sparse fields gracefully.
  • Choose between flat and nested data shapes for output.

If a page has the same visual unit repeated (product cards, search results, blog posts, comments), there's a wrapping element for each unit. The pattern is always the same: locate the wrapper, iterate, and extract from inside each item. This lesson is the disciplined version of that pattern.

Step 1: find the wrapper

Open DevTools, click one of the repeating units, and walk up in the Elements panel until you find the smallest element that wraps the unit cleanly:

<div class="card-grid">
  <article class="product-card" data-product-id="42">
    <h2><a href="/products/yellow-mug">Yellow mug</a></h2>
    <p class="price">$14.99</p>
    <span class="badge">New</span>
  </article>
  <article class="product-card" data-product-id="43">
    ...
  </article>
</div>

Here, article.product-card is the wrapper. The class is intentional: the developer is marking these as repeating units. Selectors that key on such a class are usually stable across redesigns.

Step 2: iterate the wrapper, extract inside

import requests
from bs4 import BeautifulSoup

r = requests.get("https://practice.scrapingcentral.com/challenges/static/lists/cards")
soup = BeautifulSoup(r.content, "lxml")

cards = soup.select("article.product-card")
print(f"Found {len(cards)} cards")

products = []
for card in cards:
    products.append({
        "id":    card.get("data-product-id"),
        "name":  card.select_one("h2").get_text(strip=True),
        "url":   card.select_one("h2 a")["href"],
        "price": card.select_one(".price").get_text(strip=True),
        "badge": card.select_one(".badge").get_text(strip=True) if card.select_one(".badge") else None,
    })

Notice every selector inside the loop is scoped to card, not the full document. That's the cardinal rule. card.select_one(".price") looks only inside this card.
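To see the difference scoping makes, here's a minimal sketch with inline HTML (illustrative markup, not the live page):

```python
from bs4 import BeautifulSoup

html = """
<div class="card-grid">
  <article class="product-card"><p class="price">$14.99</p></article>
  <article class="product-card"><p class="price">$9.50</p></article>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
cards = soup.select("article.product-card")

# Unscoped: soup.select_one searches the whole document, so every
# iteration finds the FIRST .price on the page.
wrong = [soup.select_one(".price").get_text(strip=True) for _ in cards]

# Scoped: card.select_one searches only inside the current card.
right = [card.select_one(".price").get_text(strip=True) for card in cards]

print(wrong)  # ['$14.99', '$14.99']
print(right)  # ['$14.99', '$9.50']
```

The unscoped version doesn't crash, which makes it the nastier bug: it silently assigns the first card's price to every product.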

Step 3: handle missing fields gracefully

Not every card has every field. The badge field above is a perfect example: only "new" products have one. The defensive pattern:

def safe_text(el, selector, default=None):
    found = el.select_one(selector)
    return found.get_text(strip=True) if found else default

def safe_attr(el, selector, attr, default=None):
    found = el.select_one(selector)
    return found.get(attr, default) if found else default

Then your card extraction becomes:

products.append({
    "id":    card.get("data-product-id"),
    "name":  safe_text(card, "h2"),
    "url":   safe_attr(card, "h2 a", "href"),
    "price": safe_text(card, ".price"),
    "badge": safe_text(card, ".badge"),
})

Production scrapers always have helpers like these. They eliminate the "'NoneType' object has no attribute" class of AttributeError entirely.
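A quick sanity check of the helper against a card that has no badge (inline HTML for illustration):

```python
from bs4 import BeautifulSoup

def safe_text(el, selector, default=None):
    found = el.select_one(selector)
    return found.get_text(strip=True) if found else default

# A card with a name but no .badge element.
html = '<article class="product-card"><h2>Plain mug</h2></article>'
card = BeautifulSoup(html, "html.parser").select_one("article.product-card")

print(safe_text(card, "h2"))      # Plain mug
print(safe_text(card, ".badge"))  # None, not an AttributeError
```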

Step 4: validate count before trusting data

Before parsing, confirm the wrapper actually matched what you expected:

cards = soup.select("article.product-card")
assert 12 <= len(cards) <= 24, f"Expected 12-24 cards, got {len(cards)}"

When site HTML changes silently (a typo in a class name, an A/B test serving a different layout), your selector returns 0 cards and you silently produce empty output. An assertion fails loudly and forces investigation.
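If the same check recurs across scrapers, it can be wrapped in a small helper. This is a sketch; expect_count is a hypothetical name, not a library function:

```python
from bs4 import BeautifulSoup

def expect_count(items, low, high, label="items"):
    """Raise loudly when a selector matches outside the expected range."""
    if not low <= len(items) <= high:
        raise RuntimeError(f"Expected {low}-{high} {label}, got {len(items)}")
    return items

# 15 identical cards stand in for the real page.
soup = BeautifulSoup('<article class="product-card"></article>' * 15, "html.parser")
cards = expect_count(soup.select("article.product-card"), 12, 24, "cards")
print(len(cards))  # 15
```

Unlike a bare assert, this survives running Python with -O (which strips assert statements), so the check can't be optimized away in production.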

Repeating but inconsistent structure

Some lists mix item types; a search results page might intersperse "promoted" rows with regular rows:

<ul class="results">
  <li class="result">Normal result</li>
  <li class="result promoted">Ad result</li>
  <li class="result">Normal result</li>
</ul>

Either extract everything and tag the type:

for li in soup.select("li.result"):
    item = {
        "type": "promoted" if "promoted" in li.get("class", []) else "normal",
        "text": li.get_text(strip=True),
    }

…or filter at the selector level:

normal_only = soup.select("li.result:not(.promoted)")

CSS's :not() is your friend.
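The two routes should agree; here's a quick check on the snippet above (using html.parser to keep it dependency-light):

```python
from bs4 import BeautifulSoup

html = """
<ul class="results">
  <li class="result">Normal result</li>
  <li class="result promoted">Ad result</li>
  <li class="result">Normal result</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Route 1: filter in Python by inspecting the class list.
filtered = [li for li in soup.select("li.result")
            if "promoted" not in li.get("class", [])]

# Route 2: filter in the selector engine with :not().
normal_only = soup.select("li.result:not(.promoted)")

print(len(filtered), len(normal_only))  # 2 2
```

Filtering at the selector level is terser; filtering in Python is handy when the rule is too complex for CSS (say, a regex on the text).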

Detecting the "no results" state

Some pages return a different DOM when the list is empty:

<div class="empty-state">No products match your filter.</div>

Always check both:

cards = soup.select("article.product-card")
if not cards:
  empty = soup.select_one(".empty-state")
  if empty:
  print("Empty result set:", empty.get_text(strip=True))
  else:
  # neither cards nor empty-state, the page may have changed
  raise RuntimeError("Page structure changed: neither cards nor empty state found")

Pulling out structured sub-data

Some cards have multiple types of nested info. Pull each into a sub-dict rather than flattening:

for card in cards:
    products.append({
        "name":  safe_text(card, "h2"),
        "price": safe_text(card, ".price"),
        "tags":  [t.get_text(strip=True) for t in card.select(".tag")],
        "rating": {
            "stars": safe_text(card, ".rating .stars"),
            "count": safe_text(card, ".rating .count"),
        },
    })

For JSON output, nested dicts/arrays are fine. If you'll dump to CSV later, flatten with prefix (rating_stars, rating_count).
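A minimal flattener along those lines (a sketch; the key names and the | list separator are arbitrary choices):

```python
def flatten(record, prefix="", sep="_"):
    """Flatten nested dicts for CSV output; lists collapse into one cell."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the parent key as a prefix.
            flat.update(flatten(value, prefix=name, sep=sep))
        elif isinstance(value, list):
            flat[name] = "|".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat

row = flatten({"name": "Yellow mug", "tags": ["new", "sale"],
               "rating": {"stars": "4.5", "count": "12"}})
print(row)
# {'name': 'Yellow mug', 'tags': 'new|sale', 'rating_stars': '4.5', 'rating_count': '12'}
```

The flat dicts then go straight into csv.DictWriter.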

PHP: the same pattern in DomCrawler

$products = $crawler->filter('article.product-card')->each(function (Crawler $card) {
    return [
        'id'    => $card->attr('data-product-id'),
        'name'  => $card->filter('h2')->text(''),
        'url'   => $card->filter('h2 a')->attr('href', ''),
        'price' => $card->filter('.price')->text(''),
        'badge' => $card->filter('.badge')->count() > 0
            ? $card->filter('.badge')->text()
            : null,
    ];
});

text('') with a default avoids the InvalidArgumentException on missing nodes. ->count() > 0 is the explicit-existence check.

Pagination is the next layer

This lesson stops at one page. Most card lists span many pages; that's Lesson 1.23 (pagination patterns). The card-extraction code stays the same; you just wrap it in a paging loop.
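To preview the shape of that loop without touching the network, here's a sketch against a stubbed fetch function. The page-number scheme and the "empty page means done" stop condition are assumptions for illustration; real pagination strategies are Lesson 1.23's topic:

```python
from bs4 import BeautifulSoup

# Stand-in for requests.get(f"{BASE}?page={n}").content.
# (Hypothetical two-page site; page 3 is empty to signal the end.)
PAGES = {
    1: "<article class='product-card'><h2>Mug</h2></article>",
    2: "<article class='product-card'><h2>Bowl</h2></article>",
}

def fetch(page):
    return PAGES.get(page, "")

products, page = [], 1
while True:
    soup = BeautifulSoup(fetch(page), "html.parser")
    cards = soup.select("article.product-card")
    if not cards:           # no cards on this page: we've run off the end
        break
    for card in cards:      # same per-card extraction as before
        products.append(card.select_one("h2").get_text(strip=True))
    page += 1

print(products)  # ['Mug', 'Bowl']
```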

A common debugging tactic

When extraction returns weirdly few results, print the actual HTML of one matched card:

print(cards[0].prettify()[:1000])

You'll often see the structure you thought you had isn't quite the structure on the page, maybe the class is different, maybe an outer container intercepts. Looking at one real card with formatted HTML is the fastest debug.

Hands-on lab

Scrape every card from /challenges/static/lists/cards. Build a list-of-dicts with at least name, price, and any badge or tag. Use safe-text helpers so missing fields don't crash. Assert that you got the expected count (or print it and verify it matches the visible card count in the browser).


Practice this lesson on Catalog108, our first-party scraping sandbox.


Quiz: check your understanding

Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.

Question 1 of 8

What is the cardinal rule for extraction inside a per-card loop?
