Data Extraction

Techniques for extracting structured data from HTML, JSON, and APIs

#1

Getting Started with Web Scraping in Python

Learn the basics of web scraping with Python using the Requests library and BeautifulSoup. Your first scraper in 10 minutes.

beginner
beautifulsoup, data-extraction

#2

CSS Selectors for Web Scraping

Master CSS selectors to extract exactly the data you need. Classes, IDs, attributes, and advanced selector patterns.

beginner
beautifulsoup, data-extraction

#3

Handling Pagination in Web Scraping

Learn how to scrape paginated websites by following next-page links, handling page numbers, and collecting data across multiple pages.

beginner
beautifulsoup, data-extraction, pagination

#4

Scraping with Scrapy Framework - Getting Started

Get started with Scrapy, a full-featured Python web scraping framework. Install Scrapy, create a project, and run your first spider.

beginner
scrapy, data-extraction

#5

Scrapy Spiders and Items

Define structured data with Scrapy Items and build advanced spiders with CrawlSpider, SitemapSpider, and custom parsing logic.

intermediate
scrapy, data-extraction

#6

Scrapy Middleware and Pipelines

Customize Scrapy's request/response flow with middleware and process scraped data using item pipelines for validation, cleaning, and storage.

intermediate
scrapy, data-extraction

#7

Async Scraping with HTTPX and asyncio

Speed up your scrapers with async Python. Use HTTPX and asyncio to make concurrent HTTP requests and scrape pages in parallel.

intermediate
httpx, asyncio, data-extraction

#8

Scraping with aiohttp

Use aiohttp for high-performance async web scraping in Python. Learn session management, connection pooling, and concurrent page fetching.

intermediate
aiohttp, asyncio, data-extraction

#9

Storing Scraped Data in CSV and JSON

Save your scraped data to CSV and JSON files using Python's built-in modules. Learn best practices for data export, encoding, and file organization.

beginner
data-extraction, csv, json

#10

Storing Scraped Data in Databases (SQLite, PostgreSQL)

Store scraped data in SQLite and PostgreSQL databases. Learn schema design, upserts, and best practices for persistent scraping data storage.

intermediate
data-extraction, sqlite, postgresql, databases

#11

Error Handling and Retries in Scrapers

Build robust scrapers with proper error handling, automatic retries, exponential backoff, and graceful failure recovery.

intermediate
error-handling, retries, data-extraction

#12

Scraping Behind Login/Authentication

Scrape websites that require login. Handle form-based authentication, session tokens, and authenticated API requests with Python.

intermediate
authentication, sessions, data-extraction

#13

Handling Cookies and Sessions

Master cookie management and persistent sessions in Python web scraping. Handle session cookies, cookie jars, and cross-request state.

intermediate
sessions, cookies, data-extraction

#14

Scraping Dynamic Content Without a Browser

Extract data from JavaScript-heavy websites without using a browser. Discover hidden APIs, intercept XHR requests, and parse JSON responses.

intermediate
api-scraping, data-extraction, dynamic-content

#15

Using ScraperAPI with Python

Integrate ScraperAPI into your Python scrapers for automatic proxy rotation, CAPTCHA solving, and JavaScript rendering. Complete guide with code examples.

beginner
scraperapi, proxy-rotation, data-extraction

#16

Using ScrapingAnt with Python

Integrate ScrapingAnt into your Python scrapers for headless browser rendering, proxy rotation, and anti-bot bypass. Complete tutorial with examples.

beginner
scrapingant, proxy-rotation, data-extraction

#17

Web Scraping with lxml and XPath

Use lxml and XPath expressions for fast, powerful HTML parsing. Learn XPath syntax, axes, and functions for precise data extraction.

intermediate
lxml, xpath, data-extraction

#18

Extracting Data from HTML Tables

Scrape HTML tables from websites using BeautifulSoup and pandas. Handle complex tables with rowspan, colspan, and nested elements.

beginner
beautifulsoup, pandas, data-extraction

#19

Scraping Images and Files

Download images, PDFs, and other files while web scraping. Learn URL resolution, streaming downloads, and file organization best practices.

intermediate
beautifulsoup, data-extraction, file-download

#20

Building a Price Monitoring Scraper

Build a complete price monitoring scraper that tracks product prices over time, detects price drops, and sends alerts. A real-world scraping project.

intermediate
beautifulsoup, data-extraction, project

#21

Scraping Multiple Pages Concurrently

Speed up scraping with concurrent requests using threading, multiprocessing, and asyncio. Learn to balance speed with politeness.

intermediate
concurrency, asyncio, threading, data-extraction

#22

Scraping with Python and Regex

Use Python regular expressions to extract emails, phone numbers, prices, URLs, and other patterns from scraped web pages.

intermediate
regex, data-extraction

#23

Handling Different Encodings (UTF-8, ISO-8859)

Handle character encoding issues in web scraping. Detect, convert, and fix UTF-8, ISO-8859, and other encodings to avoid garbled text.

intermediate
encoding, data-extraction

#24

Scraping XML and RSS Feeds

Parse XML documents and RSS/Atom feeds with Python. Extract structured data from feeds using feedparser, lxml, and the xml.etree module.

beginner
xml, rss, data-extraction

#25

Building a News Aggregator Scraper

Build a complete news aggregator that collects articles from multiple sources using RSS feeds and web scraping. Deduplicate, categorize, and store results.

intermediate
beautifulsoup, rss, data-extraction, project

#26

Scraping with Zyte API

Use Zyte API, from Zyte (formerly Scrapinghub), for intelligent web scraping with automatic extraction, browser rendering, and anti-bot bypass.

intermediate
zyte, proxy-rotation, data-extraction

#27

Web Scraping Best Practices and Patterns

Master web scraping best practices: respectful scraping, anti-detection, data quality, error recovery, project architecture, and legal considerations.

advanced
best-practices, data-extraction, architecture

#1

Introduction to Playwright for Web Scraping

Learn to scrape JavaScript-heavy websites using Playwright. Handle SPAs, lazy loading, and dynamic content.

intermediate
playwright, data-extraction

#2

Selenium WebDriver Basics for Web Scraping

Learn the fundamentals of Selenium WebDriver for web scraping. Set up Chrome WebDriver, navigate pages, and extract data from dynamic websites.

beginner
selenium, data-extraction, webdriver

#3

Playwright Advanced: Handling Popups and Dialogs

Master handling JavaScript alerts, confirm dialogs, popups, and new browser windows in Playwright for reliable web scraping.

intermediate
playwright, popups, dialogs, data-extraction

#4

Playwright Waiting Strategies and Selectors

Learn Playwright's waiting strategies and powerful selector engine to build reliable scrapers that handle dynamic content loading.

intermediate
playwright, selectors, waiting, data-extraction

#5

Selenium: Handling JavaScript-Rendered Pages

Learn how to scrape JavaScript-rendered pages with Selenium. Handle dynamic content, AJAX calls, and single-page applications.

intermediate
selenium, javascript, data-extraction, dynamic-content

#6

Taking Screenshots and PDFs with Playwright

Learn to capture full-page screenshots, element screenshots, and generate PDFs from web pages using Playwright.

beginner
playwright, screenshots, pdf, data-extraction

#7

Scraping Infinite Scroll Pages

Learn techniques to scrape infinite scroll pages using Playwright and Selenium. Handle lazy-loaded content and extract all data from endlessly scrolling websites.

intermediate
playwright, selenium, infinite-scroll, data-extraction

#8

Handling Dropdowns, Forms, and Clicks

Learn how to interact with web forms, dropdowns, checkboxes, and buttons using Playwright and Selenium for effective web scraping.

beginner
playwright, selenium, forms, interaction, data-extraction

#10

Using Playwright with Proxies

Learn to configure Playwright with HTTP, SOCKS5, and rotating proxies for anonymous web scraping and IP rotation.

intermediate
playwright, proxies, ip-rotation, data-extraction

#11

Using Selenium with Proxies

Configure Selenium WebDriver with HTTP, SOCKS, and authenticated proxies for anonymous and scalable web scraping.

intermediate
selenium, proxies, ip-rotation, data-extraction

#12

Puppeteer Basics for Web Scraping

Get started with Puppeteer for web scraping in Node.js. Learn to launch headless Chrome, navigate pages, and extract data from dynamic websites.

beginner
puppeteer, nodejs, data-extraction

#14

Intercepting Network Requests with Playwright

Learn to intercept, modify, and block network requests in Playwright for faster scraping and direct API data extraction.

advanced
playwright, network-interception, api-scraping, data-extraction

#15

Scraping SPAs: React, Vue, and Angular Sites

Learn strategies for scraping single-page applications built with React, Vue, and Angular using browser automation tools.

advanced
playwright, selenium, spa, react, vue, angular, data-extraction

#18

Scraping with Playwright in Python

A comprehensive guide to web scraping with Playwright in Python, covering sync and async APIs, data extraction patterns, and exporting results.

intermediate
playwright, python, data-extraction, async

#1

Introduction to API Scraping

Learn what API scraping is, why it's more reliable than HTML scraping, and how to get started extracting data from web APIs.

beginner
apis, data-extraction, web-scraping

#4

Scraping Paginated APIs

Learn how to handle offset-based, page-based, and cursor-based pagination when scraping APIs with Python.

beginner
apis, pagination, data-extraction

#5

Working with GraphQL APIs

Learn how to discover and scrape GraphQL APIs, craft queries, handle variables, and paginate through GraphQL endpoints.

intermediate
apis, graphql, data-extraction

#9

Scraping JSON APIs and Processing Responses

Learn how to scrape JSON APIs, navigate nested response structures, and extract exactly the data you need using Python.

beginner
apis, json, data-extraction, data-processing

#13

Scraping Social Media APIs

Learn techniques for extracting data from social media platforms using their official APIs and alternative approaches.

advanced
apis, social-media, twitter, reddit, data-extraction

#1

HTML Parsing with BeautifulSoup - Complete Guide

Master HTML parsing with BeautifulSoup4 in Python. Learn to navigate the DOM, find elements, extract text, and handle attributes.

beginner
beautifulsoup, html-parsing, data-extraction

#3

XPath Expressions for Web Scraping

Master XPath expressions for precise element selection in web scraping. Learn axes, predicates, functions, and advanced patterns.

intermediate
xpath, html-parsing, lxml, data-extraction

#4

Parsing JSON Responses in Python

Learn to parse, navigate, and extract data from JSON API responses in Python using the json module, jmespath, and pandas.

beginner
json, data-extraction, python, apis

#5

Using Regex for Data Extraction

Learn to use Python regular expressions to extract emails, URLs, prices, dates, and other patterns from scraped text.

intermediate
regex, data-extraction, text-parsing

#7

Extracting Structured Data from Unstructured HTML

Techniques for pulling structured records from messy, inconsistent HTML pages. Handle missing elements, variable layouts, and embedded metadata.

intermediate
html-parsing, data-extraction, beautifulsoup, schema

#8

Parsing HTML Tables into DataFrames

Extract HTML tables from web pages and convert them into pandas DataFrames for analysis. Handle merged cells, multi-row headers, and nested tables.

beginner
html-parsing, pandas, tables, data-extraction

#10

Extracting Emails and Phone Numbers from Web Pages

Extract email addresses and phone numbers from scraped web pages using regex patterns, BeautifulSoup, and validation techniques.

beginner
regex, data-extraction, contact-info, text-parsing

#11

Parsing Dates and Prices from Scraped Text

Extract and normalize dates, prices, and currencies from messy scraped text using Python's dateutil, regex, and locale-aware parsing.

intermediate
text-parsing, dates, prices, data-extraction

#12

Using jq and JSONPath for JSON Parsing

Master jq for command-line JSON processing and JSONPath for querying JSON in Python. Filter, transform, and extract data from API responses.

intermediate
json, jq, jsonpath, data-extraction, cli

#1

Proxy Basics for Web Scraping

Understand proxy types, when to use them, and how to integrate proxies into your Python scrapers.

beginner
proxies, data-extraction

#12

Handling Honeypot Traps

Learn how to identify and avoid honeypot traps that websites use to detect and block web scrapers.

intermediate
honeypots, anti-detection, data-extraction