Data Extraction

Techniques for extracting structured data from HTML, JSON, and APIs

Getting Started with Web Scraping in Python

Learn the basics of web scraping with Python using the Requests library and BeautifulSoup. Your first scraper in 10 minutes.

beginner

beautifulsoup, data-extraction

CSS Selectors for Web Scraping

Master CSS selectors to extract exactly the data you need. Classes, IDs, attributes, and advanced selector patterns.

beginner

beautifulsoup, data-extraction

Handling Pagination in Web Scraping

Learn how to scrape paginated websites by following next-page links, handling page numbers, and collecting data across multiple pages.

beginner

beautifulsoup, data-extraction, pagination
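
A minimal sketch of the next-page pattern. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra installs. The expected markup (`<h2 class="title">` items and an `<a rel="next">` link) and the `fetch` callable are assumptions; in a real scraper `fetch` would wrap something like `requests.get(url).text`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingParser(HTMLParser):
    """Collect item titles and the rel="next" link from one listing page.

    Assumes (hypothetically) items are <h2 class="title"> elements and the
    pagination link is <a rel="next" href="...">.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self.next_href = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h2" and "title" in (attrs.get("class") or "").split():
            self._in_title = True
        elif tag == "a" and "next" in (attrs.get("rel") or "").split():
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.titles.append(data.strip())


def scrape_all_pages(start_url, fetch, max_pages=50):
    """Follow next-page links from start_url, collecting titles from every page."""
    url, seen, titles = start_url, set(), []
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)  # guard against pagination links that loop back
        parser = ListingParser()
        parser.feed(fetch(url))
        parser.close()
        titles.extend(parser.titles)
        # Resolve relative hrefs such as "?page=2" against the current page URL
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return titles
```

Passing `fetch` in as a parameter keeps the crawl loop testable offline; the `seen` set and the `max_pages` cap stop the crawl if a site's pagination links cycle.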

Scraping with Scrapy Framework - Getting Started

Get started with Scrapy, a widely used Python web scraping framework. Install Scrapy, create a project, and run your first spider.

beginner

scrapy, data-extraction

Scrapy Spiders and Items

Define structured data with Scrapy Items and build advanced spiders with CrawlSpider, SitemapSpider, and custom parsing logic.

intermediate

scrapy, data-extraction

Scrapy Middleware and Pipelines

Customize Scrapy's request/response flow with middleware and process scraped data using item pipelines for validation, cleaning, and storage.

intermediate

scrapy, data-extraction

Async Scraping with HTTPX and asyncio

Speed up your scrapers with async Python. Use HTTPX and asyncio to make concurrent HTTP requests and scrape pages in parallel.

intermediate

httpx, asyncio, data-extraction

Scraping with aiohttp

Use aiohttp for high-performance async web scraping in Python. Learn session management, connection pooling, and concurrent page fetching.

intermediate

aiohttp, asyncio, data-extraction

Storing Scraped Data in CSV and JSON

Save your scraped data to CSV and JSON files using Python's built-in modules. Learn best practices for data export, encoding, and file organization.

beginner

data-extraction, csv, json
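
A short sketch of exporting scraped rows (a list of dicts) with the built-in `csv` and `json` modules. Writing CSV as `utf-8-sig` (UTF-8 with a byte-order mark, which helps Excel detect the encoding) is a choice made here, not a requirement; plain `utf-8` works for most other tools.

```python
import csv
import json


def save_csv(rows, path):
    """Write a list of dicts to CSV; the header comes from the first row's keys."""
    if not rows:
        return
    fieldnames = list(rows[0].keys())
    # newline="" lets the csv module control line endings (no blank lines on Windows)
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        # extrasaction="ignore" drops keys not in the header; missing keys become ""
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


def save_json(rows, path):
    """Write rows as indented JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)
```

Note that CSV stores everything as text, so numbers come back as strings when read with `csv.DictReader`, while JSON preserves types.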

Storing Scraped Data in Databases (SQLite, PostgreSQL)

Store scraped data in SQLite and PostgreSQL databases. Learn schema design, upserts, and best practices for persistent scraping data storage.

intermediate

data-extraction, sqlite, postgresql, databases
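
A minimal upsert sketch with the built-in `sqlite3` module. The `products` table and using the product URL as the natural key are assumptions for illustration. `INSERT ... ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or newer; PostgreSQL (9.5+) accepts the same clause, with psycopg's `%(name)s` placeholders instead of `:name`.

```python
import sqlite3

# Hypothetical schema: one row per product page, keyed by URL
SCHEMA = """
CREATE TABLE IF NOT EXISTS products (
    url        TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    price      REAL,
    scraped_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def upsert_products(conn, items):
    """Insert new products, or update title/price/timestamp of existing URLs."""
    conn.executemany(
        """
        INSERT INTO products (url, title, price)
        VALUES (:url, :title, :price)
        ON CONFLICT(url) DO UPDATE SET
            title = excluded.title,
            price = excluded.price,
            scraped_at = CURRENT_TIMESTAMP
        """,
        items,  # a list of dicts whose keys match the :named placeholders
    )
    conn.commit()
```

Re-running a scrape then refreshes existing rows instead of duplicating them or failing on the primary key.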

Error Handling and Retries in Scrapers

Build robust scrapers with proper error handling, automatic retries, exponential backoff, and graceful failure recovery.

intermediate

error-handling, retries, data-extraction
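
A sketch of retries with exponential backoff and "full jitter" (a random wait between zero and the exponential cap). `RetryableError` is a hypothetical exception your fetch code would raise for transient failures such as timeouts or HTTP 429/5xx responses; the `sleep` parameter is injectable so tests can skip real waiting.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for transient failures (timeouts, HTTP 429/5xx)."""


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call func(), retrying RetryableError with exponential backoff.

    Before retry n (0-based) it waits a random time in
    [0, min(max_delay, base_delay * 2**n)], so many workers that fail
    together do not all retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Non-retryable errors (for example HTTP 404) propagate immediately, because only `RetryableError` is caught.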

Scraping Behind Login/Authentication

Scrape websites that require login. Handle form-based authentication, session tokens, and authenticated API requests with Python.

intermediate

authentication, sessions, data-extraction

Handling Cookies and Sessions

Master cookie management and persistent sessions in Python web scraping. Handle session cookies, cookie jars, and cross-request state.

intermediate

sessions, cookies, data-extraction

Scraping Dynamic Content Without a Browser

Extract data from JavaScript-heavy websites without using a browser. Discover hidden APIs, intercept XHR requests, and parse JSON responses.

intermediate

api-scraping, data-extraction, dynamic-content

Using ScraperAPI with Python

Integrate ScraperAPI into your Python scrapers for automatic proxy rotation, CAPTCHA solving, and JavaScript rendering. Complete guide with code examples.

beginner

scraperapi, proxy-rotation, data-extraction

Using ScrapingAnt with Python

Integrate ScrapingAnt into your Python scrapers for headless browser rendering, proxy rotation, and anti-bot bypass. Complete tutorial with examples.

beginner

scrapingant, proxy-rotation, data-extraction

Web Scraping with lxml and XPath

Use lxml and XPath expressions for fast, powerful HTML parsing. Learn XPath syntax, axes, and functions for precise data extraction.

intermediate

lxml, xpath, data-extraction

Extracting Data from HTML Tables

Scrape HTML tables from websites using BeautifulSoup and pandas. Handle complex tables with rowspan, colspan, and nested elements.

beginner

beautifulsoup, pandas, data-extraction
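
A dependency-free sketch of the core table-walking step, using the standard library's `html.parser` in place of BeautifulSoup or `pandas.read_html`. It is deliberately simplified: `colspan`/`rowspan` are ignored, nested tables are not supported, and it assumes explicit `</td>`/`</tr>` closing tags, all of which `pandas.read_html` handles for you.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <table> as a list of rows (simplified)."""

    def __init__(self):
        super().__init__()
        self.tables, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self._row is not None:
                # Collapse internal whitespace such as newlines and indentation
                self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self.tables:
                self.tables[-1].append(self._row)
            self._row = None


def table_to_records(rows):
    """Treat the first row as the header and return one dict per data row."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]
```

The resulting records can be passed straight to `pandas.DataFrame(records)` if you want a DataFrame.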

Scraping Images and Files

Download images, PDFs, and other files while web scraping. Learn URL resolution, streaming downloads, and file organization best practices.

intermediate

beautifulsoup, data-extraction, file-download
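
A sketch of the URL-resolution step that comes before any download: turning relative `src`/`href` values into absolute URLs and deriving a safe local filename. It uses only `urllib.parse`; the streaming download itself (for example with `requests` and `stream=True`) is not shown here.

```python
import os
from urllib.parse import unquote, urljoin, urlparse


def resolve_asset_urls(page_url, srcs):
    """Turn src/href values from a page into absolute, de-duplicated URLs."""
    seen, result = set(), []
    for src in srcs:
        if not src or src.startswith("data:"):
            continue  # skip empty attributes and inline data: URIs
        absolute = urljoin(page_url, src.strip())
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


def filename_for(url, default="download.bin"):
    """Derive a local filename from the URL path, ignoring the query string."""
    name = os.path.basename(unquote(urlparse(url).path))
    return name or default  # URLs ending in "/" have no basename
```

Taking only the basename also keeps `../` sequences in a URL path from steering writes outside your download directory.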

Building a Price Monitoring Scraper

Build a complete price monitoring scraper that tracks product prices over time, detects price drops, and sends alerts. A real-world scraping project.

intermediate

beautifulsoup, data-extraction, project

Scraping Multiple Pages Concurrently

Speed up scraping with concurrent requests using threading, multiprocessing, and asyncio. Learn to balance speed with politeness.

intermediate

concurrency, asyncio, threading, data-extraction
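
A sketch of the speed-versus-politeness balance with asyncio: an `asyncio.Semaphore` caps how many requests are in flight at once. `fetch_page` is a hypothetical stand-in that only sleeps; a real scraper would make the request with httpx or aiohttp. The `peak` counter exists only to demonstrate that the cap holds.

```python
import asyncio


async def fetch_page(url):
    """Hypothetical stand-in for a real HTTP request (e.g. via httpx or aiohttp)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html>{url}</html>"


async def scrape_concurrently(urls, max_concurrency=5, delay=0.0):
    """Fetch all URLs with at most max_concurrency requests in flight.

    Returns (pages, peak), where peak is the highest number of simultaneous
    fetches observed.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = peak = 0

    async def worker(url):
        nonlocal in_flight, peak
        async with semaphore:  # waits here while max_concurrency workers are busy
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                page = await fetch_page(url)
                if delay:
                    await asyncio.sleep(delay)  # optional politeness pause before freeing the slot
                return page
            finally:
                in_flight -= 1

    # gather returns results in the same order as the input URLs
    pages = await asyncio.gather(*(worker(url) for url in urls))
    return pages, peak
```

Run it with `asyncio.run(scrape_concurrently(urls, max_concurrency=3))`; lowering `max_concurrency` or raising `delay` trades speed for lighter load on the target site.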

Scraping with Python and Regex

Use Python regular expressions to extract emails, phone numbers, prices, URLs, and other patterns from scraped web pages.

intermediate

regex, data-extraction
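
A sketch of pattern-based extraction with Python's `re` module for two of the patterns named above, emails and prices. Both regexes are pragmatic assumptions, not full validators: the email pattern does not implement RFC 5322, and the price pattern expects a leading `$`, `€`, or `£` with comma thousands separators.

```python
import re

# Pragmatic patterns, not complete RFC 5322 or currency-format validators
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# Currency symbol, optional space, digits with optional ",ddd" groups and ".dd" cents
PRICE_RE = re.compile(r"([$€£])\s?(\d+(?:,\d{3})*(?:\.\d{2})?)")


def extract_emails(text):
    """Return unique emails, lower-cased, in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)  # dict keeps insertion order
    return list(seen)


def extract_prices(text):
    """Return (currency_symbol, amount) pairs for prices such as $1,299.99."""
    return [(sym, float(num.replace(",", ""))) for sym, num in PRICE_RE.findall(text)]
```

For money you intend to sum or compare exactly, `decimal.Decimal` is a safer type than `float`.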

Handling Different Encodings (UTF-8, ISO-8859)

Handle character encoding issues in web scraping. Detect, convert, and fix UTF-8, ISO-8859, and other encodings to avoid garbled text.

intermediate

encoding, data-extraction
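
A sketch of decoding raw response bytes using the available charset hints. The order used here (HTTP `Content-Type` charset, then a `<meta charset>` in the first 2 KB, then strict UTF-8, then lenient cp1252) is an assumption, not a standard. One caveat: single-byte encodings such as ISO-8859-1 accept any bytes, so a wrong hint can still produce garbled text; libraries such as charset-normalizer can guess when no hint is trustworthy.

```python
import re


def decode_html(raw: bytes, content_type: str = "") -> str:
    """Decode HTML bytes using the best available charset hint."""
    candidates = []
    m = re.search(r"charset=([\w-]+)", content_type, re.I)
    if m:
        candidates.append(m.group(1))  # hint from the HTTP Content-Type header
    m = re.search(rb'<meta[^>]+charset=["\']?([\w-]+)', raw[:2048], re.I)
    if m:
        candidates.append(m.group(1).decode("ascii"))  # hint from the document
    candidates.append("utf-8")
    for enc in candidates:
        try:
            return raw.decode(enc)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name, or bytes invalid for this codec
    # cp1252 covers all printable ISO-8859-1 characters; undefined bytes become U+FFFD
    return raw.decode("cp1252", errors="replace")
```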

Scraping XML and RSS Feeds

Parse XML documents and RSS/Atom feeds with Python. Extract structured data from feeds using feedparser, lxml, and the xml.etree.ElementTree module.

beginner

xml, rss, data-extraction
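
A minimal RSS 2.0 sketch with the standard library's `xml.etree.ElementTree`. It handles plain RSS only: Atom elements live in the `http://www.w3.org/2005/Atom` namespace and would need qualified tags like `{http://www.w3.org/2005/Atom}entry`, and feedparser copes with both formats plus malformed real-world feeds. For untrusted XML, the Python documentation points to `defusedxml` as a hardened alternative.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Extract title, link, and pubDate from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child element is missing
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return items
```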

Building a News Aggregator Scraper

Build a complete news aggregator that collects articles from multiple sources using RSS feeds and web scraping. Deduplicate, categorize, and store results.

intermediate

beautifulsoup, rss, data-extraction, project

Scraping with Zyte API

Use Zyte API (from Zyte, formerly Scrapinghub) for web scraping with automatic extraction, browser rendering, and anti-bot bypass.

intermediate

zyte, proxy-rotation, data-extraction

Web Scraping Best Practices and Patterns

Master web scraping best practices: respectful scraping, anti-detection, data quality, error recovery, project architecture, and legal considerations.

advanced

best-practices, data-extraction, architecture
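
One concrete piece of respectful scraping is honoring robots.txt. This sketch uses the standard library's `urllib.robotparser`; to keep it offline, the robots.txt text is passed in directly, whereas a live scraper would fetch it from the site (for example with `RobotFileParser.set_url()` and `.read()`). The user-agent string is a hypothetical example.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str, user_agent: str):
    """Return (allowed, crawl_delay) for the given robots.txt text and user agent.

    allowed(url) answers whether user_agent may fetch url; crawl_delay is the
    Crawl-delay value in seconds, or None if no rule sets one.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)

    def allowed(url: str) -> bool:
        return parser.can_fetch(user_agent, url)

    return allowed, delay
```

Checking `allowed(url)` before each request, and sleeping for `crawl_delay` between requests when it is set, keeps a scraper inside the limits the site has published.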

Introduction to Playwright for Web Scraping

Learn to scrape JavaScript-heavy websites using Playwright, including SPAs, lazy-loaded content, and other dynamic pages.

intermediate

playwright, data-extraction

Selenium WebDriver Basics for Web Scraping

Learn the fundamentals of Selenium WebDriver for web scraping. Set up Chrome WebDriver, navigate pages, and extract data from dynamic websites.

beginner

selenium, data-extraction, webdriver

Playwright Advanced: Handling Popups and Dialogs

Master handling JavaScript alerts, confirm dialogs, popups, and new browser windows in Playwright for reliable web scraping.

intermediate

playwright, popups, dialogs, data-extraction

Playwright Waiting Strategies and Selectors

Learn Playwright's waiting strategies and powerful selector engine to build reliable scrapers that handle dynamic content loading.

intermediate

playwright, selectors, waiting, data-extraction

Selenium: Handling JavaScript-Rendered Pages

Learn how to scrape JavaScript-rendered pages with Selenium. Handle dynamic content, AJAX calls, and single-page applications.

intermediate

selenium, javascript, data-extraction, dynamic-content

Taking Screenshots and PDFs with Playwright

Learn to capture full-page screenshots, element screenshots, and generate PDFs from web pages using Playwright.

beginner

playwright, screenshots, pdf, data-extraction

Scraping Infinite Scroll Pages

Learn techniques to scrape infinite scroll pages using Playwright and Selenium. Handle lazy-loaded content and extract all data from endlessly scrolling websites.

intermediate

playwright, selenium, infinite-scroll, data-extraction

Handling Dropdowns, Forms, and Clicks

Learn how to interact with web forms, dropdowns, checkboxes, and buttons using Playwright and Selenium for effective web scraping.

beginner

playwright, selenium, forms, interaction, data-extraction

Using Playwright with Proxies

Learn to configure Playwright with HTTP, SOCKS5, and rotating proxies for anonymous web scraping and IP rotation.

intermediate

playwright, proxies, ip-rotation, data-extraction

Using Selenium with Proxies

Configure Selenium WebDriver with HTTP, SOCKS, and authenticated proxies for anonymous and scalable web scraping.

intermediate

selenium, proxies, ip-rotation, data-extraction

Puppeteer Basics for Web Scraping

Get started with Puppeteer for web scraping in Node.js. Learn to launch headless Chrome, navigate pages, and extract data from dynamic websites.

beginner

puppeteer, nodejs, data-extraction

Intercepting Network Requests with Playwright

Learn to intercept, modify, and block network requests in Playwright for faster scraping and direct API data extraction.

advanced

playwright, network-interception, api-scraping, data-extraction

Scraping SPAs: React, Vue, and Angular Sites

Learn strategies for scraping single-page applications built with React, Vue, and Angular using browser automation tools.

advanced

playwright, selenium, spa, react, vue, angular, data-extraction

Scraping with Playwright in Python

A comprehensive guide to web scraping with Playwright in Python, covering sync and async APIs, data extraction patterns, and exporting results.

intermediate

playwright, python, data-extraction, async

Introduction to API Scraping

Learn what API scraping is, why it is often more reliable than HTML scraping, and how to start extracting data from web APIs.

beginner

apis, data-extraction, web-scraping

Scraping Paginated APIs

Learn how to handle offset-based, page-based, and cursor-based pagination when scraping APIs with Python.

beginner

apis, pagination, data-extraction
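
A sketch of the cursor-based case as a generator. The response shape `{"data": [...], "next_cursor": ... or None}` is a hypothetical assumption; real APIs name these fields differently, so adapt the two keys. Offset- and page-based APIs follow the same loop and differ only in how the next request is computed (incrementing `offset` or `page` until an empty batch comes back).

```python
from typing import Callable, Iterator, Optional


def iter_cursor_pages(fetch_page: Callable[[Optional[str]], dict],
                      max_pages: int = 1000) -> Iterator[dict]:
    """Yield every record from a cursor-paginated API.

    fetch_page(cursor) performs one request (cursor is None for the first
    page) and returns the decoded JSON body.
    """
    cursor = None
    for _ in range(max_pages):  # hard cap in case the API never stops returning cursors
        page = fetch_page(cursor)
        yield from page.get("data", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Because it is a generator, records stream out as each page arrives, and callers can stop early without fetching the remaining pages.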

Working with GraphQL APIs

Learn how to discover and scrape GraphQL APIs, craft queries, handle variables, and paginate through GraphQL endpoints.

intermediate

apis, graphql, data-extraction

Scraping JSON APIs and Processing Responses

Learn how to scrape JSON APIs, navigate nested response structures, and extract exactly the data you need using Python.

beginner

apis, json, data-extraction, data-processing

Scraping Social Media APIs

Learn techniques for extracting data from social media platforms using their official APIs and alternative approaches.

advanced

apis, social-media, twitter, reddit, data-extraction

HTML Parsing with BeautifulSoup - Complete Guide

Master HTML parsing with BeautifulSoup4 in Python. Learn to navigate the DOM, find elements, extract text, and handle attributes.

beginner

beautifulsoup, html-parsing, data-extraction

XPath Expressions for Web Scraping

Master XPath expressions for precise element selection in web scraping. Learn axes, predicates, functions, and advanced patterns.

intermediate

xpath, html-parsing, lxml, data-extraction

Parsing JSON Responses in Python

Learn to parse, navigate, and extract data from JSON API responses in Python using the json module, jmespath, and pandas.

beginner

json, data-extraction, python, apis

Using Regex for Data Extraction

Learn to use Python regular expressions to extract emails, URLs, prices, dates, and other patterns from scraped text.

intermediate

regex, data-extraction, text-parsing

Extracting Structured Data from Unstructured HTML

Techniques for pulling structured records from messy, inconsistent HTML pages. Handle missing elements, variable layouts, and embedded metadata.

intermediate

html-parsing, data-extraction, beautifulsoup, schema

Parsing HTML Tables into DataFrames

Extract HTML tables from web pages and convert them into pandas DataFrames for analysis. Handle merged cells, multi-row headers, and nested tables.

beginner

html-parsing, pandas, tables, data-extraction

Extracting Emails and Phone Numbers from Web Pages

Extract email addresses and phone numbers from scraped web pages using regex patterns, BeautifulSoup, and validation techniques.

beginner

regex, data-extraction, contact-info, text-parsing

Parsing Dates and Prices from Scraped Text

Extract and normalize dates, prices, and currencies from messy scraped text using python-dateutil, regex, and locale-aware parsing.

intermediate

text-parsing, dates, prices, data-extraction
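
A heuristic sketch for the price half of this topic (dates are not covered here): normalizing strings like `$1,299.99` and `1.299,99 €` to `Decimal`. The separator rule is an assumption that suits common US and European formats: whichever of `.` or `,` comes last is treated as the decimal separator only if 1 or 2 digits follow it; otherwise every separator is treated as a thousands separator. Truly ambiguous input still needs the page's locale.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Return the first number in text as a Decimal, or None if there is none."""
    m = re.search(r"\d[\d.,\s]*", text)
    if not m:
        return None
    # Drop spaces (including no-break spaces) and trailing punctuation
    num = re.sub(r"\s", "", m.group()).rstrip(".,")
    last_sep = max(num.rfind("."), num.rfind(","))
    if last_sep != -1 and 1 <= len(num) - last_sep - 1 <= 2:
        integer, fraction = num[:last_sep], num[last_sep + 1:]
    else:
        integer, fraction = num, ""
    integer = re.sub(r"[.,]", "", integer)  # remove thousands separators
    return Decimal(f"{integer}.{fraction}") if fraction else Decimal(integer)
```

`Decimal` avoids the binary rounding errors `float` would introduce when prices are later compared or summed, for example when detecting price drops.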

Using jq and JSONPath for JSON Parsing

Master jq for command-line JSON processing and JSONPath for querying JSON in Python. Filter, transform, and extract data from API responses.

intermediate

json, jq, jsonpath, data-extraction, cli

Proxy Basics for Web Scraping

Understand proxy types, when to use them, and how to integrate proxies into your Python scrapers.

beginner

proxies, data-extraction

Handling Honeypot Traps

Learn how to identify and avoid honeypot traps that websites use to detect and block web scrapers.

intermediate

honeypots, anti-detection, data-extraction
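
A heuristic sketch of one common honeypot check: flagging links that are hidden from human visitors but still present in the HTML. It uses the standard library's `html.parser` and inspects only each link's own inline signals (the `hidden` attribute and `display:none` or `visibility:hidden` in `style`). Links hidden by a parent element, an external stylesheet, or JavaScript are missed and need a rendered browser to detect.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Sort <a href> links into likely-visible ones and suspected honeypots."""

    HIDDEN_STYLES = ("display:none", "visibility:hidden")

    def __init__(self):
        super().__init__()
        self.visible, self.suspicious = [], []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # Normalize "display: none" or "Display:None" to "display:none"
        style = (attrs.get("style") or "").replace(" ", "").lower()
        hidden = "hidden" in attrs or any(s in style for s in self.HIDDEN_STYLES)
        (self.suspicious if hidden else self.visible).append(href)
```

A crawler would then enqueue only `visible` links and skip anything in `suspicious`.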