Data Parsing
HTML parsing, JSON processing, regex patterns, and data cleaning techniques
15 articles
#1
HTML Parsing with BeautifulSoup - Complete Guide
Master HTML parsing with BeautifulSoup4 in Python. Learn to navigate the DOM, find elements, extract text, and handle attributes.
#2
CSS Selectors vs XPath - When to Use Which
Compare CSS selectors and XPath for web scraping. Learn the syntax, strengths, and best use cases for each approach.
#3
XPath Expressions for Web Scraping
Master XPath expressions for precise element selection in web scraping. Learn axes, predicates, functions, and advanced patterns.
#4
Parsing JSON Responses in Python
Learn to parse, navigate, and extract data from JSON API responses in Python using the json module, jmespath, and pandas.
#5
Using Regex for Data Extraction
Learn to use Python regular expressions to extract emails, URLs, prices, dates, and other patterns from scraped text.
#6
Data Cleaning After Scraping (pandas)
Clean, transform, and prepare scraped data for analysis using pandas. Handle missing values, duplicates, type conversions, and text normalization.
#7
Extracting Structured Data from Unstructured HTML
Techniques for pulling structured records from messy, inconsistent HTML pages. Handle missing elements, variable layouts, and embedded metadata.
#8
Parsing HTML Tables into DataFrames
Extract HTML tables from web pages and convert them into pandas DataFrames for analysis. Handle merged cells, multi-row headers, and nested tables.
#9
Handling Malformed HTML
Learn techniques for parsing the broken, incomplete, and malformed HTML commonly encountered in web scraping.
#10
Extracting Emails and Phone Numbers from Web Pages
Extract email addresses and phone numbers from scraped web pages using regex patterns, BeautifulSoup, and validation techniques.
#11
Parsing Dates and Prices from Scraped Text
Extract and normalize dates, prices, and currencies from messy scraped text in Python using dateutil, regex, and locale-aware parsing.
#12
Using jq and JSONPath for JSON Parsing
Master jq for command-line JSON processing and JSONPath for querying JSON in Python. Filter, transform, and extract data from API responses.
#13
Deduplication of Scraped Data
Remove duplicate records from scraped datasets using exact matching, fuzzy matching, and content hashing techniques in Python.
#14
Normalizing and Validating Scraped Data
Ensure scraped data quality through normalization and validation using Pydantic models, custom validators, and pandas techniques.
#15
Converting Scraped Data to Different Formats (CSV, JSON, Excel, SQL)
Export scraped data to CSV, JSON, Excel, SQLite, and other formats using Python. Learn best practices for each format and when to use them.