Guide
Complete Guide to E-Commerce Scraping
Everything you need to know about scraping e-commerce websites. Covers Amazon, Shopify, eBay, and other platforms with practical techniques.
E-commerce scraping is the most common commercial use of web scraping. This guide covers techniques for extracting product data from any online store.
What E-Commerce Data to Scrape
| Data Point | Business Value |
|---|---|
| Product names | Catalog comparison |
| Prices | Competitive pricing |
| Reviews and ratings | Quality analysis |
| Product images | Visual comparison |
| Availability | Inventory monitoring |
| Descriptions | Content analysis |
| Categories | Taxonomy mapping |
| Seller info | Marketplace analysis |
Platform-Specific Approaches
Amazon
Amazon is the hardest e-commerce site to scrape. Use ScraperAPI with its built-in Amazon parser.
import requests
API_KEY = "YOUR_SCRAPERAPI_KEY"
asin = "B0XXXXXXXXX"
url = f"https://www.amazon.com/dp/{asin}"
resp = requests.get(
f"http://api.scraperapi.com?api_key={API_KEY}&url={url}&autoparse=true"
)
product = resp.json() # Structured product data
Shopify Stores
The easiest targets, use the built-in JSON API.
resp = requests.get("https://store.com/products.json?limit=250")
products = resp.json()["products"]
eBay
eBay data is accessible but paginated. Use completed listings for pricing intelligence.
Walmart
Requires JavaScript rendering and anti-bot bypass. Use ScraperAPI.
Common E-Commerce Scraping Challenges
- Anti-bot protection, Most major platforms actively block scrapers
- Dynamic pricing, Prices change based on location, time, and user
- Infinite scroll, Product listings load as you scroll
- A/B testing, Different users see different layouts
- Rate limiting, Aggressive request limits
Building a Price Monitor
import requests
import json
from datetime import datetime
API_KEY = "YOUR_SCRAPERAPI_KEY"
def track_price(product_url):
resp = requests.get(
f"http://api.scraperapi.com?api_key={API_KEY}&url={product_url}&render=true"
)
# Extract price from response
# Store with timestamp
record = {
"url": product_url,
"timestamp": datetime.now().isoformat(),
"price": extracted_price,
}
with open("price_history.jsonl", "a") as f:
f.write(json.dumps(record) + "\n")
Tools for E-Commerce Scraping
| Tool | Best For | Pricing |
|---|---|---|
| ScraperAPI | All platforms | From $49/mo |
| ScrapingAnt | Budget scraping | From $19/mo |
| Scrapy | Custom crawlers | Free |
| Playwright | JS-heavy stores | Free |
Best Practices
- Use structured data, JSON-LD, microdata, and product APIs are more reliable than HTML parsing
- Monitor for layout changes, E-commerce sites redesign frequently
- Track prices over time, Single snapshots are less valuable than trends
- Deduplicate products, The same product appears on multiple pages
- Handle variants, Products have sizes, colors, and other options
- Respect the platform, Scrape responsibly and do not overload servers