Best Ways to Store Scraped Data (CSV, JSON, Database)

Learn the best storage formats for scraped data. Compare CSV, JSON, SQLite, PostgreSQL, and MongoDB with practical Python examples.

The right storage format for scraped data depends on the shape of the data, how much of it you have, and whether you need to query it. Here is a comparison of the most common options.

Quick Comparison

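Format     | Best For                   | Practical Size           | Query Support | Setup
CSV        | Tabular data, spreadsheets | ~1M rows (Excel's limit) | None          | Zero
JSON       | Nested/hierarchical data   | ~100MB                   | None          | Zero
SQLite     | Medium datasets, queries   | ~10GB                    | Full SQL      | Minimal
PostgreSQL | Large-scale, production    | Unlimited                | Full SQL      | Server
MongoDB    | Flexible schemas           | Unlimited                | Rich queries  | Server

The sizes are rough rules of thumb rather than hard limits: CSV itself has no row cap (about 1M rows is where Excel stops), and SQLite can hold far more than 10GB.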

CSV Storage

Best for simple, flat data that fits in a spreadsheet.

import csv

data = [
    {"title": "Product A", "price": 29.99, "url": "https://..."},
    {"title": "Product B", "price": 49.99, "url": "https://..."},
]

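# newline="" lets the csv module control line endings;
# an explicit encoding keeps non-ASCII titles intact on every platform.
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "price", "url"])
    writer.writeheader()
    writer.writerows(data)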
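
Opening the file with "w" overwrites it on every run. A scraper that runs repeatedly usually wants to append instead, writing the header only once. The following is a minimal sketch of that idea, not a definitive implementation: append_rows, read_rows, and FIELDNAMES are hypothetical names, the example.com URLs are placeholders, and the demo writes to a temporary directory so it is safe to run anywhere.

```python
import csv
import os
import tempfile

FIELDNAMES = ["title", "price", "url"]  # hypothetical column set for this sketch


def append_rows(path, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


def read_rows(path):
    """Read the file back as a list of dicts; every value comes back as a string."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Demo: two separate "runs" append to the same file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "products.csv")
    append_rows(path, [{"title": "Product A", "price": 29.99, "url": "https://example.com/a"}])
    append_rows(path, [{"title": "Product B", "price": 49.99, "url": "https://example.com/b"}])
    rows = read_rows(path)

print(len(rows), rows[0]["title"], rows[1]["title"])
```

CSV stores no type information, so prices read back as strings and need converting (for example with float) before any arithmetic.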

JSON Storage

Best for nested or irregular data structures.

import json

data = {
    "scraped_at": "2026-04-21",
    "products": [
        {"title": "Product A", "variants": [{"size": "S", "price": 29.99}]},
    ]
}

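# ensure_ascii=False keeps non-ASCII text readable instead of \uXXXX escapes.
with open("products.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)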

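For large datasets, use JSON Lines (one JSON object per line). New items can be appended without rewriting the file, and the file can be read back one line at a time:

# Iterate over the product list, not the top-level dict (which would yield its keys).
with open("products.jsonl", "a", encoding="utf-8") as f:
    for item in data["products"]:
        f.write(json.dumps(item) + "\n")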
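
Reading a JSON Lines file back is where the format pays off, because each line parses on its own and the whole file never has to sit in memory. Here is a rough sketch of a round trip, assuming hypothetical helper names (append_jsonl, iter_jsonl) and a temporary file for the demo:

```python
import json
import os
import tempfile


def append_jsonl(path, items):
    """Append each item as one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")


def iter_jsonl(path):
    """Yield one parsed object per non-blank line, without loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo: two appends, then a streaming read.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "products.jsonl")
    append_jsonl(path, [{"title": "Product A", "variants": [{"size": "S", "price": 29.99}]}])
    append_jsonl(path, [{"title": "Product B", "variants": []}])
    titles = [item["title"] for item in iter_jsonl(path)]

print(titles)
```

Because iter_jsonl is a generator, a filter or aggregation over millions of items only ever holds one parsed object at a time.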

SQLite Database

Best for medium datasets that need querying. No server required.

import sqlite3

conn = sqlite3.connect("scraped_data.db")
cursor = conn.cursor()

cursor.execute("""
    CREATE TABLE IF NOT EXISTS products (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT,
        price REAL,
        url TEXT UNIQUE,
        scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

cursor.execute(
    "INSERT OR IGNORE INTO products (title, price, url) VALUES (?, ?, ?)",
    ("Product A", 29.99, "https://example.com/product-a")
)
conn.commit()
conn.close()  # release the file handle once the run is finished
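
The payoff of SQLite over flat files is deduplication and querying. The sketch below illustrates both under a few assumptions: it uses an in-memory database so nothing is written to disk, a trimmed version of the products table, and placeholder example.com URLs. The third row repeats a URL on purpose to show INSERT OR IGNORE skipping it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
conn.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT,
        price REAL,
        url TEXT UNIQUE
    )
""")

scraped = [
    ("Product A", 29.99, "https://example.com/product-a"),
    ("Product B", 49.99, "https://example.com/product-b"),
    ("Product A (again)", 19.99, "https://example.com/product-a"),  # duplicate URL
]

# executemany inserts the whole batch; rows that violate UNIQUE(url) are skipped.
conn.executemany(
    "INSERT OR IGNORE INTO products (title, price, url) VALUES (?, ?, ?)",
    scraped,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
cheap = conn.execute(
    "SELECT title, price FROM products WHERE price < ? ORDER BY price",
    (40,),
).fetchall()
conn.close()

print(count, cheap)
```

Note that INSERT OR IGNORE keeps the first row it saw for a URL, so a later price change is silently dropped; that is fine for deduplication but not for price tracking.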

PostgreSQL

Best for production systems with concurrent access and large datasets.

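import psycopg2

# Assumes the products table already exists with a UNIQUE constraint on url
# (ON CONFLICT (url) requires one).
conn = psycopg2.connect("postgresql://user:pass@localhost/scraping_db")
cursor = conn.cursor()

# Upsert: insert a new row, or update the price if the URL is already stored.
# EXCLUDED refers to the row that was proposed for insertion.
cursor.execute(
    """
    INSERT INTO products (title, price, url) VALUES (%s, %s, %s)
    ON CONFLICT (url) DO UPDATE SET price = EXCLUDED.price
    """,
    ("Product A", 29.99, "https://..."),
)
conn.commit()
cursor.close()
conn.close()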
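
The PostgreSQL example needs a running server. To see what an ON CONFLICT upsert does without one, SQLite (version 3.24 or later) accepts a very similar clause, so the behaviour can be tried locally. This is only an illustration of the upsert semantics, not PostgreSQL itself: the table is a trimmed stand-in, the URL is a placeholder, and excluded names the row that was proposed for insertion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real database server
conn.execute("CREATE TABLE products (title TEXT, price REAL, url TEXT UNIQUE)")

# Insert a new row, or update the stored price when the URL already exists.
UPSERT = """
    INSERT INTO products (title, price, url) VALUES (?, ?, ?)
    ON CONFLICT (url) DO UPDATE SET price = excluded.price
"""

conn.execute(UPSERT, ("Product A", 29.99, "https://example.com/product-a"))
conn.execute(UPSERT, ("Product A", 24.99, "https://example.com/product-a"))  # same URL, new price
conn.commit()

result = conn.execute("SELECT title, price FROM products").fetchall()
conn.close()

print(result)
```

The second call updates the existing row instead of adding a new one, so the table always holds the most recently scraped price per URL.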

Our Recommendations

  1. Starting out? Use CSV or JSON files.
  2. Need queries? Use SQLite: no server, full SQL.
  3. Production system? Use PostgreSQL.
  4. Flexible schemas? Use MongoDB.
  5. Pair ScraperAPI for reliable data collection with any of these storage options for persistence.