
Storing Scraped Data in CSV and JSON

Save your scraped data to CSV and JSON files using Python's built-in modules. Learn best practices for data export, encoding, and file organization.

Python Scraping · #9 · beginner · 3 min read

Once you extract data from websites, you need to store it somewhere useful. CSV and JSON are the two most common formats for scraped data: CSV suits spreadsheets and quick analysis, while JSON suits nested or complex structures.

Saving to CSV

import csv
import requests
from bs4 import BeautifulSoup

url = "https://quotes.toscrape.com/"
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop on HTTP errors instead of parsing an error page
soup = BeautifulSoup(response.text, "html.parser")

quotes = []
for quote in soup.select("div.quote"):
    quotes.append({
        "text": quote.select_one("span.text").get_text(),
        "author": quote.select_one("small.author").get_text(),
        "tags": ", ".join(tag.get_text() for tag in quote.select("a.tag")),
    })

# Write to CSV
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "author", "tags"])
    writer.writeheader()
    writer.writerows(quotes)

print(f"Saved {len(quotes)} quotes to quotes.csv")
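A quick way to check an export like this is to read the file back with csv.DictReader. The following is a minimal sketch, assuming hypothetical sample rows in place of live scraped data and a temporary file path chosen only for illustration. It shows that the csv module quotes commas, double quotes, and embedded newlines so the values survive a round trip.

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for scraped quotes.
rows = [
    {"text": 'He said, "hello"', "author": "Example Author", "tags": "a, b"},
    {"text": "Line one\nline two", "author": "Another Author", "tags": ""},
]

# Temporary location used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")

with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "author", "tags"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back; DictReader uses the header row as the dict keys.
with open(path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

print(loaded == rows)
```

Note that every value comes back as a string, so numeric fields would need converting after reading.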

Saving to JSON

import json
import requests
from bs4 import BeautifulSoup

url = "https://quotes.toscrape.com/"
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop on HTTP errors instead of parsing an error page
soup = BeautifulSoup(response.text, "html.parser")

quotes = []
for quote in soup.select("div.quote"):
    quotes.append({
        "text": quote.select_one("span.text").get_text(),
        "author": quote.select_one("small.author").get_text(),
        "tags": [tag.get_text() for tag in quote.select("a.tag")],
    })

# Write to JSON
with open("quotes.json", "w", encoding="utf-8") as f:
    json.dump(quotes, f, indent=2, ensure_ascii=False)

print(f"Saved {len(quotes)} quotes to quotes.json")
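The ensure_ascii=False argument above controls how non-ASCII characters are written. As a small sketch, assuming a hypothetical record, the comparison below shows that the default setting escapes such characters while ensure_ascii=False keeps them readable, and that both forms load back to the same object.

```python
import json

# Hypothetical record containing non-ASCII characters.
record = {"text": "Café culture", "author": "José", "tags": ["life"]}

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keeps characters as-is

print(escaped)
print(readable)

# Both strings decode to the same Python object.
same = json.loads(escaped) == json.loads(readable) == record
print(same)
```

Keeping the characters unescaped only works reliably when the file is opened with encoding="utf-8", as in the example above.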

Appending Data Incrementally

For long-running scrapers, write data as you go instead of holding everything in memory.

import csv
import requests
from bs4 import BeautifulSoup

fieldnames = ["text", "author"]

with open("quotes_all.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()

    for page in range(1, 11):
        url = f"https://quotes.toscrape.com/page/{page}/"
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Stop on HTTP errors instead of parsing an error page
        soup = BeautifulSoup(response.text, "html.parser")

        for quote in soup.select("div.quote"):
            writer.writerow({
                "text": quote.select_one("span.text").get_text(),
                "author": quote.select_one("small.author").get_text(),
            })
        f.flush()  # Write to disk after each page

print("Done, data saved incrementally.")
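The script above opens the file in "w" mode, so restarting it overwrites earlier output. One possible variation, sketched here under the assumption that you want to resume across runs, opens the file in append mode and writes the header only when the file is new or empty. The append_rows helper and the file name are hypothetical, introduced only for illustration.

```python
import csv
import os
import tempfile

FIELDNAMES = ["text", "author"]

def append_rows(filepath, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    needs_header = not os.path.exists(filepath) or os.path.getsize(filepath) == 0
    with open(filepath, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)

# Demonstration with placeholder rows: two separate "runs" against one file.
path = os.path.join(tempfile.mkdtemp(), "quotes_resume.csv")
append_rows(path, [{"text": "first", "author": "A"}])
append_rows(path, [{"text": "second", "author": "B"}])

with open(path, newline="", encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)
```

This keeps a single header at the top of the file, but it does not detect duplicate rows if the same page is scraped twice.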

JSON Lines (JSONL) for Large Datasets

JSON Lines stores one JSON object per line, making it easy to append and process large files.

import json

def append_jsonl(filepath, record):
    with open(filepath, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

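# Usage during scraping. scraped_items stands in for whatever iterable of
# dicts your scraper produces; the single record below is a placeholder.
scraped_items = [
    {"text": "Example quote", "author": "Example Author", "tags": ["example"]},
]

for item in scraped_items:
    append_jsonl("quotes.jsonl", item)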
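Reading a JSONL file back can be done one line at a time, so the whole file never has to fit in memory. A minimal sketch, assuming a hypothetical read_jsonl helper and placeholder records written to a temporary file:

```python
import json
import os
import tempfile

def read_jsonl(filepath):
    """Yield one decoded record per non-empty line of a JSON Lines file."""
    with open(filepath, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # Skip blank lines, such as a trailing newline
                yield json.loads(line)

# Write a few placeholder records, then stream them back.
path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
sample = [{"text": "one", "tags": ["a"]}, {"text": "two", "tags": []}]

with open(path, "w", encoding="utf-8") as f:
    for record in sample:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

records = list(read_jsonl(path))
print(len(records))
```

Because read_jsonl is a generator, a loop such as `for record in read_jsonl(path)` processes records as they are read rather than loading them all first.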

CSV vs JSON

Feature               CSV            JSON
Nested data           Not supported  Supported
Spreadsheet-friendly  Yes            Not directly
File size             Smaller        Larger
Streaming writes      Easy           Harder (JSONL is easy)
Human-readable        Yes            Yes
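Since CSV has no native way to hold nested data, a nested field has to be flattened before writing. The CSV example earlier joins tags with commas; another option, sketched here with a hypothetical record, is to store the nested value as a JSON string inside a single cell and decode it when reading.

```python
import csv
import io
import json

# Hypothetical record with a nested list field.
record = {"text": "Example", "author": "Example Author", "tags": ["life", "humor"]}

# Encode the list as a JSON string so it fits in one CSV cell.
flat = {**record, "tags": json.dumps(record["tags"])}

# An in-memory buffer stands in for a file opened with newline="".
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["text", "author", "tags"])
writer.writeheader()
writer.writerow(flat)

# Read the row back and decode the JSON cell into a list again.
buffer.seek(0)
row = next(csv.DictReader(buffer))
restored = {**row, "tags": json.loads(row["tags"])}

print(restored == record)
```

The JSON-in-a-cell approach preserves the structure exactly, at the cost of a column that is less convenient to filter in a spreadsheet than a plain comma-joined string.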

Tips

  • Always use encoding="utf-8" to handle international characters.
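  • Always pass newline="" when opening CSV files. Without it, Windows writes a blank row after every record, and newlines embedded in quoted fields may be handled incorrectly.
  • For very large datasets, prefer JSONL over JSON because it allows streaming reads and writes.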
  • When scraping at scale with ScraperAPI or ScrapingAnt, incremental writes prevent data loss if the scraper crashes.
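The newline="" tip can be checked on any platform. In the sketch below, passing newline="\r\n" is an assumption used to imitate the newline translation that Windows text mode applies by default; the write_csv helper is hypothetical and exists only to compare the raw bytes written in each case.

```python
import csv
import os
import tempfile

def write_csv(path, newline):
    """Write one CSV row with the given newline setting and return the raw bytes."""
    with open(path, "w", newline=newline, encoding="utf-8") as f:
        csv.writer(f).writerow(["a", "b"])
    with open(path, "rb") as f:
        return f.read()

tmp = tempfile.mkdtemp()

# newline="" disables translation, so the writer's own "\r\n" is kept as-is.
good = write_csv(os.path.join(tmp, "good.csv"), newline="")

# newline="\r\n" translates each "\n" to "\r\n", as Windows text mode would,
# which turns the writer's "\r\n" into "\r\r\n".
bad = write_csv(os.path.join(tmp, "bad.csv"), newline="\r\n")

print(good)  # b'a,b\r\n'
print(bad)   # b'a,b\r\r\n'
```

The extra carriage return in the second result is what spreadsheet programs display as a blank row between records.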

Next Steps

  • Store data in databases for querying and long-term storage
  • Learn to handle errors and retries for more reliable data collection