Storing Scraped Data in CSV and JSON - Python Scraping

Save your scraped data to CSV and JSON files using Python's built-in modules. Learn best practices for data export, encoding, and file organization.

Once you extract data from websites, you need to store it somewhere useful. CSV and JSON are two of the most common formats for scraped data: CSV for spreadsheets and quick analysis, and JSON for nested or complex structures.

Saving to CSV

import csv
import requests
from bs4 import BeautifulSoup

url = "https://quotes.toscrape.com/"
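response = requests.get(url, timeout=10)  # timeout in seconds, so a stalled server can't hang the script
response.raise_for_status()  # Fail loudly on 4xx/5xx instead of parsing an error page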
soup = BeautifulSoup(response.text, "html.parser")

quotes = []
for quote in soup.select("div.quote"):
    quotes.append({
        "text": quote.select_one("span.text").get_text(),
        "author": quote.select_one("small.author").get_text(),
        # CSV cells hold plain strings, so join the tag list into one value
        "tags": ", ".join(tag.get_text() for tag in quote.select("a.tag")),
    })

# Write to CSV
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "author", "tags"])
    writer.writeheader()
    writer.writerows(quotes)

print(f"Saved {len(quotes)} quotes to quotes.csv")
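Because CSV cells hold only strings, the tags list above was flattened into one comma-separated value. The following minimal sketch reads the file back with csv.DictReader and splits that column into a list again. It assumes no tag name itself contains ", " (the separator used when writing). The helper name load_quotes_csv is hypothetical:

```python
import csv

def load_quotes_csv(filepath):
    """Read a quotes CSV back into a list of dicts, restoring tags as a list."""
    records = []
    # newline="" is also recommended when reading with the csv module
    with open(filepath, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Every value comes back as a str; split the flattened tags column
            row["tags"] = row["tags"].split(", ") if row["tags"] else []
            records.append(row)
    return records
```

After the script above has run, load_quotes_csv("quotes.csv") returns the same records, with tags as lists of strings.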

Saving to JSON

import json
import requests
from bs4 import BeautifulSoup

url = "https://quotes.toscrape.com/"
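response = requests.get(url, timeout=10)  # timeout in seconds, so a stalled server can't hang the script
response.raise_for_status()  # Fail loudly on 4xx/5xx instead of parsing an error page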
soup = BeautifulSoup(response.text, "html.parser")

quotes = []
for quote in soup.select("div.quote"):
    quotes.append({
        "text": quote.select_one("span.text").get_text(),
        "author": quote.select_one("small.author").get_text(),
        # JSON supports lists natively, so keep tags as a list
        "tags": [tag.get_text() for tag in quote.select("a.tag")],
    })

# Write to JSON
with open("quotes.json", "w", encoding="utf-8") as f:
    json.dump(quotes, f, indent=2, ensure_ascii=False)

print(f"Saved {len(quotes)} quotes to quotes.json")
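The ensure_ascii=False argument in the json.dump call above controls how non-ASCII characters, such as curly quotes or accented letters, are written. The small sketch below compares both settings on a made-up sample record:

```python
import json

# Made-up sample record containing non-ASCII characters
record = {"text": "“Café society”", "tags": ["life", "humor"]}

# Default ensure_ascii=True escapes every non-ASCII character as \uXXXX
escaped = json.dumps(record)
# ensure_ascii=False keeps the characters as-is (so write the file as UTF-8)
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)

# Both forms decode to the same Python object
assert json.loads(escaped) == json.loads(readable) == record
```

Both outputs load back to the same data. ensure_ascii=False just keeps the file readable in an editor, which is why the file above is opened with encoding="utf-8".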

Appending Data Incrementally

For long-running scrapers, write data as you go instead of holding everything in memory. Memory use stays flat, and a crash loses at most the page in progress.

import csv
import requests
from bs4 import BeautifulSoup

fieldnames = ["text", "author"]

with open("quotes_all.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()

    for page in range(1, 11):
        url = f"https://quotes.toscrape.com/page/{page}/"
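        response = requests.get(url, timeout=10)  # timeout in seconds
        response.raise_for_status()  # Stop on HTTP errors instead of saving nothing silently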
        soup = BeautifulSoup(response.text, "html.parser")

        for quote in soup.select("div.quote"):
            writer.writerow({
                "text": quote.select_one("span.text").get_text(),
                "author": quote.select_one("small.author").get_text(),
            })
        f.flush()  # Hand this page's rows to the OS now instead of waiting for close

print("Done, data saved incrementally.")
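The script above opens quotes_all.csv in "w" mode, so each run starts a fresh file. If you want a later run to add to the same file instead, one option is to open it in "a" mode and write the header only when the file is new or empty. The following is a minimal sketch of that approach; the helper name append_rows is hypothetical:

```python
import csv
import os

def append_rows(filepath, rows, fieldnames):
    """Append dict rows to a CSV file, writing the header only once."""
    # A missing or zero-byte file has no header yet
    needs_header = not os.path.exists(filepath) or os.path.getsize(filepath) == 0
    with open(filepath, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling append_rows("quotes_all.csv", page_rows, ["text", "author"]) once per page closes, and therefore flushes, the file after every page.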

JSON Lines (JSONL) for Large Datasets

JSON Lines stores one JSON object per line. Because each line is a complete JSON document, you can append a record without rewriting the file and read a large file back one line at a time.

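import json
import requests
from bs4 import BeautifulSoup

def append_jsonl(filepath, record):
    # Each call appends exactly one JSON object followed by a newline
    with open(filepath, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Usage during scraping: append each quote as soon as it is parsed
response = requests.get("https://quotes.toscrape.com/", timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")
for quote in soup.select("div.quote"):
    append_jsonl("quotes.jsonl", {
        "text": quote.select_one("span.text").get_text(),
        "author": quote.select_one("small.author").get_text(),
    })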
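Reading JSON Lines back can be just as incremental as writing it. The following minimal sketch uses a generator that yields one record per line, so only one line is held in memory at a time. The helper name iter_jsonl is hypothetical:

```python
import json

def iter_jsonl(filepath):
    """Yield one record per non-empty line without loading the whole file."""
    with open(filepath, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `for record in iter_jsonl("quotes.jsonl"): ...` processes each quote as it is read. If a crash left a half-written last line, json.loads raises json.JSONDecodeError on it, so you may want to catch that for the final line.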

CSV vs JSON

Feature	CSV	JSON
Nested data	Not supported	Supported
Spreadsheet-friendly	Yes	Not directly
File size	Smaller	Larger
Streaming writes	Easy	Hard for a single array (easy with JSONL)
Human-readable	Yes	Yes
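Nested data is the main reason to pick JSON. If you need CSV anyway, one common workaround is to store a nested field as a JSON string inside a single cell. The following minimal sketch uses a made-up record, with an in-memory io.StringIO buffer standing in for a file:

```python
import csv
import io
import json

# Made-up sample record with a nested (list) field
record = {"text": "Sample quote", "author": "Jane Doe", "tags": ["life", "humor"]}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "author", "tags"])
writer.writeheader()
# Serialize the list to a JSON string so it fits in one CSV cell
writer.writerow({**record, "tags": json.dumps(record["tags"])})

# Reading back: parse that cell with json.loads to recover the list
buffer.seek(0)
row = next(csv.DictReader(buffer))
row["tags"] = json.loads(row["tags"])
assert row == record
```

Spreadsheets will show the raw JSON text in that column. This suits data you plan to reload in code better than data meant for manual review.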

Tips

Always use encoding="utf-8" to handle international characters.
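Always pass newline="" when opening CSV files. The csv module writes its own \r\n line endings, and on Windows, text mode would turn each one into \r\r\n, which spreadsheets show as blank rows.
For very large datasets, prefer JSONL over a single JSON array: you can write records one at a time and read them back line by line instead of loading the whole file.
When scraping at scale with ScraperAPI or ScrapingAnt, incremental writes mean a crash costs you only the records not yet written, not the whole run.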
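If the CSV is meant to be opened by double-clicking it in Excel, plain UTF-8 can show accented characters as garbled text, because Excel (notably on Windows) looks for a byte-order mark to recognize UTF-8. The following minimal sketch uses Python's "utf-8-sig" codec, which writes that mark. The file name and sample row are made up:

```python
import csv

# Made-up sample row with a non-ASCII character
rows = [{"text": "Café", "author": "Sample Author"}]

# "utf-8-sig" writes a byte-order mark (BOM) before the data
with open("quotes_excel.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "author"])
    writer.writeheader()
    writer.writerows(rows)
```

When reading such a file back in Python, open it with encoding="utf-8-sig" as well, so the mark is stripped instead of ending up in the first column name.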

Next Steps

Store data in databases for querying and long-term storage
Learn to handle errors and retries for more reliable data collection