Deploying Scrapers to a VPS (DigitalOcean, Vultr)

Step-by-step guide to deploying your Python web scraper to a VPS on DigitalOcean or Vultr for 24/7 operation.

A VPS (Virtual Private Server) is one of the simplest ways to run your scraper 24/7. For $5-10/month or less you get your own virtual machine with root access and full control over what runs on it.

Choosing a Provider

Provider	Cheapest Plan	CPU	RAM	Best For
DigitalOcean	$4/month	1 vCPU	512MB	Beginners, good docs
Vultr	$2.50/month	1 vCPU	512MB	Budget-friendly
Hetzner	$3.79/month	2 vCPU	2GB	Best value in EU

Initial Server Setup

After creating your VPS, SSH in and set up the environment:

# Connect to your server
ssh root@your-server-ip

# Update packages
apt update && apt upgrade -y

# Install Python and pip
apt install python3 python3-pip python3-venv git -y

# Create a non-root user for running scrapers
adduser scraper
usermod -aG sudo scraper
su - scraper

Deploying Your Scraper

# As the scraper user
cd /home/scraper

# Clone your scraper project
git clone https://github.com/yourusername/my-scraper.git
cd my-scraper

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Running as a Systemd Service

Create a systemd service so your scraper starts on boot and restarts on failure:

sudo nano /etc/systemd/system/scraper.service

[Unit]
Description=Web Scraper Service
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=scraper
WorkingDirectory=/home/scraper/my-scraper
ExecStart=/home/scraper/my-scraper/venv/bin/python main.py
Restart=on-failure
RestartSec=30
Environment=PYTHONUNBUFFERED=1

[Install]
WantedBy=multi-user.target

# Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable scraper
sudo systemctl start scraper

# Check status
sudo systemctl status scraper

# View logs
sudo journalctl -u scraper -f
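The unit file sets PYTHONUNBUFFERED=1 because Python buffers stdout when it is not attached to a terminal, so print() output could otherwise show up late in journalctl. An alternative is the standard logging module, whose StreamHandler flushes after every record. The following is a minimal sketch of that approach; the function name build_logger and the logger name "scraper" are illustrative choices, not something the setup above requires.

```python
import logging
import sys


def build_logger(name: str = "scraper") -> logging.Logger:
    """Return a logger that writes to stderr, which journald captures."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers if called twice
        handler = logging.StreamHandler(sys.stderr)
        # journald adds its own timestamp, so the format stays short
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


log = build_logger()
log.info("Scraper started")
log.warning("Slow response from %s", "https://api.example.com/products")
```

With this in place, `sudo journalctl -u scraper -f` should show each message as soon as it is logged, with the level at the start of the line.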

A Simple Scraper to Deploy

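# main.py
import json
import os
import time
from datetime import datetime

import requests


def scrape_prices():
    """Fetch the product list and save it to a timestamped JSON file."""
    url = "https://api.example.com/products"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Treat HTTP 4xx/5xx as errors instead of parsing them
    data = response.json()

    now = datetime.now()
    timestamp = now.isoformat()
    # Include the time in the filename so hourly runs don't overwrite each other
    filename = f"data/prices_{now:%Y-%m-%d_%H%M%S}.json"

    with open(filename, "w") as f:
        json.dump({"timestamp": timestamp, "data": data}, f)

    print(f"[{timestamp}] Scraped {len(data)} products")


if __name__ == "__main__":
    os.makedirs("data", exist_ok=True)

    while True:
        try:
            scrape_prices()
        except Exception as e:
            # Log the error and keep going; the next run may succeed
            print(f"Error: {e}")
        time.sleep(3600)  # Wait an hour between runs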
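The scraper above writes its JSON straight to the target file, so a reboot or service stop in the middle of a write can leave a truncated file behind. One common way around this is to write to a temporary file and rename it into place. The sketch below assumes a hypothetical helper name, write_json_atomic; it is one possible approach, not part of the original script.

```python
import json
import os
import tempfile


def write_json_atomic(path: str, payload: dict) -> None:
    """Write payload as JSON to path without exposing a half-written file."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # The temp file must live in the same directory (same filesystem)
    # so that os.replace is an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Inside scrape_prices(), the open()/json.dump() pair could then be swapped for a single call such as write_json_atomic(filename, {"timestamp": timestamp, "data": data}).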

Updating Your Scraper

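# SSH in and pull the latest code
ssh scraper@your-server-ip
cd /home/scraper/my-scraper
git pull origin main

# Reinstall dependencies in case requirements.txt changed
venv/bin/pip install -r requirements.txt

# Restart the service to load the new code
sudo systemctl restart scraper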
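By default, systemctl restart stops the service by sending SIGTERM, which ends a plain Python process immediately, possibly in the middle of a scrape. If that matters for your data, the main loop can catch the signal and exit between runs instead. The following is a minimal sketch under that assumption; run_loop, install_handler and stop_requested are hypothetical names chosen for illustration.

```python
import signal
import threading

# Set when the service has been asked to stop
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Ask the main loop to exit once the current scrape has finished."""
    stop_requested.set()


def install_handler():
    """Register the SIGTERM handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_loop(job, interval_seconds: float) -> int:
    """Call job() repeatedly until a stop is requested; return the run count."""
    runs = 0
    while not stop_requested.is_set():
        try:
            job()
        except Exception as e:
            print(f"Error: {e}")
        runs += 1
        # Unlike time.sleep, Event.wait returns early once the event is set
        stop_requested.wait(interval_seconds)
    return runs
```

In main.py this would replace the while True loop: call install_handler() once, then run_loop(scrape_prices, 3600). Note that systemd follows up with SIGKILL if the process has not exited within its stop timeout (90 seconds by default), so a single scrape should finish well within that window.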

Tips

Start with the smallest VPS; scraping is usually I/O-bound, not CPU-bound
Use tmux or screen for quick debugging sessions
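Set up log rotation to avoid filling your disk: with the systemd service above, output goes to the journal, which you can cap with SystemMaxUse= in /etc/systemd/journald.conf; use logrotate for any log files your scraper writes itself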
Monitor disk space since scraped data accumulates fast
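The disk-space tip can be partly automated from inside the scraper. The sketch below uses only the standard library to report how full the disk is and to delete data files past a certain age; the function names and the 7-day retention in the usage note are assumptions for illustration, so pick values that match how long you actually need the data.

```python
import os
import shutil
import time


def disk_used_fraction(path: str = ".") -> float:
    """Return the used share (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def prune_old_files(directory: str, max_age_days: float) -> list:
    """Delete regular files older than max_age_days; return the removed names."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for entry in os.scandir(directory):
        # Compare each file's last-modified time against the cutoff
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

For example, the hourly loop could call prune_old_files("data", 7) after each scrape and print a warning when disk_used_fraction("data") rises above 0.9. Deleting files is irreversible, so copy anything you want to keep off the server first.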