Deploying Scrapers to a VPS (DigitalOcean, Vultr)
Step-by-step guide to deploying your Python web scraper to a VPS on DigitalOcean or Vultr for 24/7 operation.
A VPS (Virtual Private Server) is the simplest way to run your scraper 24/7. For as little as a few dollars a month you get your own virtual server with root access and full control over what runs on it.
Choosing a Provider
| Provider | Cheapest Plan | CPU | RAM | Best For |
|---|---|---|---|---|
| DigitalOcean | $4/month | 1 vCPU | 512MB | Beginners, good docs |
| Vultr | $2.50/month | 1 vCPU | 512MB | Budget-friendly |
| Hetzner | $3.79/month | 2 vCPU | 2GB | Best value in EU |
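Scraping is mostly I/O-bound, so RAM is often the resource that runs out first. A quick sketch that normalizes each entry price by RAM and by vCPU, using the figures exactly as listed in the table (`PLANS` and `cost_per_gb_ram` are illustrative names):

```python
# Figures copied from the table above: (monthly price, vCPUs, RAM in GB)
PLANS = {
    "DigitalOcean": (4.00, 1, 0.5),
    "Vultr": (2.50, 1, 0.5),
    "Hetzner": (3.79, 2, 2.0),
}


def cost_per_gb_ram(price: float, ram_gb: float) -> float:
    """Monthly price divided by RAM, i.e. dollars per GB of RAM per month."""
    return price / ram_gb


for name, (price, vcpus, ram_gb) in PLANS.items():
    # Print price per GB of RAM and price per vCPU, both per month
    print(f"{name:<13} ${cost_per_gb_ram(price, ram_gb):.2f}/GB RAM  ${price / vcpus:.2f}/vCPU")
```

By both measures the Hetzner entry plan comes out cheapest, which matches its "Best value" label.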
Initial Server Setup
After creating your VPS, SSH in and set up the environment:
```bash
# Connect to your server
ssh root@your-server-ip
# Update packages
apt update && apt upgrade -y
# Install Python and pip
apt install python3 python3-pip python3-venv git -y
# Create a non-root user for running scrapers
adduser scraper
usermod -aG sudo scraper
su - scraper
```
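The `scraper` user exists so the scraper never runs with root privileges. If you want the script to enforce that itself, here is a minimal sketch. It is POSIX-only, since `os.geteuid` is unavailable on Windows, and `running_as_root` and `refuse_root` are hypothetical helper names:

```python
import os
import sys


def running_as_root() -> bool:
    """True if the effective user ID is 0 (root). POSIX-only."""
    return os.geteuid() == 0


def refuse_root() -> None:
    """Exit with status 1 and a message on stderr when started as root."""
    if running_as_root():
        sys.exit("Refusing to run as root; start the scraper as the 'scraper' user.")
```

You could call `refuse_root()` first thing under `if __name__ == "__main__":` in your script.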
Deploying Your Scraper
```bash
# As the scraper user
cd /home/scraper
# Clone your scraper project
git clone https://github.com/yourusername/my-scraper.git
cd my-scraper
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
```
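If the scraper is ever started with the system `python3` instead of the venv's interpreter, the packages from `requirements.txt` will be missing. One way to catch this early is to compare `sys.prefix` with `sys.base_prefix`, which differ inside a virtual environment created by the `venv` module. A sketch; `in_virtualenv` is a hypothetical helper name:

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv created by the venv module.

    Inside a venv, sys.prefix points at the venv directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


# Print once at startup so the logs show which interpreter is in use
print(f"Interpreter: {sys.executable} (venv: {in_virtualenv()})")
```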
Running as a Systemd Service
Create a systemd service so your scraper starts on boot and restarts on failure:
```bash
sudo nano /etc/systemd/system/scraper.service
```
Add the following unit definition:
```ini
[Unit]
Description=Web Scraper Service
# Wait until the network is actually up, not just configured
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=scraper
WorkingDirectory=/home/scraper/my-scraper
ExecStart=/home/scraper/my-scraper/venv/bin/python main.py
Restart=on-failure
RestartSec=30
Environment=PYTHONUNBUFFERED=1

[Install]
WantedBy=multi-user.target
```
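With `Restart=on-failure`, systemd restarts the service only when the process exits uncleanly, for example with a non-zero exit status. The example `main.py` in this guide catches every exception inside its loop, so a scraper that fails on every run keeps running and systemd never sees a failure. One option is to exit with a non-zero status after several consecutive failures and let systemd restart it after `RestartSec`. A sketch; `run_with_failure_limit` and the threshold of 5 are hypothetical choices:

```python
import sys
import time
from typing import Callable

MAX_CONSECUTIVE_FAILURES = 5  # hypothetical threshold; tune for your scraper


def run_with_failure_limit(job: Callable[[], None], interval: float,
                           max_failures: int = MAX_CONSECUTIVE_FAILURES) -> int:
    """Run job every `interval` seconds; return 1 after too many failures in a row.

    Only returns on failure, so the caller can pass the result to sys.exit().
    """
    failures = 0
    while True:
        try:
            job()
            failures = 0  # a successful run resets the counter
        except Exception as e:
            failures += 1
            print(f"Error ({failures}/{max_failures}): {e}", file=sys.stderr)
            if failures >= max_failures:
                return 1  # non-zero exit status lets Restart=on-failure kick in
        time.sleep(interval)
```

In `main.py` this would replace the `while True` loop with `sys.exit(run_with_failure_limit(scrape_prices, 3600))`; each exit is then recorded in `journalctl -u scraper`.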
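Reload systemd so it picks up the new unit, then enable and start the service:
```bash
sudo systemctl daemon-reload
# Enable on boot and start the service
sudo systemctl enable scraper
sudo systemctl start scraper
# Check status
sudo systemctl status scraper
# Follow the logs live
sudo journalctl -u scraper -f
```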
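journald captures everything the service writes to stdout and stderr, so plain `print` calls work here (`PYTHONUNBUFFERED=1` makes them appear immediately). If you want severity levels in the output, a sketch using the standard `logging` module writing to stdout might look like this; `setup_logging` and the logger name `"scraper"` are hypothetical. journald manages the size of its own journal, so no extra rotation is needed for these entries:

```python
import logging
import sys


def setup_logging(level: int = logging.INFO) -> logging.Logger:
    """Send log records to stdout, where systemd's journal picks them up."""
    logger = logging.getLogger("scraper")
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers if called twice
        handler = logging.StreamHandler(sys.stdout)
        # journald adds its own timestamps, so only level and message are logged
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


log = setup_logging()
log.info("Scraper started")
```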
A Simple Scraper to Deploy
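```python
# main.py
import json
import os
import time
from datetime import datetime

import requests


def scrape_prices():
    url = "https://api.example.com/products"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # treat HTTP 4xx/5xx responses as errors
    data = response.json()  # assumes the API returns a JSON list of products

    now = datetime.now()
    timestamp = now.isoformat()
    # Include the time in the filename: a date-only name would be
    # overwritten by every hourly run, keeping only the day's last scrape
    filename = f"data/prices_{now:%Y-%m-%d_%H%M%S}.json"
    with open(filename, "w") as f:
        json.dump({"timestamp": timestamp, "data": data}, f)

    print(f"[{timestamp}] Scraped {len(data)} products")


if __name__ == "__main__":
    # "data" is resolved relative to WorkingDirectory in the systemd unit
    os.makedirs("data", exist_ok=True)
    while True:
        try:
            scrape_prices()
        except Exception as e:
            print(f"Error: {e}")
        time.sleep(3600)  # run every hour
```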
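If the process is stopped or the disk fills up while `json.dump` is writing, the output file can be left truncated. A common pattern is to write to a temporary file in the same directory and then rename it over the target with `os.replace`, which is atomic on POSIX when both paths are on the same filesystem. A sketch; `write_json_atomic` is a hypothetical helper you could call from `scrape_prices` in place of the `open`/`json.dump` pair:

```python
import json
import os
import tempfile
from typing import Any


def write_json_atomic(path: str, payload: Any) -> None:
    """Write payload as JSON to path so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)  # remove the partial temp file, then re-raise
        raise
```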
Updating Your Scraper
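```bash
# SSH in and pull the latest code
ssh scraper@your-server-ip
cd /home/scraper/my-scraper
git pull origin main
# Reinstall dependencies in case requirements.txt changed
venv/bin/pip install -r requirements.txt
sudo systemctl restart scraper
```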
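`systemctl restart` and `systemctl stop` send the process SIGTERM, and systemd follows up with SIGKILL if it is still running after `TimeoutStopSec` (90 seconds by default). Python installs no SIGTERM handler, so by default the scraper dies wherever it happens to be, possibly halfway through writing a file. A sketch of a loop that finishes the current run and then exits: the handler only sets a flag, and the hourly sleep is split into one-second steps so a stop request is noticed quickly. `StopFlag` and `run_until_stopped` are hypothetical names:

```python
import signal
import time
from typing import Callable


class StopFlag:
    """Set by a signal handler; checked by the main loop between runs."""

    def __init__(self) -> None:
        self.requested = False

    def handle(self, signum, frame) -> None:
        # Keep the handler tiny: just record that a stop was requested
        self.requested = True

    def install(self) -> None:
        """Register for SIGTERM (systemctl stop/restart) and SIGINT (Ctrl+C)."""
        signal.signal(signal.SIGTERM, self.handle)
        signal.signal(signal.SIGINT, self.handle)


def run_until_stopped(job: Callable[[], None], interval: float, stop: StopFlag) -> None:
    """Run job every `interval` seconds until a stop is requested."""
    while not stop.requested:
        try:
            job()
        except Exception as e:
            print(f"Error: {e}")
        # Sleep in steps of at most 1 s so a stop request is noticed quickly
        deadline = time.monotonic() + interval
        while not stop.requested:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            time.sleep(min(1.0, remaining))
    print("Stop requested; exiting cleanly")
```

In `main.py` the `while True` loop would become `stop = StopFlag(); stop.install(); run_until_stopped(scrape_prices, 3600, stop)`. Note that `signal.signal` must be called from the main thread.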
Tips
- Start with the smallest VPS; scraping is usually I/O-bound, not CPU-bound (headless browsers such as Playwright or Selenium are the exception and need noticeably more RAM)
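- Use `tmux` or `screen` for quick debugging sessions
- If you write your own log files, set up log rotation (e.g. with `logrotate`) to avoid filling your disk; journald already limits the size of its own journal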
- Monitor disk space since scraped data accumulates fast
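For the disk-space tips, a sketch using only the standard library: `shutil.disk_usage` reports free space, and files in the data directory older than a retention window can be deleted. The 1 GiB threshold, the 30-day window, the `prices_*.json` pattern, and the helper names are all assumptions; copy or compress files elsewhere first if you need to keep them:

```python
import shutil
import time
from pathlib import Path
from typing import List

MIN_FREE_BYTES = 1024**3  # hypothetical threshold: 1 GiB
RETENTION_DAYS = 30       # hypothetical retention window for scraped files


def free_bytes(path: str = ".") -> int:
    """Free space, in bytes, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


def low_on_disk(path: str = ".", min_free: int = MIN_FREE_BYTES) -> bool:
    """True when free space has dropped below `min_free` bytes."""
    return free_bytes(path) < min_free


def prune_old_files(directory: str, max_age_days: float = RETENTION_DAYS,
                    pattern: str = "prices_*.json") -> List[str]:
    """Delete files matching `pattern` last modified more than `max_age_days` ago."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling `prune_old_files("data")` once per loop iteration, and printing a warning whenever `low_on_disk("data")` is true, keeps the data directory bounded.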