

docker-compose for Local Full-Stack Dev

Production scrapers depend on Postgres, Redis, Mongo, Loki. docker-compose runs the whole stack locally so your dev environment matches prod.

What you’ll learn

  • Write a docker-compose.yml for a scraper + dependencies.
  • Use healthchecks and depends_on for correct startup order.
  • Distinguish dev-only overrides from base config.

A modern scraper is one binary plus ~5 dependencies: a database, a queue, a metrics store, possibly a log aggregator, possibly a proxy management service. docker-compose runs them all locally with one command.

Why bother

The alternative, installing Postgres, Redis, and Loki directly on your laptop, runs into version drift (you have Postgres 14, prod runs 16), port collisions, and "works on Maria's machine" stories. docker-compose:

  • Runs the same versions as prod.
  • Tears down cleanly (docker compose down -v).
  • Lets new team members spin up the full stack in 5 minutes.

A complete scraper stack

docker-compose.yml:

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: scraping
      POSTGRES_USER: scraping
      POSTGRES_PASSWORD: dev
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U scraping"]
      interval: 5s
      timeout: 3s
      retries: 10

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 10

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./ops/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: Admin
    ports:
      - "3000:3000"
    volumes:
      - grafana:/var/lib/grafana

  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"

  scraper:
    build: .
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      DATABASE_URL: postgresql://scraping:dev@postgres:5432/scraping
      REDIS_URL: redis://redis:6379
    volumes:
      - .:/app
    command: ["python", "-m", "scraper.run"]

volumes:
  pgdata:
  grafana:

Five real services + your scraper, in about 60 lines. docker compose up and you have a complete local stack.
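
The compose file above mounts ./ops/prometheus.yml into the Prometheus container. A minimal sketch of that file, assuming your scraper exposes metrics on port 8000 (the job names and target port are illustrative, not prescribed by the compose file):

```yaml
# ops/prometheus.yml — minimal scrape config (hypothetical targets)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: scraper
    static_configs:
      # "scraper" resolves via the compose network's DNS
      - targets: ["scraper:8000"]
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
```

Note the target is the service name, not localhost; inside the compose network, Prometheus reaches the scraper the same way the scraper reaches Postgres.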

Healthchecks and depends_on

The line that matters:

depends_on:
  postgres:
    condition: service_healthy

Without this, scraper starts before postgres is ready and crashes on first connect. With it, compose waits for the Postgres healthcheck (pg_isready) to pass before starting scraper.

Define healthchecks on every service that has dependents. Without them, depends_on only waits for container-start (not for the service inside to be ready).
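Even with healthchecks, a defensive connect loop inside the scraper is cheap insurance (for example, against compose versions that ignore condition, or services that pass the healthcheck before fully accepting work). A minimal sketch; the connect argument is any zero-arg callable from your driver, e.g. lambda: psycopg2.connect(os.environ["DATABASE_URL"]):

```python
import time


def wait_for(connect, attempts=10, delay=1.0):
    """Retry `connect` until it succeeds or attempts are exhausted.

    `connect` is a zero-arg callable that raises on failure and
    returns a connection on success.
    """
    last_exc = None
    for _ in range(attempts):
        try:
            return connect()
        except Exception as exc:  # narrow this to your driver's error class
            last_exc = exc
            time.sleep(delay)
    raise RuntimeError(f"service not ready after {attempts} attempts") from last_exc
```

This keeps startup-order logic out of your compose file entirely, which also helps when the same code runs outside compose.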

Service names = hostnames

Inside the compose network, services resolve by name. redis://redis:6379 works because redis is the service name. No need for localhost or external IPs.
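To see how the service name flows through, here is a sketch that parses the REDIS_URL the compose file injects. In the container this value would come from os.environ["REDIS_URL"]; it is hardcoded here for illustration:

```python
from urllib.parse import urlparse

# What the compose file sets: REDIS_URL: redis://redis:6379
redis_url = "redis://redis:6379"

parts = urlparse(redis_url)
host, port = parts.hostname, parts.port
# host is "redis": Docker's embedded DNS resolves the service name
# to the container's IP on the compose network.
print(host, port)
```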

Dev-only overrides

You want different settings in dev vs CI vs prod. Compose supports overlays.

Base docker-compose.yml: production-ish defaults. docker-compose.override.yml: dev-only changes (auto-merged when present).

# docker-compose.override.yml, auto-loaded only locally
services:
  scraper:
    environment:
      LOG_LEVEL: DEBUG
      PYTHONBREAKPOINT: ipdb.set_trace
    volumes:
      - .:/app  # live-reload code
    command: ["python", "-m", "scraper.run", "--debug"]

Run docker compose up and both files merge automatically. For CI, use docker compose -f docker-compose.yml up, which loads only the base file (no override).
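
In CI this becomes an explicit -f invocation. A hypothetical GitHub Actions step, as one way to wire it up (the job layout and test command are assumptions):

```yaml
# .github/workflows/ci.yml (sketch) — run the stack without the dev override
jobs:
  integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start stack (base file only, no docker-compose.override.yml)
        run: docker compose -f docker-compose.yml up -d --wait
      - name: Run tests
        run: docker compose -f docker-compose.yml exec -T scraper pytest
      - name: Tear down
        if: always()
        run: docker compose -f docker-compose.yml down -v
```

up --wait blocks until healthchecks pass, so the test step starts against a ready stack.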

Profiles for optional services

Some services (Mongo, ClickHouse) you only want sometimes. Use profiles:

services:
  mongodb:
    image: mongo:7
    profiles: ["mongo"]
    ports: ["27017:27017"]

  clickhouse:
    image: clickhouse/clickhouse-server:latest
    profiles: ["analytics"]
    ports: ["8123:8123"]

docker compose up starts nothing from profiles. docker compose --profile mongo --profile analytics up starts everything, including the optional services.

Persistent volumes vs ephemeral

volumes:
  pgdata:  # named volume, persists across `up`/`down`

docker compose down keeps named volumes. docker compose down -v deletes them. In dev, you'll regularly wipe to start fresh; never run -v in prod.

For dev convenience, sometimes you want non-persistent (each up starts clean):

postgres:
  image: postgres:16
  tmpfs:
    - /var/lib/postgresql/data  # RAM-backed, fast, ephemeral

Postgres in tmpfs is wildly fast for tests.

docker-compose vs Kubernetes

Concern             docker-compose                  Kubernetes
Local dev           First-class                     Possible but heavy
Production          Possible (small / single-node)  First-class
Restart policy      Yes                             Yes (more nuanced)
Scaling replicas    --scale flag, simple            First-class
Service discovery   Service name                    DNS + Services
Secrets             Files / env / Docker secrets    Kubernetes Secrets

Use docker-compose for dev and any tiny production deployment (one VPS, a few services). Use Kubernetes when you have multiple hosts and want declarative scaling.

Common pitfalls

  • Forgetting to map ports. ports: ["5432:5432"] exposes to your host; without it, only sibling services can reach it.
  • Mounting your code over the image's installed code in dev. Convenient for live-reload, surprising in CI where you forget the override and run stale code.
  • Healthchecks too strict at startup. Postgres can take 5–30s to initialize on first run; start_period: 30s gives it slack.
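
That last pitfall is fixed in the healthcheck itself. Extending the Postgres service from the compose file above:

```yaml
postgres:
  image: postgres:16
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U scraping"]
    interval: 5s
    timeout: 3s
    retries: 10
    start_period: 30s  # failures in the first 30s don't count against retries
```

During start_period, failed probes are ignored for the purpose of marking the container unhealthy, so slow first-run initialization doesn't burn through your retry budget.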

What to try

Replace your manual docker run postgres ... workflow with a docker-compose.yml. Add Prometheus + Grafana too. Then docker compose down -v and docker compose up. You should have a complete scraper stack running in under 60 seconds, every time, identically.
