# docker-compose for Local Full-Stack Dev
Production scrapers depend on Postgres, Redis, Mongo, Loki. docker-compose runs the whole stack locally so your dev environment matches prod.
## What you’ll learn
- Write a `docker-compose.yml` for a scraper plus its dependencies.
- Use healthchecks and `depends_on` for correct startup order.
- Distinguish dev-only overrides from base config.
A modern scraper is one binary plus ~5 dependencies: a database, a queue, a metrics store, possibly a log aggregator, possibly a proxy management service. docker-compose runs them all locally with one command.
## Why bother
The alternative (installing Postgres, Redis, and Loki directly on your laptop) runs into version drift (you have Postgres 14, prod is 16), port collisions, and "works on Maria's machine" stories. docker-compose:
- Runs the same versions as prod.
- Tears down cleanly (`docker compose down -v`).
- Lets new team members spin up the full stack in 5 minutes.
## A complete scraper stack
`docker-compose.yml`:

```yaml
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: scraping
      POSTGRES_USER: scraping
      POSTGRES_PASSWORD: dev
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U scraping"]
      interval: 5s
      timeout: 3s
      retries: 10

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 10

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./ops/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: Admin
    ports:
      - "3000:3000"
    volumes:
      - grafana:/var/lib/grafana

  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"

  scraper:
    build: .
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      DATABASE_URL: postgresql://scraping:dev@postgres:5432/scraping
      REDIS_URL: redis://redis:6379
    volumes:
      - .:/app
    command: ["python", "-m", "scraper.run"]

volumes:
  pgdata:
  grafana:
```
Five real services plus your scraper, in about 50 lines. Run `docker compose up` and you have a complete local stack.
## Healthchecks and `depends_on`
The line that matters:

```yaml
depends_on:
  postgres:
    condition: service_healthy
```

Without this, `scraper` starts before Postgres is ready and crashes on the first connect. With it, compose waits for the Postgres healthcheck (`pg_isready`) to pass before starting `scraper`.
Define healthchecks on every service that has dependents. Without them, `depends_on` only waits for the container to start, not for the service inside it to be ready.
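Compose healthchecks only gate startup order; in production, or when a dependency restarts mid-run, the application has to retry on its own. A minimal sketch of such a retry helper (`wait_for` is a hypothetical name, not part of compose or the scraper):

```python
import time


def wait_for(connect, attempts=10, delay=0.5):
    """Call `connect` until it succeeds or attempts run out.

    Does inside the app what a compose healthcheck does outside it,
    so the scraper also survives a dependency restarting mid-run.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception:
            if attempt == attempts:
                raise  # give up: re-raise the last connection error
            time.sleep(delay)


# Usage sketch (assumes psycopg and DATABASE_URL from the compose file):
# conn = wait_for(lambda: psycopg.connect(os.environ["DATABASE_URL"]))
```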
## Service names = hostnames
Inside the compose network, services resolve by name. `redis://redis:6379` works because `redis` is the service name. No need for localhost or external IPs.
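Because the host port is published and the service name only resolves inside the compose network, the same Postgres has two addresses depending on where the client runs (values taken from the compose file above):

```
# inside the compose network (e.g. the scraper container):
DATABASE_URL=postgresql://scraping:dev@postgres:5432/scraping

# from your host machine, via the published port:
DATABASE_URL=postgresql://scraping:dev@localhost:5432/scraping
```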
## Dev-only overrides
You want different settings in dev vs CI vs prod. Compose supports overlays.
- Base `docker-compose.yml`: production-ish defaults.
- `docker-compose.override.yml`: dev-only changes (auto-merged when present).
```yaml
# docker-compose.override.yml, auto-loaded only locally
services:
  scraper:
    environment:
      LOG_LEVEL: DEBUG
      PYTHONBREAKPOINT: ipdb.set_trace
    volumes:
      - .:/app  # live-reload code
    command: ["python", "-m", "scraper.run", "--debug"]
```
Run `docker compose up` and both files merge automatically. In CI, use `docker compose -f docker-compose.yml up` so the override is skipped.
## Profiles for optional services
Some services (Mongo, ClickHouse) you only want sometimes. Use `profiles`:

```yaml
services:
  mongodb:
    image: mongo:7
    profiles: ["mongo"]
    ports: ["27017:27017"]

  clickhouse:
    image: clickhouse/clickhouse-server:latest
    profiles: ["analytics"]
    ports: ["8123:8123"]
```
- `docker compose up` → starts nothing from profiles.
- `docker compose --profile mongo --profile analytics up` → starts everything, including the optional ones.
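If you always want certain profiles enabled locally, Compose also reads the `COMPOSE_PROFILES` variable, including from a `.env` file next to the compose file (a sketch; the profile names match the example above):

```
# .env — picked up automatically by docker compose
COMPOSE_PROFILES=mongo,analytics
```

With this in place, a plain `docker compose up` starts the optional services too.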
## Persistent volumes vs ephemeral
```yaml
volumes:
  pgdata:  # named volume, persists across `up`/`down`
```
`docker compose down` keeps named volumes; `docker compose down -v` deletes them. In dev you'll regularly wipe to start fresh; never run `-v` in prod.
For dev convenience, you sometimes want non-persistent storage instead, so each `up` starts clean:
```yaml
postgres:
  image: postgres:16
  tmpfs:
    - /var/lib/postgresql/data  # RAM-backed, fast, ephemeral
```
Postgres in tmpfs is wildly fast for tests.
## docker-compose vs Kubernetes
| Concern | docker-compose | Kubernetes |
|---|---|---|
| Local dev | First-class | Possible but heavy |
| Production | Possible (small / single-node) | First-class |
| Restart policy | Yes | Yes (more nuanced) |
| Scaling replicas | `--scale` flag, simple | First-class |
| Service discovery | Service name | DNS + Services |
| Secrets | Files / env / Docker secrets | Kubernetes Secrets |
Use docker-compose for dev and any tiny production deployment (one VPS, a few services). Use Kubernetes when you have multiple hosts and want declarative scaling.
## Common pitfalls
- Forgetting to map ports. `ports: ["5432:5432"]` exposes a service to your host; without it, only sibling services can reach it.
- Mounting your code over the image's installed code in dev. Convenient for live-reload, surprising in CI where you forget the override and run stale code.
- Healthchecks too strict at startup. Postgres can take 5–30s on first run; `start_period: 30s` gives it slack.
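Put in compose terms, the fix for that last pitfall is one extra line on the healthcheck (a sketch, extending the Postgres healthcheck from the stack above):

```yaml
postgres:
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U scraping"]
    interval: 5s
    timeout: 3s
    retries: 10
    start_period: 30s  # failures in the first 30s don't count against retries
```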
## What to try
Replace your manual `docker run postgres ...` workflow with a `docker-compose.yml`. Add Prometheus and Grafana too. Then run `docker compose down -v` followed by `docker compose up`. You should have a complete scraper stack running in under 60 seconds, every time, identically.
## Quiz: check your understanding
Pass mark is 70%. Pick the best answer; you’ll see the explanation right after.