The Ultimate Checklist for Zero‑Downtime Deploys with Docker and Nginx




Introduction

Deploying new versions of a web service without interrupting users is a classic challenge for any DevOps lead. With Docker handling containerization and Nginx acting as a reliable reverse proxy, you can achieve true zero‑downtime releases. This checklist walks you through the essential steps—from image building to traffic shifting—so you can ship features confidently.




1. Pre‑flight Planning

  • Define a versioning strategy – Semantic versioning (v1.2.3) works well with Docker tags.
  • Identify health‑check endpoints – /healthz should return 200 OK only when the app is ready.
  • Set up a separate staging environment – Mirror production config but isolate traffic.
  • Document rollback criteria – E.g., error rate > 2% over 5 minutes triggers a revert.
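
A rollback criterion is easier to enforce when it is written as a check rather than a sentence. The sketch below assumes you can obtain an error count and a total request count for the 5-minute window from your metrics system; the function name should_rollback and its arguments are illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# should_rollback ERRORS TOTAL [THRESHOLD_PERCENT]
# Exit status 0 means "roll back": ERRORS / TOTAL is above the threshold.
# Both counts are assumed to cover the same 5-minute window.
should_rollback() {
    local errors=$1 total=$2 threshold=${3:-2}
    # No traffic in the window: nothing to judge, so do not roll back.
    if [ "$total" -eq 0 ]; then
        return 1
    fi
    # Compare errors*100 against threshold*total to avoid division rounding.
    awk -v e="$errors" -v t="$total" -v th="$threshold" \
        'BEGIN { exit !(e * 100 > th * t) }'
}

# Usage:
#   if should_rollback 21 1000; then ./rollback.sh; fi
```

The comparison is strict, so exactly 2% does not trigger a revert, which matches the "> 2%" wording above.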



2. Build a Reproducible Docker Image

A deterministic Dockerfile eliminates “it works on my machine” surprises.

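# Dockerfile
# Build stage: install all dependencies and compile the app.
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: only production dependencies and the compiled output.
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
# Skip this step if the build already bundles its dependencies into dist/.
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
# Run as the unprivileged user that ships with the official Node images.
USER node
EXPOSE 3000
CMD ["node", "dist/index.js"]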

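  • Pin base image versions – node:20-alpine fixes the major version only; use a full version tag or an image digest when builds must be exactly repeatable.
  • Leverage multi‑stage builds to keep the final image small.
  • Run docker build with --pull so the base image is refreshed from the registry instead of reused from a stale local cache.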
docker build -t myservice:1.2.3 --pull .
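
To keep tags consistent with the versioning strategy, the image reference can be derived from a validated version string instead of typed by hand. This is a minimal sketch; image_ref is a hypothetical helper name, and it assumes plain MAJOR.MINOR.PATCH versions with an optional leading "v".

```shell
#!/usr/bin/env bash
# image_ref NAME VERSION
# Prints NAME:VERSION after checking that VERSION looks like 1.2.3 (or v1.2.3).
image_ref() {
    local name=$1 version=${2#v}   # strip an optional leading "v"
    if [[ ! $version =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
        echo "not a semantic version: $2" >&2
        return 1
    fi
    echo "${name}:${version}"
}

# Usage:
#   docker build -t "$(image_ref myservice 1.2.3)" --pull .
```

Rejecting values such as "latest" here keeps mutable tags out of the deploy pipeline.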



3. Nginx as a Smart Load Balancer

Configure Nginx to route traffic to two upstream groups – blue (current) and green (new).

# /etc/nginx/conf.d/myservice.conf
upstream blue {
    server 127.0.0.1:3001;
}
upstream green {
    server 127.0.0.1:3002;
}

server {
    listen 80;
    location / {
        proxy_pass http://blue;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
    location /healthz {
        proxy_pass http://green/healthz;
    }
}

  • Keep proxy_pass pointing to blue initially.
  • Expose a separate health endpoint that checks the green container.
  • Validate the config with nginx -t, then reload gracefully with nginx -s reload – old workers finish their in‑flight requests before exiting.



4. Blue‑Green Deployment Workflow

Step  Action
1     Deploy the new container on a different port (e.g., 3002).
2     Run health checks until /healthz reports success.
3     Update Nginx upstream from blue to green.
4     Monitor metrics for a short stabilization window.
5     Decommission the old container (blue).



4.1 Deploy the Green Container

docker run -d --name myservice-green -p 3002:3000 \
    -e NODE_ENV=production \
    myservice:1.2.3

  • Use --restart unless-stopped for resilience.
  • Attach a health‑check script that polls /healthz every 5 seconds.
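
The polling step could look roughly like the function below. It is a sketch, not a finished script: wait_until_healthy is a hypothetical name, and the probe command is passed in as arguments so the same loop works with curl, wget, or a custom check.

```shell
#!/usr/bin/env bash
# wait_until_healthy MAX_ATTEMPTS INTERVAL_SECONDS PROBE_COMMAND [ARGS...]
# Runs the probe until it succeeds or the attempts run out.
wait_until_healthy() {
    local max_attempts=$1 interval=$2
    shift 2
    local attempt
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        if "$@" >/dev/null 2>&1; then
            echo "healthy after ${attempt} attempt(s)"
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if ((attempt < max_attempts)); then
            sleep "$interval"
        fi
    done
    echo "still unhealthy after ${max_attempts} attempt(s)" >&2
    return 1
}

# Example probe against the green container published on port 3002:
#   wait_until_healthy 6 5 curl --fail --silent --max-time 2 http://127.0.0.1:3002/healthz
```

Six attempts at a 5-second interval give the container roughly 30 seconds to become ready before the deploy is abandoned.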



4.2 Switch Traffic

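# Point the main location at green (envsubst or a templating tool works too).
# Match only the proxy_pass line: s/blue/green/g would also rename "upstream blue".
sed -i 's|proxy_pass http://blue;|proxy_pass http://green;|' /etc/nginx/conf.d/myservice.conf
# Validate the config first so a typo never reaches the running server.
nginx -t && nginx -s reload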

On reload, Nginx starts new worker processes with the updated config and lets the old workers finish their in‑flight requests, so existing connections complete against blue while new requests flow to green.
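
For repeated use, the switch can be wrapped in a small function that only touches the proxy_pass line of the main location and works in both directions. This is a sketch under the assumption that the config looks like the one in section 3; switch_upstream is a hypothetical name.

```shell
#!/usr/bin/env bash
# switch_upstream CONF_FILE TARGET
# Points the "proxy_pass http://<upstream>;" line at TARGET (blue or green).
# Lines with a URI part, such as "proxy_pass http://green/healthz;", do not
# match the pattern and stay untouched, as do the upstream block names.
switch_upstream() {
    local conf=$1 target=$2 tmp
    case "$target" in
        blue|green) ;;
        *) echo "unknown upstream: $target" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # Write to a temp file first so a failed sed never truncates the config.
    if sed -E "s#(proxy_pass http://)(blue|green);#\1${target};#" "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"   # overwrite in place, keeping owner and permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Typical use on the proxy host:
#   switch_upstream /etc/nginx/conf.d/myservice.conf green && nginx -t && nginx -s reload
```

Because the same function handles "blue" as a target, it can also serve as the traffic half of a rollback.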



5. Observability & Alerting

  • Metrics: Export Prometheus metrics for request latency (histogram), error rates (counter), and container restarts (counter).
  • Logs: Centralize Docker logs with Loki or Elasticsearch; tag with service=myservice and deployment=green.
  • Alert thresholds:

    • 5xx rate > 1% for 2 minutes.
    • Container restart count > 3 within 5 minutes.

Example Prometheus rule:

# alerts.yml
groups:
  - name: myservice   # group name is arbitrary
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5..",service="myservice"}[1m]))
            / sum(rate(http_requests_total{service="myservice"}[1m])) > 0.01
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High 5xx error rate on myservice"
          description: "Error rate exceeded 1% for the last 2 minutes."



6. Automated Rollback Plan

Even with thorough testing, things can go sideways. Keep a one‑click rollback script ready:

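#!/usr/bin/env bash
# rollback.sh – revert to the previous blue deployment
set -euo pipefail
CONF=/etc/nginx/conf.d/myservice.conf
# Make sure blue is running before it receives traffic again
docker start myservice-blue
# Point only the main proxy_pass back at blue; upstream names and /healthz stay as they are
sed -i 's|proxy_pass http://green;|proxy_pass http://blue;|' "$CONF"
# Validate, then reload; set -e aborts here if the config is broken
nginx -t
nginx -s reload
# Stop and remove the green container
docker stop myservice-green
docker rm myservice-green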

  • Store the script in version control alongside your deployment repo.
  • Pair it with a PagerDuty or OpsGenie trigger for rapid manual execution.



7. Security Hardening Checklist

  • Run containers as non‑root – add USER node in the Dockerfile.
  • Limit capabilities – docker run --cap-drop ALL.
  • TLS termination – let Nginx handle HTTPS with a strong cipher suite.
  • Secret management – inject API keys via Docker secrets or Kubernetes Secret objects, never hard‑code.
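
The container-level items can be collected into one reusable set of flags so blue and green are always started the same way. The sketch below is an assumption-laden example: run_hardened and HARDENING_FLAGS are hypothetical names, the container port 3000 comes from the Dockerfile above, and --user node relies on the node user that the official Node images provide.

```shell
#!/usr/bin/env bash
# Flags are kept in an array so they can be reused for blue and green alike.
HARDENING_FLAGS=(
    --restart unless-stopped   # bring the container back after a crash or reboot
    --cap-drop ALL             # drop every Linux capability
    --user node                # run as the non-root user from the base image
)

# run_hardened NAME HOST_PORT IMAGE
# Set DRY_RUN=1 to print the command instead of executing it.
run_hardened() {
    local name=$1 port=$2 image=$3
    local cmd=(docker run -d --name "$name" -p "${port}:3000" "${HARDENING_FLAGS[@]}" "$image")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Usage:
#   DRY_RUN=1 run_hardened myservice-green 3002 myservice:1.2.3
```

Printing the command first is a cheap way to review the exact flags before they reach production.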



8. Final Verification Checklist

  • [ ] Docker image built with immutable tag (myservice:1.2.3).
  • [ ] Health endpoint returns 200 within 30 seconds.
  • [ ] Nginx config points to blue before switch.
  • [ ] Green container runs on isolated port and logs to central store.
  • [ ] Traffic switched via Nginx reload; no 502/504 observed.
  • [ ] Prometheus alerts are silent for 5 minutes post‑switch.
  • [ ] Rollback script tested in staging.
  • [ ] All secrets loaded from secure store.

Cross‑checking each bullet reduces the chance of a silent failure slipping into production.
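
The "no 502/504 observed" item can be checked against the Nginx access log rather than by eye. This sketch assumes the default combined log format, where the status code is the ninth whitespace-separated field; count_gateway_errors is a hypothetical helper, and the log path in the usage line is only the common default.

```shell
#!/usr/bin/env bash
# count_gateway_errors LOGFILE
# Prints how many requests in LOGFILE ended with status 502 or 504.
count_gateway_errors() {
    local logfile=$1
    # In the combined format, field 9 holds the response status code.
    awk '$9 == 502 || $9 == 504 { n++ } END { print n + 0 }' "$logfile"
}

# Usage:
#   count_gateway_errors /var/log/nginx/access.log
```

For a post-switch check, run it on only the lines written after the reload, for example by rotating the log or filtering on timestamp first.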




Conclusion

Zero‑downtime deployments become routine once you embed these steps into your CI/CD pipeline. Automate image builds, health checks, and Nginx reloads, and you’ll spend more time delivering value than firefighting releases. If you need help shipping this, the team at https://ramerlabs.com can help.


