
Citadel Cloud Management
Blue-Green Deployment Pipeline
DevOps Pipelines
Created by Kenny Ogunlowo
Product Description
Most teams say "we do blue-green deployments" when what they actually do is deploy the new version and hope it works. A real blue-green deployment requires two identical production environments, a traffic router that can switch between them in seconds, and a rollback procedure that has been tested — not just documented. At Lockheed Martin, our deployment pipeline for a classified system had a 30-second rollback requirement. The only way to meet it was to have the previous version already running and ready to receive traffic. This template implements the three most common deployment strategies with actual traffic management, health validation, and automated rollback.
Pipeline Strategies
Blue-Green Deployment
- Two identical environments, blue (current) and green (new), both running simultaneously.
- Deploy the new version to green. Run health checks and integration tests against green.
- Switch traffic from blue to green by updating the load balancer (ALB target group swap, Kubernetes Service selector, or DNS weight change).
- Monitor for 15 minutes. If the error rate exceeds 0.1% or p99 latency exceeds 500 ms, revert the load balancer to blue immediately (within 30 seconds).
- Keep the blue environment for 24 hours as the rollback target, then update it to match green.
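The rollback rule above can be sketched as a small check that runs once per monitoring window. This is a minimal illustration that uses only the thresholds from the text (0.1% errors, 500 ms p99). `Router`, `check_window`, and the nearest-rank percentile are hypothetical names and choices. The router is a stand-in for a real ALB, Service, or DNS update, not any cloud API.

```python
from dataclasses import dataclass

# Rollback thresholds from the text: error rate above 0.1% or p99 latency above 500 ms.
MAX_ERROR_RATE = 0.001
MAX_P99_MS = 500.0


def p99(latencies_ms: list[float]) -> float:
    """Nearest-rank 99th percentile; returns 0.0 for an empty window."""
    if not latencies_ms:
        return 0.0
    ordered = sorted(latencies_ms)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) using integer math
    return ordered[rank - 1]


@dataclass
class Router:
    """Hypothetical stand-in for the traffic router (ALB listener rule,
    Kubernetes Service selector, or DNS weights)."""
    live: str = "blue"
    standby: str = "green"

    def swap(self) -> None:
        # In production this is one router update; both environments keep running.
        self.live, self.standby = self.standby, self.live


def check_window(router: Router, requests: int, errors: int,
                 latencies_ms: list[float]) -> bool:
    """Evaluate one monitoring window after cutover and swap back on a breach.

    Returns True if a rollback was triggered.
    """
    error_rate = errors / requests if requests else 0.0
    breached = error_rate > MAX_ERROR_RATE or p99(latencies_ms) > MAX_P99_MS
    if breached:
        router.swap()  # traffic returns to the still-running previous environment
    return breached


# After cutover, green is live and blue is the standby rollback target.
router = Router(live="green", standby="blue")
rolled_back = check_window(router, requests=10_000, errors=25, latencies_ms=[120.0] * 100)
print(rolled_back, router.live)  # True blue: 25 / 10,000 = 0.25% exceeds 0.1%
```

Rollback here is only a router change, never a redeploy. That property is what makes a 30-second target reachable: the previous version is already running and only needs to start receiving traffic again.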
Canary Deployment
- Deploy the new version alongside the current one and route 5% of traffic to the canary.
- Automated metric comparison, canary vs. baseline: error rate, latency percentiles, CPU utilization, and custom business metrics.
- Gradual promotion: 5% → 25% → 50% → 100%. Each stage has a 10-minute bake time and metric validation.
- Automatic rollback if any metric deviates by more than 2 standard deviations from baseline.
- Tooling: Argo Rollouts or Flagger on Kubernetes; AWS CodeDeploy for ECS and Lambda.
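The promotion loop and the 2-standard-deviation rule can be sketched as follows. This is an illustration under stated assumptions, not how Argo Rollouts, Flagger, or CodeDeploy implement analysis; those tools have their own configuration. `observe` is a hypothetical hook. It is assumed to shift traffic to the given weight, wait out the 10-minute bake, and return each metric's canary value together with baseline samples collected from baseline pods over the same window. Sampling the same window means both sides see the same traffic mix.

```python
import statistics
from typing import Callable

STAGES = [5, 25, 50, 100]  # percent of traffic on the canary, per the text
MAX_SIGMA = 2.0            # rollback threshold, per the text

# Hypothetical hook type: weight -> {metric: (canary_value, baseline_samples)}
Observer = Callable[[int], dict[str, tuple[float, list[float]]]]


def deviates(canary_value: float, baseline_samples: list[float]) -> bool:
    """True if the canary value lies more than MAX_SIGMA sample standard
    deviations from the mean of the baseline samples (needs >= 2 samples)."""
    mean = statistics.fmean(baseline_samples)
    stdev = statistics.stdev(baseline_samples)
    if stdev == 0:
        return canary_value != mean
    return abs(canary_value - mean) / stdev > MAX_SIGMA


def run_canary(observe: Observer) -> str:
    """Walk the promotion stages and stop at the first deviating metric."""
    for weight in STAGES:
        for name, (canary, baseline) in observe(weight).items():
            if deviates(canary, baseline):
                return f"rolled back at {weight}%: {name}"
    return "promoted"


baseline = [1.0, 1.2, 0.8, 1.1, 0.9]  # illustrative p99 latency samples (seconds)
print(run_canary(lambda w: {"p99_latency": (1.05, baseline)}))
print(run_canary(lambda w: {"p99_latency": (2.0 if w >= 25 else 1.0, baseline)}))
```

The check is symmetric, as the text describes. A real setup would likely alarm only in the harmful direction (higher errors or latency) so that an improvement does not trigger a rollback.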
Feature Flag Deployment
- Deploy new code behind a feature flag (LaunchDarkly, Unleash, or CloudBees). Code is in production but inactive.
- Enable flag for internal users → 1% of external users → 10% → 50% → 100%.
- Roll back by flipping the flag. No redeployment is required, so rollback is sub-second.
- The pipeline deploys the code and creates or updates the feature flag; a separate workflow manages the flag lifecycle.
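The staged rollout (internal users, then 1%, 10%, 50%, 100%) depends on stable per-user bucketing. A user who sees the feature at 10% must still see it at 50%. The sketch below shows one common way to get that property, by hashing the flag key and user ID into a fixed bucket. It is not the bucketing algorithm of LaunchDarkly, Unleash, or CloudBees. `Flag`, `bucket`, `is_enabled`, and the `new-checkout` key are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Flag:
    key: str
    on: bool = True        # kill switch: False disables the feature for everyone
    percent: float = 0.0   # external rollout percentage, 0-100


def bucket(flag_key: str, user_id: str) -> int:
    """Deterministically map a (flag, user) pair to a bucket in [0, 10000)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_enabled(flag: Flag, user_id: str, is_internal: bool) -> bool:
    if not flag.on:
        return False       # rollback: flip the flag, no deployment involved
    if is_internal:
        return True        # internal users see the feature first
    # Compare in basis points so a higher percentage is always a superset.
    return bucket(flag.key, user_id) < int(flag.percent * 100)


users = [f"user-{i}" for i in range(1000)]
for pct in (1, 10, 50, 100):
    flag = Flag("new-checkout", percent=pct)
    enabled = sum(is_enabled(flag, u, is_internal=False) for u in users)
    print(f"{pct:>3}% rollout -> {enabled} of {len(users)} external users")
```

Including the flag key in the hash gives each flag an independent user split. Without it, the same 1% of users would be the early audience for every feature.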
Security Gates
- Pre-deployment approval — Manual approval gate before traffic routing. Approver sees: diff of changes, security scan results, staging test results.
- Automated rollback — No human intervention required for metric-triggered rollback. Reduces mean-time-to-recovery from minutes to seconds.
- Audit trail — Every deployment, traffic shift, and rollback logged with timestamp, actor, and reason. Compliance-ready for SOC 2 and FedRAMP.
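An audit record that carries the fields listed above (timestamp, actor, reason, plus the kind of event) can be as simple as one JSON line per event. This sketch shows only the record shape. The helper names and the `ALLOWED_ACTIONS` set are hypothetical. A local append-only file is an assumption made for illustration; compliance reviews generally expect centralized, tamper-evident storage rather than a file on the deploy host.

```python
import json
from datetime import datetime, timezone

ALLOWED_ACTIONS = {"deploy", "traffic_shift", "rollback"}


def audit_line(action: str, actor: str, reason: str) -> str:
    """Build one JSON Lines audit record with a UTC timestamp."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown audit action: {action!r}")
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,    # approver identity or the automation account
        "action": action,
        "reason": reason,
    }
    return json.dumps(record, sort_keys=True) + "\n"


def append_audit(path: str, action: str, actor: str, reason: str) -> None:
    """Append one record; the log is only ever opened in append mode."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(audit_line(action, actor, reason))


print(audit_line("rollback", "auto-rollback", "p99 latency above 500 ms"), end="")
```

Recording the automation account as the actor keeps metric-triggered rollbacks, which involve no human, just as traceable as manual approvals.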
What Breaks First
- Database schema incompatibility between blue and green — The new version expects a column that does not exist in the old schema, so its migration changes the database that blue also uses (for example, by renaming a column the old version still reads). Blue-green rollback then fails because the old version cannot read the new schema. Fix: make every schema change backward-compatible. Add new columns as nullable, deploy code that reads both schemas, then remove the old column in a subsequent release.
- Canary metric baseline drift — The baseline metrics shift during the canary window due to traffic pattern changes (daily peak begins). The canary appears to be degraded but is actually performing identically. Fix: compare canary and baseline pods receiving the same traffic mix, not absolute metric values.
- Feature flag evaluation latency — SDK initialization takes 2 seconds, during which all flags evaluate to defaults. Users see the old experience for 2 seconds, then the page flickers to the new experience. Fix: use server-side evaluation and cache flag values at the edge, or initialize the SDK during server startup, not per-request.
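The backward-compatible migration fix for the schema problem can be demonstrated end to end with SQLite from the standard library. The scenario is hypothetical: the `users` table and the `fullname` to `display_name` rename are invented for illustration. The sketch covers only the first release. It adds a nullable column, has the new code write both columns, and reads the new column with a fallback to the old one. Dropping `fullname` is left for a later release, once no running version reads it.

```python
import sqlite3


def create_v1_schema(conn: sqlite3.Connection) -> None:
    # Schema the old (blue) version was built against.
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")


def expand(conn: sqlite3.Connection) -> None:
    # Additive, nullable column: every query the old version runs still works.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def old_version_read(conn: sqlite3.Connection, user_id: int) -> str:
    # Blue code only knows the original column.
    row = conn.execute("SELECT fullname FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0]


def new_version_write(conn: sqlite3.Connection, user_id: int, name: str) -> None:
    # Green code writes both columns, so blue still sees current data after a rollback.
    conn.execute(
        "UPDATE users SET fullname = ?, display_name = ? WHERE id = ?",
        (name, name, user_id),
    )


def new_version_read(conn: sqlite3.Connection, user_id: int) -> str:
    # Prefer the new column; fall back to the old one for rows blue wrote.
    row = conn.execute(
        "SELECT COALESCE(display_name, fullname) FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
create_v1_schema(conn)
conn.execute("INSERT INTO users (id, fullname) VALUES (1, 'Ada Lovelace')")
expand(conn)
new_version_write(conn, 1, "Ada King")
print(old_version_read(conn, 1), "|", new_version_read(conn, 1))  # Ada King | Ada King
```

Because both versions can run against the expanded schema, switching traffic back to blue needs no database change, which keeps rollback a pure routing operation.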