
Citadel Cloud Management
Zero-Downtime Migration Architecture Blueprint
Architecture Blueprints
Created by Kenny Ogunlowo
Product Description
The Problem This Blueprint Solves
You are migrating a production workload (perhaps from on-premises VMware to AWS, from EC2-Classic to a modern VPC, or from a monolith to containers) and the business says you cannot have a maintenance window. Every hour of downtime costs $47,000 in lost revenue, and you have contractual SLA obligations. Your team has never done a zero-downtime cutover at this scale.
This blueprint documents the migration pattern I used to move a 14TB PostgreSQL database and 38 microservices from a colocation facility to AWS for an energy sector client — with zero seconds of user-facing downtime across a 72-hour cutover window.
What You Get
- Architecture diagrams: Blue-green deployment topology, DNS cutover flow, database replication pipeline, and rollback decision tree (Draw.io and PNG)
- Terraform modules: Parallel environment provisioning, Route 53 weighted routing policies, ALB target group switching, and RDS read replica promotion
- Migration runbook: 47-step checklist with go/no-go decision points, rollback triggers, and communication templates
- Database cutover playbook: pglogical replication setup, lag monitoring queries, promotion sequence, and connection string rotation
Key Architecture Decisions
- Blue-green over canary for the cutover: Canary migrations leave you running two environments for weeks. Blue-green gives you a clean cut: the old environment stays warm for 48 hours, then you decommission it. Total parallel-run cost is bounded.
- DNS-based routing over load balancer switching: Route 53 weighted records with health checks let you shift traffic in 10% increments. If the green environment shows elevated error rates, you shift back in under 60 seconds without touching infrastructure.
- Logical replication over physical for PostgreSQL: pglogical lets you replicate specific schemas, filter tables, and run different PostgreSQL major versions on source and target. Physical replication requires version parity and replicates everything, including the data you are trying to leave behind.
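To make the DNS-based routing decision concrete, here is a minimal Python sketch of the 10% weight schedule and the Route 53 change batch for each step. It is not taken from the blueprint's Terraform modules or shell scripts. The record name, ALB DNS names, hosted zone IDs, and the `AlbTarget` helper are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AlbTarget:
    """Hypothetical description of one ALB behind a weighted alias record."""
    set_identifier: str
    dns_name: str
    hosted_zone_id: str  # the ALB's canonical hosted zone ID, not your own zone


def weight_schedule(step_percent: int = 10) -> List[Tuple[int, int]]:
    """Return (blue_weight, green_weight) pairs from 100/0 to 0/100."""
    if step_percent <= 0 or 100 % step_percent != 0:
        raise ValueError("step_percent must be a positive divisor of 100")
    return [(100 - green, green) for green in range(0, 101, step_percent)]


def weighted_record(name: str, target: AlbTarget, weight: int) -> Dict:
    """Build one UPSERT change for a weighted alias A record."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": target.set_identifier,
            "Weight": weight,
            "AliasTarget": {
                "HostedZoneId": target.hosted_zone_id,
                "DNSName": target.dns_name,
                "EvaluateTargetHealth": True,
            },
        },
    }


def build_change_batch(name: str, blue: AlbTarget, green: AlbTarget,
                       blue_weight: int, green_weight: int) -> Dict:
    """Build a ChangeBatch that sets both weights in a single request."""
    return {
        "Comment": f"blue={blue_weight} green={green_weight}",
        "Changes": [
            weighted_record(name, blue, blue_weight),
            weighted_record(name, green, green_weight),
        ],
    }


# Placeholder values for illustration only.
BLUE = AlbTarget("blue", "blue-alb.example.elb.amazonaws.com.", "ZBLUEEXAMPLE")
GREEN = AlbTarget("green", "green-alb.example.elb.amazonaws.com.", "ZGREENEXAMPLE")

# A rollback is the same request with the weights set back to 100/0.
ROLLBACK_BATCH = build_change_batch("app.example.com.", BLUE, GREEN, 100, 0)
```

Each batch could be submitted with boto3's `change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)` on a Route 53 client. Building the request separately from sending it keeps the schedule easy to review and rehearse before anything touches DNS.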
Who This Blueprint Is For
- Database Administrators planning their first major cloud migration
- Migration leads responsible for moving production workloads with contractual SLA requirements
- Platform Engineers building repeatable migration patterns for multiple teams
- Engineering Managers who need to present a migration timeline with concrete risk mitigation to leadership
Your First 48 Hours
Start by running the Terraform modules to provision the green environment in a non-production account. Set up pglogical replication from a database snapshot (not production, not yet). Validate that replication lag stays under 500ms during a synthetic write load. On day two, deploy the Route 53 weighted routing configuration and practice shifting traffic between two ALBs using the provided shell scripts. You want muscle memory on the cutover procedure before touching production.
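The 500ms lag validation can be expressed as a simple pass/fail gate. The sketch below is an illustration, not the playbook's own monitoring queries. It assumes lag is sampled on the source from `pg_stat_replication.replay_lag` (available in PostgreSQL 10 and later). The function name, the `min_samples` guard, and the choice to skip NULL samples are assumptions made for this example.

```python
from typing import Iterable, Optional

# Query to run on the source (provider) database; returns lag in milliseconds.
# replay_lag is NULL when the standby has been idle, so callers may see None.
LAG_QUERY = """
SELECT application_name,
       EXTRACT(EPOCH FROM replay_lag) * 1000 AS replay_lag_ms
FROM pg_stat_replication;
"""


def lag_within_budget(samples_ms: Iterable[Optional[float]],
                      threshold_ms: float = 500.0,
                      min_samples: int = 10) -> bool:
    """Return True only if every observed sample stayed under the threshold.

    None samples (NULL replay_lag) are skipped. Too few real samples counts
    as a failure, because an idle replica proves nothing about write load.
    """
    observed = [s for s in samples_ms if s is not None]
    if len(observed) < min_samples:
        return False
    return max(observed) < threshold_ms
```

In a rehearsal, you might run `LAG_QUERY` once per second with your PostgreSQL driver of choice while the synthetic write load runs, then pass the collected values to `lag_within_budget`. Gating on the maximum rather than the average matches the requirement that lag "stays under" 500ms.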
Limitations and Trade-offs
Logical replication does not replicate DDL changes. If your application runs schema migrations during the cutover window, you must coordinate those manually on both source and target. The blueprint assumes PostgreSQL 14+; older versions have limited pglogical support. Blue-green parallel environments double your infrastructure cost during the migration window, so budget for 72 to 96 hours of dual-run costs. Sequence and large object replication require additional configuration not covered in the base modules.
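Because DDL is not replicated, a schema drift check before each go/no-go point can catch a migration that reached the source but not the target. This is a minimal sketch and is not part of the blueprint's runbook. It assumes you have fetched column metadata from `information_schema.columns` on both databases. The function name and the optional table filter are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (table_schema, table_name, column_name, data_type)
Column = Tuple[str, str, str, str]

# Run this on both source and target and pass the rows to schema_drift().
COLUMNS_QUERY = """
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY 1, 2, 3;
"""


def schema_drift(source_rows: Iterable[Column],
                 target_rows: Iterable[Column],
                 replicated_tables: Optional[Set[Tuple[str, str]]] = None
                 ) -> Dict[str, List[Column]]:
    """Report columns present on one side but not the other.

    If replicated_tables is given, only (schema, table) pairs in that set are
    compared, so tables deliberately left behind do not show up as drift.
    """
    def keep(row: Column) -> bool:
        return replicated_tables is None or (row[0], row[1]) in replicated_tables

    source = {row for row in source_rows if keep(row)}
    target = {row for row in target_rows if keep(row)}
    return {
        "missing_on_target": sorted(source - target),
        "unexpected_on_target": sorted(target - source),
    }
```

An empty result on both keys is a reasonable precondition for proceeding. Any entry under `missing_on_target` suggests a migration ran on the source during the window and still needs to be applied to the target by hand.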