
Citadel Cloud Management
Cost-Optimized Architecture FinOps Blueprint
Architecture Blueprints
Created by Kenny Ogunlowo
Product Description
The Problem This Blueprint Solves
Your AWS bill grew from $12,000 to $67,000 per month in 8 months and nobody can explain why. Cost Explorer shows the top-level numbers but not the engineering decisions driving them. Three teams run oversized EC2 instances "just in case," 400GB of EBS snapshots have no retention policy, and a forgotten Redshift cluster in us-west-2 has been running for 11 months with zero queries. Your CFO wants unit economics (cost per transaction, cost per customer) and all you have is a monthly invoice.
This blueprint is the FinOps framework I implemented at a Series C SaaS company that reduced monthly cloud spend from $89,000 to $41,000 (54% reduction) within 90 days while handling 40% more traffic — by fixing visibility, rightsizing, and implementing automated cost governance.
What You Get
- Architecture diagrams — Cost allocation hierarchy, tagging enforcement pipeline, budget alerting flow, rightsizing automation architecture, and RI/Savings Plan coverage dashboard (Draw.io)
- Terraform modules — AWS Organizations tag policies, Cost Anomaly Detection monitors, Budget actions with automated SNS alerts, Trusted Advisor integration, and Lambda functions for automated rightsizing recommendations
- FinOps operating model — Team cost ownership RACI matrix, monthly cost review meeting agenda template, unit economics calculation methodology, and RI/Savings Plan purchasing decision framework
- Cost optimization playbook — Top 20 cost reduction patterns (with estimated savings per pattern), implementation priority matrix, and ROI calculation templates
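The unit economics methodology is not reproduced in this description. The minimal sketch below illustrates the general idea under illustrative assumptions. Untagged or shared platform cost is spread across teams in proportion to their directly tagged spend, and each team's fully loaded cost is then divided by a business volume such as customers or transactions. The team names, dollar figures, and the functions `allocate_shared_cost` and `cost_per_unit` are hypothetical.

```python
from typing import Dict


def allocate_shared_cost(direct: Dict[str, float], shared: float) -> Dict[str, float]:
    """Add a share of untagged/shared cost to each team, proportional to its tagged spend.

    Proportional allocation is an illustrative assumption, not the only valid method.
    """
    total_direct = sum(direct.values())
    if total_direct <= 0:
        raise ValueError("need positive direct spend to allocate shared cost")
    return {team: cost + shared * cost / total_direct for team, cost in direct.items()}


def cost_per_unit(monthly_cost: float, units: int) -> float:
    """Unit cost, e.g. cost per customer or cost per transaction."""
    if units <= 0:
        raise ValueError("units must be positive")
    return monthly_cost / units


if __name__ == "__main__":
    # Hypothetical monthly figures (USD): two teams plus shared platform cost.
    direct_spend = {"checkout": 30_000.0, "search": 10_000.0}
    loaded = allocate_shared_cost(direct_spend, shared=8_000.0)
    print(loaded)  # {'checkout': 36000.0, 'search': 12000.0}

    total = sum(loaded.values())
    print(f"cost per customer: ${cost_per_unit(total, 1_200):.2f}")  # $40.00
    per_txn = cost_per_unit(loaded["checkout"], 2_400_000)
    print(f"checkout cost per transaction: ${per_txn:.4f}")  # $0.0150
```

Proportional allocation is simple to explain to finance, but it rewards nobody for reducing shared cost. Some teams instead allocate shared cost by a usage driver such as request count.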
Key Architecture Decisions
- Tagging enforcement via SCP over best-effort guidelines: "Please tag your resources" policies achieve roughly 30% compliance. Service Control Policies that deny resource creation when mandatory tags are missing raise compliance to effectively 100% for resource types that support tagging at creation time. The blueprint enforces 8 mandatory tags (team, environment, project, cost-center, owner, application, tier, data-classification) at the Organization level.
- Cost Anomaly Detection over monthly bill review: Monthly reviews catch cost spikes up to 30 days late. Cost Anomaly Detection uses ML to identify unusual spending patterns and alerts within hours to about a day, depending on when the billing data arrives. A forgotten load test that would add $3,000 to your bill is caught on day one, not day 30.
- Compute Savings Plans over Reserved Instances for flexibility: Standard RIs lock you into a specific instance family and region. Compute Savings Plans apply across instance family, size, OS, tenancy, and region. When you rightsize from m5.2xlarge to m6i.xlarge, your Savings Plan still applies; a Standard RI purchased for m5 would not, because RI size flexibility only works within a single instance family.
- Automated shutdown for non-production environments: Development and staging environments typically need to run only 8 hours per day, 5 days per week, which is 40 of the 168 hours in a week. Lambda functions stop EC2 instances, scale down ECS services, and stop RDS instances outside business hours (AWS automatically restarts a stopped RDS instance after seven days, so the schedule has to stop it again). This alone saves about 75% on non-production compute and is typically the single largest cost optimization.
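To make the SCP-based tag enforcement concrete, the sketch below generates a deny-on-missing-tag SCP for the eight mandatory tags. It is a hypothetical illustration, not the blueprint's Terraform module. The function name `build_require_tags_scp`, the choice of `ec2:RunInstances` as the only guarded action, and the instance ARN pattern are all assumptions. It emits one `Deny` statement per tag because multiple keys inside a single `Null` condition are ANDed. A combined statement would therefore deny only those requests that omit every tag. Keep in mind that `aws:RequestTag` works only for actions that support tagging on create, and that SCPs do not apply to the management account.

```python
import json
from typing import Dict, List, Sequence

# The eight mandatory tags listed in the blueprint.
MANDATORY_TAGS = [
    "team", "environment", "project", "cost-center",
    "owner", "application", "tier", "data-classification",
]

SCP_MAX_CHARS = 5_120  # AWS Organizations size limit for one SCP document


def _sid_for(tag: str) -> str:
    """Statement IDs must be alphanumeric: 'cost-center' -> 'DenyMissingCostCenter'."""
    return "DenyMissing" + "".join(part.capitalize() for part in tag.split("-"))


def build_require_tags_scp(
    tags: Sequence[str],
    actions: Sequence[str] = ("ec2:RunInstances",),
    resource: str = "arn:aws:ec2:*:*:instance/*",
) -> Dict:
    """Build an SCP that denies `actions` if ANY of `tags` is absent from the request."""
    statements: List[Dict] = []
    for tag in tags:
        statements.append({
            "Sid": _sid_for(tag),
            "Effect": "Deny",
            "Action": list(actions),
            "Resource": resource,
            # "Null": "true" matches when the request does not carry this tag key.
            "Condition": {"Null": {f"aws:RequestTag/{tag}": "true"}},
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    policy = build_require_tags_scp(MANDATORY_TAGS)
    # Check the minified size against the SCP limit before attaching it.
    assert len(json.dumps(policy, separators=(",", ":"))) <= SCP_MAX_CHARS
    print(json.dumps(policy, indent=2))
```

In practice, the same pattern is repeated for each resource type you want to guard, such as RDS instances or EBS volumes. Each additional statement counts against the size limit.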
Who This Blueprint Is For
- FinOps practitioners implementing cloud cost governance for the first time
- Engineering Managers responsible for team cloud budgets without visibility into cost drivers
- CFOs who need unit economics (cost per customer, cost per transaction) from cloud infrastructure
- Platform Engineers building automated cost optimization into infrastructure pipelines
Your First 48 Hours
Deploy the tag policy and Cost Anomaly Detection Terraform modules into your management account. Run the included tagging audit script to identify all untagged resources and their estimated monthly cost. On day two, deploy the non-production shutdown Lambda and configure it for one development environment. Calculate the projected monthly savings (hours switched off per month × hourly cost of the scheduled resources) and present it to your team as a quick win. This builds organizational buy-in for the larger FinOps program.
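The day-two savings estimate is simple arithmetic, so a small calculator helps keep it honest. The minimal sketch below assumes a schedule of 8 hours per day, 5 days per week, and an average month of 52/12 weeks. The resource names and hourly rates are placeholders rather than AWS prices, and `NonProdResource` and `projected_monthly_savings` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable

HOURS_PER_WEEK = 24 * 7      # 168
WEEKS_PER_MONTH = 52 / 12    # average month, about 4.33 weeks


@dataclass
class NonProdResource:
    name: str
    hourly_cost: float  # compute-only USD per hour while running


def off_hours_per_month(hours_on_per_day: int = 8, days_on_per_week: int = 5) -> float:
    """Hours per average month that a business-hours schedule keeps resources stopped."""
    hours_on_per_week = hours_on_per_day * days_on_per_week
    if not 0 <= hours_on_per_week <= HOURS_PER_WEEK:
        raise ValueError("schedule exceeds the hours in a week")
    return (HOURS_PER_WEEK - hours_on_per_week) * WEEKS_PER_MONTH


def projected_monthly_savings(resources: Iterable[NonProdResource], **schedule) -> float:
    """Projected savings = hours off x hourly cost, summed over the scheduled resources."""
    hours_off = off_hours_per_month(**schedule)
    return sum(r.hourly_cost * hours_off for r in resources)


if __name__ == "__main__":
    # Hypothetical development environment with placeholder rates.
    dev = [NonProdResource("dev-api", 0.40), NonProdResource("dev-worker", 0.20)]
    share_off = 1 - (8 * 5) / HOURS_PER_WEEK
    print(f"fraction of the week switched off: {share_off:.0%}")  # 76%
    print(f"projected monthly savings: ${projected_monthly_savings(dev):,.2f}")  # $332.80
```

Use compute-only hourly rates here. EBS volumes and RDS storage continue to bill while instances are stopped, so they do not contribute to the savings.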
Limitations and Trade-offs
Tag enforcement via SCPs blocks resource creation, which can break CI/CD pipelines whose Terraform or CloudFormation templates do not include the mandatory tags. Roll out tag enforcement gradually: start in audit mode (report untagged resources without blocking them), fix existing resources and templates, then switch to deny. Savings Plans require a 1-year or 3-year commitment, and you pay the committed hourly amount even if later optimization pushes your usage below it. The blueprint includes a coverage calculator to recommend safe commitment levels (typically 60-70% of baseline). Cost Anomaly Detection has a detection delay of up to 24 hours for some services.
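The blueprint's coverage calculator is not shown here, and its exact method is not described. The sketch below is one hypothetical way to arrive at a "60-70% of baseline" recommendation. It takes a low percentile of hourly eligible spend as the baseline and commits to a fraction of it. Savings Plan commitments are expressed in dollars per hour at Savings Plan rates, so the input is assumed to be eligible usage already priced at those rates. The sketch does not perform that conversion. The function names and the 10th-percentile default are assumptions.

```python
import math
from typing import Sequence


def nearest_rank_percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile (pct in 1..100) of a non-empty sample."""
    if not values:
        raise ValueError("need at least one hourly data point")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def recommend_hourly_commitment(
    hourly_eligible_spend: Sequence[float],
    coverage: float = 0.65,
    baseline_pct: int = 10,
) -> float:
    """Suggest a $/hour commitment as a fraction of a low-percentile usage baseline.

    A low percentile, rather than the mean, keeps the commitment below usage that
    may disappear after rightsizing or scheduled shutdowns.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    baseline = nearest_rank_percentile(hourly_eligible_spend, baseline_pct)
    return baseline * coverage


if __name__ == "__main__":
    # Hypothetical week: $30/h during weekday business hours, $12/h otherwise.
    week = [30.0 if day < 5 and 9 <= hour < 17 else 12.0
            for day in range(7) for hour in range(24)]
    print(f"recommended commitment: ${recommend_hourly_commitment(week):.2f}/hour")  # $7.80/hour
```

A longer lookback window (for example, 30 to 60 days) gives a more stable baseline than a single week.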