5 Kubernetes Cost Optimization Strategies That Save $50K/Year
Kubernetes has become the default orchestration platform for containerized workloads, but it has also become one of the largest line items on cloud bills. A 2025 CNCF survey found that 68% of organizations running production Kubernetes clusters overspend by 30-45% due to overprovisioned nodes, idle pods, and misconfigured autoscaling. For a mid-size deployment running 50-100 nodes on AWS EKS or Azure AKS, that overspend translates to $40,000-$80,000 annually in wasted compute.
This guide covers five specific, production-tested optimization strategies. These are not theoretical recommendations — they are drawn from FinOps engagements across EKS, AKS, and GKE clusters where the goal was measurable cost reduction without sacrificing application performance or reliability.
[IMAGE: Dashboard showing Kubernetes cluster cost breakdown by namespace, with cost allocation percentages and savings opportunities highlighted in a dark-themed FinOps interface]
Strategy 1: Right-Size Pod Resource Requests and Limits
The single most impactful optimization. Most teams set CPU and memory requests during initial deployment and never revisit them. The result: pods requesting 2 CPU cores while averaging 0.3 cores of actual usage, and requesting 4Gi of memory while using 800Mi.
The Problem in Numbers
A pod requesting 2 CPU / 4Gi memory on an m6i.xlarge node (4 vCPU / 16Gi) consumes half the node's schedulable capacity. If that pod actually uses 0.3 CPU / 800Mi, you are paying for 1.7 CPU and 3.2Gi of memory that sits completely idle — but cannot be scheduled to other pods because the Kubernetes scheduler respects requests, not actual usage.
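To put a dollar figure on that gap, here is a minimal Python sketch using the m6i.xlarge on-demand price quoted later in this article ($0.192/hour). Splitting the node price 50/50 between CPU and memory is a simplifying assumption; real cost allocators weight the two dimensions differently.

```python
# Back-of-the-envelope cost of idle requested capacity on one pod.
NODE_PRICE_HR = 0.192        # m6i.xlarge on-demand, us-east-1 (from this article)
NODE_CPU, NODE_MEM_GI = 4, 16

req_cpu, req_mem_gi = 2.0, 4.0     # what the pod requests
used_cpu, used_mem_gi = 0.3, 0.8   # actual usage (~800Mi of memory)

# Assumption: attribute half the node price to CPU, half to memory.
cpu_rate = NODE_PRICE_HR / 2 / NODE_CPU       # $/vCPU-hour
mem_rate = NODE_PRICE_HR / 2 / NODE_MEM_GI    # $/Gi-hour

idle_hourly = (req_cpu - used_cpu) * cpu_rate + (req_mem_gi - used_mem_gi) * mem_rate
print(f"idle spend: ${idle_hourly:.4f}/hr, ${idle_hourly * 8760:.0f}/yr per pod")
```

Roughly $0.06/hour of idle requested capacity, or over $500 per year for a single oversized pod.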
Implementation with Kubernetes VPA
The Vertical Pod Autoscaler (VPA) in recommendation mode provides data-driven sizing:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Recommendation only — no auto-updates
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
```
Deploy VPA in Off mode first. After 7-14 days of production data collection, review the recommendations:
```shell
kubectl describe vpa api-server-vpa -n production
```
The output provides `lowerBound`, `target`, and `upperBound` recommendations. Set requests to the `target` value and limits to the `upperBound`. For latency-sensitive services, use `upperBound` for both.
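That mapping can be sketched in Python. The dict below mirrors the shape of `status.recommendation` in the autoscaling.k8s.io/v1 API, but the values are illustrative, not real cluster output:

```python
# Sketch: turn VPA recommendations into requests/limits as described above.
recommendation = {  # illustrative values, not real cluster output
    "containerRecommendations": [{
        "containerName": "api-server",
        "lowerBound": {"cpu": "250m", "memory": "512Mi"},
        "target":     {"cpu": "410m", "memory": "936Mi"},
        "upperBound": {"cpu": "1",    "memory": "2Gi"},
    }]
}

for rec in recommendation["containerRecommendations"]:
    resources = {
        "requests": rec["target"],      # steady-state sizing
        "limits":   rec["upperBound"],  # headroom for bursts
    }
    print(rec["containerName"], resources)
```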
Tools for Visibility
- Kubecost (v2.3, open-source tier): Real-time cost allocation by namespace, deployment, and label. Shows per-pod efficiency scores.
- Goldilocks (by Fairwinds): Deploys VPA in recommendation mode across all namespaces and presents a dashboard of right-sizing suggestions.
- kubectl-cost plugin: CLI-based cost reporting directly from your terminal.
Expected savings: 25-40% of compute costs. On a 50-node EKS cluster running m6i.xlarge instances at $0.192/hour, a 30% reduction saves approximately $25,000/year.
Strategy 2: Implement Cluster Autoscaler with Karpenter
Static node pools are the second largest source of waste. Teams provision for peak load and leave those nodes running 24/7, even though most workloads have clear usage patterns — high during business hours, low overnight and weekends.
Karpenter vs Cluster Autoscaler
Karpenter (v1.1, now a CNCF incubating project as of late 2025) replaced the legacy Cluster Autoscaler for AWS EKS and is the recommended approach for new deployments. Key advantages:
- Instance type flexibility: Karpenter selects the optimal instance type from a pool of candidates based on pending pod requirements. Instead of scaling up a fixed `m6i.xlarge` node group, it might provision a `c6i.large` for CPU-bound pods or an `r6i.large` for memory-bound pods.
- Consolidation: Karpenter actively identifies underutilized nodes and reschedules pods to fewer nodes, then terminates the empty ones.
- Speed: Node provisioning in 30-60 seconds vs 3-5 minutes for Cluster Autoscaler.
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.large
            - m6i.xlarge
            - m7i.large
            - m7i.xlarge
            - c6i.large
            - c6i.xlarge
            - r6i.large
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  limits:
    cpu: 200
    memory: 400Gi
```
The `consolidationPolicy: WhenEmptyOrUnderutilized` setting is critical. It tells Karpenter to actively consolidate workloads — moving pods off underutilized nodes and terminating the emptied nodes.
For AKS, use the Node Autoprovision feature (GA since AKS 1.29). For GKE, use GKE Autopilot which handles node management entirely.
Expected savings: 15-25% from dynamic scaling and right-typed instance selection.
Strategy 3: Use Spot/Preemptible Instances for Fault-Tolerant Workloads
Spot instances on AWS cost 60-90% less than on-demand pricing. Azure Spot VMs and GCP Preemptible/Spot VMs offer comparable discounts. The tradeoff: the cloud provider can reclaim these instances with two minutes' notice (AWS) or 30 seconds' notice (GCP).
Which Workloads Qualify
Not every workload belongs on spot. The decision matrix:
| Workload Type | Spot Suitable | Reason |
|---|---|---|
| Stateless API replicas (3+ pods) | Yes | Loss of one pod is handled by remaining replicas |
| Batch/ETL jobs (with checkpointing) | Yes | Can resume from checkpoint after interruption |
| CI/CD build agents | Yes | Build can retry on a new node |
| ML training (with checkpointing) | Yes | Save model checkpoints every N epochs |
| Single-replica databases | No | Data loss risk on interruption |
| Stateful singleton services | No | Cannot tolerate interruption |
Implementation with Karpenter
Add a separate NodePool for spot capacity:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-pool
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.large
            - m6i.xlarge
            - m7i.large
            - c6i.large
            - c6i.xlarge
            - r6i.large
            - m6a.large
            - m6a.xlarge
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: spot-class
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```
Diversify instance types aggressively for spot pools. AWS spot pricing and availability vary by instance type and AZ. Specifying 8-12 instance types across 3 AZs reduces interruption frequency from ~5% to under 1% monthly.
Use pod topology spread constraints to ensure replicas distribute across spot and on-demand nodes:
```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: karpenter.sh/capacity-type
    whenUnsatisfiable: DoNotSchedule
    labelSelector:        # required for the constraint to match pods;
      matchLabels:        # substitute your own workload's labels
        app: api-server
```
[IMAGE: Architecture diagram showing a Kubernetes cluster with on-demand nodes hosting stateful workloads and spot nodes hosting stateless API replicas and batch jobs, with Karpenter managing node lifecycle]
Expected savings: 60-70% on spot-eligible workloads. If 40% of your cluster runs on spot, overall savings reach 24-28%.
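The overall figure is just the spot fraction multiplied by the spot discount; a quick sketch of the arithmetic behind the 24-28% range:

```python
# Blended cluster savings = fraction of fleet on spot * spot discount.
spot_fraction = 0.40
blended = {discount: spot_fraction * discount for discount in (0.60, 0.70)}
for discount, overall in blended.items():
    print(f"{discount:.0%} spot discount -> {overall:.0%} of total compute spend saved")
```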
Strategy 4: Enforce Resource Quotas and LimitRanges per Namespace
Without guardrails, development and staging namespaces consume production-grade resources. A developer testing a new service might deploy with `requests: cpu: 4, memory: 16Gi` in a dev namespace and forget about it for weeks.
Namespace-Level Controls
Apply ResourceQuotas to every non-production namespace:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: development
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "30"
    persistentvolumeclaims: "10"
```
Apply LimitRanges to set default requests for pods that omit them:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: development
spec:
  limits:
    - default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      type: Container
```
Scheduled Scaling for Non-Production
Use CronJobs or KEDA (v2.15) to scale non-production workloads to zero outside business hours:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-api-scaler
  namespace: development
spec:
  scaleTargetRef:
    name: dev-api
  minReplicaCount: 0
  maxReplicaCount: 3
  triggers:
    - type: cron
      metadata:
        timezone: America/Chicago
        start: "0 8 * * 1-5"   # Scale up Mon-Fri 8am
        end: "0 20 * * 1-5"    # Scale down Mon-Fri 8pm
        desiredReplicas: "2"
```
Running dev/staging clusters only during business hours (60 hours/week vs 168) reduces non-production compute costs by 64%.
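The 64% figure follows directly from the hours; a quick sketch:

```python
# Compute-hour reduction from business-hours-only scheduling.
hours_always_on = 24 * 7   # 168 hours/week
hours_business  = 12 * 5   # Mon-Fri, 8am-8pm = 60 hours/week

reduction = 1 - hours_business / hours_always_on
print(f"non-production compute-hour reduction: {reduction:.0%}")
```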
Expected savings: $5,000-$15,000/year depending on non-production cluster size.
Strategy 5: Optimize Persistent Volume and Network Costs
Storage and data transfer are often overlooked because they represent smaller individual line items — but they accumulate. Common waste patterns:
Storage Optimization
- Unused PVCs: Volumes from deleted pods often persist. Run `kubectl get pvc --all-namespaces | grep -v Bound` weekly to find orphaned claims.
- Overprovisioned volumes: A 100Gi `gp3` EBS volume costs $0.08/Gi/month ($8/month). If actual usage is 12Gi, switch to a 20Gi volume and save $6.40/month per volume. Across 50 volumes, that is $3,840/year.
- Storage class selection: Use `gp3` (not `gp2`) on AWS — gp3 provides a 3,000 IOPS baseline at 20% lower cost. On GKE, use `pd-balanced` instead of `pd-ssd` for workloads that do not need sustained high IOPS.
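The volume right-sizing arithmetic from the bullets above, as a short sketch:

```python
# Savings from shrinking overprovisioned gp3 volumes (100Gi -> 20Gi).
GP3_PRICE_PER_GI_MONTH = 0.08
old_gi, new_gi, volumes = 100, 20, 50

monthly_saving_per_volume = (old_gi - new_gi) * GP3_PRICE_PER_GI_MONTH
annual_fleet_saving = monthly_saving_per_volume * 12 * volumes
print(f"${monthly_saving_per_volume:.2f}/month per volume, "
      f"${annual_fleet_saving:,.0f}/year across {volumes} volumes")
```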
Network Cost Reduction
- Keep traffic in-zone: Cross-AZ data transfer on AWS costs $0.01/GB in each direction. A service making 10,000 requests/second with 1KB payloads to a database in another AZ costs $518/month. Use topology-aware routing (Kubernetes 1.27+ `TopologyAwareHints`).
- Use internal load balancers: External ALBs/NLBs have hourly costs plus data processing charges. Internal services should use ClusterIP or internal NLBs.
- Enable VPC CNI prefix delegation on EKS to increase pod density per node, reducing the total number of nodes needed.
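The cross-AZ estimate above works out as follows (assuming a 30-day month and a $0.02/GB round-trip charge, i.e. $0.01/GB in each direction):

```python
# Cross-AZ transfer cost: 10,000 req/s with 1KB payloads, charged both ways.
req_per_s, payload_kb = 10_000, 1
seconds_per_month = 30 * 24 * 3600

gb_per_month = req_per_s * payload_kb * seconds_per_month / 1e6  # KB -> GB
cost = gb_per_month * 0.02                                       # $/GB round trip
print(f"{gb_per_month:,.0f} GB/month -> ${cost:,.0f}/month in cross-AZ charges")
```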
Expected savings: $3,000-$8,000/year from storage and network optimization.
Putting It All Together: The Savings Math
For a reference architecture of 50 nodes running m6i.xlarge ($0.192/hr) on AWS EKS:
| Strategy | Annual Savings |
|---|---|
| Right-sizing pod resources | $25,000 |
| Karpenter autoscaling + consolidation | $12,500 |
| Spot instances (40% of fleet) | $20,000 |
| Resource quotas + scheduled scaling | $8,000 |
| Storage + network optimization | $5,000 |
| Total | $70,500 |
Conservative estimates. Actual savings depend on current waste levels — teams with no existing optimization often see higher returns.
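As a sanity check, the table rows sum as follows (the baseline assumes all 50 nodes run on-demand all year; the quota and storage/network rows draw on spend outside that compute baseline):

```python
# Sum the reference-architecture estimates from the table above.
savings = {
    "right-sizing": 25_000,
    "karpenter": 12_500,
    "spot (40% of fleet)": 20_000,
    "quotas + scheduled scaling": 8_000,
    "storage + network": 5_000,
}
total = sum(savings.values())
baseline = 50 * 0.192 * 8760  # 50 m6i.xlarge nodes, on-demand, all year
print(f"total estimated savings: ${total:,} against ${baseline:,.0f} of annual compute")
```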
FinOps Tooling Stack for Kubernetes
| Tool | Purpose | Pricing |
|---|---|---|
| Kubecost v2.3 | Real-time K8s cost allocation | Free tier / Enterprise |
| OpenCost | CNCF cost monitoring standard | Open source |
| Karpenter v1.1 | Node lifecycle and spot management | Open source |
| Goldilocks | VPA-based right-sizing dashboard | Open source |
| KEDA v2.15 | Event-driven autoscaling | Open source |
| Infracost | IaC cost estimation pre-deploy | Free tier / Team |
Getting Started
Kubernetes cost optimization is a core competency for any cloud engineer managing production infrastructure. The strategies above — right-sizing, autoscaling, spot instances, quotas, and storage optimization — apply regardless of whether you run EKS, AKS, or GKE.
Citadel Cloud Management offers comprehensive cloud courses covering Kubernetes administration, FinOps practices, and production cluster management. Our Cloud Toolkits collection includes Terraform modules for deploying cost-optimized EKS and AKS clusters with Karpenter and Kubecost pre-configured. For hands-on labs and real-world scenarios, explore our free resources — no payment required.
Ready to build production-grade Kubernetes skills and stop overpaying for cloud infrastructure? Enroll free at Citadel Cloud Management and start learning today.
#Kubernetes #FinOps #CloudCostOptimization #DevOps #EKS #AKS #GKE #Karpenter #CloudEngineering #InfrastructureAsCode