5 Kubernetes Cost Optimization Strategies That Save $50K/Year

Practical Kubernetes cost optimization strategies using FinOps, rightsizing, spot nodes, autoscaling, and resource quotas. Save $50K+ annually on K8s clusters.

Kubernetes has become the default orchestration platform for containerized workloads, but it has also become one of the largest line items on cloud bills. A 2025 CNCF survey found that 68% of organizations running production Kubernetes clusters overspend by 30-45% due to overprovisioned nodes, idle pods, and misconfigured autoscaling. For a mid-size deployment running 50-100 nodes on AWS EKS or Azure AKS, that overspend translates to $40,000-$80,000 annually in wasted compute.

This guide covers five specific, production-tested optimization strategies. These are not theoretical recommendations — they are drawn from FinOps engagements across EKS, AKS, and GKE clusters where the goal was measurable cost reduction without sacrificing application performance or reliability.

[IMAGE: Dashboard showing Kubernetes cluster cost breakdown by namespace, with cost allocation percentages and savings opportunities highlighted in a dark-themed FinOps interface]

Strategy 1: Right-Size Pod Resource Requests and Limits

Right-sizing is the single most impactful optimization. Most teams set CPU and memory requests during initial deployment and never revisit them. The result: pods requesting 2 CPU cores while averaging 0.3 cores of actual usage, and requesting 4Gi of memory while using 800Mi.

The Problem in Numbers

A pod requesting 2 CPU / 4Gi memory on an m6i.xlarge node (4 vCPU / 16Gi) consumes half the node's schedulable CPU and a quarter of its memory. If that pod actually uses 0.3 CPU / 800Mi, you are paying for 1.7 CPU and 3.2Gi of memory that sit completely idle — but cannot be scheduled to other pods, because the Kubernetes scheduler respects requests, not actual usage.
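
You can see the gap directly by comparing what pods request with what they actually consume. A quick check (the production namespace is just this article's running example; kubectl top requires metrics-server in the cluster):

kubectl get pods -n production -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'
kubectl top pods -n production   # live CPU/memory usage to compare against the requests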

Implementation with Kubernetes VPA

The Vertical Pod Autoscaler (VPA) in recommendation mode provides data-driven sizing:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Recommendation only — no auto-updates
  resourcePolicy:
    containerPolicies:
    - containerName: api-server
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi

Deploy VPA in Off mode first. After 7-14 days of production data collection, review the recommendations:

kubectl describe vpa api-server-vpa -n production

The output provides lowerBound, target, and upperBound recommendations. Set requests to the target value and limits to the upperBound. For latency-sensitive services, use upperBound for both.
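
Once the recommendations have stabilized, applying them is an ordinary edit to the Deployment's container resources. The numbers below are hypothetical stand-ins for the target and upperBound values your VPA reports:

      containers:
      - name: api-server
        resources:
          requests:
            cpu: 300m        # VPA target
            memory: 900Mi    # VPA target
          limits:
            cpu: "1"         # VPA upperBound
            memory: 1536Mi   # VPA upperBound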

Tools for Visibility

  • Kubecost (v2.3, open-source tier): Real-time cost allocation by namespace, deployment, and label. Shows per-pod efficiency scores.
  • Goldilocks (by Fairwinds): Deploys VPA in recommendation mode across all namespaces and presents a dashboard of right-sizing suggestions (install sketch below).
  • kubectl-cost plugin: CLI-based cost reporting directly from your terminal.
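
Of these, Goldilocks is the quickest way to get recommendations across every namespace. A minimal install sketch, assuming the Fairwinds Helm chart and its opt-in namespace label (verify both against the current Goldilocks documentation):

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks --namespace goldilocks --create-namespace
# Goldilocks only generates recommendations for namespaces that opt in via this label
kubectl label namespace production goldilocks.fairwinds.com/enabled=true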

Expected savings: 25-40% of compute costs. On a 50-node EKS cluster running m6i.xlarge instances at $0.192/hour, a 30% reduction saves approximately $25,000/year.

Strategy 2: Implement Cluster Autoscaler with Karpenter

Static node pools are the second largest source of waste. Teams provision for peak load and leave those nodes running 24/7, even though most workloads have clear usage patterns — high during business hours, low overnight and weekends.

Karpenter vs Cluster Autoscaler

Karpenter (v1.1, now a CNCF incubating project as of late 2025) has effectively superseded the Cluster Autoscaler on AWS EKS and is the recommended approach for new deployments. Key advantages:

  • Instance type flexibility: Karpenter selects the optimal instance type from a pool of candidates based on pending pod requirements. Instead of scaling up a fixed m6i.xlarge node group, it might provision a c6i.large for CPU-bound pods or an r6i.large for memory-bound pods.
  • Consolidation: Karpenter actively identifies underutilized nodes and reschedules pods to fewer nodes, then terminates the empty ones.
  • Speed: Node provisioning in 30-60 seconds vs 3-5 minutes for Cluster Autoscaler.

A NodePool that gives Karpenter a diverse set of instance types and capacity types to choose from looks like this:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.large
            - m6i.xlarge
            - m7i.large
            - m7i.xlarge
            - c6i.large
            - c6i.xlarge
            - r6i.large
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  limits:
    cpu: 200
    memory: 400Gi

The consolidationPolicy: WhenEmptyOrUnderutilized setting is critical. It tells Karpenter to actively consolidate workloads — moving pods off underutilized nodes and terminating them.
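
To confirm consolidation is actually happening, watch Karpenter's NodeClaims and logs. The namespace and label below assume a standard Helm install into kube-system; adjust them to wherever Karpenter runs in your cluster:

kubectl get nodeclaims   # Karpenter-managed nodes and their lifecycle state
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter --tail=200 | grep -i consolidat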

For AKS, use the Node Autoprovisioning feature (GA since AKS 1.29), which is built on Karpenter. For GKE, use GKE Autopilot, which handles node management entirely.

Expected savings: 15-25% from dynamic scaling and right-typed instance selection.

Strategy 3: Use Spot/Preemptible Instances for Fault-Tolerant Workloads

Spot instances on AWS cost 60-90% less than on-demand pricing. Azure Spot VMs and GCP Preemptible/Spot VMs offer comparable discounts. The tradeoff: the cloud provider can reclaim these instances with two minutes' notice (AWS) or 30 seconds (GCP).

Which Workloads Qualify

Not every workload belongs on spot. The decision matrix:

Workload Type | Spot Suitable | Reason
Stateless API replicas (3+ pods) | Yes | Loss of one pod is handled by the remaining replicas
Batch/ETL jobs (with checkpointing) | Yes | Can resume from a checkpoint after interruption
CI/CD build agents | Yes | Builds can retry on a new node
ML training (with checkpointing) | Yes | Save model checkpoints every N epochs
Single-replica databases | No | Data loss risk on interruption
Stateful singleton services | No | Cannot tolerate interruption

Implementation with Karpenter

Add a separate NodePool for spot capacity:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-pool
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.large
            - m6i.xlarge
            - m7i.large
            - c6i.large
            - c6i.xlarge
            - r6i.large
            - m6a.large
            - m6a.xlarge
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: spot-class
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
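
The NodePool above references an EC2NodeClass named spot-class, which defines the AMI, subnets, security groups, and IAM role Karpenter uses when launching nodes. A minimal sketch; the role name and discovery tags are placeholders for your environment:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: spot-class
spec:
  amiSelectorTerms:
    - alias: al2023@latest              # Amazon Linux 2023 AMIs
  role: KarpenterNodeRole-my-cluster    # placeholder IAM role name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # placeholder discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster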

Diversify instance types aggressively for spot pools. AWS spot pricing and availability vary by instance type and AZ. Specifying 8-12 instance types across 3 AZs reduces interruption frequency from ~5% to under 1% monthly.

Use pod topology spread constraints to ensure replicas distribute across spot and on-demand nodes:

topologySpreadConstraints:      # goes under the Deployment's pod template spec
  - maxSkew: 1
    topologyKey: karpenter.sh/capacity-type
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: api-server         # hypothetical; match your pods' labels

[IMAGE: Architecture diagram showing a Kubernetes cluster with on-demand nodes hosting stateful workloads and spot nodes hosting stateless API replicas and batch jobs, with Karpenter managing node lifecycle]

Expected savings: 60-70% on spot-eligible workloads. If 40% of your cluster runs on spot, overall savings reach 24-28%.

Strategy 4: Enforce Resource Quotas and LimitRanges per Namespace

Without guardrails, development and staging namespaces consume production-grade resources. A developer testing a new service might deploy with requests: cpu: 4, memory: 16Gi in a dev namespace and forget about it for weeks.

Namespace-Level Controls

Apply ResourceQuotas to every non-production namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: development
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "30"
    persistentvolumeclaims: "10"

Apply LimitRanges to set default requests and limits for containers that omit them:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: development
spec:
  limits:
  - default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    type: Container
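
To confirm the guardrails are active and see how much of the quota a namespace is consuming:

kubectl describe resourcequota dev-quota -n development    # shows used vs. hard for each resource
kubectl describe limitrange default-limits -n development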

Scheduled Scaling for Non-Production

Use CronJobs or KEDA (v2.15) to scale non-production workloads to zero outside business hours:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-api-scaler
  namespace: development
spec:
  scaleTargetRef:
    name: dev-api
  minReplicaCount: 0
  maxReplicaCount: 3
  triggers:
  - type: cron
    metadata:
      timezone: America/Chicago
      start: "0 8 * * 1-5"   # Scale up Mon-Fri 8am
      end: "0 20 * * 1-5"     # Scale down Mon-Fri 8pm
      desiredReplicas: "2"

Running dev/staging clusters only during business hours (60 hours/week vs 168) reduces non-production compute costs by 64%.

Expected savings: $5,000-$15,000/year depending on non-production cluster size.

Strategy 5: Optimize Persistent Volume and Network Costs

Storage and data transfer are often overlooked because they represent smaller individual line items — but they accumulate. Common waste patterns:

Storage Optimization

  • Unused PVCs: Volumes from deleted workloads often persist and keep billing. They usually remain Bound, so kubectl get pvc --all-namespaces | grep -v Bound only catches Pending or Lost claims; cross-reference Bound claims against running pods to find volumes nothing mounts (a sketch of this check follows this list).
  • Overprovisioned volumes: A 100Gi gp3 EBS volume costs $0.08/Gi/month ($8/month). If actual usage is 12Gi, switch to a 20Gi volume and save $6.40/month per volume. Across 50 volumes, that is $3,840/year.
  • Storage class selection: Use gp3 (not gp2) on AWS — gp3 provides 3,000 IOPS baseline at 20% lower cost. On GKE, use pd-balanced instead of pd-ssd for workloads that do not need sustained high IOPS.
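
A rough sketch of the orphaned-PVC check mentioned above (requires jq). It compares every claim in the cluster against the claims mounted by currently running pods, so treat the output as candidates to review rather than delete blindly; CronJobs that are not running at the moment can still own claims:

# Claims mounted by at least one pod, as namespace/name
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[] | .metadata.namespace as $ns | .spec.volumes[]? | select(.persistentVolumeClaim) | "\($ns)/\(.persistentVolumeClaim.claimName)"' \
  | sort -u > /tmp/mounted-pvcs.txt

# Every claim in the cluster, as namespace/name
kubectl get pvc --all-namespaces -o json \
  | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name)"' \
  | sort -u > /tmp/all-pvcs.txt

# Claims no pod currently mounts: candidates for review
comm -23 /tmp/all-pvcs.txt /tmp/mounted-pvcs.txt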

Network Cost Reduction

  • Keep traffic in-zone: Cross-AZ data transfer on AWS costs $0.01/GB in each direction. A service making 10,000 requests/second with 1KB payloads to a database in another AZ costs $518/month. Use topology-aware routing (Kubernetes 1.27+ TopologyAwareHints).
  • Use internal load balancers: External ALBs/NLBs have hourly costs plus data processing charges. Internal services should use ClusterIP or internal NLBs.
  • Enable VPC CNI prefix delegation on EKS to increase pod density per node, reducing the total number of nodes needed.
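
Two of these are one-line changes. Topology-aware routing is an annotation on the Service (the Service below is hypothetical), and prefix delegation is an environment variable on the EKS aws-node DaemonSet:

apiVersion: v1
kind: Service
metadata:
  name: orders-api                                 # hypothetical internal service
  annotations:
    service.kubernetes.io/topology-mode: Auto      # topology-aware routing, Kubernetes 1.27+
spec:
  selector:
    app: orders-api
  ports:
    - port: 80
      targetPort: 8080

# Enable VPC CNI prefix delegation on EKS
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true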

Expected savings: $3,000-$8,000/year from storage and network optimization.

Putting It All Together: The Savings Math

For a reference architecture of 50 nodes running m6i.xlarge ($0.192/hr) on AWS EKS:

Strategy | Annual Savings
Right-sizing pod resources | $25,000
Karpenter autoscaling + consolidation | $12,500
Spot instances (40% of fleet) | $20,000
Resource quotas + scheduled scaling | $8,000
Storage + network optimization | $5,000
Total | $70,500

These are conservative estimates. Actual savings depend on current waste levels — teams with no existing optimization often see higher returns.

FinOps Tooling Stack for Kubernetes

Tool | Purpose | Pricing
Kubecost v2.3 | Real-time K8s cost allocation | Free tier / Enterprise
OpenCost | CNCF cost monitoring standard | Open source
Karpenter v1.1 | Node lifecycle and spot management | Open source
Goldilocks | VPA-based right-sizing dashboard | Open source
KEDA v2.15 | Event-driven autoscaling | Open source
Infracost | IaC cost estimation pre-deploy | Free tier / Team

Getting Started

Kubernetes cost optimization is a core competency for any cloud engineer managing production infrastructure. The strategies above — right-sizing, autoscaling, spot instances, quotas, and storage optimization — apply regardless of whether you run EKS, AKS, or GKE.

Citadel Cloud Management offers comprehensive cloud courses covering Kubernetes administration, FinOps practices, and production cluster management. Our Cloud Toolkits collection includes Terraform modules for deploying cost-optimized EKS and AKS clusters with Karpenter and Kubecost pre-configured. For hands-on labs and real-world scenarios, explore our free resources — no payment required.

Ready to build production-grade Kubernetes skills and stop overpaying for cloud infrastructure? Enroll free at Citadel Cloud Management and start learning today.


#Kubernetes #FinOps #CloudCostOptimization #DevOps #EKS #AKS #GKE #Karpenter #CloudEngineering #InfrastructureAsCode

Kehinde Ogunlowo

Senior Multi-Cloud DevSecOps Architect & AI Engineer

AWS, Azure, GCP Certified | Secret Clearance | FedRAMP, CMMC, HIPAA

Enterprise experience at Cigna Healthcare, Lockheed Martin, NantHealth, BP Refinery, and Patterson UTI.

Start Your Cloud Career Today

Access 17 free courses covering AWS, Azure, GCP, DevOps, AI/ML, and cloud security — built by a practicing Senior Cloud Architect with enterprise experience.

Get Free Cloud Career Resources
