Combining GKE Autoscaling Strategies for Cost and Availability Optimization

Timeline: December 2025
Role: Cloud Engineer / Site Reliability Engineer
Skills: Google Kubernetes Engine (GKE), Horizontal Pod Autoscaling (HPA), Vertical Pod Autoscaling (VPA), Cluster Autoscaler, Node Auto Provisioning (NAP), Pod Disruption Budgets, Kubernetes PriorityClass, Pause Pods, Cost Optimization


Project Summary

This project focused on using multiple autoscaling strategies in Google Kubernetes Engine (GKE) to improve both cost efficiency and application availability. The implementation combined Horizontal Pod Autoscaling (HPA), Vertical Pod Autoscaling (VPA), Cluster Autoscaler, and Node Auto Provisioning (NAP) to reduce unused resources during low demand, scale efficiently during increased demand, and improve workload responsiveness through overprovisioning with pause pods.

The project demonstrated how autoscaling in Kubernetes is most effective when viewed as a coordinated system across both pods and nodes, rather than as isolated mechanisms.


Objectives

  • Reduce deployment replica count with Horizontal Pod Autoscaling
  • Reduce CPU requests with Vertical Pod Autoscaling
  • Decrease cluster node count with Cluster Autoscaler
  • Automatically create optimized node pools with Node Auto Provisioning
  • Test autoscaling behavior under increased demand
  • Improve response time during traffic spikes using pause pods

Architecture Overview

The architecture consisted of:

  • A GKE cluster (scaling-demo) created with Vertical Pod Autoscaling enabled
  • A php-apache deployment used to demonstrate Horizontal Pod Autoscaling
  • A hello-server deployment used to demonstrate Vertical Pod Autoscaling
  • Cluster Autoscaler to reduce or add nodes based on scheduling demand
  • Node Auto Provisioning to create right-sized node pools dynamically
  • Pod Disruption Budgets for kube-system workloads, so Cluster Autoscaler can evict system pods and remove underused nodes
  • A load generator pod used to simulate a spike in traffic
  • Pause pods in kube-system to reserve capacity and reduce autoscaling lag during bursts


Implementation & Highlights

1. Provisioning the Test Environment

  • Created a three-node GKE cluster with Vertical Pod Autoscaling enabled
  • Deployed a CPU-intensive php-apache application with defined CPU requests and limits
  • Exposed the deployment with a Kubernetes Service for autoscaling tests

2. Horizontal Pod Autoscaling (HPA)

  • Applied a Horizontal Pod Autoscaler to the php-apache deployment
  • Configured the autoscaler to maintain between 1 and 10 replicas
  • Used a target CPU utilization of 50%
  • Observed that under low demand the deployment scaled down toward the minimum replica count, reducing idle resource usage
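
The scale-down behavior above follows from the replica calculation documented for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to the configured bounds. The sketch below is a simplified model of that formula, not the controller's actual code. The function name is hypothetical, the 0.1 tolerance is the commonly documented controller default and is assumed here, and stabilization windows and pod readiness handling are left out.

```python
import math


def desired_replicas(current_replicas, current_utilization,
                     target_utilization=50, min_replicas=1,
                     max_replicas=10, tolerance=0.1):
    """Simplified model of the HPA replica calculation for a CPU target.

    Utilization values are percentages of the pods' CPU requests.
    The tolerance default of 0.1 is an assumption for this sketch.
    """
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the min/max bounds used in this project (1 and 10).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, 10))   # low demand: shrinks toward the minimum
    print(desired_replicas(1, 200))  # load spike: grows
    print(desired_replicas(5, 400))  # capped at max_replicas
```

With a 50% target, four replicas idling at 10% utilization collapse to one, which matches the low-demand behavior observed in the project.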

3. Vertical Pod Autoscaling (VPA)

  • Deployed a hello-server workload with an intentionally oversized CPU request
  • Created a Vertical Pod Autoscaler for the deployment
  • Started with Off mode to view recommendations, then switched to Auto
  • Observed VPA lower the CPU request from the intentionally oversized value to a recommendation based on observed usage, freeing allocatable CPU on the node
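
As an illustration of the Off-then-Auto workflow, the sketch below builds a VerticalPodAutoscaler manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. The object name hello-server-vpa and the helper function are assumptions made for this example; the write-up does not record the manifest that was actually applied.

```python
import json

# Update modes accepted by this sketch; newer VPA releases may add others.
VALID_MODES = {"Off", "Initial", "Recreate", "Auto"}


def vpa_manifest(deployment, update_mode):
    """Build a VerticalPodAutoscaler manifest targeting a Deployment."""
    if update_mode not in VALID_MODES:
        raise ValueError(f"unsupported update mode: {update_mode}")
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        # Hypothetical name, derived from the deployment for illustration.
        "metadata": {"name": f"{deployment}-vpa"},
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "updatePolicy": {"updateMode": update_mode},
        },
    }


if __name__ == "__main__":
    # "Off" only publishes recommendations; "Auto" lets VPA apply them.
    print(json.dumps(vpa_manifest("hello-server", "Off"), indent=2))
```

Starting in Off mode makes it possible to inspect the recommendation before any pod is resized; changing the single updateMode field to Auto then hands control to VPA.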

4. Interpreting HPA and VPA Outcomes

  • Confirmed that HPA reduced the number of php-apache pods during low traffic
  • Verified that VPA resized hello-server pods based on observed usage
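  • Highlighted the practical tradeoff between aggressive automatic resizing and availability risk: applying a new request in Auto mode typically means recreating pods, and a request sized for quiet periods can leave a workload short of CPU during a rapid spike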

5. Cluster Autoscaler

  • Enabled autoscaling for the cluster with min/max node thresholds
  • Switched to the optimize-utilization profile to encourage more aggressive scale-down
  • Created Pod Disruption Budgets for key kube-system components so Cluster Autoscaler could evict and reschedule system pods safely instead of treating their nodes as non-removable
  • Observed the cluster scale down from three nodes to two when utilization dropped sufficiently
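
A Pod Disruption Budget of the kind described above can be sketched as follows. The helper name, the PDB name, and the k8s-app=kube-dns selector are illustrative assumptions; the write-up does not list which kube-system components received budgets or the values used.

```python
import json


def pdb_manifest(name, selector_labels, max_unavailable=1,
                 namespace="kube-system"):
    """Build a PodDisruptionBudget that permits limited voluntary eviction."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Allow at most this many matching pods to be down at once,
            # which tells the autoscaler that eviction is acceptable.
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": selector_labels},
        },
    }


if __name__ == "__main__":
    # Assumed example component and label, for illustration only.
    print(json.dumps(pdb_manifest("kube-dns-pdb", {"k8s-app": "kube-dns"}),
                     indent=2))
```

The budget is what turns a system pod from a blocker into something the autoscaler may move, so a node that hosts only evictable pods becomes a candidate for removal.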

6. Node Auto Provisioning (NAP)

  • Enabled Node Auto Provisioning with cluster-wide CPU and memory bounds
  • Allowed GKE to create new node pools automatically when workload demand required a better machine shape
  • Treated NAP as the vertical counterpart at the infrastructure layer: it changes the machine shapes available to the cluster, while Cluster Autoscaler changes the node count within existing pools

7. Testing Under Increased Demand

  • Ran a load generator against the php-apache service to simulate sustained traffic
  • Observed HPA scale the deployment up as CPU utilization exceeded the target
  • Observed Cluster Autoscaler add nodes to accommodate unschedulable pods
  • Observed Node Auto Provisioning create an optimized node pool suited to the workload’s CPU-heavy demand profile
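
The link between HPA scale-up and node scale-up is mostly arithmetic on CPU requests: once the replicas no longer fit on the existing nodes, the extra pods stay pending and the autoscaler adds capacity. The sketch below estimates the node count for a given replica count. All figures in it (500m request, 1930m allocatable CPU) are hypothetical, and it ignores memory, system pods, and other scheduling constraints.

```python
import math


def nodes_required(replicas, cpu_request_m, node_allocatable_m):
    """Estimate nodes needed to schedule `replicas` pods by CPU request alone.

    Both CPU arguments are in millicores.
    """
    pods_per_node = node_allocatable_m // cpu_request_m
    if pods_per_node == 0:
        # No node of this shape can host the pod. Only a larger machine
        # type helps, which is the gap Node Auto Provisioning is meant to fill.
        raise ValueError("pod CPU request exceeds node allocatable CPU")
    return math.ceil(replicas / pods_per_node)


if __name__ == "__main__":
    # Hypothetical numbers: 10 replicas at 500m each on 1930m nodes.
    print(nodes_required(10, 500, 1930))
```

With these assumed numbers each node holds three pods, so ten replicas need four nodes; a three-node cluster would leave one pod unschedulable and trigger a scale-up.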

8. Overprovisioning with Pause Pods

  • Created a low-priority pause pod deployment in the kube-system namespace
  • Used a custom PriorityClass so pause pods could be preempted by higher-priority application workloads
  • Gave the pause pods real resource requests, forcing the cluster to keep extra node capacity provisioned in advance
  • Improved the cluster’s ability to absorb future traffic spikes more quickly by keeping a schedulable buffer available
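
The overprovisioning pattern above pairs a negative-priority PriorityClass with a deployment of pause containers that request real resources. The following is a minimal sketch of both manifests, not the exact ones used in the project: the object names, the -1 priority value, the pause image tag, and the 1 CPU / 4Gi request are all assumptions chosen for illustration.

```python
import json


def overprovisioning_manifests(replicas=1, cpu="1", memory="4Gi"):
    """Build a low-priority PriorityClass and a pause-pod Deployment."""
    name = "overprovisioning"  # hypothetical name shared by both objects
    priority_class = {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        # Below the default pod priority of 0, so regular workloads
        # can preempt these pods when capacity is tight.
        "value": -1,
        "globalDefault": False,
        "description": "Priority class for capacity-reserving pause pods.",
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": "kube-system"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"run": name}},
            "template": {
                "metadata": {"labels": {"run": name}},
                "spec": {
                    "priorityClassName": name,
                    "containers": [{
                        "name": "reserve-resources",
                        # Assumed image tag; the container does no work.
                        "image": "registry.k8s.io/pause:3.9",
                        # The requests are what actually reserve capacity.
                        "resources": {
                            "requests": {"cpu": cpu, "memory": memory},
                        },
                    }],
                },
            },
        },
    }
    return priority_class, deployment


if __name__ == "__main__":
    for manifest in overprovisioning_manifests():
        print(json.dumps(manifest, indent=2))
```

The size of the buffer is set by the replica count and the requests: larger values absorb bigger spikes without waiting for a new node, at the cost of paying for capacity that sits idle most of the time.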

Design Decisions

  • Used HPA to match replica count to traffic-driven CPU demand
  • Used VPA to right-size pod CPU requests based on observed historical usage
  • Used Cluster Autoscaler to remove excess nodes during low demand and add nodes under scheduling pressure
  • Used Node Auto Provisioning to let GKE choose better-suited node pools automatically instead of relying only on fixed node types
  • Added Pod Disruption Budgets so system pods could be consolidated safely during scale-down
  • Used pause pods to strike a practical balance between strict cost minimization and scale-up responsiveness

Results & Impact

  • Successfully demonstrated how multiple GKE autoscaling layers can work together
  • Reduced unnecessary replica and node usage during low-demand periods
  • Improved node-level resource efficiency by resizing oversized pod requests
  • Enabled infrastructure to scale up automatically during load spikes
  • Improved autoscaling responsiveness by reserving spare capacity with low-priority pause pods
  • Built a strong practical understanding of the tradeoffs between:
    • cost efficiency
    • scaling speed
    • availability
    • right-sizing

Tools & Technologies Used

  • Google Kubernetes Engine (GKE) – Cluster platform
  • Horizontal Pod Autoscaler (HPA) – Pod replica scaling
  • Vertical Pod Autoscaler (VPA) – Pod request sizing
  • Cluster Autoscaler – Node count scaling
  • Node Auto Provisioning (NAP) – Automatic node pool creation
  • Pod Disruption Budgets (PDBs) – Safe scale-down behavior
  • PriorityClass / Pause Pods – Buffer capacity strategy
  • Cloud Shell – Cluster management and load testing

Outcome

This project demonstrates the ability to design and tune multi-layer autoscaling strategies on GKE to improve both cost efficiency and availability. It highlights practical skills in pod scaling, node scaling, right-sizing, scale-down safeguards, and capacity buffering, which are directly relevant to cloud engineering, platform engineering, and site reliability roles.

