Optimizing Cost and Scalability for OnlineBoutique on Google Kubernetes Engine

Timeline: December 2025
Role: Cloud Engineer / Site Reliability Engineer
Skills: Google Kubernetes Engine (GKE), Kubernetes Namespaces, Node Pools, Custom Machine Types, Pod Disruption Budgets, Horizontal Pod Autoscaling, Cluster Autoscaler, Rolling Updates, Load Testing, Cost Optimization

Project Summary

This project focused on deploying and optimizing the OnlineBoutique microservices application on Google Kubernetes Engine (GKE) with an emphasis on cost efficiency, scalability, and service availability. The work included provisioning a right-sized GKE cluster, separating resources across development and production namespaces, migrating workloads to a more efficient custom machine type node pool, applying a safe frontend update using a Pod Disruption Budget, and configuring autoscaling to respond automatically to a major traffic surge.

The implementation demonstrated how Kubernetes platform decisions at the cluster, node pool, and workload levels can work together to reduce infrastructure waste while maintaining application resilience during change and growth.

Objectives

Create a zonal GKE cluster for OnlineBoutique
Separate environments using Kubernetes namespaces
Deploy the OnlineBoutique application to the development namespace
Migrate workloads to a more cost-efficient custom machine type node pool
Apply a frontend update without downtime
Configure autoscaling for frontend and backend services
Enable cluster autoscaling to respond to demand spikes

Architecture Overview

The architecture consisted of:

A zonal GKE cluster running on the rapid release channel
Separate dev and prod namespaces for environment segregation
The OnlineBoutique microservices application deployed into the dev namespace
An initial default node pool using e2-standard-2 machines
A second optimized node pool using the custom-2-3584 machine type
A frontend Pod Disruption Budget to preserve service availability during updates
Horizontal Pod Autoscalers for frontend and recommendationservice
Cluster Autoscaler configured with node count bounds to expand and shrink infrastructure automatically
A loadgenerator pod simulating a high-concurrency traffic surge against the frontend service

Architecture Diagram

Implementation & Highlights

1. Cluster Provisioning and Environment Separation

Created a zonal GKE cluster on the rapid release channel
Started with a small two-node cluster using e2-standard-2 machines
Created separate namespaces for:
- dev
- prod
Deployed the OnlineBoutique application to the dev namespace
Established an initial environment layout that balanced simplicity with basic separation of concerns

2. Node Pool Rightsizing for Cost Optimization

Reviewed the resource shape of the deployed workloads
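Identified that the e2-standard-2 nodes (2 vCPUs and 8 GB of memory each) had more RAM than the workloads needed, and that scaling out in smaller per-node increments would likely fit workload growth more efficiently
Created a new node pool using the custom-2-3584 machine type (2 vCPUs and 3.5 GB of memory per node)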
Configured the new pool with two nodes
Cordoned and drained the default pool to migrate the application safely
Deleted the default node pool after workloads had moved successfully
Demonstrated practical node pool optimization by aligning machine shape more closely with application requirements
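
The rightsizing reasoning above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the custom-<vCPUs>-<memory in MB> naming pattern used by custom machine types and the published 2 vCPU / 8 GB shape of e2-standard-2; the helper names are hypothetical and only illustrate the comparison, not the exact analysis performed in the project.

```python
import re

# Published shape of the predefined e2-standard-2 machine type.
E2_STANDARD_2 = {"vcpus": 2, "memory_mb": 8 * 1024}


def parse_custom_machine_type(name):
    """Parse a name of the form custom-<vCPUs>-<memory in MB> (hypothetical helper)."""
    match = re.fullmatch(r"custom-(\d+)-(\d+)", name)
    if match is None:
        raise ValueError(f"not a custom machine type name: {name!r}")
    return {"vcpus": int(match.group(1)), "memory_mb": int(match.group(2))}


def memory_saved_mb(old, new):
    """Memory removed per node when moving from the old shape to the new one."""
    return old["memory_mb"] - new["memory_mb"]


custom = parse_custom_machine_type("custom-2-3584")
saved = memory_saved_mb(E2_STANDARD_2, custom)
print(f"custom-2-3584: {custom['vcpus']} vCPUs, {custom['memory_mb'] / 1024} GB")
print(f"memory removed per node: {saved / 1024} GB")
```

Under these assumptions the vCPU count stays the same while each node carries 4.5 GB less memory, which is where the cost reduction in this step would come from.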

3. Frontend Update with Availability Protection

Created a Pod Disruption Budget named onlineboutique-frontend-pdb
Set minAvailable: 1 on the budget so that at least one frontend pod must remain available during voluntary disruptions
Updated the frontend image to:
- gcr.io/qwiklabs-resources/onlineboutique-frontend:v2.1
Changed imagePullPolicy to Always
Applied the update while preserving service availability
Demonstrated safer release management for a customer-facing microservice
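
For illustration, the budget described above could be expressed as a manifest along the following lines. This is a sketch under stated assumptions: the dev namespace is taken from the deployment step, while the app: frontend selector label and both helper functions are hypothetical and not confirmed by the project notes.

```python
import json


def build_pdb(name, namespace, min_available, match_labels):
    """Return a policy/v1 PodDisruptionBudget manifest as a plain dict."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": match_labels},
        },
    }


def allowed_disruptions(healthy_pods, min_available):
    """Voluntary evictions the budget would permit right now (never negative)."""
    return max(0, healthy_pods - min_available)


# The selector label is an assumption; it must match the labels on the frontend pods.
pdb = build_pdb("onlineboutique-frontend-pdb", "dev", 1, {"app": "frontend"})
print(json.dumps(pdb, indent=2))
```

Note that with a single healthy replica and minAvailable: 1, allowed_disruptions(1, 1) is 0, so a voluntary eviction of that pod would be refused until another replica is ready.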

4. Frontend Autoscaling for Traffic Growth

Applied Horizontal Pod Autoscaling to the frontend deployment
Used:
- target CPU utilization of 50%
- minimum replicas of 1
- defined maximum replica count
Designed the autoscaling threshold to leave sufficient CPU headroom while pods scaled
Improved the application’s ability to absorb a marketing-driven traffic surge without relying entirely on manual intervention
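
To show how a 50% CPU target translates into replica counts, here is a small sketch of the scaling rule documented for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas x current utilization / target utilization), with a default tolerance of 10% around the target. The maximum of 10 replicas used in the examples is a hypothetical value, since the project notes do not state the actual maximum.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA calculation for a single CPU utilization metric."""
    ratio = current_cpu_pct / target_cpu_pct
    # Inside the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical bounds: min 1 (from the notes), max 10 (assumed for illustration).
print(desired_replicas(1, 120, 50, 1, 10))  # one pod at 120% CPU against a 50% target
print(desired_replicas(4, 20, 50, 1, 10))   # four pods idling at 20% CPU
```

This simplified model ignores details such as pod readiness, missing metrics, and scale-down stabilization, so it should be read as an approximation of the behavior rather than the controller's full logic.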

5. Cluster Autoscaling for Infrastructure Elasticity

Configured cluster autoscaling with:
- minimum nodes: 1
- maximum nodes: 6
Enabled the cluster to add nodes automatically under scheduling pressure and shrink during lower demand
Extended optimization beyond pod count to include infrastructure elasticity, helping reduce the cost of overprovisioned capacity
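
The effect of the 1 to 6 node bounds can be sketched with a rough capacity model. This is not the Cluster Autoscaler's actual algorithm, which simulates scheduling of pending pods; it is a simplified estimate based only on CPU requests. The 1900m allocatable CPU per node and the 200m per-pod request are hypothetical figures chosen for illustration.

```python
import math

MIN_NODES = 1  # lower bound from the autoscaling configuration
MAX_NODES = 6  # upper bound from the autoscaling configuration


def nodes_needed(pod_cpu_requests_m, allocatable_cpu_m_per_node):
    """Rough node count for the given pod CPU requests (millicores), clamped to the bounds."""
    total = sum(pod_cpu_requests_m)
    raw = math.ceil(total / allocatable_cpu_m_per_node)
    return max(MIN_NODES, min(MAX_NODES, raw))


# Assumed values: 1900m allocatable per 2-vCPU node, 200m requested per pod.
print(nodes_needed([200] * 12, 1900))  # modest load
print(nodes_needed([200] * 80, 1900))  # surge that exceeds the maximum
```

When the estimate exceeds the maximum, the extra pods would stay pending, which is why the upper bound acts as a cost ceiling as well as a capacity limit.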

6. Traffic Surge Simulation and Bottleneck Identification

Ran a high-concurrency load test from the loadgenerator pod against the external frontend service
Simulated approximately 8,000 concurrent users
Observed how the cluster responded to the surge in the GKE Workloads view
Identified recommendationservice as a stressed component under the increased demand
Used load testing as an operational validation step rather than assuming the original scaling profile was sufficient

7. Backend Service Autoscaling

Applied Horizontal Pod Autoscaling to the recommendationservice
Configured scaling with:
- target CPU utilization of 50%
- minimum replicas of 1
- maximum replicas of 5
Reduced the likelihood that a single downstream service would become a bottleneck during peak traffic
Demonstrated that autoscaling must cover both the entry-point service and the backend services it depends on
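
As an illustration of the settings listed above, the autoscaler could be described declaratively with an autoscaling/v2 manifest. The sketch below builds that manifest as a Python dict; the object name, the dev namespace, and the helper function are assumptions for the example, and the project may equally have used an imperative command instead.

```python
import json


def build_cpu_hpa(name, namespace, deployment, min_replicas, max_replicas, target_cpu_pct):
    """Return an autoscaling/v2 HorizontalPodAutoscaler manifest targeting a Deployment."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {"type": "Utilization", "averageUtilization": target_cpu_pct},
                    },
                }
            ],
        },
    }


# Replica bounds and CPU target come from the notes; name and namespace are assumed.
hpa = build_cpu_hpa("recommendationservice", "dev", "recommendationservice", 1, 5, 50)
print(json.dumps(hpa, indent=2))
```

Because kubectl accepts JSON as well as YAML, output like this could in principle be piped to kubectl apply -f - after review.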

Design Decisions

Used separate namespaces for dev and prod to introduce lightweight environment separation from the start
Started with a small initial cluster to keep baseline costs low
Migrated to a custom machine type after observing workload characteristics instead of blindly retaining the default node shape
Used a Pod Disruption Budget on the frontend to reduce the risk of downtime from voluntary disruptions, such as node drains, during update and maintenance actions
Combined HPA and Cluster Autoscaler so both pod count and infrastructure size could respond to demand
Used load testing to reveal actual weak points in the application path before extending autoscaling to backend services

Results & Impact

Successfully deployed OnlineBoutique to GKE with namespace-based environment separation
Reduced infrastructure waste by moving to a more efficient custom machine type node pool
Performed a frontend update with a Pod Disruption Budget in place to protect availability
Enabled workload-level and cluster-level autoscaling
Identified recommendationservice as a backend bottleneck during a large traffic simulation and mitigated it with autoscaling
Built a practical example of cost-aware Kubernetes operations for a microservices workload

Tools & Technologies Used

Google Kubernetes Engine (GKE) – Cluster platform
Kubernetes Namespaces – Environment separation
Node Pools – Infrastructure shaping
Custom Machine Types – Cost optimization
Pod Disruption Budgets (PDBs) – Update safety and availability
Horizontal Pod Autoscaler (HPA) – Workload scaling
Cluster Autoscaler – Node count scaling
Loadgenerator / Locust-style traffic simulation – Demand testing

Outcome

This project demonstrates the ability to optimize a microservices application on GKE for both cost and scalability by combining right-sized infrastructure, safer rollout practices, and demand-driven autoscaling. It highlights practical skills in Kubernetes platform operations, workload tuning, traffic-based scaling, and reliability-aware cost optimization, which are highly relevant to cloud engineering, platform engineering, and site reliability roles.

Felix Otieno Arogo