
State of Kubernetes Cost Optimization

Authored by: Billy Lamberti • Fernando Rubbo • Kent Hua • Li Pan • Melissa Kendall

June 2023
Table of contents

Executive summary
Key Findings
Kubernetes cost optimization golden signals
Why do Elite performers establish the practices to be followed?
The At Risk segment
Balancing reliability and cost efficiency
What do High and Elite performers teach us about how to improve cost efficiency?
Final thoughts
Acknowledgements
Authors
Methodology
  The Resources group
    Workload rightsizing
    Demand based downscaling
    Cluster bin packing
  The Discount group
  Other Metrics
  Research population
  Method
  Steps for analysis
Executive summary
Given the economic headwinds that started with the COVID-19 pandemic and the macroeconomic challenges that followed (such as high inflation, supply chain disruptions, and geopolitical tensions), IT teams have been pursuing cloud cost optimization to achieve better resilience and re-invest saved resources in providing a better and more innovative experience to their customers. While FinOps is known to be a key practice for driving both resiliency and cloud transformation, cost optimization has proven to be challenging to execute, especially when cloud native platforms, such as Kubernetes, are taken into account.

Kubernetes provides a rich feature set and scheduling capabilities. However, optimizing the use of Kubernetes without impacting the end user experience and overall reliability of the related applications requires a great understanding of the capabilities provided by the platform.

It is with great pleasure that Google presents the State of Kubernetes Cost Optimization report. This report is a quantitative analysis of real-world, large-scale anonymized data. It provides insights and best practices to the community (e.g. platform administrators, application operators, and developers) on how to manage public cloud Kubernetes clusters in a cost efficient manner, without compromising the performance and reliability of workloads. The findings of this report are critical for organizations currently running clusters in the public cloud and organizations ramping up on Kubernetes, so that they can learn while they are migrating to the cloud.

Our research focuses on examining how capabilities and practices predict the cost optimization performance of Kubernetes clusters running on any cloud provider. Because cloud cost optimization is a continuous process that requires the engagement of different teams across an organization, we propose that teams should measure their performance using what we call the golden signals: workload rightsizing, demand based downscaling, cluster bin packing, and discount coverage. As described in the Methodology section, these signals are used by this report to segment clusters into different levels of cost optimization performance.
Key Findings

1. Kubernetes cost optimization starts with understanding the importance of setting appropriate resource requests

In addition to Kubernetes using CPU and memory requests for bin packing, scheduling, cluster autoscaling, and horizontal workload autoscaling, Kubernetes also uses resource requests to classify a Pod's Quality of Service (QoS) class. Kubernetes relies on this classification to make decisions about which Pods should be immediately killed when a given node's utilization approaches its capacity. Not setting requests indirectly assigns the BestEffort class to Pods. Whenever Kubernetes needs to reclaim resources at node-pressure, without any warning or graceful termination, BestEffort Pods are the first to be killed, potentially causing disruption to the overall application. A similar problem happens with Burstable workloads that constantly use more memory than they have requested. Because cost optimization is a discipline to drive cost reduction while maximizing business value, indiscriminately deploying these kinds of workloads impacts the behavior of Kubernetes bin packing, scheduling, and autoscaling algorithms. It can also negatively affect end user experience and, consequently, your business.

2. Workload rightsizing is the most important golden signal

The research has found that workload resource requests are, on average, substantially over-provisioned. Even Elite performers that efficiently set memory requests have room for improvement when it comes to setting CPU requests. The research also found that workload rightsizing has the biggest opportunity to reduce resource waste. Most importantly, cluster owners that focus on addressing discount coverage or cluster bin packing without addressing workload rightsizing may find that they have to re-do their efforts when their workloads become properly sized. To reduce over-provisioning, companies should first create awareness with technical teams about the importance of setting requests, and then focus on rightsizing their workloads.

3. Some clusters struggle to balance reliability and cost efficiency

There are some fairly large clusters (2.1x bigger than Low performers) that demonstrate, through a set of enabled features, a bigger than average intent to optimize costs. However, due to an unintentional over use of both BestEffort and memory under-provisioned Burstable Pods, cluster nodes are often overloaded. This situation increases the risk of intermittent and hard to debug performance and reliability issues that were discussed in finding #1. In our exploratory analysis, cluster owners shared that they either didn't know they were running large amounts of BestEffort and memory under-provisioned Burstable Pods, or they didn't understand the consequences of deploying them.
4. End user experience can be compromised when cost optimization efforts don't consider reliability

Due to how the Kubernetes scheduler works, clusters running large quantities of BestEffort Pods or memory under-provisioned Burstable Pods tend to have low cluster bin packing (slightly higher than Low performers). Platform admins could interpret this signal as a false opportunity for cost savings through scaling down the cluster or rightsizing cluster nodes. If platform admins implement these changes, it can result in application disruption due to the increased chance of Pods being terminated without warning or eviction time.

5. Demand-based downscaling relies heavily on workload autoscaling

The research has found that Elite performers can scale down 4x more than Low performers. This happens because Elite performers take advantage of existing autoscaling capabilities more than any other segment. In this context, autoscaling capabilities refers to Cluster Autoscaler (CA), Horizontal Pod Autoscaler (HPA), and Vertical Pod Autoscaler (VPA). The data shows that Elite performers enable CA 1.4x, HPA 2.3x, and VPA 18x more than Low performers. In other words, enabling CA is not enough to make a cluster scale down during off-peak hours. To autoscale a cluster, it is necessary to configure workload autoscaling (e.g. HPA and VPA) properly. The more workloads you manage to scale down during off-peak hours, the more efficiently CA can remove nodes.

6. Elite and High performers take advantage of cloud discounts

Elite and High performers adopt cloud discounts 16.2x and 5.5x more than Low performers, respectively. Because these segments run the largest clusters (6.2x and 2.3x bigger than Low performers, respectively), they tend to have better in-house Kubernetes expertise and dedicated teams focusing on cost optimization activities. This allows them to incorporate the use of significantly discounted Spot VMs and forecast long term commitments.

7. Chargeback can be compromised if Pods don't set requests properly

Most Kubernetes cost allocation tooling leverages resource requests to compute costs. When Pods do not set requests properly, such as when using BestEffort Pods and under-provisioned Burstable Pods, it can limit showback and chargeback accuracy. For example, even in a scenario where a given BestEffort Pod consumes a large amount of CPU and memory from a cluster, no cost is attributed to the Pod because it requests no resources.
Kubernetes cost optimization golden signals

The four golden signals fall into two groups:

Resources
● Workload rightsizing — actual resource utilization vs. requested resources
● Demand based downscaling — low demand should drive cluster down scale
● Cluster bin packing — requested resources vs. allocatable resources

Cloud discounts
● Discount coverage — % covered by either Spot or cloud provider continuous use discounting

Responsibility for these signals is shared between platform admins, application developers, and budget owners.

Both the increase in container costs and the need to reduce waste are highlighted as top challenges in the 2023 State of FinOps report. Flexera's 2023 State of the Cloud Report also noted optimizing existing use of cloud as the top initiative for the seventh year in a row. Although these challenges apply broadly across public cloud products, they are particularly prominent and challenging for Kubernetes clusters. Kubernetes is a complex distributed system with many features that, when not used correctly, can lead to increased levels of over-provisioning. To avoid being billed for resources that you don't need, observability is critical; you can't manage what you can't see. After several years of iterating over numerous metrics and feedback through customer engagements, Google has identified four key signals ("the golden signals") that help you to measure the cost optimization performance of your Kubernetes clusters on public clouds.

As shown above, the golden signals are broken into two distinct groups. The Resources group focuses on the capacity to use the CPU and memory that you are paying for, while the Cloud discounts group focuses on the ability to take advantage of cloud provider discounts. Although it is recommended that companies measure some of these signals, such as workload rightsizing and demand based downscaling, at both the cluster and workload levels, this research segments and compares clusters, not workloads. Furthermore, all data in the report is measured at the cluster level. Along with the signals, we also highlight the role responsibilities that are often shared between different teams.

Note: We use the term application developer, or simply developer, interchangeably with any role that is responsible for managing Kubernetes configuration files, usually developers, DevOps, application operators, etc.
The Resources group

Workload rightsizing measures the capacity of developers to use the CPU and memory they have requested for their applications.

Demand based downscaling measures the capacity of developers and platform admins to make their cluster scale down during off-peak hours.

Cluster bin packing measures the capacity of developers and platform admins to fully allocate the CPU and memory of each node through Pod placement.

The Cloud Discounts group

Discount coverage measures the capacity of platform admins to leverage machines that offer significant discounts, such as Spot VMs, as well as the capacity of budget owners to take advantage of long-term continuous use discounts offered by cloud providers.
Why do Elite performers establish the practices to be followed?

As discussed previously, this report uses the Kubernetes cost optimization golden signals to group clusters into different segments. Using this segmentation we analyzed the distinguishing features of each segment. The following table summarizes how Medium, High, and Elite performing cluster metrics compare to Low performers, in number of times, for each individual golden signal metric. For example, 2.8x represents 2.8 times better than Low performing clusters.

Comparison of each segment to Low performers

Performers | CPU workload rightsizing | Memory workload rightsizing | Demand based downscaling | CPU cluster bin packing | Memory cluster bin packing | Discount coverage
Medium     | 1.3x                     | 1.3x                        | 2.0x                     | 1.6x                    | 1.8x                       | 1.7x
High       | 2.2x                     | 2.1x                        | 3.2x                     | 1.9x                    | 2.4x                       | 5.4x
Elite      | 2.8x                     | 2.7x                        | 4.0x                     | 2.1x                    | 2.9x                       | 16.2x

As you can see, the Elite segment performs better on all golden signal metrics. Therefore, to meet the demand of running cost-efficient Kubernetes clusters without compromising the reliability and the performance of applications, the Elite performers establish the practices to be followed. The main reasons are summarized as follows.

Elite performers take advantage of cloud discounts 16.2x more than Low performers. They also consistently consume the compute resources they pay for better than the Low performers (5.1x more for CPU and 2.7x more for memory). Because Elite performers surpass the other segments on both the golden signals and the adoption of cost optimization related features (e.g. Cluster Autoscaler, Horizontal Pod Autoscaler, cost allocation tools, etc.), we can assume their teams have a better understanding of the Kubernetes platform and are better prepared to forecast long term commitments. These capabilities make it possible for them to run considerably lower priced workloads compared to other segments.
When it comes to correctly sizing their workloads, Elite performers achieve 2.8x better CPU and 2.7x better memory rightsizing than Low performers. These numbers show that developers deploying to Elite performer clusters have a better understanding of the resources required by their applications in each environment (e.g. testing, staging, and production environments).

For the demand based downscaling signal, Elite performers demonstrate 4x more capacity to scale down their clusters during off-peak hours than Low performers. It is important to remember that scaling down a cluster is a shared responsibility between developers and platform teams. If developers don't scale down their workloads, platform teams' ability to scale down a cluster is limited. On the other hand, if developers manage to scale down their workloads during off-peak hours, platform admins can fine tune Cluster Autoscaler to satisfy demand according to business needs.

For cluster bin packing, we have found that Elite performers can better pack their Pods into their cluster nodes for both CPU and memory compared to Low performers (2.1x and 2.9x better, respectively). Cluster bin packing is another shared responsibility between developers and platform teams, where both should collaborate to find the appropriate machine shape to better fit the workloads.

In the next section we discuss the At Risk segment, which has the highest probability across all segments to negatively impact end user experience. Balancing reliability and cost efficiency discusses best practices for avoiding the pitfall of having your workloads killed without any warning or graceful termination. What do High and Elite performers teach us about how to improve cost efficiency? presents best practices, and Final thoughts presents concluding remarks.
The At Risk segment
In addition to segmenting Elite from High, Medium, and Low performers, we created another segment, which we call the At Risk segment, with clusters where the sum of actual resource utilization is generally higher than the sum of their workloads' requested resources. We decided to separate these clusters into a different segment because, after several years of customer engagements, Google has identified that the usage pattern of these clusters results in a higher risk of intermittent and hard to debug reliability issues caused by the way Kubernetes reclaims resources at node-pressure.

In summary, whenever a cluster's node resource utilization approaches its capacity, kubelet enters into a "self-defense mode" by terminating Pods immediately, meaning without any warning or graceful termination, to reclaim the starved resources. This situation is caused by Pods that use more resources than they have requested, such as BestEffort Pods and memory under-provisioned Burstable Pods. These Pods are also the first ones to be killed by kubelet. Even though the default Kubernetes behavior prefers to schedule incoming Pods on nodes with low bin packing, the bin packing algorithm doesn't take into account actual resource utilization; it only considers resources requested. As a result, the Kubernetes scheduler continues to schedule incoming BestEffort and under-provisioned Burstable Pods on a few low bin packed nodes, causing these nodes to have higher than requested utilization. This situation can trigger the kubelet "self-defense mode".

To avoid negatively impacting end user experience and to avoid spending time debugging intermittent and hard to predict application errors, both BestEffort Pods and memory under-provisioned Burstable Pods must be used with caution. Because they can be killed by kubelet whenever a node is under pressure, application developers and operators must fully understand the consequences of running them.
The research has found that the clusters in the At Risk segment use more resources than they have requested because they deploy significantly more BestEffort Pods (1.6x more than Low performers and 2.7x more than Elite performers).

Both BestEffort Pods and memory under-provisioned Burstable Pods remain useful for utilizing the temporary idle capacity of a cluster, but developers should adopt these kinds of Pods only for best effort workloads that can be killed at any moment, without any eviction time.

The At Risk segment is composed of medium size clusters (2.1x bigger than Low performers and 3x smaller than Elite performers, in number of nodes). These clusters demonstrate, through a set of enabled features, a bigger than average intent to optimize cost (3.9x more adoption of cloud discounts, 1.5x more adoption of cost allocation tools, and 3.1x more VPA deployments using recommendation mode than Low performers). Despite the intent to optimize, their chargeback or showback solutions are more likely to attribute inaccurate costs to teams or divisions. This happens because most Kubernetes cost allocation tools leverage resource requests to compute costs, and the large adoption of BestEffort Pods causes this segment to constantly use more CPU and memory than they have requested. For example, the research shows that clusters in this segment use close to 60% more memory than they have requested. Because most Kubernetes cost allocation tools don't examine actual utilization, any usage that exceeds what was requested is not accounted for from a cost perspective. For example, if a cluster's Pods request 25 mCPU but use 75 mCPU, cost allocation tools consider the additional 50 mCPU as unused.
We also found that the At Risk segment has relatively low cluster bin packing (1.3x worse CPU bin packing and 1.7x worse memory bin packing than Medium performers). Some platform teams may view this as a false opportunity for cost optimizing their clusters. If cluster operators act on this false optimization opportunity, they can cause disruption due to the increased chance of BestEffort Pods and memory under-provisioned Burstable Pods being terminated without any warning or eviction time.

Based on this low cluster bin packing and the lowest adoption, among all segments, of cost optimization features (1.3x lower adoption of Cluster Autoscaler, 2.2x less fine tuning of Cluster Autoscaler downscaling, and 1.7x lower adoption of HPA than Low performers), we believe that the At Risk segment is attempting to mitigate the reliability issues discussed in this section by keeping clusters artificially over-provisioned.

The next section discusses strategies and best practices you can follow to avoid the pitfalls faced by clusters in the At Risk segment.
Balancing reliability and cost efficiency

As discussed previously, beyond making it harder to horizontally scale your applications and impacting cost attribution solutions and the Kubernetes bin packing, scheduling, and cluster autoscaler algorithms, the unintentional over use of BestEffort Pods and memory under-provisioned Burstable Pods can also negatively affect the end user experience. Although the At Risk segment has a higher risk of facing such problems, the research has found that all segments have the opportunity to reduce the use of BestEffort Pods and memory under-provisioned Burstable Pods, and consequently reduce, or even eliminate, time spent debugging intermittent errors caused by kubelet killing Pods without warning.

To accomplish this task, companies must invest in providing technical teams with training, visibility, and guardrails.

For training, companies should ensure developers and operators understand the Kubernetes Quality of Service (QoS) model and how their configuration choices can impact the reliability of their applications when cluster nodes are under resource pressure:

BestEffort

apiVersion: v1
…
spec:
  containers:
  - name: qos-example
    image: nginx

Pod can be killed and marked as Fail at any time.

Burstable

apiVersion: v1
…
spec:
  containers:
  - name: qos-example
    image: nginx
    resources:
      requests:
        cpu: 500m
        memory: 100Mi
      limits:
        cpu: 1
        memory: 200Mi

Pod can be killed and marked as Fail while using more memory than requested.

Guaranteed

apiVersion: v1
…
spec:
  containers:
  - name: qos-example
    image: nginx
    resources:
      requests:
        cpu: 3
        memory: 200Mi
      limits:
        cpu: 3
        memory: 200Mi

As the name says, the Pod is guaranteed to run.

● BestEffort Pods are Pods that don't have any requests and limits set for their containers. Kubernetes kills these Pods first when a node is running out of memory. As the name suggests, these Pods are meant exclusively for running best effort workloads. In other words, it is not a problem if the workload doesn't run right away, if it takes longer to finish, or if it is inadvertently restarted or killed.
● Burstable Pods are Pods with containers that have resource requests with upper, or unbounded (not set), limits. Pods using more memory than they have requested while the node is under resource pressure can also be killed, because the Pod's memory can't be compressed. Burstable Pods are not meant to be constantly running above what was requested. Instead, they are meant for workloads that occasionally require additional resources, such as speeding up the Pod's startup time or bursting while HPA is creating a new replica.

● Guaranteed Pods are Pods where containers have either an equal amount of request and limit resources or set limits only (Kubernetes automatically copies limits to requests). These Pods are meant for running workloads with strict resource needs. As they cannot burst, these Pods have higher priority and are guaranteed to not be killed before BestEffort and Burstable Pods.

Because deploying BestEffort Pods and memory under-provisioned Burstable Pods can cause disruption to workloads, we recommend the following best practices:

● Avoid using BestEffort Pods for workloads that require a minimum level of reliability.

● Set memory requests equal to memory limits for all containers in all Burstable Pods. You can set upper limits for CPU because Kubernetes can throttle CPU down to the requested amount whenever needed, though this can impact application performance. Setting upper limits for CPU allows your applications to use idle CPU from nodes without worrying about abrupt workload termination.

Best practice for Burstable workloads

Burstable (memory request below limit)

apiVersion: v1
…
spec:
  containers:
  - name: qos-example
    image: nginx
    resources:
      requests:
        cpu: 500m
        memory: 100Mi
      limits:
        cpu: 1
        memory: 200Mi

Pod can be killed and marked as Fail while using more memory than requested.

Burstable (memory request equal to limit)

apiVersion: v1
…
spec:
  containers:
  - name: qos-example
    image: nginx
    resources:
      requests:
        cpu: 500m
        memory: 200Mi
      limits:
        cpu: 1
        memory: 200Mi

Pod can have its CPU throttled to the request.
For visibility, both DevOps and platform admins should provide application owners with rightsizing recommendations, while highlighting and tracking the use of workloads that are at reliability risk, such as all BestEffort Pods and memory under-provisioned Burstable Pods. It is important to adopt dashboards, warnings, alerts, and actionable strategies such as automatic issue ticketing or automatic pull requests with actual recommendations. These strategies are even more effective when they are integrated into developers' workstreams, such as developer IDEs, developer portals, CI/CD pipelines, etc. Such alternatives have been shown to be useful, not only for improving the reliability of the overall applications, but also for building a more cost conscious culture.

For guardrails, both DevOps and platform teams can build solutions that enforce the best practices discussed in this section. Enforcement can be done using standard Kubernetes constructs, such as validation and mutation webhooks, or by using policy controller frameworks, such as Open Policy Agent (OPA) Gatekeeper. It is also important to have a break glass process in place to allow teams that understand Kubernetes QoS and its consequences to take advantage of idle cluster resources. For example, when a developer creates a merge request without setting resources, a validation pipeline could either demand a specialized peer review or annotate the merge request with a warning. If the code is merged into the main branch and run in production, the team could enforce an annotation that states the team understands the consequences and wants to bypass organization policy validations.
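As an illustration of such a guardrail, the sketch below uses the Kubernetes ValidatingAdmissionPolicy API (alpha since Kubernetes 1.26) instead of a custom webhook or a Gatekeeper policy to reject Pods whose containers don't set CPU and memory requests. The policy name and message are examples only, and a ValidatingAdmissionPolicyBinding (not shown) is still required to put the policy into effect:

apiVersion: admissionregistration.k8s.io/v1alpha1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-resource-requests    # example name
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["pods"]
  validations:
  # Every container must declare both CPU and memory requests.
  - expression: >-
      object.spec.containers.all(c,
        has(c.resources) && has(c.resources.requests) &&
        'cpu' in c.resources.requests && 'memory' in c.resources.requests)
    message: "All containers must set CPU and memory requests; see the QoS best practices above."

A break glass path could be layered on top of this, for example by exempting Pods that carry an organization-approved annotation.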

In the situation where a team has prepared workloads for abrupt termination and they fully
understand the consequences of running BestE o Pods, there is an oppo unity to utilize
their idle cluster resources and increase savings by running best e o workloads on Spot
VMs. Spot VMs are o en o ered at a discounted price in exchange for cloud providers being
allowed to terminate and reclaim resources on sho notice. Pla orm teams can implement
this strategy using a mutation webhook to append a node a nity preference to Spot VMs on
all Pods not se ing requests. This leaves room on standard nodes for workloads that require
greater reliability. Once this strategy is incorporated into an organization's policy or
automation pipeline, if no Spot VMs are provisioned, standard VMs are used.
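For illustration, the fragment below shows the kind of preference such a webhook could inject into a Pod spec (under spec.affinity). The node label is provider-specific; this sketch assumes GKE's cloud.google.com/gke-spot label, so adjust it for other providers:

affinity:
  nodeAffinity:
    # Prefer Spot nodes, but fall back to standard nodes when none are available.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: cloud.google.com/gke-spot   # provider-specific Spot node label
          operator: In
          values:
          - "true"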

If the platform team doesn't want to allow best practices to be bypassed, the recommendation is to validate and reject non-compliant workloads. If that is not a possibility, for workloads that are tolerant to graceful restarts, an alternative is to either recommend or enforce the adoption of Vertical Pod Autoscaler using the Auto mode.

Note: When this report was written, VPA required Pods to be restarted to update a Pod's resources. However, KEP #1287: In-Place Update of Pod Resources became an alpha feature in Kubernetes 1.27. This feature will allow future versions of VPA to, in the majority of cases, update resource values without restarting a Pod.
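As a reference, the manifest below is a minimal sketch of a VerticalPodAutoscaler running in Auto mode, as recommended above; the object name, namespace, and target Deployment are placeholders:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa            # example name
  namespace: my-namespace
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # placeholder workload to be rightsized
  updatePolicy:
    updateMode: "Auto"        # VPA applies its recommendations, evicting and recreating Pods as needed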

Finally, as shown below, you can also set defaults for container resources using the Kubernetes LimitRange API. Because defaults can result in your workload becoming either under- or over-provisioned, this should not replace the recommendation of adopting VPA for rightsizing Pods. The benefit of using defaults for resources is that they can be applied when the workload is first deployed, while VPA is still in LowConfidence mode. In this mode, VPA does not update Pod resources because it does not yet have enough data to make a confident decision.

apiVersion: v1
kind: LimitRange
metadata:
  name: my-container-resources-defaults
  namespace: my-namespace
spec:
  limits:
  - default: # defines default resources for limits
      memory: 500Mi
    defaultRequest: # defines default resources for requests
      cpu: 250m
      memory: 500Mi
    type: Container

What do High and Elite performers teach us about how to improve cost efficiency?

Cluster owners that adopt a continuous practice of measuring and improving the Kubernetes cost optimization golden signals can earn significant cost savings by running lower priced workloads in the public cloud. Therefore, to meet the demand of cost-efficient Kubernetes clusters without compromising reliability and performance, it is important to understand and follow the practices of High and Elite performers.

Our research shows that both High and Elite performers run the largest clusters across all segments (High performers run clusters 2.3x larger and Elite performers run clusters 6.2x larger, in number of nodes, than Low performers). Larger clusters allow resources to be better utilized. For example, while a few workloads are idle, other workloads can be bursting. Large clusters also tend to be multi-tenant clusters. The higher the number of tenants, the more operational challenges need to be managed. Such a pattern demands specialized platform teams that need to put in place company cost optimization policies, best practices, and guardrails.

The platform teams from High and Elite performers also tend to enable more cluster level cost optimization features than Low performers, such as Cluster Autoscaler (1.3x and 1.4x, respectively) and cost allocation tooling (3x and 4.6x, respectively). While Cluster Autoscaler allows clusters to automatically add and remove nodes as needed, cost allocation tools enable companies to implement showback or chargeback processes to allocate, or even bill, the costs associated with each department's or division's usage of multi-tenant Kubernetes clusters.

However, to enable clusters to scale down during off-peak hours as much as High and Elite performers do (3.2x and 4x more than Low performers, respectively), it is also necessary to scale workloads according to demand. The research shows that High and Elite performers use two main approaches. The first is the adoption of workload autoscaling APIs (e.g. HPA and VPA in Auto/Init mode), where High and Elite performers demonstrate 1.5x and 2.3x higher adoption than Low performers, respectively. The second is running jobs to completion: High and Elite clusters run 3.6x and 4x more jobs to completion than Low performers, respectively. Unfortunately, it was not possible to measure which strategy has the biggest impact, as the segments contain clusters that use a variety of such approaches.

The research has also found that the adoption of workload autoscaling and jobs to completion is much more important for downscaling a cluster than optimizing Cluster Autoscaler for a faster and more aggressive scale down. In other words, if developers don't properly configure their workloads to scale according to demand, platform teams are limited in their ability to scale down their cluster. The research has also found that all segments, including Elite performers, could improve demand based autoscaling efficiency by increasing the adoption of HPA and VPA.
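To make the workload side of this concrete, a minimal HorizontalPodAutoscaler (autoscaling/v2 API) that lets a Deployment follow demand might look like the sketch below; the Deployment name, replica bounds, and CPU target are illustrative only:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend                 # example name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend               # placeholder workload
  minReplicas: 2                 # low floor so the workload can shrink during off-peak hours
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add replicas when average CPU utilization exceeds 70% of requests

The fewer replicas workloads keep during off-peak hours, the more nodes Cluster Autoscaler can remove.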

High and Elite segments also demonstrate better adherence to workload best practices. For example, developers from High and Elite performers rightsize their workloads up to 2.2x and 2.8x better than Low performers, and deploy 50% and 70% fewer BestEffort Pods than clusters in the At Risk segment, respectively. These excellent results reflect developers' better knowledge of their applications and demonstrate the value of the High and Elite segments' specialized platform teams that can build tools and custom solutions. As a result, the High and Elite segments can continuously enforce policies, provide golden paths, optimize rightsizing (e.g. VPA in recommendation mode is deployed 12x and 16.1x more by these segments, respectively, than by Low performers), and provide best practices recommendations to developer teams.

Lastly, despite the challenges of building and operating data-centric applications on Kubernetes, the community has seen a rapid increase in deployments of databases and other stateful workloads. High and Elite performers, who run 1.8x and 2x more StatefulSets than Low performers, are the main drivers of this increase.

The Data on Kubernetes 2022 research shows the benefits of running data-centric applications on Kubernetes include:

● ease of scalability
● ability to standardize management of workloads
● consistent environments from development to production
● ease of maintenance
● improved security
● improved use of resources
● co-location of latency sensitive workloads
● ease of deployment
Final thoughts

As discussed throughout this report, many Kubernetes features and external tools rely on Pods setting resource requests properly so that organizations can achieve the best reliability, performance, and cost efficiency from the platform. The unintentional over use of Pods that don't set requests, or that constantly utilize more memory than requested, can cause unexpected behavior in clusters and applications, including negatively impacting end user experience. Even though the At Risk segment faces these issues at a higher frequency, the research has found that there are opportunities for all segments, including Elite performers, to improve both reliability and cost efficiency.

It is important to highlight that addressing the opportunities presented by the Kubernetes cost optimization signals should be continuously prioritized. Moreover, both cluster and application owners should prioritize correctly sizing applications and ensuring applications can automatically scale based on demand, instead of primarily focusing on cluster bin packing and seeking additional discount coverage. This is because significant changes in workload rightsizing and autoscaling may result in re-work required for bin packing and long-term discount strategies.

Even though this research did not analyze clusters across segments within a company, Google has identified a pattern through cost optimization engagements with customers, where larger multi-tenant clusters tend to gather the majority of organizational Kubernetes expertise. This has resulted in many long tail clusters being managed by teams whose main goal is to deliver a service, not manage a platform. This pattern can also be seen in this report, where the largest clusters, while being the hardest to operate, have demonstrated the best cost optimization performance. To overcome the long tail challenges, organizations should invest in defining and enforcing company-wide policies, patterns, and golden pre-approved paths, as well as finding the right balance between control, extensibility, and flexibility.

Finally, platform teams should be aware that measuring cluster bin packing alone doesn't tell the entire story. Looking at the At Risk segment, we can see low values for cluster bin packing and high values (sometimes above 100%) for workload rightsizing. This is a reflection of the over use of BestEffort Pods and under-provisioned Burstable Pods, in which the Pods use more cluster resources than they have requested, consequently skewing both signals. If platform teams decide to save money by better packing their clusters in such a situation, the risk of running into reliability issues increases accordingly.
Acknowledgements

A large family of passionate contributors made this research possible. Data gathering, data engineering, analysis, writing, editing, and report design are just a few of the ways that our colleagues helped to realize this large effort. The authors would like to thank all of these people for their input and guidance on the report this year. All acknowledgements are listed alphabetically.

Abby Holly, Ameenah Burhan, Andre Ellis, Anthony Bushong, Bobby Allen, Dave Bartoletti, Drew Bradstock, Eric Lam, Erwan Menard, Frank Lamar, Geoffrey Anderson, Harish Jayakumar, Iain Foulds, Jerzy Foryciarz, Michael Chen, Pathik Sharma, Piotr Koziorowski, Praveen Rajasekar, Rahul Khandkar, Richard Seroter, Roman Arcea, Slav Tringov, Thorgane Marques, Yana Lubell
Authors

Billy Lamberti

Billy is a Data Scientist for the Cloud Product Analytics Team at Google. He works on projects related to identifying opportunities to make products at Google Cloud more efficient and effective. Before his time at Google, he worked on projects related to satellite imagery, medical and health sciences, image shape analysis, and explainable artificial intelligence. He received his PhD in Computational Sciences and Informatics with a specialization in Data Science.

Fernando Rubbo

Fernando Rubbo is a Cloud Solutions Architect at Google, where he builds global solutions and advises customers on best practices for modernizing and optimizing applications running on Google Cloud. Throughout his career, he has worn many hats, including customer solutions engineer, team lead, product manager, software and platform architect, software and platform consultant, and software developer. Fernando also holds an MS in Computer Science from UFRGS University.

Kent Hua

Kent Hua is a Global Solution Manager at Google Cloud, advocating solutions that help organizations modernize applications and accelerate their adoption of cloud technologies on Google Cloud. He is a co-author of Cloud Native Automation with Google Cloud Build, a book to help individuals and organizations automate software delivery. His focus on customer success is paramount in his current and previous roles as an enterprise architect, pre-sales engineer, and consultant.

Li Pan

Li Pan is an Operations and Infrastructure Data Scientist at Google Cloud, where she leads a team to optimize GCE efficiency, focusing on intelligence to allocate virtual machines into optimal locations, and strategies for new products or new features to existing products to consume unsold capacity. Before Google, she worked at other tech companies as a data scientist. She received her Ph.D. in Mathematics from the University of California, San Diego in 2013.

Melissa Kendall

Melissa Kendall is a technical writer at Google where she focuses on GKE networking, cost optimization, and observability. Outside Google, she is a creative writer, public speaker, and lover of all things Halloween.
Methodology

The State of Kubernetes Cost Optimization report groups anonymized GKE clusters into 5 segments (i.e. At Risk, Low, Medium, High, Elite) based on the Kubernetes cost optimization golden signals. Due to possible privacy concerns, the original dataset and the actual metric values are not shared. Instead, the report provides the comparison ratio between different segments, for instance the relative magnitude of Elite performers compared to Low performers in a given metric. For example, if the Elite and Low performers have 45 and 10 for a given metric, respectively, the corresponding ratio would be 4.5. This ratio means that the typical Elite performer is 4.5 times better than the typical Low performer for that metric.

Each observation represents an anonymized GKE cluster on a given day. Therefore, each unique anonymous GKE cluster can appear more than once. Our notation is as follows:

● Let there be n total unique clusters.

● For a given cluster k, the number of times the cluster occurs (or the number of days the cluster is "alive") is d_k.

● The total number of observations is therefore the sum of d_k over all unique clusters, where each d_k is the cardinality of the set of daily observations for cluster k.

The cardinality operator counts the number of observations that belong to a set. For example, if the number of alive days for a particular cluster from 01/01/2023 to 01/31/2023 was 10, then the cardinality would be 10.

There can be many nodes within a cluster. In this report, we only measured the user nodes, excluding control plane nodes, as described in the following sections.
The Resources group

We assume that some clusters are well utilized and others are not. The following three metrics capture such behavior:

● Workload rightsizing

● Demand based downscaling

● Cluster bin packing

Workload rightsizing

The CPU and RAM workload rightsizing are defined for the cluster as the ratio of average recommended resources to average requested resources, respectively:

CPU workload rightsizing = Average Recommended CPU / Average Requested CPU
RAM workload rightsizing = Average Recommended RAM / Average Requested RAM

In the previous equations:

● The Average Recommended CPU/RAM is the mean of Vertical Pod Autoscaler (VPA) recommendations over the daily average.

  ○ Google computes VPA recommendations at scale for every single workload running in any GKE cluster. Instead of using actual Pod utilization, we have chosen to use the Pod's VPA recommendations, which are more constant and slightly higher than actual Pod utilization.

● Both ratios have a support (or range of possible values) of [0, ∞). However, they are usually between [0, 1). There are cases where a user doesn't set a requested CPU or RAM amount for some containers, which may result in a workload rightsizing at the cluster level being greater than 1.
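For example, under this definition, workloads that together request 10 CPUs while their VPA recommendations average 4 CPUs would have a CPU workload rightsizing of 0.4.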

Demand based downscaling

Demand based downscaling is defined per cluster. This value has a support of [0, 1]. A value close to 1 corresponds to a cluster that is using all of its nodes, while a value close to 0 corresponds to a cluster that is not using all of its nodes. We would expect clusters with a "high" capacity for downscaling to have values closer to 1.

Cluster bin packing

CPU and RAM cluster bin packing are defined for the cluster as the ratio of average requested resources to average allocatable resources, respectively:

CPU cluster bin packing = Average CPU Request / Average CPU Allocatable
RAM cluster bin packing = Average RAM Request / Average RAM Allocatable

Note that both "Average Request" and "Average Allocatable" are daily averages. This value has a support of [0, 1]. A value close to 0 corresponds to a cluster in which workloads are not requesting the allocatable CPU or RAM, while a value close to 1 corresponds to a cluster in which workloads are requesting all of the allocatable CPU or RAM. We would expect clusters with a "high" bin packing to have values closer to 1.
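For example, under this definition, a cluster whose nodes offer 100 allocatable CPUs on average, while its workloads request 60 CPUs on average, would have a CPU cluster bin packing of 0.6.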

The Discount group

In the discount group, we only have one metric: discount coverage. This metric, the percentage of cluster core hours that are covered by cloud discounts (or "percentage discounted" for short), is defined for the cluster as the proportion of core hours covered by either Spot or continuous use discounts:

Percentage discounted = Discounted core hours (Spot or CUD) / Total core hours

Percentage discounted has a support of [0, 1]. A value close to 0 or 1 indicates that the cluster is not, or is, utilizing discounted nodes, respectively. We would expect clusters with a "high" cloud discount coverage to have values closer to 1.

Spot VMs are often offered at a highly discounted price in exchange for cloud providers being allowed to terminate and reclaim resources on short notice (Source, Source 2). Thus, customers should take a balanced approach of using Spot VMs for non-critical workloads that can tolerate disruption.

Continuous use discounts (CUD) correspond to renewed contracts at a larger discount. CUD are ideal for predictable and steady-state usage (Source).

Thus, percentage discounted corresponds to the proportion of core hours utilizing Spot or CUD discounts. While Spot and CUD differ in nature, both are offered as discount options by all major public cloud providers.
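For example, under this definition, a cluster that consumed 1,000 core hours in a day, with 250 of those core hours running on Spot VMs and 150 covered by CUD, would have a percentage discounted of 0.4.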

Other Metrics

Other metrics are provided in this report to describe the segments. However, none were used in the creation of the segments. Thus, they are not expanded on in great detail.

Research population

To run this research we analyzed all GKE clusters in an anonymized form, except for those with 3 or fewer nodes. We chose to exclude these clusters as likely testing clusters or clusters with low usage, and because the default GKE cluster creation includes 3 nodes. Data was used from multiple separate months. This was done to confirm that the values were similar across the different months and to avoid a seasonality effect. The final reported values utilized throughout the document are from January 2023.

Method

For this research we have used a classification tree technique. Tree models segment the feature space into partitions using rules. If the problem has a continuous outcome, a regression tree is used, where the predicted value is the mean value of a given segment. If the output is a series of classes, then a classification tree is used, where the prediction is a particular class. For more information about trees, see An Introduction to Statistical Learning with Applications in R.

For this report, we constructed our classes and formulated a classification tree setup. We were able to formulate the problem as a classification tree by using handmade and data driven classes, because we have domain expertise. Generally speaking, observations with many metrics closer to 1 are considered "Elite performer" clusters and those with many metrics closer to 0 are considered "Low performer" clusters. Thus, we made the naive assumption that each metric contributes equally to the consideration of which group a cluster belongs to, while also using quantiles to determine the exact cutoff values for each metric.

This process resulted in 5 segments: Low, Medium, High, Elite, and At Risk. The "Low" to "Elite" segments are cohorts that range in their ability to be computationally efficient. For example, the "Elite" segment is able to use its resources in an exceptional manner. As another example, the "Medium" segment might excel in one area but struggle in another. "At Risk" clusters are those whose sum of actual resource utilization is, more often than not, above the sum of their workloads' requested resources.

Steps for analysis

The high-level steps for performing this analysis were as follows:

1. Collect the data.

2. Build the class labels.

   a. Determine the inputs.

   b. Determine the number of classes.

3. Build the classification tree.

4. Summarize the segments based on the predictions from the tree.

This process was repeated for each month of interest. For example, these 4 steps were performed for November independent of the 4 steps performed for February.
