
Peeking Behind the Curtains of Serverless Platforms

Liang Wang, UW-Madison; Mengyuan Li and Yinqian Zhang, The Ohio State University;
Thomas Ristenpart, Cornell Tech; Michael Swift, UW-Madison

https://www.usenix.org/conference/atc18/presentation/wang-liang

This paper is included in the Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC '18).
July 11-13, 2018, Boston, MA, USA
ISBN 978-1-939133-02-1

Open access to the Proceedings of the 2018 USENIX Annual Technical Conference is sponsored by USENIX.
Peeking Behind the Curtains of Serverless Platforms

Liang Wang[1], Mengyuan Li[2], Yinqian Zhang[2], Thomas Ristenpart[3], Michael Swift[1]

[1] UW-Madison, [2] Ohio State University, [3] Cornell Tech

Abstract

Serverless computing is an emerging paradigm in which an application's resource provisioning and scaling are managed by third-party services. Examples include AWS Lambda, Azure Functions, and Google Cloud Functions. Behind these services' easy-to-use APIs are opaque, complex infrastructure and management ecosystems. Taking on the viewpoint of a serverless customer, we conduct the largest measurement study to date, launching more than 50,000 function instances across these three services, in order to characterize their architectures, performance, and resource management efficiency. We explain how the platforms isolate the functions of different accounts, using either virtual machines or containers, which has important security implications. We characterize performance in terms of scalability, coldstart latency, and resource efficiency, with highlights including that AWS Lambda adopts a bin-packing-like strategy to maximize VM memory utilization, that severe contention between functions can arise in AWS and Azure, and that Google had bugs that allow customers to use resources for free.

1 Introduction

Cloud computing has allowed backend infrastructure maintenance to become increasingly decoupled from application development. Serverless computing (or function-as-a-service, FaaS) is an emerging application deployment architecture that completely hides server management from tenants (hence the name). Tenants receive minimal access to an application's runtime configuration. This allows tenants to focus on developing their functions — small applications dedicated to specific tasks. A function usually executes in a dedicated function instance (a container or other kind of sandbox) with restricted resources such as CPU time and memory. Unlike virtual machines (VMs) in more traditional infrastructure-as-a-service (IaaS) platforms, a function instance will be launched only when the function is invoked and is put to sleep immediately after handling a request. Tenants are charged on a per-invocation basis, without paying for unused and idle resources.

Serverless computing originated as a design pattern for handling low duty-cycle workloads, such as processing in response to infrequent changes to files stored on the cloud. Now it is used as a simple programming model for a variety of applications [14, 22, 42]. Hiding resource management from tenants enables this programming model, but the resulting opacity hinders adoption for many potential users, who have expressed concerns about: security in terms of the quality of isolation, DDoS resistance, and more [23, 35, 37, 40]; the need to understand resource management to improve application performance [4, 19, 24, 27, 28, 40]; and the ability of platforms to deliver on performance [10-12, 29-31]. While attempts have been made to shed light on platforms' resource management and security [33, 34], known measurement techniques, as we will show, fail to provide accurate results.

We therefore perform the most in-depth study of resource management and performance isolation to date in three popular serverless computing providers: AWS Lambda, Azure Functions, and Google Cloud Functions (GCF). We first use measurement-driven approaches to partially reverse-engineer the architectures of Lambda and Azure Functions, uncovering many undocumented details. Then, we systematically examine a series of issues related to resource management: how quickly function instances can be launched, function instance placement strategies, function instance reuse, and more. Several security issues are identified and discussed.[1] We further explore how CPU, I/O and network bandwidth are allocated among functions and the ensuing performance implications. Last but not least, we explore whether all resources are properly accounted for, and report on two resource accounting bugs that allow tenants to use extra resources for free. Some highlights of our results include:

• AWS Lambda achieved the best scalability and the lowest coldstart latency (the time to provision a new function instance), followed by GCF. But

[1] We responsibly disclosed our findings to related parties before this paper was made public.

USENIX Association 2018 USENIX Annual Technical Conference 133


the lack of performance isolation in AWS between function instances from the same account caused up to a 19x decrease in I/O, networking, or coldstart performance.

• Azure Functions used different types of VMs as hosts: 55% of the time a function instance runs on a VM with debased performance.

• Azure had exploitable placement vulnerabilities [36]: a tenant can arrange for function instances to run on the same VM as another tenant's, which is a stepping stone towards cross-function side-channel attacks.

• An accounting issue in GCF enabled one to use a function instance to achieve the same computing resources as a small VM instance at almost no cost.

                  AWS                         Azure             Google
Memory (MB)       64 * k (k = 2, 3, ..., 24)  1536              128 * k (k = 1, 2, 4, 8, 16)
CPU               Proportional to memory      Unknown           Proportional to memory
Language          Python 2.7/3.6,             Nodejs 6.11.5,    Nodejs 6.5.0
                  Nodejs 4.3.2/6.10.3,        Python 2.7,
                  Java 8, and others          and others
Runtime OS        Amazon Linux                Windows 10        Debian 8*
Local disk (MB)   512                         500               > 512
Run native code   Yes                         Yes               Yes
Timeout (second)  300                         600               540
Billing factor    Execution time,             Execution time,   Execution time,
                  Allocated memory            Consumed memory   Allocated memory,
                                                                Allocated CPU

Table 1: A comparison of function configuration and billing in three services. (*: We infer the OS version of GCF by checking the help information and version of several Linux tools such as APT.)
Many more results are given in the body. We have repeated several measurements in May 2018 and highlight in the paper the improvements the providers have made. We noticed that serverless platforms are evolving quickly; nevertheless, our findings serve as a snapshot of the resource management mechanisms and efficiency of popular serverless platforms, provide performance baselines and design options for developers to build more reliable platforms, and help tenants improve their use of serverless platforms. More generally, our study provides new measurement techniques that are useful for other researchers. Towards facilitating this, we will make our measurement code public and open source.[2]

2 Background

Serverless computing platforms. In serverless computing, an application usually consists of one or more functions — standalone, small, stateless components dedicated to handling specific tasks. A function is most often specified by a small piece of code written in some scripting language. Serverless computing providers manage the execution environments and backend servers of functions, and allocate resources dynamically to ensure their scalability and availability.

In recent years, many serverless computing platforms have been developed and deployed by cloud providers, including Amazon, Azure, Google, and IBM. We focus on Amazon AWS Lambda, Azure Functions and Google Cloud Functions.[3] In these services, a function is executed in a dedicated container or other type of sandbox with limited resources. We use function instance to refer to the container/sandbox a function runs on. The resources advertised as available to a function instance vary across platforms, as shown in Table 1. When the function is invoked by requests, one or more function instances (depending on the request volume) will be launched to execute the function. After the function instances have processed the requests and exited or reached the maximum execution time (see "Timeout" in Table 1), the function instances become idle. They may be reused to handle subsequent requests to avoid the delay of launching new instances. However, idle function instances can also be suddenly terminated [32]. Each function instance is associated with a non-persistent local disk for temporarily storing data, which will be erased when the function instance is destroyed.

One benefit of using serverless services is that tenants do not pay for resources consumed when function instances are idle. Tenants are billed based on resource consumption only during execution.[4] Common across platforms is charging for aggregated function execution time across all invocations. Additionally, the price varies depending on the pre-configured function memory (AWS, Google) or the actual consumed memory during invocations (Azure). Google further charges different rates based on CPU speed.

Related work. Many serverless application developers have conducted their own experiments to measure coldstart latency, function instance lifetime, maximum idle time before shutdown, and CPU usage in AWS Lambda [10-12, 19, 27, 28, 40]. Unfortunately, their experiments were ad hoc, and the results may be misleading because they did not control for contention by other instances. A few research papers report on measured performance in AWS. Hendrickson et al. [18] measured request latency and found it was higher than in AWS Elastic Beanstalk (a platform-as-a-service system). McGrath et al. [34] conducted preliminary measurements on four serverless platforms, and found

[2] https://github.com/liangw89/faas_measure
[3] We use AWS, Azure and Google to refer to these services.
[4] Azure Functions offers two types of function hosting plans. Consumption Plan manages resources in a serverless-like way while App Service Plan is more like "container-as-a-service". We only consider Consumption Plan in this paper.
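The billing model sketched above (aggregated execution time multiplied by allocated memory) can be made concrete with a small calculation. The sketch below assumes AWS-style billing with a 100 ms rounding granularity; the rate constant is illustrative, not an official price.

```python
import math

# Hypothetical per-GB-second rate, for illustration only.
PRICE_PER_GB_SECOND = 0.00001667

def invocation_cost(duration_ms: float, memory_mb: int) -> float:
    """Cost of one invocation: billed duration (rounded up to 100 ms)
    times the pre-configured memory in GB, times the per-GB-second rate."""
    billed_ms = math.ceil(duration_ms / 100) * 100
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND
```

Under this model a 250 ms invocation of a 1,024 MB function is billed as 300 ms, i.e., 0.3 GB-seconds; idle time between invocations costs nothing, which is the "no pay for idle" property described above.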



that AWS achieved better scalability, coldstart latency, and throughput than Azure and Google. A concurrent study from Lloyd et al. [33] investigated the factors that affect application performance in AWS and Azure. The authors developed a heuristic to identify the VM a function runs on in AWS based on the VM uptime in /proc/stat. Our experimental evaluation suggests that their heuristic is unreliable (see §4.5), and that the conclusions they made using it are mostly inaccurate.

In our work, we design a reliable method for identifying instance hosts, and use systematic experiments to inspect resource scheduling and utilization.

[Figure 2: VM and function instance organization in AWS Lambda and Azure Functions. A rectangle represents a function instance. A or B indicates different tenants.]

3 Methodology

We take the viewpoint of a serverless user to characterize serverless platforms' architectures, performance, and resource management efficiency. We set up vantage points in the same cloud provider region to manage and invoke functions from one or more accounts via official APIs, and leverage the information available to functions to determine important characteristics. We repeated the same experiment under various settings by adjusting function configuration and workloads to determine the key factors that could affect measurement results. In the rest of the paper, we only report on the relevant factors affecting the experiment results.

We integrate all the necessary functionalities and subroutines into a single function that we call a measurement function. A measurement function performs two tasks: (1) collect invocation timing and function instance runtime information, and (2) run specified subroutines (e.g., measuring local disk I/O throughput, network throughput) based on received messages. The measurement function collects runtime information via the proc filesystem on Linux (procfs), environment variables, and system commands. It also reports on execution start and end time, invocation ID (a random 16-byte ASCII string generated by the function that uniquely identifies an invocation), and function configurations to facilitate further analysis.

The measurement function checks for the existence of a file named InstanceID on the local disk, and if it does not exist, creates this file with a random 16-byte ASCII string that serves as the function instance ID. Since the local disk is non-persistent and has the same lifetime as the associated function instance, the InstanceID file will not exist for a fresh function instance, and will not be modified or deleted during the function instance lifetime once created.

The regions for functions were us-east-1, us-central-1, and "EAST US" in AWS, Google and Azure (respectively). The vantage points were VMs with at least 4 GB RAM and 2 vCPUs. We used the software recommended by the providers and followed the official instructions to configure the time synchronization service in the vantage points.[5] We implemented the measurement function in various languages, but most experiments used Python 2.7 and Nodejs 6.* as the language runtime (the top 2 most popular languages in AWS according to New Relic [25]). We invoked the functions via synchronous HTTP requests. Most of our measurements were done from July-Dec 2017.

Ethical considerations. We built our measurement functions in a way that should not cause undue burden on platforms or other tenants. In most experiments, the function did no more than collect necessary information and sleep for a certain amount of time. Once we discovered performance issues we limited our tests to not DoS other tenants. We only conducted small-scale tests to inspect the security issues but did not further exploit them.

4 Serverless Architectures Demystified

We combine two approaches to infer the architectures of AWS Lambda, Google Cloud Functions, and Azure Functions: (1) reviewing official documents, related online articles and discussions, and (2) measurements — analyzing the data collected from running our measurement functions many times (> 50,000) under varying conditions. This data enables partially reverse engineering the architectures of AWS, Azure, and Google.

4.1 Overview

AWS. A function executes in a dedicated function instance. Our measurements suggest different versions of a function will be treated as distinct and executed in different function instances (we discuss outliers in §5.5). The procfs file system exposes global statistics of the underlying VM host, not just a function instance, and contains useful information for profiling runtime,

[5] AWS: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html; Google: https://developers.google.com/time/; Azure does not offer instructions so we use the default NTP servers at http://www.pool.ntp.org/en/use.html
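The InstanceID mechanism described in §3 can be sketched in a few lines. This is a reconstruction of the described logic, not the authors' released code; the path is a hypothetical writable scratch location standing in for the instance's non-persistent local disk.

```python
import os
import random
import string

# Assumed location of the instance's non-persistent scratch disk.
INSTANCE_ID_PATH = "/tmp/InstanceID"

def get_instance_id(path: str = INSTANCE_ID_PATH) -> str:
    """Return this instance's ID, creating it on the first (cold) run.

    Because the local disk lives and dies with the function instance, the
    file is absent exactly when the instance is fresh, so its presence
    distinguishes warmstarts from coldstarts on the same instance.
    """
    if not os.path.exists(path):
        new_id = "".join(
            random.choices(string.ascii_letters + string.digits, k=16)
        )
        with open(path, "w") as f:
            f.write(new_id)
    with open(path) as f:
        return f.read()
```

Two invocations that return the same ID were handled by the same (reused) function instance, which is how the study attributes invocations to instances.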



(1) Set up N distinct functions f1, ..., fN that run the following task upon receiving a RUN message: record /proc/diskstats, write 20 K - 30 K times to a file (1 byte each time), and record /proc/diskstats again.
(2) Invoke each function once without a RUN message to launch N function instances.
(3) Assuming the instances of f1, ..., fk (k instances) share the same instance root ID, invoke f1, ..., fk once each with the RUN message and examine the I/O statistics of each function instance.

Figure 3: I/O-based coresidency test in AWS.

identifying host VMs, and more. From procfs, we found host VMs mostly have 2 vCPUs and 3.75 GB physical RAM (same as EC2 c4.large instances).

Azure. Azure Functions uses Function Apps to organize functions. A function app, corresponding to one function instance, is a container that contains the execution environments for individual functions [5]. The environment variables in the function instance contain some global information about the host VM. The environment variables collected suggest the host VMs can have 1, 2 or 4 vCPUs. One can create multiple functions in a function app and run them concurrently. In our experiments, we assume that a function app has only one function.

Google. Google isolates and filters information that can be accessed from procfs. The files under procfs only report usage statistics of the current function instance. Also, many system files and syscalls are obscured or disabled, so we cannot get much information about the runtime. The /proc/meminfo and /proc/cpuinfo files suggest a function instance has 2 GB RAM and 8 vCPUs, which we suspect is the configuration of the VMs.

4.2 VM identification

Associating function instances with VMs enables us to perform richer analysis. The heuristic for identifying VMs in AWS Lambda proposed by Lloyd et al., though theoretically possible, has never been evaluated experimentally [33]. Therefore, we looked for a more robust method.

AWS. The /proc/self/cgroup file has a special entry that we call the instance root ID. It starts with "sandbox-root-" followed by a 6-byte random string. We found it can be used to reliably identify a host VM. Using the I/O-based coresidency tests (shown in Figure 3), we confirmed that instances sharing the same instance root ID are on the same VM, as the difference in the total bytes written between two consecutive invocations, for fi and fi+1 respectively, is almost the same as the number of bytes written by fi. Moreover, we can get the same kernel uptime (or memory usage statistics) from the instances when reading /proc/uptime (/proc/meminfo) at the same time.

We call the IP obtained via querying IP address lookup tools from an instance the VM public IP, and the IP obtained from running the uname command the VM private IP. Function instances that share the same instance root ID have the same VM public IP and VM private IP.

Azure. The WEBSITE_INSTANCE_ID environment variable serves as the VM identifier, according to official documents [6]. We refer to it as the Azure VM ID. We used Flush-Reload via shared DLLs to verify coresidency of instances sharing the same Azure VM ID [43]. The results suggest the Azure VM ID is a robust VM identifier.

Google. We could not find any information enabling us to identify a host. Using I/O-based coresidency did not work as procfs contains no global usage statistics. We tried to use performance as a covert channel (e.g., performing patterned I/O operations in one function instance and detecting the pattern from I/O throughput variation in another) but found this is not reliable, as performance varied greatly (see §6.2).

4.3 Tenant isolation

Prior studies showed that co-located VMs in AWS EC2 allow attacks [36, 38, 41]. With the knowledge of the instance-VM relationship, we examined how well tenants' primary resources — function instances — are isolated. We assume that one tenant corresponds to one user account, and only consider VM-level coresidency.

AWS. The functions created by the same tenant will share the same set of VMs, regardless of their configurations and code. The detailed instance placement algorithm will be discussed in §5.1. AWS assigns different VMs to each tenant, since we have never seen function instances from different tenants in the same VM. We conducted a cross-tenant coresidency test to confirm this assumption. The basic principle is similar to Figure 3: in each round, we create a new function under each of two accounts at the same time, write a random number of bytes in one function, and check the disk usage statistics in the other function. We ran this test for 1 week, but found no VM-coresidency of cross-tenant function instances.

Azure. Azure Functions is a part of the Azure App Service, in which all tenants share the same set of VMs according to Azure [2]. Hence, tenants in Azure Functions should also share VM resources. A simple test confirmed this assumption: we invoked 500 functions in each of two accounts and found that 30% of function instances were coresident with a function instance from the other account, executing in a total of 120 VMs. Note that as of May 2018, different tenants no longer share the same VMs in Azure. See §5.1 for more details.



4.4 Heterogeneous infrastructure

We found the VMs in all the considered services had a variety of configurations. The variety, likely resulting from infrastructure upgrades, can cause inconsistent function performance. To estimate the fraction of different types of VM in a given service, we examined the configurations of the host VMs of 50,000 unique function instances in each service.

In AWS, we checked the model name and the processor numbers in /proc/cpuinfo, and the MemTotal in /proc/meminfo, and found five types of VMs: two E5-2666 vCPUs (2.90 GHz), two E5-2680 vCPUs (2.80 GHz), two E5-2676 vCPUs (2.40 GHz), two E5-2686 vCPUs (2.30 GHz), and one E5-2676 vCPU. These types account for 59.3%, 37.5%, 3.1%, 0.09% and 0.01% of 20,447 distinct VMs.

Azure shows a greater diversity of VM configurations. The instances in Azure report various vCPU counts: of 4,104 unique VMs, 54.1% use 1 vCPU, 24.6% use 2 vCPUs, and 21.3% use 4 vCPUs. For a given vCPU count, there are three CPU models: two Intel and one AMD. Thus, nine (at least) different types of VMs are being used in Azure. Performance may vary substantially based on what kind of host (more specifically, the number of vCPUs) runs the function. See §6 for more details.

In Google, the model name is always "unknown", but there are 4 unique model versions (79, 85, 63, 45), corresponding to 47.1%, 44.7%, 4.2%, and 4.0% of selected function instances.

4.5 Discussion

Being able to identify VMs in AWS is essential for our measurements. It helps to reduce noise in experiments and get more accurate results. For the sake of comparison, we evaluated the heuristic designed by Lloyd et al. [33]. The heuristic assumes that different VMs have distinct boot times, which can be obtained from /proc/stat, and groups function instances based on the boot time. We sent 10 - 50 concurrent requests at a time to 1536 MB functions for 100 rounds, used our methodology (instance root ID + IP) to label the VMs, and compared against the heuristic. The heuristic identified 940 VMs as 600 VMs, so 340 (36%) VMs were incorrectly labeled. We therefore conclude this heuristic is not reliable.

None of these serverless providers completely hides runtime information from tenants. More knowledge of the instance runtime and the backend infrastructure could make finding vulnerabilities in function instances easier for an adversary. In prior studies, procfs has been used as a side channel [9, 21, 46]. In the serverless setting, one actually can use it to monitor the activity of coresident instances; while seemingly harmless, a dedicated adversary might use it as a stepping stone to more sophisticated attacks. Overall, access to runtime information, unless necessary, should be restricted for security purposes. Additionally, providers should expose such information in an auditable way, i.e., via API calls, so they are able to detect and block suspicious behaviors.

5 Resource Scheduling

We examine how instances and VMs are scheduled in the three serverless platforms in terms of instance coldstart latency, lifetime, scalability, and more.

5.1 Scalability and instance placement

Elastic, automatic scaling in response to changes in demand is a main advertised benefit of the serverless model. We measure how well the platforms scale up. We created 40 measurement functions of the same memory size, f1, f2, ..., f40, and invoked each fi with 5i concurrent requests. We paused for 10 seconds between batches of invocations to cope with rate limits in the platforms. All measurement functions simply sleep for 15 seconds and then return. For each configuration we performed 50 rounds of measurements.

AWS. AWS is the best among the three services with regard to supporting concurrent execution. In our measurements, N concurrent invocations always produced N concurrently running function instances. AWS could easily scale up to 200 (the maximum measured concurrency level) fresh function instances. We observed that 3,328 MB was the maximum aggregate memory that can be allocated across all function instances on any VM in AWS Lambda. AWS Lambda appears to treat instance placement as a bin-packing problem, and tries to place a new function instance on an existing active VM to maximize the VM memory utilization rate, i.e., the sum of instance memory sizes divided by 3,328. We invoked a single function with sets of concurrent requests, increasing from 5 to 200 with a step of 5, and recorded the total number of VMs being used after each number of requests. A few examples are shown in Figure 4.

[Figure 4: The total number of VMs being used after sending a given number of concurrent requests in AWS, for 128 MB, 256 MB, and 512 MB functions.]
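The bin-packing behavior described for AWS implies a simple lower bound on the number of VMs needed: with at most 3,328 MB of aggregate instance memory per VM, N instances of a given memory size need at least ceil(N / floor(3328 / mem)) VMs. A minimal sketch of that "expected" VM count, against which the measured counts in Figure 4 can be compared:

```python
import math

# Maximum aggregate instance memory per Lambda VM, as measured above.
VM_MEMORY_MB = 3328

def expected_vm_count(n_instances: int, memory_mb: int) -> int:
    """Minimum VMs needed if placement packs instances as tightly
    as the 3,328 MB per-VM aggregate memory cap allows."""
    per_vm = VM_MEMORY_MB // memory_mb  # instances that fit on one VM
    return math.ceil(n_instances / per_vm)
```

For example, 200 concurrent 128 MB instances fit 26 to a VM, so perfect packing needs 8 VMs; the paper reports that more than 89% of observed VMs actually reached 100% memory utilization.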



#vCPU Total 1 2 3 4 >4 the instances of a target victim. In each round of this
1 61.3 16.6 24.6 13.7 4.9 1.5
2 19.5 7.3 7.1 3.3 1.4 0.4
test, we launched either 5 or 100 function instances
4 19.2 7.6 6.2 3.9 1.3 0.2 from one account (the victim) and 500 simultaneous
All 100 31.5 37.9 20.9 7.6 2.1 function instances from another account (the attacker).
On average, 0.12% (3.82%) of the 500 attacker instances
Table 5: The average (over 10 runs) probabilities (as per-
were coresident with the 5 (100) victim instances in each
centages) of getting N-way single-account coresidency
round (10 rounds in total). So, it was possible to achieve
(for N ∈ {1, 2, 3, 4, } and N > 4, when launching 1,000
cross-tenant coresidency even for a few targets. In the
function instances in Azure. Here N = 1 indicates no
test with 100 victim instances, we were able to obtain
coresidency among the functions.
up to 5 attacker instances on the same VM. Security
implications will be discussed in §5.6.
The number of active VMs are close to the “expected” We repeated the coresidency tests in May 2018 but
number if AWS maximizes VM memory utilization. could not find any cross-tenant coresident instances,
Quantitatively speaking, more than 89% of VMs we got even in the test in which we tired 500 victim instances.
in the test achieved 100% memory utilization. Sending Therefore, we believe that Azure has fixed the cross-
concurrent requests to different functions resulted in tenant coresidency issue.
the same pattern, indicating placement is agnostic to
Google. Google failed to provide our desired scalability,
function code.
even though Google claims HTTP-triggered functions
In a further test we sent 10 sets of random numbers
will scale to the desired invocation rate quickly [13].
of concurrent requests to randomly-chosen functions of
In general, only about half of the expected number of
varied memory sizes over 50 runs. AWS’s placement still
instances, even for a low concurrency level (e.g., 10),
worked efficiently: the average VM memory utilization
could be launched at the same time, while the remainder
rate across VMs in the same run ranged from 84.6% to
of the requests were queued.
100%, with a median of 96.2%.
Azure. Azure documentation states that it will automatically scale up to at most 200 instances for a single Nodejs-based function, and that at most one new function instance can be launched every 10 seconds [7]. However, in our tests of Nodejs-based functions, we saw at most 10 function instances running concurrently for a single function, no matter how we changed the interval between invocations. All the requests were handled by a small set of function instances. None of the concurrently running instances were on the same VM. So, it appears that Azure does not try to co-locate function instances of the same function on the same VMs.

We conducted a single-account coresidency test to examine how function instances are placed on VMs with different numbers of vCPUs. We invoked 100 different functions from one account at a time until we had 1,000 concurrent, distinct function instances running. We then checked for co-residency, and repeated the entire test 10 times.

We observed at most 8 instances on a single 1/2/4-vCPU VM. Co-resident instances tend to be on 1-vCPU VMs (presumably because there are more 1-vCPU VMs for Azure Functions). We show the breakdown of co-residency results in Table 5. In general, co-residency is undesirable for users wanting many function instances, as contention between instances on low-end VMs will exacerbate performance issues.

We further conducted a cross-account coresidency test in a more realistic scenario where an attacker wants to place her function instances on the same VM with

5.2 Coldstart and VM provisioning

We use coldstart to refer to the process of launching a new function instance. For the platform, a coldstart may involve launching a new container, setting up the runtime environment, and deploying a function, which will take more time to handle a request than reusing an existing function instance (warmstart). Thus, coldstarts can significantly affect application responsiveness and, in turn, user experience.

For each platform, we created 1,000 distinct functions of the same memory and language and sequentially invoked each of them twice to collect its coldstart and warmstart latency. We use the difference between invocation send time (recorded by the vantage point) and function execution start time (recorded by the function) as an estimate of its coldstart/warmstart latency. As baselines, the median warmstart latencies in AWS, Google, and Azure were about 25, 79, and 320 ms (respectively) across all invocations.

AWS. We examine two types of coldstart events: a function instance is launched (1) on a new VM that we have never seen before, or (2) on an existing VM. Intuitively, case (1) should have significantly longer coldstart latency than (2) because case (1) may involve starting a new VM. However, we found case (1) was only slightly longer than (2) in general: the median coldstart latency in case (1) was only 39 ms longer than in (2) (across all settings). Plus, the smallest VM kernel uptime (from /proc/uptime) we found was 132 seconds, indicating that the VM had been launched before the invocation.
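The coldstart/warmstart estimate described above can be sketched as follows. This is our own minimal re-implementation of the described method, not the authors' code; the handler and field names are ours:

```python
import time

def handler(event, context=None):
    # Function-side probe: record the execution start time, to be
    # returned to the vantage point that issued the invocation.
    return {"exec_start_ms": time.time() * 1000.0}

def estimated_latency_ms(send_ms, response):
    # Latency estimate = (execution start time, recorded by the
    # function) - (invocation send time, recorded by the vantage
    # point). Clock skew between the two hosts adds noise, which is
    # why the paper reports medians over many invocations.
    return response["exec_start_ms"] - send_ms
```

The first invocation of a freshly created function gives a coldstart sample; an immediate second invocation of the same function gives the warmstart baseline.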

138 2018 USENIX Annual Technical Conference USENIX Association


Provider-Memory   Median    Min      Max       STD
AWS-128           265.21    189.87   7048.42   354.43
AWS-1536          250.07    187.97   5368.31   273.63
Google-128        493.04    268.5    2803.8    345.8
Google-2048       110.77    52.66    1407.76   124.3
Azure             3640.02   431.58   45772.06  5110.12

Table 7: Coldstart latencies (in ms) in AWS, Google, and Azure using Nodejs 6.* based functions for comparison.

[Figure 6: bar chart of coldstart latency (ms) per language-memory combination (python2.7, python3.6, nodejs4.3, nodejs6.10, java8 at various memory sizes); legend: Existing VM vs. New VM.]

Figure 6: Median coldstart latency with min-max error bars (across 1,000 rounds) under different combinations of function languages and memory sizes in AWS. Y-axis is truncated at 1,000 ms.

[Figure 8: three panels (AWS, Google, Azure); x-axes: hours 1-168 (Mon-Sun); y-axes: coldstart latency (ms), with a different range per panel.]

Figure 8: Coldstart latency (in ms) over 168 hours. All the measurements were started right after midnight on a Sunday. Each data point is the median of all coldstart latencies collected in a given hour. For clarity, the y-axes use different ranges for each service.

So, AWS has a pool of ready VMs. The extra delays in case (1) are more likely introduced by scheduling (e.g., selecting a VM) rather than by launching a VM.

Our results are consistent with prior observations: function memory and language affect coldstart latency [10], as shown in Figure 6. Python 2.7 achieves the lowest median coldstart latencies (167-171 ms), while Java functions have significantly higher latencies than other languages (824-974 ms). Coldstart latency generally decreases as function memory increases. One possible explanation is that AWS allocates CPU power proportionally to the memory size; with more CPU power, environment setup becomes faster (see §6.1).

A number of function instances may be launched on the same VM concurrently, due to AWS's instance placement strategy. In this case, the coldstart latency increases as more instances are launched simultaneously. For example, launching 20 function instances of a Python 2.7-based function with 128 MB memory on a given VM took 1,321 ms on average, which is about 7 times slower than launching 1 function instance on the same VM (186 ms).

Azure and Google. The median coldstart latency in Google ranged from 110 ms to 493 ms (see Table 7). Google also allocates CPU proportionally to memory, but in Google memory size has a greater impact on coldstart latency than in AWS. It took much longer to launch a function instance in Azure, though their instances are always assigned 1.5 GB memory. The median coldstart latency was 3,640 ms in Azure. Anecdotes online [3] suggest that the long latency is caused by design and engineering issues in the platform that Azure is both aware of and working to improve.

Latency variance. We collected the coldstart latencies of 128 MB, Python 2.7 (AWS) or Nodejs 6.* (Google and Azure) based functions every 10 seconds for over 168 hours (7 days), and calculated the median of the coldstart latencies collected in a given hour. The changes of coldstart latency are shown in Figure 8. The coldstart latencies in AWS were relatively stable, as were those in Google (except for a few spikes). Azure had the highest variation over time, ranging from about 1.5 seconds up to 16 seconds.

We repeated our coldstart measurements in May 2018. We did not find significant changes in coldstart latency in AWS. But the coldstart latencies became 4x slower on average in Google, probably due to its infrastructure update in February 2018 [15], and 15x better in Azure. This result demonstrates the importance of developing a measurement platform for serverless systems (similar to [39] for IaaS) to do continuous measurements for better performance characterization.

5.3 Instance lifetime

A serverless provider may terminate a function instance even if it is still in active use. We define the longest time a function instance stays active as its instance lifetime. Tenants prefer long lifetimes because their applications will be able to maintain in-memory state (e.g., database connections) longer and suffer less from coldstarts.

To estimate instance lifetime, we set up functions of different memory sizes and languages, and invoked



[Figure 9: three CDF panels (AWS, Google, Azure); y-axes: fraction; x-axes: lifetime in minutes (AWS: 0-500, Google: 0-1000) and hours (Azure: 0-140); legends: 128 MB and 1536 MB (AWS) or 2048 MB (Google) at 1 req/5s and 1 req/60s; Azure: 1 req/5s and 1 req/60s.]

Figure 9: The CDFs of instance lifetime in AWS, Google, and Azure under different memory sizes and request frequencies.

them at different frequencies (one request per 5/30/60 seconds). The lifetime of a function instance is the difference between the first time and the last time we saw the instance. We ran the experiment for 7 days (AWS and Google) or longer (Azure) so that we could collect at least 50 lifetimes under a given setting.

In general, Azure function instances have significantly longer lifetimes than AWS and Google, as shown in Figure 9. In AWS, the median instance lifetime across all settings was 6.2 hours, with the maximum being 8.3 hours. The host VMs in AWS usually live longer: the longest observed VM kernel uptime was 9.2 hours. When request frequency increases, instance lifetime tends to become shorter. Other factors have little effect on lifetime except in Google, where instances of larger memory tend to have longer lifetimes. For example, when being invoked every five seconds, the lifetimes were 3-31 minutes and 19-580 minutes for 90% of the instances of 128 MB and 2,048 MB memory in Google, respectively. So, for functions with small memory under a heavy workload, Google seems to launch new instances aggressively rather than reusing existing instances. This can increase the performance penalty from coldstarts.

5.4 Idle instance recycling

To efficiently use resources, serverless providers shut down idle instances to recycle allocated resources (see, e.g., [32]). We define the longest time an instance can stay idle before getting shut down as the instance maximum idle time. There is a trade-off between long and short idle times, as maintaining more idle instances is a waste of VM memory resources, while fewer ready-to-serve instances cause more coldstarts.

We performed a binary search on the minimum delay t_idle between two invocations of the function that resulted in distinct function instances. We created a function, invoked it twice with some delay between 1 and 120 minutes, and determined whether the two requests used the same function instance. We repeated until we identified t_idle. We confirmed t_idle (to minute granularity) by repeating the measurement 100 times for delays close to t_idle.

AWS. An instance could usually stay inactive for at most 27 minutes. In fact, in 80% of the rounds instances were shut down after 26 minutes. When their host VM is "idle", i.e., there are no active instances on that VM, idle function instances will be recycled the following way. Assume that the function instances of N functions f1, ..., fN are coresident on a VM, and k_fi instances are from fi. For a given function fi, AWS will shut down ⌊k_fi/2⌋ of the idle instances of fi every 300 (more or less) seconds until there are two or three instances left, and eventually shut down the remaining instances after 27 minutes (we have tested with k_fi = 5, 10, 15, 20). AWS performs these operations on f1, ..., fN on a given VM independently, and also on individual VMs independently. Function memory or language does not affect maximum idle time.

If there are active instances on the VM, instances can stay inactive for a longer time. We kept one instance active on a given VM by sending a request every 10 seconds and found: (1) AWS still adopted the same strategy to recycle the idle instances of the same function, but (2) somehow idle time was reset for other coresident instances. We observed some idle instances could stay idle in such cases for 1-3 hours.

Azure and Google. In Azure, we could not find a consistent maximum instance idle time. We repeated the experiment several times on different days and found maximum idle times of 22, 40, and more than 120 minutes. In Google, the idle time of instances could be more than 120 minutes. After 120 minutes, instances remained active in 18% of our experiments.

5.5 Inconsistent function usage

Tenants expect that requests following a function update should be handled by the new function code, especially if the update is security-critical. However, we found in AWS there was a small chance that requests could be handled by an old version of the function. We call such cases inconsistent function usage. In the experiment, we sent k = 1 or k = 50 concurrent requests to a function, and did this again without delay after updating one of the following aspects of the function: IAM role, memory size, environment variables, or function code. For a given setting, we performed these operations for 100 rounds. When k = 1, 1%-4% of the tests used an inconsistent function. When there were more associated instances before the update (k = 50), 80% of our



rounds had at least one inconsistent function. Looking across all tests from all rounds, we found that 3.8% of instances ran an inconsistent function. Examining the cases, we found two situations: (1) AWS launched new instances of the outdated function (2% of all the cases), and (2) AWS reused existing instances of the outdated function. Inconsistent instances never handle more than one request before terminating (note that max execution time is 300 s in AWS), but still, a considerable fraction of requests may fail to get the desired results.

As we waited for a longer time after the function update to send requests, we found fewer inconsistent cases, and eventually zero cases with a 6-second waiting time. So, we suspect that the inconsistency issues are caused by race conditions in the instance scheduler. The results suggest that coordinating a function update among multiple function instances is challenging, as the scheduler cannot do an atomic update.

5.6 Discussion

We believe our results motivate further study on designing more efficient instance scheduling algorithms and robust schedulers to further improve VM resource utilization, i.e., to maximize VM memory usage, reduce scheduling latency, and promptly propagate function updates while guaranteeing consistency.

Loading modules or libraries could introduce high latency during coldstart [1, 3]. To reduce coldstart latency, providers might need to adopt more sophisticated library loading mechanisms, for example, using library caching to speed up this process, or resolving library dependences before deployment and only loading required libraries.

Cross-tenant VM sharing in Azure, plus the ability to run arbitrary binaries in the function instance, could make applications vulnerable to many kinds of side-channel attacks [16, 17, 20, 45]. We did not examine how well Azure can tackle the potential threats resulting from cross-tenant VM sharing, and leave the actual security vulnerability as an open question.

AWS's bin-packing placement may bring security issues to an application, depending on its design. When a multi-tenant application in Lambda uses IAM roles to isolate its tenants, function instances from different application tenants still share the same VMs. We found two real services that use this pattern: Backand [8] and Zapier [44]. Both allow their tenants to deploy functions in Lambda in some way. We successfully achieved cross-account function coresidency in Backand in just a few tries, while failing in Zapier due to its rate limits and large user base (1 M+). Nevertheless, we could still observe the changes of procfs caused by other Zapier tenants' applications, which may admit side-channels [9, 21, 46]. For these multi-tenant applications to isolate their tenants and achieve better security and privacy, AWS may need to provide a finer-grained VM isolation mechanism, i.e., allocating a set of VMs to each IAM role instead of to each account.

[Figure 10: two panels, (a) AWS and (b) Google; x-axes: function memory (MB); y-axes: fraction; panel (a) legend: CPU utilization, Mem*2/3328 reference line.]

Figure 10: The median instance CPU utilization rates with min-max error bars in AWS and Google as function memory increases, averaged across 1,000 instances for a given memory size.

[Figure 11: two panels, (a) CDF of CPU utilization rate and (b) CPU utilization rate vs. number of coresident instances; legend: 1 vCPU, 2 vCPUs, 4 vCPUs.]

Figure 11: (a) CDFs of CPU utilization rates of instances (1,000 for each type) and (b) the median CPU utilization rates across a given number of coresident instances (50 rounds) in Azure, with min-max error bars.

6 Performance Isolation

In this section, we investigate performance isolation. We mainly focus on AWS and Azure, where our ability to achieve coresidency allows more refined measurements. We also present some basic performance statistics for instances in Google that surface seeming contention with other tenants.

6.1 CPU utilization

To measure CPU utilization, our measurement function continuously records timestamps using time.time() (Python) or Date.now() (Nodejs) for 1,000 ms. The metric instance CPU utilization rate is defined as the fraction of the 1,000 ms for which a timestamp was recorded.

AWS. According to AWS, a function instance's CPU power is proportional to its pre-configured memory [26]. However, AWS does not give details of how exactly CPU time is allocated to instances. We measured the CPU utilization rates on 1,000 distinct function instances and show the median rate for a given memory size in



[Figure 12 panels: (a) AWS: I/O, (b) AWS: Network, (c) Azure: I/O, (d) Azure: Network. Y-axes: aggregate throughput (MB/s or Mbps); x-axes: number of coresident instances; legends: 128 MB and 256 MB functions (AWS), 1/2/4-vCPU VMs (Azure).]

Figure 12: Aggregate I/O and network throughput across coresident instances as concurrency level increases. The
coresident instances perform the same task simultaneously. The values are the median values across 50 rounds.

Figure 10a. Instances with higher memory get more CPU cycles. The median instance CPU utilization rate increased from 7.7% to 92.3% as memory increased from 128 to 1,536 MB, and the corresponding standard deviations (SD) were 0.7% and 8.7%. When there is no contention from other coresident instances, the CPU utilization rate of an instance can vary significantly, resulting in inconsistent application performance. That said, an upper bound on CPU share is approximated by 2 * m/3328, where m is the memory size.

We further examine how CPU time is allocated among coresident instances. We let colevel be the number of coresident instances; a colevel of 1 indicates only a single instance on the VM. For memory size m, we selected a colevel in the range 2 to ⌊3328/m⌋. We then measured the CPU utilization rate in each of the coresident instances. Examining the results over 20 rounds of tests, we found that the currently running instances share CPU fairly, since they had nearly the same CPU utilization rate (SD <0.5%). With more coresident instances, each instance's CPU share becomes slightly less than, but still close to, 2 * m/3328 (SD <2.5% in any setting).

The above results indicate that AWS tries to allocate a fixed amount of CPU cycles to an instance based only on function memory.

Azure and Google. Google adopts the same mechanism as AWS to allocate CPU cycles based on function memory [13]. In Google, the median instance CPU utilization rates ranged from 11.1% to 100% as function memory increased. For a given memory size, the standard deviations of the rates across different instances are very low (Figure 10b), ranging from 0.62% to 2.30%.

Azure has a relatively high variance in the CPU utilization rates (14.1%-90%), while the median was 66.9% and the SD was 16%. This is true even though the instances are allocated the same amount of memory. The breakdown by vCPU number shows that the instances on 4-vCPU VMs tend to gain higher CPU shares, ranging from 47% to 90% (Figure 11a). The distributions of utilization rates of instances on 1-vCPU VMs and 2-vCPU VMs are in fact similar; however, when colevel increased, the CPU utilization of instances on 1-vCPU VMs dropped more dramatically, as shown in Figure 11b.

6.2 I/O and network

To measure I/O throughput, our measurement functions in AWS and Google used the dd command to write 512 KB of data to the local disk 1,000 times (with the fdatasync and dsync flags to ensure the data is written to disk). In Azure, we performed the same operations using a Python script (which used os.fsync to ensure data is written to disk). For network throughput measurement, the function used iperf 3.13 with default configurations to run the throughput test for 10 seconds against different same-region iperf servers, so that iperf server-side bandwidth was not a bottleneck. The iperf servers used the same types of VMs as the vantage points.

AWS. Figure 12 shows aggregate I/O and network throughput across a given number of coresident instances, averaged across 50 rounds. All the coresident instances performed the same measurement concurrently. Though the aggregate I/O and network throughput remains relatively stable, each instance gets a smaller share of the I/O and network resources as colevel increases. When colevel increased from 1 to 20, the average I/O throughput per 128 MB instance dropped by 4x, from 13.1 MB/s to 2.9 MB/s, and network throughput by 19x, from 538.6 Mbps to 28.7 Mbps. Coresident instances get a smaller share of the network with more contention.

We calculate the Coefficient of Variation (CV), which is defined as the SD divided by the mean, for each colevel. A higher CV suggests the performance of instances differs more. For 128 MB instances, the CV of network throughput ranged from 9% to 83% across all colevels, suggesting significant performance variability due to contention with coresident instances. In contrast, the I/O performance was similar between instances (CV of 1% to 6% across all colevels). However, the I/O performance is affected by function memory (CPU) for small memory sizes (≤ 512 MB), and therefore the I/O throughput of an instance could degrade more when competing with instances of higher memory.
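The timestamp-recording probe from §6.1 and the observed AWS share bound can be sketched as follows. This is our own minimal re-implementation, not the authors' code (the in-platform Nodejs probes used Date.now() instead):

```python
import time

def cpu_utilization_rate(window_ms=1000):
    # Spin for window_ms, noting each distinct millisecond in which we
    # managed to record a timestamp. The fraction of milliseconds seen
    # approximates the CPU share the instance actually received: a
    # throttled instance misses ticks while it is descheduled.
    seen = set()
    start = time.time()
    while True:
        ms = int((time.time() - start) * 1000)
        if ms >= window_ms:
            break
        seen.add(ms)
    return len(seen) / window_ms

def aws_cpu_share_bound(memory_mb):
    # Observed upper bound on an AWS instance's CPU share: 2 * m / 3328.
    # E.g., 128 MB -> ~7.7% and 1536 MB -> ~92.3%, matching the median
    # utilization rates reported in the text.
    return 2 * memory_mb / 3328
```

On an unthrottled host the probe returns close to 1.0; inside a memory-limited instance it approaches the 2 * m / 3328 bound.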

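A rough sketch of the §6.2 probes: the fsync-based write loop (the Python variant described for Azure; AWS and Google used dd) and the CV metric. The file path and block count here are illustrative, not the paper's parameters:

```python
import os
import statistics
import time

def io_throughput_mb_per_s(path="io_probe.bin", block_kb=512, blocks=20):
    # Write fixed-size blocks, fsync'ing after each so the data actually
    # reaches disk, then report throughput in MB/s.
    buf = b"\0" * (block_kb * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(buf)
            f.flush()
            os.fsync(f.fileno())
    elapsed = time.time() - start
    os.remove(path)
    return (block_kb * blocks / 1024) / elapsed

def coefficient_of_variation(samples):
    # CV = standard deviation / mean; a higher CV means coresident
    # instances' throughput diverges more under contention.
    return statistics.pstdev(samples) / statistics.mean(samples)
```

Running the write loop simultaneously in coresident instances and computing the CV of the per-instance results reproduces the kind of contention comparison reported above.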


Azure. In Azure, the I/O and network throughput of an instance also drops as colevel increases, and fluctuates due to contention from other coresident instances. Even more interestingly, resource allocation is differentiated based on what type of VM a function instance happens to be scheduled on. As shown in Figure 12, the 4-vCPU VMs could get 1.5x higher I/O and 2x higher network throughput than the other types of VMs. The 2-vCPU VMs have higher I/O throughput than 1-vCPU VMs, but similar network throughput.

Google. In Google, both the measured I/O and network throughput increase as function memory increases: the median I/O throughput ranged from 1.3 MB/s to 9.5 MB/s, and the median network throughput ranged from 24.5 Mbps to 172 Mbps. The network throughput measured from different instances with the same memory size can vary substantially. For instance, the network throughput measured in the 2,048 MB function instances fluctuated between 0.2 Mbps and 321.4 Mbps. We found two cases: (1) all instances' throughputs fluctuated during a given period of time, irrespective of memory sizes, or (2) a single instance temporarily suffered from degraded throughput. Case (1) may be due to changes in network conditions, while case (2) leads us to suspect that GCF tenants actually share hosts and suffer from resource contention.

6.3 Discussion

AWS and Azure fail to provide proper performance isolation between coresident instances, and so contention can cause considerable performance degradation. In AWS, the fact that they bin-pack function instances from the same account onto VMs means that scaling up a function places the same function on the same VM, resulting in resource contention and prolonged execution time (not to mention a longer coldstart latency). Azure has similar issues, with the additional issue that contention within VMs arises between accounts. The latter also opens up the possibility for cross-tenant degradation-of-service attacks.

We leave developing new, efficient isolation mechanisms that take into account the special characteristics of serverless computing (e.g., frequent instance creation, short-lived instances, and small memory-footprint functions) as future work.

7 Resource accounting

In the course of our study, we found several resource accounting issues that can be abused by tenants.

Background processes. We found that in Google one could execute an external script in the background that continued to run even after the function invocation concluded. The script we ran posted a 10 M file every 10 seconds to a server under our control, and the longest time it stayed alive was 21 hours. We could not find any logs of the network activity performed by the background process and were not charged for its resource consumption.6,7 In contrast, one could run such a background script in Azure, but Azure logged all the activity. Our observations suggest that: (1) in Azure and Google the function instance execution context will not be frozen after an invocation, as opposed to AWS; and (2) Google does resource accounting via monitoring the Node.js process rather than the entire function instance.

One can exploit the billing issue in Google to run sophisticated tasks at negligible cost. For a function instance with 2 GB memory and 2.4 GHz CPU, one only needs to pay for a few invocations ($0.0000029/100 ms, with 2 M free calls) to get the same computing resources as using a g1-small instance ($0.0257/hour) on Google Cloud Platform.

CPU accounting. In Google, we found there was an 80% chance that a just-launched function instance (of any memory size other than 2,048 MB) could temporarily gain more CPU time than expected. Measuring the CPU utilization rates and the completion times of a CPU-intensive task, we confirmed that the instances that one expects to have 8%-58% of the CPU time (see §6) had near 100% of the CPU time, the same as that given to 2,048 MB instances. The instance can retain the CPU resources until the next invocation. Note that if one wants to conduct performance measurements in Google, this issue could introduce a lot of noise (we appropriately controlled for it in previously reported experiments).

8 Conclusion

In this paper, we provided insights into the architectures, resource utilization, and performance isolation efficiency of three modern serverless computing platforms. We discovered a number of issues, arising from either specific design decisions or engineering, with regard to security, performance, and resource accounting in the platforms. Our results surface opportunities for research on improving resource utilization and isolation in future serverless platform designs.

Acknowledgements

The authors thank engineers from Microsoft, Amazon, and Google for their feedback and helpful discussions. This work was supported in part by NSF grants 1558500, 1330308, and 1718084.

6 Google has a free tier of service, but even after that is used up the background process consumption went unbilled.
7 We have reported this issue to Google and Google has been working on fixing it as of May 2018.



References

[1] SOCK: Rapid task provisioning with serverless-optimized containers. In 2018 USENIX Annual Technical Conference (USENIX ATC 18) (Boston, MA, 2018), USENIX Association.

[2] Azure app service, virtual machines, service fabric, and cloud services comparison. https://docs.microsoft.com/en-us/azure/app-service/choose-web-site-cloud-service-vm, 2017.

[3] Cold start taking a long time in consumption mode for C# Azure Function. https://github.com/Azure/azure-functions-host/issues/838, 2017.

[4] Consumption plan scaling issues. https://github.com/Azure/azure-webjobs-sdk-script/issues/1206, 2017.

[5] Create your first function in the Azure portal. https://docs.microsoft.com/en-us/azure/azure-functions/functions-create-first-azure-function, 2017.

[6] Azure runtime environment. https://github.com/projectkudu/kudu/wiki/Azure-runtime-environment, 2017.

[7] Azure Functions scale and hosting. https://docs.microsoft.com/en-us/azure/azure-functions/functions-scale, 2017.

[8] Backand. https://www.backand.com/, 2018.

[9] Chen, Q. A., Qian, Z., and Mao, Z. M. Peeking into your app without actually seeing it: UI state inference and novel Android attacks. In USENIX Security Symposium (2014), pp. 1037-1052.

[10] How does language, memory and package size affect cold starts of AWS Lambda? https://read.acloud.guru/does-coding-language-memory-or-package-size-affect-cold-starts-of-aws-lambda-a15e26d12c76, 2017.

[11] Understanding AWS Lambda performance. https://blog.newrelic.com/2017/01/11/aws-lambda-cold-start-optimization/, 2017.

[12] Understanding AWS Lambda coldstarts. https://www.iopipe.com/2016/09/understanding-aws-lambda-coldstarts/, 2016.

[13] Google Cloud Functions quotas. https://cloud.google.com/functions/quotas, 2017.

[14] Glikson, A., Nastic, S., and Dustdar, S. Deviceless edge computing: extending serverless computing to the edge of the network. In Proceedings of the 10th ACM International Systems and Storage Conference (2017), ACM, p. 28.

[15] Google Cloud Functions release notes. https://cloud.google.com/functions/docs/release-notes, 2018.

[16] Gruss, D., Maurice, C., Wagner, K., and Mangard, S. Flush+Flush: a fast and stealthy cache attack. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (2016), Springer, pp. 279-299.

[17] Gruss, D., Spreitzer, R., and Mangard, S. Cache template attacks: Automating attacks on inclusive last-level caches. In USENIX Security Symposium (2015), pp. 897-912.

[18] Hendrickson, S., Sturdevant, S., Harter, T., Venkataramani, V., Arpaci-Dusseau, A. C., and Arpaci-Dusseau, R. H. Serverless computation with OpenLambda. In Proceedings of the 8th USENIX Conference on Hot Topics in Cloud Computing (2016), USENIX Association, pp. 33-39.

[19] How long does AWS Lambda keep your idle functions around before a cold start? https://read.acloud.guru/how-long-does-aws-lambda-keep-your-idle-functions-around-before-a-cold-start-bf715d3b810, 2017.

[20] Irazoqui, G., Eisenbarth, T., and Sunar, B. S$A: a shared cache attack that works across cores and defies VM sandboxing and its application to AES. In Security and Privacy (SP), 2015 IEEE Symposium on (2015), IEEE, pp. 591-604.

[21] Jana, S., and Shmatikov, V. Memento: Learning secrets from process footprints. In Security and Privacy (SP), 2012 IEEE Symposium on (2012), IEEE, pp. 143-157.

[22] Jonas, E., Pu, Q., Venkataraman, S., Stoica, I., and Recht, B. Occupy the cloud: distributed computing for the 99%. In Proceedings of the 2017 Symposium on Cloud Computing (2017), ACM, pp. 445-451.

[23] Krug, A. Hacking serverless runtimes: profiling Lambda, Azure, and more, 2017.

[24] Lambda CPU relative to which instance type? https://forums.aws.amazon.com/message.jspa?messageID=614558, 2014.

[25] AWS Lambda in production. https://blog.newrelic.com/2017/11/21/aws-lambda-state-of-serverless/, 2017.

[26] Configuring Lambda functions. https://docs.aws.amazon.com/lambda/latest/dg/resource-model.html, 2017.

[27] How does proportional CPU allocation work with AWS Lambda? https://engineering.opsgenie.com/how-does-proportional-cpu-allocation-work-with-aws-lambda-41cd44da3cac, 2018.

[28] The occasional chaos of AWS Lambda runtime performance. https://blog.symphonia.io/the-occasional-chaos-of-aws-lambda-runtime-performance-880773620a7e, 2017.

[29] My accidental 3-5x speed increase of AWS Lambda functions. https://serverless.zone/my-accidental-3-5x-speed-increase-of-aws-lambda-functions-6d95351197f3, 2017.

[30] Comparing AWS Lambda performance when using Node.js, Java, C# or Python. https://read.acloud.guru/comparing-aws-lambda-performance-when-using-node-js-java-c-or-python-281bef2c740f, 2017.

[31] AWS Lambda performance issues. https://stackoverflow.com/questions/43089879/aws-lambda-performance-issues, 2017.

[32] Understanding container reuse in AWS Lambda. https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/, 2014.

[33] Lloyd, W., Ramesh, S., Chinthalapati, S., Ly, L., and Pallickara, S. Serverless computing: An investigation of factors influencing microservice performance.

[34] McGrath, G., and Brenner, P. R. Serverless computing: Design, implementation, and performance. In Distributed Computing Systems Workshops (ICDCSW), 2017 IEEE 37th International Conference on (2017), IEEE, pp. 405-410.

[35] Peterson, E. Serverless security and things that go bump in the night. https://www.infoq.com/presentations/serverless-security, 2017.

[36] Ristenpart, T., Tromer, E., Shacham, H., and Savage, S. Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds. In Proceedings of the 16th ACM Conference on Computer and Communications Security (2009), ACM, pp. 199-212.

144 2018 USENIX Annual Technical Conference USENIX Association


[37] Security and serverless. https://read.acloud.guru/security-and-serverless-ec52817385c4, 2017.
[38] Varadarajan, V., Zhang, Y., Ristenpart, T., and Swift, M. M. A placement vulnerability study in multi-tenant public clouds. In USENIX Security Symposium (2015), pp. 913–928.
[39] Wang, L., Nappa, A., Caballero, J., Ristenpart, T., and Akella, A. WhoWas: A platform for measuring web deployments on IaaS clouds. In Proceedings of the 2014 Conference on Internet Measurement Conference (2014), ACM, pp. 101–114.
[40] Willaert, F. AWS Lambda container lifetime and config refresh. https://www.linkedin.com/pulse/aws-lambda-container-lifetime-config-refresh-frederik-willaert, 2016.
[41] Xu, Z., Wang, H., and Wu, Z. A measurement study on co-residence threat inside the cloud. In USENIX Security Symposium (2015), pp. 929–944.
[42] Yan, M., Castro, P., Cheng, P., and Ishakian, V. Building a chatbot with serverless computing. In Proceedings of the 1st International Workshop on Mashups of Things and APIs (2016), ACM, p. 5.
[43] Yarom, Y., and Falkner, K. Flush+Reload: A high resolution, low noise, L3 cache side-channel attack. In USENIX Security Symposium (2014), pp. 719–732.
[44] Zapier. https://zapier.com/, 2018.
[45] Zhang, Y., Juels, A., Reiter, M. K., and Ristenpart, T. Cross-tenant side-channel attacks in PaaS clouds. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (2014), ACM, pp. 990–1003.
[46] Zhou, X., Demetriou, S., He, D., Naveed, M., Pan, X., Wang, X., Gunter, C. A., and Nahrstedt, K. Identity, location, disease and more: Inferring your secrets from Android public resources. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security (2013), ACM, pp. 1017–1028.
