AWS Serverless Compute
Serverless
Short notes
AWS Serverless Services
Application integration
Compute
Data storage
Number of Requests per month
Execution time (GB-seconds) = execution duration (seconds) x memory allocated (GB)
Free tier: 1M requests | 400k GB-seconds per month
Serverless Compute
AWS Lambda
The ability to run applications and services without thinking about servers or capacity provisioning.
Lambda limits
Total number of concurrent function executions per Account per region
Lambda function anatomy
• You can include up to five layers per function. You can use layers only with Lambda functions deployed as a .zip file archive. Lambda
functions packaged as container images do not support adding Lambda layers to the function configuration.
• For functions defined as a container image, package your preferred runtime and all code dependencies when you create the container image.
Creating and accessing Lambda layers
To create a layer, package your dependencies into a .zip file, similar to how you create a normal deployment package. Steps in
creating a Lambda layer are as follows,
1. Package your layer content: This means creating a .zip file archive.
2. Create the layer in Lambda: Publish the lambda layer version by specifying the zip file or S3 URL pointing to the zipped package.
3. Add the layer to the Lambda function(s): You can add up to five Lambda layers to your function configuration. When you add a
layer to a function, Lambda extracts the layer contents into the /opt directory in the function’s execution environment. This gives the
function access to the layer contents
• Creating a Lambda layer. Make sure that the layers you add to a function are compatible with
the runtime and instruction set architecture of the function.
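The layer workflow in steps 1-3 above can be scripted. A minimal sketch using boto3 (layer name, function name and file path are hypothetical):

import boto3

lam = boto3.client("lambda")

# 1. Package your layer content: build/layer.zip must contain the dependencies
#    in the folder structure the runtime expects (e.g. python/ for Python).
with open("build/layer.zip", "rb") as f:
    layer_zip = f.read()

# 2. Create the layer: publish a new layer version from the .zip archive.
layer = lam.publish_layer_version(
    LayerName="shared-deps",                      # hypothetical layer name
    Content={"ZipFile": layer_zip},
    CompatibleRuntimes=["python3.12"],
    CompatibleArchitectures=["x86_64"],
)

# 3. Add the layer to the function (up to five layer ARNs per function).
lam.update_function_configuration(
    FunctionName="order-service",                 # hypothetical function name
    Layers=[layer["LayerVersionArn"]],
)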
Allows developers to package and deploy container images as Lambda functions using familiar
container image build and deployment flows
• You add extensions to .zip archive functions using Lambda layers or include them in the image for functions
deployed as container images.
• Extensions can be used for use-cases such as
1. Capturing diagnostic information before, during, and after function invocation
2. Automatically instrumenting your code without needing code changes
3. Fetching configuration settings or secrets before the function invocation
4. Detecting and alerting on function activity through hardened security agents, which can run as separate processes from the function
INIT Phase:
• Lambda creates or unfreezes an execution environment with the configured resources, downloads the code for the function and all layers, initializes any
extensions, initializes the runtime, and then runs the function’s initialization code (the code outside the main handler).
• The Init phase happens either during the first invocation, or before function invocations if you have enabled provisioned concurrency.
• The Init phase is split into three sub-phases:
1. Extension Init - starts all extensions
2. Runtime Init - bootstraps the runtime
3. Function Init - runs the function's static code
• These sub-phases ensure that all extensions and the runtime complete their setup tasks before the function code runs.
INVOKE Phase:
• Lambda invokes the function handler. After the function runs to completion, Lambda prepares to handle another function
invocation.
SHUTDOWN Phase:
• If the Lambda function does not receive any invocations for a period of time, this phase initiates.
• Lambda shuts down the runtime, alerts the extensions to let them stop cleanly, and then removes the environment.
Lambda function lifecycle
Lambda Function Concurrency is the number of requests handled by the function at a given time. Each request is handled by a function instance.
In the example, the first execution environment becomes warm and available for another execution, so the function is warm-started upon receipt of Request #6.
Request #9 requires a cold start because an additional execution environment is needed to handle the concurrency.
Lambda Bursting
How fast Lambda functions can scale is determined by bursting & concurrency limits
While you are limited by the concurrency limit at all times, how fast Lambda can scale is determined by the burst size. The lower the
burst size, the more time it takes for Lambda to scale the number of execution environments up to the maximum concurrency
Lambda Concurrency vs. Throughput
Given steady-state traffic: Estimated Concurrency = Average Throughput x Average Function Execution Time. This product represents the demand on the function and is limited by the account's concurrency entitlement.
• An increase in execution time increases the concurrency required to handle the same volume of requests
• Increased latency from dependent services and bad coding practices result in increased execution time
• The higher the concurrency, the more execution contexts there are, resulting in even higher load on dependent services
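A quick worked example of the relationship above (the traffic figures are made up):

# Estimated concurrency = average throughput (requests/s) x average execution time (s)
avg_throughput_rps = 200          # hypothetical steady-state traffic
avg_execution_time_s = 0.5        # 500 ms average function duration

estimated_concurrency = avg_throughput_rps * avg_execution_time_s
print(estimated_concurrency)      # 100 concurrent executions

# If a slow dependency doubles execution time to 1 s, the same traffic
# now needs ~200 concurrent executions - twice the demand on the entitlement.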
Lambda Execution Context re-use
Once the Lambda function is bootstrapped, it establishes the execution context. A bootstrapped Lambda function may be invoked
multiple times, and anything saved in the execution context can be re-used across invocations.
• Initialize SDK clients and database connections outside of the function handler,
and cache static assets locally in the /tmp directory. Subsequent invocations
processed by the same instance of your function can reuse these resources. This
saves cost by reducing function run time.
• When a Lambda function is called for the first time, it executes a Cold Start
• Each cold start bootstraps the execution context - imports libraries and initializes global variables by executing any code
outside the handler function, including code that initializes connections
• Lambda has to cold start a function when:
• The function is invoked for the first time
• The function is updated
• The function is started or modified, or after about 15 minutes of inactivity (the lifetime of the micro-VM provisioned for Lambda)
• Lambda handler code must assume that it is stateless; however, it can check whether, for example, database connections are already
saved in the execution context before creating one, to get the benefit of execution context re-use and reduce execution time
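A minimal sketch of the re-use pattern described above (table name and file path are hypothetical):

import os
import boto3

# Initialized once per execution environment (cold start), then re-used
# by every invocation that lands on this warm environment.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ.get("TABLE_NAME", "orders"))  # hypothetical table

CACHE_FILE = "/tmp/reference-data.json"   # /tmp survives across warm invocations


def handler(event, context):
    # Check the execution context before doing expensive work again.
    if not os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, "w") as f:
            f.write('{"loaded": true}')   # e.g. download reference data once

    table.put_item(Item={"pk": event["id"], "payload": str(event)})
    return {"status": "ok"}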
Lambda Provisioned concurrency
Use for Interactive and latency sensitive Lambda workloads
Lambda Provisioned concurrency initializes a requested number of execution environments so that they are prepared to
respond immediately to your function's invocations. A chargeable feature that is typically configured for interactive and
latency sensitive Lambda functions
• Rather than setting a fixed value for provisioned concurrency and paying for it whether it is used or not, you can register the Lambda
function (alias/version) as a scalable target and apply a target tracking scaling policy on provisioned concurrency utilization, achieving
more cost-effective use of provisioned concurrency (as sketched below)
• Scaling type: Target tracking | Scaling metric: Provisioned Concurrency utilization | Target: 70%
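A sketch of registering provisioned concurrency as a scalable target with a 70% target tracking policy, assuming a function alias named "prod" (all names are hypothetical):

import boto3

aas = boto3.client("application-autoscaling")
resource_id = "function:order-service:prod"   # hypothetical function:alias

aas.register_scalable_target(
    ServiceNamespace="lambda",
    ResourceId=resource_id,
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    MinCapacity=5,
    MaxCapacity=100,
)

aas.put_scaling_policy(
    PolicyName="pc-utilization-70",
    ServiceNamespace="lambda",
    ResourceId=resource_id,
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        # 0.7 == 70% provisioned concurrency utilization
        "TargetValue": 0.7,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
        },
    },
)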
Provisioned Concurrency use-cases
Reserved Concurrency ensures the function can always use that amount of concurrency; at the same time, it restricts the function to that concurrency value.
Note: If function errors increase, frequent retries can easily cause the concurrency limit to be reached.
3. Modify the calling application to use exponential backoff when calling the Lambda function to reduce the chance of
function throttling
4. Use Dead Letter Queues if the caller is an async caller to protect your data from function throttling
Lambda function timeouts
A Lambda function times out when it cannot process the request within the function timeout setting and logs “Task
Timeout” in the CloudWatch logs.
Typical causes
• The requester has configured an insufficient timeout setting for the function to do a cold start and process the request
• The function does not have sufficient resources configured
• A backend service call takes longer, preventing the Lambda function from completing processing before the function timeout is reached
Best practices to avoid timeouts
1. Make sure that your Lambda function is idempotent: due to transient network issues a client may resend the request, causing the Lambda function to receive duplicate requests
2. Initialize your function's static logic outside of the function handler
3. Verify that the retry count and timeout settings on the AWS SDK that you're using allow enough time for your function to initialize (see the sketch after this list)
4. Verify that your Lambda function has enough system resources
5. Verify that your Lambda function is configured to work within the maximum timeout settings of any integrated AWS services: for example, API Gateway has a maximum timeout of 29 seconds, so a Lambda function invoked synchronously behind it must return within that timeout
6. Confirm that there's a valid network path to the endpoint that your function is trying to reach
7. If required, set up provisioned concurrency for the Lambda function
Diagnosis
• CloudWatch Logs: Determine the request ID from the log line that captures the Task Timeout and look for clues in the application logs defined for the language environment
• X-Ray traces: Use traces to determine the causes of delays if backend services are accessed
• VPC Flow Logs: To identify network issues relating to function execution
• Lambda Insights: Collects system-level metrics such as CPU and memory, and diagnostics information pointing to Lambda cold starts and worker shutdowns
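For best practice 3 above, a sketch of tightening SDK timeouts and retries on a client used inside the function (the values are illustrative, not recommendations):

import boto3
from botocore.config import Config

# Keep SDK timeouts and retries well inside the function timeout so a slow
# backend call fails fast instead of consuming the whole function duration.
cfg = Config(
    connect_timeout=2,               # seconds to establish the connection
    read_timeout=5,                  # seconds to wait for a response
    retries={"max_attempts": 2, "mode": "standard"},
)

s3 = boto3.client("s3", config=cfg)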
Lambda function metrics
• ConcurrentExecutions is the number of function instances that are processing events. If this number reaches your concurrent executions quota for the
Region, or the reserved concurrency limit on the function, then Lambda throttles additional invocation requests.
• OffsetLag for Kafka streams is the difference in offset between the last record written to a topic and the last record that your
function's consumer group processed
• IteratorAge metric measures the time between when a stream receives the record and when the event source mapping sends
the event to the function.
• AsyncEventAge metric represents the time between when Lambda successfully queues the event and when the function is
invoked. The value of this metric increases when events are being retried due to invocation failures or throttling.
Lambda invocation styles
Push model (via Lambda API) vs. Pull model (pick from a stream or queue)
• Lambda service sends the events directly to the function, waits for the response and sends the function's response back
to the invoker
• For functions with a long timeout, your client might be disconnected during synchronous invocation while it waits for a
response. Configure your HTTP client, SDK, firewall, proxy, or operating system to allow for long connections with timeout
or keep-alive settings
Lambda Asynchronous invocation model
Services such as S3, CloudWatch Logs, EventBridge and SNS invoke Lambda functions asynchronously.
These services call the Event Invoke Frontend service of Lambda and send the event, which is then authenticated and authorized
• The authorized request is sent to an internal event queue implemented using SQS, and the caller is then acknowledged
• These internal queues are constantly being polled by a number of Lambda pollers; once a message is detected in the
internal queue allocated to a poller, the poller uses the same synchronous invocation mechanism used when Lambda is
synchronously invoked by a service and sends the event to the function
Failure handling in Asynchronous invocations
Supports two built-in mechanisms to handle invocation errors - Dead Letter Queues (DLQ) and Destinations.
1. Dead Letter Queue (DLQ): Gives you more control over message handling for all asynchronous invocations, including
those delivered via AWS events (S3, SNS, IoT, etc.). Set up a DLQ by configuring the 'DeadLetterConfig' property when
creating or updating your Lambda function. You can provide an SQS queue or an SNS topic as the 'TargetArn' for your
DLQ, and AWS Lambda will write the event object that invoked the Lambda function to this endpoint after the standard
retry policy (2 additional retries on failure) is exhausted.
2. Destinations: Gives you the ability to handle the Failure of function invocations along with their Success. When a
function invocation fails, such as when retries are exhausted or the event age has been exceeded (hitting its TTL),
Destinations routes the record to the destination resource for every failed invocation for further investigation or
processing.
• Destinations provide more useful capabilities by
• passing additional function execution information, including code exception stack traces
• Supports more destination services i.e. Lambda as a destination in addition to SNS and SQS
• Destinations and DLQs can be used together at the same time, although Destinations should be considered the
preferred solution. If you already have DLQs set up, existing functionality does not change and Destinations does not
replace existing DLQ configurations. If both Destinations and DLQs are used for failure notifications, function invoke errors
are sent to both the DLQ and the Destinations targets.
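A sketch of wiring up both mechanisms with boto3 (function name and queue/event-bus ARNs are hypothetical):

import boto3

lam = boto3.client("lambda")

# DLQ: failed async events go to an SQS queue after the built-in retries.
lam.update_function_configuration(
    FunctionName="order-service",
    DeadLetterConfig={"TargetArn": "arn:aws:sqs:us-east-1:123456789012:order-dlq"},
)

# Destinations: route the full invocation record on success and on failure.
lam.put_function_event_invoke_config(
    FunctionName="order-service",
    MaximumRetryAttempts=2,
    MaximumEventAgeInSeconds=3600,
    DestinationConfig={
        "OnSuccess": {"Destination": "arn:aws:events:us-east-1:123456789012:event-bus/orders"},
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:order-failures"},
    },
)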
Lambda Asynchronous invocations
• It is possible to configure a destination for invocation records, separately for successful runs as well as failures (see diagram, where upon
a successful run, invocation record is pushed to the Event Bus of EventBridge)
Lambda Destinations for Async invocations
Enabling further actions based on result of Lambda function w/o need for coding
On success: you can use this to monitor the health of your serverless applications via execution status, or build workflows based on the invocation result.
On failure: use either Lambda Destinations or DLQs; Destinations provide detailed failure traces.
• Route asynchronous function results as an Execution Record to a destination resource without writing additional
code.
• Execution record: [version, timestamp, request context, request payload, response context, and response payload]
• For each execution status such as Success or Failure you can choose one of four destinations: another Lambda function, SNS, SQS, or EventBridge.
• Lessen the coding effort to realize event-driven microservices architectural pattern using Lambda. Now Lambda
functions can communicate with each other via Destinations which are ideally suited for asynchronous communication
Lambda stream based (poller) invoke model
With the stream-based invoke model, the Lambda service actively polls the stream/queue for events on the function's behalf and then processes them
• Lambda pollers are configured to poll the event sources defined in line with the event source mapping settings
• Pollers read the message/event records from the source, then filter them, batch them and invoke the Lambda frontend
invoke service synchronously
• SQS and SNS are supported as event destinations for Kinesis data streams and DynamoDB streams
Lambda Event Source Mapping
• Event filtering in an event source mapping configuration allows the Lambda service to filter messages or records before
invoking the Lambda function. This reduces the calls made to the Lambda function, simplifies code (you can now remove
the code that checks whether a message or record should be processed) and reduces cost (a sketch follows below)
• Supported for DynamoDB Streams, Kinesis Data Streams and SQS. Filtered-out messages are still read from the stream or
queue (and deleted from the event source in the case of SQS), so ensure they are not required any further
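A sketch of an event source mapping with a filter, so the function is only invoked for records that match (ARNs and the filter pattern are hypothetical):

import json
import boto3

lam = boto3.client("lambda")

lam.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:orders",
    FunctionName="order-service",
    BatchSize=10,
    # Only messages whose body matches this pattern invoke the function;
    # non-matching SQS messages are deleted without an invocation.
    FilterCriteria={
        "Filters": [
            {"Pattern": json.dumps({"body": {"status": ["PENDING"]}})}
        ]
    },
)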
2. MaximumRecordAgeInSeconds:
3. BisectBatchOnFunctionError: Addresses the poison pill problem where a single corrupt message can stall
processing of an entire shard. Splits the batch of stream records received such that only the corrupted stream
record is discarded. This allows you to easily separate the malformed data record from the rest of the batch, and
process the rest of the data records successfully.
*In fact, the five consumers per stream include KDFH and Kinesis Analytics consumers
Kinesis Data stream processed by Lambda stream pollers
Inside a Lambda stream poller
• Stream pollers of Lambda service picks a record or batch or records from the stream and synchronously invokes Lambda functions.
• Multiple Lambda functions along with other consumers can consume the stream
Lambda for Streaming Analytics
Lambda networking - VPC Attached Lambda functions
Accessing public services for VPC attached Lambda functions
• Strict exfiltration requirements: You need to assign policies to what traffic can go out from the VPC etc.
• Specific IP address: All traffic originating from Lambda function must come from a specific IP address, for
instance for traffic inspection purpose, by assigning the originating IP address into an allow-list of a traffic
inspection appliance
Lambda networking - Interface Endpoints for Lambda service
• VPC resources can establish connections with Lambda functions privately using interface endpoints without the need for
NAT Gateways or Public IP addresses and Internet Gateways
• You can call any of the Lambda API operations from your VPC. For example, you can invoke
the Lambda function by calling the Invoke API from within your VPC.
• Lambda purges idle connections over time, so you must use a keep-alive directive to maintain persistent connections.
Attempting to reuse an idle connection when invoking a function results in a connection error. To maintain your persistent
connection, use the keep-alive directive associated with your runtime.
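A sketch of enabling TCP keep-alive on a client that invokes Lambda from inside the VPC through the interface endpoint (function name is hypothetical; assumes a botocore version that supports the tcp_keepalive option):

import boto3
from botocore.config import Config

# With private DNS enabled on the Lambda interface endpoint, the default
# Lambda API hostname already resolves to the endpoint's private IPs.
lambda_client = boto3.client(
    "lambda",
    config=Config(tcp_keepalive=True),   # keep the connection alive between calls
)

resp = lambda_client.invoke(
    FunctionName="order-service",
    InvocationType="RequestResponse",
    Payload=b'{"id": "123"}',
)
print(resp["StatusCode"])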
Lambda Networking Best Practices
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt159186333252",
      "Action": ["lambda:CreateFunction", "lambda:UpdateFunctionConfiguration"],
      "Effect": "Deny",
      "Resource": "*",
      "Condition": {"ForAllValues:StringNotEquals": {"lambda:SubnetIds": ["subnet-046c0d0c487b0515b", "subnet-091e180fa55fb8e83"]}}
    },
    {
      "Sid": "Stmt159186333253",
      "Action": ["lambda:CreateFunction", "lambda:UpdateFunctionConfiguration"],
      "Effect": "Deny",
      "Resource": "*",
      "Condition": {"ForAllValues:StringNotEquals": {"lambda:SecurityGroupIds": ["sg-0a56588b3406ee3d3"]}}
    }
  ]
}
• Attaching the above IAM policy as a Service Control Policy to your account ensures that the Lambda service can associate Lambda functions only
with the indicated subnet/security group combinations
• Allows access to the specific private VPC only
• Allow access to MySQL databases which are in the given security groups in two AZ’s
Lambda security model
• Execution Role: what the Lambda function can do
• Function Policy (a resource-based policy): who can invoke the Lambda function
• The execution role is created and assigned when the Lambda function is created
• The function policy is created when you add a trigger to the Lambda function
Use least privilege; for example, a function policy can allow API Gateway to invoke the Lambda function
only when a particular method is invoked (see the sketch below)
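A sketch of that least-privilege function policy, granting API Gateway permission to invoke the function only for one method of one API stage (all ARNs are hypothetical):

import boto3

lam = boto3.client("lambda")

lam.add_permission(
    FunctionName="order-service",
    StatementId="apigw-get-orders",
    Action="lambda:InvokeFunction",
    Principal="apigateway.amazonaws.com",
    # Restrict the permission to a single method/resource of one API stage.
    SourceArn="arn:aws:execute-api:us-east-1:123456789012:a1b2c3d4e5/prod/GET/orders",
)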
Lambda security best practices
• Developer experience: no need to handle connection pooling, clean-up of idle connections etc. - lean code
• Load on database goes down, hence same database footprint to handle more connections
• Integrates with Secrets Manager for simple authentication
Lambda versions and aliases
Lambda deployments using CodeDeploy
• Lambda is integrated with CodeDeploy for automated rollout
with traffic shifting. CodeDeploy supports multiple traffic
shifting options
1. Canary
2. Linear
3. All at once
• CodeDeploy also supports Alarms and Hooks during
deployment
• Alarms: These instruct CloudWatch to monitor the deployment
and trigger an alarm if any errors occurred during rollout. Any
alarms would automatically roll back your deployment.
• Hooks give you the option to run pre-traffic and post-traffic test
functions that run sanity checks before traffic-shifting starts to the
new version and after traffic-shifting completes
• Reduce the time it takes Lambda to unpack deployment packages authored in Java by putting your dependency .jar files in a separate /lib directory.
• Minimize the complexity of your dependencies. Prefer simpler frameworks that load quickly on execution environment startup.
• Avoid using recursive code in your Lambda function, wherein the function automatically calls itself until some arbitrary criteria is met.
• Do not use non-documented, non-public APIs in your Lambda function code.
• Write idempotent code.
• Leverage your logging library and AWS Lambda metrics and dimensions to catch app errors.
• Use Cost Anomaly Detection to detect unusual activity on your account.
• Test with different batch and record sizes so that the polling frequency of each event source is tuned to how quickly your function is able to complete its task.
• Use the batch window to avoid invoking the function with a small number of records.
• Increase Kinesis stream processing throughput by adding shards.
• Use CloudWatch on IteratorAge to determine if your Kinesis stream is being processed; for example, configure a CloudWatch alarm with a maximum setting of 30000 (30 seconds).
Lambda Best Practices
• Memory
• Power of the Lambda function is determined by the memory allocated to the function
• Use Lambda Power tuning tool to load test against different memory configuration and determine the optimal memory footprint for the function
• Adding more memory will reduce the overall cost of running the function up to a point, beyond which cost will start increasing
• Timeout
• Dictates how long a function can run before Lambda terminates it (maximum 900 s)
• Most functions will fail fast before the maximum timeout is reached, hence it is important to determine the optimal timeout value. The function
is charged for the duration it is running
• Load testing is the best way to determine the optimal timeout value for the function
• Concurrency
• Concurrency is the number of invocations the function runs at any given moment
• Three types: Unreserved concurrency (at least 100 per account/region), Reserved concurrency and Provisioned concurrency
• Limit the concurrency in line with the ability of backend resources to handle the peak workload
• Reserve concurrency for critical functions that have to honour SLAs by handling the peak workload
• Provision concurrency to cater for temporary load increases
Lambda Best practices
Best practices for testing Lambda functions to tune memory, timeout and concurrency
3. Does your error handling work as expected?
• Tests should include pushing the application beyond the concurrency settings to verify correct error handling
(see the CloudWatch metrics relating to concurrency)
Lambda Best Practices - Using purpose-built services instead of Lambda
Use concise function logic
• Ensure that purpose-built services are used for event transport and not Lambda, use
Lambda only when a transformation logic needs to be applied
• Keep memory foot-print small by reading only what is required using filters
• Ensure I/O is optimized in the target service to prevent Lambda waiting beyond
what it should
• The different concurrency models supported by the AWS event transport services may
need to be considered when selecting how the events are pushed to Lambda
• Stream-based transport allows massive concurrency using batches, and concurrency can easily be scaled via the number of shards
• SNS or API Gateway can push events faster across the Lambda functions
Lambda Best Practices
Increasing per function compute power
• Choosing more memory increases the compute cost, but in most cases disproportionately reduces function
execution time and hence the overall cost
• Multi-threading the function may achieve gains if the function is CPU bound or I/O bound when you increase memory beyond
1.8 GB (which attracts additional vCPU cores)
Lambda Best Practices
Keep orchestration logic outside the function and only do business logic
• Orchestration logic inside the function unnecessarily adds to the execution time. Hand over the workflow orchestration
logic to Step Functions
Lambda Best Practices
Tune per function concurrency
Cost considerations:
• Lambda execution time
• Lambda invocation rate
• CW logs/metrics
Prevent overloading the backend for synchronous invocations
• Use CloudWatch embedded metrics (push metrics to CloudWatch Logs to reduce the metric call rate) - a sketch follows below
• If synchronous invocation is really required, implement the enhancements at each dependent service
• Check whether synchronous invocation is really required: do you need to know the request was processed successfully, or is it
sufficient that it is durably stored for processing? If the latter, resort to asynchronous invocation patterns.
• If feedback on process state is required, still use asynchronous invocation and use [Polling | Webhooks | Web Sockets]
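A sketch of emitting a CloudWatch embedded metric format (EMF) record from a handler by writing structured JSON to stdout, so no extra metric API calls are needed (namespace and metric names are hypothetical):

import json
import time


def handler(event, context):
    # CloudWatch Logs extracts "BackendLatencyMs" as a metric from this log line,
    # so no PutMetricData call (and no extra API rate) is required.
    print(json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "OrderService",
                "Dimensions": [["FunctionName"]],
                "Metrics": [{"Name": "BackendLatencyMs", "Unit": "Milliseconds"}],
            }],
        },
        "FunctionName": context.function_name,
        "BackendLatencyMs": 42,
    }))
    return {"status": "ok"}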
Think Asynchronously
Store first, process later
• Store the request durably in an appropriate AWS service (SQS, SNS, EventBridge, S3) and then process later
Converting Synchronous to Asynchronous
Note: DynamoDB Streams is used to trigger events to start translation and, once finished, to start transcription.
(Input payload)
Use Lambda destinations to consolidate error handling
• Rather than handling errors in each Lambda function separately, it will be cleaner to handle them via a Lambda destination.
• Lambda destinations are supported on EventBridge, another Lambda function, SNS and SQS.
• Define another Lambda function as the destination for the Lambda functions executing business logic. The error-handling
Lambda function can persist the errors in a DynamoDB table or a queue, which can be serviced periodically by another
Lambda function that takes appropriate corrective actions
Offload non-business processes to the right AWS managed services
• Do not build Lambda into a monolith. To reduce cost and improve the efficiency of Lambda processing, put only
business logic into the Lambda function. Offload all other processing to purpose-built AWS services
Implement Zero Trust by implementing micro-perimeters
• By moving away from large monolithic Lambda functions to smaller purpose-built Lambda functions and offloading other processing logic
to AWS services, it is possible to make the security perimeter and the attack surface smaller. These micro-perimeters can then be
protected using Lambda function policies and access policies
• Decouple business logic from security posture by implementing roles and permissions.
Combine best practices in real-life applications
• The order manager service converts the ordering request into an event, which is handled via a Step Functions workflow
• Barista staff get updates on each step completed in the order fulfilment process by subscribing to a WebSocket connection implemented by the AWS
managed service IoT Core (without building the WebSocket connection programmatically)
Optimize Serverless
When API Response is not required to continue processing
1. Integrate API Gateway with SQS, with Lambda picking events from SQS
2. Offload orchestration code to Step Functions
Optimize Serverless
When API Response is required to continue
processing
1. Client polling
2. Client is notified through Web Sockets
3. Client is notified via Web Hook
Note: Can be used when the client is trusted (allowing it to subscribe via SNS)
Serverless security best practices
Serverless Application Use Cases
Event-driven web application backend
Serverless Application Use Cases
Event-driven real-time file processing
Serverless Application Use Cases
ETL pipeline for daily air quality measurements (min-max-avg.)
Serverless Application Use Cases
Creating an index for files using metadata and storing it in Elasticsearch (ES) for fast indexing
Serverless Compute
• Step functions
• HTTP APIs
• REST APIs
• DynamoDB tables
• Lambda layers
• Applications
Benefits:
1. Allows deploying all related resources for a serverless application as a single versioned entity. In a single deployment configuration it is easy to share
configurations, such as timeouts, memory size etc., across a number of resources
2. Since SAM templates define the infrastructure configuration, it is easy to enforce best practices such as code-review-style verification, use of AWS tooling such as
CodeDeploy for safe deployment, and AWS X-Ray for enabling tracing
3. Since the SAM CLI provides a Lambda-like local execution environment, it provides a way to catch, early in the development cycle, the issues that may arise when
executing your code in the cloud
4. Provides deep integration with development tools – CodeSuite, AWS Cloud9 IDE, Jenkins plugin etc.
SAM templates
AWS SAM Transform
Tells CloudFormation to transform the code in SAM template
• SAM is built on top of CloudFormation. When CloudFormation encounters a SAM template (it recognizes it through the “Transform:”
section), it examines the template, converts the parts that relate to the serverless application into CloudFormation code (creates
CloudFormation resource types from serverless resource types) and uses any non-serverless code as it is
• SAM templates can also contain non-serverless resources
AWS SAM Serverless resource types (7)
1. AWS::Serverless::Function (attach a policy template to the Lambda function)
5. AWS::Serverless::LayerVersion
6. AWS::Serverless::StateMachine
7. AWS::Serverless::Application
AWS SAM Globals
Externalize environment-specific parameters to Secrets Manager or SSM Parameter Store
SAM Best Practices - Re-usable templates
Use pseudo parameters and intrinsic functions
• Pseudo parameters
• Intrinsic functions
• Example: generate the certificate only if the “CreateCert” condition is true
Ref. https://www.youtube.com/watch?v=QBBewrKR1qg
SAM CLI
• Deploys Docker container locally with capabilities to test and debug serverless applications and validate SAM templates
• You can locally test and "step-through" debug your serverless applications before uploading your application to the AWS Cloud.
• In the SAM CLI debug mode, you can attach a Debugger and step through the code line by line, see the values of various variables, and fix issues the
same way you would for any other application.
• You can verify whether your application is behaving as expected, debug what's wrong, and fix any issues, before going through the steps
of packaging and deploying your application.
• Since the SAM CLI emulates the Lambda service endpoint locally, it is easy to author integration tests, run them against the Lambda function
locally and verify its functionality before deploying to the cloud. The same integration tests can be modified to test the same function in the cloud
SAM CLI
• CLI tool for local development, debugging, testing, deploying and monitoring of serverless applications
• Supports API Gateway “proxy-style” + Lambda Service API testing
• Response objects and function logs available on your local machine
• Lambda execution environment is mimicked using docker lambda images
• Can help build native dependencies
Fargate compute & memory (Standard and SPOT)
• vCPU-hours (per second billing)
• GB-hours (per second billing)
Ephemeral storage:
• GB-hours (per second billing)
AWS Fargate
Serverless compute engine for ECS and EKS
AWS Fargate
• AWS Fargate is a serverless, pay-as-you-go compute engine that lets you focus on building
applications without managing servers.
• Fargate is compatible with ECS and EKS
AWS ECS Fargate mode
Deploying a container image
Fargate spot
• Fargate provides compute capacities that are very granular and can closely match the resource requirements.
• Per second billing with one minute minimum
• Tasks and Pods will run on right-sized compute environments. More than 50 different task/pod configurations available
• vCPU and memory resources are calculated from the time your container
images are pulled until the Amazon ECS task or EKS pod terminates, rounded
up to the nearest second. A minimum charge of 1 minute applies.
EC2
• Workloads with little to no idle time
• Steady state and predictable
• Workloads that would benefit from specialized CPUs or GPUs not yet available for Fargate / Lambda
Fargate
• Workloads with little to no idle time
• Minimize operational overhead
• Security posture needs to be limited – only secure the container image
• Faster scaling requirements
Lambda
• Workloads with long idle periods
• Minimize operational overhead
• Security posture maintenance needs to be limited – only secure the application code
• Faster scaling requirements
• Burst handling capability
Cost considerations
1. When right sized with constraints, EC2 has the best cost.
2. When constraints are smaller than the smallest EC2 instance, then Fargate’s flexibility of rightsizing provides better cost.
3. Lambda starts saving money over EC2 once it runs half or less of the time.
4. Lambda saves money over Fargate once it runs a quarter or less of the time.
Elastic Container Registry (ECR)
AWS managed container image registry service that is secure, scalable, and reliable
Elastic Container Registry
Fully managed Container artifact registry service. ECR supports private repositories with resource-based permissions using
IAM so that specified users or EC2 instances can access the container repository and images. It supports integration with all
container orchestration platforms (ECS, EKS, Self-managed) and compute platforms (EC2, Fargate, On-prem). With ECR you
can use your preferred CLI to push, pull, and manage Docker images, Open Container Initiative (OCI) images, and OCI
compatible artifacts.
Features of ECR:
• Lifecycle policies help with managing the lifecycle of the
images in your repositories. You define rules that result in
the cleaning up of unused images. You can test rules
before applying them to your repository.
• Cross-Region and cross-account replication makes it easier for you to have your images where you need them. This is configured as a
registry setting and is on a per-Region basis. For more information, see Private registry settings.
• Pull through cache rules provide a way to cache repositories in an upstream registry in your private Amazon ECR registry. Using a pull
through cache rule, Amazon ECR will periodically reach out to the upstream registry to ensure the cached image in your Amazon ECR
private registry is up to date.
Components of ECR
ECR consists of repositories which are used to securely store container images. Governance of an ECR repository is managed
by policies.
1. Repository policy defines who is permitted to access the images in the repository
2. Lifecycle policy governs how many versions of each image (tagged or untagged) shall be maintained in the repository (a sketch follows below)
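A sketch of applying a lifecycle policy with boto3 (repository name and the rule itself are hypothetical; rules can also be tested with the lifecycle policy preview before being applied):

import json
import boto3

ecr = boto3.client("ecr")

lifecycle_policy = {
    "rules": [
        {
            "rulePriority": 1,
            "description": "Expire untagged images older than 14 days",
            "selection": {
                "tagStatus": "untagged",
                "countType": "sinceImagePushed",
                "countUnit": "days",
                "countNumber": 14,
            },
            "action": {"type": "expire"},
        }
    ]
}

ecr.put_lifecycle_policy(
    repositoryName="order-service",                    # hypothetical repository
    lifecyclePolicyText=json.dumps(lifecycle_policy),
)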
Basic Scanning:
• Free scanning, activated only upon image push
• If the image needs to be scanned again, it needs to be pushed again
• Scans only the operating system (runtime) using the hosted scanning software Clair
Enhanced Scanning:
• Amazon Inspector executes the scanning of the image once
pushed to the registry and every time it finds a vulnerability
through its wide-range of vulnerability feeds (Continuous
scanning)
• Can enforce scanning on multiple accounts with AWS
Organizations integration.
• Scan not only OS software (runtime), but also programming
language packages per image layer
ECR Cross Region Replication
Allows replicating ECR private registries across Regions and accounts. When turned on, all private ECR repositories in the
registry automatically copy images to multiple other repositories in different accounts and/or Regions, reducing pull
latency and making your containers start up faster as they can now pull images in-Region.
ECR Pull through cache
Images from public registries are used in most container applications, either as the base image during build or
as a side-car for the container application. Pull through cache creates a repository that caches the images from the public
registry and makes those images available directly from your own private registry.
• Once the image is pushed to the ECR public, AWS internally replicates them to other regions and provisions CloudFront
distributions for faster image access.
• Docker official images are available to download from ECR public gallery
Elastic Container Services (ECS)
Highly scalable container management service that makes it easy to run, stop and manage
containers on a cluster
AWS proprietary Container orchestration engine
Container compute
Enabling ECS Managed tags in Task Definition Tagging supported by ECS resources
• When ECS Managed tags option is enabled and when you launch
• A standalone task: ECS will automatically add following tags
• aws:ecs:clustername = <name of the cluster>
• All task definition tags added by the users
• An ECS service: ECS will automatically add the following tags
• aws:ecs:clustername = <name of the cluster> and aws:ecs:servicename = <name of the ECS service>
• All task definition tags added by the users
• All service tags added by the users
ECS Launch types
Scheduling of tasks
ECS Container Agent
(Fargate Agent in Fargate mode)
Architectural considerations:
• The cluster management engine is decoupled from the scheduler, allowing custom schedulers to schedule tasks so that resources
are allocated based on user-defined metrics - for example service priority (a container running a high-priority service is scheduled before a low-priority request)
• The cluster manager maintains state in a key-value store which keeps a time-ordered sequence of state transitions and manages the concurrency control
of two requests for cluster resources using optimistic concurrency control. This optimistic model helps maintain high availability and scalability of the cluster
and low latency for state transitions
Ref. https://www.allthingsdistributed.com/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html
ECS Task Placement strategies
Allow developers more control over where the Tasks should run
1. Placement constraints
2. Placement strategies
• Default placement spreads tasks across AZs and places each task on the instance with the least
number of tasks running
• Placement strategies can be chained to ensure tasks are placed in the way the
workload requires (see the sketch after this list)
• Identification of the placement strategies and constraints (which instance type,
which AMI etc.) needs to be done after carrying out load testing on the container
based application
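A sketch of chaining placement strategies and adding a constraint when creating an ECS service (cluster, service and task definition names are hypothetical):

import boto3

ecs = boto3.client("ecs")

ecs.create_service(
    cluster="orders-cluster",
    serviceName="order-service",
    taskDefinition="order-service:3",
    desiredCount=6,
    launchType="EC2",
    # Chained strategies: spread across AZs first, then binpack on memory.
    placementStrategy=[
        {"type": "spread", "field": "attribute:ecs.availability-zone"},
        {"type": "binpack", "field": "memory"},
    ],
    # Constraint: only place tasks on instances of a specific type.
    placementConstraints=[
        {"type": "memberOf", "expression": "attribute:ecs.instance-type =~ t3.*"}
    ],
)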
Blox is an open-source custom scheduler. It uses the stream of cluster state change events it receives
from ECS to make scheduling decisions
ECS Networking: bridge mode vs. awsvpc mode
Bridge mode:
• Relies on Docker internal networking
• Containers will not have routable IP addresses
• Containers share the instance ENI
• Performance challenges due to multiple translations
• Lack of fine-grained security controls
• SG cannot be applied at container level
• ECS container agent implements the awslogs log driver to send container logs to
CloudWatch logs.
• Fargate launch type: Before your containers can send logs to CloudWatch, you must specify the
awslogs log driver for containers in your task definition.
• EC2 launch type: Ensure that ECS optimized AMI is used in the container instances. If a custom
AMI is used, ensure that ecs-init package is at 1.9.0-1 or higher
• FireLens for ECS runs as a sidecar and uses ECS task definition parameters to route logs to
an AWS service or APN destination for log storage and analytics
• FireLens works with Fluentd and Fluent Bit
• It can append, transform and filter log events before sending them to the destination
(awslogs driver configuration)
ECS resources - CPU, memory and network - must be defined at Task level as well as at Container level
• Configure ALB and Task security groups so that they can talk
to each other
• If ECS is configured in awsvpc mode, ECS tasks are presented as ENIs, therefore they shall be registered with
the ELB as IP address targets
1. Create a target group of IP address type
2. Associate task definitions with the target group of the load balancer by creating an ECS service
• Tasks will act like EC2 instances and ECS service will act like
Auto Scaling Group when configured as a Scalable resource
using AWS Application Auto scaling
Multiple ALB target group support for ECS
Allows you to attach a single Amazon ECS service running on either EC2
or AWS Fargate, to multiple target groups of the same ELB or multiple
ELBs.
• Use cases:
1. Allow same ECS service to be exposed via two Load Balancers:
typical use-case is to expose an ECS service to public internet as well
as internal VPC based clients. Refer diagram.
2. Expose multiple ports from the same container: for example, Jenkins
container may expose port 8080 for web interface and port 5000 for
API.
3. Expose multiple containers listening on different ports per task:
When a task consists of more than one container with each one
listening on different ports. Each container can be configured in a
different target group on the same ELB.
Interconnecting services in ECS
Supports multiple service discovery capabilities
1. ECS Service discovery: provides basic service discovery
capabilities using Route53. Under the covers, the services will be
registered with AWS Cloud Map which acts as an interface to add
DNS names of the services to a Route53 private hosted zone. No
support for traffic telemetry and other smart discovery
capabilities.
Reliable deployments
Create an ECS Service Connect service
You enable Service Connect for an ECS service when it is created by adding a “ServiceConnectConfiguration” stanza. It defines
the namespace in which the ECS service will be registered in Cloud Map, a friendly discoverable name and an optional client
alias. The client alias can be used to override the discovery name and is used in migration scenarios to retain the previous name (a sketch follows below).
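A sketch of the ServiceConnectConfiguration stanza passed when creating the service (namespace, port name and alias are hypothetical):

import boto3

ecs = boto3.client("ecs")

ecs.create_service(
    cluster="orders-cluster",
    serviceName="order-service",
    taskDefinition="order-service:3",
    desiredCount=2,
    serviceConnectConfiguration={
        "enabled": True,
        "namespace": "internal.orders",        # Cloud Map namespace
        "services": [
            {
                "portName": "api",              # must match a port name in the task definition
                "discoveryName": "orders",
                "clientAliases": [
                    # Other services reach this service as http://orders.internal.orders:8080
                    {"port": 8080, "dnsName": "orders.internal.orders"}
                ],
            }
        ],
    },
)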
When an ECS service is created with Service Connect enabled, ECS carries out the
following tasks,
1. Requests ECS agent to initiate the task
2. Since it is a Service Connect enabled task, ECS agent instantiates Service Connect
Agent (which consists of an Envoy proxy and an agent, this agent monitors the
health of Service Connect agent, collects, aggregates metrics from Envoy proxy
and periodically sends them to CloudWatch)
3. ECS service configures the Service Connect agent based on the
ServiceConnectConfiguration provided
4. ECS fetches the existing services registered in the namespace from Cloud Map and
passes it on to Service Connect agent
5. ECS registers the discoverable name of the task in the namespace in Cloud Map so
that it is discoverable by other services
ECS Service Connect request flow
All communications to the tasks, ingress as well as egress, happens via ECS Service Connect agent. It essentially acts as a
proxy for each task.
• Any ingress connection to the task (10.1.2.1:8080) will automatically be redirected to the Service Connect agent of that task which will act
as a proxy to handle the required communication with the target container.
• Also, any outbound request to Service Connect enabled tasks will be redirected to the Service Connect agent running in that task.
Reliable inter-service communication
ECS Service connect handles failures and errors of the tasks in the target service transparently to the requesting service.
• When a task has failed, the Service Connect agent of the requesting task detects the failure and marks it as bad. The request is
retried against another healthy task automatically and transparently to the requesting service.
• When a task returns an error, the Service Connect agent of the requesting task detects the error and, after a number of attempts
to connect, marks it as bad and redirects traffic to a healthy task in the target service.
Robust deployments
When there is a new deployment, the Service Connect agent detects the tasks that are in the de-provisioning stage and
redirects requests to healthy tasks in the Blue service; once all Blue tasks are deprovisioned, requests are redirected
to the tasks in the Green service.
ECS Service Auto Scaling
ECS uses Application Auto Scaling to increase and decrease the task count for an ECS service automatically.
• Scaling policies supported
• Target tracking scaling policies: Increase or decrease the number of
tasks that your service runs based on a target value for a specific metric (see the sketch after this list).
• Step scaling policies: Increase or decrease the number of tasks that your
service runs based on a set of scaling adjustments, known as step
adjustments, that vary based on the size of the alarm breach.
• When the task count increases and there is a need for additional underlying compute capacity:
• Compute capacity is automatically increased/decreased for the Fargate launch type
• Configure ECS Cluster Auto Scaling to automatically increase/decrease compute
capacity for the EC2 launch type
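A sketch of the target tracking setup for an ECS service referenced above (cluster/service names and the 70% CPU target are hypothetical):

import boto3

aas = boto3.client("application-autoscaling")
resource_id = "service/orders-cluster/order-service"

aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=20,
)

aas.put_scaling_policy(
    PolicyName="cpu-target-70",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 120,
    },
)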
ECS Cluster Auto Scaling (CAS)
Capability of ECS to manage the scaling of EC2 Auto Scaling Groups. CAS relies on ECS capacity providers, which provide the
link between your ECS cluster and the ASGs you want to use. It uses a Capacity Provider Reservation metric to determine
how and when to scale-out and scale-in
• Capacity Provider Reservation is the ratio of how big the ASG needs to be to how big it actually is, expressed as a
percentage
Capacity Provider Strategy defines how each capacity provider contributes (using weights) to the overall capacity of the cluster (see the sketch below)
ECS Service Auto Scaling and ECS Cluster Auto Scaling
Capacity Provider represents the infrastructure the ECS tasks will run on
Capacity Provider = ASG | FARGATE | FARGATE_SPOT
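A sketch of attaching capacity providers to a cluster with a weighted strategy (names, base and weights are hypothetical):

import boto3

ecs = boto3.client("ecs")

ecs.put_cluster_capacity_providers(
    cluster="orders-cluster",
    capacityProviders=["FARGATE", "FARGATE_SPOT"],
    defaultCapacityProviderStrategy=[
        # base=1: always keep at least one task on regular Fargate;
        # weight 1:4 - for every extra task on FARGATE, place four on FARGATE_SPOT.
        {"capacityProvider": "FARGATE", "base": 1, "weight": 1},
        {"capacityProvider": "FARGATE_SPOT", "base": 0, "weight": 4},
    ],
)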
Containers are by definition ephemeral and stateless, hence do not require persistent storage. However, there is a wide
range of use-cases that require stateful containers. The traditional solution to this was to use service-to-service communication
to store the state (such as in an S3 bucket or in a database)
Allows ECS tasks (on both EC2 and Fargate launch types) to natively map an EFS file system endpoint
transparently without further infrastructure configurations.
• Earlier, every task had to copy data into its local storage to use it; also, every time a task was restarted, it had to pull the data
from S3.
ECS integration with CI/CD pipelines
CodeBuild supports pulling builder images from an image repository, building a new container image by merging the code from the
code repository, and uploading it back to the repository. CodeDeploy supports B/G (blue/green) deployments into ECS. Similar integrations
are supported with other CI/CD tools as well.
ECS use cases: Batch processing
ECS vs. EKS
ECS integrated Services
AWS CloudFormation
Elastic Kubernetes Service (EKS)
Fully managed upstream Kubernetes compatible container orchestrator
EKS solution portfolio
• EKS Distro is used for development environments, or for production environments when mature management tooling is available
• All other EKS solution options rely on EKS Distro.
• EKS Anywhere uses EKS Distro and provides automation tooling that simplifies cluster creation, administration and operations on your own
infrastructure on-premises. Further, Amazon EKS Anywhere provides default configurations on operating system and networking and
brings additional opinionated tooling you would need to run Kubernetes in production.
Amazon EKS Distro
A distribution of the same open-source Kubernetes and dependencies deployed by Amazon EKS, helping you to manually run Kubernetes
clusters anywhere. Challenges it addresses: Without a vendor supported distribution of Kubernetes, you have to
spend a lot of effort tracking updates, determining compatible versions of Kubernetes and its dependencies,
testing them for compatibility, and maintaining pace with the Kubernetes release cadence.
• Does not include eksctl, Amazon official distribution of CSI and CNI plugins, Amazon controllers for
Kubernetes. Also does not include integration components with AWS services such as IAM
Authenticator.
• AWS integrations supported by EKS Distro:
• Amazon EKS Distro is aligned with Amazon EKS versions and components and is supported by the Amazon
EKS operations dashboard.
• Provides copies of builds in Amazon S3 and ECR for developers creating Kubernetes clusters on AWS.
• EKS Distro has been tested for use with Amazon Linux 2, Bottlerocket, and AWS Outposts.
• EKS Distro will support ECR Public repositories as a secure, fast source for you to download EKS Distro for use
within AWS Regions or on premises.
Clusters can be deployed as standalone clusters, which are managed independently by the admin runtime, or through a long-lived
EKS Anywhere management cluster
Standalone cluster: If you are only running a
single EKS Anywhere cluster, you can deploy a
standalone cluster. The EKS Anywhere management
components run on the same cluster that runs the
workloads.
• Standalone clusters must be managed with the
eksctl CLI. A standalone cluster is effectively a
management cluster, but in this deployment type,
only manages itself.
Amazon will do version compatibility testing of each of these packages with new versions of EKS. All these packages are
supported under AWS enterprise support
EKS on Outposts
Use Amazon EKS to run on-premises Kubernetes applications on AWS Outposts. Two deployment options
• Extended clusters: Run the Kubernetes control plane in an AWS Region and nodes on your Outpost.
• Local clusters: Run the Kubernetes control plane and nodes on your Outpost.
For both deployment options, the Kubernetes control plane is fully managed by AWS. You can use the same Amazon EKS
APIs, tools, and console that you use in the cloud to create and run Amazon EKS on Outposts.
• If you're concerned about the quality of the network connection from your Outposts to the parent
AWS Region and require high availability through network disconnects, use the local cluster deployment
option.
• Extended configuration option is suitable if you can invest in reliable, redundant network connectivity from
your Outpost to the AWS Region. The quality of the network connection is critical for this option. The
way that Kubernetes handles network disconnects between the Kubernetes control plane and nodes might
EKS Anywhere vs. EKS on Outposts
Like EKS Anywhere, Amazon EKS on Outposts provides a means of running Kubernetes clusters using EKS software on-premises.
The main differences are that:
• The primary interfaces for EKS Anywhere are the EKS Anywhere Custom Resources. Amazon EKS does not have a CRD-
based interface today.
Kubernetes Deployments vs. ReplicaSets
Deployment is responsible for managing a set of replica pods. It provides features such as rolling updates, rollbacks, and scaling
of the number of replicas. Deployments also have self-healing capabilities, which ensure that the desired number of replicas are
always running and healthy.
ReplicaSet is a lower-level object in Kubernetes that is responsible for ensuring that a specified number of replicas are always
running. It is used by deployments to manage the replica pods but can also be used directly in some scenarios. ReplicaSets
provide the ability to scale the number of replicas and replace failed pods with new ones.
Kubernetes StatefulSets and DaemonSets
StatefulSets is a controller that manages the deployment and scaling of stateful pods.
• What are stateful pods? A pod that requires persistent storage and a stable network identity to maintain its state all the time, even during
pod restarts or rescheduling. Used for stateful applications such as databases or distributed file systems as these require a stable identity
and persistent storage to maintain data consistency.
DaemonSet is a controller that ensures a single pod instance is running on each node
• Particularly useful for running pods as system daemons or background processes that need to run on every node in the cluster.
• DaemonSets can be used for collecting logs, monitoring system performance, and managing network traffic across the entire cluster.
EKS Control plane resilience
Fully managed by AWS in a managed VPC with a 99.95% uptime SLA. Zonally redundant with VPC-level isolation. An NLB is the
public endpoint for the EKS control plane exposed to the customer, without any cross-zone load balancing.
EKS Data plane
Supports multiple compute options starting from self-managed, partially managed or fully managed.
Self-managed Amazon EC2 instances: Run in your own account, customer managed, and offer maximum flexibility and configurability.
You spin up the worker nodes; EKS optimized AMIs are provided. Upgrading nodes and pods is your responsibility.
Amazon EKS managed nodes: Run in your own account, with AWS-managed provisioning and instance lifecycle management.
Upgrading, gracefully terminating and moving pods across nodes is taken care of by EKS.
Karpenter: Next-generation auto-scaler. Gives you the flexibility to choose the right
instance type. Helps reduce cost by rebalancing the pods. If there are holes in the
nodes, it automatically moves pods and packs them into fewer nodes (bin-packing). During
patching of nodes, Karpenter automatically moves the pods to make way for the node
upgrade.
AWS Fargate: Single pod per node and fully managed, serverless and right sized compute.
AWS Managed OS, container runtime, storage, monitoring plugins. Provides granular
compute option with pod-based billing
EKS Data plane compute options - Managed Node Groups
MNGs automate the provisioning and lifecycle management of nodes (EC2 instances) for EKS clusters i.e. you don't need to
separately provision or register the EC2 instances that provide compute capacity to run your Kubernetes applications.
• Create, automatically update, or terminate nodes in the cluster with a single operation.
• Node updates and terminations automatically drain nodes to ensure that your applications stay available.
An open-source, flexible, high-performance Kubernetes cluster autoscaler built with AWS. Improves application availability and
cluster efficiency by rapidly launching right-sized compute resources in response to changing application load. Also provides
just-in-time compute resources to meet your application’s needs and automatically optimize a cluster’s compute resource
footprint to reduce costs and improve performance.
• Faster provisioning times mean that you don’t have to overprovision capacity, as in the case of MNGs, which use ASGs
Karpenter consolidation feature
Consolidates pods into as few compute resources as possible using an algorithm that takes into consideration the specifications
of the pods and the cost of disruption of nodes.
Karpenter optimizes compute costs using intelligent and dynamic instance selection and automatic workload
consolidation. It facilitates Kubernetes-native integrations with AWS compute services and features such as EC2 Spot and the AWS
Graviton family of instances. It also allows consistently faster node launches, which minimizes wasted time and cost.
Considerations:
• Karpenter performs particularly well for use cases that require rapid provisioning and deprovisioning large numbers of
diverse compute resources quickly. For example, this includes batch jobs to train machine learning models, run simulations,
or perform complex financial calculations. You can leverage custom resources of nvidia.com/gpu, amd.com/gpu, and
aws.amazon.com/neuron for use cases that require accelerated EC2 instances.
• Do not use Kubernetes Cluster Autoscaler at the same time as Karpenter because both systems scale up nodes in
response to un-schedulable pods. If configured together, both systems will race to launch or terminate instances for these pods.
EKS Data plane right sizing
Scale pods vertically and when it is at optimum size, scale them horizontally. When there is a demand for more pods and
pending pod queue is growing, EKS cluster level auto scaling needs to be considered.
EKS workload resilience within a DC
• Deploy Kubernetes workloads as Deployments instead of individual Pods. This enables automatic pod-level recovery.
• Deploy multiple replicas of a workload to improve availability, and use Horizontal Pod Autoscaling to scale replicas.
• Use probes to help Kubernetes detect failures in application code.
• If pod startup times are high, consider overprovisioning replicas.
• Requires a way to detect and replace unhealthy EC2 instances automatically and scale nodes based on workload requirements.
• Use a Kubernetes autoscaler, such as the Cluster Autoscaler or Karpenter, to ensure that the data plane scales as workloads scale.
• To detect failures in nodes, you can install node-problem-detector and couple it with Descheduler to drain nodes automatically when a failure occurs.
• To ensure the data plane scales faster and recovers quickly from failures:
• Overprovision capacity at the node level
• Reduce node startup time by using optimized AMIs such as Bottlerocket or EKS Optimized Linux
• To reduce container startup times, reduce the size of the container images
EKS workload resilience at zone level
• Spread nodes across zones: Create node groups in multiple AZs when using Managed Node Groups. If nodes are provisioned using Karpenter, you are
already covered as Karpenter’s default behaviour is to provision nodes across AZs.
• Spread workloads across zones: Once capacity is available across AZs, you can use Pod Topology Spread Constraints to spread workloads across multiple
AZs.
• Inter-AZ cost implications: When the workload is distributed across zones, there will be inter-AZ data transfer costs. To reduce such costs while maintaining
multi-AZ resilience, use techniques such as
• Topology aware routing:
• Service meshes:
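A minimal pod-spec fragment for topology spread, assuming the hypothetical app: web label from the earlier Deployment sketch; it would be merged into that Deployment's pod template.

```python
# Hedged sketch: Pod Topology Spread Constraints to balance replicas across AZs
# once multi-AZ node capacity exists.
topology_spread = {
    "topologySpreadConstraints": [{
        "maxSkew": 1,                                  # at most 1 pod of imbalance between AZs
        "topologyKey": "topology.kubernetes.io/zone",  # spread by Availability Zone
        "whenUnsatisfiable": "ScheduleAnyway",         # prefer spreading, but don't block scheduling
        "labelSelector": {"matchLabels": {"app": "web"}},
    }]
}
```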
EKS workload resilience at cluster level
Kubernetes critically relies on add-on functions such as networking, security and observability for its operation. Failure of any
of these components can render a Kubernetes cluster inoperable. To prevent the Kubernetes cluster from becoming a single point of
failure (SPOF), deploy workloads across multiple clusters.
Mission-critical workloads with stringent availability requirements may operate across multiple AWS Regions. This approach
protects against larger-scale disasters or regional outages.
EKS Observability
CloudWatch Container Insights ingests both metrics and logs into CloudWatch Logs. Extraction of events happens as follows:
1. CloudWatch agent or CloudWatch agent for Prometheus: extracts metrics as performance log events
2. Fluent Bit log driver: extracts log events
EKS provides five types of control plane logs. The EKS service integrates these log sources directly into
CloudWatch Logs. These log types are not enabled by default and can be enabled individually (see the sketch after this list).
1. Kubernetes API server component logs
2. Audit
3. Authenticator
4. Controller manager
5. Scheduler
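A minimal sketch of enabling all five control plane log types with boto3, assuming a hypothetical cluster named demo-cluster; pass any subset of the types to enable them individually.

```python
# Hedged sketch: turn on EKS control plane logging; logs land in CloudWatch Logs.
import boto3

eks = boto3.client("eks")
eks.update_cluster_config(
    name="demo-cluster",  # placeholder cluster name
    logging={
        "clusterLogging": [{
            "types": ["api", "audit", "authenticator",
                      "controllerManager", "scheduler"],
            "enabled": True,  # each type can also be enabled individually
        }]
    },
)
```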
• EKS data plane logging uses log aggregation tools such as Fluent Bit or Fluentd.
• Kubernetes log aggregator tools run as DaemonSets and scrape container logs from nodes.
• CloudWatch Container Insights uses either Fluent Bit or Fluentd to collect logs and ship them to
CloudWatch Logs for storage.
• Fluent Bit and Fluentd support ingesting logs into many popular log analytics systems such as
Elasticsearch and InfluxDB giving the ability to change the storage backend for EKS logs by
modifying Fluent Bit or Fluentd’s log configuration.
EKS metrics
Containerized applications generate high-cardinality metrics. CloudWatch Container Insights is specifically designed to
collect these high-cardinality metrics efficiently. It combines all high-cardinality metric values for all dimensions (across all
nodes, instances and applications) into a single performance log event for each time series and pushes them to CloudWatch
Logs in Embedded Metric Format (EMF), illustrated below. CloudWatch Logs extracts the metric data from the log events and sends it to
CloudWatch Metrics.
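A minimal sketch of what an EMF-formatted performance log event looks like; the namespace, dimensions and metric name are illustrative rather than the exact schema Container Insights emits.

```python
# Hedged sketch: one EMF log event. Writing this JSON line to CloudWatch Logs is
# enough for CloudWatch to extract the metric and publish it to CloudWatch Metrics.
import json
import time

emf_event = {
    "_aws": {
        "Timestamp": int(time.time() * 1000),
        "CloudWatchMetrics": [{
            "Namespace": "ContainerInsightsDemo",          # illustrative namespace
            "Dimensions": [["ClusterName", "Namespace", "PodName"]],
            "Metrics": [{"Name": "pod_cpu_utilization", "Unit": "Percent"}],
        }],
    },
    # Dimension values and the metric value live alongside the metadata.
    "ClusterName": "demo-cluster",
    "Namespace": "default",
    "PodName": "web-7d9f",
    "pod_cpu_utilization": 42.5,
}

print(json.dumps(emf_event))
```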
• The CloudWatch agent runs as a sidecar alongside the pods and collects node-level and instance-level metrics.
• For applications instrumented with Prometheus, the CloudWatch agent for Prometheus collects the metric data in Prometheus format,
converts it into CloudWatch metric format, and pushes it to CloudWatch.
App Runner pricing dimensions:
• Execution time of provisioned container instances: GB-hours (memory)
• Execution time of active container instances: vCPU-hours and GB-hours
• Automated deployments: per application per month
• Build fee: per build-minute
• Provisioned container instances: When your application is deployed, you pay for the memory provisioned in each
container instance. Keeping your container instance's memory provisioned when your application is idle ensures it can deliver
consistently low millisecond latency. You can pause the provisioned container instances when not in use to prevent incurring this
cost.
• Active container instances: When your application is processing requests, you switch from provisioned container
instances to active container instances that consume both memory and compute resources.
• You pay for the compute and any additional memory consumed in excess of the memory allocated by your provisioned container instances.
• App Runner automatically scales the number of active container instances up and down to meet the processing requirements of your
application.
• App Runner Budget controls: Set a maximum limit on the number of active container instances your application uses so that costs do not exceed your budget.
App Runner operational benefits
App Runner abstracts away many of the complexities of setting up an application, using a range of AWS services that are fully
integrated and managed within App Runner's service accounts. It provides fully managed CI/CD, security (TLS, encryption at rest),
monitoring and auto scaling capabilities.
• App Runner moves away from the 24/7 always-up-and-running compute model inherent in EC2, ECS and Fargate.
• Managed and simplified scaling: It provides automatic scaling without any configuration required based on the number of concurrent
requests and the maximum number of container instances configured
• Faster scaling: App Runner scaling is fast because all the container instances are pre-provisioned; turning on the CPU of a provisioned instance
is near instantaneous compared with the cold start of a Lambda function, which requires code to be downloaded and the instance warmed
up.
App Runner core capabilities
• Deployment features
• Logging
• Monitoring
• Security
App Runner Auto scaling
With App Runner, multiple concurrent requests can be handled by a single container instance, and this concurrency limit can be controlled by
the user. Requests start moving to an overflow queue once load goes beyond the pre-defined concurrency limit, and App Runner
starts another container instance until it reaches the maximum number of active instances defined (a boto3 sketch of these controls appears
under App Runner Auto scaling controls below).
• There are no scaling rules you need to set up with App Runner (compared to Elastic Beanstalk).
• App Runner uses an Envoy Proxy load balancer internally to distribute concurrent
requests across container instances.
• When the application starts receiving more requests than the maximum number of
container instances can handle, App Runner returns HTTP status code 429, unlike other Amazon
compute services, which keep accepting more and more requests and increase the latency for
the end user.
• When there is no workload, App Runner pauses instances: the CPU is turned off, but memory
is kept active so that resuming is faster once requests start flowing again.
App Runner Auto scaling controls
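A minimal sketch of these controls with boto3; the configuration name and the concurrency/size values are placeholders.

```python
# Hedged sketch: an App Runner auto scaling configuration. MaxConcurrency is the
# per-instance concurrent-request limit before requests overflow and a new
# instance starts; MaxSize caps active instances (a budget control).
import boto3

apprunner = boto3.client("apprunner")
response = apprunner.create_auto_scaling_configuration(
    AutoScalingConfigurationName="web-frontend",  # placeholder name
    MaxConcurrency=80,   # concurrent requests one instance handles
    MinSize=1,           # provisioned (paused) instances kept warm
    MaxSize=10,          # hard cap on active instances
)
# Attach it when creating or updating the service via
# AutoScalingConfigurationArn=response["AutoScalingConfiguration"]["AutoScalingConfigurationArn"]
```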
App Runner pricing model
App Runner charges CPU and memory separately. For active instances both resources are charged; when instances move back to
the provisioned (paused) state, you pay only for memory. This model suits, for example:
• Applications that have periods of little to no usage: when the application is not serving requests, you pay only for memory.
• Business front-end applications that are used during the daytime and not at night.
A back-of-the-envelope cost sketch follows.
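An illustrative calculation for the daytime front-end case above. The unit prices and instance size are placeholders; check current App Runner pricing for your Region before relying on these numbers.

```python
# Hedged sketch: rough monthly cost of an app that is active 8 h/day and paused
# (provisioned, memory only) the remaining 16 h/day.
VCPU_HOUR = 0.064   # assumed $/vCPU-hour for active instances (placeholder)
GB_HOUR   = 0.007   # assumed $/GB-hour of memory (placeholder)

vcpu, memory_gb = 1, 2            # assumed instance size
active_hours = 8 * 30             # busy during the working day
provisioned_hours = 16 * 30       # paused overnight

active_cost = active_hours * (vcpu * VCPU_HOUR + memory_gb * GB_HOUR)   # CPU + memory
provisioned_cost = provisioned_hours * (memory_gb * GB_HOUR)            # memory only

print(f"active:      ${active_cost:.2f}/month")
print(f"provisioned: ${provisioned_cost:.2f}/month")
print(f"total:       ${active_cost + provisioned_cost:.2f}/month")
```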
Serverless web hosting developer experience
App Runner supports full stack development, including both frontend and backend web applications that use HTTP and
HTTPS protocols. These applications include API services, backend web services, and websites. App Runner supports
container images as well as runtimes and web frameworks including Node.js and Python.
Deployment options:
• From container image: App Runner can immediately deploy a container image using the App Runner console or AWS CLI (a boto3 sketch follows this list).
• Use your own CI/CD toolchain: If you have an existing CI/CD workflow that uses AWS CodePipeline, Jenkins, Travis CI, CircleCI, or another CI/CD toolchain, you
can easily add App Runner as your deployment target using the App Runner API or AWS CLI.
• Continuous deployment by App Runner: If you want App Runner to automatically provide continuous deployment for you, you can easily connect to your
existing container registry or source code repository and App Runner will automatically provide a continuous deployment pipeline for you.
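A minimal sketch of deploying a container image to App Runner with boto3; the ECR image URI, role ARN, port and service name are placeholders for your own values.

```python
# Hedged sketch: create an App Runner service from a private ECR image.
import boto3

apprunner = boto3.client("apprunner")
apprunner.create_service(
    ServiceName="web-frontend",  # placeholder service name
    SourceConfiguration={
        "ImageRepository": {
            "ImageIdentifier": "<account>.dkr.ecr.<region>.amazonaws.com/web:latest",
            "ImageRepositoryType": "ECR",
            "ImageConfiguration": {"Port": "8080"},
        },
        # True = App Runner redeploys automatically when a new image is pushed.
        "AutoDeploymentsEnabled": True,
        "AuthenticationConfiguration": {
            "AccessRoleArn": "arn:aws:iam::<account>:role/AppRunnerECRAccessRole"
        },
    },
    InstanceConfiguration={"Cpu": "1 vCPU", "Memory": "2 GB"},
)
```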
Deployment architecture for Serverless web hosting
AWS App Runner is a secure, consistent solution for exposing web applications using the public endpoint or service URL.