AWS Serverless Compute

The document provides an overview of AWS Serverless services, particularly focusing on AWS Lambda, its architecture, lifecycle, and various features such as layers, container image support, and concurrency management. It discusses Lambda's execution phases, including initialization, invocation, and shutdown, as well as strategies for optimizing performance and managing function timeouts and throttling. Additionally, it covers metrics for monitoring Lambda functions and best practices for ensuring efficient execution and resource utilization.


AWS Architect exam

Serverless
Short notes
AWS Serverless Services span three areas: application integration, compute, and data storage.

Lambda pricing dimensions:
• Number of requests per month
• Execution time (GB-seconds) = execution time (seconds) × memory allocated (GB)
• Free tier: 1M requests and 400,000 GB-seconds per month (see the worked example below)
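A rough worked example of the GB-second calculation and free tier, as a Python sketch; the on-demand rates used here are illustrative assumptions, so check current AWS pricing before relying on the numbers.

# Rough monthly Lambda cost estimate (illustrative rates; check current AWS pricing).
REQUEST_PRICE_PER_MILLION = 0.20          # USD per 1M requests (assumed)
GB_SECOND_PRICE = 0.0000166667            # USD per GB-second (assumed)
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000

def monthly_cost(requests: int, avg_duration_ms: float, memory_mb: int) -> float:
    # GB-seconds = invocations x duration (s) x memory (GB)
    gb_seconds = requests * (avg_duration_ms / 1000.0) * (memory_mb / 1024.0)
    billable_requests = max(0, requests - FREE_REQUESTS)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    return (billable_requests / 1_000_000 * REQUEST_PRICE_PER_MILLION
            + billable_gb_seconds * GB_SECOND_PRICE)

# 3M invocations/month, 200 ms average duration, 512 MB memory
print(f"${monthly_cost(3_000_000, 200, 512):.2f}")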

Serverless Compute

AWS Lambda
The ability to run applications and services without thinking about servers or capacity provisioning.
Lambda limits: the total number of concurrent function executions is capped per account, per Region.
Lambda function anatomy

• Lambda processes a single event per execution environment (container) at a time per invocation; execution environments are re-used across function invocations.
• The function code is executed from top to bottom only on the first invocation; on subsequent invocations only the handler is executed.
• First, load dependencies by importing packages, initialize database connections and file caches, and retrieve environment variables and configuration settings. This code runs once per execution environment and is kept frozen for subsequent invocations.
• Per-function configuration data -> environment variables (Dev, Prod etc.)
• Cross-function configuration data -> Parameter Store / Secrets Manager
• Then execute the function handler which, depending on the event data, calls the sub-functions that carry the transformation logic.
Lambda function anatomy

• To store dynamic variables and configuration data, use:
  • Lambda environment variables
  • AWS Systems Manager Parameter Store
  • AWS Secrets Manager
• Separate business logic from the handler function so that the business logic can be re-used across Lambda functions.
• Break the business logic into distinct units and place them in sub-functions outside the handler, so that the handler lazy-loads them only when required (see the handler sketch below).
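A minimal Python sketch of this anatomy (the table name, environment variable, and handler logic are hypothetical): dependencies and clients are initialized once per execution environment, business logic lives in a sub-function outside the handler, and only the handler runs on warm invocations.

import os
import boto3

# Init code: runs once per execution environment (cold start), then stays frozen.
TABLE_NAME = os.environ["TABLE_NAME"]          # per-function configuration (hypothetical)
dynamodb = boto3.resource("dynamodb")          # SDK client initialized outside the handler
table = dynamodb.Table(TABLE_NAME)

def transform(record: dict) -> dict:
    """Business logic kept outside the handler so it can be reused and tested."""
    return {"id": record["id"], "total": sum(record.get("amounts", []))}

def lambda_handler(event, context):
    # Only the handler runs on warm invocations.
    item = transform(event)
    table.put_item(Item=item)
    return {"statusCode": 200, "body": item["id"]}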
Retrieving Secrets and Config data

• Environment variables: per-function configuration.
• Systems Manager Parameter Store: when multiple functions share a secret or configuration value (cross-function).
• Parameter Store is integrated with Secrets Manager; this allows retrieval of parameters and secrets using a single API call (see the sketch below).
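A sketch of retrieving a shared configuration value from Parameter Store with boto3; the parameter name is hypothetical, and the value is fetched once during init so warm invocations reuse it.

import boto3

ssm = boto3.client("ssm")

# Fetched once per execution environment; reused on warm invocations.
# Parameter Store can also surface Secrets Manager secrets via the
# /aws/reference/secretsmanager/<secret-name> prefix.
DB_PASSWORD = ssm.get_parameter(
    Name="/prod/db/password",        # hypothetical parameter name
    WithDecryption=True,
)["Parameter"]["Value"]

def lambda_handler(event, context):
    # Use the cached secret; no Parameter Store call on the warm path.
    return {"statusCode": 200}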
Lambda layers
In a typical serverless application, common code (shared libraries, database connections, credentials etc.) may be shared across
multiple Lambda functions. Lambda layers provide a mechanism to share such code across functions.
• Lambda layer is a .zip file archive that contains supplementary code or data. Layers usually contain library
dependencies, a custom runtime, or configuration files.

Without layers there is a lot of duplicate code in pre-handlers (dependencies, common configurations) and helper functions.

• You can include up to five layers per function. You can use layers only with Lambda functions deployed as a .zip file archive. Lambda
functions packaged as container images do not support adding Lambda layers to the function configuration.
• For functions defined as a container image, package your preferred runtime and all code dependencies when you create the container image.
Creating and accessing Lambda layers

To create a layer, package your dependencies into a .zip file, similar to how you create a normal deployment package. Steps in
creating a Lambda layer are as follows,
1. Package your layer content: create a .zip file archive.
2. Create the layer in Lambda: publish the Lambda layer version by specifying the .zip file or an S3 URL pointing to the zipped package.
3. Add the layer to the Lambda function(s): you can add up to five Lambda layers to a function configuration. When you add a layer to a function, Lambda extracts the layer contents into the /opt directory of the function's execution environment, which gives the function access to the layer contents.

• When creating a Lambda layer, make sure that the layers you add to a function are compatible with the runtime and instruction set architecture of the function (a publishing sketch follows below).
• Lambda functions packaged as container images do not support adding Lambda layers to the function configuration.
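A sketch of steps 2 and 3 using boto3; the bucket, key, layer name and function name are hypothetical.

import boto3

lam = boto3.client("lambda")

# Step 2: publish a layer version from a zipped package in S3 (bucket/key are hypothetical).
layer = lam.publish_layer_version(
    LayerName="shared-helpers",
    Content={"S3Bucket": "my-artifacts-bucket", "S3Key": "layers/shared-helpers.zip"},
    CompatibleRuntimes=["python3.12"],
    CompatibleArchitectures=["x86_64"],
)

# Step 3: attach the layer (up to five per function) to a .zip-packaged function.
lam.update_function_configuration(
    FunctionName="order-processor",              # hypothetical function name
    Layers=[layer["LayerVersionArn"]],
)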
Lambda Container image support

Allows developers to package and deploy container images as Lambda functions using familiar
container image build and deployment flows

• Containers offer easier dependency management and application building with mature tooling such as Docker.
• This capability allows developers to use a consistent set of tools for both container-based and Lambda-based applications.
• You can deploy large applications with either AWS-provided or third-party images.
Lambda extensions
Lambda extensions are a way to easily integrate Lambda with monitoring, observability, security, and governance tools

• You add extensions to .zip archive functions using Lambda layers or include them in the image for functions
deployed as container images.
• Extensions can be used for use-cases such as
1. Capturing diagnostic information before, during, and after function invocation
2. Automatically instrumenting your code without needing code changes
3. Fetching configuration settings or secrets before the function invocation
4. Detecting and alerting on function activity through hardened security agents, which can run as separate processes from the function

• Extensions can run in either of the following two modes


1. Internal extensions run as part of the runtime process, in-process with your code. They allow you to modify the start-up of the runtime process using language-specific
environment variables and wrapper scripts. Internal extensions enable use cases such as automatically instrumenting code.
2. External extensions allow you to run separate processes from the runtime but still within the same execution environment as the Lambda function. External extensions can start
before the runtime process and can continue after the runtime shuts down. External extensions enable use cases such as fetching secrets before the invocation, or sending
telemetry to a custom destination outside of the function invocation. These extensions run as companion processes to Lambda functions.
Lambda function lifecycle

INIT Phase:
• Lambda creates or unfreezes an execution environment with the configured resources, downloads the code for the function and all layers, initializes any
extensions, initializes the runtime, and then runs the function’s initialization code (the code outside the main handler).
• The Init phase happens either during the first invocation, or before function invocations if you have enabled provisioned concurrency.
• The Init phase is split into three sub-phases:
1. Extension Init - starts all extensions
2. Runtime Init - bootstraps the runtime
3. Function Init - runs the function's static code
• These sub-phases ensure that all extensions and the runtime complete their setup tasks before the function code runs.
INVOKE Phase:
• Lambda invokes the function handler. After the function runs to completion, Lambda prepares to handle another function
invocation.
SHUTDOWN Phase:
• If the Lambda function does not receive any invocations for a period of time, this phase initiates.
• Lambda shuts down the runtime, alerts the extensions to let them stop cleanly, and then removes the environment.
Lambda function lifecycle

• With extensions, the lifecycle of a Lambda function changes as follows:


1. An updated Init phase: There are now three discrete Init tasks: extensions Init, runtime Init, and function Init. This creates an order where extensions
and the runtime can perform setup tasks before the function code runs.
2. Greater control during invocation: During the invoke phase, as before, the runtime requests the invocation event and invokes the function handler. In
addition, extensions can now request lifecycle events from the Lambda service. They can run logic in response to these lifecycle events, and respond to
the Lambda service when they are done. The Lambda service freezes the execution environment when it hears back from the runtime and all
extensions. In this way, extensions can influence the freeze/thaw behavior.
3. Shutdown phase: the shutdown phase is now exposed so that extensions can stop cleanly when the execution environment shuts down. The Lambda service sends a Shutdown event, which tells the runtime and extensions that the environment is about to be shut down.
Lambda function execution cycle

Diagram: cold start vs. warm start execution cycle.

Lambda Function Concurrency

Lambda function concurrency is the number of requests handled by the function at a given time; each request is handled by a separate function instance (execution environment).

Diagram: the first execution environment becomes warm and available for another invocation, so the function is warm-started on receipt of Request #6; Request #9 requires a cold start because an additional execution environment is needed to handle the concurrency.
Lambda Bursting
How fast Lambda functions can scale is determined by bursting & concurrency limits

While the concurrency limit applies at all times, how fast Lambda can scale is determined by the burst size. The lower the burst size, the more time it takes Lambda to scale the number of execution environments up to the maximum concurrency.
Lambda Concurrency vs. Throughput

Reducing execution time is critical to make the best use of the entitled concurrency. Factors that affect execution time include:
• Dependent-service timeouts
• How fast dependent services can respond
• Whether a synchronous response is required
• How optimized the code invoking dependent services is

Estimated concurrency (for steady-state traffic, limited by the account entitlement):

Estimated concurrency = average throughput (requests/second) × average function execution time (seconds)

For example, 100 requests/second at an average execution time of 0.5 seconds requires roughly 50 concurrent executions. Throughput and execution time together represent the demand on the function.

• An increase in execution time increases the concurrency required to handle the same volume; increased latency from dependent services and bad coding practices both increase execution time.
• The higher the concurrency, the more execution environments run, resulting in even higher load on dependent services.
Lambda Execution Context re-use
Once the Lambda function is bootstrapped, it establishes the execution context. A bootstrapped Lambda function may be invoked multiple times, and everything saved in the execution context can be re-used across invocations.

• Initialize SDK clients and database connections outside of the function handler, and cache static assets locally in the /tmp directory. Subsequent invocations processed by the same instance of your function can reuse these resources. This saves cost by reducing function run time.
• What can be saved in the execution context?
  • Anything that takes time to initialize
  • A file saved in memory or /tmp
  • Cacheable API call results
• When a Lambda function is called for the first time, it performs a cold start.
• Each cold start bootstraps the execution context: it imports libraries and initializes global variables by executing any code outside the handler function, including code that initializes connections.
• Lambda has to cold start a function whenever:
  • the function is invoked for the first time
  • the function is updated
  • the function is started or modified, or after about 15 minutes of idle time (the lifetime of the micro-VM provisioned for Lambda)
• Lambda handler code must assume that it is stateless; however, it can check whether, for example, a database connection is already saved in the execution context before creating one, to benefit from execution context re-use and reduce execution time (see the sketch below).
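A sketch of execution context re-use for a database connection, assuming a hypothetical MySQL endpoint reached with pymysql; the connection is created lazily and re-checked before use so warm invocations skip the connection setup.

import os
import pymysql

# Module-level slot: survives across warm invocations of the same execution environment.
_connection = None

def get_connection():
    """Create the database connection only if this execution environment doesn't have one yet."""
    global _connection
    if _connection is None or not _connection.open:
        _connection = pymysql.connect(
            host=os.environ["DB_HOST"],          # hypothetical endpoint, e.g. an RDS Proxy
            user=os.environ["DB_USER"],
            password=os.environ["DB_PASSWORD"],
            database="orders",
            connect_timeout=5,
        )
    return _connection

def lambda_handler(event, context):
    with get_connection().cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM orders")
        (count,) = cur.fetchone()
    return {"orderCount": count}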
Lambda Provisioned concurrency
Use for Interactive and latency sensitive Lambda workloads

Lambda Provisioned concurrency initializes a requested number of execution environments so that they are prepared to
respond immediately to your function's invocations. A chargeable feature that is typically configured for interactive and
latency sensitive Lambda functions

Diagram: with provisioned concurrency, execution environments are initialized up to the end of the function Init phase before any requests arrive (initial start with provisioned concurrency), compared against a normal cold start.


Scaling with Provisioned Concurrency
Use application auto scaling on utilized provisioned concurrency

• Rather than setting a fixed value for provisioned concurrency and paying for it whether it is used or not, register the Lambda function as a scalable target and apply a target-tracking scaling policy on provisioned concurrency utilization; this achieves a more cost-effective use of provisioned concurrency.
• Example policy: scaling type = target tracking; scaling metric = provisioned concurrency utilization; target = 70% (see the sketch below).
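A sketch of that target-tracking setup with boto3 and Application Auto Scaling; the function name and alias are hypothetical.

import boto3

aas = boto3.client("application-autoscaling")

# Register the alias (provisioned concurrency is configured on a version or alias) as a scalable target.
aas.register_scalable_target(
    ServiceNamespace="lambda",
    ResourceId="function:order-processor:live",          # hypothetical function:alias
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    MinCapacity=1,
    MaxCapacity=50,
)

# Target-tracking policy on provisioned concurrency utilization, targeting 70%.
aas.put_scaling_policy(
    PolicyName="pc-utilization-70",
    ServiceNamespace="lambda",
    ResourceId="function:order-processor:live",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
        },
    },
)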
Provisioned Concurrency use-cases

• Reduces start time to the function handler to <100 ms.
• Soft limit of 500 provisioned execution environment creations per minute.
• Requests above provisioned concurrency follow on-demand Lambda limits and behaviours for cold starts, bursting and pricing.
Lambda Reserved Concurrency
• Lambda service provides a shared pool of Concurrency per region
• Heavily utilized function may use up all the concurrency
• To protect mission critical functions from starvation for concurrency, reserved concurrency has been introduced
• Reserved Concurrency is configured per function

Reserved concurrency ensures the function can always use that amount of concurrency; it also restricts the function to that concurrency value.

Use case: restricting the function to its concurrency value.

• Configuring reserved concurrency when Lambda functions are invoked asynchronously by AWS services (S3, SQS etc.) helps smooth the effect on the downstream services invoked by those Lambda functions.
• With reserved concurrency, an overload of messages arriving at the SQS queue will not result in massive spikes in DynamoDB WCUs (see the configuration sketch below).
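A configuration sketch with boto3 (function name and value are hypothetical); note that the same API doubles as the kill switch mentioned later by setting the reservation to zero.

import boto3

lam = boto3.client("lambda")

# Reserve 50 concurrent executions for a critical function (and cap it at 50).
lam.put_function_concurrency(
    FunctionName="order-processor",          # hypothetical function name
    ReservedConcurrentExecutions=50,
)

# Setting the value to 0 acts as a kill switch: all invocations are throttled
# until the reservation is raised or removed.
# lam.put_function_concurrency(FunctionName="order-processor", ReservedConcurrentExecutions=0)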
Reserved concurrency vs. Provisioned concurrency

• Two functions (DEV and PROD) configured with reserved concurrency are not impacted by other functions using the shared concurrency pool.
• Provisioned concurrency configured on the PROD function ensures improved response time to callers, as the function returns sooner.
Lambda throttling
A Lambda function can be throttled with the following exception when it reaches its concurrency limits or when a downstream API it calls reaches its own limits:
software.amazon.awssdk.services.lambda.model.TooManyRequestsException: Rate Exceeded. (Service: Lambda, Status Code: 429)

1. Verify which resource is throttled
• If the Lambda function is throttled, the AWS/Lambda Throttles metric will have non-zero values.
• If not, throttling is happening on API calls to downstream services in the function code; check the function logs.
• Implement exponential backoff when invoking the downstream API.

2. If the Lambda function is throttled, check the function's concurrency metrics
• Are the ConcurrentExecutions and Throttles metrics overlapping in time?
• Are you reaching the burst concurrency limits for the Region? If so, configure provisioned concurrency.
• Has the function duration increased? Use X-Ray to find the root cause.
• Does the function have increased error metrics?

Note: If function errors increase, frequent retries can easily exhaust the concurrency limit.

3. Modify the calling application to use exponential backoff when calling the Lambda function, to reduce the chance of function throttling (see the sketch below).

4. Use dead-letter queues if the caller is asynchronous, to protect your data from loss due to function throttling.
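A sketch of step 3, exponential backoff with jitter around a synchronous invoke; the function name and retry limits are hypothetical, and boto3's built-in retries may already cover simple cases.

import random
import time
import boto3
from botocore.exceptions import ClientError

lam = boto3.client("lambda")

def invoke_with_backoff(function_name: str, payload: bytes, max_attempts: int = 5):
    """Retry synchronous invokes with exponential backoff and full jitter on throttling (HTTP 429)."""
    for attempt in range(max_attempts):
        try:
            return lam.invoke(FunctionName=function_name, Payload=payload)
        except ClientError as err:
            if err.response["Error"]["Code"] != "TooManyRequestsException":
                raise                                    # not a throttle; surface the error
            # Sleep between 0 and 2^attempt seconds, capped at 20 s.
            time.sleep(random.uniform(0, min(20, 2 ** attempt)))
    raise RuntimeError(f"{function_name} still throttled after {max_attempts} attempts")

# Usage (hypothetical function name):
# invoke_with_backoff("order-processor", b'{"orderId": "123"}')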
Lambda function timeouts
A Lambda function times out when it cannot process the request within the function timeout setting and logs “Task
Timeout” in the CloudWatch logs.
Typical causes
• The requester has configured an insufficient timeout setting for the function to do a cold start and process the request.
• The function does not have sufficient resources configured.
• A backend service call takes longer, preventing the Lambda function from completing processing before the function timeout is reached.

Diagnosis
• CloudWatch Logs: determine the request ID from the log line that captures the Task Timeout and look for clues in the application logs for the language environment.
• X-Ray traces: use traces to determine the causes of delays if backend services are accessed.
• VPC Flow Logs: identify network issues relating to function execution.
• Lambda Insights: collects system-level metrics such as CPU and memory, plus diagnostics information pointing to Lambda cold starts and worker shutdowns.

Best practices to avoid timeouts
1. Make sure that your Lambda function is idempotent: due to transient network issues, a client may resend a request, causing the Lambda function to receive duplicates (see the idempotency sketch below).
2. Initialize your function's static logic outside of the function handler.
3. Verify that the retry count and timeout settings on the AWS SDK that you're using allow enough time for your function to initialize.
4. Verify that your Lambda function has enough system resources.
5. Verify that your Lambda function is configured to work within the maximum timeout settings of any integrated AWS services: for example, API Gateway has a maximum timeout of 29 seconds, so if the Lambda function is invoked synchronously it must return within that timeout.
6. Confirm that there's a valid network path to the endpoint that your function is trying to reach.
7. If required, set up provisioned concurrency for the Lambda function.
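One way to implement best practice 1 (idempotency) is a conditional write against a deduplication table; this is a sketch, and the table name, key and event shape are hypothetical.

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
idempotency_table = dynamodb.Table("request-idempotency")   # hypothetical table, key: requestId

def lambda_handler(event, context):
    request_id = event["requestId"]
    try:
        # Record the request ID only if it has not been seen before.
        idempotency_table.put_item(
            Item={"requestId": request_id},
            ConditionExpression="attribute_not_exists(requestId)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return {"statusCode": 200, "body": "duplicate request ignored"}
        raise
    # ... process the request exactly once ...
    return {"statusCode": 200, "body": "processed"}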
Lambda function metrics

Invocation metrics
• Invocations
• Errors
• Throttles
• ProvisionedConcurrencyInvocations
• ProvisionedConcurrencySpilloverInvocations

Concurrency metrics
• ConcurrentExecutions
• ProvisionedConcurrentExecutions
• ProvisionedConcurrencyUtilization
• UnreservedConcurrentExecutions

Performance metrics
• Duration [supports percentiles]
• PostRuntimeExtensionsDuration
• IteratorAge [Kinesis Data Streams, DynamoDB Streams]
• OffsetLag [Kafka event sources]

Async invocation metrics
• AsyncEventsReceived
• AsyncEventAge
• AsyncEventsDropped

• ConcurrentExecutions is the number of function instances that are processing events. If this number reaches your concurrent executions quota for the
Region, or the reserved concurrency limit on the function, then Lambda throttles additional invocation requests.
• OffsetLag for Kafka streams is the difference in offset between the last record written to a topic and the last record that your
function's consumer group processed
• IteratorAge metric measures the time between when a stream receives the record and when the event source mapping sends
the event to the function.
• The AsyncEventAge metric represents the time between when Lambda successfully queues the event and when the function is invoked. The value of this metric increases when events are retried due to invocation failures or throttling.
Lambda invocation styles
Push model (via the Lambda API)
• Internally, the Lambda service receives the request, puts it into an event queue and acknowledges the request; another process picks the event from the queue and triggers the Lambda function.

Pull model (pick from a stream or queue)
• The poll-based invocation model is designed to let you integrate with AWS stream- and queue-based services with no code or server management. The Lambda service polls the supported services on your behalf, retrieves records, and invokes your Lambda function.
Lambda synchronous invocations

• Lambda service sends the events directly to the function, waits for the response and sends the function's response back
to the invoker
• For functions with a long timeout, your client might be disconnected during synchronous invocation while it waits for a
response. Configure your HTTP client, SDK, firewall, proxy, or operating system to allow for long connections with timeout
or keep-alive settings
Lambda asynchronous invocation model
Services such as S3, CloudWatch Logs, EventBridge and SNS invoke Lambda functions asynchronously.
These services call the event-invoke frontend service of Lambda and send the event, which is then authenticated and authorized.

• The authorized request is sent to an internal event queue implemented using SQS, and the caller is then acknowledged.

• These internal queues are constantly polled by a number of Lambda pollers; once a message is detected in the internal queue allocated to a poller, it uses the same synchronous invocation mechanism used when Lambda is invoked synchronously by a service and sends the event to the function.
Failure handling in Asynchronous invocations

Supports two built-in mechanisms to handle invocation errors - Dead Letter Queues (DLQ) and Destinations.
1. Dead-letter queue (DLQ): gives you more control over message handling for all asynchronous invocations, including those delivered via AWS events (S3, SNS, IoT, etc.). Set up a DLQ by configuring the 'DeadLetterConfig' property when creating or updating your Lambda function. You can provide an SQS queue or an SNS topic as the 'TargetArn' for your DLQ, and AWS Lambda will write the event object that invoked the function to this endpoint after the standard retry policy (2 additional retries on failure) is exhausted.
2. Destinations: gives you the ability to handle the failure of function invocations along with their success. When a function invocation fails, such as when retries are exhausted or the event age has been exceeded (hitting its TTL), Destinations routes the record to the destination resource for every failed invocation for further investigation or processing.
• Destinations provide more useful capabilities by:
  • passing additional function execution information, including code exception stack traces
  • supporting more destination services, e.g. Lambda as a destination in addition to SNS and SQS
• Destinations and DLQs can be used together and at the same time, although Destinations should be considered the preferred solution. If you already have DLQs set up, existing functionality does not change and Destinations does not replace existing DLQ configurations. If both Destinations and a DLQ are used for failure notifications, function invoke errors are sent to both the DLQ and the Destinations targets. (A configuration sketch follows below.)
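A sketch of configuring both mechanisms with boto3; the function name and target ARNs are hypothetical.

import boto3

lam = boto3.client("lambda")

# DLQ: send events that exhaust the standard async retry policy to an SQS queue.
lam.update_function_configuration(
    FunctionName="order-processor",
    DeadLetterConfig={"TargetArn": "arn:aws:sqs:us-east-1:123456789012:order-dlq"},
)

# Destinations: route both successful and failed async invocation records, and tune retries/event age.
lam.put_function_event_invoke_config(
    FunctionName="order-processor",
    MaximumRetryAttempts=2,
    MaximumEventAgeInSeconds=3600,
    DestinationConfig={
        "OnSuccess": {"Destination": "arn:aws:events:us-east-1:123456789012:event-bus/orders"},
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:order-failures"},
    },
)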
Lambda Asynchronous invocations

• If the function returns an error, Lambda automatically retries the invocation (two additional attempts by default).

• If the function cannot process the event due to concurrency limits or errors of the Lambda service (HTTP 429 or 500 responses), Lambda puts the event back on the event queue and retries for up to six hours with an exponential back-off.
Tuning Async invokes

• It is possible to configure a destination for invocation records, separately for successful runs as well as failures (see diagram, where upon
a successful run, invocation record is pushed to the Event Bus of EventBridge)
Lambda Destinations for Async invocations
Enabling further actions based on result of Lambda function w/o need for coding

On success: you can use this to monitor the health of your serverless applications via execution status, or build workflows based on the invocation result.
On failure: use either Lambda Destinations or DLQs; Destinations provide detailed failure traces.

• Route asynchronous function results as an Execution Record to a destination resource without writing additional
code.
• Execution record: [version, timestamp, request context, request payload, response context, and response payload]
• For each execution status such as Success or Failure you can choose one of four destinations: another Lambda function, SNS, SQS, or EventBridge.

• Lessen the coding effort to realize event-driven microservices architectural pattern using Lambda. Now Lambda
functions can communicate with each other via Destinations which are ideally suited for asynchronous communication
Lambda stream based (poller) invoke model

With the stream-based (poller) invoke model, the Lambda service polls the stream or queue for events on the function's behalf and then invokes the function to process them.

• Lambda pollers will be configured to poll on the event sources defined in line with event source mapping settings.
• Pollers read the messages/event records from the source, then filter them, batch them and invoke the Lambda frontend invoke service synchronously.
• SQS and SNS are supported as event destinations for Kinesis data streams and DynamoDB streams
Lambda Event Source Mapping

• An event source mapping is a Lambda resource that reads from an event source and invokes the Lambda function.
• In the Lambda poller invocation model, Lambda service pollers read messages from the event source; this behaviour can be tuned using event source mapping settings.
• Depending on the event source, different tuning parameters are available.
Lambda Event Source Mapping

• When a Lambda function is configured to read from an event source (Kinesis Data Streams, DynamoDB Streams, SQS), the Lambda service invokes the function for each message, record, or batch of them.
• The event source mapping defines how the Lambda service handles incoming messages or records from the event source.
• An event source mapping is an AWS Lambda resource that reads from an event source and invokes a Lambda function. Event source mappings process items from a stream or queue for AWS services that don't invoke Lambda functions directly.
• The event source mapping component of the Lambda service uses the Lambda execution role permissions to read from the event source service (Kinesis Data Streams, SQS queue etc.). It maps a function to a stream or a queue.
Filtering event sources for Lambda functions

• Event filtering in an event source mapping configuration allows the Lambda service to filter messages or records before invoking the Lambda function. This reduces the calls made to the Lambda function, simplifies code (you can remove the code that checks whether a message or record should be processed) and reduces cost.

• Filtering is supported for DynamoDB Streams, Kinesis Data Streams and SQS. Filtered-out messages are still read from the stream or queue (and deleted from the event source in the case of SQS), so ensure they are not required any further.

• In the example, event filtering is used to ensure the correct Lambda function is called depending on the event type (see the sketch below).
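A sketch of an event source mapping with a filter, assuming a hypothetical SQS queue and a hypothetical eventType field in the message body.

import json
import boto3

lam = boto3.client("lambda")

# Invoke the function only for messages whose body has eventType == "ORDER_CREATED"
# (SQS event source; queue ARN and function name are hypothetical).
lam.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:orders-queue",
    FunctionName="order-created-handler",
    BatchSize=10,
    FilterCriteria={
        "Filters": [
            {"Pattern": json.dumps({"body": {"eventType": ["ORDER_CREATED"]}})}
        ]
    },
)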
Tuning Lambda Stream invocation
Kinesis Data Streams and DynamoDB streams

Kinesis Data Streams:
• Up to five (5) consumer functions per stream*; each shard is polled once a second.
  *In fact, the five consumers per stream include Kinesis Data Firehose and Kinesis Data Analytics consumers.

DynamoDB Streams:
• Up to two (2) functions per stream; polled four times a second.

Tuning retries
1. MaximumRetryAttempts
2. MaximumRecordAgeInSeconds
3. BisectBatchOnFunctionError:
   • Addresses the "poison pill" problem, where a single corrupt message can stall processing of an entire shard.
   • Splits the received batch of stream records so that only the corrupted stream record is discarded. This allows you to easily separate the malformed data record from the rest of the batch and process the remaining records successfully.

Tuning for performance
4. Handling high-throughput streams: increase the parallelization factor (ParallelizationFactor, up to 10) to parallelize invocation of the Lambda function for records from a single shard, so processing keeps up with the stream.
5. Handling low-throughput streams: increase MaximumBatchingWindowInSeconds so the Lambda service waits for a defined time to ensure an adequate number of records has arrived. Use when latency is not a concern. (A tuning sketch follows below.)
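A sketch of these tuning knobs on a Kinesis event source mapping; the stream, function name and values are hypothetical and would normally be chosen from load testing.

import boto3

lam = boto3.client("lambda")

# Kinesis event source mapping with retry and performance tuning (ARN/name are hypothetical).
lam.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/clickstream",
    FunctionName="clickstream-processor",
    StartingPosition="LATEST",
    BatchSize=500,
    MaximumBatchingWindowInSeconds=5,     # wait up to 5 s to fill a batch on low-throughput streams
    ParallelizationFactor=4,              # up to 10 concurrent batches per shard
    MaximumRetryAttempts=3,
    MaximumRecordAgeInSeconds=3600,       # discard records older than 1 hour
    BisectBatchOnFunctionError=True,      # split failing batches to isolate a poison-pill record
)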
Tuning Lambda Stream invocation

Diagrams: a Kinesis data stream processed by Lambda stream pollers, and the inside of a Lambda stream poller.

• Stream pollers of the Lambda service pick a record or batch of records from the stream and synchronously invoke Lambda functions.
• Multiple Lambda functions, along with other consumers, can consume the same stream.
Lambda for Streaming Analytics
Lambda networking - VPC Attached Lambda functions
Accessing public services for VPC attached Lambda functions

• Allows control of access to public services for Lambda functions.
• A Lambda function can only be invoked via the Lambda service public API (over the public internet or an interface endpoint); the VPC-to-VPC NAT (V2N) ENI exists to let Lambda functions integrate with VPC services or on-premises services.
• ENIs are created at function creation time, to avoid long cold starts that would occur if they were created at invocation time.
Default Lambda function vs. VPC attached Lambda function

• Strict exfiltration requirements: You need to assign policies to what traffic can go out from the VPC etc.
• Specific IP address: All traffic originating from Lambda function must come from a specific IP address, for
instance for traffic inspection purpose, by assigning the originating IP address into an allow-list of a traffic
inspection appliance
Lambda networking - Interface Endpoints for Lambda service

• VPC resources can establish connections with Lambda functions privately using interface endpoints without the need for
NAT Gateways or Public IP addresses and Internet Gateways
• You can call any of the Lambda API operations from your VPC. For example, you can invoke
the Lambda function by calling the Invoke API from within your VPC.
• Lambda purges idle connections over time, so you must use a keep-alive directive to maintain persistent connections.
Attempting to reuse an idle connection when invoking a function results in a connection error. To maintain your persistent
connection, use the keep-alive directive associated with your runtime.
Lambda Networking Best Practices

1. Right-size your VPC with sufficient IP addresses to accommodate Lambda function V2N ENIs.
2. Use Lambda IAM condition keys to control Lambda function access to specific VPCs:
   • lambda:VpcIds
   • lambda:SubnetIds
   • lambda:SecurityGroupIds
Note: VPC endpoints are far cheaper than accessing AWS public services via a NAT Gateway.
Using Lambda function condition keys to control access to VPC resources

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt159186333252",
      "Action": ["lambda:CreateFunction", "lambda:UpdateFunctionConfiguration"],
      "Effect": "Deny",
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringNotEquals": {
          "lambda:SubnetIds": ["subnet-046c0d0c487b0515b", "subnet-091e180fa55fb8e83"]
        }
      }
    },
    {
      "Sid": "Stmt159186333253",
      "Action": ["lambda:CreateFunction", "lambda:UpdateFunctionConfiguration"],
      "Effect": "Deny",
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringNotEquals": {
          "lambda:SecurityGroupIds": ["sg-0a56588b3406ee3d3"]
        }
      }
    }
  ]
}

• Attaching the above IAM policy as a Service Control Policy to your account allows the Lambda service to associate Lambda functions only with the indicated subnet/security group combinations.
• Allows access to the specific private VPC only.
• Allows access to MySQL databases that are in the given security groups in two AZs.
Lambda security model

• Execution role: what the Lambda function can do. The execution role is created and assigned when the Lambda function is created.
• Function policy (a resource-based policy): who can invoke the Lambda function. The function policy is created when you add a trigger to the Lambda function.
• Use least privilege; for example, the policy above allows API Gateway to invoke the Lambda function only when a particular method is invoked (see the sketch below).
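A sketch of adding such a resource-based policy statement with boto3; the function name, API ID and route are hypothetical.

import boto3

lam = boto3.client("lambda")

# Resource-based function policy: allow API Gateway to invoke the function,
# but only for GET /users on one API stage (ARNs/IDs are hypothetical).
lam.add_permission(
    FunctionName="get-users",
    StatementId="apigw-get-users",
    Action="lambda:InvokeFunction",
    Principal="apigateway.amazonaws.com",
    SourceArn="arn:aws:execute-api:us-east-1:123456789012:a1b2c3d4e5/prod/GET/users",
)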
Lambda security best practices

1. Best practices for defining Lambda execution role

2. Use smaller functions and apply least privilege
• It is best practice to define a Lambda function per API method.
• This way the Lambda function's execution role can be limited to exactly what that method's execution expects. For example, the execution policy for GET /users may only need read permissions on the DynamoDB or RDS table, not write permissions. If one Lambda function serves both the GET and POST methods, write permissions to the downstream databases may be granted even for GET executions.
Lambda security best practices

3. Store secrets used by Lambda code in Secrets Manager
• Do not store secrets in the function's environment variables. Since these need to be managed per function, it can become very complicated; also, even though environment variables are encrypted by the Lambda service, they are accessible to anyone who has access to the function configuration.

4. Centrally manage database credentials using Secrets Manager
• Lambda can use IAM-based permissions to access the database, as RDS Proxy will manage the database secrets.
Lambda security best practices

5. Secure coding practices for Lambda function development

6. VPC enable Lambda functions if they want to access VPC resources


RDS Proxy for Lambda
Reduces connection-handling load on RDS, typically to support serverless concurrency.

• Developer experience: no need to handle connection pooling or clean-up of idle connections, so function code stays lean.
• Load on the database goes down, so the same database footprint can handle more connections.
• Integrates with Secrets Manager for simple authentication (see the connection sketch below).
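A sketch of connecting through RDS Proxy with IAM authentication instead of a stored password; the proxy endpoint, user and CA bundle path are hypothetical, and the execution role would need rds-db:connect permission.

import os
import boto3
import pymysql

PROXY_HOST = os.environ["PROXY_HOST"]      # hypothetical, e.g. my-proxy.proxy-xxxx.us-east-1.rds.amazonaws.com
DB_USER = os.environ["DB_USER"]

rds = boto3.client("rds")

def connect():
    # Short-lived IAM auth token replaces the database password.
    token = rds.generate_db_auth_token(
        DBHostname=PROXY_HOST, Port=3306, DBUsername=DB_USER
    )
    return pymysql.connect(
        host=PROXY_HOST,
        user=DB_USER,
        password=token,
        database="orders",
        ssl={"ca": "/opt/rds-ca-bundle.pem"},   # TLS is required for IAM auth (path is hypothetical)
        connect_timeout=5,
    )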
Lambda versions and aliases
Lambda deployments using CodeDeploy
• Lambda is integrated with CodeDeploy for automated rollout
with traffic shifting. CodeDeploy supports multiple traffic
shifting options
1. Canary
2. Linear
3. All at once
• CodeDeploy also supports Alarms and Hooks during
deployment
• Alarms: These instruct CloudWatch to monitor the deployment
and trigger an alarm if any errors occurred during rollout. Any
alarms would automatically roll back your deployment.
• Hooks give you the option to run pre-traffic and post-traffic test
functions that run sanity checks before traffic-shifting starts to the
new version and after traffic-shifting completes

• A SAM template can be used to configure CodeDeploy traffic-shifting policies; SAM uses CodeDeploy under the covers, together with the Lambda alias traffic-shifting feature, to implement canary deployments.

Diagram: pre-defined deployment configurations supported by CodeDeploy for Lambda deployments.
Lambda Best Practices
Summary of Lambda best practices (note: Lambda encrypts environment variables).

Function code
• Take advantage of execution environment reuse to improve the performance of your function: initialize SDK clients and database connections outside of the function handler.
• To avoid potential data leaks across invocations, don't use the execution environment to store user data, events, or other information with security implications.
• Use a keep-alive directive to maintain persistent connections; Lambda purges idle connections over time.
• Use environment variables to pass operational parameters to your function.
• Control the dependencies in your function's deployment package.
• Minimize your deployment package size to its runtime necessities.
• Reduce the time it takes Lambda to unpack deployment packages authored in Java by putting your dependency .jar files in a separate /lib directory.
• Minimize the complexity of your dependencies; prefer simpler frameworks that load quickly on execution environment startup.
• Avoid using recursive code in your Lambda function, where the function automatically calls itself until some arbitrary criterion is met.
• Do not use non-documented, non-public APIs in your Lambda function code.
• Write idempotent code.

Function configuration
• Performance test your Lambda function to determine the optimal memory configuration.
• Load test your Lambda function to determine the optimal timeout value.
• Use the most restrictive permissions when setting IAM policies.
• Be familiar with Lambda quotas; do not forget the payload size limits.
• Delete Lambda functions that are no longer in use, as they count towards the deployment package size limit.
• When using SQS as an event source, ensure the queue's visibility timeout is not shorter than the estimated function invoke time (AWS recommends roughly six times the function timeout).

Logging, monitoring & streams
• Use CloudWatch metrics and alarms instead of creating or updating a metric from within your Lambda function code.
• Leverage your logging library and AWS Lambda metrics and dimensions to catch application errors.
• Use Cost Anomaly Detection to detect unusual activity on your account.
• Test with different batch and record sizes so that the polling frequency of each event source is tuned to how quickly your function is able to complete its task.
• Use a batch window to avoid invoking the function with a small number of records.
• Increase Kinesis stream processing throughput by adding shards.
• Use a CloudWatch alarm on IteratorAge to determine whether your Kinesis stream is being processed; for example, configure an alarm with a threshold of 30000 (30 seconds).
Lambda Best Practices

• Memory
  • The compute power of a Lambda function is determined by the memory allocated to it.
  • Use the Lambda Power Tuning tool to load test different memory configurations and determine the optimal memory footprint for the function.
  • Adding more memory reduces the overall cost of running the function up to a point, beyond which cost starts increasing.
• Timeout
  • Dictates how long a function can run before Lambda terminates it (maximum 900 s).
  • Most functions fail fast, well before the maximum timeout is reached, so it is important to determine the optimal timeout value; the function is charged for the duration it is running.
  • Load testing is the best way to determine the optimal timeout value for the function.
• Concurrency
  • Concurrency is the number of invocations the function runs at any given moment.
  • Three types: unreserved concurrency (at least 100 per account/Region), reserved concurrency and provisioned concurrency.
  • Limit the concurrency in line with the ability of backend resources to handle the peak workload.
  • Reserve concurrency for critical functions that have to honour SLAs by handling the peak workload.
  • Provision concurrency to cater for temporary load increases.
Lambda Best practices
Best practices for testing Lambda functions to tune memory, timeout and concurrency

1. Run performance tests that simulate peak levels of invocations.
• View the metrics for the amount of throttling that occurs during performance peaks.

2. Determine whether the existing backend can handle the speed of requests sent to it.
• Don't test in isolation. If you're connecting to Amazon Relational Database Service (Amazon RDS), ensure that the concurrency levels of your function can be processed by the database.

3. Does your error handling work as expected?
• Tests should include pushing the application beyond the concurrency settings to verify correct error handling.

Diagram: CloudWatch metrics relating to concurrency.
Lambda Best Practices
Using purpose-built services instead of Lambda
Use concise function logic

Evaluate the above AWS services based on the following parameters, and select the one that best suits the serverless application:

• Ensure that purpose-built services are used for event transport, not Lambda; use Lambda only when transformation logic needs to be applied.
• Keep the memory footprint small by reading only what is required, using filters.
• Ensure I/O is optimized in the target service, to prevent Lambda waiting longer than it should.
• The different concurrency models supported by the AWS event-transport services may need to be considered when selecting how events are pushed to Lambda.
• Stream-based transport allows massive concurrency using batches, and concurrency can easily be scaled with the number of shards.
• SNS or API Gateway can push events faster across Lambda functions.
Lambda Best Practices
Increasing per function compute power

• Choosing more memory increases the compute cost per unit time, but in most cases disproportionately reduces the function execution time and hence the overall cost.
• Multi-threading the function may achieve gains for CPU-bound or I/O-bound functions when memory is increased beyond 1.8 GB (which attracts additional vCPU cores).
Lambda Best Practices
Keep orchestration logic outside function and only do
business logic

• Orchestration logic inside the function unnecessarily adds to the execution time; hand over workflow orchestration to Step Functions.
Lambda Best Practices
Tune per function concurrency

• This is done by setting up reserved concurrency.
• When per-function concurrency is set to a value "n", the concurrency available to other functions in the Region is reduced by "n"; all remaining function executions share the unreserved pool (the account limit minus all reservations, of which at least 100 is always kept unreserved).
• Controls the number of execution environments that can run in parallel per function, so that downstream services do not get overwhelmed.
• Setting the reservation to zero acts as a kill switch during planned downtimes, indicating to upstream services that they should pause.
Serverless Application Design Best Practices
Lambda scaling considerations for Synchronous
invocations

Cost considerations:
• Lambda execution time
• Lambda invocation rate
• CloudWatch Logs/metrics

Prevent overloading the backend for synchronous invocations
Diagram annotations, per dependent service:
• API Gateway: implement throttling (clients then need to implement retries).
• Secrets: load secrets in pre-handler code and cache them with an expiry.
• Database: use RDS Proxy, migrate to Aurora Serverless (use the Data API), or migrate to DynamoDB.
• Metrics: use CloudWatch embedded metrics (push them via CloudWatch Logs to reduce the API call rate).

• If synchronous invocation is really required, implement the enhancements above at each dependent service.
• Check whether synchronous invocation is really required: do you need to know the request was processed successfully, or is it sufficient that it is durably stored for processing? If the latter, use asynchronous invocation patterns.
• If feedback on processing state is required, still use asynchronous invocation and return the result via polling, webhooks, or WebSockets.
Think Asynchronously
Store first, process later

"Store First" ... "Process Later"

• Store the request durably in an appropriate AWS service (SQS, SNS, EventBridge, S3) and then process it later.
Converting Synchronous to Asynchronous

Note: DynamoDB Streams are used to trigger events to start translation and, once finished, to start transcription. The diagram shows the input payload.
Use Lambda destinations to consolidate error handling

• Rather than handling errors in each Lambda function separately, it will be cleaner to handle them via a Lambda destination.
• Lambda destinations are supported on EventBridge, another Lambda function, SNS and SQS.
• Define another Lambda function as the destination for the Lambda functions executing business logic. Error handling
Lambda function can persist the errors in a DDB table or a queue which can be serviced and appropriate corrective actions
can be taken periodically by another Lambda function
Offload non-business processes to the right AWS managed services

• Do not build Lambda into a monolith. To reduce cost and improve the efficiency of Lambda processing, put only business logic into the Lambda function and offload all other processing to purpose-built AWS services.
Implement Zero Trust by implementing micro-perimeters

• By moving away from large monolithic Lambda functions to smaller purpose-built Lambda functions, and offloading other processing logic to AWS services, it is possible to make the security perimeter and the attack surface smaller. These micro-perimeters can then be protected using Lambda function policies and access policies.
• Decouple business logic from security posture by implementing roles and permissions.
Combine best practices in real-life applications

• The order manager service converts the ordering request into an event, which is handled via a Step Functions workflow.
• Barista staff get updates on each step completed in the order fulfilment process by subscribing to a WebSocket connection implemented by the AWS managed service IoT Core (without building the WebSocket connection programmatically).
Optimize Serverless
When API Response is not required to continue processing

1. Integrate API Gateway with SQS, and have Lambda pick events from SQS.
2. Offload orchestration code to Step Functions.
Optimize Serverless
When API response is required to continue processing

1. Client polling
2. Client is notified through WebSockets
3. Client is notified via a webhook (can be used when the client is trusted, allowing it to subscribe via SNS)
Serverless security best practices
Serverless Application Use Cases
Serverless Application Use Cases
Event-driven web application backend
Serverless Application Use Cases
Event-driven real-time file processing
Serverless Application Use Cases
ETL pipeline for daily air quality measurements (min-max-avg.)
Serverless Application Use Cases
Creating an index for files using meta data and store in ES for fast indexing
Serverless Compute

Serverless Application Model (SAM)

• An open-source framework used to build serverless applications on AWS
• SAM template: infrastructure as code for serverless applications
• SAM CLI: a tool for local development, testing and deployment of serverless applications
Serverless Application Model
CloudFormation extension optimized for serverless

• Step functions
• HTTP APIs
• REST APIs
• DynamoDB tables
• Lambda layers
• Applications

SAM Template = [Lambda functions + Event Source Mappings + Serverless Resources]

Benefits:
1. Allows deploying all related resources of a serverless application as a single versioned entity. In a single deployment configuration it is easy to share settings, such as timeouts and memory size, across a number of resources.
2. Since the SAM template defines the infrastructure configuration, it is easy to enforce best practices such as code-review-style verification, use of AWS tooling such as CodeDeploy for safe deployments, and AWS X-Ray for tracing.
3. Since the SAM CLI provides a Lambda-like local execution environment, it provides a way to catch, early in the development cycle, issues that may arise when executing your code in the cloud.
4. Provides deep integration with development tools: CodeSuite, AWS Cloud9 IDE, Jenkins plugin etc.
SAM templates
AWS SAM Transform
Tells CloudFormation to transform the code in SAM template

• SAM is built on top of CloudFormation. When CloudFormation encounters a SAM template (it recognizes this via the "Transform:" section), it examines the code, converts the serverless resource types into standard CloudFormation resource types, and uses any non-serverless code as it is.
• SAM templates can also contain non-serverless resources.
AWS SAM Serverless resource types (7)

1. AWS::Serverless::Function

Diagram annotations: a policy template attached to the Lambda function; the table name passed dynamically; AWS resources that act as event sources for the Lambda function; pre-defined policy templates to be used in SAM templates.
AWS SAM Serverless resource types
2. AWS::Serverless::Api
3. AWS::Serverless::HttpApi
4. AWS::Serverless::SimpleTable
5. AWS::Serverless::LayerVersion
6. AWS::Serverless::StateMachine
7. AWS::Serverless::Application
AWS SAM Globals

• The Globals section applies to every function in the template.
• Only function-specific parameters are defined within the function resource type.
SAM Best Practices - Re-usable templates
Single template for consistency, use parameters for environment specifics

Externalize environment-specific parameters to Secrets Manager or the SSM Parameter Store.
SAM Best Practices - Re-usable templates
Use pseudo parameters and intrinsic functions

Diagram annotations: pseudo parameters; intrinsic functions; generate a certificate only if the "CreateCert" condition is true.

• Intrinsic functions: built-in CloudFormation functions (e.g. Ref, Fn::GetAtt, Fn::Sub) used to assign values that are only available at deployment time.
• Pseudo parameters: parameters predefined by CloudFormation (e.g. AWS::Region, AWS::AccountId) that can be referenced without declaring them.

Ref. https://www.youtube.com/watch?v=QBBewrKR1qg
SAM CLI
• Deploys Docker container locally with capabilities to test and debug serverless applications and validate SAM templates
• You can locally test and "step-through" debug your serverless applications before uploading your application to the AWS Cloud.
• In the SAM CLI debug mode, you can attach a Debugger and step through the code line by line, see the values of various variables, and fix issues the
same way you would for any other application.
• You can verify whether your application is behaving as expected, debug what's wrong, and fix any issues, before going through the steps
of packaging and deploying your application.
• Since the SAM CLI emulates the Lambda service endpoint locally, it is easy to author integration tests and run them against the Lambda function locally to verify its functionality before deploying to the cloud. The same integration tests can be modified to test the same function in the cloud.
SAM CLI
• CLI tool for local development, debugging, testing, deploying and monitoring of serverless applications
• Supports API Gateway “proxy-style” + Lambda Service API testing
• Response objects and function logs available on your local machine
• Lambda execution environment is mimicked using docker lambda images
• Can help build native dependencies
Fargate pricing dimensions (Standard and Spot):
• Compute: vCPU-hours (per-second billing)
• Memory: GB-hours (per-second billing)
• Ephemeral storage: GB-hours (per-second billing)

AWS Fargate
Serverless compute engine for ECS and EKS
AWS Fargate
• AWS Fargate is a serverless, pay-as-you-go compute engine that lets you focus on building
applications without managing servers.
• Fargate is compatible with ECS and EKS
AWS ECS Fargate mode
Deploying a container image

1. Push the Docker image to Amazon Elastic Container Registry (ECR).
2. Create a task definition based on the image with the desired CPU, memory, and port configuration.
3. Create a Fargate cluster associated with a VPC and subnets.
   Note: the cluster will not run EC2 instances but is used for routing traffic to the workload.
4. Launch an ALB and point the listener to the container port.
5. Create a service definition with the desired task count and associate it with the ALB (a sketch of steps 2 and 5 follows below).
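A boto3 sketch of steps 2 and 5; the image URI, roles, subnets, security group and target group ARN are all hypothetical.

import boto3

ecs = boto3.client("ecs")

# Step 2: register a Fargate task definition (image URI and names are hypothetical).
task_def = ecs.register_task_definition(
    family="web-api",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",                      # 0.25 vCPU
    memory="512",                   # 512 MiB
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    containerDefinitions=[{
        "name": "web-api",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-api:latest",
        "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
    }],
)

# Step 5: create a service running two tasks behind an ALB target group (ARNs are hypothetical).
ecs.create_service(
    cluster="web-cluster",
    serviceName="web-api",
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    desiredCount=2,
    launchType="FARGATE",
    networkConfiguration={"awsvpcConfiguration": {
        "subnets": ["subnet-0abc1234", "subnet-0def5678"],
        "securityGroups": ["sg-0123456789abcdef0"],
        "assignPublicIp": "DISABLED",
    }},
    loadBalancers=[{
        "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-api/abc123",
        "containerName": "web-api",
        "containerPort": 8080,
    }],
)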
ECS storage options in Fargate mode

Option 1: Layered storage (container-specific)
Option 2: Volume storage (shared across containers)
• Both are ephemeral storage options; for persistent storage use RDS, DynamoDB or S3.

IAM permissions for Fargate tasks
1. ECS cluster permissions [task execution permissions, task creation permissions etc.]
2. Application permissions (task role) [permissions required by the task to connect to AWS services]
3. Housekeeping permissions [see below]
Housekeeping permissions for Fargate tasks

1. Execution role for housekeeping tasks [write to CloudWatch Logs, pull images from ECR]
2. Service-linked role for the ECS service [creating ENIs and associating them with the ELB]
Fargate pricing options
With AWS Fargate, you pay only for the amount of vCPU, memory, and storage resources consumed
by the containerized applications.

Pricing options: Fargate Compute Savings Plans and Fargate Spot.

• Fargate provides compute capacities that are very granular and can closely match the resource requirements.
• Per-second billing with a one-minute minimum.
• Tasks and pods run on right-sized compute environments; more than 50 different task/pod configurations are available.
• vCPU and memory resources are calculated from the time your container images are pulled until the Amazon ECS task or EKS pod terminates, rounded up to the nearest second. A minimum charge of one minute applies.
• 20 GB of ephemeral storage is available for all Fargate tasks and pods by default; you only pay for any additional storage that you configure.
Fargate use-cases
• Fargate is well suited for situations where it is required to minimize the overhead of managing EC2 instances,
patching them.
1. Large workloads requiring minimum management overhead
2. Small workloads with occasional bursts
3. Tiny workloads that do not fit into even the smallest EC2 instances
4. Batch workloads and any other workloads that are periodic
• For large workloads with consistent demand for CPU and memory, benefits of Reserved instances may outweigh
the benefits of Fargate
• Some of the workloads that benefits from Fargate launch type are
• Web Service workloads
• Data processing
• Inference workloads
• API Server workloads
Compute Engine comparison
AWS offers three compute engines - EC2, Fargate and Lambda. Each compute engine is optimized for specific type of
workload.
• EC2 is a good choice for steady state workloads that benefits from specialized instances and does not have faster scaling
requirements. It has the highest operational overheads in terms of managing the infrastructure and security of the OS.
• Fargate is a good choice for workloads that have faster scaling requirements, and you are looking for reducing operational overhead.
It supports sub-CPU configurations and suited mostly for dynamic container-based workloads.
• Lambda is designed for event-driven workloads that runs in bursts with lots of idle time. It scales the fastest and has specific support
for consuming and processing events.

EC2
• Workloads with little to no idle time
• Steady state and predictable
• Workloads that would benefit from specialized CPUs or GPUs not yet available for Fargate / Lambda

Fargate
• Workloads with little to no idle time
• Minimize operational overhead
• Security posture maintenance needs to be limited: only secure the container image
• Faster scaling requirements

Lambda
• Workloads with long idle periods
• Minimizing operational overhead
• Security posture maintenance needs to be limited: only secure the application code
• Faster scaling requirements
• Burst handling capability

Cost considerations
1. When right sized with constraints, EC2 has the best cost.
2. When constraints are smaller than the smallest EC2 instance, then Fargate’s flexibility of rightsizing provides better cost.
3. Lambda starts saving money over EC2 once it runs half or less of the time.
4. Lambda saves money over Fargate once it runs a quarter or less of the time.
Elastic Container Registry (ECR)
AWS managed container image registry service that is secure, scalable, and reliable
Elastic Container Registry
Fully managed Container artifact registry service. ECR supports private repositories with resource-based permissions using
IAM so that specified users or EC2 instances can access the container repository and images. It supports integration with all
container orchestration platforms (ECS, EKS, Self-managed) and compute platforms (EC2, Fargate, On-prem). With ECR you
can use your preferred CLI to push, pull, and manage Docker images, Open Container Initiative (OCI) images, and OCI
compatible artifacts.
Features of ECR:
• Lifecycle policies help with managing the lifecycle of the
images in your repositories. You define rules that result in
the cleaning up of unused images. You can test rules
before applying them to your repository.

• Image scanning helps in identifying software vulnerabilities


in your container images. Each repository can be
configured to scan on push. This ensures that each new
image pushed to the repository is scanned. You can then
retrieve the results of the image scan.

• Cross-Region and cross-account replication makes it easier for you to have your images where you need them. This is configured as a
registry setting and is on a per-Region basis. For more information, see Private registry settings.

• Pull through cache rules provide a way to cache repositories in an upstream registry in your private Amazon ECR registry. Using a pull
through cache rule, Amazon ECR will periodically reach out to the upstream registry to ensure the cached image in your Amazon ECR
private registry is up to date.
Components of ECR
ECR consists of repositories which are used to securely store container images. Governance of the ECR repository is managed
by policies.
1. Repository policy defines who is permitted to access the images in the repository.
2. Lifecycle policy governs how many versions of each image (tagged or untagged) shall be maintained in the repository.

• Registry: ECR private registry is provided to each AWS


account; you can create one or more repositories in your
registry and store Docker images, OCI images, and OCI
compatible artifacts in them.
• Authorization token: A client must authenticate to an Amazon
ECR private registry as an AWS user before it can push and pull
images.
• Repository: ECR repository contains your Docker images, OCI
images, and OCI compatible artifacts.
• Repository policy: You can control access to your repositories
and the contents within them with repository policies.
• Image: You can push and pull container images to your
repositories. You can use these images locally on your
development system, or you can use them in Amazon ECS task
definitions and Amazon EKS pod specifications.
ECR Image scanning
Scans ECR images for security vulnerabilities. Supports two modes of scanning.

Basic Scanning:
• Free scanning, activated only upon image push
• If the image needs to be scanned again, it needs to be pushed again.
• Scans only the operating system (runtime) using the hosted open-source scanning software Clair

Enhanced Scanning:
• Amazon Inspector executes the scanning of the image once it is pushed to the registry and every time it finds a vulnerability through its wide range of vulnerability feeds (continuous scanning) - see the sketch after this list
• Can enforce scanning on multiple accounts with AWS Organizations integration.
• Scans not only OS software (runtime), but also programming language packages per image layer
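A hedged boto3 sketch of switching a registry to enhanced scanning; the wildcard repository filter and continuous scan frequency are illustrative assumptions:

import boto3

ecr = boto3.client("ecr")

# Switch the registry to enhanced scanning (Amazon Inspector) and
# continuously scan every repository matching the wildcard filter
ecr.put_registry_scanning_configuration(
    scanType="ENHANCED",
    rules=[
        {
            "scanFrequency": "CONTINUOUS_SCAN",
            "repositoryFilters": [{"filter": "*", "filterType": "WILDCARD"}],
        }
    ],
)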
ECR Cross Region Replication
Allows replicating ECR private registries across regions and accounts. When turned on, all private ECR repositories in the registry will automatically copy images to multiple other repositories in different accounts and/or regions, reducing pull latency and making your containers start up faster, as they can now pull images in-Region.
ECR Pull through cache
Images from public registries are used in most container applications, either during build as the base image or as a side-car for the container application. A pull through cache rule creates a repository that caches the images from the public registry and makes those images available directly from your own private registry.

• The first pull creates the cache; subsequent pulls read from the cache.
ECR Public
Highly available public registry where images can be shared across organizations. Provider organizations publish images, and consumer organizations access them from the ECR Public Gallery and consume them as they would from any other public registry.

• Once an image is pushed to ECR Public, AWS internally replicates it to other regions and provisions CloudFront distributions for faster image access.
• Docker official images are available to download from the ECR Public Gallery.
Elastic Container Services (ECS)
Highly scalable container management service that makes it easy to run, stop, and manage containers on a cluster
AWS proprietary container orchestration engine
Container compute: ECS mode | Fargate mode
ECS Terms

• Task is a collection of containers that needs to be managed together. A task usually represents an application or an application component.
• Task definition is a blueprint for the task. It defines the containers (image, how much CPU and memory is required, which ports to expose, etc.) that make up the task and how those containers interact with each other
• Service is one or more tasks to run from a given task definition
• Service definition is a blueprint for the service. It defines the minimum and maximum number of tasks to run, whether they can run on the same container instance or not, and how they should be auto-scaled
• Cluster is a collection of one or more container instances that will be managed by the ECS control plane
ECS Task Definition
• The Docker image to use with each container in your task
• How much CPU and memory to use with each task or each container
within a task
• The launch type to use, which determines the infrastructure on which
your tasks are hosted (Fargate, EC2, External)
• The Docker networking mode to use for the containers in your task
• The logging configuration to use for your tasks
• Whether the task should continue to run if the container finishes or
fails
• The command the container should run when it is started
• Any data volumes that should be used with the containers in the task
• The IAM role that your tasks should use

• A single task definition can support up to 10 container definitions (a minimal registration sketch follows this list)
• Collocate containers under the same task definition if they do not need to scale independently
• The scaling unit is the ECS task; the ECS Service defines the desired number of tasks for the application
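A minimal boto3 sketch of registering a task definition with a single container; family name, image URI, role ARN, and port are placeholders:

import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="web-app",
    networkMode="awsvpc",
    requiresCompatibilities=["FARGATE"],
    cpu="256",          # task-level CPU units
    memory="512",       # task-level memory (MiB)
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    containerDefinitions=[
        {
            "name": "web",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo/web-app:latest",
            "essential": True,
            "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
        }
    ],
)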
ECS-Managed Tags
With Amazon ECS-managed tags, Amazon ECS automatically tags all newly launched tasks and any Amazon EBS volumes
attached to the tasks with the cluster information and either the user-added task definition tags or the service tags.

Enabling ECS Managed tags in the Task Definition | Tagging supported by ECS resources

• When the ECS Managed tags option is enabled and you launch
• A standalone task: ECS will automatically add following tags
• aws:ecs:clustername = <name of the cluster>
• All task definition tags added by the users
• An ECS service: ECS will automatically add the following tags
• aws:ecs:clustername = <name of the cluster> and aws:ecs:servicename = <name of the ECS service>
• All task definition tags added by the users
• All service tags added by the users
ECS Launch types

Launch type: Fargate | Launch type: EC2

ECS launch types - External (ECS Anywhere)

• Amazon ECS Anywhere provides support for registering an external instance, such as an on-premises server or virtual machine (VM), to your Amazon ECS cluster.
• External instances are optimized for running applications that generate outbound traffic or process data.
• The External launch type allows you to create services or run tasks on your external instances.

Launch type: External
Operations in ECS clusters

Building and publishing a container image into ECR

Scheduling of tasks
ECS Container Agent (Fargate Agent in Fargate mode)

1. ECS control plane requests the ECS agent to schedule a task
2. ECS agent pulls the image from the registry as per the task definition
3. ECS agent schedules the tasks on the EC2 instance
ECS Control plane vs. Data plane

In Fargate mode, EC2 instances are pre-provisioned in a number of VPCs maintained by AWS, with Amazon Linux OS, a pre-installed Fargate agent, and the Docker engine.

AWS ECS underlying architecture | ECS objects and how they relate

Data Plane
• ECS agent allows communication with the EC2 instances in the cluster to start, stop, and monitor containers as requested by a user or scheduler.

Control Plane
• Cluster management engine co-ordinates the activities in the cluster based on the cluster state. It manages the state by storing state information (the single source of truth) in a key/value store where a journal of state changes is maintained with optimistic locking.

Architectural considerations:

• The cluster management engine is decoupled from the scheduler, allowing custom schedulers to schedule tasks so that resources are allocated based on user-defined metrics, such as service priority (a container running a high-priority service is scheduled before a low-priority request)

• The cluster manager maintains state in a key/value store which keeps a time-ordered sequence of state transitions and manages concurrent requests for cluster resources using optimistic concurrency control. This optimistic model helps maintain high availability and scalability of the cluster and low latency for state transitions

Ref. https://www.allthingsdistributed.com/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html
ECS Task Placement strategies
Allow developers more control over where the tasks should run, using 1. Placement constraints and 2. Placement strategies

• Default placement spreads tasks across AZs and places them on the instance with the least number of tasks running
• Placement strategies can be chained to ensure tasks are placed in the way the workload requires
• Identification of the placement strategies and constraints (which instance type, which AMI, etc.) needs to be done after carrying out load testing on the container-based application

Chaining of placement strategies

ECS Task Placement strategies: 1. Task placement across AZs and then binpack; 2. Task placement with affinity and anti-affinity
How the placement engine decides where to place the task: available instances are filtered down to candidate instances

• A task placement strategy (binpack | random | spread) is an algorithm for selecting instances for task placement or tasks for termination.
• Task placement strategies can be specified when either running a task or creating a new service (see the sketch after this list).
• Task placement strategies can be updated for existing services as well.
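A minimal boto3 sketch of the chained strategies and a constraint described above; the cluster, service, and instance-type expression are hypothetical:

import boto3

ecs = boto3.client("ecs")

# Spread tasks across AZs first, then binpack on memory,
# and only place tasks on a given instance family
ecs.create_service(
    cluster="demo-cluster",
    serviceName="web-service",
    taskDefinition="web-app",
    desiredCount=4,
    launchType="EC2",
    placementStrategy=[
        {"type": "spread", "field": "attribute:ecs.availability-zone"},
        {"type": "binpack", "field": "memory"},
    ],
    placementConstraints=[
        {"type": "memberOf", "expression": "attribute:ecs.instance-type =~ m5.*"},
    ],
)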
ECS Task Placement Engine
Applying constraints & placement strategies to a Service

Service placement: One service bin-packed, other spread-placed


Building custom scheduler for ECS

Blox is an open-source custom scheduler developed for ECS. It uses the stream of cluster state change events it receives from ECS to make scheduling decisions
ECS Networking: bridge mode vs. awsvpc mode

Bridge mode:
• Relies on Docker internal networking
• Containers will not have routable IP addresses
• Containers share the instance ENI
• Performance challenges due to multiple translations
• Lack of fine-grained security controls
• SG cannot be applied at container level

Awsvpc mode (task networking mode):

• Each task gets its own ENI, hence its own routable IP address within the VPC, addressable from outside the host
• Higher performance
• Finer-grained security controls, as SGs can be applied at the container (task) level
ECS Networking in VPC mode
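A minimal boto3 sketch of running a task in awsvpc mode so it gets its own ENI; the subnet, security group, and cluster names are placeholders:

import boto3

ecs = boto3.client("ecs")

ecs.run_task(
    cluster="demo-cluster",
    taskDefinition="web-app",
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0abc1234"],
            "securityGroups": ["sg-0def5678"],
            "assignPublicIp": "ENABLED",  # outbound access to pull images / push logs without NAT
        }
    },
)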
ECS log routing
Logs generated by the containers can be pushed to CloudWatch or any other destination using a log router. ECS natively
supports two log routers
1. Awslogs log driver routes the STDOUT and STDERR I/O streams to CloudWatch logs
2. FireLens log driver routes logs generated in the containers to AWS services (including CloudWatch logs) and Partner products

• ECS container agent implements the awslogs log driver to send container logs to
CloudWatch logs.
• Fargate launch type: Before your containers can send logs to CloudWatch, you must specify the
awslogs log driver for containers in your task definition.
• EC2 launch type: Ensure that ECS optimized AMI is used in the container instances. If a custom
AMI is used, ensure that ecs-init package is at 1.9.0-1 or higher

• FireLens for ECS runs as a sidecar and uses ECS task definition parameters to route logs to an AWS service or APN destination for log storage and analytics.
• FireLens works with Fluentd and Fluent Bit.
• Can append, transform, and filter log events before sending them to the destination

awslogs driver configuration (see the sketch after this section)
Container definition for an app that uses FireLens for log routing
Container definition for the FireLens sidecar
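A hedged sketch of a container definition fragment that uses the awslogs log driver; the log group, Region, and prefix are illustrative:

# Hypothetical container definition fragment using the awslogs log driver
container_definition = {
    "name": "web",
    "image": "public.ecr.aws/docker/library/nginx:latest",
    "essential": True,
    "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "/ecs/web-app",      # CloudWatch Logs log group
            "awslogs-region": "us-east-1",
            "awslogs-stream-prefix": "web",
        },
    },
}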
Observability in ECS
ECS Agent provides container level metrics and Fluent Bit provides the custom logs generated from inside the containers.
AWS Xray provides request/response tracing.
In an ECS environment, Envoy proxies can provide vital metrics to diagnose communication and performance problems for inter-service communication. They can publish metrics to CloudWatch as well as traces to X-Ray.
ECS observability using Container Insights

• Task-level CPU and memory utilization metrics are pushed to CloudWatch and can be visualized using CloudWatch Container Insights
Configuring resources required for ECS tasks

ECS resources - CPU, memory, and network - must be defined at the task level as well as at the container level

• When more than one subnet is defined, Fargate will launch tasks in multiple AZs
• Tasks need Internet access to reach the ECR repository to pull images, as well as to push logs to the CloudWatch service
• Tasks may have only outbound Internet access or public Internet access
ELB integration with ECS tasks
ALB is the preferred mode of load balancing traffic to an ECS service.

• Configure the ALB and task security groups so that they can talk to each other
• If ECS is configured in awsvpc mode, ECS tasks are presented as ENIs; therefore they shall be registered with the ELB as IP address targets (see the sketch after this list)
  1. Create a target group of IP address type
  2. Associate task definitions to the target group of the load balancer by creating the ECS service
• Tasks act like EC2 instances and the ECS service acts like an Auto Scaling Group when configured as a scalable resource using AWS Application Auto Scaling
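A minimal boto3 sketch of creating a Fargate service registered with an existing IP-type target group; ARNs, names, and ports are placeholders:

import boto3

ecs = boto3.client("ecs")

ecs.create_service(
    cluster="demo-cluster",
    serviceName="web-service-alb",
    taskDefinition="web-app",
    desiredCount=2,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0abc1234", "subnet-0abc5678"],
            "securityGroups": ["sg-0def5678"],
        }
    },
    loadBalancers=[
        {
            "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/abc123",
            "containerName": "web",
            "containerPort": 8080,
        }
    ],
)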
Multiple ALB target group support for ECS

Allows you to attach a single Amazon ECS service running on either EC2
or AWS Fargate, to multiple target groups of the same ELB or multiple
ELBs.

• Use cases:
1. Allow same ECS service to be exposed via two Load Balancers:
typical use-case is to expose an ECS service to public internet as well
as internal VPC based clients. Refer diagram.
2. Expose multiple ports from the same container: for example, Jenkins
container may expose port 8080 for web interface and port 5000 for
API.
3. Expose multiple containers listening on different ports per task:
When a task consists of more than one container with each one
listening on different ports. Each container can be configured in a
different target group on the same ELB.
Interconnecting services in ECS
Supports multiple service discovery capabilities
1. ECS Service Discovery: provides basic service discovery capabilities using Route 53. Under the covers, the services will be registered with AWS Cloud Map, which acts as an interface to add DNS names of the services to a Route 53 private hosted zone. No support for traffic telemetry and other smart discovery capabilities.

2. Elastic Load Balancer: adds advanced traffic routing and traffic telemetry capabilities but sits in the request path, hence adds latency to inter-service communication. Also adds provisioning and management overhead.

3. App Mesh: collects traffic telemetry, fine-grained traffic control, authentication, and encryption. Added complexity as it requires managing the proxy and reasoning about the configuration.

4. ECS Service Connect: evolution of ECS Service Discovery that offers a simplified developer experience.
ECS Service Connect
Combines the simplicity of ECS Service Discovery with the traffic telemetry provided by ELBs and the traffic resilience offered by AWS App Mesh. Works across multiple ECS clusters in the same namespace.

• Adds a layer of resilience to your ECS service communication and gets traffic insights with no changes to your application code.
• Refer to and connect to your services by logical names using a namespace provided by AWS Cloud Map and automatically distribute traffic between ECS tasks without deploying and configuring load balancers.
• Set safe defaults for traffic resilience, such as health checking, automatic retries for 503 errors, and connection draining, for each of your ECS services.
• Use the ECS console as an easy-to-use dashboard with real-time network traffic metrics for operational convenience and simplified debugging.

Reliable service-to-service communication | Rich traffic telemetry | Reliable deployments
Create an ECS Service Connect service
You enable Service Connect for an ECS service when it is created, by adding a "ServiceConnectConfiguration" stanza (sketched below). It defines the namespace to which the ECS service will be registered in Cloud Map, a friendly discoverable name, and an optional Client Alias. The Client Alias can be used to override the discovery name and is used in migration scenarios to retain the previous name.

When an ECS service is created with Service Connect enabled, ECS carries out the
following tasks,
1. Requests ECS agent to initiate the task
2. Since it is a Service Connect enabled task, ECS agent instantiates Service Connect
Agent (which consists of an Envoy proxy and an agent, this agent monitors the
health of Service Connect agent, collects, aggregates metrics from Envoy proxy
and periodically sends them to CloudWatch)
3. ECS service configures the Service Connect agent based on the
ServiceConnectConfiguration provided
4. ECS fetches the existing services registered in the namespace from Cloud Map and
passes it on to Service Connect agent
5. ECS registers the discoverable name of the task in the namespace in Cloud Map so
that it is discoverable by other services
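A hedged boto3 sketch of the ServiceConnectConfiguration stanza, assuming a hypothetical Cloud Map namespace and a port name that is defined in the task definition:

import boto3

ecs = boto3.client("ecs")

ecs.create_service(
    cluster="demo-cluster",
    serviceName="orders",
    taskDefinition="orders-app",
    desiredCount=2,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0abc1234"],
            "securityGroups": ["sg-0def5678"],
        }
    },
    serviceConnectConfiguration={
        "enabled": True,
        "namespace": "internal",        # Cloud Map namespace
        "services": [
            {
                "portName": "http",      # must match a named port mapping in the task definition
                "discoveryName": "orders",
                "clientAliases": [{"port": 80, "dnsName": "orders.internal"}],
            }
        ],
    },
)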
ECS Service Connect request flow
All communications to the tasks, ingress as well as egress, happens via ECS Service Connect agent. It essentially acts as a
proxy for each task.

• Any ingress connection to the task (10.1.2.1:8080) will automatically be redirected to the Service Connect agent of that task, which will act as a proxy to handle the required communication with the target container.
• Also, any outbound request to Service Connect enabled tasks will be redirected to the Service Connect agent running in that task.
Reliable inter-service communication

ECS Service connect handles failures and errors of the tasks in the target service transparently to the requesting service.

When a task has failed: the Service Connect agent of the requesting task detects the failure and marks it as bad. The request will be retried to another healthy task automatically and transparently to the requesting service.

When a task returns an error: the Service Connect agent of the requesting task detects the error and, after a number of attempts to connect, will mark it as bad and redirect traffic to a healthy task in the target service.
Robust deployments
When there is a new deployment, the Service Connect agent detects tasks that are in the de-provisioning stage and redirects requests to healthy tasks in the Blue service; once all tasks are deprovisioned, requests are redirected to the tasks in the Green service.
ECS Service Auto Scaling
ECS uses Application Auto Scaling to increase and decrease the task count for an ECS service automatically.
• Scaling policies supported
• Target tracking scaling policies: Increase or decrease the number of
tasks that your service runs based on a target value for a specific metric.

• Step scaling policies: Increase or decrease the number of tasks that your service runs based on a set of scaling adjustments, known as step adjustments, that vary based on the size of the alarm breach.

• Scheduled scaling: Increase or decrease the number of tasks that your service runs based on the date and time.

• Scaling during deployments: Application Auto Scaling turns off scale-in processes while Amazon ECS deployments are in progress. However, scale-out processes continue to occur, unless suspended, during a deployment.

• To disable scale-out processes during deployments, suspend the scaling activity (see the sketch after this list)
  • Call the register-scalable-target command, specifying the resource ID, namespace, and scalable dimension. Specify true for both DynamicScalingInSuspended and DynamicScalingOutSuspended.

• When the task count increases and there is a need for additional underlying compute capacity
  • Compute capacity will automatically increase/decrease for the Fargate launch type
  • Configure ECS Cluster Auto Scaling to automatically increase/decrease compute capacity for the EC2 launch type
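A minimal Application Auto Scaling sketch for an ECS service using target tracking on average CPU; the cluster and service names are placeholders:

import boto3

aas = boto3.client("application-autoscaling")

# Register the ECS service's desired count as a scalable target
aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/demo-cluster/web-service",
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# Target tracking on the service's average CPU utilization
aas.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/demo-cluster/web-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleInCooldown": 60,
        "ScaleOutCooldown": 60,
    },
)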
ECS Cluster Auto Scaling (CAS)
Capability of ECS to manage the scaling of EC2 Auto Scaling Groups. CAS relies on ECS capacity providers, which provide the
link between your ECS cluster and the ASGs you want to use. It uses a Capacity Provider Reservation metric to determine
how and when to scale-out and scale-in
• CapacityProviderReservation is the ratio of how big the ASG needs to be to how big it actually is, expressed as a percentage

• M is the number of EC2 instances in the ASG that is required to fulfil the workload demand and N is the number of instances currently running in the ASG. If M = N, no scaling is required; if M > N, scaling out is required (see the worked example after this section)

• Each ASG is associated with a capacity provider, and each such capacity provider has only one ASG

• However, a single ECS cluster can have multiple capacity providers.

Capacity Provider Strategy defines how each capacity provider contributes (using weights) to the overall capacity of the cluster
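A small worked example of the metric, under the assumption (consistent with the definition above) that CapacityProviderReservation = M / N x 100:

def capacity_provider_reservation(m_required_instances: int, n_running_instances: int) -> float:
    """CapacityProviderReservation = M / N * 100 (percent)."""
    return 100.0 * m_required_instances / n_running_instances

# 6 instances are needed to place all tasks but only 4 are running:
# reservation = 150%, above a typical 100% target, so the ASG scales out.
print(capacity_provider_reservation(6, 4))   # 150.0
# 4 needed, 4 running: 100% -> no scaling action
print(capacity_provider_reservation(4, 4))   # 100.0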
ECS Service Auto Scaling and ECS Cluster Auto Scaling

ECS Service Auto Scaling
• Automatically increases or decreases the task count within the allocated container instances based on CloudWatch metrics (e.g., ECSServiceAverageCPUUtilization)
• Define Min/Max/Average task count
• Scaling policies supported: target tracking, step auto scaling, and scheduled auto scaling

ECS Cluster Auto Scaling
• A capacity provider points to an EC2 Auto Scaling Group; if the CapacityProviderReservation exceeds the defined threshold, an EC2 dynamic auto scaling action will be triggered and EC2 instances will be added
• Scaling policies supported: target tracking

Capacity Provider represents the infrastructure the ECS tasks will run on
Capacity Provider = ASG | FARGATE | FARGATE_SPOT

ECS Cluster Capacity Provider

• A single ECS cluster can have one or more capacity providers associated with it. The Capacity Provider Strategy determines how the tasks will be distributed across the capacity providers (see the sketch after this section)
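A minimal boto3 sketch of a capacity provider strategy mixing FARGATE and FARGATE_SPOT; names, weights, and network settings are illustrative:

import boto3

ecs = boto3.client("ecs")

# 1 task always on FARGATE as a baseline, remaining tasks split 1:3
# between FARGATE and FARGATE_SPOT
ecs.create_service(
    cluster="demo-cluster",
    serviceName="batch-workers",
    taskDefinition="worker-app",
    desiredCount=8,
    capacityProviderStrategy=[
        {"capacityProvider": "FARGATE", "base": 1, "weight": 1},
        {"capacityProvider": "FARGATE_SPOT", "weight": 3},
    ],
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0abc1234"],
            "securityGroups": ["sg-0def5678"],
        }
    },
)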
ECS storage options - Container Storage vs. Volume Storage

Containers by definition are ephemeral and stateless, hence do not require persistent storage. However, there is a wide range of use-cases that require stateful containers. The traditional solution to this was to use service-to-service communication to store the state (such as in an S3 bucket or in a database).

With advancements in container runtimes to manage storage, containers can now use tightly coupled volume storage to store state.
1. EC2 instance store ties the container to the host; access to the storage is lost if the host is rebooted
2. EBS ties the container to the AZ; this is not possible with the Fargate launch type, as storage configuration is not possible
3. EFS allows cross-AZ access to state
EFS integration with ECS

Allows ECS tasks (on both EC2 and Fargate launch types) to natively map an EFS file system endpoint
transparently without further infrastructure configurations.

• This pattern allows a task to pull required data via service-to-service communication with S3 and share it with all other tasks by copying the data into EFS (see the sketch after this section)

• Earlier, every task had to copy data into its local storage and use it; also, every time a task was restarted, it had to pull the data from S3 again.
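A minimal boto3 sketch of a task definition that mounts an EFS file system into a container; the file system ID and paths are placeholders:

import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="web-app-efs",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",
    memory="512",
    volumes=[
        {
            "name": "shared-data",
            "efsVolumeConfiguration": {
                "fileSystemId": "fs-0123456789abcdef0",
                "transitEncryption": "ENABLED",
            },
        }
    ],
    containerDefinitions=[
        {
            "name": "web",
            "image": "public.ecr.aws/docker/library/nginx:latest",
            "essential": True,
            "mountPoints": [
                {"sourceVolume": "shared-data", "containerPath": "/mnt/shared"}
            ],
        }
    ],
)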
ECS integration with CI/CD pipelines

CodeBuild supports pulling builder images from image repository, build a new container image by merging the code from the
code repository and upload it back to the repository. CodeDeploy supports B/G deployments into the ECS. Similar integrations
are supported with other CI/CD tools as well.
ECS use cases: Batch processing
ECS vs. EKS
ECS integrated Services

AWS App Mesh
• AWS managed service mesh
• Provides communication infrastructure for services to communicate

AWS Elastic Container Registry (ECR)
• Managed Docker registry service
• Supports IAM-controlled access to images

AWS Elastic Container Service (ECS)

AWS CloudFormation
Elastic Kubernetes Service (EKS)
Fully managed upstream Kubernetes compatible container orchestrator
EKS solution portfolio

• EKS Distro is used for development environments, or for production environments when there is mature management tooling available
• All other EKS solution options rely on EKS Distro.
• EKS Anywhere uses EKS Distro and provides automation tooling that simplifies cluster creation, administration and operations on your own
infrastructure on-premises. Further, Amazon EKS Anywhere provides default configurations on operating system and networking and
brings additional opinionated tooling you would need to run Kubernetes in production.
Amazon EKS Distro
A distribution of the same open-source Kubernetes and dependencies deployed by Amazon EKS, helping you to manually run Kubernetes
clusters anywhere. Challenges it addresses: Without a vendor supported distribution of Kubernetes, you have to
spend a lot of effort tracking updates, determining compatible versions of Kubernetes and its dependencies,
testing them for compatibility, and maintaining pace with the Kubernetes release cadence.

EKS Distro includes
• Kubernetes components used by EKS and supported by EKS Anywhere
• Security patches, tooling, and configurations that will be adopted by Kubernetes upstream
• Backported fixes for supported EKS versions (EKS provides extended version support for upstream Kubernetes)

• Does not include eksctl, Amazon official distribution of CSI and CNI plugins, Amazon controllers for
Kubernetes. Also does not include integration components with AWS services such as IAM
Authenticator.
• AWS integrations supported by EKS Distro:
• Amazon EKS Distro is aligned with Amazon EKS versions and components and is supported by the Amazon
EKS operations dashboard.
• Provides copies of builds in Amazon S3 and ECR for developers creating Kubernetes clusters on AWS.
• EKS Distro has been tested for use with Amazon Linux 2, Bottlerocket, and AWS Outposts.
• EKS Distro will support ECR Public repositories as a secure, fast source for you to download EKS Distro for use
within AWS Regions or on premises.

What is included in EKS Distro


EKS Anywhere
A deployment option for Amazon EKS that helps you create
and operate Kubernetes clusters on-premises and automation
tooling for cluster lifecycle operations
Provides an installable software package for creating and
operating on-premises Kubernetes clusters that are based on
Amazon EKS Distro.
• Simplifies creation and operation of Kubernetes cluster
• Automates cluster management
• Provides centralized management of all EKS environments through
EKS console

• Provides an installer and CLI for cluster lifecycle management.
• Offers cluster bootstrap of opinionated EKS Distro clusters with default configurations for node OS, container runtime, and CNI, along with opinionated tooling like GitOps components.
• Provides tooling for cluster upgrade, cluster scaling, and diagnostic gathering.
• OAuth and OpenID Connect federation
• Integrates with AWS IAM using aws-iam-authenticator
• Optionally use the EKS Connector to connect the clusters to the EKS console
EKS Anywhere Deployment architectures

Clusters can be deployed as stand-alone clusters which are managed independently by admin runtime or through long-lived
EKS Anywhere management cluster
Standalone cluster: If you are only running a single EKS Anywhere cluster, you can deploy a standalone cluster. The EKS Anywhere management components run on the same cluster that runs workloads.
• Standalone clusters must be managed with the eksctl CLI. A standalone cluster is effectively a management cluster, but in this deployment type it only manages itself.

Management cluster with separate workload clusters: If you plan to deploy multiple EKS Anywhere clusters, it's recommended to deploy a management cluster with separate workload clusters.
• With this deployment type, the EKS Anywhere management
components are only run on the management cluster, and the
management cluster can be used to perform cluster lifecycle
operations on a fleet of workload clusters.
• The management cluster must be managed with the eksctl CLI,
whereas workload clusters can be managed with the eksctl CLI
or with Kubernetes API-compatible clients such as kubectl,
GitOps, or Terraform.
• Management cluster is required to support third party
automation tooling such as terraform etc.
• Bare metal clusters can be provisioned either with control
plane and data plane collocated (e.g., telecom RAN use-
cases) or control plane deployed separately
EKS Anywhere integrations
Curated packages provided as part of the EKS Anywhere enterprise license that provides additional functionalities such as
image registry, ingress controller, load balancer, auto-scaler etc. for running mission critical on-prem workloads

Amazon will do version compatibility testing of each of these packages with new versions of EKS. All these packages are
supported under AWS enterprise support
EKS on Outposts
Use Amazon EKS to run on-premises Kubernetes applications on AWS Outposts. Two deployment options
• Extended clusters: Run the Kubernetes control plane in an AWS Region and nodes on your Outpost.
• Local clusters: Run the Kubernetes control plane and nodes on your Outpost.
For both deployment options, the Kubernetes control plane is fully managed by AWS. You can use the same Amazon EKS
APIs, tools, and console that you use in the cloud to create and run Amazon EKS on Outposts.

• If you're concerned about the quality of the network connection from your Outposts to the parent
AWS Region and require high availability through network disconnects, use the local cluster deployment
option.
• The extended cluster option is suitable if you can invest in reliable, redundant network connectivity from your Outpost to the AWS Region. The quality of the network connection is critical for this option, because the way that Kubernetes handles network disconnects between the Kubernetes control plane and nodes might affect the availability of your workloads.
EKS Anywhere vs. EKS on Outposts

Like EKS Anywhere, Amazon EKS on Outposts provides a means of running Kubernetes clusters using EKS software on-premises. The main differences are:

Infrastructure
• EKS on Outposts: Amazon provides the hardware
• EKS Anywhere: Mostly customer provided

Control plane
• EKS on Outposts: the EKS control plane is fully managed by AWS
• EKS Anywhere: customers are responsible for managing the control plane lifecycle with EKS Anywhere automation tooling

Management tooling
• EKS on Outposts: use the same console, APIs, and tools used to run Amazon EKS clusters in the AWS Cloud
• EKS Anywhere: use the eksctl CLI to manage clusters, optionally connect clusters to the EKS console for observability; use Infrastructure as Code tools such as Terraform and GitOps

Connectivity requirements
• EKS on Outposts: is a regional AWS service; requires a consistent, reliable connection to the AWS Region
• EKS Anywhere: standalone software offering that can run entirely disconnected from the AWS Cloud

• The primary interfaces for EKS Anywhere are the EKS Anywhere Custom Resources. Amazon EKS does not have a CRD-
based interface today.
Kubernetes Deployments vs. ReplicaSets

Deployment is responsible for managing a set of replica pods. It provides features such as rolling updates, rollbacks, and scaling
of the number of replicas. Deployments also have self-healing capabilities, which ensure that the desired number of replicas are
always running and healthy.
ReplicaSet is a lower-level object in Kubernetes that is responsible for ensuring that a specified number of replicas are always
running. It is used by deployments to manage the replica pods but can also be used directly in some scenarios. ReplicaSets
provide the ability to scale the number of replicas and replace failed pods with new ones.
Kubernetes StatefulSets and DaemonSets

StatefulSets is a controller that manages the deployment and scaling of stateful pods.
• What are stateful pods? A pod that requires persistent storage and a stable network identity to maintain its state all the time, even during
pod restarts or rescheduling. Used for stateful applications such as databases or distributed file systems as these require a stable identity
and persistent storage to maintain data consistency.

DaemonSets is a controller that ensures a single pod instance is running per node
• Particularly useful for running pods as system daemons or background processes that need to run on every node in the cluster.
• DaemonSets can be used for collecting logs, monitoring system performance, and managing network traffic across the entire cluster.
EKS Control plane resilience
Fully managed by AWS in a managed VPC with a 99.95% uptime SLA. Zonally redundant with VPC-level isolation. An NLB is the public endpoint for the EKS control plane exposed to the customer, without any cross-zone load balancing.
EKS Data plane
Supports multiple compute options starting from self-managed, partially managed or fully managed.
Self-managed Amazon EC2 instances: Run in your own account, customer managed, and offer maximum flexibility and configurability. You spin up the worker nodes; EKS optimized AMIs are provided. Upgrading nodes and moving pods is the customer's responsibility.

Amazon EKS managed nodes: Runs in your own account, AWS managed provisioning and
instance lifecycle management. Upgrading, gracefully terminating, moving pods across
nodes is taken care of by EKS

Karpenter: Next generation auto-scaler. Gives you the flexibility to choose the right
instance type. Helps reducing the cost by rebalancing the pods. If there are holes in the
nodes, it automatically moves pods and packs them into fewer nodes (binpacking). During
patching of nodes Karpenter will automatically move the pods to make way for the node
upgrade.

AWS Fargate: Single pod per node and fully managed, serverless and right sized compute.
AWS Managed OS, container runtime, storage, monitoring plugins. Provides granular
compute option with pod-based billing
EKS Data plane compute options - Managed Node Groups

MNGs automate the provisioning and lifecycle management of nodes (EC2 instances) for EKS clusters i.e. you don't need to
separately provision or register the EC2 instances that provide compute capacity to run your Kubernetes applications.
• Create, automatically update, or terminate nodes in the cluster with a single operation.
• Node updates and terminations automatically drain nodes to ensure that your applications stay available.

• Every managed node is provisioned as part of an EC2 Auto Scaling group that's managed for you by EKS.
• Every resource, including Amazon EC2 instances and Auto Scaling groups, runs within your AWS account.
• The Auto Scaling group of a managed node group spans every subnet that you specify when you create the group.
• Amazon EKS tags managed node group resources so that they are configured to use the Kubernetes Cluster Autoscaler.
• Managed Node Groups can also be provisioned using Launch templates for higher levels of customization (see the sketch after this list).
 Provide bootstrap arguments at deployment of a node, such as extra
Kubelet arguments
 Assign IP addresses to Pods from a different CIDR block than the IP
address assigned to the node.
 Deploy your own custom AMI to nodes.
 Deploy your own custom CNI to nodes.
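A minimal boto3 sketch of creating a managed node group; the cluster name, subnets, and node role ARN are placeholders:

import boto3

eks = boto3.client("eks")

eks.create_nodegroup(
    clusterName="demo-cluster",
    nodegroupName="general-purpose",
    nodeRole="arn:aws:iam::123456789012:role/eksNodeRole",
    subnets=["subnet-0abc1234", "subnet-0abc5678"],
    instanceTypes=["m5.large"],
    scalingConfig={"minSize": 2, "maxSize": 6, "desiredSize": 2},
    updateConfig={"maxUnavailable": 1},
)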
EKS Data plane compute options - Karpenter

An open-source, flexible, high-performance Kubernetes cluster autoscaler built with AWS. Improves application availability and
cluster efficiency by rapidly launching right-sized compute resources in response to changing application load. Also provides
just-in-time compute resources to meet your application’s needs and automatically optimize a cluster’s compute resource
footprint to reduce costs and improve performance.

• Tightly integrated with Kubernetes scheduler semantics and the EC2 Fleet API. It automatically provisions new nodes in response to un-schedulable pods.
• Evaluates the aggregate resource requirements of the pending
pods and chooses the optimal instance type to run them.
• It will automatically scale-in or terminate instances that are left
with only daemon-set pods to reduce waste.
• Supports a consolidation feature which will actively move pods
around and either delete or replace nodes with cheaper versions
to reduce cluster cost.
Case for Karpenter
Karpenter works in tandem with Kubernetes scheduler, and the compute provider to
• Dynamically choose and launch best-suited compute resources
• Terminate underutilized nodes
• Use Kubernetes-native scheduling constraints (anti-affinity rules, topology spread) to achieve availability requirements.

• Faster provisioning times mean that you don’t have to overprovision capacity like in the case of MNG which uses ASG
Karpenter consolidation feature
Consolidates pods into fewer compute resources as possible using an algorithm that takes into consideration, specifications
of the pods and the cost of disruption of nodes.

• Karpenter will poll the API Server every 10 seconds (consolidation poll period) to see if there is an opportunity for consolidation. If there is, it looks for any annotations that prevent consolidation, selects candidate nodes, and sorts them by cost of disruption.

• Karpenter considers the age of the node (node TTL), the number of pods on the node, and the pod deletion cost (can be set in the pod specification) in calculating the disruption cost.
Automatic lowering of compute costs with Karpenter

Karpenter allows optimizing compute costs using intelligent and dynamic instance selection and automatic workload consolidation. It facilitates Kubernetes-native integrations with AWS compute services and features such as EC2 Spot and the AWS Graviton family of instances. It also allows consistently faster node launches, which minimizes wasted time and cost.

Considerations:
• Karpenter performs particularly well for use cases that require rapid provisioning and deprovisioning large numbers of
diverse compute resources quickly. For example, this includes batch jobs to train machine learning models, run simulations,
or perform complex financial calculations. You can leverage custom resources of nvidia.com/gpu, amd.com/gpu, and
aws.amazon.com/neuron for use cases that require accelerated EC2 instances.
• Do not use Kubernetes Cluster Autoscaler at the same time as Karpenter because both systems scale up nodes in
response to un-schedulable pods. If configured together, both systems will race to launch or terminate instances for these pods.
EKS Data plane right sizing
Scale pods vertically and, when they are at the optimum size, scale them horizontally. When there is demand for more pods and the pending pod queue is growing, EKS cluster-level auto scaling needs to be considered.
EKS workload resilience within a DC

Resilience at Deployment layer
• Deploy Kubernetes workloads as Deployments instead of individual Pods. This enables automatic pod-level recovery.
• Deploy multiple replicas of a workload to improve availability and use Horizontal Pod Autoscaling to scale replicas
• Use probes to help Kubernetes detect failures in application code.
• If pod startup times are high, consider overprovisioning replicas

Resilience at Node level
• Requires a way to detect and replace unhealthy EC2 instances automatically and scale nodes based on workload requirements.
• Using a Kubernetes autoscaler, such as the Cluster Autoscaler or Karpenter, ensures that the data plane scales as workloads scale.
• To detect failures in nodes, you can install node-problem-detector and couple it with Descheduler to drain nodes automatically when a failure occurs.
• To ensure the data plane scales faster and recovers quickly from failures:
  • Overprovision capacity at the node level
  • Reduce node startup time by using optimized AMIs such as Bottlerocket or EKS Optimized Linux
  • To reduce container startup times, reduce the size of the container images
EKS workload resilience at zone level

• Spread nodes across zones: Create node groups in multiple AZs when using Managed Node Groups. If nodes are provisioned using Karpenter, you are
already covered as Karpenter’s default behaviour is to provision nodes across AZs.
• Spread workloads across zones: Once capacity is available across AZs, you can use Pod Topology Spread Constraints to spread workloads across multiple
AZs.
• Inter-AZ cost implications: When the workload is distributed across zones, there will be inter-AZ data transfer costs. To reduce such costs while maintaining multi-AZ resilience, use techniques such as
  • Topology aware routing
  • Service meshes
EKS workload resilience at cluster level

Kubernetes critically relies on add-on functions such as networking, security and observability for its operation. Failure of any
of these components can render Kubernetes cluster inoperable. To prevent Kubernetes cluster becoming a SPOF, deploy
workloads across multiple clusters

For successful multi-cluster deployments

1. Use GitOps or similar centralized deployment techniques to avoid inconsistencies in multi-cluster environments. This will help avoid configuration drift across clusters and make deployments, upgrades, and troubleshooting easier

2. Use ELBs to distribute traffic across clusters. When using ALBs for HTTP/S traffic endpoints across clusters, use weighted target groups to distribute traffic across clusters. This technique is useful to support maintenance activities on entire clusters by completely diverting traffic as required.
EKS workload resilience at region level

Mission-critical workloads with stringent availability requirements may operate across multiple AWS Regions. This approach
protects against larger-scale disasters or regional outages.
EKS Observability
CloudWatch Container Insights ingests both metrics and logs into CloudWatch logs. Extraction of events happens as follows
1. CloudWatch agent or CloudWatch Agent for Prometheus: extracts metrics as performance log events
2. Fluent Bit log driver: Extracts log events

• When CloudWatch Container Insights receives log events from


Fluent Bit, it organizes them into separate log groups following
the best practices
• Application logs
• Host logs
• Data plane logs

• CloudWatch Container Insights aggregates and correlates both


logs and metrics and builds observability dashboards.
EKS logging
Control plane logging is integrated with CloudWatch logs.

EKS Control Plane logging

EKS provides five types of control plane logs. The EKS service provides direct integration of these log sources into CloudWatch Logs. These log event types are not enabled by default and can be enabled individually (see the sketch after this list).
1. Kubernetes API server component logs
2. Audit
3. Authenticator
4. Controller manager
5. Scheduler
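A minimal boto3 sketch enabling all five control plane log types on an existing cluster; the cluster name is illustrative:

import boto3

eks = boto3.client("eks")

eks.update_cluster_config(
    name="demo-cluster",
    logging={
        "clusterLogging": [
            {
                "types": ["api", "audit", "authenticator", "controllerManager", "scheduler"],
                "enabled": True,
            }
        ]
    },
)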

EKS Data Plane logging

• EKS Data plane logging uses a log aggregation tools - FluentBit or Fluentd.
• Kubernetes log aggregator tools run as DaemonSets and scrape container logs from nodes.
• CloudWatch Container Insights uses either Fluent Bit or Fluentd to collect logs and ship them to
CloudWatch Logs for storage.
• Fluent Bit and Fluentd support ingesting logs into many popular log analytics systems such as
Elasticsearch and InfluxDB giving the ability to change the storage backend for EKS logs by
modifying Fluent Bit or Fluentd’s log configuration.
EKS metrics
Containerized applications generate high cardinality metrics. CloudWatch Container Insights is specifically designed to
collect this high cardinality metrics efficiently. It combines all high cardinality metric values for all dimensions (across all
nodes, instances and applications) into a single performance log event for each time series and pushes them to CloudWatch
logs in Embedded Metric Format (EMF). CloudWatch logs will extract metric data from the log events and send them to
CloudWatch metrics.

Node and instance level metrics: The CloudWatch agent runs as a sidecar alongside the pods and collects node-level and instance-level metrics.

Application-level metrics: For instrumented applications using Prometheus, the CloudWatch agent for Prometheus will collect the metric data in Prometheus format, convert it into the CloudWatch metric format, and push it to CloudWatch.
Execution time of provisioned container instances:
• GB-hours
Execution time of active container instances:
• vCPU-hours
• GB-hours
Automated deployments: Per Application per month
Build fee: per build-minute

AWS App Runner


Fully managed application service that lets you build, deploy, and run web applications and API services without prior
infrastructure or container experience.
AWS App Runner
Provides an easier way to run your web application (including API services, backend web services, and websites) without managing any infrastructure or container orchestration. Start from an existing container image, container registry, source code repository, or existing CI/CD workflow and get to a fully running containerized web application on AWS in minutes.

• Provisioned container instances: When your application is deployed, you pay for the memory provisioned in each
container instance. Keeping your container instance's memory provisioned when your application is idle ensures it can deliver
consistently low millisecond latency. You can pause the provisioned container instances when not in use to prevent incurring this
cost.

• Active container instances: When your application is processing requests, you switch from provisioned container
instances to active container instances that consume both memory and compute resources.
• You pay for the compute and any additional memory consumed in excess of the memory allocated by your provisioned container instances.
• App Runner automatically scales the number of active container instances up and down to meet the processing requirements of your
application.

• App Runner Budget controls: Set a maximum limit on the number of active container instances your application uses so that costs do not exceed your budget.
App Runner operational benefits
App Runner abstracts away many complexities associated with setting up an application, using a range of AWS services that are fully integrated and managed within App Runner service accounts. It provides fully managed CI/CD, security (TLS, encryption at rest), monitoring, and auto scaling capabilities.

• App Runner moves away from 24/7 always up and running compute model inherent in EC2, ECS and Fargate.
• Managed and simplified scaling: It provides automatic scaling without any configuration required based on the number of concurrent
requests and the maximum number of container instances configured
• Faster scaling: App Runner scaling is fast because container instances are pre-provisioned; turning on the CPU of a provisioned instance is near instantaneous compared to the cold start of a Lambda function, which requires code to be downloaded and instances warmed up.
App Runner core capabilities

Core capability areas: Deployments (deployment features), Logging & Monitoring, Security
App Runner Auto scaling
With App Runner, multiple concurrent requests can be handled by a single container instance which can be controlled by
the user. Requests will start moving to an overflow queue if it goes beyond the pre-defined concurrent limit and App Runner
kicks off another container instance until it reaches the maximum active instances defined.

• There are no scaling rules you need to set up with App Runner (compared to Elastic Beanstalk)
• App Runner uses an Envoy proxy load balancer internally to distribute concurrent requests across container instances.
• When the application starts receiving more requests than the maximum number of container instances can handle, App Runner returns HTTP status code 429, unlike other AWS compute services, which keep accepting requests and increase latency for the end user.
• When there is no workload, App Runner pauses instances - the CPU is turned off, but memory is kept active so that resuming is faster once requests start flowing again.
App Runner Auto scaling controls
App Runner pricing model
App Runner charges CPU and memory separately. For Active instances both resources are charged and when they move to
provisioned instances, you only pay for memory.

• Applications that have periods of little to no usage. When you are not running the application, you pay only for memory.
• Business front-end applications that are used during the day-time and not in the night
Serverless web hosting developer experience
App Runner supports full stack development, including both frontend and backend web applications that use HTTP and
HTTPS protocols. These applications include API services, backend web services, and websites. App Runner supports
container images as well as runtimes and web frameworks including Node.js and Python.

Deployment options:
• From container image: App Runner can immediately deploy a container image using the App Runner console or AWS CLI (see the sketch after this list).
• Use your own CI/CD toolchain: If you have an existing CI/CD workflow that uses AWS CodePipeline, Jenkins, Travis CI, CircleCI, or another CI/CD toolchain, you
can easily add App Runner as your deployment target using the App Runner API or AWS CLI.
• Continuous deployment by App Runner: If you want App Runner to automatically provide continuous deployment for you, you can easily connect to your
existing container registry or source code repository and App Runner will automatically provide a continuous deployment pipeline for you.
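A hedged boto3 sketch of deploying an existing ECR image as an App Runner service; the image URI, access role ARN, and sizing values are placeholders:

import boto3

apprunner = boto3.client("apprunner")

apprunner.create_service(
    ServiceName="web-app",
    SourceConfiguration={
        "ImageRepository": {
            "ImageIdentifier": "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo/web-app:latest",
            "ImageRepositoryType": "ECR",
            "ImageConfiguration": {"Port": "8080"},
        },
        "AutoDeploymentsEnabled": True,   # redeploy when a new image is pushed
        "AuthenticationConfiguration": {
            "AccessRoleArn": "arn:aws:iam::123456789012:role/AppRunnerECRAccessRole"
        },
    },
    InstanceConfiguration={"Cpu": "1 vCPU", "Memory": "2 GB"},
)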
Deployment architecture for Serverless web hosting

AWS App Runner is a secure, consistent solution for exposing web applications using the public endpoint or service URL.
