
Aws Certified Devops Slides v14

This document is a training resource for individuals enrolled in the AWS Certified DevOps Engineer Professional course by Stephane Maarek, intended for personal use and exam preparation. It covers key concepts such as Continuous Integration, Continuous Delivery, and the use of various AWS services like CodeBuild, CodeDeploy, and CloudFormation for automating software development and deployment processes. The document emphasizes hands-on practice and understanding of real-world applications in preparation for the certification exam.


NOT FOR DISTRIBUTION © Stephane Maarek www.datacumulus.com
Disclaimer: These slides are copyrighted and strictly for personal use only
• This document is reserved for people enrolled in the AWS Certified DevOps Engineer Professional course by Stephane Maarek.
• Please do not share this document; it is intended for personal use and exam preparation only. Thank you.
• If you’ve obtained these slides for free on a website that is not the course’s website, please reach out to [email protected]!
• Best of luck for the exam and happy learning!


© Stephane Maarek
AWS Certified DevOps Engineer Professional Course
DOP-C01

Please do not skip this lecture
• ADVANCED, PROFESSIONAL-LEVEL COURSE
  • Do the AWS Certified Developer course & certification as a prerequisite
  • It’ll be easier if you do the AWS Certified SysOps course & certification as well
• ALL HANDS-ON
  • The AWS DevOps exam is hard and tests you on real-world experience (minimum 2 years)
  • This course gives you the opportunity to practice a lot
• TAKE YOUR TIME
  • Practice as much as possible at work
  • Take notes on features or services you didn’t know about
• Happy learning, and good luck with your exam!


Domain 1 - SDLC Automation
Continuous Integration
• Developers push the code to a code repository often (GitHub / CodeCommit / Bitbucket / etc…)
• A testing / build server checks the code as soon as it’s pushed (CodeBuild / Jenkins CI / etc…)
• The developer gets feedback about the tests and checks that have passed / failed
• Find bugs early, fix bugs
• Deliver faster as the code is tested
• Deploy often
• Happier developers, as they’re unblocked
[Diagram: the developer pushes code often to the code repository; the build server gets the code, builds & tests it, and tells the developer the results of the build]
Continuous Delivery
• Ensure that the software can be released reliably whenever needed.
• Ensures deployments happen often and are quick
• Shift away from “one release every 3 months” to “5 releases a day”
• That usually means automated deployment
  • CodeDeploy
  • Jenkins CD
  • Spinnaker
  • Etc…
[Diagram: code is pushed often to the code repository; the build server gets the code and builds & tests it; every passing build is deployed by the deployment server, which moves the application servers from v1 to v2]
Continuous Delivery vs Continuous Deployment
• Continuous Delivery:
  • Ability to deploy often using automation
  • May involve a manual step to “approve” a deployment
  • The deployment itself is still automated and repeatable!
• Continuous Deployment:
  • Full automation: every code change is deployed all the way to production
  • No manual intervention or approvals

Technology Stack for CICD
Stages: Code / Build / Test / Deploy / Provision
• Code: AWS CodeCommit, or GitHub or a 3rd party code repository
• Build & Test: AWS CodeBuild, or Jenkins CI or 3rd party CI servers
• Deploy & Provision: AWS Elastic Beanstalk
• Deploy: AWS CodeDeploy
• Provision: user-managed EC2 instances fleet (CloudFormation)
• Orchestrate: AWS CodePipeline

CodeCommit
• Version control is the ability to understand the various changes that happened to the code over time (and possibly roll back).
• All of this is enabled by using a version control system such as Git
• A Git repository can live on one’s machine, but it usually lives on a central online repository
• Benefits are:
  • Collaborate with other developers
  • Make sure the code is backed up somewhere
  • Make sure it’s fully viewable and auditable
CodeCommit
• Git repositories can be expensive.
• The industry includes:
  • GitHub: free public repositories, paid private ones
  • Bitbucket
  • Etc...
• And AWS CodeCommit:
  • Private Git repositories
  • No size limit on repositories (scale seamlessly)
  • Fully managed, highly available
  • Code only in the AWS Cloud account => increased security and compliance
  • Secure (encrypted, access control, etc…)
  • Integrated with Jenkins / CodeBuild / other CI tools

CodeBuild Overview
• Fully managed build service
• Alternative to other build tools such as Jenkins
• Continuous scaling (no servers to manage or provision, no build queue)
• Pay for usage: the time it takes to complete the builds
• Leverages Docker under the hood for reproducible builds
• Possibility to extend capabilities by leveraging our own base Docker images
• Secure: integration with KMS for encryption of build artifacts, IAM for build permissions, VPC for network security, and CloudTrail for API call logging

CodeBuild Overview
• Source code from GitHub / CodeCommit / CodePipeline / S3…
• Build instructions can be defined in code (buildspec.yml file)
• Output logs to Amazon S3 & AWS CloudWatch Logs
• Metrics to monitor CodeBuild statistics
• Use CloudWatch Events to detect failed builds and trigger notifications
• Use CloudWatch Alarms to notify if you need “thresholds” for failures
• CloudWatch Events / AWS Lambda as a glue
• SNS notifications
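A minimal buildspec.yml sketch, assuming a Python project with a requirements.txt file and pytest tests (both hypothetical). The phase names and the artifacts section follow the buildspec version 0.2 format; adapt the commands to your own build:

```yaml
version: 0.2

phases:
  install:
    commands:
      # Install the project's dependencies (hypothetical requirements.txt)
      - pip install -r requirements.txt
  build:
    commands:
      # Run the test suite; a failing command fails the build phase
      - python -m pytest
  post_build:
    commands:
      - echo Build completed on `date`

artifacts:
  files:
    # Package every file of the workspace as the build output artifact
    - '**/*'
```

A failed build produced by this file is exactly the kind of event CloudWatch Events can pick up to trigger an SNS notification.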
AWS CodeDeploy
• We want to deploy our application automatically to many EC2 instances (v1 to v2)
• There are several ways to handle deployments using open source tools (Ansible, Terraform, Chef, Puppet, etc…)
• We can use the managed service AWS CodeDeploy

AWS CodeDeploy – Steps to make it work
• Each EC2 machine (or on-premises machine) must be running the CodeDeploy Agent
• The agent is continuously polling AWS CodeDeploy for work to do
• CodeDeploy sends the appspec.yml file
• The application is pulled from GitHub or S3
• EC2 will run the deployment instructions
• The CodeDeploy Agent will report success / failure of the deployment on the instance
[Diagram: 1. push source code + appspec.yml file to GitHub or Amazon S3; 2. trigger deployment; 3. EC2 instances + agent poll CodeDeploy; 4. download code + appspec.yml file]
AWS CodeDeploy
• EC2 instances are grouped by deployment group (dev / test / prod)
• Lots of flexibility to define any kind of deployments
• CodeDeploy can be chained into CodePipeline and use artifacts from there
• CodeDeploy can re-use existing setup tools, works with any application, and has Auto Scaling integration
• Note: Blue / Green only works with EC2 instances (not on premises)
• Support for AWS Lambda deployments, EC2
• CodeDeploy does not provision resources
CodePipeline
• Continuous delivery
• Visual workflow
• Source: GitHub / CodeCommit / Amazon S3
• Build: CodeBuild / Jenkins / etc…
• Load Testing: 3rd party tools
• Deploy: AWS CodeDeploy / Beanstalk / CloudFormation / ECS…
• Made of stages:
  • Each stage can have sequential actions and / or parallel actions
  • Stage examples: Build / Test / Deploy / Load Test / etc…
  • Manual approval can be defined at any stage
AWS CodePipeline Artifacts
• Each pipeline stage can create “artifacts”
• Artifacts are stored in Amazon S3 and passed on to the next stage
[Diagram: Source (CodeCommit) triggers Build (CodeBuild), which triggers Deploy (CodeDeploy); each stage writes its output artifacts to an Amazon S3 bucket, and the next stage reads them as input artifacts]
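A sketch of how artifacts flow between stages, written as a CloudFormation AWS::CodePipeline::Pipeline resource. The repository name, branch, and the three parameters (IAM role ARN, S3 artifact bucket, CodeBuild project name) are assumptions; a deploy stage is left out to keep it short, and the last stage shows a manual approval action:

```yaml
Parameters:
  PipelineRoleArn:        # hypothetical: IAM role CodePipeline assumes
    Type: String
  ArtifactBucketName:     # hypothetical: S3 bucket storing the artifacts
    Type: String
  BuildProjectName:       # hypothetical: existing CodeBuild project
    Type: String

Resources:
  DemoPipeline:
    Type: AWS::CodePipeline::Pipeline
    Properties:
      RoleArn: !Ref PipelineRoleArn
      ArtifactStore:
        Type: S3
        Location: !Ref ArtifactBucketName
      Stages:
        - Name: Source
          Actions:
            - Name: CheckoutCode
              ActionTypeId:
                Category: Source
                Owner: AWS
                Provider: CodeCommit
                Version: "1"
              Configuration:
                RepositoryName: my-demo-repo   # hypothetical repository
                BranchName: main
              OutputArtifacts:
                - Name: SourceOutput           # written to the S3 artifact store
        - Name: Build
          Actions:
            - Name: BuildAndTest
              ActionTypeId:
                Category: Build
                Owner: AWS
                Provider: CodeBuild
                Version: "1"
              Configuration:
                ProjectName: !Ref BuildProjectName
              InputArtifacts:
                - Name: SourceOutput           # read back from the S3 artifact store
              OutputArtifacts:
                - Name: BuildOutput
        - Name: Approve
          Actions:
            - Name: ManualApproval
              ActionTypeId:
                Category: Approval
                Owner: AWS
                Provider: Manual
                Version: "1"
```

The artifact names are the only link between stages: an action consumes an artifact by listing the same name under InputArtifacts.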

Jenkins on AWS
• Open source CICD tool
• Can replace CodeBuild, CodePipeline & CodeDeploy
• Must be deployed in a Master / Slave configuration
• Must manage multi-AZ, deploy on EC2, etc...
• All projects must have a “Jenkinsfile” (similar to buildspec.yml) to tell Jenkins what to do
• Jenkins can be extended on AWS thanks to many plugins!

Jenkins Master / Slave (build farm)
From whitepaper: Jenkins on AWS
https://d1.awsstatic.com/whitepapers/jenkins-on-aws.pdf

Jenkins on AWS
Source: https://aws.amazon.com/getting-started/projects/setup-jenkins-build-server/

Jenkins Master / Slave
From whitepaper: Jenkins on AWS
https://d1.awsstatic.com/whitepapers/jenkins-on-aws.pdf

Jenkins with CodePipeline
From whitepaper: Jenkins on AWS
https://d1.awsstatic.com/whitepapers/jenkins-on-aws.pdf

Jenkins with ECS
From whitepaper: Jenkins on AWS
https://d1.awsstatic.com/whitepapers/jenkins-on-aws.pdf

Jenkins with Device Farm
From whitepaper: Jenkins on AWS
https://d1.awsstatic.com/whitepapers/jenkins-on-aws.pdf

Jenkins with AWS Lambda
From whitepaper: Jenkins on AWS
https://d1.awsstatic.com/whitepapers/jenkins-on-aws.pdf

Jenkins with CloudFormation
From whitepaper: Jenkins on AWS
https://d1.awsstatic.com/whitepapers/jenkins-on-aws.pdf

Infrastructure as Code
• Currently, we have been doing a lot of manual work
• All this manual work will be very tough to reproduce:
  • In another region
  • In another AWS account
  • Within the same region if everything was deleted
• Wouldn’t it be great if all our infrastructure was… code?
• That code would be deployed and create / update / delete our infrastructure
What is CloudFormation
• CloudFormation is a declarative way of outlining your AWS Infrastructure, for any resources (most of them are supported).
• For example, within a CloudFormation template, you say:
  • I want a security group
  • I want two EC2 machines using this security group
  • I want two Elastic IPs for these EC2 machines
  • I want an S3 bucket
  • I want a load balancer (ELB) in front of these machines
• Then CloudFormation creates those for you, in the right order, with the exact configuration that you specify
Benefits of AWS CloudFormation (1/2)
• Infrastructure as code
  • No resources are manually created, which is excellent for control
  • The code can be version controlled, for example using Git
  • Changes to the infrastructure are reviewed through code
• Cost
  • Each resource within the stack is tagged with an identifier so you can easily see how much a stack costs you
  • You can estimate the costs of your resources using the CloudFormation template
  • Savings strategy: in Dev, you could automate the deletion of stacks at 5 PM and re-create them at 8 AM, safely
Benefits of AWS CloudFormation (2/2)
• Productivity
  • Ability to destroy and re-create an infrastructure on the cloud on the fly
  • Automated generation of diagrams for your templates!
  • Declarative programming (no need to figure out ordering and orchestration)
• Separation of concerns: create many stacks for many apps, and many layers. Ex:
  • VPC stacks
  • Network stacks
  • App stacks
• Don’t re-invent the wheel
  • Leverage existing templates on the web!
  • Leverage the documentation
How CloudFormation Works
• Templates have to be uploaded to S3 and then referenced in CloudFormation
• To update a template, we can’t edit previous ones. We have to re-upload a new version of the template to AWS
• Stacks are identified by a name
• Deleting a stack deletes every single artifact that was created by CloudFormation.

Deploying CloudFormation templates
• Manual way:
  • Editing templates in the CloudFormation Designer
  • Using the console to input parameters, etc.
• Automated way:
  • Editing templates in a YAML file
  • Using the AWS CLI (Command Line Interface) to deploy the templates
  • Recommended way when you fully want to automate your flow

CloudFormation Building Blocks
Template components (one course section for each):
1. Resources: your AWS resources declared in the template (MANDATORY)
2. Parameters: the dynamic inputs for your template
3. Mappings: the static variables for your template
4. Outputs: references to what has been created
5. Conditionals: list of conditions to perform resource creation
6. Metadata

Template helpers:
7. References
8. Functions

Note: This is an introduction to CloudFormation
• It can take over 3 hours to properly learn and master CloudFormation
• This section is meant so you get a good idea of how it works
• We’ll be slightly less hands-on than in other sections
• We’ll learn everything we need to answer questions for the exam
• The exam does not require you to actually write CloudFormation
• The exam expects you to understand how to read CloudFormation
Introductory Example
• We’re going to create a simple EC2 instance.
• Then we’re going to add an Elastic IP to it
• And we’re going to add two security groups to it
• For now, forget about the code syntax.
  • We’ll look at the structure of the files later on
• We’ll see how, in no time, we are able to get started with CloudFormation!
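A sketch of what such a template can look like. It assumes the default VPC (where Ref on a security group returns its name, which is what the SecurityGroups property expects) and uses the public SSM parameter for the latest Amazon Linux 2 AMI; the logical IDs are made up, and opening SSH and HTTP to 0.0.0.0/0 is for a demo only:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Introductory example with one EC2 instance, an Elastic IP and two security groups

Parameters:
  LatestAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2

Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref LatestAmiId
      InstanceType: t2.micro
      SecurityGroups:            # assumes the default VPC
        - !Ref SSHSecurityGroup
        - !Ref ServerSecurityGroup

  MyEIP:
    Type: AWS::EC2::EIP
    Properties:
      InstanceId: !Ref MyInstance   # attach the Elastic IP to the instance

  SSHSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0        # demo only, restrict in real use

  ServerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow HTTP traffic
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
```

Note that the resources are declared in no particular order: CloudFormation works out from the Ref calls that the security groups must exist before the instance, and the instance before the Elastic IP.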
YAML Crash Course
• YAML and JSON are the languages you can use for CloudFormation.
• JSON is horrible for CF
• YAML is great in so many ways
• Let’s learn a bit about it!
  • Key-value pairs
  • Nested objects
  • Support for arrays
  • Multi-line strings
  • Can include comments!
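A small YAML document, independent of CloudFormation, showing each of these features; all the data is made up:

```yaml
# Comments start with a hash sign
invoice: 34843              # key-value pair
date: 2001-01-23
bill-to:                    # nested object
  given: Chris
  address:
    lines: |                # multi-line string (newlines are kept)
      458 Walkman Dr.
      Suite #292
    city: Royal Oak
items:                      # array (list) of objects
  - part_no: A4786
    quantity: 4
  - part_no: E1628
    quantity: 1
```

Indentation (spaces, never tabs) is what defines nesting, which is why YAML templates stay readable compared with the equivalent JSON braces.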
What are resources?
• Resources are the core of your CloudFormation template (MANDATORY)
• They represent the different AWS components that will be created and configured
• Resources are declared and can reference each other
• AWS figures out creation, updates and deletes of resources for us
• There are over 224 types of resources (!)
• Resource type identifiers are of the form:
  AWS::aws-product-name::data-type-name
How do I find resources documentation?
• I can’t teach you all of the 224 resources, but I can teach you how to learn how to use them.
• All the resources can be found here:
  http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-template-resource-type-ref.html
• Then, we just read the docs
• Example here (for an EC2 instance):
  http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-instance.html

Analysis of CloudFormation Template
• Going back to the example of the introductory section, let’s learn why it was written this way.
• Relevant documentation can be found here:
  • http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-instance.html
  • http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-security-group.html
  • http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-eip.html

FAQ for resources
• Can I create a dynamic amount of resources?
  • No, you can’t. Everything in the CloudFormation template has to be declared. You can’t perform code generation there
• Is every AWS Service supported?
  • Almost. Only a select few niches are not there yet
  • You can work around that using AWS Lambda Custom Resources

What are parameters?
• Parameters are a way to provide inputs to your AWS CloudFormation template
• They’re important to know about if:
  • You want to reuse your templates across the company
  • Some inputs cannot be determined ahead of time
• Parameters are extremely powerful, controlled, and can prevent errors from happening in your templates thanks to types.

When should you use a parameter?
• Ask yourself this:
  • Is this CloudFormation resource configuration likely to change in the future?
  • If so, make it a parameter.
• You won’t have to re-upload a template to change its content

Parameters Settings
Parameters can be controlled by all these settings:
• Type:
  • String
  • Number
  • CommaDelimitedList
  • List<Type>
  • AWS Parameter (to help catch invalid values: match against existing values in the AWS Account)
• Description
• Constraints
• ConstraintDescription (String)
• Min/MaxLength
• Min/MaxValue
• Defaults
• AllowedValues (array)
• AllowedPattern (regexp)
• NoEcho (Boolean)
How to Reference a Parameter
• The Ref function can be leveraged to reference parameters
• Parameters can be used anywhere in a template.
• The shorthand for this in YAML is !Ref
• The function can also reference other elements within the template
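A minimal sketch: the value of a hypothetical SecurityGroupDescription parameter is injected into a resource property with !Ref:

```yaml
Parameters:
  SecurityGroupDescription:         # hypothetical parameter
    Description: Security group description
    Type: String
    Default: Demo security group

Resources:
  ServerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      # Ref on a parameter returns the parameter's value
      GroupDescription: !Ref SecurityGroupDescription
```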

Concept: Pseudo Parameters
• AWS offers us pseudo parameters in any CloudFormation template.
• These can be used at any time and are enabled by default

Reference Value → Example Return Value
• AWS::AccountId → 123456789012
• AWS::NotificationARNs → [arn:aws:sns:us-east-1:123456789012:MyTopic]
• AWS::NoValue → Does not return a value.
• AWS::Region → us-east-2
• AWS::StackId → arn:aws:cloudformation:us-east-1:123456789012:stack/MyStack/1c2fa620-982a-11e3-aff7-50e2416294e0
• AWS::StackName → MyStack

What are mappings?
• Mappings are fixed variables within your CloudFormation template.
• They’re very handy to differentiate between different environments (dev vs prod), regions (AWS regions), AMI types, etc.
• All the values are hardcoded within the template
• Example:
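A sketch of a Mappings section; the map name EnvironmentMap and its keys and values are made up for illustration:

```yaml
Mappings:
  EnvironmentMap:              # map name (your choice, hypothetical)
    dev:                       # top-level key
      InstanceType: t3.micro   # second-level key: value
      MaxSize: "2"
    prod:
      InstanceType: m5.large
      MaxSize: "10"
```

Every value is written out in the template itself, which is what distinguishes a mapping from a parameter supplied at launch.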

When would you use mappings vs parameters?
• Mappings are great when you know in advance all the values that can be taken and that they can be deduced from variables such as:
  • Region
  • Availability Zone
  • AWS Account
  • Environment (dev vs prod)
  • Etc…
• They allow safer control over the template.
• Use parameters when the values are really user specific

Fn::FindInMap
Accessing Mapping Values
• We use Fn::FindInMap to return a named value from a specific key
• !FindInMap [ MapName, TopLevelKey, SecondLevelKey ]
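A sketch combining a mapping with the AWS::Region pseudo parameter, so each region gets its own instance type. The map name RegionMap, the regions listed and the instance types are assumptions; a stack launched in a region missing from the map would fail on the lookup:

```yaml
Parameters:
  LatestAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2

Mappings:
  RegionMap:                   # hypothetical map: region -> settings
    us-east-1:
      InstanceType: t3.micro
    eu-west-1:
      InstanceType: t3.small

Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref LatestAmiId
      # MapName, TopLevelKey (the current region), SecondLevelKey
      InstanceType: !FindInMap [RegionMap, !Ref "AWS::Region", InstanceType]
```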

What are outputs?
• The Outputs section declares optional output values that we can import into other stacks (if you export them first)!
• You can also view the outputs in the AWS Console or using the AWS CLI
• They’re very useful, for example, if you define a network CloudFormation stack and output variables such as the VPC ID and your Subnet IDs
• It’s the best way to perform some cross-stack collaboration, as you let experts handle their own part of the stack
• You can’t delete a CloudFormation stack if its outputs are being referenced by another CloudFormation stack

Outputs Example
• Creating an SSH Security Group as part of one template
• We create an output that references that security group
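A sketch of the exporting template. The export name SSHSecurityGroup is an arbitrary choice here; export names must be unique within an account and region:

```yaml
Resources:
  SSHSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0    # demo only

Outputs:
  StackSSHSecurityGroup:
    Description: The SSH Security Group for our company
    Value: !Ref SSHSecurityGroup
    Export:
      Name: SSHSecurityGroup   # hypothetical export name, imported by other stacks
```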

Cross Stack Reference
• We then create a second template that leverages that security group
• For this, we use the Fn::ImportValue function
• You can’t delete the underlying stack until all the references are deleted too.
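A sketch of the importing template, assuming another stack in the same region exports a value named SSHSecurityGroup (in the default VPC, that value is the security group name):

```yaml
Parameters:
  LatestAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2

Resources:
  MySecureInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref LatestAmiId
      InstanceType: t2.micro
      SecurityGroups:
        # Value exported by the other stack (assumed export name)
        - !ImportValue SSHSecurityGroup
```

As long as this stack exists, CloudFormation refuses to delete the exporting stack.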

What are conditions used for?
• Conditions are used to control the creation of resources or outputs based on a condition.
• Conditions can be whatever you want them to be, but common ones are:
  • Environment (dev / test / prod)
  • AWS Region
  • Any parameter value
• Each condition can reference another condition, parameter value or mapping

How to define a condition?
• The logical ID is for you to choose. It’s how you name the condition
• The intrinsic function (logical) can be any of the following:
  • Fn::And
  • Fn::Equals
  • Fn::If
  • Fn::Not
  • Fn::Or
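A sketch of a Conditions section driven by a hypothetical EnvType parameter; the condition names are arbitrary logical IDs, and conditions can build on each other through the Condition function:

```yaml
Parameters:
  EnvType:                     # hypothetical parameter
    Description: Environment type
    Type: String
    AllowedValues: [dev, test, prod]
    Default: dev

Conditions:
  # Logical ID (your choice) -> condition function
  CreateProdResources: !Equals [!Ref EnvType, prod]
  IsNotProd: !Not [!Condition CreateProdResources]
  IsTestOrProd: !Or
    - !Equals [!Ref EnvType, test]
    - !Condition CreateProdResources
```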
Using a Condition
• Conditions can be applied to resources / outputs / etc…
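A sketch applying a condition at the resource level (the Condition attribute), on an output, and inside a property with Fn::If, where the AWS::NoValue pseudo parameter removes the property when the condition is false; all names are hypothetical:

```yaml
Parameters:
  EnvType:
    Type: String
    AllowedValues: [dev, prod]
    Default: dev

Conditions:
  CreateProdResources: !Equals [!Ref EnvType, prod]

Resources:
  # Created only when the condition is true
  ProdLogsBucket:
    Type: AWS::S3::Bucket
    Condition: CreateProdResources

  AppBucket:
    Type: AWS::S3::Bucket
    Properties:
      # Versioning only in prod; AWS::NoValue drops the property otherwise
      VersioningConfiguration: !If
        - CreateProdResources
        - Status: Enabled
        - !Ref AWS::NoValue

Outputs:
  ProdLogsBucketName:
    Condition: CreateProdResources   # output only exists when the bucket does
    Value: !Ref ProdLogsBucket
```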

CloudFormation Must Know Intrinsic Functions
• Ref
• Fn::GetAtt
• Fn::FindInMap
• Fn::ImportValue
• Fn::Join
• Fn::Sub
• Condition Functions (Fn::If, Fn::Not, Fn::Equals, etc…)

Ref
• The Ref function can be leveraged to reference:
  • Parameters => returns the value of the parameter
  • Resources => returns the physical ID of the underlying resource (ex: EC2 ID)
• The shorthand for this in YAML is !Ref

Fn::GetAtt
• Attributes are attached to any resources you create
• To know the attributes of your resources, the best place to look is the documentation.
• For example: the AZ of an EC2 machine!
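A sketch of that exact case: a new EBS volume is created in the same Availability Zone as the instance by reading the instance’s AvailabilityZone attribute, while !Ref on the instance returns its ID. The resource names and the AMI parameter are assumptions:

```yaml
Parameters:
  LatestAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2

Resources:
  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref LatestAmiId
      InstanceType: t2.micro

  NewVolume:
    Type: AWS::EC2::Volume
    Properties:
      Size: 100
      # Attribute of another resource: the AZ the instance was placed in
      AvailabilityZone: !GetAtt EC2Instance.AvailabilityZone

Outputs:
  InstanceId:
    Value: !Ref EC2Instance     # Ref on a resource: its physical ID (i-...)
```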

Fn::ImportValue
• Import values that are exported in other templates
• For this, we use the Fn::ImportValue function

Fn::Join
• Join values with a delimiter
• Joining a, b and c with the “:” delimiter creates “a:b:c”
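A sketch showing the toy case and a more realistic one; MyBucket is a hypothetical resource:

```yaml
Resources:
  MyBucket:                     # hypothetical bucket
    Type: AWS::S3::Bucket

Outputs:
  JoinedValue:
    # !Join [ delimiter, [ list of values ] ]
    Value: !Join [":", [a, b, c]]        # evaluates to "a:b:c"
  BucketObjectsArn:
    # Builds arn:aws:s3:::<bucket-name>/* from the bucket's physical name
    Value: !Join ["", ["arn:aws:s3:::", !Ref MyBucket, "/*"]]
```

With an empty delimiter, Fn::Join acts as plain string concatenation, which is a common way to assemble ARNs.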

Function Fn::Sub
• Fn::Sub, or !Sub as a shorthand, is used to substitute variables in a text. It’s a very handy function that will allow you to fully customize your templates.
• For example, you can combine Fn::Sub with References or AWS pseudo variables!
• The string must contain ${VariableName}, and Fn::Sub will substitute them
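A sketch mixing a parameter, pseudo parameters and a resource reference inside !Sub strings; the AppName parameter and the bucket naming scheme are hypothetical:

```yaml
Parameters:
  AppName:                      # hypothetical parameter, keep it lowercase for S3
    Type: String
    Default: myapp

Resources:
  AppBucket:
    Type: AWS::S3::Bucket
    Properties:
      # e.g. myapp-123456789012-us-east-1
      BucketName: !Sub "${AppName}-${AWS::AccountId}-${AWS::Region}"

Outputs:
  BucketArn:
    # ${AppBucket} behaves like !Ref AppBucket (the bucket name)
    Value: !Sub "arn:aws:s3:::${AppBucket}"
```

Compared with the Fn::Join version of the same ARN, the !Sub form reads like the final string, which is why it is usually preferred.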

Condition Functions
• The logical ID is for you to choose. It’s how you name the condition
• The intrinsic function (logical) can be any of the following:
  • Fn::And
  • Fn::Equals
  • Fn::If
  • Fn::Not
  • Fn::Or
User Data in EC2 for CloudFormation
• We can have user data at EC2 instance launch through the console
• We can also include it in CloudFormation
• The important thing is to pass the entire script through the function Fn::Base64
• Good to know: the user data script log is in /var/log/cloud-init-output.log
• Let’s see how to do this in CloudFormation
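A sketch of user data in a template, assuming an Amazon Linux 2 AMI (hence yum and systemctl); the web page content is a placeholder:

```yaml
Parameters:
  LatestAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2

Resources:
  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref LatestAmiId
      InstanceType: t2.micro
      UserData:
        # The whole script must be Base64-encoded
        Fn::Base64: |
          #!/bin/bash
          yum update -y
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd
          echo "Hello World from user data" > /var/www/html/index.html
```

If the page does not show up, the output of this script is what ends up in /var/log/cloud-init-output.log on the instance.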


cfn-init
• AWS::CloudFormation::Init must be in the Metadata of a resource
• The cfn-init script helps make complex EC2 configurations readable
• The EC2 instance will query the CloudFormation service to get init data
• Logs go to /var/log/cfn-init.log
• Let’s see how it works through a sample CloudFormation template
[Diagram: the EC2 instance launches, runs cfn-init, and retrieves the init data from the CloudFormation service]
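A sketch of the same web server as the user data approach, rewritten with AWS::CloudFormation::Init. It assumes an Amazon Linux 2 AMI, where the helper scripts live under /opt/aws/bin; the resource name WebServer and the page content are hypothetical:

```yaml
Parameters:
  LatestAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2

Resources:
  WebServer:
    Type: AWS::EC2::Instance
    Metadata:
      AWS::CloudFormation::Init:
        config:
          packages:
            yum:
              httpd: []                # install the latest httpd
          files:
            /var/www/html/index.html:
              content: "Hello World from cfn-init"
              mode: "000644"
              owner: root
              group: root
          services:
            sysvinit:
              httpd:
                enabled: true          # start on boot
                ensureRunning: true    # start now
    Properties:
      ImageId: !Ref LatestAmiId
      InstanceType: t2.micro
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash -xe
          # Make sure the latest helper scripts are installed
          yum update -y aws-cfn-bootstrap
          # Fetch and apply the AWS::CloudFormation::Init metadata above
          /opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource WebServer --region ${AWS::Region}
```

The user data shrinks to a single cfn-init call, while the actual configuration becomes declarative metadata that is much easier to read and review.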

cfn-signal & wait conditions
• We still don’t know how to tell CloudFormation that the EC2 instance got properly configured after a cfn-init
• For this, we can use the cfn-signal script!
  • We run cfn-signal right after cfn-init
  • Tell CloudFormation service to keep on going or fail
• We need to define a WaitCondition:
  • Block the template until it receives a signal from cfn-signal
  • We attach a CreationPolicy (also works on EC2, ASG)
[Diagram: the EC2 instance launches, runs cfn-init to retrieve the init data, then sends a signal from cfn-signal to the wait condition in the CloudFormation service]
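A sketch of the signalling side, assuming an Amazon Linux 2 AMI with the helper scripts in /opt/aws/bin. It attaches a CreationPolicy directly to the instance rather than declaring a separate AWS::CloudFormation::WaitCondition resource; the resource name and the 5-minute timeout are assumptions:

```yaml
Parameters:
  LatestAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2

Resources:
  WebServer:
    Type: AWS::EC2::Instance
    CreationPolicy:
      ResourceSignal:
        Count: 1               # wait for one success signal
        Timeout: PT5M          # ISO 8601 duration: fail after 5 minutes
    Metadata:
      AWS::CloudFormation::Init:
        config:
          packages:
            yum:
              httpd: []
    Properties:
      ImageId: !Ref LatestAmiId
      InstanceType: t2.micro
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash -x
          # No -e: cfn-signal must still run if cfn-init fails
          /opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource WebServer --region ${AWS::Region}
          # Send cfn-init's exit code: 0 = success, anything else = failure
          /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource WebServer --region ${AWS::Region}
```

The stack stays in CREATE_IN_PROGRESS for this instance until the signal arrives; a failure signal or no signal before the timeout makes the creation fail.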

Wait Condition Didn’t Receive the Required Number of Signals from an Amazon EC2 Instance
• Ensure that the AMI you’re using has the AWS CloudFormation helper scripts installed. If the AMI doesn’t include the helper scripts, you can also download them to your instance.
• Verify that the cfn-init & cfn-signal commands were successfully run on the instance. You can view logs, such as /var/log/cloud-init.log or /var/log/cfn-init.log, to help you debug the instance launch.
• You can retrieve the logs by logging in to your instance, but you must disable rollback on failure or else AWS CloudFormation deletes the instance after your stack fails to create.
• Verify that the instance has a connection to the Internet. If the instance is in a VPC, the instance should be able to connect to the Internet through a NAT device if it’s in a private subnet or through an Internet gateway if it’s in a public subnet.
Rollbacks on failures
• Stack Creation Fails: (CreateStack API)
  • Default: everything rolls back (gets deleted). We can look at the log
    OnFailure=ROLLBACK
  • Troubleshoot: option to disable rollback and manually troubleshoot
    OnFailure=DO_NOTHING
  • Delete: get rid of the stack entirely, do not keep anything
    OnFailure=DELETE
• Stack Update Fails: (UpdateStack API)
  • The stack automatically rolls back to the previous known working state
  • Ability to see in the log what happened and error messages
Nested stacks
• Nested stacks are stacks as part of other stacks
• They allow you to isolate repeated patterns / common components in separate stacks and call them from other stacks
• Example:
  • Load Balancer configuration that is re-used
  • Security Group that is re-used
• Nested stacks are considered best practice
• To update a nested stack, always update the parent (root stack)
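A sketch of a parent stack calling a re-usable security group template as a nested stack. The S3 URL, the child’s SSHLocation parameter and its SecurityGroupId output are all hypothetical and must match whatever the child template actually declares:

```yaml
Resources:
  SharedSecurityGroupStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      # Hypothetical child template uploaded to S3 beforehand
      TemplateURL: https://my-templates-bucket.s3.amazonaws.com/security-group.yaml
      Parameters:
        SSHLocation: 10.0.0.0/16    # must be a parameter declared by the child
      TimeoutInMinutes: 10

Outputs:
  SharedSecurityGroupId:
    # Reads the output named SecurityGroupId from the child stack
    Value: !GetAtt SharedSecurityGroupStack.Outputs.SecurityGroupId
```

Updating this parent stack is what propagates changes to the child, in line with the “always update the root stack” rule.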

ChangeSets
• When you update a stack, you need to know what will change before it happens, for greater confidence
• ChangeSets won’t say if the update will be successful
[Diagram: 1. Create a change set from the original stack; 2. View the change set; 3. (optional) Create additional change sets; 4. Execute the change set in AWS CloudFormation]
From: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-changesets.html
Retaining Data on Deletes
• You can put a DeletionPolicy on any resource to control what happens when the CloudFormation stack is deleted
• DeletionPolicy=Retain:
  • Specify on resources to preserve / back up in case of CloudFormation deletes
  • To keep a resource, specify Retain (works for any resource / nested stack)
• DeletionPolicy=Snapshot:
  • EBS Volume, ElastiCache Cluster, ElastiCache ReplicationGroup
  • RDS DBInstance, RDS DBCluster, Redshift Cluster
• DeletionPolicy=Delete (default behavior):
  • Note: for AWS::RDS::DBCluster resources, the default policy is Snapshot
  • Note: to delete an S3 bucket, you need to first empty the bucket of its content
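A sketch with one resource per policy; the logical names and the volume size are made up:

```yaml
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain          # bucket (and its data) survives stack deletion

  MyVolume:
    Type: AWS::EC2::Volume
    DeletionPolicy: Snapshot        # a snapshot is taken, then the volume is deleted
    Properties:
      Size: 10
      # First Availability Zone of the stack's region
      AvailabilityZone: !Select [0, !GetAZs ""]
```

Retained resources are no longer managed by any stack once the stack is gone, so they have to be cleaned up (and paid for) manually.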
Termination Protection on Stacks
• To prevent accidental deletes of CloudFormation stacks, use TerminationProtection
• Let’s see this quickly!

CloudFormation Custom Resources (Lambda)
• You can define a Custom Resource in CloudFormation to address any of these use cases:
  • An AWS resource is not yet covered (new service for example)
  • An on-premises resource
  • Emptying an S3 bucket before being deleted
  • Fetch an AMI id
  • Anything you want…!
[Diagram: the CloudFormation Custom Resource sends create, update and delete events to an AWS Lambda function, which makes whatever API calls you want]

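CloudFormation Custom Resources (Lambda)
• The Lambda Function will get invoked only if there is a Create, Update or Delete event for the custom resource, not every time you run the template (an Update event is only sent when the custom resource’s properties change)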
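The Lambda side of a custom resource follows a request / response protocol: CloudFormation invokes the function with a `RequestType` of `Create`, `Update` or `Delete`, and the function must PUT a JSON result to the pre-signed `ResponseURL` from the event; if it never answers, the stack operation waits until it times out. The sketch below uses only the Python standard library. The `do_work` function and its `"Message"` output are hypothetical placeholders, and using the logical ID as the physical ID on Create is a simplification. For inline (ZipFile) functions AWS also provides the `cfnresponse` module.

```python
"""Minimal CloudFormation custom resource handler (standard library only)."""
import json
import urllib.request


def build_response(event: dict, status: str, data=None, reason: str = "") -> dict:
    """Build the JSON body CloudFormation expects at the ResponseURL."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or "See CloudWatch Logs for details",
        # Keep the existing physical ID on Update/Delete; fall back to the
        # logical ID on Create (a simplification for this sketch).
        "PhysicalResourceId": event.get("PhysicalResourceId", event["LogicalResourceId"]),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(event: dict, body: dict) -> None:
    """PUT the response to the pre-signed URL supplied by CloudFormation."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=payload,
        method="PUT",
        headers={"Content-Type": "", "Content-Length": str(len(payload))},
    )
    with urllib.request.urlopen(request) as response:
        response.read()


def do_work(request_type: str, properties: dict) -> dict:
    """Hypothetical placeholder for the real API calls (empty a bucket, look up an AMI...)."""
    return {"Message": f"{request_type} handled"}


def handler(event, context):
    try:
        data = do_work(event["RequestType"], event.get("ResourceProperties", {}))
        body = build_response(event, "SUCCESS", data)
    except Exception as exc:  # always answer, or the stack waits until it times out
        body = build_response(event, "FAILED", reason=str(exc))
    send_response(event, body)
```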

Elastic Beanstalk Deployment Modes
• Single Instance: great for dev
• High Availability with Load Balancer: great for prod
[Diagram: Single Instance = one Availability Zone with an Elastic IP, one EC2 Instance and an RDS Master. High Availability = an ALB across Availability Zones 1 and 2, an Auto Scaling Group of EC2 Instances, an RDS Master and an RDS Standby]

Beanstalk Deployment Options for Updates
• All at once (deploy all in one go) – fastest, but instances aren’t available to serve traffic for a bit (downtime)
• Rolling: update a few instances at a time (bucket), and then move onto the next bucket once the first bucket is healthy
• Rolling with additional batches: like rolling, but spins up new instances to move the batch (so that the old application is still available)
• Immutable: spins up new instances in a new ASG, deploys the new version to these instances, and then swaps all the instances when everything is healthy
Elastic Beanstalk Deployment: All at once
• Fastest deployment
• Application has downtime
• Great for quick iterations in development environment
• No additional cost
[Diagram: all four instances switch from v1 to v2 at the same time]

Elastic Beanstalk Deployment: Rolling
• Application is running below capacity
• Can set the bucket size
• Application is running both versions simultaneously
• No additional cost
[Diagram: four instances updated from v1 to v2 in buckets of size 2]
Elastic Beanstalk Deployment: Rolling with additional batches
• Application is running at capacity
• Can set the bucket size
• Application is running both versions simultaneously
• Small additional cost
• Additional batch is removed at the end of the deployment
• Longer deployment
[Diagram: an additional batch of new v2 instances is launched next to the v1 instances, the v1 instances are updated to v2 bucket by bucket, and the additional batch is terminated at the end]
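To compare the two rolling variants, this toy Python simulation tracks which instances are in service while buckets are updated. It is not Beanstalk's actual orchestration: it assumes a fixed fleet, a fixed bucket size, and that an instance is out of service while it is being updated; `simulate` and `min_in_service` are hypothetical helpers.

```python
"""Toy simulation of Elastic Beanstalk rolling deployments (illustrative only)."""


def simulate(fleet: int, batch: int, additional_batch: bool = False) -> list[list[str]]:
    """Return fleet snapshots; "v1"/"v2" are in service, "down" is being updated."""
    instances = ["v1"] * fleet
    snapshots = [list(instances)]
    if additional_batch:
        # Launch an extra batch already running v2 so capacity never drops below `fleet`.
        instances += ["v2"] * batch
        snapshots.append(list(instances))
    # Update the original instances one bucket at a time.
    for start in range(0, fleet, batch):
        bucket = range(start, min(start + batch, fleet))
        for i in bucket:
            instances[i] = "down"
        snapshots.append(list(instances))
        for i in bucket:
            instances[i] = "v2"
        snapshots.append(list(instances))
    if additional_batch:
        # The additional batch is terminated at the end of the deployment.
        instances = instances[:fleet]
        snapshots.append(list(instances))
    return snapshots


def min_in_service(snapshots: list[list[str]]) -> int:
    """Lowest number of serving instances seen during the deployment."""
    return min(sum(1 for state in snap if state != "down") for snap in snapshots)


if __name__ == "__main__":
    print("rolling:", min_in_service(simulate(4, 2)))
    print("rolling + additional batch:", min_in_service(simulate(4, 2, additional_batch=True)))
```

With 4 instances and buckets of 2, plain rolling drops to 2 serving instances at its worst point, while the additional batch keeps at least 4 in service at the price of briefly running 6, which is the "small additional cost" above.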
Elastic Beanstalk Immutable Deployment
• Zero downtime
• New code is deployed to new instances on a temporary ASG
• High cost, double capacity
• Longest deployment
• Quick rollback in case of failures (just terminate the new ASG)
• Great for prod
[Diagram: the current ASG runs v1 instances; v2 instances are launched in a temporary ASG, moved to the current ASG once healthy, and the v1 instances are terminated]
Elastic Beanstalk Deployment: Blue / Green
• Not a “direct feature” of Elastic Beanstalk
• Zero downtime and release facility
• Create a new “stage” environment and deploy v2 there
• The new environment (green) can be validated independently, and you can roll back if there are issues
• Route 53 can be set up using weighted policies to redirect a little bit of traffic to the stage environment
[Diagram: Amazon Route 53 sends 90% of web traffic to environment “blue” (v1) and 10% to environment “green” (v2)]
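With weighted routing, Route 53 answers each DNS query with a record chosen in proportion to its weight (weight divided by the sum of all weights), and if every weight is 0 it answers with equal probability. The Python sketch below shows the arithmetic behind a 90 / 10 blue / green split; the helper names are hypothetical, and in practice resolver caching (TTL) makes the real traffic split only approximate.

```python
"""Sketch of Route 53-style weighted routing between blue and green environments."""
import random


def traffic_share(weights: dict[str, int]) -> dict[str, float]:
    """Expected fraction of DNS answers per record: weight / sum of weights."""
    total = sum(weights.values())
    if total == 0:
        # If every weight is 0, records are returned with equal probability.
        return {name: 1 / len(weights) for name in weights}
    return {name: w / total for name, w in weights.items()}


def pick_record(weights: dict[str, int], rng: random.Random) -> str:
    """Choose the record for a single DNS query according to traffic_share."""
    shares = traffic_share(weights)
    names = list(shares)
    return rng.choices(names, weights=[shares[n] for n in names], k=1)[0]


if __name__ == "__main__":
    weights = {"blue": 90, "green": 10}
    print(traffic_share(weights))
    rng = random.Random(42)
    answers = [pick_record(weights, rng) for _ in range(10_000)]
    print("green share:", answers.count("green") / len(answers))
```

Shifting more traffic to green is then just a matter of changing the two weights, with no change to either environment.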
Elastic Beanstalk Deployment Summary from AWS Doc
• https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.deploy-existing-version.html

AWS Lambda Versions
• When you work on a Lambda function, you work on $LATEST (mutable)
• When you’re ready to publish a Lambda function, you create a version
• Versions are immutable
• Versions have increasing version numbers
• Versions get their own ARN (Amazon Resource Name)
• Version = code + configuration (nothing can be changed - immutable)
• Each version of the Lambda function can be accessed
[Diagram: $LATEST (mutable) is published as V1 and V2 (immutable)]
AWS Lambda Aliases
• Aliases are “pointers” to Lambda function versions
• We can define “dev”, “test” and “prod” aliases and have them point at different Lambda versions
• Aliases are mutable
• Aliases enable Blue / Green deployment by assigning weights to Lambda function versions
• Aliases enable stable configuration of our event triggers / destinations
• Aliases have their own ARNs
[Diagram: users call the DEV, TEST and PROD aliases (mutable); DEV points to $LATEST (mutable), TEST to V2, and PROD sends 95% of traffic to V1 and 5% to V2 (immutable versions)]
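The version / alias relationship can be sketched as a small Python model. This is a toy, not the Lambda API: the region, account ID and function name are hypothetical, and the weighted alias mirrors how Lambda stores it (a primary `FunctionVersion` plus `RoutingConfig.AdditionalVersionWeights`, e.g. `{"2": 0.05}`) without calling AWS.

```python
"""Toy model of Lambda versions and weighted aliases (not the Lambda API)."""
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Function:
    name: str
    region: str = "eu-west-1"        # hypothetical region
    account: str = "123456789012"    # hypothetical account ID
    versions: list[str] = field(default_factory=list)

    def arn(self, qualifier: str) -> str:
        """Qualified ARN for $LATEST, a version number or an alias name."""
        return f"arn:aws:lambda:{self.region}:{self.account}:function:{self.name}:{qualifier}"

    def publish_version(self) -> str:
        """Snapshot $LATEST (code + configuration) as the next immutable version."""
        version = str(len(self.versions) + 1)
        self.versions.append(version)
        return version


@dataclass
class Alias:
    name: str
    version: str                          # primary version
    extra_version: Optional[str] = None   # optional second version
    extra_weight: float = 0.0             # share of invocations sent to extra_version

    def resolve(self, r: float) -> str:
        """Pick the version for one invocation, given r drawn uniformly from [0, 1)."""
        if self.extra_version is not None and r < self.extra_weight:
            return self.extra_version
        return self.version


if __name__ == "__main__":
    fn = Function("my-function")
    v1, v2 = fn.publish_version(), fn.publish_version()
    prod = Alias("prod", version=v1, extra_version=v2, extra_weight=0.05)
    print(fn.arn(prod.name), "->", prod.resolve(0.01), prod.resolve(0.5))
```

Moving PROD fully to V2 only means updating the alias; anything configured with the alias ARN keeps working unchanged.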
API Gateway – Deployment Stages
• Making changes in the API Gateway does not mean they’re effective
• You need to make a “deployment” for them to be in effect
• It’s a common source of confusion
• Changes are deployed to “Stages” (as many as you want)
• Use the naming you like for stages (dev, test, prod)
• Each stage has its own configuration parameters
• Stages can be rolled back as a history of deployments is kept
API Gateway – Stages v1 and v2 API breaking change
[Diagram: V1 clients keep calling https://api.example.com/v1 (v1 stage => V1), while V2 clients call the new URL https://api.example.com/v2 (v2 stage => V2)]

API Gateway – Stage Variables
• Stage variables are like environment variables for API Gateway
• Use them for configuration values that change often
• They can be used in:
• Lambda function ARN
• HTTP Endpoint
• Parameter mapping templates
• Use cases:
• Configure HTTP endpoints your stages talk to (dev, test, prod…)
• Pass configuration parameters to AWS Lambda through mapping templates
• Stage variables are passed to the “context” object in
API Gateway Stage Variables & Lambda Aliases
• We create a stage variable to indicate the corresponding Lambda alias
• Our API gateway will automatically invoke the right Lambda function!
[Diagram: Dev Stage => DEV Alias => $LATEST (100%); Test Stage => TEST Alias => V2 (100%); Prod Stage => PROD Alias => V1 (95%) and V2 (5%). Shifting versions is a Lambda alias change, with no API Gateway changes]
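The pattern above, one integration whose Lambda ARN ends in a stage variable, can be sketched in Python. The `${stageVariables.name}` placeholder syntax is API Gateway's; the function name, account ID, region and the `resolve` helper are hypothetical, and API Gateway performs this substitution itself at request time.

```python
"""Sketch: resolve ${stageVariables.x} placeholders the way each stage would."""
import re

PLACEHOLDER = re.compile(r"\$\{stageVariables\.([A-Za-z0-9_]+)\}")

# Hypothetical integration target: one Lambda function, alias chosen per stage.
FUNCTION_ARN = (
    "arn:aws:lambda:eu-west-1:123456789012:function:my-function:"
    "${stageVariables.lambdaAlias}"
)

STAGES = {
    "dev": {"lambdaAlias": "DEV"},
    "test": {"lambdaAlias": "TEST"},
    "prod": {"lambdaAlias": "PROD"},
}


def resolve(template: str, stage_variables: dict[str, str]) -> str:
    """Replace each ${stageVariables.name} with that stage's value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in stage_variables:
            raise KeyError(f"stage variable {name!r} is not defined")
        return stage_variables[name]

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    for stage, variables in STAGES.items():
        print(stage, "->", resolve(FUNCTION_ARN, variables))
```

Because the alias is only known at request time, API Gateway also needs permission to invoke each alias (a resource-based `lambda:InvokeFunction` permission per alias).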

API Gateway – Canary Deployment
• Possibility to enable canary deployments for any stage (usually prod)
• Choose the % of traffic the canary channel receives
[Diagram: client traffic split 95% to the Prod Stage (v1) and 5% to the Prod Stage Canary (v2)]
• Metrics & Logs are separate (for better monitoring)
• Possibility to override stage variables for canary
• This is blue / green deployment with AWS Lambda & API Gateway
AWS Step Functions – When to Use?
• Use to design workflows
• Easy visualizations
• Advanced Error Handling and Retry mechanism outside the code
• Audit of the history of workflows
• Ability to “Wait” for an arbitrary amount of time
• Max execution time of a State Machine is 1 year (Standard Workflows)
• Examples:
• Payment Workflow
• Complex flows
• Long running workflows (days) to go over the Lambda limit of 15 minutes
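To make "retry outside the code" and "wait" concrete, here is a sketch of a state machine definition in Amazon States Language (ASL), built as a Python dict and serialized to the JSON you would give Step Functions. The Lambda ARNs are hypothetical placeholders, and `check_transitions` is a hypothetical local sanity check, not a substitute for Step Functions' own validation.

```python
"""Sketch of a small Amazon States Language definition with Retry and Wait."""
import json

# Hypothetical Lambda ARNs for a payment workflow.
CHARGE_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:charge-card"
NOTIFY_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:notify-customer"

definition = {
    "Comment": "Charge the card, wait a day, then notify the customer",
    "StartAt": "ChargeCard",
    "States": {
        "ChargeCard": {
            "Type": "Task",
            "Resource": CHARGE_ARN,
            # Retries are declared in the workflow, not in the function code.
            "Retry": [{
                "ErrorEquals": ["States.ALL"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Next": "WaitOneDay",
        },
        # The Wait state pauses the execution without running any code.
        "WaitOneDay": {"Type": "Wait", "Seconds": 86400, "Next": "Notify"},
        "Notify": {"Type": "Task", "Resource": NOTIFY_ARN, "End": True},
    },
}


def check_transitions(asl: dict) -> None:
    """Raise ValueError if StartAt or any Next names a state that does not exist."""
    states = asl["States"]
    if asl["StartAt"] not in states:
        raise ValueError(f"StartAt {asl['StartAt']!r} is not a state")
    for name, state in states.items():
        target = state.get("Next")
        if target is not None and target not in states:
            raise ValueError(f"{name} -> {target} is not a state")
        # Choice states transition through their Choices instead of Next.
        terminal = state.get("End") or state["Type"] in ("Succeed", "Fail", "Choice")
        if target is None and not terminal:
            raise ValueError(f"{name} neither has Next nor ends the execution")


if __name__ == "__main__":
    check_transitions(definition)
    print(json.dumps(definition, indent=2))
```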
What is Docker?
• Docker is a software development platform to deploy apps
• Apps are packaged in containers that can be run on any OS
• Apps run the same, regardless of where they’re run
• Any machine
• No compatibility issues
• Predictable behavior
• Less work
• Easier to maintain and deploy
• Works with any language, any OS, any technology
Docker on an OS
[Diagram: Docker running on a server (ex: EC2 Instance)]

Where are Docker images stored?
• Docker images are stored in Docker Repositories
• Public: Docker Hub https://hub.docker.com/
• Find base images for many technologies or OS:
• Ubuntu
• MySQL
• NodeJS, Java…
• Private: Amazon ECR (Elastic Container Registry)


Docker versus Virtual Machines
• Docker is “sort of” a virtualization technology, but not exactly
• Resources are shared with the host => many containers on one server
[Diagram: Virtual machines = Infrastructure => Host OS => Hypervisor => one Guest OS (VM) per set of apps. Docker = Infrastructure => Host OS (EC2 Instance) => Docker Daemon => many containers]

Getting Started with Docker
• Download Docker at: https://www.docker.com/get-started
[Diagram: Dockerfile => build => Docker Image => run => Docker Container; images are pushed to and pulled from Docker Hub or Amazon ECR]

Docker Containers Management
• To manage containers, we need a container management platform

• Three choices:
• ECS: Amazon’s own platform
• Fargate: Amazon’s own Serverless platform
• EKS: Amazon’s managed Kubernetes (open source)

ECS Clusters Overview
• ECS Clusters are logical groupings of EC2 instances
• EC2 instances run the ECS agent (Docker container)
• The ECS agent registers the instance to the ECS cluster
• The EC2 instances run a special AMI, made specifically for ECS
[Diagram: the ECS Agent on an EC2 instance registers the instance with the ECS Cluster]

ECS Task Definitions
• Task definitions are metadata in JSON form to tell ECS how to run a Docker Container
• They contain crucial information around:
• Image Name
• Port Binding for Container and Host
• Memory and CPU required
• Environment variables
• Networking information
• IAM Role
• Logging configuration (ex CloudWatch)
[Diagram: an httpd container (container port 80) runs next to the ECS Agent on an EC2 instance and is reached through host port 8080]
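Here is a sketch of what such a task definition could look like for the httpd example, written as a Python dict in the JSON shape that ECS's RegisterTaskDefinition API accepts. The family name, image tag, role ARN, log group and environment variable are hypothetical; only the container port 80 / host port 8080 binding comes from the slide.

```python
"""Sketch of an ECS task definition for the httpd example on the slide."""
import json


def httpd_task_definition(host_port: int = 8080) -> dict:
    """Return a task definition dict in the RegisterTaskDefinition JSON shape."""
    return {
        "family": "httpd-demo",                          # hypothetical family name
        "networkMode": "bridge",
        # Hypothetical IAM role: the permissions the containers get at runtime.
        "taskRoleArn": "arn:aws:iam::123456789012:role/httpd-task-role",
        "containerDefinitions": [
            {
                "name": "httpd",
                "image": "httpd:2.4",                    # image name (hypothetical tag)
                "cpu": 256,                              # CPU units (1024 = 1 vCPU)
                "memory": 512,                           # hard memory limit in MiB
                "essential": True,
                # Port binding: container port 80 published on the host port.
                "portMappings": [
                    {"containerPort": 80, "hostPort": host_port, "protocol": "tcp"}
                ],
                "environment": [{"name": "APP_ENV", "value": "dev"}],
                "logConfiguration": {
                    "logDriver": "awslogs",              # ship logs to CloudWatch Logs
                    "options": {
                        "awslogs-group": "/ecs/httpd-demo",
                        "awslogs-region": "eu-west-1",
                        "awslogs-stream-prefix": "httpd",
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(httpd_task_definition(), indent=2))
```

In bridge mode, passing `host_port=0` lets ECS pick a free host port for each task, which is the dynamic port mapping an ALB relies on when several copies of the same container run on one instance.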
ECS Service
• ECS Services help define how many tasks should run and how they should be run
• They ensure that the number of tasks desired is running across our fleet of EC2 instances.
• They can be linked to ELB / NLB / ALB if needed

• Let’s make our first service!

ECS Service with Load Balancer
[Diagram: two EC2 instances, each with an ECS Agent, run several httpd containers on container port 80; each container gets a random host port (32698, 32667, 32657, 32713) and the Application Load Balancer reaches them with dynamic port forwarding]

ECR
• So far we’ve been using Docker images from Docker Hub (public)
• ECR is a private Docker image repository
• Access is controlled through IAM (permission errors => policy)
• You need to run some commands to push / pull:
• AWS CLI v1: $(aws ecr get-login --no-include-email --region eu-west-1)
• AWS CLI v2: aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 1234567890.dkr.ecr.eu-west-1.amazonaws.com
• docker push 1234567890.dkr.ecr.eu-west-1.amazonaws.com/demo:latest
• docker pull 1234567890.dkr.ecr.eu-west-1.amazonaws.com/demo:latest
Fargate
• When launching an ECS Cluster, we have to create our EC2 instances
• If we need to scale, we need to add EC2 instances
• So we manage infrastructure…

• With Fargate, it’s all Serverless!
• We don’t provision EC2 instances
• We just create task definitions, and AWS will run our containers for us
• To scale, just increase the task number. Simple! No more EC2
Elastic Beanstalk + ECS
• You can run Elastic Beanstalk in Single & Multi Docker Container mode
• Multi Docker helps run multiple containers per EC2 instance in EB
• This will create for you:
• ECS Cluster
• EC2 instances, configured to use the ECS Cluster
• Load Balancer (in high availability mode)
• Task definitions and execution
• Requires a config file Dockerrun.aws.json at the root of source code
Elastic Beanstalk + ECS
[Diagram: an Elastic Beanstalk Environment with an ECS Cluster + ASG of EC2 instances, each running a php container, an nginx container and another container; the Load Balancer forwards beanstalk-url:80 to port 80 and beanstalk-url:1234 to port 1234]
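A Dockerrun.aws.json for the multicontainer setup above could look like the following, sketched as a Python dict in the version 2 format (`AWSEBDockerrunVersion: 2` with ECS-style `containerDefinitions`). The images, memory sizes and the choice of which container answers on which port are assumptions (the diagram does not make the mapping explicit), the third "other" container is left out, and exposing port 1234 on the load balancer may need environment configuration not shown here.

```python
"""Sketch of a Dockerrun.aws.json (version 2) for a two-container Beanstalk app."""
import json

dockerrun = {
    "AWSEBDockerrunVersion": 2,
    "containerDefinitions": [
        {
            "name": "php-app",
            "image": "php:8.2-apache",       # hypothetical image
            "essential": True,
            "memory": 256,                   # MiB reserved for this container
            # Assumed mapping: instance port 80 reaches the php container.
            "portMappings": [{"hostPort": 80, "containerPort": 80}],
        },
        {
            "name": "nginx-proxy",
            "image": "nginx:1.25",           # hypothetical image
            "essential": True,
            "memory": 128,
            # Assumed mapping: instance port 1234 reaches the nginx container.
            "portMappings": [{"hostPort": 1234, "containerPort": 80}],
            "links": ["php-app"],
        },
    ],
}

if __name__ == "__main__":
    # Save this output as Dockerrun.aws.json at the root of the source bundle.
    print(json.dumps(dockerrun, indent=2))
```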

AWS Kinesis Overview
• Kinesis is a managed alternative to Apache Kafka
• Great for application logs, metrics, IoT, clickstreams
• Great for “real-time” big data
• Great for streaming processing frameworks (Spark, NiFi, etc…)
• Data is automatically replicated to 3 AZs

• Kinesis Streams: low latency streaming ingest at scale
• Kinesis Analytics: perform real-time analytics on streams using SQL
Kinesis Example
[Diagram: click streams, IoT devices and metrics & logs => Amazon Kinesis Streams => Amazon Kinesis Analytics => Amazon Kinesis Firehose => Amazon S3 bucket]

Kinesis Streams Overview
• Streams are divided in ordered Shards / Partitions
[Diagram: producers => Shard 1, Shard 2, Shard 3 => consumers]
• Data retention is 1 day by default, can go up to 365 days
• Ability to reprocess / replay data
• Multiple applications can consume the same stream
• Real-time processing with scale of throughput
• Once data is inserted in Kinesis, it can’t be deleted (immutability)
Kinesis Streams Shards
• One stream is made of many different shards
• Billing is per shard provisioned, can have as many shards as you want
• Batching available or per message calls.
• The number of shards can evolve over time (resharding: split / merge shards)
• Records are ordered per shard
[Diagram: producers => Shard 1 to Shard 4 => consumers]

Kinesis Streams Records
• Data Blob: data being sent, serialized as bytes. Up to 1 MB. Can represent anything
• Record Key (the partition key in the Kinesis API):
• sent alongside a record, helps to group records in Shards. Same key = Same shard.
• Use a highly distributed key to avoid the “hot partition” problem
• Sequence number: Unique identifier for each record put in shards. Added by Kinesis after ingestion
[Diagram: a record = Data Blob (bytes, up to 1MB) + Record Key + Sequence Number]
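Kinesis documents that it hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value, which is why the same key always lands in the same shard. The Python sketch below reproduces that idea for a stream whose shards split the hash space evenly. The even split, the big-endian reading of the digest and the helper names are assumptions for illustration; real shard ranges come from the `ListShards` API and change when you split or merge shards.

```python
"""Sketch of how a partition key (the slide's "record key") selects a shard."""
import bisect
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def even_hash_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split [0, 2^128 - 1] into contiguous, roughly equal ranges, one per shard."""
    step = HASH_SPACE // shard_count
    ranges = [(i * step, (i + 1) * step - 1) for i in range(shard_count)]
    # Give any remainder to the last shard so the whole space is covered.
    ranges[-1] = (ranges[-1][0], HASH_SPACE - 1)
    return ranges


def hash_key(partition_key: str) -> int:
    """MD5 of the UTF-8 key, read as an unsigned 128-bit integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def shard_for(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Index of the shard whose hash key range contains the key's hash."""
    starts = [start for start, _ in ranges]
    return bisect.bisect_right(starts, hash_key(partition_key)) - 1


if __name__ == "__main__":
    ranges = even_hash_ranges(4)
    for key in ["user-1", "user-2", "user-3", "user-1"]:
        print(key, "-> shard", shard_for(key, ranges))
```

If most records share one key, they all hash to the same value and land on one shard: that is the hot partition problem the slide warns about.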
Kinesis Data Streams Limits to know
• Producer:
• 1MB/s or 1000 messages/s at write PER SHARD
• “ProvisionedThroughputExceededException” otherwise
• Consumer Classic:
• 2MB/s at read PER SHARD across all consumers
• 5 API calls per second PER SHARD across all consumers
• If 3 different applications are consuming, throttling is possible
• Data Retention:
• 24 hours data retention by default
• Can be extended up to 365 days
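The per-shard limits above translate directly into a sizing estimate. This is a simplified Python sketch that assumes evenly distributed partition keys (no hot shard) and only classic consumers sharing the read throughput; enhanced fan-out consumers, which get their own read throughput, are not modeled, and the function names are hypothetical.

```python
"""Back-of-the-envelope shard count from the per-shard limits on the slide."""
import math

WRITE_MB_PER_SHARD = 1.0        # MB/s in
WRITE_RECORDS_PER_SHARD = 1000  # records/s in
READ_MB_PER_SHARD = 2.0         # MB/s out, shared by all classic consumers
READ_CALLS_PER_SHARD = 5        # GetRecords calls/s, shared by all classic consumers


def shards_needed(write_mb_s: float, records_s: float, consumers: int) -> int:
    """Smallest shard count satisfying write and classic read throughput."""
    # Every classic consumer reads the whole stream, so reads scale with consumers.
    read_mb_s = write_mb_s * consumers
    return max(
        1,
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )


def getrecords_throttled(consumers: int, polls_per_second: float) -> bool:
    """True if the consumers' combined polling exceeds 5 calls/s on a shard."""
    return consumers * polls_per_second > READ_CALLS_PER_SHARD


if __name__ == "__main__":
    print(shards_needed(write_mb_s=3.5, records_s=2000, consumers=2))  # -> 4
    print(getrecords_throttled(consumers=3, polls_per_second=2))      # -> True
```

The second call shows the slide's "3 applications" remark in numbers: three consumers polling twice a second need 6 calls per second, above the 5-call budget of each shard.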
Kinesis Producers
• Kinesis SDK
• Kinesis Producer Library (KPL)
• Kinesis Agent
• CloudWatch Logs
• 3rd party libraries: Spark, Log4J Appenders, Flume, Kafka Connect, NiFi…
[Diagram: SDK, Kinesis Producer Library (KPL), Kinesis Agent and CloudWatch Logs => Amazon Kinesis Streams]
Kinesis Consumers
• Kinesis SDK
• Kinesis Client Library (KCL)
• Kinesis Connector Library
• 3rd party libraries: Spark, Log4J Appenders, Flume, Kafka Connect…
• Kinesis Firehose
• AWS Lambda
[Diagram: Amazon Kinesis Streams => SDK, Kinesis Client Library (KCL), Kinesis Connector Library, Kinesis Firehose, AWS Lambda]

AWS Kinesis KCL
• KCL uses DynamoDB to checkpoint offsets
• KCL uses DynamoDB to track other workers and share the work amongst shards
• Great for reading in a distributed manner
[Diagram: two Amazon Kinesis-enabled apps consume messages from the stream and checkpoint their progress in Amazon DynamoDB]

AWS Kinesis Data Firehose
• Fully Managed Service, no administration
• Near Real Time (60 seconds latency minimum for non full batches)
• Load data into Redshift / Amazon S3 / ElasticSearch / Splunk
• Automatic scaling
• Data Transformation through AWS Lambda (ex: CSV => JSON)
• Supports compression when target is Amazon S3 (GZIP, ZIP, and SNAPPY)
• Pay for the amount of data going through Firehose
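Firehose invokes the transformation function with a batch of base64-encoded records and expects each one back with the same `recordId`, a `result` of `Ok`, `Dropped` or `ProcessingFailed`, and the new base64 `data`. Here is a Python sketch of the slide's CSV => JSON example; the three column names are hypothetical, and lines that do not parse are returned unchanged as `ProcessingFailed`.

```python
"""Sketch of a Firehose data-transformation Lambda: CSV lines in, JSON lines out."""
import base64
import csv
import io
import json

FIELDS = ["timestamp", "user_id", "action"]  # hypothetical CSV column order


def transform_record(record: dict) -> dict:
    """Convert one Firehose record; mark it ProcessingFailed if it can't be parsed."""
    try:
        line = base64.b64decode(record["data"]).decode("utf-8")
        values = next(csv.reader(io.StringIO(line)))
        if len(values) != len(FIELDS):
            raise ValueError("unexpected column count")
        payload = json.dumps(dict(zip(FIELDS, values))) + "\n"
        return {
            "recordId": record["recordId"],  # must echo the incoming ID
            "result": "Ok",
            "data": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
        }
    except (ValueError, StopIteration):  # bad base64/UTF-8 raise ValueError subclasses
        # Firehose treats this record as failed instead of delivering it.
        return {"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]}


def handler(event, context):
    return {"records": [transform_record(r) for r in event["records"]]}
```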
Kinesis Data Firehose Diagram
[Diagram: SDK, Kinesis Producer Library (KPL), Kinesis Agent, Kinesis Data Streams, CloudWatch Logs & Events and IoT rules actions => Amazon Kinesis Data Firehose (with a Lambda function) => Amazon S3, Redshift, ElasticSearch]

Kinesis Data Streams vs Firehose
• Streams
• You write custom code (producer / consumer)
• Real time (~200 ms latency for classic)
• Must manage scaling (shard splitting / merging)
• Data storage for 1 to 365 days, replay capability, multi consumers
• Use with Lambda to insert data in real-time to ElasticSearch (for example)

• Firehose
• Fully managed, send to S3, Splunk, Redshift, ElasticSearch
• Serverless data transformations with Lambda
• Near real time (lowest buffer time is 1 minute)
• Automated Scaling
• No data storage

AWS Kinesis Data Analytics
• Perform real-time analytics on Kinesis Streams using SQL
• Kinesis Data Analytics:
• Auto Scaling
• Managed: no servers to provision
• Continuous: real time
• Pay for actual consumption rate
• Can create streams out of the real-time queries

All Kinds of Logs
• Application Logs
• Logs that are produced by your application code
• Contain custom log messages, stack traces, and so on
• Written to a local file on the filesystem
• Usually streamed to CloudWatch Logs using a CloudWatch Agent on EC2
• If using Lambda, direct integration with CloudWatch Logs
• If using ECS or Fargate, direct integration with CloudWatch Logs
• If using Elastic Beanstalk, direct integration with CloudWatch Logs

• Operating System Logs (Event Logs, System Logs)
• Logs that are generated by your operating system (EC2 or on-premise instance)
• Informing you of system behavior (ex: /var/log/messages or /var/log/auth.log)
• Usually streamed to CloudWatch Logs using a CloudWatch Agent
All Kinds of Logs
• Access Logs
• List of all the requests for individual files that people have requested from a website
• Example for httpd: /var/log/apache/access.log
• Usually for load balancers, proxies, web servers, etc…
• AWS provides some access logs

AWS Managed Logs
• Load Balancer Access Logs (ALB, NLB, CLB) => to S3
• Access logs for your Load Balancers
• CloudTrail Logs => to S3 and CloudWatch Logs
• Logs for API calls made within your account
• VPC Flow Logs => to S3 and CloudWatch Logs
• Information about IP traffic going to and from network interfaces in your VPC
• Route 53 Access Logs => to CloudWatch Logs
• Log information about the queries that Route 53 receives
• S3 Access Logs => to S3
• Server access logging provides detailed records for the requests that are made to a bucket
• CloudFront Access Logs => to S3


Amazon ElasticSearch
• May be called Amazon ES at the exam
• Managed version of ElasticSearch (open source project)
• Needs to run on servers (not a serverless offering)
• Use cases:
• Log Analytics
• Real Time application monitoring
• Security Analytics
• Full Text Search
• Clickstream Analytics
• Indexing
ElasticSearch + Kibana + Logstash
• ElasticSearch: provide search and indexing capability
• You must specify instance types, multi-AZ, etc

• Kibana:
• Provide real-time dashboards on top of the data that sits in ES
• Alternative to CloudWatch dashboards (more advanced capabilities)

• Logstash:
• Log ingestion mechanism, use the “Logstash Agent”
• Alternative to CloudWatch Logs (you decide on retention and granularity)
Elastic Search patterns: DynamoDB
[Diagram: DynamoDB Table => DynamoDB Stream => Lambda Function => Amazon ES; applications use the DynamoDB API to retrieve items and the Amazon ES API to search items]
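The Lambda function in this pattern receives batches of stream records, each with an `eventName` of `INSERT`, `MODIFY` or `REMOVE` and item images in DynamoDB's typed JSON (e.g. `{"S": "abc"}`, `{"N": "3"}`). A Python sketch of the translation step follows. It assumes the stream view type includes new images, handles only a few attribute types, and stops short of calling Amazon ES: the indexing client and index name would be your choice, and the `("index" | "delete", id, doc)` action tuples are a hypothetical intermediate format.

```python
"""Sketch: turn DynamoDB Stream records into search-index actions."""


def from_attribute(value: dict):
    """Decode a few DynamoDB attribute-value types (S, N, BOOL, NULL) to Python."""
    kind, raw = next(iter(value.items()))
    if kind == "S":
        return raw
    if kind == "N":
        # Numbers arrive as strings; keep integers as int.
        return float(raw) if any(c in raw for c in ".eE") else int(raw)
    if kind == "BOOL":
        return raw
    if kind == "NULL":
        return None
    raise ValueError(f"attribute type {kind} not handled in this sketch")


def to_index_actions(event: dict) -> list[tuple[str, str, dict]]:
    """Map INSERT/MODIFY to ("index", id, doc) and REMOVE to ("delete", id, {})."""
    actions = []
    for record in event["Records"]:
        ddb = record["dynamodb"]
        # Build a stable document ID from the key attributes.
        doc_id = "|".join(str(from_attribute(v)) for _, v in sorted(ddb["Keys"].items()))
        if record["eventName"] in ("INSERT", "MODIFY"):
            image = ddb["NewImage"]  # needs the NEW_IMAGE or NEW_AND_OLD_IMAGES view type
            actions.append(("index", doc_id, {k: from_attribute(v) for k, v in image.items()}))
        elif record["eventName"] == "REMOVE":
            actions.append(("delete", doc_id, {}))
    return actions
```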

Elastic Search patterns: CloudWatch Logs
[Diagram: Real time = CloudWatch Logs => Subscription Filter => Lambda Function (managed by AWS) => Amazon ES. Near Real Time = CloudWatch Logs => Subscription Filter => Kinesis Data Firehose => Amazon ES]
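In the real-time variant, the subscription filter invokes the Lambda function with an `awslogs.data` field that is base64-encoded, gzip-compressed JSON containing `logGroup`, `logStream` and a list of `logEvents`. This Python sketch shows only the decoding and reshaping step; the output document fields are hypothetical, and sending them to Amazon ES is left out.

```python
"""Sketch: decode the payload a CloudWatch Logs subscription delivers to Lambda."""
import base64
import gzip
import json


def decode_subscription_event(event: dict) -> dict:
    """awslogs.data is base64-encoded, gzip-compressed JSON."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context):
    payload = decode_subscription_event(event)
    if payload.get("messageType") == "CONTROL_MESSAGE":
        # CloudWatch Logs sends these to check that the destination is reachable.
        return []
    # One document per log event (hypothetical field names).
    return [
        {
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
            "timestamp": log_event["timestamp"],
            "message": log_event["message"],
        }
        for log_event in payload["logEvents"]
    ]
```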

AWS Systems Manager Overview
• Helps you manage your EC2 and On-Premise systems at scale
• Get operational insights about the state of your infrastructure
• Easily detect problems
• Patching automation for enhanced compliance
• Works for both Windows and Linux OS
• Integrated with CloudWatch metrics / dashboards
• Integrated with AWS Config
• Free service
AWS Systems Manager Features
• Resource Groups
• Insights:
• Insights Dashboard
• Inventory: discover and audit the software installed
• Compliance
• Parameter Store
• Action:
• Automation (shut down EC2, create AMIs)
• Run Command
• Session Manager
• Patch Manager
• Maintenance Windows
• State Manager: define and maintain configuration of OS and applications

How Systems Manager works
• We need to install the SSM agent onto the systems we control
• Installed by default on Amazon Linux AMI & some Ubuntu AMI
• If an instance can’t be controlled with SSM, it’s probably an issue with the SSM agent!
• Make sure the EC2 instances have a proper IAM role to allow SSM actions
[Diagram: SSM talks to the SSM Agent on EC2 instances and on an on-premise VM]
AWS Service Catalog
• Users that are new to AWS have too many options, and may create stacks that are not compliant / in line with the rest of the organization

• Some users just want a quick self-service portal to launch a set of authorized products pre-defined by admins

• Includes: virtual machines, databases, storage options, etc…

• Enter AWS Service Catalog!


Service Catalog diagram
[Diagram: ADMIN TASKS = Product (CloudFormation Templates) => Portfolio (Collection of Products) => Control (IAM Permissions to Access Portfolios). USER TASKS = Product List (Authorized by IAM) => launch => Provisioned Products (Ready to use, Properly Configured, Properly Tagged)]
AWS Service Catalog
• Create and manage catalogs of IT services that are approved on AWS
• The “products” are CloudFormation templates
• Ex: Virtual machine images, Servers, Software, Databases, Regions, IP address ranges
• CloudFormation helps ensure consistency and standardization by Admins
• They are assigned to Portfolios (teams)
• Teams are presented a self-service portal where they can launch the products
• All the deployed products are centrally managed deployed services
• Helps with governance, compliance, and consistency
• Can give users access to launching products without requiring deep AWS knowledge
EC2 Instance Compliance
• AWS Config
• Ensure instance has proper AWS configuration (not open SSH port, etc)
• Audit and compliance over time
• Inspector
• Security Vulnerabilities scan from within the OS using the agent
• Or outside network scanning (no need for the agent)
• Systems Manager
• Run automations, patches, commands, inventory at scale
• Service Catalog
• Restrict how the EC2 instances can be launched to minimize configurations
• Helpful to onboard beginner AWS users
• Configuration Management
• SSM, OpsWorks, Ansible, Chef, Puppet, User Data
• Ensure the EC2 instances have proper configuration files

GuardDuty
• Intelligent Threat discovery to Protect AWS Account
• Uses Machine Learning algorithms, anomaly detection, 3rd party data
• One click to enable (30 days trial), no need to install software

• Input data includes:
• CloudTrail Logs: unusual API calls, unauthorized deployments
• VPC Flow Logs: unusual internal traffic, unusual IP address
• DNS Logs: compromised EC2 instances sending encoded data within DNS queries

• Notifies you in case of findings
• Integration with AWS Lambda
AWS Cost Allocation Tags
• With Tags we can track resources that relate to each other
• With Cost Allocation Tags we can enable detailed costing reports
• Just like Tags, but they show up as columns in Reports
• AWS Generated Cost Allocation Tags
• Automatically applied to the resource you create
• Start with Prefix aws: (e.g. aws:createdBy)
• They’re not applied to resources created before the activation
• User tags
• Defined by the user
• Start with Prefix user:

• Cost Allocation Tags just appear in the Billing Console


AWS Data Protection
• TLS for in transit encryption
• ACM to manage SSL / TLS certificates
• Load Balancers
• ELB, ALB & NLB provide SSL termination
• Possible to have multiple SSL certificates per ALB
• Optional SSL/TLS encryption between ALB and EC2 instances (else, HTTP)
• CloudFront with SSL
• All AWS services expose HTTPS endpoints
• You *could* (but *shouldn’t*) use HTTP with S3

AWS Data Protection: At Rest Encryption
• S3 encryption
• SSE-S3: Server Side encryption using AWS’ key
• SSE-KMS: Server Side encryption using your own KMS key
• SSE-C: Server Side encryption by providing your own key (AWS won’t keep it)
• Client side encryption: send encrypted content to AWS, no knowledge of key
• Possibility to enable default encryption on S3 through setting
• Possibility to enforce encryption through S3 bucket policy (x-amz-server-side-encryption)
• Glacier is encrypted by default
• One quick setting for: EBS, EFS, RDS, ElastiCache, DynamoDB, etc
• Usually uses either service encryption key or your own KMS key
• Category of data:
• PHI = protected health information
• PII = personally identifiable information
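The "enforce encryption through S3 bucket policy" bullet is commonly implemented as a Deny on `s3:PutObject` whenever the `x-amz-server-side-encryption` header is missing, using the `Null` condition operator. Here is a sketch that builds such a policy in Python; the bucket name is hypothetical. One caveat, to our understanding: uploads that rely only on the bucket's default encryption do not send the header, so this policy would reject them too.

```python
"""Sketch of a bucket policy that rejects uploads without an SSE header."""
import json


def deny_unencrypted_uploads(bucket: str) -> dict:
    """Deny PutObject requests that do not carry x-amz-server-side-encryption."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyMissingEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Null = "true" matches requests where the header is absent.
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_unencrypted_uploads("my-bucket"), indent=2))
```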

AWS Network Protection
• Direct Connect: private, direct connection between site and AWS
• Public internet: use a VPN
• Site-to-Site VPN supports Internet Protocol security (IPsec) VPN connections (for linking on-premise to the cloud)

• Network ACL: stateless firewall at the subnet level
• WAF (Web Application Firewall): web security rules against exploits
• Security Groups: stateful firewall at the instance level, enforced on the instance’s underlying hypervisor
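To illustrate what "stateless" means in practice, the toy Python evaluator below applies the documented NACL behavior: rules are checked in ascending rule-number order, the first match decides, and anything unmatched hits the final deny-all rule. Because NACLs are stateless, a response leaving the subnet needs its own outbound rule (typically the ephemeral port range); a stateful security group would let the response out automatically. The rule numbers and CIDRs are hypothetical.

```python
"""Toy evaluator for stateless network ACL rules (lowest rule number wins)."""
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int        # evaluated from lowest to highest
    protocol: str      # "tcp", "udp" or "-1" for all protocols
    from_port: int
    to_port: int
    cidr: str          # remote address range
    action: str        # "allow" or "deny"


def evaluate(rules: list[Rule], protocol: str, port: int, remote_ip: str) -> str:
    """Return the action of the first matching rule, or the implicit deny."""
    address = ipaddress.ip_address(remote_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.protocol not in ("-1", protocol):
            continue
        if rule.protocol != "-1" and not rule.from_port <= port <= rule.to_port:
            continue
        if address in ipaddress.ip_network(rule.cidr):
            return rule.action
    return "deny"  # the final "*" rule denies everything else


# Hypothetical inbound rules: rule 90 denies SSH before rule 110 can allow it.
INBOUND = [
    Rule(100, "tcp", 443, 443, "0.0.0.0/0", "allow"),
    Rule(90, "tcp", 22, 22, "0.0.0.0/0", "deny"),
    Rule(110, "tcp", 22, 22, "10.0.0.0/16", "allow"),
]
# Stateless: HTTPS responses go back to the client's ephemeral port, so allow them explicitly.
OUTBOUND = [Rule(100, "tcp", 1024, 65535, "0.0.0.0/0", "allow")]

if __name__ == "__main__":
    print(evaluate(INBOUND, "tcp", 443, "203.0.113.5"))    # allow
    print(evaluate(INBOUND, "tcp", 22, "10.0.1.5"))        # deny: rule 90 matches first
    print(evaluate(OUTBOUND, "tcp", 50000, "203.0.113.5"))  # allow
```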
Coverage for Domain 5
• Troubleshoot issues and determine how to restore operations
• CloudWatch, CloudFormation, Rollbacks, etc.
• Determine how to automate event management and alerting + Apply concepts required to set up event-driven automated actions
• CloudWatch Events+++, CloudWatch Alarms, SNS
• Automated Healing:
• CloudFormation (triggered by an alarm)
• Beanstalk (easier)
• OpsWorks (automatic host replacement, manages the infrastructure)
• Autoscaling (we'll see in this section)
Deployment Strategies: Auto Scaling and ALB
• In place (one LB, one TG, one ASG)
[Diagram: behind one ALB and one Auto Scaling group, CodeDeploy updates the instance running v1 so that the same instance runs v2]

Deployment Strategies: Auto Scaling and ALB
• Rolling (one LB, one TG, one ASG, new instances)
[Diagram: one ALB and one Auto Scaling group; the instance running v1 is replaced by a new instance running v2]

Deployment Strategies: Auto Scaling and ALB
• Replace (one LB, one TG, two ASG, new instances)
[Diagram: one ALB in front of the Auto Scaling group with the instance running v1 and a new Auto Scaling group with a new instance running v2]

Deployment Strategies: Auto Scaling and ALB
• Blue / Green (two LB, two TG, two ASG, new instances, R53)
[Diagram: an Amazon Route 53 record (Simple or Weighted) points at two ALBs, each with its own Auto Scaling group: one with the instance running v1, the other with a new instance running v2]

Deployment strategies
• Read more here:
• Blue/Green Deployments on AWS whitepaper, August 2016
• https://d1.awsstatic.com/whitepapers/AWS_Blue_Green_Deployments.pdf

DynamoDB Patterns: S3 Metadata Index
[Diagram: writes to Amazon S3 trigger a Lambda Function that stores object metadata in a DynamoDB Table]
• API for object metadata:
- Search by date
- Total storage used by a customer
- List of all objects with certain attributes
- Find all objects uploaded within a date range
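The metadata index can be sketched without any AWS calls: turn each S3 event record into an item, then answer the slide's example queries from those items. The S3 event fields (`eventTime`, `s3.bucket.name`, `s3.object.key`, `s3.object.size`) are the standard notification shape; deriving the customer from the first key segment is a hypothetical convention. In DynamoDB you would likely make `customer_id` the partition key and `uploaded_at` the sort key, so a date range becomes a Query with a BETWEEN condition.

```python
"""Sketch of the S3 -> Lambda -> DynamoDB metadata index (no AWS calls)."""
from collections import defaultdict
from urllib.parse import unquote_plus


def items_from_s3_event(event: dict) -> list[dict]:
    """Build one metadata item per S3 event record."""
    items = []
    for record in event["Records"]:
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        items.append({
            "customer_id": key.split("/", 1)[0],   # hypothetical: key prefix = customer
            "object_key": key,
            "bucket": record["s3"]["bucket"]["name"],
            "size_bytes": record["s3"]["object"].get("size", 0),
            "uploaded_at": record["eventTime"],     # ISO 8601 timestamp string
        })
    return items


def total_storage_by_customer(items: list[dict]) -> dict[str, int]:
    """Sum object sizes per customer."""
    totals = defaultdict(int)
    for item in items:
        totals[item["customer_id"]] += item["size_bytes"]
    return dict(totals)


def uploaded_between(items: list[dict], start: str, end: str) -> list[str]:
    """ISO 8601 UTC strings compare correctly as plain text."""
    return [i["object_key"] for i in items if start <= i["uploaded_at"] <= end]
```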

DynamoDB patterns: Elastic Search
[Diagram: DynamoDB Table => DynamoDB Stream => Lambda Function => Amazon ES; applications use the DynamoDB API to retrieve items and the Amazon ES API to search items]

Multi AZ in AWS
• Services where Multi-AZ must be enabled manually:
• EFS, ELB, ASG, Beanstalk: assign AZ
• RDS, ElastiCache: multi-AZ (synchronous standby DB for failovers)
• Aurora:
• data is stored automatically across multi-AZ
• Can have multi-AZ for the DB itself (same as RDS)
• ElasticSearch (managed): multi master
• Jenkins (self deployed): multi master

• Services where Multi-AZ is implicitly there:
• S3 (except S3 One Zone-Infrequent Access)
• DynamoDB
• All of AWS’ proprietary, managed services

What about EBS?
• EBS is tied to a single AZ
• How can you make EBS “multi-AZ”?
• ASG with 1 min/max/desired
• Lifecycle hooks for Terminate: make a snapshot of the EBS volume
• Lifecycle hook for Launch: copy the snapshot, create an EBS volume, attach it to the instance
• Note: for PIOPS volumes (io1), to get max performance after snapshot, read the entire volume once (pre-warming of IO blocks)
[Diagram: an Auto Scaling group (min/max/desired = 1) spanning Availability Zones 1 and 2; the Terminate Hook backs up the old instance’s EBS Volume to a Snapshot, and the Launch Hook creates a new EBS Volume from it for the new instance]
Multi Region Services
• DynamoDB Global Tables (multi-way replication, enabled by Streams)
• AWS Config Aggregators (multi region & multi account)
• RDS Cross Region Read Replicas (used for Read & DR)
• Aurora Global Database (one region is master, other is for Read & DR)
• EBS volumes snapshots, AMI, RDS snapshots can be copied to other regions
• VPC peering to allow private traffic between regions
• Route53 uses a global network of DNS servers
• S3 Cross Region Replication
• CloudFront for Global CDN at the Edge Locations
• Lambda@Edge for Global Lambda function at Edge Locations
Multi Region with Route 53
• Health Check => automated DNS failovers:
  1. Health checks that monitor an endpoint (application, server, other AWS resource)
  2. Health checks that monitor other health checks (calculated health checks)
  3. Health checks that monitor CloudWatch alarms (full control !!) – e.g. throttles of DynamoDB, custom metrics, etc.
• Health Checks are integrated with CW metrics
[Diagram: an Amazon Route 53 record (latency, geoproximity, etc.) with one Health Check per Region; Region 1 and Region 2 each run an ALB in front of an Auto Scaling group of Instances]
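The evaluation rules behind calculated health checks and failover routing can be modelled in a few lines. This is a simplified sketch, not Route 53's implementation: a calculated health check is healthy when at least `HealthThreshold` of its child health checks are healthy, and a failover record answers with the primary while the primary's health check passes.

```python
def calculated_health(child_statuses, health_threshold):
    """Healthy when at least `health_threshold` child health checks are healthy."""
    return sum(1 for healthy in child_statuses if healthy) >= health_threshold


def failover_answer(primary, secondary, primary_healthy):
    """Failover routing: the primary answers while its health check passes."""
    return primary if primary_healthy else secondary
```

For example, Region 1's calculated check could combine an endpoint check on its ALB with a CloudWatch-alarm check (say, on DynamoDB throttles) and require both to pass: if the alarm fires, DNS answers switch to Region 2.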

CloudFormation StackSets
• Create, update, or delete stacks across multiple accounts and regions with a single operation
• Administrator account to create StackSets
• Trusted accounts: where stack instances from StackSets are created, updated, deleted
• When you update a stack set, all associated stack instances are updated throughout all accounts and regions
• Ability to set the maximum number of concurrent actions on targets (# or %)
• Ability to set failure tolerance (# or %)
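Our understanding of the StackSets operation preferences, per the CloudFormation documentation, is that percentages are converted to account counts per Region by rounding down, with `MaxConcurrentPercentage` never going below one account. A sketch of that arithmetic, plus a naive model of updating accounts in groups and stopping once failures exceed the tolerance. It is a simplification, not CloudFormation's actual scheduler.

```python
def max_concurrent_accounts(percentage, target_accounts):
    """MaxConcurrentPercentage -> accounts per Region: rounded down, but at least 1."""
    return max(1, target_accounts * percentage // 100)


def failure_tolerance(percentage, target_accounts):
    """FailureTolerancePercentage -> tolerated failures per Region: rounded down (may be 0)."""
    return target_accounts * percentage // 100


def simulate_region(accounts, max_concurrent, tolerance, failing=frozenset()):
    """Deploy in groups of `max_concurrent` accounts; stop once failures exceed
    `tolerance`. `failing` is a hypothetical set of accounts whose stack fails."""
    attempted, failures = [], 0
    for i in range(0, len(accounts), max_concurrent):
        group = accounts[i:i + max_concurrent]
        attempted.extend(group)
        failures += sum(1 for a in group if a in failing)
        if failures > tolerance:
            return attempted, "STOPPED"
    return attempted, "SUCCEEDED"
```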
Disaster Recovery Overview
• Any event that has a negative impact on a company’s
business continuity or finances is a disaster
• Disaster recovery (DR) is about preparing for and
recovering from a disaster
• What kind of disaster recovery?
• On-premise => On-premise: traditional DR, and very
expensive
• On-premise => AWS Cloud: hybrid recovery
• AWS Cloud Region A => AWS Cloud Region B
• Need to define two terms:
• RPO: Recovery Point Objective
• RTO: Recovery Time Objective

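RPO and RTO
[Timeline: RPO ... (data loss) ... Disaster ... (downtime) ... RTO]
• Data loss: the time between the RPO (the last point you can recover to) and the disaster
• Downtime: the time between the disaster and the RTO (service restored)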
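A small worked example of the two terms with hypothetical timestamps: with backups every 6 hours and a disaster 2 hours after the last one, 2 hours of data are lost; if the service is back 3 hours after the disaster, the downtime is 3 hours. The sketch compares those durations with target RPO and RTO values.

```python
from datetime import timedelta


def data_loss(backups, disaster):
    """Time from the last recovery point before the disaster to the disaster (bounded by the RPO).
    Raises ValueError if no backup predates the disaster."""
    last = max(b for b in backups if b <= disaster)
    return disaster - last


def downtime(disaster, restored):
    """Time from the disaster until service is restored (bounded by the RTO)."""
    return restored - disaster


def meets_objectives(backups, disaster, restored, rpo: timedelta, rto: timedelta):
    return data_loss(backups, disaster) <= rpo and downtime(disaster, restored) <= rto
```

With scheduled backups, the worst case for data loss is roughly the backup interval, which is why Backup and Restore has a high RPO.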

Disaster Recovery Strategies
• Backup and Restore
• Pilot Light
• Warm Standby
• Hot Site / Multi Site Approach
(Faster RTO going down the list)

Backup and Restore (High RPO)
[Diagram: from the Corporate data center, data goes to the AWS Cloud through AWS Storage Gateway or AWS Snowball into Amazon S3, with a lifecycle policy to Glacier. Inside the AWS Cloud, Amazon EC2 is protected with scheduled regular EBS snapshots and AMIs, and Redshift and Amazon RDS with snapshots.]

Disaster Recovery – Pilot Light
• A small version of the app is always running in the cloud
• Useful for the critical core (pilot light)
• Very similar to Backup and Restore
• Faster than Backup and Restore as critical systems are already up
[Diagram: Route 53 in front of the Corporate data center; data replication to the AWS Cloud, where RDS is running and EC2 is not running]

Warm Standby
• Full system is up and running, but at minimum size
• Upon disaster, we can scale to production load
[Diagram: Route 53 routes to the Corporate data center (reverse proxy, app server, master DB) and fails over to the AWS Cloud (ELB, EC2 Auto Scaling at minimum size); data is replicated from the master DB to an RDS slave (running)]

Multi Site / Hot Site Approach
• Very low RTO (minutes or seconds) – very expensive
• Full Production Scale is running on AWS and On Premise
[Diagram: Route 53 routes to both the Corporate data center (reverse proxy, app server, master DB) and the AWS Cloud (ELB, EC2 Auto Scaling at production size), both active, with failover; data is replicated from the master DB to an RDS slave (running)]

All AWS Multi Region
[Diagram: Route 53 routes to two active AWS Cloud regions, each with an ELB and EC2 Auto Scaling (production), with failover between them; data is replicated from Aurora Global (master) to Aurora Global (slave)]

Disaster Recovery Tips
• Backup
  • EBS Snapshots, RDS automated backups / Snapshots, etc…
  • Regular pushes to S3 / S3 IA / Glacier, Lifecycle Policy, Cross Region Replication
  • From On-Premise: Snowball or Storage Gateway
• High Availability
  • Use Route53 to migrate DNS over from Region to Region
  • RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3
  • Site to Site VPN as a recovery from Direct Connect
• Replication
  • RDS Replication (Cross Region), AWS Aurora + Global Databases
  • Database replication from on-premise to RDS
  • Storage Gateway
• Automation
  • CloudFormation / Elastic Beanstalk to re-create a whole new environment
  • Recover / Reboot EC2 instances with CloudWatch alarm actions when status checks fail
  • AWS Lambda functions for customized automations
• Chaos
  • Netflix has a “simian-army” randomly terminating EC2 instances
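For the "Recover / Reboot EC2 instances with CloudWatch" tip, a sketch of the parameters for CloudWatch `PutMetricAlarm` using the built-in EC2 recover action on the system status check. The alarm name, period and evaluation periods are illustrative choices; the function only builds the request so it can be checked without AWS.

```python
def recover_alarm_params(instance_id, region):
    """kwargs for CloudWatch PutMetricAlarm: recover the instance when the
    system status check fails for 2 consecutive 1-minute periods."""
    return {
        "AlarmName": f"recover-{instance_id}",  # illustrative naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # Built-in EC2 recover action (arn:aws:automate:<region>:ec2:reboot reboots instead)
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }

# Usage (requires boto3 and credentials):
# boto3.client("cloudwatch").put_metric_alarm(**recover_alarm_params("i-0123456789abcdef0", "us-east-1"))
```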

Multi-Region Disaster Recovery Checklist
• Is my AMI copied? Is its ID stored in the Parameter Store?
• Is my CloudFormation StackSet working and tested to work in another region?
• What's my RPO and RTO?
• Are Route53 Health Checks working correctly? Tied to a CW Alarm?
• How can I automate with CloudWatch Events to trigger some Lambda functions and perform an RDS Read Replica promotion?
• Is my data backed up? RPO & RTO? EBS, AMI, RDS, S3 CRR, DynamoDB Global Tables, RDS & Aurora Global Read Replicas
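A sketch of the Lambda function behind the CloudWatch Events question in the checklist: on a `CloudWatch Alarm State Change` event in the `ALARM` state, it calls RDS `PromoteReadReplica`. The replica identifier is hypothetical, and the boto3 RDS client is passed in (it must target the replica's region). This applies to RDS read replicas; Aurora Global Database uses a different failover API.

```python
def handler(event, context=None, replica_id="mydb-replica-dr", rds_client=None):
    """Promote the DR read replica when a CloudWatch alarm goes into ALARM.

    replica_id is a hypothetical DB instance identifier; rds_client is a boto3 RDS
    client, injected so the logic can be exercised without AWS."""
    if event.get("detail-type") != "CloudWatch Alarm State Change":
        return "ignored"
    if event["detail"]["state"]["value"] != "ALARM":
        return "no action"
    if rds_client is not None:
        rds_client.promote_read_replica(DBInstanceIdentifier=replica_id)
    return f"promoting {replica_id}"
```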
Backups & Multi-Region DR
• EFS Backup:
  • AWS Backup with EFS (frequency, when, retain time, lifecycle policy) - managed
  • EFS to EFS backup: https://aws.amazon.com/solutions/efs-to-efs-backup-solution/
  • Multi-region idea: EFS => S3 => S3 CRR => EFS
• Route 53 Backup:
  • Use ListResourceRecordSets API for exports
  • Write your own script for imports into R53 or another DNS provider
• Elastic Beanstalk Backup:
  • Saved configurations using the eb cli or AWS console
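A sketch of the "write your own script for imports" idea for Route 53: the record sets returned by `ListResourceRecordSets` become a `ChangeBatch` of `UPSERT` changes for `ChangeResourceRecordSets` in the new hosted zone. The apex SOA and NS records are skipped because the new zone has its own. Alias records pointing into the old zone would still need their `HostedZoneId` rewritten, and large zones may need several batches; both are left out here.

```python
def records_to_change_batch(record_sets, zone_name):
    """Build a ChangeResourceRecordSets ChangeBatch from exported record sets.
    zone_name is the apex name with a trailing dot, e.g. "example.com."."""
    changes = [
        {"Action": "UPSERT", "ResourceRecordSet": rrs}
        for rrs in record_sets
        # The new hosted zone already has its own apex SOA and NS records
        if not (rrs["Name"] == zone_name and rrs["Type"] in ("SOA", "NS"))
    ]
    return {"Comment": f"Import of {zone_name}", "Changes": changes}

# Usage with boto3 (credentials required; zone IDs are hypothetical):
# r53 = boto3.client("route53")
# pages = r53.get_paginator("list_resource_record_sets").paginate(HostedZoneId="OLD_ZONE_ID")
# exported = [rrs for page in pages for rrs in page["ResourceRecordSets"]]
# r53.change_resource_record_sets(HostedZoneId="NEW_ZONE_ID",
#                                 ChangeBatch=records_to_change_batch(exported, "example.com."))
```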
On-Premise strategy with AWS
• Ability to download Amazon Linux 2 as a VM image
  • VMware, KVM, VirtualBox (Oracle VM), Microsoft Hyper-V
• VM Import / Export
  • Migrate existing applications into EC2
  • Create a DR repository strategy for your on-premise VMs
  • Can export back the VMs from EC2 to on-premise
• AWS Application Discovery Service
  • Gather information about your on-premise servers to plan a migration
  • Server utilization and dependency mappings
  • Track with AWS Migration Hub
• AWS Database Migration Service (DMS)
  • Replicate On-premise => AWS, AWS => AWS, AWS => On-premise
  • Works with various database technologies (Oracle, MySQL, DynamoDB, etc.)
• AWS Server Migration Service (SMS)
  • Incremental replication of on-premise live servers to AWS
AWS Organizations
• Global service
• Allows you to manage multiple AWS accounts
• The main account is the master account – you can’t change it
• Other accounts are member accounts
• Member accounts can only be part of one organization
• Consolidated Billing across all accounts - single payment method
• Pricing benefits from aggregated usage (volume discount for EC2, S3…)
• API is available to automate AWS account creation
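A sketch of automating account creation from the master account with the Organizations `CreateAccount` and `DescribeCreateAccountStatus` APIs. Account creation is asynchronous, so the function polls until the request succeeds or fails. The client is passed in (normally `boto3.client("organizations")`); the polling interval and limit are arbitrary choices.

```python
import time


def create_account_and_wait(org_client, email, name, poll_seconds=10, max_polls=60):
    """Create a member account and return its account ID once creation finishes."""
    status = org_client.create_account(Email=email, AccountName=name)["CreateAccountStatus"]
    for _ in range(max_polls):
        if status["State"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
        status = org_client.describe_create_account_status(
            CreateAccountRequestId=status["Id"])["CreateAccountStatus"]
    if status["State"] == "SUCCEEDED":
        return status["AccountId"]
    if status["State"] == "FAILED":
        raise RuntimeError(f"account creation failed: {status.get('FailureReason')}")
    raise TimeoutError("account creation still in progress")
```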
Multi Account Strategies
• Create accounts per department, per cost center, per dev / test / prod, based on regulatory restrictions (using SCP), for better resource isolation (e.g. VPC), to have separate per-account service limits, or as an isolated account for logging
• Multi Account vs One Account Multi VPC
• Use tagging standards for billing purposes
• Enable CloudTrail on all accounts, send logs to a central S3 account
• Send CloudWatch Logs to a central logging account
• Establish Cross Account Roles for Admin purposes
Organizational Units (OU) - Examples
• Business Unit
• Environmental Lifecycle
• Project-based
https://aws.amazon.com/answers/account-management/aws-multi-account-billing-strategy/

AWS Organization
[Diagram: a Root OU containing the Master Account, a Dev OU and a Prod OU, with a Finance OU and an HR OU nested one level below]

Service Control Policies (SCP)
• Whitelist or blacklist IAM actions
• Applied at the OU or Account level
• Does not apply to the Master Account
• SCP is applied to all the Users and Roles of the Account, including the Root user
• The SCP does not affect service-linked roles
  • Service-linked roles enable other AWS services to integrate with AWS Organizations and can't be restricted by SCPs
• SCP must have an explicit Allow (does not allow anything by default)
• Use cases:
  • Restrict access to certain services (for example: can’t use EMR)
  • Enforce PCI compliance by explicitly disabling services
SCP Hierarchy
[Diagram: FullAWSAccess SCP on the Root OU; DenyAccessAthena SCP on the Master Account; DenyRedshift SCP on the Prod OU; AuthorizeRedshift SCP on Account A (in the Prod OU); DenyAWSLambda SCP on the HR OU; the HR OU (Account B) and the Finance OU (Account C) sit under the Prod OU]
• Master Account
  • Can do anything (no SCP applies)
• Account A
  • Can do anything
  • EXCEPT access Redshift (explicit Deny from Prod OU, which overrides the AuthorizeRedshift SCP)
• Account B
  • Can do anything
  • EXCEPT access Redshift (explicit Deny from Prod OU)
  • EXCEPT access Lambda (explicit Deny from HR OU)
• Account C
  • Can do anything
  • EXCEPT access Redshift (explicit Deny from Prod OU)
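The hierarchy above can be checked with a simplified SCP evaluator, with statements reduced to effect + action pattern (no resources or conditions). It assumes, as AWS does by default, that FullAWSAccess is attached at every level. An action is allowed only if every level from the Root down to the account allows it and no level explicitly denies it; the master account is never restricted.

```python
from fnmatch import fnmatchcase

# An SCP is simplified to a list of (effect, action_pattern) statements.
FULL_AWS_ACCESS = [("Allow", "*")]


def _matches(pattern, action):
    # Action names are compared case-insensitively; "*" acts as a wildcard
    return fnmatchcase(action.lower(), pattern.lower())


def scp_allows(levels, action, is_master=False):
    """levels: SCPs attached at each level, from the Root down to the account
    (each level is a list of SCPs). Returns True if the action is permitted."""
    if is_master:
        return True  # SCPs do not restrict the master account
    for scps in levels:
        statements = [stmt for scp in scps for stmt in scp]
        if any(effect == "Deny" and _matches(p, action) for effect, p in statements):
            return False  # an explicit Deny at any level wins
        if not any(effect == "Allow" and _matches(p, action) for effect, p in statements):
            return False  # every level must explicitly allow the action
    return True


# The hierarchy from the slide
ROOT = [FULL_AWS_ACCESS]
PROD_OU = [FULL_AWS_ACCESS, [("Deny", "redshift:*")]]      # DenyRedshift
HR_OU = [FULL_AWS_ACCESS, [("Deny", "lambda:*")]]          # DenyAWSLambda
FINANCE_OU = [FULL_AWS_ACCESS]
ACCOUNT_A = [FULL_AWS_ACCESS, [("Allow", "redshift:*")]]   # AuthorizeRedshift
ACCOUNT_B = ACCOUNT_C = [FULL_AWS_ACCESS]

PATHS = {
    "A": [ROOT, PROD_OU, ACCOUNT_A],
    "B": [ROOT, PROD_OU, HR_OU, ACCOUNT_B],
    "C": [ROOT, PROD_OU, FINANCE_OU, ACCOUNT_C],
}
```

For instance, `scp_allows(PATHS["A"], "redshift:CreateCluster")` is False: the AuthorizeRedshift SCP cannot override the Prod OU's explicit Deny.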

SCP Examples
Blacklist and Whitelist strategies
More examples: https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_example-scps.html
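The two strategies differ in which SCPs are attached. A sketch that builds the policy JSON for each; the service prefixes are examples (`elasticmapreduce` is the action prefix for EMR). With the blacklist strategy, FullAWSAccess stays attached and a Deny SCP is added. With the whitelist strategy, FullAWSAccess is replaced by an Allow SCP, so anything not listed is implicitly denied.

```python
import json


def service_scp(effect, service_prefixes):
    """SCP that allows or denies every action of the given services."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": effect,
            "Action": [f"{prefix}:*" for prefix in service_prefixes],
            "Resource": "*",
        }],
    }


# Blacklist: keep FullAWSAccess attached and add a Deny (e.g. "can't use EMR")
deny_emr = service_scp("Deny", ["elasticmapreduce"])
# Whitelist: detach FullAWSAccess and attach an Allow for the permitted services only
allow_core = service_scp("Allow", ["ec2", "s3", "cloudwatch"])

print(json.dumps(deny_emr, indent=2))
```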

AWS Organization – Moving Accounts
To migrate accounts from one organization to another (Org A => Org B):
1. Remove the member account from the old organization
2. Send an invite from the new organization to the member account
3. Accept the invite to the new organization from the member account

If you want the master account of the old organization to also join the new organization, do the following:
4. Remove the member accounts from the old organization using the procedure above
5. Delete the old organization
6. Repeat the process above to invite the old master account to the new org
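The numbered steps map onto Organizations API operations, each made from a different account. A sketch that only lists them in order as (calling account, API operation, parameters); the account IDs are hypothetical and the handshake ID is whatever `InviteAccountToOrganization` returns.

```python
def _invite_and_accept(account_id, new_master_id):
    return [
        # From the new organization's master account: send the invite
        (new_master_id, "InviteAccountToOrganization",
         {"Target": {"Id": account_id, "Type": "ACCOUNT"}}),
        # From the invited account: accept the handshake created by the invite
        (account_id, "AcceptHandshake", {"HandshakeId": "<HandshakeId from the invite>"}),
    ]


def move_member_plan(account_id, old_master_id, new_master_id):
    """Steps 1-3: move one member account from the old to the new organization."""
    remove = (old_master_id, "RemoveAccountFromOrganization", {"AccountId": account_id})
    return [remove] + _invite_and_accept(account_id, new_master_id)


def move_master_plan(member_ids, old_master_id, new_master_id):
    """Steps 4-6: empty and delete the old organization, then move its master account."""
    steps = []
    for member in member_ids:
        steps += move_member_plan(member, old_master_id, new_master_id)
    steps.append((old_master_id, "DeleteOrganization", {}))
    return steps + _invite_and_accept(old_master_id, new_master_id)
```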

Multi Account with AWS
• Any cross account action requires defining an IAM “trust”
• IAM roles can be assumed cross account
  • no need to share IAM creds
  • Uses AWS Security Token Service (STS)
• CodePipeline – cross account invocation of CodeDeploy for example
• AWS Config – aggregators
• CloudWatch Events – Event Bus = multi accounts events
• CloudFormation – StackSets
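A sketch of the IAM “trust” side of a cross-account role: the trust policy attached to a role in the target account, trusting a hypothetical source account, plus the STS call made from that source account. Principals in the source account also need an IAM policy allowing `sts:AssumeRole` on the role.

```python
def cross_account_trust_policy(trusted_account_id):
    """Trust policy for a role in the target account, trusting another account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
            "Action": "sts:AssumeRole",
        }],
    }


def role_arn(target_account_id, role_name):
    return f"arn:aws:iam::{target_account_id}:role/{role_name}"

# Usage from the trusted account (boto3 + credentials required; names are hypothetical):
# creds = boto3.client("sts").assume_role(
#     RoleArn=role_arn("222222222222", "CrossAccountAdmin"),
#     RoleSessionName="admin-session")["Credentials"]
```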

CloudWatch Logs: Centralize Logs Multi Account & Multi Region
https://aws.amazon.com/blogs/architecture/stream-amazon-cloudwatch-logs-to-a-centralized-account-for-audit-and-analysis/

Next steps
• Congratulations, you have covered all the domains!
• Make sure you revisit the lectures and practice as much as possible
• A good extra resource is the AWS Exam Readiness course at:
  • https://www.aws.training/Details/eLearning?id=34146
• Another good resource is to read the AWS DevOps blog:
  • https://aws.amazon.com/blogs/devops/
• The DevOps exam is hard, and tests experience…
• Practice, practice, practice!