APP306 - Using AWS CloudFormation For Deployment and Management at Scale
© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Who are we?
Fifth largest site in UK, 55th Globally
• Top 20 in News, Sport, Arts, Children's
Source: Alexa
What do our services do?
Deploy at scale
• > 300 deployments per day
• 60,000 deployments in first 18 months
Deploy robustly
• All key video transcoding and packaging for BBC iPlayer
• Pipeline delivering election results to BBC News
• Live text for all BBC Sport events
How are Yavor and I involved?
We build tools for the full development lifecycle
Leading to:
• Greater delta between releases
• Longer feedback loops
• High stress around emergency changes
Infrastructure is a constrained resource
Leading to:
• Inflexibility to changing requirements
• Shared tenancy of hardware, weak software isolation
Three emerging trends
Continuous delivery
– Can we build better quality things, faster?
Cloud
– Can we reduce our costs or increase our agility?
DevOps
– Can we strike a better balance of freedom and responsibility for engineers?
The grappling hook
• Take two teams: one product, one platform
The not-so-good
• We learned many lessons about how to build, fewer about why…
Storming the tower
The platform pendulum
• Restriction vs. Freedom
• Predictability vs. Chaos
• Slow vs. Fast
Tools
Establishing principles
• Establish strong defaults for the way things get built and create tooling for that
• Assume that there will be use cases where the defaults don’t fit
Managing infrastructure at scale
• Repeatability
– Never “spin it up in the console and hope”
• Flexibility
– Teams are going to need that obscure service
• StackOverflow-ability
– If there is a well-known way of expressing it in the world, use it
Managing deployment at scale
• Repeatability
– All instances should be identical
• Robustness
– Look for fail-safe mechanisms
• Resilience
– Minimize dependencies at instance startup
Handling support at scale
• Access
– Engineers should have access to the services they run
• Patterns
– Create patterns and templates for core infrastructure pieces
• Support
– Ask developers to take “the phone”
The rest is just software…
Inside the machine
How we deploy
Build binaries in a reproducible way; build them once; automate everything
[Diagram: Version Control (Commit, Push, Pull), Build (Jenkins), Repos, Register, Cosmos, Deploy, Bake (Bakery), Test, Promote, Live]
Infrastructure provisioning
Hardware is now software, embrace it and treat it that way!
[Diagram: v1, v2, v3]
Application infrastructure
Let’s look at what an application might look like and how we can define it with AWS CloudFormation
service-0.1.0.json, resources-0.1.0.json
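As a rough illustration of what a service template such as service-0.1.0.json could contain, the sketch below assembles a minimal CloudFormation template in Python and serialises it to JSON. The logical names (`LaunchConfig`, `ServiceGroup`), the `ImageId` parameter, and the instance type and group sizes are assumptions made for the example; they are not taken from the BBC's templates.

```python
import json


def service_template(instance_type="t2.micro", min_size="2", max_size="4"):
    """Return a minimal service-stack template as a Python dict.

    All logical names and default values here are hypothetical.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative service stack (not an actual BBC template)",
        "Parameters": {
            # The AMI to launch; supplied when the stack is created or updated.
            "ImageId": {"Type": "String"},
        },
        "Resources": {
            "LaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": instance_type,
                },
            },
            "ServiceGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    # Spread instances across every zone in the region.
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                    "LaunchConfigurationName": {"Ref": "LaunchConfig"},
                    "MinSize": min_size,
                    "MaxSize": max_size,
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(service_template(), indent=2))
```

Keeping the AMI as a parameter means a new image can be rolled out by changing one value, without editing the template itself.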
The best way to form clouds
[Diagram: Register, Cosmos, Bake (Bakery); Cosmos updates the service stack in Test and in Live]
The Bakery
Environment configuration
Service
Software binary
Base OS
2 step snapshotting
snap-432jrse
snap-w3r153r
Re-baking for different environments
snap-456qwf
snap-w3r153r
snap-w3r153r
snap-w3r153r
How we deploy
Cosmos bakes an AMI and then updates the service stack…
[Diagram: Version Control (Commit, Push, Pull), Build (Jenkins), Repos, Register, Cosmos, Bake (Bakery); Cosmos updates the service stack in Test and in Live]
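One way the "bake an AMI, then update the service stack" step could be expressed is sketched below: a small function that builds the arguments for a CloudFormation UpdateStack call, reusing the stack's current template and changing only the AMI parameter. This is not Cosmos's actual implementation; the function name, the stack name, and the `ImageId` parameter key are hypothetical.

```python
def build_update_request(stack_name, ami_id, parameter_key="ImageId"):
    """Build keyword arguments for a CloudFormation UpdateStack call.

    The stack keeps its current template; only the AMI parameter changes.
    `parameter_key` is a hypothetical parameter name.
    """
    if not ami_id.startswith("ami-"):
        raise ValueError(f"not an AMI id: {ami_id!r}")
    return {
        "StackName": stack_name,
        "UsePreviousTemplate": True,
        "Parameters": [
            {"ParameterKey": parameter_key, "ParameterValue": ami_id},
        ],
    }


if __name__ == "__main__":
    request = build_update_request("my-service-test", "ami-0123abcd")
    print(request)
    # With boto3 installed and credentials configured, this could be sent as:
    #   import boto3
    #   boto3.client("cloudformation").update_stack(**request)
```

A template with further parameters would also need an entry for each of them (for example with `UsePreviousValue`), which this sketch leaves out.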
…what actually happens
TEST
"UpdatePolicy": {
  "AutoScalingRollingUpdate": {
    "PauseTime": "PT0S",
    "MaxBatchSize": "5",
    "MinInstancesInService": "0"
  }
}
LIVE
"UpdatePolicy": {
  "AutoScalingRollingUpdate": {
    "PauseTime": "PT15S",
    "MaxBatchSize": "2",
    "MinInstancesInService": "2"
  }
}
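The two policies differ only in their values, so they could be generated from one per-environment table. The sketch below does that in Python; the values are copied from the TEST and LIVE columns above, while the table and function names are assumptions made for the example.

```python
import json

# Values copied from the TEST and LIVE columns of the slide.
ROLLING_UPDATE = {
    "test": {"PauseTime": "PT0S", "MaxBatchSize": "5", "MinInstancesInService": "0"},
    "live": {"PauseTime": "PT15S", "MaxBatchSize": "2", "MinInstancesInService": "2"},
}


def update_policy(environment):
    """Return the UpdatePolicy attribute for an Auto Scaling group."""
    settings = ROLLING_UPDATE[environment]
    # Copy the settings so callers cannot mutate the shared table.
    return {"UpdatePolicy": {"AutoScalingRollingUpdate": dict(settings)}}


if __name__ == "__main__":
    for env in ("test", "live"):
        print(env.upper())
        print(json.dumps(update_policy(env), indent=2))
```

Read this way, TEST replaces up to five instances at a time with no pause and no minimum in service, while LIVE replaces two at a time, keeps at least two in service, and pauses 15 seconds between batches.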
Let’s see it in action!
Demo time
Let’s deploy one of our services and
see what happens…
AWS CloudFormation beyond the app
Defining our core infrastructure
Each AWS account is set up with an Amazon Virtual Private Cloud spanning the three Availability Zones; the VPC contains three private…
[Diagram: Private, Public; eu-west-1b]
Environments
Development and production environments are built in separate accounts, which fully isolates them from each other, including their API and resource limits
[Diagram: Production, Development]
Bastions
In Closing…
Recapping
Scale
• > 300 deployments per day
• 50,000 deployments in first 18 months
Speed
• Time from laptop to live reduced from 2 days to 10 minutes
Commitment
• All key video transcoding and packaging for BBC iPlayer
• Pipeline delivering election results to BBC News
• Live text for all BBC Sport events
Want to know more?
• We’re starting to share our work: https://github.com/bbc
• Come and talk to us, or our colleagues this week
• We’re hiring, in London and Salford, UK: http://www.bbc.co.uk/careers
Please give us your feedback on this session.
Complete session evaluations and earn re:Invent swag.
APP306 http://bit.ly/awsevals