Immutable Infrastructure Isn’t the Answer
Sam Bashton
Who am I?
• Sam Bashton
• Ran a cloud (AWS + GCP) consultancy firm until 2016
when it was acquired by Claranet Group
• Working with config management (Puppet) since 2007
• Working with AWS since 2009
• Working with GCP since 2014
What is this talk about?
• How we tried to use immutable infrastructure
• How and why it wasn’t right for us
• What we do instead
Business Model
• Charge customers for building super reliable infrastructure
• Charge customers a monthly support fee
Hard won experience
• Migrated over 1000 apps to public cloud
• Variety of approaches to managing infra and deploying
code
- Including Immutable Infrastructure
• 2011 onwards
Terraform
AWS Concepts
• Each customer in one or more regions
• Each region has two or more data centres (‘availability
zones’)
- Most have three
• SLA says that no more than one data centre will be down
at once in a region
“Region Unavailable” and “Region Unavailability” mean that more than one Availability Zone in
which you are running an instance or task (one or more containers), as applicable, within the same
Region, is “Unavailable” to you.
Data Lives in Services
• We use AWS services to store all state
- RDS (MySQL, Postgres, Oracle, MS SQL)
- Elasticache (Redis)
- DynamoDB
- AWS Elasticsearch
• The instances in question are ‘stateless’
Immutable
immutable
/ɪˈmjuːtəb(ə)l/
unchanging over time or unable to be
changed
What is immutable infrastructure?
• Automatically build a golden image
• New infrastructure using the new image replaces the old
infrastructure
Why would I want to do that?
• Unit of deployment becomes a machine image
• Test the artifact and have confidence it’ll be the same in
production
Blue/Green Deployments
Canary Deployments
Fudgetown
• All the images are the same, except..
- We need to specify a different database location in each environment
• And we need to specify it in an XML config file
- We have different sizes of machine in each environment, and need to use
different JVM settings
Why not just build lots of images?
• Image building is automatic - why don’t we just build an
image for each environment?
Why not just build lots of images?
• Unit of deploy is a machine image
• Images are created via an imperative set of commands
- Shell Script
- Ansible
• What is in each image? What is different?
Immutable-ish
• Scripts at startup handle differences
• Consul cluster?
- consul-template
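The slides do not show the startup scripts themselves. As a rough sketch of the "immutable-ish" idea, assuming hypothetical environment names, hostnames and memory sizes, a boot-time script might specialise one generic image like this:

```python
"""Sketch: specialise a generic image at boot, per environment."""
from string import Template
from xml.sax.saxutils import escape

# Hypothetical per-environment settings. In practice these might come from
# instance tags, user data, or a key/value store such as Consul.
ENVIRONMENTS = {
    "staging": {"db_host": "db.staging.example.internal", "memory_mb": 4096},
    "production": {"db_host": "db.prod.example.internal", "memory_mb": 16384},
}

# Hypothetical XML layout; the real application's schema is not in the slides.
XML_TEMPLATE = Template(
    "<config>\n"
    '  <database host="$db_host"/>\n'
    "</config>\n"
)


def jvm_heap_flags(memory_mb: int, fraction: float = 0.5) -> str:
    """Size the JVM heap as a fraction of the machine's memory."""
    heap_mb = int(memory_mb * fraction)
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m"


def render(environment: str) -> tuple[str, str]:
    """Return the XML config text and JVM flags for one environment."""
    settings = ENVIRONMENTS[environment]
    # Escape the value so it is safe inside a double-quoted XML attribute.
    db_host = escape(settings["db_host"], {'"': "&quot;"})
    return XML_TEMPLATE.substitute(db_host=db_host), jvm_heap_flags(settings["memory_mb"])
```

Calling render("staging") yields the staging database host in the XML and heap flags sized for a 4096 MB machine. Note that nothing here records whether the rendered file was actually written, or whether it matches what should be on the instance.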
Fudgetown
• Many dozens of microservices
• All with configuration files
- XML, yaml, ini, other
Fudgetown
• Multiple processes make up a single ‘service’
• All have to be started in a specific order
Fudgetown
• Deploying changes takes much longer
- ~10-15 minutes for a Packer build and deployment to test infra
• Tests on minor changes take a lot longer
Fudgetown
• We don’t know what the state of our instances is, or should be
• We don’t know whether config files were written successfully
• It takes ages to test things
Back to the drawing board
• Doing the thing the ‘cool kids’ say they are doing is not
the path to technical success
• Our customers care whether their app is working, not how
What do we actually need?
• Infrastructure and configuration in a known and verifiable
state
• Self-healing
• Fault tolerant - should continue to work even if a whole
data centre (‘AZ’) fails
• Autoscaling which works every time
• New instances which provision quickly (autoscaling)
• Automated deployments
- Possibly Canary, Blue/Green
• Nice to have: quick to test changes
Instance configuration in a known state
• We need a way to describe configuration on the machine
• A declarative language
• Should tell us if something went wrong
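To make the three requirements above concrete, here is a toy sketch in Python of what "declarative" means: the desired state is data, applying it is idempotent, and every resource reports an outcome. The FileResource type and converge function are illustrative names only, not part of Puppet or any other tool.

```python
"""Sketch: declare desired file contents, converge, and report the outcome."""
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileResource:
    """Desired state for one file: it should exist with exactly this content."""
    path: Path
    content: str


def converge(resources: list[FileResource]) -> dict[str, str]:
    """Bring each file to its declared content and report what happened."""
    report = {}
    for res in resources:
        try:
            current = res.path.read_text() if res.path.exists() else None
            if current == res.content:
                report[str(res.path)] = "unchanged"
            else:
                res.path.write_text(res.content)
                report[str(res.path)] = "changed"
        except OSError as exc:
            # Failures are reported per resource rather than silently dropped.
            report[str(res.path)] = f"failed: {exc}"
    return report
```

Running converge twice with the same resources reports "changed" and then "unchanged", and a write that cannot succeed shows up as "failed" in the report.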
Except..
• Puppet master doesn’t lend itself to an autoscaling world
- Performance bottleneck bringing up new instances
- Single point of failure
- Especially in the zone failure scenario
The rules
• Terminating an instance should always automatically give
you a replacement which works
- Even if external repos are down
• CentOS mirrors
• EPEL
• Elasticsearch yum repo
• Gem
• Pip
• We should expect data centre (‘AZ’) failure
How do we do it?
• Packer - base common AMI
• Puppet
• S3
• yum/apt
• Jenkins
Jenkins
Packer
• Build a base image
• Generally common to all roles
• Sometimes will have per-role AMIs
• pip/gem dependencies generally installed here
- Easier than building a package, even with FPM
• Install big RPMs here to save time at provisioning
Masterless Puppet
• Put the Puppet manifests and modules on an instance
• Run puppet apply
Distributing Puppet
• Puppet needs to be on every instance
• Build an RPM/DEB containing Puppet manifests/modules
• Add to an RPM/DEB repo in S3
• Script at startup (cloud-init) installs Puppet
• Puppet runs from systemd❤
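A minimal sketch of the masterless run, written in Python for illustration. The manifest and module paths are assumptions (the slides do not say where the package unpacks to). The --detailed-exitcodes option to puppet apply is what lets a wrapper tell "nothing to do" apart from "changed" and "failed".

```python
"""Sketch: run `puppet apply` masterless and interpret the result."""
import subprocess

# Hypothetical install locations for the packaged manifests and modules.
MANIFEST = "/opt/puppet-config/manifests/site.pp"
MODULE_PATH = "/opt/puppet-config/modules"

# Exit codes documented for `puppet apply --detailed-exitcodes`.
EXIT_CODES = {
    0: "ok: no changes",
    1: "error: run failed",
    2: "ok: changes applied",
    4: "error: some resources failed",
    6: "error: changes applied, some resources failed",
}


def puppet_apply_command(manifest: str = MANIFEST, module_path: str = MODULE_PATH) -> list[str]:
    """Build the argument list for a masterless Puppet run."""
    return ["puppet", "apply", "--detailed-exitcodes",
            f"--modulepath={module_path}", manifest]


def interpret_exit_code(code: int) -> str:
    """Translate a detailed exit code into a human-readable status."""
    return EXIT_CODES.get(code, f"error: puppet exited with {code}")


def run_puppet() -> str:
    """Run Puppet locally; check=False because exit code 2 is not a failure."""
    result = subprocess.run(puppet_apply_command(), check=False)
    return interpret_exit_code(result.returncode)
```

Under systemd, a unit or timer could invoke the same command directly. The wrapper only shows how the exit status answers the "did something go wrong?" question.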
External Repos
• Mirror CentOS and other external repos in S3
• Repos are copied as part of deployment process
- Dev repos continually updated
- When code is promoted to next step (eg staging), repos also copied
- OS upgrades are a part of the normal deployment process
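One way to picture repo promotion, assuming a bucket layout with one key prefix per environment (the slides do not describe the real layout), is as a plan of object copies from the source prefix to the target prefix:

```python
"""Sketch: plan the copy of a repo from one environment prefix to the next."""


def promotion_plan(keys: list[str], source_env: str, target_env: str) -> list[tuple[str, str]]:
    """Pair every key under `source_env/` with its destination under `target_env/`.

    Keys outside the source prefix are ignored. The result is sorted so the
    plan is stable and easy to review.
    """
    src_prefix = f"{source_env}/"
    return [
        (key, f"{target_env}/{key[len(src_prefix):]}")
        for key in sorted(keys)
        if key.startswith(src_prefix)
    ]
```

Each (source, destination) pair could then be passed to whatever performs the S3 copy. Because OS packages and application packages live in the same repos, promoting the repo promotes both together.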
Repos in S3
• Puppet, application code in yum repos in S3
• Repo created from a Terraform module
• Just drop your RPM in, it handles metadata generation
https://registry.terraform.io/modules/claranet/s3-yum-repo/aws
Config updates
• AWS provides SSM
• SSM triggers updating Puppet RPM, running Puppet
• ~120 seconds from commit to Puppet run finishing
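A hedged sketch of the trigger, assuming instances are tagged with a Role, the Puppet code ships in a package called puppet-config, and a systemd unit called puppet-apply.service runs Puppet (all three names are hypothetical). The client is passed in so the function works with a boto3 SSM client or a stub. It uses the send_command call with the AWS-RunShellScript document.

```python
"""Sketch: ask SSM to update the Puppet package and re-run Puppet on a role."""


def trigger_puppet_run(ssm_client, role: str, package: str = "puppet-config") -> str:
    """Send a shell command to every instance tagged Role=<role>.

    `ssm_client` is expected to behave like boto3.client("ssm").
    Returns the SSM command ID so the caller can poll for results.
    """
    commands = [
        "yum clean expire-cache",                # pick up the new repo metadata
        f"yum -y update {package}",              # hypothetical package name
        "systemctl start puppet-apply.service",  # hypothetical unit name
    ]
    response = ssm_client.send_command(
        Targets=[{"Key": "tag:Role", "Values": [role]}],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": commands},
    )
    return response["Command"]["CommandId"]
```

A CI job (Jenkins in this setup) could call this after publishing the new RPM to the S3 repo, then poll the returned command ID for per-instance results.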
Success
• We have been using this approach for 6+ years
• Tried other approaches
• Always came back for apps unsuitable for containerisation
Your problems are not my problems
• Have lovely 12 factor apps?
• Why are you wasting time building infrastructure?!
Career advice
• You don’t get paid to build infrastructure
• ‘Serverless’ isn’t NoOps
• Understanding distributed systems and their many failure
modes is the path to future success
Conclusions
• Concentrate on the desired outcome, not what somebody
at a conference said worked for them
• Find the things that will give you the most success most
easily, then iterate
• Architect for ease of management
• Don’t be constrained by ‘best practice’
• Don’t be embarrassed by ‘ugly hacks’ when they solve real
problems
