Immutable infrastructure isn’t the answer

Sam Bashton

Who am I?
• Sam Bashton
• Ran a cloud (AWS + GCP) consultancy firm until 2016
when it was acquired by Claranet Group
• Working with config management (Puppet) since 2007
• Working with AWS since 2009
• Working with GCP since 2014

What is this talk about?
• How we tried to use immutable infrastructure
• How and why it wasn’t right for us
• What we do instead

Business Model
• Charge customers for building super-reliable infrastructure
• Charge customers a monthly support fee

Hard won experience
• Migrated over 1000 apps to public cloud
• Variety of approaches to managing infra and deploying
code
- Including Immutable Infrastructure
• 2011 onwards

AWS Concepts
• Each customer in one or more regions
• Each region has two or more data centres (‘availability
zones’)
- Most have three
• SLA says that no more than one data centre will be down
at once in a region
“Region Unavailable” and “Region Unavailability” mean that more than one Availability Zone in
which you are running an instance or task (one or more containers), as applicable, within the same
Region, is “Unavailable” to you.

Data Lives in Services
• We use AWS services to store all state
- RDS (MySQL, Postgres, Oracle, MS SQL)
- Elasticache (Redis)
- DynamoDB
- AWS Elasticsearch
• The instances in question are ‘stateless’

Immutable
immutable
/ɪˈmjuːtəb(ə)l/
unchanging over time or unable to be
changed

What is immutable infrastructure?
• Automatically build a golden image
• New infrastructure using the new image replaces the old
infrastructure

Why would I want to do that?
• Unit of deployment becomes a machine image
• Test the artifact and have confidence it’ll be the same in
production

Fudgetown
• All the images are the same, except..
- We need to specify a different database location in each environment
• And we need to specify it in an XML config file
- We have different sizes of machine in each environment, and need to use
different JVM settings

Why not just build lots of images?
• Image building is automatic - why don’t we just build an
image for each environment?

Why not just build lots of images?
• Unit of deploy is a machine
image
• Images are created via an
imperative set of
commands
- Shell Script
- Ansible
• What is in each image?
What is different?

Immutable-ish
• Scripts at startup handle differences
• Consul cluster?
- consul-template

Fudgetown
• Many dozens of microservices
• All with configuration files
- XML, yaml, ini, other

Fudgetown
• Multiple processes make up a single ‘service’
• All have to be started in a specific order

Fudgetown
• Deploying changes takes much longer
- ~10-15 minutes for a Packer build and deployment to test infra
• Tests on minor changes take a lot longer

Fudgetown
• We don’t know what the
state of our instances is,
or should be
• We don’t know whether
config files were written
successfully
• It takes ages to test
things

Back to the drawing board
• Doing the thing the ‘cool kids’ say they are doing is not
the path to technical success
• Our customers care whether their app is working, not how

What do we actually need?
• Infrastructure and configuration in a known and verifiable
state
• Self-healing
• Fault tolerant - should continue to work even if a whole
data centre (‘AZ’) fails

What do we actually need?
• Autoscaling which works every time
• New instances which provision quickly (autoscaling)
• Automated deployments
- Possibly Canary, Blue/Green
• Nice to have: quick to test changes

Instance configuration in a known state
• We need a way to describe configuration on the machine
• A declarative language
• Should tell us if something went wrong
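
To illustrate what 'declarative' buys here, a toy sketch in Python (not Puppet, and not how Puppet is implemented): the caller states the desired content of a file, and the function converges to it, reports whether anything changed, and raises if it could not. The function name `ensure_file` and the file contents are hypothetical.

```python
import os
import tempfile


def ensure_file(path: str, content: str) -> str:
    """Converge `path` to `content`; return 'unchanged' or 'changed'.

    Raises OSError if the file cannot be written, so failure is never silent.
    """
    try:
        with open(path, encoding="utf-8") as f:
            if f.read() == content:
                return "unchanged"
    except FileNotFoundError:
        pass
    # Write to a temp file and rename, so a crash never leaves a half-written file.
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(content)
    os.replace(tmp, path)
    return "changed"


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "app.ini")
    print(ensure_file(target, "[db]\nhost=example\n"))  # first run writes the file
    print(ensure_file(target, "[db]\nhost=example\n"))  # second run is a no-op
```

Running it twice is safe, and each run says what state the machine was in, which is the property the startup scripts lacked.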

Except..
• Puppet master doesn’t lend itself to an autoscaling world
- Performance bottleneck bringing up new instances
- Single point of failure
- Especially in the zone failure scenario

The rules
• Terminating an instance should always automatically give
you a replacement which works
- Even if external repos are down
• CentOS mirrors
• EPEL
• Elasticsearch yum repo
• Gem
• Pip
• We should expect data centre (‘AZ’) failure
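
One way to check the 'even if external repos are down' rule is to audit the repo definitions on an instance. The sketch below, under stated assumptions, parses yum `.repo` text and lists enabled repos that point outside an allowed set of hosts. The bucket hostname and repo ids are hypothetical, and it simplifies by treating each `baseurl` as a single URL.

```python
import configparser
from urllib.parse import urlparse


def external_repos(repo_file_text: str, allowed_hosts: set) -> list:
    """Return (repo id, URL) pairs whose source is outside allowed_hosts."""
    # interpolation=None so yum variables such as $basearch pass through untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_file_text)
    offenders = []
    for repo_id in parser.sections():
        section = parser[repo_id]
        if section.get("enabled", "1").strip() == "0":
            continue  # disabled repos cannot block provisioning
        for key in ("baseurl", "mirrorlist", "metalink"):
            url = section.get(key)
            if url and urlparse(url).hostname not in allowed_hosts:
                offenders.append((repo_id, url))
    return offenders


SAMPLE = """\
[internal-base]
name=Internal CentOS mirror
baseurl=https://example-repo-bucket.s3.amazonaws.com/centos/7/os/x86_64/
enabled=1

[epel]
name=EPEL
metalink=https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch
enabled=1
"""

print(external_repos(SAMPLE, {"example-repo-bucket.s3.amazonaws.com"}))
```

The same idea extends to Gem and Pip sources, which have their own config formats and are not covered here.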

How do we do it?
• Packer - base common AMI
• Puppet
• S3
• yum/apt
• Jenkins

Packer
• Build a base image
• Generally common to all roles
• Sometimes will have per-role AMIs
• pip/gem dependencies generally installed here
- Easier than building a package, even with FPM
• Install big RPMs here to save time at provisioning

Masterless Puppet
• Put the Puppet manifests and modules on an instance
• Run puppet apply

Distributing Puppet
• Puppet needs to be on every instance
• Build an RPM/DEB containing Puppet manifests/modules
• Add to an RPM/DEB repo in S3
• Script at startup (cloud-init) installs Puppet
• Puppet runs from systemd❤
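
The boot-time steps above might look roughly like this in Python. It is a sketch, not the script from the talk: the package name and manifest path are hypothetical, and a real setup may also need a `--modulepath`. It relies on `puppet apply --detailed-exitcodes`, where exit code 0 means no changes, 2 means changes were applied, and 1, 4 or 6 indicate a failure.

```python
import subprocess

# Hypothetical names: adjust to whatever the RPM actually installs.
PUPPET_CODE_PACKAGE = "example-puppet-code"
MANIFEST = "/etc/puppet/manifests/site.pp"


def bootstrap_commands() -> list:
    """Commands a boot script might run, in order."""
    return [
        ["yum", "install", "-y", PUPPET_CODE_PACKAGE],
        ["puppet", "apply", "--detailed-exitcodes", MANIFEST],
    ]


def puppet_outcome(exit_code: int) -> str:
    """Map `puppet apply --detailed-exitcodes` exit codes to an outcome."""
    if exit_code == 0:
        return "no-changes"
    if exit_code == 2:
        return "changed"
    return "failed"  # 1, 4 and 6 all indicate some failure


def run_bootstrap() -> str:
    """Run the commands; not called here because it needs yum and puppet."""
    install, apply = bootstrap_commands()
    subprocess.run(install, check=True)
    return puppet_outcome(subprocess.run(apply).returncode)


print(bootstrap_commands())
```

Reporting the outcome, rather than ignoring the exit code, is what lets a failed instance be noticed and replaced.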

External Repos
• Mirror CentOS etc. repos in S3
• Repos are copied as part of deployment process
- Dev repos continually updated
- When code is promoted to the next step (e.g. staging), repos are also copied
- OS upgrades are a part of the normal deployment process
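
Promotion by copying can be sketched as a pure planning step. This assumes, hypothetically, that each environment's repos live under their own key prefix (`dev/`, `staging/`) in one bucket; the talk does not say how the repos are laid out, and the key names are invented.

```python
def promotion_plan(keys: list, source_env: str, target_env: str) -> list:
    """Return (source key, destination key) pairs for promoting a repo snapshot."""
    prefix = source_env.rstrip("/") + "/"
    target = target_env.rstrip("/") + "/"
    return [
        (key, target + key[len(prefix):])
        for key in keys
        if key.startswith(prefix)
    ]


dev_keys = [
    "dev/centos/repodata/repomd.xml",
    "dev/app/example-app-1.2.3-1.noarch.rpm",
    "staging/app/example-app-1.2.2-1.noarch.rpm",
]
for src, dst in promotion_plan(dev_keys, "dev", "staging"):
    print(src, "->", dst)
```

Because OS packages and application packages are copied together, staging receives exactly the combination that was tested in dev.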

Repos in S3
• Puppet, application code
in yum repos in S3
• Repo created from a
Terraform module
• Just drop your RPM in, it
handles metadata
generation
https://registry.terraform.io/modules/claranet/s3-yum-repo/aws

Config updates
• AWS provides SSM
• SSM triggers updating Puppet RPM, running Puppet
• ~120 seconds from commit to Puppet run finishing
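
A hedged sketch of how such a trigger could be expressed with the SSM SendCommand API and the stock `AWS-RunShellScript` document. The tag key `Role`, the package name and the manifest path are hypothetical; the talk does not show the actual commands. The last shell line treats Puppet exit codes 0 and 2 as success, since `--detailed-exitcodes` returns 2 when changes were applied.

```python
def build_ssm_request(role: str, package: str, manifest: str) -> dict:
    """Build keyword arguments for SSM SendCommand, targeting instances by tag."""
    return {
        "DocumentName": "AWS-RunShellScript",
        "Targets": [{"Key": "tag:Role", "Values": [role]}],
        "Parameters": {
            "commands": [
                "yum clean expire-cache",
                f"yum update -y {package}",
                f"puppet apply --detailed-exitcodes {manifest}; "
                "rc=$?; [ $rc -eq 0 ] || [ $rc -eq 2 ]",
            ]
        },
    }


def send(request: dict) -> dict:
    """Send the command; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the builder can be used without boto3

    return boto3.client("ssm").send_command(**request)


request = build_ssm_request("web", "example-puppet-code", "/etc/puppet/manifests/site.pp")
print(request["Parameters"]["commands"])
```

A CI job (Jenkins, in this setup) could call `send(request)` once the new RPM has been published to the repo.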

Success
• We have been using this approach for 6+ years
• Tried other approaches
• Always came back to this one for apps unsuitable for containerisation

Your problems are not my problems
• Have lovely 12 factor apps?
• Why are you wasting time building infrastructure?!

Career advice
• You don’t get paid to build infrastructure
• ‘Serverless’ isn’t NoOps
• Understanding distributed systems and their many failure
modes is the path to future success

Conclusions
• Concentrate on the desired outcome, not what somebody
at a conference said worked for them
• Find the things that will give you the most success most
easily, then iterate

Conclusions
• Architect for ease of management
• Don’t be constrained by ‘best practice’
• Don’t be embarrassed by ‘ugly hacks’ when they solve real
problems

Immutable infrastructure isn’t the answer
