Site Reliability in
the Serverless Age
Erik Peterson
CEO & Founder
CloudZero
erik@cloudzero.com | @silvexis
Serverless Boston Meetup | 9/18/2018
About Me
Erik Peterson – erik@cloudzero.com, @silvexis
• CEO and Founder of CloudZero
• I’m recovering from the application security industry,
now 100% focused on Cloud and Serverless
• Have been building systems on AWS since 2008
• Previously
• Veracode
• HP, SPI Dynamics, Sanctum
• United Nations IAEA, US Department of State,
SunTrust, Moody’s Investors
• Fun fact: I’ve lived in 6 US states and 3 countries
About CloudZero
Actionable Intelligence for Serverless Systems
• Dynamically map your environment, automatically discover
resources, relationships and data flows
• Easily identify errors and bottlenecks
• Track system costs and identify cost anomalies within
hours, not days or months
• Agentless deployment; requires only AWS data sources such as
X-Ray, CloudTrail and CloudWatch
What is Serverless?
What is Reliability?
How does Serverless affect Reliability?
How CloudZero Operates
The Future
WHAT IS
SERVERLESS?
• Event driven
• Invisible infrastructure
• Automatically scales with usage
• Fault tolerance and high availability built in
• Never pay for idle
Serverless is not
just Functions as
a Service
But FaaS is one of its most important building blocks
Serverless is a Spectrum (AWS edition)
0% (Less Serverless) | 50% | 100% (More Serverless)
Werner Vogels
CTO, Amazon Web Services
So what does
reliability even mean?
Reliability is the trustworthiness of a system’s
ability to delight the customer
Two forces
exist today
that drive
reliability
• DevOps (culture)
• Eliminate Dev and
Ops silos
• Accept failure as
normal
• Driven to achieve the
fastest feature velocity
• Measure everything
• Site Reliability
Engineering (practice)
• Automate everything
• Manage to Service
Level Objectives
• Monitor what matters
• Forecast demand and
manage capacity
• Use resources
efficiently
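Managing to a Service Level Objective usually comes down to tracking an error budget. The following is a minimal sketch, assuming a request-based availability SLO; the function names and the example numbers are illustrative and do not come from the talk.

```python
def error_budget(slo_target: float, total_requests: int) -> int:
    """Number of requests that may fail while still meeting the SLO."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    return int(round((1.0 - slo_target) * total_requests))


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        # A 100% target leaves no room for failure at all.
        return 1.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / budget


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests against a 99.9% SLO, 250 failures.
    print(error_budget(0.999, 1_000_000))
    print(budget_remaining(0.999, 1_000_000, 250))
```

A team that watches the remaining fraction rather than raw error counts can decide when to slow feature work, which is where the DevOps and SRE lists above meet.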
How does Serverless affect these forces?
Hint: Change is coming
Serverless
effect on
DevOps
• Eliminate Dev and Ops silos: REQUIRED
• Accept failure as normal: REQUIRED
• Driven to achieve the fastest feature velocity: CHANGE NOW HAPPENING FASTER THAN YOU CAN KEEP UP WITH
• Measure everything: BUILT IN, BUT WE ARE DROWNING IN DATA
• Cost-effective systems are well-built systems (NEW): WAIT, WHAT? WHAT ARE COGS?
Serverless
effect on Site
Reliability
Engineering
• Automate everything: REQUIRED
• Manage to Service Level Objectives: I KNOW MY SLO, BUT WHAT IS MY SLA?
• Monitor what matters: MY DASHBOARDS CAN’T KEEP UP
• Forecast demand and manage capacity: EVERYTHING SCALES, BUT THERE ARE LIMITS
• Use resources efficiently: THE BILL IS WHAT?!
You can’t
“lift and shift” your way into
Serverless
This applies to your culture, technology and process
How CloudZero Does It
CloudZero’s
Culture and
Practice
• Culture
• Dev and Ops are one
• Failure is guaranteed
• Always 5 min from
production
• A well designed
system is a cost
effective one
• Practice
• Automate everything
• Ensure SLOs are
aligned with SLAs
• Dynamically monitor
(we eat our own
dogfood)
• Understand system
limits and plan
accordingly
• Track cost as a first
class operational
performance metric
Full disclosure: some of these are a work in progress
Automate Everything
• We use Serverless Framework or AWS SAM for packaging and deployment
• Serverless Framework used to be the only game in town
• SAM has made huge improvements
• Stackery.io is looking very interesting (and supports SAM)
• If you are starting from scratch today, start with SAM
• Semaphore for CI/CD
• Works so well we wrote a blog post on it
• https://www.cloudzero.com/blog/continuous-delivery-in-the-world-of-serverless
Dynamically
monitor
We are 100% eating our own
dogfood here
Understand Serverless Limits
• Scaling is built in, but Serverless systems have limits and constraints.
• You will hit them once you are in prod under heavy customer load
• It can be very hard to figure out when the limits are being hit in a large
system with many moving parts. Here are just a few examples:
AWS Lambda
• Maximum number of concurrent executions per AWS account (1000, changeable)
• Immediate Concurrency Increase (500 or more per min, depends on region, fixed)
API Gateway
• Integration timeout (29 sec max, fixed)
• Max payload size (10 MB, fixed)
Invocation Limits
• S3 will asynchronously call Lambda
• Lambda polls DynamoDB Streams only once per second, per shard
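A rough pre-flight check against the limits listed above can be sketched as follows. The limit values are the ones quoted on the slide; the `Workload` fields, the function name, and the use of Little's law (concurrency is roughly request rate times average duration) to estimate concurrent executions are assumptions added for illustration, not CloudZero's tooling.

```python
from dataclasses import dataclass
from typing import List

# Limits as quoted on the slide (2018); verify against current AWS documentation.
LAMBDA_ACCOUNT_CONCURRENCY = 1000                  # per account, changeable
API_GATEWAY_TIMEOUT_SEC = 29                       # integration timeout, fixed
API_GATEWAY_MAX_PAYLOAD_BYTES = 10 * 1024 * 1024   # 10 MB, fixed (binary MB assumed)


@dataclass
class Workload:
    """Hypothetical description of one API-backed Lambda function."""
    requests_per_sec: float
    avg_duration_sec: float
    max_duration_sec: float
    max_payload_bytes: int


def limit_warnings(w: Workload) -> List[str]:
    """Return a human-readable warning for each limit the workload would exceed."""
    warnings = []
    # Little's law: average concurrency = arrival rate * average time in system.
    estimated_concurrency = w.requests_per_sec * w.avg_duration_sec
    if estimated_concurrency > LAMBDA_ACCOUNT_CONCURRENCY:
        warnings.append(
            f"estimated concurrency {estimated_concurrency:.0f} exceeds "
            f"account limit {LAMBDA_ACCOUNT_CONCURRENCY}"
        )
    if w.max_duration_sec > API_GATEWAY_TIMEOUT_SEC:
        warnings.append(
            f"max duration {w.max_duration_sec}s exceeds API Gateway "
            f"timeout {API_GATEWAY_TIMEOUT_SEC}s"
        )
    if w.max_payload_bytes > API_GATEWAY_MAX_PAYLOAD_BYTES:
        warnings.append("payload exceeds API Gateway 10 MB limit")
    return warnings


if __name__ == "__main__":
    heavy = Workload(requests_per_sec=500, avg_duration_sec=3.0,
                     max_duration_sec=35.0, max_payload_bytes=1024)
    for warning in limit_warnings(heavy):
        print(warning)
```

The estimate covers a single function; because the concurrency limit is per account, a real check would need to sum the estimates for every function sharing that account.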
Monitor your complete system cost
Don’t just watch your Lambda bill; it is just one part of
a Serverless system
[Figure: System Costs Per Day. Per-component daily costs of $1.79, $15, $0.89, $789 (!!!) and $12; CloudWatch Logs is one of the labeled components]
Lambda Cost: $1.79
Total System Cost: $818.68
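The arithmetic behind this slide can be reproduced in a few lines. This is a sketch using the five daily figures shown; only the Lambda amount is unambiguously labeled in the source, so the other keys are placeholder names, and `cost_shares` is a hypothetical helper.

```python
# Daily costs from the slide. Only "lambda" is labeled in the source;
# the remaining keys are placeholders because the figure's labels were lost.
daily_costs = {
    "lambda": 1.79,
    "component_b": 15.00,
    "component_c": 0.89,
    "component_d": 789.00,
    "component_e": 12.00,
}


def cost_shares(costs):
    """Return the total cost and each component's fraction of it."""
    total = sum(costs.values())
    return total, {name: amount / total for name, amount in costs.items()}


total, shares = cost_shares(daily_costs)
print(f"Total system cost: ${total:.2f}")
print(f"Lambda share of total: {shares['lambda']:.2%}")
print(f"Largest component: {max(shares, key=shares.get)}")
```

On the slide's numbers, Lambda is well under 1% of the daily total, which is why watching the Lambda bill alone would miss almost all of the spend.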
Serverless is going to cause a new DevOps
Tribe to emerge
Source: https://twitter.com/swardley/status/1024107922203111424
Simon Wardley
“Do not tell me the DevOps community isn't
fragmenting into two tribes... oh, yes it is... PnA vs
PnC... I've added a third baseline group PnB, who
aren't DevOps but give a useful control sample.”
Survey Size: 389
Existing DevOps “Tribe” New FinDevOps “Tribe”
Existing DevOps “Tribe”
Source: https://twitter.com/swardley/status/1024107922203111424
103 respondents
New FinDevOps “Tribe”
Source: https://twitter.com/swardley/status/1024107922203111424
207 respondents
Thank You!
• Let’s continue the conversation
• erik@cloudzero.com
• @silvexis
