Application Metrics
with Prometheus examples
Rafael Dohms @rdohms
 Principal Architect @
How do you approach
metrics?
“The Prometheus
Scientist Method”
Application Metrics - IPC2023
I hope not.
Rafael Dohms
Principal Architect
rdohms
doh.ms 

Let’s talk about metrics.
But let’s do it with a
concrete example.
Kafka / DDD / Autonomous Microservices / Monitoring
2019
Metrics are insights into
the current state of your
application.
Metrics tell you if your
service is healthy.
Metrics tell you what
is wrong.
Metrics tell you what
is right.
Metrics tell you what
will soon be wrong.
Metrics tell you where
to start looking.
Site Reliability Engineering
SLIs

SLOs

SLAs

SLIs

Service Level Indicators
“A quantitative measure of some
aspect of your application”
The response time of a request was 150ms
Source: Site Reliability Engineering - O’Reilly
SLOs

Service Level Objectives
“A target value or a range of values
for something measured by an SLI”
Request response times should be below 200ms
Source: Site Reliability Engineering - O’Reilly
SLOs help drive architectural
decisions, such as whether to optimise.
SLOs

Response time SLO: 150 ms
95th Percentile of Processing time (PHP time): 5ms
As a result, we decided to invest more time in exploring the problem
domain rather than in optimising our stack.
SLAs

Service Level Agreements
“An explicit or implicit contract with
your customer, that includes
consequences of missing their SLOs”
The 99th percentile of request response times should meet our SLO, or we
will refund users
Source: Site Reliability Engineering - O’Reilly
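The percentile-based SLO and SLA statements above can be checked with a few lines of PHP. This is a minimal sketch, assuming the nearest-rank definition of a percentile (monitoring tools, Prometheus included, often estimate percentiles differently) and a hypothetical list of response times in milliseconds; `percentile()` and `meetsSlo()` are illustrative helpers, not part of any library.

```php
<?php
/**
 * Nearest-rank percentile: the smallest sample such that at least
 * $p percent of all samples are less than or equal to it.
 */
function percentile(array $samples, float $p): float
{
    if ($samples === [] || $p <= 0 || $p > 100) {
        throw new InvalidArgumentException('Need samples and 0 < p <= 100');
    }
    sort($samples); // ascending order
    // 1-based rank of the sample that marks the percentile
    $rank = (int) ceil($p * count($samples) / 100);
    return (float) $samples[$rank - 1];
}

/** True when the p-th percentile of the samples is at or below the SLO. */
function meetsSlo(array $samplesMs, float $p, float $sloMs): bool
{
    return percentile($samplesMs, $p) <= $sloMs;
}

// Hypothetical response times (ms) for ten requests
$responseTimes = [120, 95, 180, 130, 110, 140, 90, 210, 100, 125];

var_dump(meetsSlo($responseTimes, 90, 200)); // p90 is 180ms
var_dump(meetsSlo($responseTimes, 99, 200)); // p99 is 210ms
```

With only ten samples, a single slow request decides the 99th percentile, which is one reason percentiles are usually evaluated over larger windows.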
Measuring
–Etsy Engineering
“If it moves, we track it.”
https://codeascraft.com/2011/02/15/measure-anything-measure-everything/
Metrics
Telemetry: What is happening right now?
Statistics: How often does this happen?
Telemetry
“the process of recording and transmitting the readings of an instrument”
Statistics / Analytics
“the practice of collecting and analysing numerical data in large quantities”
I really miss Ayrton Senna
Statistics: incoming feedback items, with origin information
Telemetry: response time of public endpoints
“If it moves, we track it.”
Request Latency, System Throughput, Error Rate, Availability, Resource Usage
Incoming Data, Peak frequency
CPU, Memory, Disk Space, Bandwidth
node, PHP, NginX, Database
Measure Monitoring
Measure measurements
Metrics, Everywhere.
SLIs
Picking good SLIs
SLIs may change
according to who is
looking at the data.
Understanding the
nature of your system
User-facing serving system? availability, throughput, latency
Storage system? availability, durability, latency
Big data system? throughput, end-to-end latency
User-Facing and Big Data Systems
SLIs (more relevant to the development team)
- Response time in the “receive” endpoint
- Turnaround time, from “receive” to “show”
- Individual processing time per step
- Data counting: how many, what nature
Other Metrics (more relevant to the infrastructure team)
- node, nginx, php-fpm, java metrics
- server metrics: CPU, memory, disk space
- Size of cluster
- Kafka health
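The turnaround and per-step processing SLIs can both be derived from the time at which an item reaches each stage. A minimal sketch, assuming each stage records an integer millisecond timestamp; the intermediate stage names (`enrich`, `classify`), the helper names and the numbers are hypothetical.

```php
<?php
/**
 * Per-step processing time: the time between consecutive steps.
 * $stepTimesMs maps step name => timestamp in ms, in pipeline order.
 */
function stepDurations(array $stepTimesMs): array
{
    $names = array_keys($stepTimesMs);
    $times = array_values($stepTimesMs);
    $durations = [];
    for ($i = 1; $i < count($times); $i++) {
        // Time spent getting from the previous step to this one
        $durations[$names[$i]] = $times[$i] - $times[$i - 1];
    }
    return $durations;
}

/** End-to-end turnaround time: last step minus first step. */
function turnaroundMs(array $stepTimesMs): int
{
    if ($stepTimesMs === []) {
        throw new InvalidArgumentException('No steps recorded');
    }
    $times = array_values($stepTimesMs);
    return $times[count($times) - 1] - $times[0];
}

// Hypothetical timestamps for one item moving from "receive" to "show"
$item = ['receive' => 1000, 'enrich' => 1040, 'classify' => 1250, 'show' => 1300];

print_r(stepDurations($item));
echo turnaroundMs($item), PHP_EOL;
```

Each duration could then be recorded as a histogram observation labelled with the step name, so a slow step stands out against the end-to-end figure.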
Picking Targets
Target value
SLI value <= target (or SLI value >= target when higher is better, e.g. availability)
Target Range
lower bound <= SLI value <= upper bound
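Both target shapes translate directly into comparisons; whether a single target uses `<=` or `>=` depends on whether lower or higher is better for that SLI. A minimal sketch with hypothetical helper names and values.

```php
<?php
/**
 * Single-target SLO. When lower is better (latency) the SLI must stay at or
 * below the target; when higher is better (availability) at or above it.
 */
function meetsTarget(float $sli, float $target, bool $lowerIsBetter): bool
{
    return $lowerIsBetter ? $sli <= $target : $sli >= $target;
}

/** Range SLO: lower bound <= SLI <= upper bound. */
function withinRange(float $sli, float $lower, float $upper): bool
{
    return $lower <= $sli && $sli <= $upper;
}

// Hypothetical values: latency in ms, availability in percent
var_dump(meetsTarget(142.0, 150.0, true));
var_dump(meetsTarget(99.85, 99.9, false));
var_dump(withinRange(120.0, 50.0, 150.0));
```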
Don’t pick a target based
on current performance
What is the business need?
What are users trying to achieve?
How much impact does it have on the user experience?
How long can it take between the user clicking
submit and a confirmation that our servers
received the data?
“Immediate”
“We sell as
real time”
“500ms, too
much HTML”
“I don’t know”
What is the human perception of
“immediate”? 100ms
Collection API should respond within 150ms
Some, but not too many.
Can you settle an argument
or choose a priority based on it?
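Don’t overachieve.
The Chubby example: Google’s global Chubby lock service was so reliable that other services came to depend on it never failing, so its SREs schedule planned outages to keep it from significantly exceeding its SLO.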
Adapt. Evolve.
Redefine SLOs as your product evolves.
Meeting Expectations.
Attach consequences
to your Objectives.
The night is dark and
full of loopholes.
Take a friend from legal with you.
Safety Margins.
Like setting the alarm 5 minutes before the meeting.
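A safety margin can be expressed as two thresholds: a stricter internal SLO that triggers action, and the looser objective promised to users. A minimal sketch for a latency SLI, assuming hypothetical targets of 150ms internally and 200ms externally; `sloStatus()` is an illustrative helper.

```php
<?php
/**
 * Compare a latency SLI (lower is better) against an internal SLO that is
 * deliberately stricter than the external one; the gap is the safety margin.
 */
function sloStatus(float $sliMs, float $internalSloMs, float $externalSloMs): string
{
    if ($internalSloMs > $externalSloMs) {
        throw new InvalidArgumentException('Internal SLO must be the stricter one');
    }
    if ($sliMs > $externalSloMs) {
        return 'breached';   // users are already affected
    }
    if ($sliMs > $internalSloMs) {
        return 'threatened'; // inside the margin: time to act
    }
    return 'ok';
}

// Hypothetical targets: 150ms internally, 200ms promised externally
foreach ([120.0, 170.0, 230.0] as $p95) {
    echo $p95, 'ms => ', sloStatus($p95, 150.0, 200.0), PHP_EOL;
}
```

The "threatened" state is the point where you act, before users notice anything.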
Metrics in Practice.
prometheus.io
Push Model
scale this!
Pull Model
scale this!
Prometheus
Telemetry: Prometheus
Statistics: StatsD, InfluxDB, etc. + long-term storage
Counter: a cumulative metric that represents a single number that only increases (or resets to zero on restart).
Gauge: a single number that can go up or down.
Histogram: samples observations and counts them in buckets, along with the sum and count of all observations.
Summary: similar to a histogram, but also calculates quantiles over a sliding time window.
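The histogram description is easiest to see in code. This is a library-free sketch written only to illustrate cumulative buckets; it is not how promphp/prometheus_client_php stores data, and the class name `SketchHistogram` is hypothetical. The bucket bounds match the code example that follows in this deck.

```php
<?php
/**
 * Minimal in-memory histogram: each bucket counts the observations
 * <= its upper bound ("le"), an implicit +Inf bucket counts everything,
 * and a running sum and count are kept.
 */
final class SketchHistogram
{
    private array $upperBounds;   // sorted upper bounds
    private array $bucketCounts;  // cumulative count per bound
    private float $sum = 0.0;
    private int $count = 0;

    public function __construct(array $upperBounds)
    {
        sort($upperBounds);
        $this->upperBounds = $upperBounds;
        $this->bucketCounts = array_fill(0, count($upperBounds), 0);
    }

    public function observe(float $value): void
    {
        foreach ($this->upperBounds as $i => $le) {
            if ($value <= $le) {
                $this->bucketCounts[$i]++; // counted in every bucket it fits
            }
        }
        $this->sum += $value;
        $this->count++;
    }

    /** [upper bound, cumulative count] pairs, ending with +Inf. */
    public function buckets(): array
    {
        $pairs = [];
        foreach ($this->upperBounds as $i => $le) {
            $pairs[] = [$le, $this->bucketCounts[$i]];
        }
        $pairs[] = [INF, $this->count]; // every observation is <= +Inf
        return $pairs;
    }

    public function sum(): float { return $this->sum; }
    public function count(): int { return $this->count; }
}

$h = new SketchHistogram([0, 10, 50, 100]);
$h->observe(15);
print_r($h->buckets());
```

Observing 15 leaves the 0 and 10 buckets at zero and adds one to the 50, 100 and +Inf buckets, because each bucket is cumulative.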
promphp/prometheus_client_php (successor to the jimdo, endclothing and lkaemmerling versions)
Your code writes to local storage.
The /metrics endpoint reads from local storage.
Prometheus reads from /metrics.
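<?php
use Prometheus\Counter;
use Prometheus\Histogram;
use Prometheus\Storage\APC;
require_once 'vendor/autoload.php';
// APC adapter (requires the APCu extension): metrics live in shared memory between requests
$adapter = new APC();
// Histogram arguments: adapter, namespace, metric name, help, label names, buckets
$histogram = new Histogram(
    $adapter,
    'my_app',
    'response_time_ms',
    'This measures ....',
    ['status', 'url'],
    [0, 10, 50, 100]
);
// Record one observation (15 ms); label values follow the label-name order
$histogram->observe(15, ['200', '/url']);
// Counters only go up: inc() adds 1, incBy() adds the given amount
$counter = new Counter($adapter, 'my_app', 'count_total',
    'How many...', ['status', 'url']);
$counter->inc(['200', '/url']);
$counter->incBy(5, ['200', '/url']);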
Storage adapters: APC / APCu, Redis
Histogram arguments: namespace, metric name, help, label names, buckets
observe() arguments: measurement, label values
Counter arguments: namespace, metric name, help, labels
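In the library example the observed value is the literal `15`; in a real request it would be measured. A minimal sketch, assuming PHP 7.4+ and a hypothetical helper `timed()`, uses the monotonic `hrtime(true)` clock (nanoseconds) and converts to milliseconds to match the `response_time_ms` metric name.

```php
<?php
/**
 * Run $work and return [its result, elapsed time in milliseconds].
 * hrtime(true) is a monotonic clock in nanoseconds, so it is not
 * affected by wall-clock adjustments.
 */
function timed(callable $work): array
{
    $start = hrtime(true);
    $result = $work();
    $elapsedMs = (hrtime(true) - $start) / 1e6; // ns -> ms
    return [$result, $elapsedMs];
}

// Stand-in for real request handling that returns an HTTP status code
[$status, $elapsedMs] = timed(fn () => 200);

// With the library objects from the example, this could then be recorded as:
// $histogram->observe($elapsedMs, [(string) $status, '/url']);
printf("status %d took %.3f ms\n", $status, $elapsedMs);
```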
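<?php
use Prometheus\RenderTextFormat;
use Prometheus\Storage\APC;
require_once 'vendor/autoload.php';
// /metrics endpoint: read every metric the application wrote to APC(u)
$adapter = new APC();
$renderer = new RenderTextFormat();
// Render them in the Prometheus text exposition format for the scraper
$result = $renderer->render($adapter->collect());
header('Content-Type: ' . RenderTextFormat::MIME_TYPE);
echo $result;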
# HELP my_app_count_total How many...
# TYPE my_app_count_total counter
my_app_count_total{status="200",url="/url"} 6
# HELP my_app_response_time_ms This measures ....
# TYPE my_app_response_time_ms histogram
my_app_response_time_ms_bucket{status="200",url="/url",le="0"} 0
my_app_response_time_ms_bucket{status="200",url="/url",le="10"} 0
my_app_response_time_ms_bucket{status="200",url="/url",le="50"} 1
my_app_response_time_ms_bucket{status="200",url="/url",le="100"} 1
my_app_response_time_ms_bucket{status="200",url="/url",le="+Inf"} 1
my_app_response_time_ms_count{status="200",url="/url"} 1
my_app_response_time_ms_sum{status="200",url="/url"} 15
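Because a histogram only stores bucket counts, any percentile read from it is an estimate. PromQL's `histogram_quantile()` estimates a quantile by linear interpolation inside the bucket where the target rank falls; this is a simplified PHP sketch of that idea, assuming cumulative `[upper bound, count]` pairs with non-negative bounds, ending with +Inf. `estimateQuantile()` is a hypothetical helper, not part of the client library.

```php
<?php
/**
 * Estimate the q-quantile (0 < q < 1) from cumulative histogram buckets,
 * assuming observations are spread evenly inside each bucket.
 * $buckets: [upper bound, cumulative count] pairs, sorted, ending with INF.
 */
function estimateQuantile(float $q, array $buckets): float
{
    if ($q <= 0 || $q >= 1 || $buckets === []) {
        throw new InvalidArgumentException('Need 0 < q < 1 and at least one bucket');
    }
    $total = $buckets[count($buckets) - 1][1]; // +Inf bucket holds every observation
    if ($total === 0) {
        return NAN; // nothing observed yet
    }
    $rank = $q * $total;
    $prevBound = 0.0;
    $prevCount = 0;
    foreach ($buckets as [$bound, $count]) {
        if ($count >= $rank) {
            if (is_infinite($bound)) {
                return (float) $prevBound; // cannot interpolate into +Inf
            }
            // Linear interpolation between this bucket's lower and upper bound
            return $prevBound + ($bound - $prevBound) * ($rank - $prevCount) / ($count - $prevCount);
        }
        $prevBound = $bound;
        $prevCount = $count;
    }
    return NAN; // unreachable when the last bucket is +Inf
}

// Buckets from the /metrics output: a single observation of 15ms
$buckets = [[0, 0], [10, 0], [50, 1], [100, 1], [INF, 1]];
printf("estimated p95: %.1f ms\n", estimateQuantile(0.95, $buckets));
```

Here the estimate is 48ms although the real value was 15ms: the interpolation only knows the value fell between 10 and 50. Bucket bounds placed close to your SLO thresholds keep such estimates useful.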
–Also Rafael (today)
“I’ll just try this live demo
again.”
http://localhost:9090/graph http://localhost:8180/metrics

–Rafael (yesterday)
“Demos always fail.”
http://localhost:8180/index

https://github.com/rdohms/talk-app-metrics

You can’t act on what
you can’t see.
Metrics without
actionability are just
numbers on a screen.
Act as soon as an
SLO is threatened.
Thank you.
@rdohms
http://slides.doh.ms https://shirts.doh.ms
