convert (3)

The document outlines various alert configurations for a Grafana dashboard related to UAT KSA, detailing metrics such as active connections, memory usage, CPU usage, and broker request handling. Each alert has specific conditions and thresholds defined for monitoring system performance, utilizing Prometheus as the data source. The document includes expressions and parameters for evaluating these metrics over specified time intervals.

Uploaded by

vandan26

apiVersion orgId name folder interval uid

1 1 Alert window 2 min Grafana UAT KSA_Alerts_sre 2m ee58qzp7z59tse

fe58t1aufqqkgd

fe58tc75dnitca

de58u7skh6eiod

fe58v3ekxu0owe

de58vsh1losn4e

de58vxa3nfvggc

be58tx5m3l3i8e

ae58s499927lse

be58rxfxu1x4wc

1 Alerts Grafana UAT KSA_Alerts_sre 1m ce58r8zfqk8w0d

ee58rerabwdmoe

ce58sg51i1t6oa

fe58sk6nmyku8b

ae58sphia6800c

ae58tilbptgxsa

be58tnkqqrgu8e

de58vfehwrzeof
ee58vhzeqt81sa

ae6dywgt2kv0ge
title condition refId relativeTimeRange.from relativeTimeRange.to datasourceUid

UAT KSA | Active Connections > 30000 C A 7200 0 prometheus


B 7200 0 __expr__
C 7200 0 __expr__
UAT KSA | JVM Memory Usage > 5 GB C A 7200 0 prometheus
C 7200 0 __expr__
UAT KSA | CPU Usage > 5 sec C A 7200 0 prometheus
C 7200 0 __expr__
UAT KSA | Bytes In per Sec - Cluster Level > 70Mb/s C A 7200 0 prometheus
B 7200 0 __expr__
C 7200 0 __expr__
UAT KSA | Bytes out / Sec - Cluster Level > 1.2gb/s C B 7200 0 prometheus
A 7200 0 __expr__
C 7200 0 __expr__
UAT KSA broker request handler > 80% ( Warning ) C A 7200 0 prometheus
C 7200 0 __expr__
Critical | UAT KSA | broker request handler > 90% C A 7200 0 prometheus
C 7200 0 __expr__
UAT KSA | Fetch follow latency per broker > 1 B D 7200 0 prometheus
B 7200 0 __expr__
UAT KSA | Total time fetch Latency > 1sec B D 7200 0 prometheus
B 600 0 __expr__
UAT KSA | Total Time Produce Latency > 1sec B D 7200 0 prometheus
B 7200 0 __expr__
UAT KSA | Brokers Online > 6 C A 7200 0 prometheus
B 0 0 __expr__
C 0 0 __expr__
UAT KSA | Active Controllers<1 C A 7200 0 prometheus
B 0 0 __expr__
C 0 0 __expr__
UAT KSA | Failed Produce Request Per Broker > 0 C A 7200 0 prometheus
B 7200 0 __expr__
C 7200 0 __expr__
UAT KSA | Failed Fetch Request Per Broker > 0 C A 7200 0 prometheus
B 7200 0 __expr__
C 7200 0 __expr__
UAT KSA | ZooKeeper Request Latency > 500ms C A 7200 0 prometheus
C 7200 0 __expr__
UAT KSA | ZooKeeper Disconnects Per Sec > 0 C A 7200 0 prometheus
C 7200 0 __expr__
UAT KSA | ZooKeeper Expires Per Sec>0 C A 7200 0 prometheus
C 7200 0 __expr__
UAT KSA | Under Replicated Partitions >0 C A 7200 0 prometheus
B 0 0 __expr__
C 0 0 __expr__
UAT KSA | Offline Partitions >0 C A 7200 0 prometheus
B 0 0 __expr__
C 0 0 __expr__
UAT KSA Leader Election rate count > 0 C A 600 0 prometheus
C 600 0 __expr__
datasource.type datasource.uid disableTextWrap editorMode
prometheus prometheus 0 code
__expr__ __expr__
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
prometheus prometheus
__expr__ __expr__
__expr__ __expr__
prometheus prometheus
__expr__ __expr__
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
prometheus prometheus code
__expr__ __expr__
prometheus prometheus
__expr__ __expr__
__expr__ __expr__
prometheus prometheus
__expr__ __expr__
__expr__ __expr__
code
__expr__ __expr__
expr

sum(kafka_server_socket_server_metrics_connection_count)

sum without(area)(jvm_memory_bytes_used{job="kafka/kafka-external-broker-pm"} )

rate(process_cpu_seconds_total{job=~"kafka/kafka-external-broker-pm"}[5m])

sum (rate(kafka_server_brokertopicmetrics_bytesin_total{job="kafka/kafka-external-broker-pm",topic!=""}[1m]))

sum (rate(kafka_server_brokertopicmetrics_bytesout_total{job="kafka/kafka-external-broker-pm",topic!=""}[1m]))

100 - rate(kafka_server_kafkarequesthandlerpool_requesthandleravgidlepercent_count{job="kafka/kafka-external-broker-pm"}

100 - rate(kafka_server_kafkarequesthandlerpool_requesthandleravgidlepercent_count{job="kafka/kafka-external-broker-pm"}

avg(kafka_network_requestmetrics_totaltimems{request="FetchFollower",job="kafka/kafka-external-broker-pm",quantile="0.99

avg(kafka_network_requestmetrics_totaltimems{request="FetchConsumer",job="kafka/kafka-external-broker-pm",quantile="0.9

avg(kafka_network_requestmetrics_totaltimems{request="Produce",job="kafka/kafka-external-broker-pm",quantile="0.99"})by(

count(kafka_server_replicamanager_leadercount{job=~"kafka/kafka-external-broker-pm"})

sum(kafka_controller_kafkacontroller_activecontrollercount{job=~"kafka/kafka-external-broker-pm"})

sum(irate(kafka_server_brokertopicmetrics_failedproducerequests_total{job=~"kafka/kafka-external-broker-pm"}[1m]))by(instan

sum(irate(kafka_server_brokertopicmetrics_failedfetchrequests_total{job=~"kafka/kafka-external-broker-pm"}[1m]))by(instance

sum(kafka_server_zookeeperclientmetrics_zookeeperrequestlatencyms{job="kafka/kafka-external-broker-pm"})by(instance)

sum (irate(kafka_server_sessionexpirelistener_zookeeperdisconnects_total{job="kafka/kafka-external-broker-pm"}[1m]))by(ins

sum (irate(kafka_server_sessionexpirelistener_zookeeperexpires_total{job="kafka/kafka-external-broker-pm"}[1m]))by(instance

sum(kafka_server_replicamanager_underreplicatedpartitions{job=~"kafka/kafka-external-broker-pm"})
sum(kafka_controller_kafkacontroller_offlinepartitionscount{job=~"kafka/kafka-external-broker-pm"})

sum(irate(kafka_controller_controllerstats_leaderelectionrateandtimems_count{job="kafka/kafka-external-broker-pm",instance=
data
model

fullMetaSearch includeNullMetadata instant interval intervalMs legendFormat

0 0 FALSE 30000 __auto


1000
1000
TRUE 30000 Broker - {{instance}}
1000
TRUE 30000 Broker - {{instance}}
1000
FALSE 30000 Bytes In [{{instance}}]
1000
1000
FALSE 30000 Bytes Out [{{instance}}]
1000
1000
TRUE 30000 Request handler avg {{instance}}
1000
TRUE 30000 Request handler avg {{instance}}
1000
TRUE 30000 Total {{request}} 99th
1000
TRUE 30000 Total {{request}} 99th
1000
TRUE 30000 Total {{request}} 99th
1000
FALSE 60000
1000
1000
60000
1000
1000
FALSE 30000 {{instance}}
1000
1000
FALSE 30000 {{instance}}
1000
1000
TRUE 30000 Broker: {{instance}}
1000
TRUE 30000 Broker - {{instance}}
1000
TRUE 30000 Broker - {{instance}}
1000
60000
1000
1000
60000
1000
1000
TRUE 1000 __auto
1000
groups
rules

conditions
maxDataPoints range refId useBackend evaluator.params evaluator.type operator.type query.params reducer.params reducer.type type
43200 1 A FALSE
43200 B gt and B last query
43200 C 30000 gt and C last query
43200 0 A
43200 C 5368709120 gt and C last query
43200 0 A
43200 C 5 gt and C last query
43200 0 A
43200 B gt and B last query
43200 C 73400320 gt and C last query
43200 0 B
43200 A gt and A last query
43200 C 1288490189 gt and C last query
43200 0 A
43200 C 80 gt and C last query
43200 0 A
43200 C 90 gt and C last query
43200 0 D
43200 B 1000 gt and B last query
43200 0 D
43200 B 1002 gt and B last query
43200 0 D
43200 B 1000 gt and B last query
100 A
43200 B gt and B last query
43200 C 6 lt and C last query
100 A
43200 B gt and B last query
43200 C 1 lt and C last query
43200 0 A
43200 B gt and B last query
43200 C 0 gt and C last query
43200 0 A
43200 B gt and B last query
43200 C 0 gt and C last query
43200 0 A
43200 C 500 gt and C last query
43200 0 A
43200 C 0 gt and C last query
43200 0 A
43200 C 0 gt and C last query
100 A
43200 B gt and B last query
43200 C 0 gt and C last query
100 A
43200 B gt and B last query
43200 C 0 gt and C last query
43200 0 A
43200 C 0 gt and C last query
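
The large evaluator values for the memory and throughput rules look like binary multiples of the figures named in the rule titles. The following is a minimal Python check of that reading. It assumes that "GB", "Mb/s" and "gb/s" in the titles mean GiB, MiB/s and GiB/s, which the export itself does not state; the helper name `to_bytes` is made up for this sketch.

```python
# Binary unit multipliers (assumption: the rule titles use binary units).
MIB = 1024 ** 2
GIB = 1024 ** 3

# Thresholds as they appear in the evaluator params column.
thresholds = {
    "JVM Memory Usage > 5 GB": 5368709120,
    "Bytes In per Sec > 70Mb/s": 73400320,
    "Bytes out / Sec > 1.2gb/s": 1288490189,
}


def to_bytes(value, unit):
    """Convert a value expressed in a binary unit to a whole number of bytes."""
    return round(value * unit)


# Values recomputed from the figures named in the rule titles.
computed = {
    "JVM Memory Usage > 5 GB": to_bytes(5, GIB),
    "Bytes In per Sec > 70Mb/s": to_bytes(70, MIB),
    "Bytes out / Sec > 1.2gb/s": to_bytes(1.2, GIB),
}

for title, value in computed.items():
    print(title, value, value == thresholds[title])
```

Under that assumption all three recomputed values match the exported thresholds, with 1.2 GiB rounded up to the nearest byte.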
rules

expression reducer type format intervalFactor dashboardUid panelId noDataState

5nhADrDWk 692 NoData


A last reduce
B threshold
time_series 1 5nhADrDWk 163 NoData
A threshold
time_series 1 5nhADrDWk 81 NoData
A threshold
time_series 2 5nhADrDWk 694 NoData
A last reduce
B threshold
time_series 2 5nhADrDWk 693 NoData
B last reduce
A threshold
time_series 1 5nhADrDWk 681 NoData
A threshold
time_series 1 5nhADrDWk 681 NoData
A threshold
time_series 1 5nhADrDWk 678 NoData
D threshold
time_series 1 5nhADrDWk 677 NoData
D threshold
time_series 1 5nhADrDWk 192 NoData
D threshold
time_series 1 5nhADrDWk 647 NoData
A last reduce
B threshold
time_series 1 5nhADrDWk 233 NoData
A last reduce
B threshold
time_series 1 5nhADrDWk 612 NoData
A last reduce
B threshold
time_series 1 5nhADrDWk 613 NoData
A last reduce
B threshold
time_series 1 5nhADrDWk 150 NoData
A threshold
time_series 1 5nhADrDWk 138 NoData
A threshold
time_series 1 5nhADrDWk 139 NoData
A threshold
time_series 2 5nhADrDWk 30 NoData
A last reduce
B threshold
time_series 2 5nhADrDWk 126 NoData
A last reduce
B threshold
NoData
A threshold
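
Most rules in this export chain a Prometheus query into a `reduce` expression with the `last` reducer and then into a `threshold` expression with a `gt` or `lt` evaluator. The sketch below imitates that chain in plain Python to show how the three refIds relate. It is not Grafana's implementation: the function names and the sample values are invented for illustration.

```python
from typing import Dict, List


def reduce_last(series: Dict[str, List[float]]) -> Dict[str, float]:
    """Collapse each labelled series to its most recent sample (reducer 'last')."""
    return {label: samples[-1] for label, samples in series.items() if samples}


def threshold(values: Dict[str, float], op: str, limit: float) -> Dict[str, bool]:
    """Compare each reduced value against the limit using 'gt' or 'lt'."""
    if op == "gt":
        return {label: v > limit for label, v in values.items()}
    if op == "lt":
        return {label: v < limit for label, v in values.items()}
    raise ValueError(f"unsupported evaluator type: {op}")


# Made-up samples standing in for query A of the Active Connections rule.
query_a = {"cluster": [28000.0, 29500.0, 31000.0]}

reduced_b = reduce_last(query_a)               # expression B: reduce, last
firing_c = threshold(reduced_b, "gt", 30000)   # expression C: threshold, gt 30000

print(reduced_b, firing_c)
```

Rules that list only refIds A and C, or D and B, skip the separate reduce step and apply the threshold directly to the query result, which is why their query rows are marked as instant.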
execErrState for __dashboardUid__ __panelId__

Error 2m 5nhADrDWk 692

Error 2m 5nhADrDWk 163

Error 2m 5nhADrDWk 81

Error 2m 5nhADrDWk 694

Error 2m 5nhADrDWk 693

Error 2m 5nhADrDWk 681

Error 2m 5nhADrDWk 681

Error 2m 5nhADrDWk 678

Error 2m 5nhADrDWk 677

Error 2m 5nhADrDWk 192

Error 1m 5nhADrDWk 647

Error 1m 5nhADrDWk 233

Error 1m 5nhADrDWk 612

Error 1m 5nhADrDWk 613

Error 1m 5nhADrDWk 150

Error 1m 5nhADrDWk 138

Error 1m 5nhADrDWk 139

Error 1m 5nhADrDWk 30
Error 1m 5nhADrDWk 126

Error 1m
This alert triggers when there is a leader election event in the Kafka cluster. Leader elections can indicate potential instability in
summary isPaused notification_settings.receiver

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email

0 grafana-default-email
0 grafana-default-email

Attention Needed - Kafka Leader Election Rate Count > 0 0 grafana-default-email
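
The column headers suggest the nested layout of a Grafana alert-rule provisioning export that was flattened into spreadsheet columns. As a rough sketch, the first row of each table can be put back together as a nested structure in Python. Several points are assumptions rather than facts from the export: that the first uid belongs to the first title, that the group name is "Alert window 2 min" and the folder is "Grafana UAT KSA_Alerts_sre" (the source runs the two together), and that an isPaused value of 0 means false. The helper `check_rule` is hypothetical.

```python
import json

# Values copied from the first row of each flattened table.
# Assumption: the first uid in the uid column belongs to the first title.
rule = {
    "uid": "ee58qzp7z59tse",
    "title": "UAT KSA | Active Connections > 30000",
    "condition": "C",
    "data": [
        {
            "refId": "A",
            "relativeTimeRange": {"from": 7200, "to": 0},
            "datasourceUid": "prometheus",
            "model": {
                "expr": "sum(kafka_server_socket_server_metrics_connection_count)",
                "instant": False,
                "intervalMs": 30000,
                "maxDataPoints": 43200,
                "refId": "A",
            },
        },
        {
            "refId": "B",
            "relativeTimeRange": {"from": 7200, "to": 0},
            "datasourceUid": "__expr__",
            "model": {"expression": "A", "reducer": "last", "type": "reduce", "refId": "B"},
        },
        {
            "refId": "C",
            "relativeTimeRange": {"from": 7200, "to": 0},
            "datasourceUid": "__expr__",
            "model": {
                "expression": "B",
                "type": "threshold",
                "conditions": [{"evaluator": {"params": [30000], "type": "gt"}}],
                "refId": "C",
            },
        },
    ],
    "noDataState": "NoData",
    "execErrState": "Error",
    "for": "2m",
    "annotations": {"__dashboardUid__": "5nhADrDWk", "__panelId__": "692"},
    "isPaused": False,  # assumption: the exported 0 means "not paused"
    "notification_settings": {"receiver": "grafana-default-email"},
}

export = {
    "apiVersion": 1,
    "groups": [
        {
            "orgId": 1,
            "name": "Alert window 2 min",            # assumed name/folder split
            "folder": "Grafana UAT KSA_Alerts_sre",  # assumed name/folder split
            "interval": "2m",
            "rules": [rule],
        }
    ],
}


def check_rule(r):
    """Return a list of problems: unknown condition refId or dangling expression."""
    ref_ids = {q["refId"] for q in r["data"]}
    problems = []
    if r["condition"] not in ref_ids:
        problems.append("condition refers to an unknown refId")
    for q in r["data"]:
        source = q["model"].get("expression")
        if source is not None and source not in ref_ids:
            problems.append(f"{q['refId']} refers to unknown refId {source}")
    return problems


print(check_rule(rule))
print(json.dumps(export, indent=2)[:200])
```

A check like this is mainly useful when rebuilding the remaining rules by hand, since several of them use a different refId order (for example D and B, or B as the query and A as the reducer).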
