3. SRE-Practical work 3 Monitoring and Alerting Setup

This document outlines a practical work assignment focused on setting up monitoring and alerting using Prometheus and Grafana. It includes detailed tasks for installing and configuring both tools, creating dashboards, setting up alerts, and expanding monitoring capabilities with Node Exporter. Additionally, it covers advanced topics such as querying in Prometheus, dynamic dashboarding in Grafana, and backup and restore procedures.

Uploaded by

endofetta

Practical work 3 - Monitoring and Alerting Setup

This practical work focuses on setting up monitoring and alerting using tools like
Prometheus and Grafana. Students will implement monitoring for a sample application, create
dashboards, and set up alerts to understand the application's performance and potential issues.

Practice Instruction:
• If you have not completed the first four (4) tasks, which come from Practical works 1-2,
start with them.
• If you have already completed the first four tasks successfully, begin with Task 5.

Software requirements:
• Installation rights on a computer to configure tools such as Prometheus,
Grafana, Node Exporter, etc.
• A code editor (for example, Visual Studio Code, Atom) for editing
configuration files and scripts.

Documentation:
• Document all settings, including configuration changes, so that they can be reproduced
or fixed in the future.
• It is useful to work in teams or study groups (2 students) to share ideas and solutions.
• Always monitor system resources when installing and launching new software to ensure
optimal performance.
• Prepare a task completion report with screenshots and descriptions.
• Write conclusions about what you have learned in this practical work.

Task 1: Install and Configure Prometheus

1. Objective: Install Prometheus and verify its web UI.
• Download and extract the latest version of Prometheus.
• Run Prometheus with the default configuration.
• Visit http://localhost:9090 to verify Prometheus is running.
2. Objective: Configure Prometheus to scrape a target.
• Edit prometheus.yml to add a new scrape target.
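• Reload the Prometheus configuration: restart Prometheus, send the process a SIGHUP
signal, or, if Prometheus was started with --web.enable-lifecycle, send an HTTP POST
request to http://localhost:9090/-/reload.
• Visit http://localhost:9090/targets to verify the target is being scraped.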
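For reference, a minimal prometheus.yml with one additional scrape target might look like the sketch below. It assumes your sample application exposes Prometheus metrics at localhost:8080/metrics; the job name sample_app and that address are placeholders, not values given in this assignment.

```yaml
# prometheus.yml - minimal sketch; 'sample_app' and localhost:8080 are placeholders
global:
  scrape_interval: 15s      # how often each target is scraped
  evaluation_interval: 15s  # how often recording/alerting rules are evaluated

scrape_configs:
  # Prometheus scraping its own /metrics endpoint (present in the default config)
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Hypothetical new target: replace with the host:port of your sample application
  - job_name: 'sample_app'
    static_configs:
      - targets: ['localhost:8080']
```

Before reloading, you can validate the file with promtool check config prometheus.yml (promtool is included in the Prometheus release archive).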

Task 2: Install and Configure Grafana

1. Objective: Install Grafana and verify its web UI.
• Install Grafana using the appropriate method for your OS.
• Start Grafana and verify it's running at http://localhost:3000.
2. Objective: Add Prometheus as a data source in Grafana.
• Log into Grafana using the default credentials (admin/admin).
• Add a new data source, selecting Prometheus.
• Configure the data source to point to your Prometheus instance at
http://localhost:9090.
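As an alternative to adding the data source through the UI, Grafana can provision it from a YAML file at startup, which also makes the setup reproducible. A minimal sketch, assuming a Linux package install where provisioning files are read from /etc/grafana/provisioning/datasources/ (the file name is arbitrary):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml (file name is arbitrary)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy               # Grafana's backend forwards queries to Prometheus
    url: http://localhost:9090  # the Prometheus instance from Task 1
    isDefault: true             # preselect this data source in new panels
```

Grafana reads provisioning files at startup, so restart it after adding the file.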

Task 3: Create a Grafana Dashboard

1. Objective: Create a new dashboard and add a panel.
• Create a new dashboard in Grafana.
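• Add a panel that displays the availability of your Prometheus instance, for example
with the query up{job="prometheus"} (1 while the self-scrape succeeds, 0 otherwise).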
• Adjust the time range and refresh interval for your dashboard.
2. Objective: Import a pre-existing dashboard.
• Visit the Grafana dashboards page at https://grafana.com/grafana/dashboards.
• Search for a popular Prometheus dashboard, e.g., "Node Exporter Full".
• Import this dashboard into your Grafana instance and select the Prometheus data
source.
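To keep dashboards reproducible, you can export a dashboard as JSON and let Grafana load it from disk with a file-based dashboard provider. A minimal sketch, assuming the provider file is placed in /etc/grafana/provisioning/dashboards/ and the exported JSON files are copied to /var/lib/grafana/dashboards; the provider name and folder are placeholders:

```yaml
# /etc/grafana/provisioning/dashboards/sre-practical.yaml (file name is arbitrary)
apiVersion: 1

providers:
  - name: 'sre-practical'          # placeholder provider name
    folder: 'SRE Practical 3'      # Grafana folder the dashboards appear in
    type: file
    disableDeletion: false         # allow deleting provisioned dashboards from the UI
    updateIntervalSeconds: 30      # how often Grafana rescans the path for changes
    options:
      path: /var/lib/grafana/dashboards   # directory holding exported dashboard JSON files
```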

Task 4: Setup Alerts in Grafana

1. Create an alert rule for a panel.
Objective: Understand how to use Grafana's built-in alerting system to monitor specific
conditions on a dashboard panel.
Steps:
a. Edit a panel in your dashboard.
- Navigate to your desired dashboard.
- Choose the panel you want to attach an alert to.
- Click on the panel title and select "Edit" from the dropdown menu.
b. Navigate to the "Alert" section.
- Inside the panel editor, find the "Alert" tab.
- Click on "Create Alert" (in Grafana 9 and later the button is labelled "New alert rule").
c. Set up a rule.
- Name your alert rule for easy identification.
- Set the evaluation interval.
- Define conditions. For example, if you are monitoring Prometheus availability with the
metric up (1 when the last scrape succeeded, 0 when it failed), you can set a condition
like avg() of query(A, 5m, now) is below 0.9. This checks whether the average of up over
the last 5 minutes is below 0.9, i.e. whether more than 10% of the scrapes failed, which
would indicate Prometheus was unreachable for part of that time.
d. Test the alert rule.
- Once you have defined the condition, click on the "Test Rule" button ("Preview" in
Grafana 9 and later). Grafana will evaluate the rule against recent data and show whether
the alert would fire.
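The same condition can also be expressed on the Prometheus side as an alerting rule, which is useful for comparing the two alerting systems. This is a sketch, not part of the Grafana steps: it assumes the self-scrape job is named prometheus (as in the default prometheus.yml) and that the file is listed under rule_files in prometheus.yml; sending notifications from Prometheus rules additionally requires Alertmanager. The alert name and severity label are illustrative.

```yaml
# alert_rules.yml (hypothetical file name), referenced from prometheus.yml via rule_files
groups:
  - name: prometheus-availability
    rules:
      - alert: PrometheusScrapeAvailabilityLow   # illustrative name
        # avg_over_time(up[5m]) is the fraction of successful self-scrapes in the last 5 minutes;
        # a value below 0.9 means more than 10% of the scrapes failed
        expr: avg_over_time(up{job="prometheus"}[5m]) < 0.9
        labels:
          severity: warning
        annotations:
          summary: "Target {{ $labels.instance }} was down for part of the last 5 minutes"
```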
2. Configure a notification channel.
Objective: Ensure Grafana can notify you or your team when specific conditions are
met.
Steps:
a. Set up a notification channel in Grafana.
- From the Grafana side menu, click on the bell icon (🔔), which opens the alerting menu.
- Select "Notification channels" and click on "Add channel". In Grafana 9 and later,
notification channels are called contact points: select "Contact points" and click
"Add contact point".
- Choose your preferred notification method. Grafana supports various channels such as
Email, Slack, Webhook, and others.
- For example, for Slack:
- Name the channel (e.g., "Team Slack Alerts").
- Set the type to Slack.
- In the URL field, provide the incoming webhook URL from your Slack workspace.
- Optionally, set a mention (channel or user) or a custom icon.
b. Test the notification channel.
- Once all details are filled out, click on the "Test" button. This sends a test
notification to the channel you have configured.
- Check the target channel (e.g., Slack or Email) to verify that you received the test alert.
Note: To use the alerting system effectively, make sure the alert is linked to the
notification channel. In the panel's alert settings, under the Notifications section, select the
notification channel you created. This way, when the alert is triggered, it will send a
notification to the configured channel. In Grafana 9 and later, alerts are routed to contact
points through notification policies instead.
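In Grafana 9 and later, the Slack contact point can also be provisioned from a file. A sketch, assuming unified alerting and a file placed under /etc/grafana/provisioning/alerting/; the uid is an arbitrary identifier and the webhook URL is a placeholder that you must replace with your own Slack incoming webhook:

```yaml
# /etc/grafana/provisioning/alerting/contact-points.yaml (file name is arbitrary)
apiVersion: 1

contactPoints:
  - orgId: 1
    name: Team Slack Alerts
    receivers:
      - uid: team-slack-alerts     # arbitrary identifier, must be unique
        type: slack
        settings:
          # placeholder: paste the incoming webhook URL from your Slack workspace
          url: https://hooks.slack.com/services/REPLACE/WITH/YOUR-WEBHOOK
```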

Task 5: Expand Monitoring Capabilities

1. Install and configure Node Exporter.
Objective: Broaden your monitoring scope by capturing OS and hardware metrics using
Node Exporter.
Steps:
a. Download and run Node Exporter on your machine.
- Navigate to the official Prometheus download page and get the appropriate
Node Exporter binary for your OS.
- Extract the downloaded tarball.
- Navigate to the extracted directory and run Node Exporter. By default it listens on
port 9100 and serves metrics at http://localhost:9100/metrics:
```bash
./node_exporter
```
b. Configure Prometheus to scrape metrics from Node Exporter.
- Edit your prometheus.yml configuration file.
- Add a new job to the existing scrape_configs list (do not add a second scrape_configs
key):
```yaml
scrape_configs:
  # ... existing jobs (e.g. 'prometheus') stay here ...
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
```

- Save the file and reload or restart Prometheus for the changes to take effect.

c. Create or import a Grafana dashboard for Node Exporter metrics.

- Navigate to your Grafana instance.
- You can manually create a new dashboard and add panels for Node Exporter
metrics (e.g., node_cpu_seconds_total, node_memory_MemAvailable_bytes,
etc.)
- Alternatively, use the Grafana dashboard repository to find a pre-made Node
Exporter dashboard. Once found, import it into your Grafana instance.

2. Set up an alert based on Node Exporter metrics.

Objective: Ensure proactive monitoring by setting alerts on critical OS or hardware
metrics.
Steps:
a. Choose a critical metric from Node Exporter.
- Metrics like node_cpu_seconds_total, node_memory_MemFree_bytes, or
node_filesystem_free_bytes are common candidates.
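- For this example, let's consider high CPU usage, expressed as the percentage of
non-idle CPU time per instance:
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
rate() averages over the whole 5-minute window, whereas irate() uses only the last two
samples in the window, so rate() suits a "past 5 minutes" threshold better.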

b. Create an alert in Grafana based on the chosen metric.

- Navigate to (or create) the panel that visualizes the chosen metric.
- Click on the panel title and select "Edit".
- Go to the "Alert" tab and create a new alert.
- Define a condition, e.g., alert when the average CPU usage over the past 5
minutes is greater than 90%.
- Under the "Notifications" section, select your notification channel (created in
the previous task).
- Save the dashboard.
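If you also run Prometheus-side rules, the same high-CPU condition can be written as a Prometheus alerting rule. A sketch, assuming the rule file is loaded via rule_files and Alertmanager handles notifications; the 90% threshold comes from the step above, while the alert name, the for duration, and the labels are illustrative:

```yaml
groups:
  - name: node-exporter-alerts
    rules:
      - alert: HighCpuUsage
        # Percentage of non-idle CPU time per instance, averaged over 5 minutes
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        # Fire only if the condition holds continuously for 5 minutes (ignores short spikes)
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "CPU usage above 90% on {{ $labels.instance }}"
```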
By following these steps, you'll be well-prepared to monitor OS and hardware metrics
using Node Exporter and Grafana. This is a critical aspect of observability, allowing for timely
response to system issues and proactive infrastructure management.

Task 6: Querying and Exploration in Prometheus

Objective 1: Familiarize yourself with Prometheus' query language (PromQL).

Instructions:
a. Access Prometheus Web UI
- Open your web browser.
- Navigate to http://localhost:9090/graph. This is the default Prometheus UI
where you can execute PromQL queries.
b. Execute Basic Queries
- In the "Expression" input box, type a metric name, for example:
- up: For every target Prometheus scrapes, this shows whether the last scrape
succeeded (1) or failed (0).
- node_cpu_seconds_total: A counter of the total CPU time, in seconds, spent in each
mode (idle, user, system, ...) on each CPU.
- Click "Execute" and view the raw metric values below.
c. Use PromQL for Complex Queries
- Explore the usage of functions and operators in PromQL. For instance:
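- rate(node_cpu_seconds_total[5m]): This computes the per-second average rate of
increase of each time series over the last five minutes.
- Note: rate should be applied to counters, i.e. metrics that only go up over time (and
reset to zero when the process restarts); rate compensates for such resets automatically.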
Objective 2: Use multi-dimensional data to filter and aggregate results.
Instructions:
a. Use Label Selectors
- You can refine your queries using label selectors. These allow you to filter
metrics based on their associated labels.
- Example: node_cpu_seconds_total{job="node_exporter",
instance="localhost:9100"}. This narrows the metric down to the node_exporter
job scraped from the localhost:9100 instance. The job label value must match the
job_name configured in prometheus.yml.
b. Aggregation with PromQL
- PromQL supports various aggregation operators to provide summary
information.
- sum: Adds up the values of all matching series.
- avg: Computes the average across the matching series.
- Combine aggregation operators with by or without to specify which label
dimensions to keep (by) or to drop (without) in the result.
Example: sum(rate(node_cpu_seconds_total[5m])) by (job): This aggregates the CPU
rate for each job separately.

Task 7: Advanced Grafana Dashboarding

Objective 1: Use variables in Grafana for dynamic dashboards

Instructions:
a. Create/Edit a Dashboard
- From the Grafana main menu, click on the "+" icon and select "Dashboard".
- Alternatively, navigate to an existing dashboard that you'd like to edit.
b. Add a Variable
- On the dashboard screen, click on the cogwheel/settings icon at the top, then
select "Variables".
- Click on "New Variable".
- Choose a name for your variable (e.g., JobName), and for the Type, select "Query".
- Under "Data source", choose "Prometheus".
- In the "Query" box, enter label_values(up, job) to fetch the values of the job label
from all scraped targets.
- Save your changes.
c. Update a Panel with the Variable
- Return to your dashboard and edit a panel.
- In your metric query, you can now reference the variable as $VariableName
(replace "VariableName" with the name you gave your variable). For instance, if
you named the variable JobName to filter metrics by job, the query could look
like node_cpu_seconds_total{job="$JobName"}.
- Save the panel.
Objective 2: Explore Grafana's transformations and overrides
Instructions:
a. Add a Panel with Multiple Metrics
- Click on "Add Panel" in your dashboard.
- In the query section, add multiple metrics, such as node_cpu_seconds_total,
node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes.
b. Use Transformations
- With the panel still in edit mode, click on the "Transform" tab.
- Explore different transformations such as:
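- Reduce: To reduce each series to a single value (e.g., last, mean, max).
- Inner join ("Join by field" in inner mode in recent Grafana versions): To combine
the results of multiple queries on a shared field such as Time.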
- Add field from calculation: To create a new field based on a calculation between
others.
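- For a simple exercise, you can use "Add field from calculation" to calculate the
percentage of used memory from total and available memory
(node_memory_MemTotal_bytes and node_memory_MemAvailable_bytes):
used % = (MemTotal - MemAvailable) / MemTotal * 100.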
c. Apply Overrides
- Move to the "Overrides" tab in the panel edit mode.
- Click "Add Override".
- You can then specify conditions, like "For field with name" or "For series with
name" and adjust properties like color, display name, etc.
As an example, you can set a different color for CPU and Memory in the same graph.

Task 8: Advanced Alerting and Recording Rules

Objective 1: Set up recording rules in Prometheus.

- Identify a frequently used or complex query in Grafana.
- Configure a recording rule in a rule file, referenced from prometheus.yml through
the rule_files setting, to pre-compute the result of this query and store it as a new
time series.
- Verify the new recorded metric in the Prometheus web UI.
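A minimal recording rule sketch, assuming the CPU-usage expression from Task 5 is the frequently used query. The file name and group name are placeholders; the rule name follows the level:metric:operations naming convention recommended in the Prometheus documentation.

```yaml
# recording_rules.yml (hypothetical file name)
groups:
  - name: node-recording-rules
    interval: 1m   # optional; defaults to the global evaluation_interval
    rules:
      # Fraction of non-idle CPU time per instance (0 to 1), averaged over 5 minutes
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
```

Reference the file from prometheus.yml with a top-level entry such as rule_files: ["recording_rules.yml"], check it with promtool check rules recording_rules.yml, reload Prometheus, and then query instance:node_cpu_utilisation:rate5m in the web UI.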
Objective 2: Configure multi-stage alerting.
- Create a multi-condition alert in Grafana (e.g., alert on high CPU usage only if
high memory usage is also detected).
- Configure the alert to have multiple levels (e.g., warning and critical) with
different thresholds.
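The two ideas, a combined CPU-and-memory condition and separate warning and critical levels, can be sketched as Prometheus alerting rules; the same expressions can also serve as queries in a Grafana alert rule. The thresholds (80% and 95%), alert names, and severity labels are illustrative assumptions. The and on (instance) operator keeps a CPU result only when a memory result with the same instance label also passes its threshold.

```yaml
groups:
  - name: node-multistage-alerts
    rules:
      # Warning level: CPU above 80% AND memory usage above 80% on the same instance
      - alert: HighCpuAndMemoryWarning
        expr: |
          (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.80
          and on (instance)
          (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.80
        for: 5m
        labels:
          severity: warning
      # Critical level: the same condition with stricter thresholds
      - alert: HighCpuAndMemoryCritical
        expr: |
          (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.95
          and on (instance)
          (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.95
        for: 5m
        labels:
          severity: critical
```

When the critical alert fires, the warning alert fires as well; Alertmanager inhibit rules can suppress the warning in that case.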

Task 9: Blackbox Monitoring with Prometheus

Objective 1: Install and configure the Blackbox Exporter.

- Download and run the Blackbox Exporter.
- Configure Prometheus to use the Blackbox Exporter to check the availability of
a set of web pages or services.
Objective 2: Create a Grafana panel visualizing the availability and response
times.
- Add panels that display the up status and response times for the targets being
monitored with the Blackbox Exporter.
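A common way to wire the Blackbox Exporter into Prometheus is a scrape job that sends each target URL to the exporter's /probe endpoint through relabeling. A sketch, assuming the Blackbox Exporter runs locally on its default port 9115 with the http_2xx module from its example blackbox.yml; the probed URLs are placeholders:

```yaml
scrape_configs:
  # ... existing jobs ...
  - job_name: 'blackbox_http'
    metrics_path: /probe
    params:
      module: [http_2xx]            # probe module defined in blackbox.yml
    static_configs:
      - targets:
          - https://example.com     # placeholder: pages/services to check
          - http://localhost:3000   # e.g. the local Grafana instance
    relabel_configs:
      # Pass the original target to the exporter as the ?target= URL parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Actually scrape the Blackbox Exporter itself
      - target_label: __address__
        replacement: localhost:9115
```

For the Grafana panels, probe_success (1 if the probe succeeded, 0 otherwise) shows availability and probe_duration_seconds shows how long each probe took.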

Task 10: Backup and Restore

Objective 1: Backup your Prometheus data.

- Stop the Prometheus service.
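- Create a backup of the Prometheus data directory (by default data/ in the directory
Prometheus was started from, or the path set with --storage.tsdb.path).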
- Start the Prometheus service.
Objective 2: Restore Prometheus from a backup.
- Stop the Prometheus service.
- Restore the data directory from the backup.
- Verify that the data is intact after starting Prometheus.

By completing these advanced tasks, you will not only deepen your understanding of
Prometheus and Grafana's features and capabilities but also be better prepared to tackle real-
world monitoring and alerting challenges. Remember to always refer to official documentation
for detailed configurations and best practices.
