Prometheus Operations: Installation, Deployment, and Configuration Walkthrough

Originally published 2025-09-12 11:39:39


License: CC 4.0 BY-SA

Tags: #operations #prometheus

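Prometheus is an open-source system monitoring tool. It pulls metrics over HTTP and stores them in a local time-series database. Its built-in query language, PromQL, makes the stored metrics easy to query and supports building graphs and alerting rules.

Prometheus Server: scrapes, stores, and queries metrics, and triggers alerts.
Exporter: exposes a metrics endpoint for the monitored service (for example Node Exporter or MySQL Exporter).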

1. Install Prometheus

Download page:

https://prometheus.io/download/

1.1 Download Prometheus

The version used here is fairly old and is shown for reference only; downloading a current binary release is recommended.

    # Download Prometheus
    wget https://github.com/prometheus/prometheus/releases/download/v2.3.0/prometheus-2.3.0.linux-amd64.tar.gz
    # Extract the archive
    tar -zxvf prometheus-2.3.0.linux-amd64.tar.gz
    cd prometheus-2.3.0.linux-amd64

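1.2 Directory layout

prometheus: the server executable
promtool: command-line tool for validating configuration files and rules
prometheus.yml: the main configuration file
rules and target: directories you create by hand to hold the alerting rule files and the target files for monitored nodes (list them with ll rules/ and ll target/node/)
consoles/: web console templates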

1.3 Start Prometheus

    # Start Prometheus
    nohup ./prometheus --config.file=prometheus.yml \
      --web.enable-lifecycle \
      --web.listen-address=192.168.1.139:8001 &

Prometheus listens on port 9090 by default; this setup uses 8001. Open the following address in a browser:

http://192.168.1.139:8001
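
The --web.enable-lifecycle flag in the start command turns on the HTTP management endpoints, including /-/reload, which makes Prometheus re-read its configuration without a restart. The following is a minimal Python sketch of calling that endpoint with the standard library. The function names and the BASE_URL constant are illustrative assumptions, not part of the original setup.

```python
import urllib.request

# Listen address used in this article; adjust for your own server.
BASE_URL = "http://192.168.1.139:8001"


def build_reload_request(base_url):
    """Build the POST request that asks Prometheus to reload its configuration."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_config(base_url=BASE_URL, timeout=5.0):
    """Send the reload request and return the HTTP status code."""
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling reload_config() only works against a running server that was started with --web.enable-lifecycle; without the flag the endpoint is disabled.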

2. Install Node Exporter (host monitoring)

Download page:

    cd /opt/soft/
    # Extract the archive
    tar -zxvf node_exporter-1.3.1.linux-amd64.tar.gz
    # Enter the install directory
    cd /opt/soft/node_exporter-1.3.1.linux-amd64
    # Start Node Exporter (--collector.textfile.directory holds custom metric files; usage is covered at the end)
    nohup ./node_exporter --collector.textfile.directory=./key &

Visit `http://127.0.0.1:9100/metrics`. If it returns output like the following, Node Exporter is working:

    # HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
    # TYPE go_gc_duration_seconds summary
    go_gc_duration_seconds{quantile="0"} 1.6425e-05
    go_gc_duration_seconds{quantile="0.25"} 3.4856e-05
    go_gc_duration_seconds{quantile="0.5"} 5.8672e-05
    go_gc_duration_seconds{quantile="0.75"} 8.4572e-05
    go_gc_duration_seconds{quantile="1"} 0.000452457
    go_gc_duration_seconds_sum 12.595550358
    go_gc_duration_seconds_count 185069
    # HELP go_goroutines Number of goroutines that currently exist.
    # TYPE go_goroutines gauge
    go_goroutines 7
    # HELP go_info Information about the Go environment.
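
To make the output format concrete, here is a small Python sketch that parses this text format into (name, labels, value) tuples. It is a simplified reader written for illustration, not the parser Prometheus itself uses: it skips the # HELP and # TYPE lines, ignores timestamps, and does not unescape label values. The function and constant names are assumptions.

```python
import re

# A sample line looks like: go_gc_duration_seconds{quantile="0.5"} 5.8672e-05
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Lines taken from the Node Exporter output shown in this section.
EXAMPLE = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 7
go_gc_duration_seconds{quantile="0.5"} 5.8672e-05
"""
```

Each non-comment line is one sample: a metric name, an optional set of labels in braces, and a numeric value.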

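Next, we walk through configuring and using Prometheus.

3. Configure Prometheus

The Prometheus configuration file is written in YAML.

3.1 Configuration walkthrough

Global settings

    # my global config
    global:
      scrape_interval:     15s # scrape interval; the default is 1 minute
      evaluation_interval: 15s # rule evaluation interval; the default is also 1 minute

Alertmanager settings

    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          - 127.0.0.1:9093 # address of Alertmanager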

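3.2 Rule files
Rule files are user-defined. To keep different alerting rules clearly separated, you can split them across several files.
Purpose: rule files define the conditions that trigger alerts. Prometheus loads the rules from every matching file under rules/.

    # Load the rules once and evaluate them periodically according to the global "evaluation_interval"
    rule_files:
      - /data/prometheus-2.3.0.linux-amd64/rules/*.rules
      - /data/prometheus-2.3.0.linux-amd64/rules/*.yml
    # *.yml and *.rules match every file with that suffix, for example 1.rules and 2.rules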
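
The wildcard patterns can be checked before restarting or reloading Prometheus. The sketch below uses Python's glob module as a stand-in for Prometheus's own pattern matching (an approximation, assumed to behave the same for simple * patterns like these) and lists which files in a directory would be picked up. The helper names and the sample file names are illustrative.

```python
import glob
import os
import tempfile

# File patterns copied from rule_files; the directory is passed in as a parameter.
RULE_PATTERNS = ("*.rules", "*.yml")


def matching_rule_files(rules_dir):
    """Return the sorted file names in rules_dir matched by any pattern."""
    matched = set()
    for pattern in RULE_PATTERNS:
        for path in glob.glob(os.path.join(rules_dir, pattern)):
            matched.add(os.path.basename(path))
    return sorted(matched)


def demo():
    """Create a throwaway directory and report which files would be loaded."""
    with tempfile.TemporaryDirectory() as rules_dir:
        for name in ("1.rules", "2.rules", "disk.yml", "notes.txt"):
            open(os.path.join(rules_dir, name), "w").close()
        return matching_rule_files(rules_dir)
```

In this demo, notes.txt is left out because it matches neither suffix.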

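3.3 Scrape endpoints and monitored node_exporter targets
For easier maintenance, you can create separate directories by project or business type and give each application its own target file.

Pay attention to metrics_path. node_exporter serves metrics at the default path /metrics, so the setting can be omitted for it. The example here monitors a Java backend whose Prometheus endpoint is /actuator/prometheus, so metrics_path is set to /actuator/prometheus. The metrics are then available at:

http://127.0.0.1:8080/actuator/prometheus

      - job_name: 'application'  # every metric from this job automatically gets the label job="application"
        metrics_path: /actuator/prometheus
        # file_sd_configs uses file-based service discovery to find targets dynamically
        # Advantage: targets can be added or removed without restarting Prometheus
        file_sd_configs:
        - files: # path where the target files for the monitored endpoints are stored
          - "/data/app/monitor/prometheus-2.3.0.linux-amd64/target/app/*.json"
          refresh_interval: 6s # how often the target files are re-read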

3.4 A complete configuration file

    global:
      scrape_interval:     15s
      evaluation_interval: 15s

    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          - 127.0.0.1:9093

    rule_files:
      - /data/prometheus-2.3.0.linux-amd64/rules/*.rules
      - /data/prometheus-2.3.0.linux-amd64/rules/*.yml

    scrape_configs:
      # Prometheus itself
      - job_name: 'prometheus'
        static_configs:
        - targets: ['192.168.1.139:8001']
          labels:
            instance: prometheus
      - job_name: 'node' # node_exporter targets
        file_sd_configs:
        - files:
          - "/data/prometheus-2.3.0.linux-amd64/target/node/*.json"
          refresh_interval: 6s
      - job_name: 'application' # backend application targets
        metrics_path: /actuator/prometheus
        file_sd_configs:
        - files:
          - "/data/prometheus-2.3.0.linux-amd64/target/application/*.json"
          refresh_interval: 6s

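3.5 Add target files for monitored hosts on the server

    # targets: addresses of the monitored hosts
    # labels: labels attached to the targets; combined with an Alertmanager notification template,
    #         they can be shown in the alert notification ("测试" means "test", "测试机" means "test machine")
    [root@test node]# cat 192.168.1.139.json
    [
      {
        "targets": ["192.168.1.139:9100"],
        "labels": {
          "env": "test",
          "servicename": "测试",
          "hostname": "测试机"
        }
      }
    ]

After creating the file, place it under /data/prometheus-2.3.0.linux-amd64/target/node/, the directory watched by the node job (targets for the application job go under target/application/). Prometheus rescans the directory periodically and loads the new target automatically.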
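
Target files like the one above can also be generated by a script. The following Python sketch writes one file per host in the same JSON layout, using a temporary file plus rename so that Prometheus never reads a half-written file. The function name and the one-file-per-host naming are assumptions modeled on the example, not a Prometheus requirement.

```python
import json
import os
import tempfile


def write_target_file(directory, address, labels):
    """Write <host>.json in file_sd format and return its path."""
    host = address.split(":")[0]
    payload = [{"targets": [address], "labels": labels}]
    final_path = os.path.join(directory, host + ".json")
    # Write to a temporary file first; its .tmp suffix does not match *.json,
    # so the file_sd glob ignores it until the rename below.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, ensure_ascii=False, indent=2)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file readable by the owner only
    os.replace(tmp_path, final_path)  # atomic rename within the same directory
    return final_path
```

For example, write_target_file("/data/prometheus-2.3.0.linux-amd64/target/node", "192.168.1.139:9100", {"env": "test"}) would produce 192.168.1.139.json in the directory watched by the node job.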

Open `http://192.168.1.139:8001` and you should see that 192.168.1.139:9100 has been added as a target.
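
The same check can be scripted instead of done in the browser. This Python sketch assumes the server exposes the /api/v1/targets HTTP API endpoint, as current releases do; the field names used below (activeTargets, scrapeUrl, health) should be verified against the version you run. SAMPLE_RESPONSE is hand-written to illustrate the response shape and is not captured output.

```python
import json
import urllib.request


def summarize_targets(api_response):
    """Map each active target's scrape URL to its reported health."""
    active = api_response.get("data", {}).get("activeTargets", [])
    return {target["scrapeUrl"]: target["health"] for target in active}


def fetch_targets(base_url, timeout=5.0):
    """Fetch /api/v1/targets from a running Prometheus server."""
    url = base_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hand-written sample in the shape of the API response (illustrative only).
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "node", "instance": "192.168.1.139:9100"},
                "scrapeUrl": "http://192.168.1.139:9100/metrics",
                "health": "up",
            }
        ]
    },
}
```

Against a live server, summarize_targets(fetch_targets("http://192.168.1.139:8001")) would list every discovered target with its health.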

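3.6 Rule file (a custom rule example that works together with a script)

    [root@test rules]# cat zidingyi.rules
    groups: # rule groups
    - name: port # name of the rule group
      rules:
      - alert: nexus(私服) # alert name ("私服" means "private repository")
        expr: nexus == 0
        # for: 1m # how long the condition must hold before the alert fires
        # With no duration set, the default is 0: the alert goes straight from Inactive to Firing and is sent
        labels:
          severity: "紧急" # "urgent"
        annotations: # notification text
          summary: "端口不通" # "port unreachable"
          description: "模版测试收到请忽略" # "template test, please ignore"
    # Place zidingyi.rules under /data/prometheus-2.3.0.linux-amd64/rules/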
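
The effect of the for setting is easier to see with a small simulation. The Python sketch below is a simplified model of how an alert moves between inactive, pending, and firing at each rule evaluation; it is written for illustration and is not Prometheus's implementation. The function name, the 15-second interval default (taken from evaluation_interval above), and the use of None for a missing series are assumptions.

```python
def alert_states(samples, hold_seconds, interval_seconds=15):
    """Return the alert state after each evaluation.

    samples: values of the nexus metric at successive evaluations
             (None means the series is absent).
    hold_seconds: the rule's `for` duration; 0 models a rule without `for`.
    """
    states = []
    active_since = None
    for step, value in enumerate(samples):
        now = step * interval_seconds
        condition = value is not None and value == 0  # expr: nexus == 0
        if not condition:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now  # first evaluation where the condition holds
        if now - active_since >= hold_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states
```

With hold_seconds=0 the alert fires on the first evaluation where nexus is 0; with hold_seconds=60 it stays pending until the condition has held for a full minute.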

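Here we use monitoring Nexus as the example.

On the node_exporter host, create a custom script:

    cat /opt/soft/node_exporter-1.3.1.linux-amd64/key/key_runner.sh
    #!/bin/bash
    # Prints "nexus 0" when nothing matching port 8082 is listening
    echo "nexus" `netstat -tunlp|grep 8082|wc -l`
    # The alerting rule sends a notification when expr: nexus == 0 holds

Add a crontab entry:

    */1 * * * * /bin/bash /opt/soft/node_exporter-1.3.1.linux-amd64/key/key_runner.sh >/opt/soft/node_exporter-1.3.1.linux-amd64/key/key.prom

Note that the script writes the custom metric to key.prom in this format:

    nexus 1 or nexus 0 # the alerting rule uses 0 or 1 to decide whether to send a notification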
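
The same check can be sketched in Python. Note two deliberate differences from the shell script: this version tests the port by opening a TCP connection instead of reading netstat output, so the value is always exactly 0 or 1, and it writes key.prom through a temporary file and rename so node_exporter never reads a partially written file. The function names are hypothetical; only the key.prom name and the nexus metric come from the setup above.

```python
import os
import socket
import tempfile


def port_is_open(host, port, timeout=1.0):
    """Return 1 if a TCP connection to host:port succeeds, otherwise 0."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        return 0


def write_prom_file(directory, metric, value, filename="key.prom"):
    """Atomically write '<metric> <value>' into the textfile collector directory."""
    final_path = os.path.join(directory, filename)
    # The .tmp suffix keeps the partial file out of the collector's *.prom scan.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{metric} {value}\n")
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file readable by the owner only
    os.replace(tmp_path, final_path)
    return final_path
```

A cron job could then call write_prom_file("/opt/soft/node_exporter-1.3.1.linux-amd64/key", "nexus", port_is_open("127.0.0.1", 8082)) once a minute in place of the shell pipeline.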

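node_exporter collects this file only if --collector.textfile.directory is specified at startup.

Note: this article summarizes hands-on technical experience and is for reference only. For a real deployment, adjust the configuration to your own business scenario, environment, and security requirements; back up critical data and validate in a test environment before making changes.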
