Prometheus: Production Practice Summary (Native)

Monitoring targets:
k8s, Linux, Windows, databases, middleware

Plugins:
Black-box probing (tcp-exporter, blackbox_exporter) and various other exporters

Directory structure

[Figure: directory structure]

1. Monitoring Targets

[Figure: monitoring targets]

Option 1: file_sd_configs

The scrape endpoints are kept out of the main configuration file and stored in files under the targets directory.

# prometheus.yml (entry under scrape_configs)
  - job_name: 'mysql'
    file_sd_configs:
    - files:
      - targets/mysql/*.yaml
      refresh_interval: 5m

# A target file matched by targets/mysql/*.yaml
- targets: ['172.17.12.94:9104','172.17.12.96:9104']
  labels:
    environment: 'dev'
- targets: ['172.17.12.75:9104','172.17.12.80:9104']
  labels:
    environment: 'qa'
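
Target files in this layout can be generated instead of edited by hand. The following is a minimal sketch, not the author's tooling: the function name `render_target_groups` and its input dictionary are hypothetical, and the output simply mirrors the target-group layout shown above. The result would have to be written to a file that matches the `targets/mysql/*.yaml` glob.

```python
def render_target_groups(groups):
    """Render file_sd target groups as YAML text.

    `groups` maps an environment name to a list of "host:port" strings.
    (Hypothetical helper; the layout follows the target file shown above.)
    """
    lines = []
    for environment, targets in groups.items():
        quoted = ",".join(f"'{t}'" for t in targets)
        lines.append(f"- targets: [{quoted}]")
        lines.append("  labels:")
        lines.append(f"    environment: '{environment}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Same endpoints as in the example target file.
    print(render_target_groups({
        "dev": ["172.17.12.94:9104", "172.17.12.96:9104"],
        "qa": ["172.17.12.75:9104", "172.17.12.80:9104"],
    }), end="")
```

Prometheus picks up changes to file_sd files by watching them on disk, with `refresh_interval` as a fallback, so regenerating a target file does not require a reload.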

Option 2: static_configs

The scrape endpoints are written directly into the main file (❌ not recommended; it becomes tedious to maintain)

  - job_name: 'mysql_dev'
    static_configs:
      - targets: ['172.17.12.94:9104','172.17.12.96:9104']
        labels:
          environment: 'dev'
  - job_name: 'mysql_qa'
    static_configs:
      - targets: ['172.17.12.75:9104','172.17.12.80:9104']
        labels:
          environment: 'qa'

Option 3: Kubernetes monitoring with kubernetes_sd_configs

  - job_name: 'k8s_pro_cadvisor'
    metrics_path: /metrics
    scheme: https
    kubernetes_sd_configs:
    - role: node
      api_server: https://192.168.44.245:6443
      bearer_token_file: /opt/prometheus/prometheus-2.28.1/token/k8s_pro
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /opt/prometheus/prometheus-2.28.1/token/k8s_pro
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace]
      target_label: environment
      replacement: prod
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.*)
    - action: replace
      regex: (.*)
      source_labels: ["__address__"]
      target_label: __address__
      replacement: 192.168.44.245:6443
    - action: replace
      source_labels: [__meta_kubernetes_node_name]
      target_label: __metrics_path__
      regex: (.*)
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    metric_relabel_configs:
    - source_labels: [__name__]
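      # Alternatives must be separated by "|"; a comma would be matched literally.
      # container_spec_(.*) is not dropped because the CPU alert rule uses container_spec_cpu_quota.
      regex: container_tasks_state|container_fs_(.*)|go_(.*)|container_memory_failures_total|container_threads(.*)|container_memory_cache|container_memory_failcnt|container_memory_mapped_file|container_memory_max_usage_bytes|container_memory_rss|container_memory_swap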
      action: drop

  - job_name: 'k8s_pro_kubelet'
    metrics_path: /metrics
    scheme: https
    kubernetes_sd_configs:
    - role: node
      api_server: https://192.168.44.245:6443
      bearer_token_file: /opt/prometheus/prometheus-2.28.1/token/k8s_pro
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /opt/prometheus/prometheus-2.28.1/token/k8s_pro
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace]
      target_label: environment
      replacement: prod
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.*)
    - action: replace
      regex: (.*)
      source_labels: ["__address__"]
      target_label: __address__
      replacement: 192.168.44.245:6443
    - action: replace
      source_labels: [__meta_kubernetes_node_name]
      target_label: __metrics_path__
      regex: (.*)
      replacement: /api/v1/nodes/${1}/proxy/metrics
    metric_relabel_configs:
    - source_labels: [__name__]
      regex: rest_client_request_duration_seconds_bucket|storage_operation_duration_seconds_bucket|go_(.*)
      action: drop

  - job_name: 'k8s_pro_exporter'
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
    - role: node
      api_server: https://192.168.44.245:6443
      bearer_token_file: /opt/prometheus/prometheus-2.28.1/token/k8s_pro
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /opt/prometheus/prometheus-2.28.1/token/k8s_pro
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace]
      target_label: environment
      replacement: prod
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.*)
    - action: replace
      source_labels: ["__meta_kubernetes_node_address_InternalIP"]
      target_label: __address__
      replacement: ${1}:9100
    metric_relabel_configs:
    - source_labels: [__name__]
      regex: go_(.*)
      action: drop

  - job_name: 'kube-state-metrics-prod'
    kubernetes_sd_configs:
    - role: endpoints
      api_server: https://192.168.44.245:6443
      bearer_token_file: /opt/prometheus/prometheus-2.28.1/token/k8s_pro
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /opt/prometheus/prometheus-2.28.1/token/k8s_pro
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_pod_name]
      action: keep
      regex: kube-system;kube-state-metrics;.*
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
      action: replace
      target_label: job
      replacement: kube-state-metrics
    - source_labels: [__meta_kubernetes_namespace]
      target_label: environment
      replacement: prod

The token referenced by bearer_token_file is stored as plain text:

[root@BETAWS32 prometheus-2.28.1]# cat /opt/prometheus/prometheus-2.28.1/token/k8s_pro
eyJhbxxxxxxU5wQxxxxxUUifQ.exxxxxxxxxxxx
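
The relabel rules in these jobs rely on behaviours that are easy to get wrong: Prometheus anchors every relabel regex at both ends, the values of multiple `source_labels` are joined with `;` before matching, and alternatives in a regex have to be separated by `|` (a comma would be matched literally). The sketch below imitates this with Python's `re.fullmatch` so that patterns can be checked offline. It is only an approximation: Prometheus uses RE2 rather than Python's regex engine, and the helper names and the sample node name `node-1` are made up for illustration.

```python
import re


def relabel_replace(source_values, regex, replacement, separator=";"):
    """Imitate `action: replace`: return the new value, or None if no match."""
    joined = separator.join(source_values)
    match = re.fullmatch(regex, joined)  # Prometheus anchors the regex
    if match is None:
        return None
    # Translate Prometheus-style ${1} references into Python's \g<1> form.
    template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", replacement)
    return match.expand(template)


def matches(source_values, regex, separator=";"):
    """Imitate the match test that `keep` and `drop` are based on."""
    return re.fullmatch(regex, separator.join(source_values)) is not None


if __name__ == "__main__":
    # Metrics path rewrite as used by the cadvisor job.
    print(relabel_replace(["node-1"], "(.*)",
                          "/api/v1/nodes/${1}/proxy/metrics/cadvisor"))
    # keep rule of the kube-state-metrics job (three joined source labels).
    print(matches(["kube-system", "kube-state-metrics", "pod-a"],
                  "kube-system;kube-state-metrics;.*"))
    # "|" separates alternatives; "," is just a literal character.
    print(matches(["go_goroutines"], "go_(.*)|container_fs_(.*)"))
    print(matches(["go_goroutines"], "go_(.*),container_fs_(.*)"))
```

Checking a `drop` pattern this way before deploying it helps avoid silently discarding a metric that an alert rule depends on.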

2. Alerting Rules

# prometheus.yml
rule_files:
  - "rules/*.yaml"

# A rule file matched by rules/*.yaml
groups:
- name: k8s_pods
  rules:
  - alert: ContainerCpuUsage
    # namespace is kept in the aggregation so that the annotations can reference it
    expr: sum by (namespace,pod,instance,environment) (rate(container_cpu_usage_seconds_total{container!="istio-proxy",container!="POD",namespace=~"beta.*",image!=""}[3m]))/(sum by (namespace,pod,instance,environment) (container_spec_cpu_quota{container!="istio-proxy",container!="POD",image!="",namespace=~"beta.*"}) / 100000) > 0.90
    for: 0m
    labels:
      severity: warning
      team: operations
    annotations:
      description: "Container {{ $labels.namespace }}/{{ $labels.pod }} CPU usage is above 90%, current usage {{ $value | humanizePercentage }}"
      summary: "Container {{ $labels.namespace }}/{{ $labels.pod }} CPU usage is above 90%, current usage {{ $value | humanizePercentage }}"
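
The expression divides the CPU cores a container actually consumes by its CPU limit expressed in cores. The limit is derived as `container_spec_cpu_quota / 100000`, which assumes the default CFS period of 100000 microseconds. A worked version of that arithmetic is sketched below; the function names and sample values are hypothetical.

```python
CFS_PERIOD_US = 100_000  # assumed CFS period, matching the constant in the rule


def cpu_utilisation(usage_cores, quota_us, period_us=CFS_PERIOD_US):
    """Return CPU usage as a fraction of the limit (1.0 means 100%)."""
    limit_cores = quota_us / period_us  # e.g. 50000 / 100000 = 0.5 cores
    return usage_cores / limit_cores


def is_firing(usage_cores, quota_us, threshold=0.90):
    """Mirror the `> 0.90` comparison of the alert expression."""
    return cpu_utilisation(usage_cores, quota_us) > threshold


if __name__ == "__main__":
    # A container limited to 0.5 cores (quota 50000) that uses 0.48 cores.
    print(cpu_utilisation(0.48, 50_000))
    print(is_firing(0.48, 50_000))
```

Because the result is a ratio between 0 and 1 rather than a percentage, the threshold in the rule is written as 0.90.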

Alert configuration

[Figure: alert configuration]

Coverage

Alertmanager:

1. linux hosts - cpu - mem - net - volume - io - process

2. windows hosts - cpu - mem - net - volume - io - process - services - iis

3. consul - servicecheck - mastercheck - agentcheck

4. pod - cpu - mem - io - hpa - deployment - statefulset - daemonset - health - crash - kube certificate

5. envoy - cluster upstream

6. istio

7. coredns

8. blackbox - http check - check slow - ssl certificate

9. Databases: mysql, redis, mongodb (sqlserver is not covered yet)

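Hot reload

The reload endpoint is only enabled when Prometheus is started with --web.enable-lifecycle; it accepts PUT and POST requests.

curl -X PUT localhost:9090/-/reload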
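
The same reload can be scripted, for example at the end of a job that regenerates rule files. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the base URL is taken from the curl command above.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build the HTTP request for the Prometheus reload endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, method="POST")


def reload_prometheus(base_url="http://localhost:9090", timeout=10):
    """Send the reload request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx responses, so an
    invalid configuration or a disabled endpoint surfaces as an exception.
    """
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `reload_prometheus()` against a running server should return 200 when the new configuration was loaded successfully.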

Silences

[Figure: silences]
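
Silences can also be created outside the web UI. The sketch below builds the JSON body for Alertmanager's v2 API (`POST /api/v2/silences`). It is an illustration under assumptions rather than the setup used here: the function name is made up, and the matcher, author, and comment values are hypothetical examples.

```python
import json
from datetime import datetime, timedelta, timezone


def build_silence(matchers, hours, created_by, comment, start=None):
    """Build a silence payload for Alertmanager's /api/v2/silences endpoint.

    `matchers` maps label names to exact values that an alert must carry.
    """
    start = start or datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)
    return {
        "matchers": [
            {"name": name, "value": value, "isRegex": False}
            for name, value in matchers.items()
        ],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


if __name__ == "__main__":
    # Hypothetical example: silence all qa alerts for two hours.
    payload = build_silence({"environment": "qa"}, 2, "ops", "planned maintenance")
    print(json.dumps(payload, indent=2))
```

The printed JSON would be sent as the request body with `Content-Type: application/json`; the silence applies to every alert whose labels match all listed matchers.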

Resources

  • prometheus.yml
  • weixin.tmpl
  • alertmanager.yml