Two formulas for calculating CPU utilization
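Two approaches to calculating CPU utilization with PromQL are currently circulating online. The first works with either rate or irate (rate reflects the average over the range, irate the instantaneous value; in newer versions the metric is named node_cpu_seconds_total). For a single core: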
1 - rate(node_cpu{mode="idle"}[5m])
For the whole node, averaging across cores:
avg(1 - rate(node_cpu{mode="idle"}[5m])) by (instance)
The second approach, for a single core:
1 - sum(increase(node_cpu{mode="idle"}[1m])) by (cpu,instance) / sum(increase(node_cpu[1m])) by (cpu,instance)
For the whole node:
1 - sum(increase(node_cpu{mode="idle"}[1m])) by (instance) / sum(increase(node_cpu[1m])) by (instance)
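The expressions above use the old metric name node_cpu. As a minimal sketch (the helper name and its parameters are hypothetical, introduced only for illustration), the second formula can be rendered for either metric name and grouping like this:

```python
def ratio_utilization_query(metric="node_cpu_seconds_total",
                            window="1m",
                            by=("instance",)):
    """Render the second (ratio-based) formula as a PromQL string."""
    labels = ",".join(by)
    # Increase of the idle counter over the window, summed per group.
    idle = f'sum(increase({metric}{{mode="idle"}}[{window}])) by ({labels})'
    # Increase of all mode counters over the same window, summed per group.
    total = f'sum(increase({metric}[{window}])) by ({labels})'
    return f"1 - {idle} / {total}"


# Node-level expression with the newer metric name.
print(ratio_utilization_query())
# Per-core expression with the old metric name used in this article.
print(ratio_utilization_query(metric="node_cpu", by=("cpu", "instance")))
```

The helper only builds text; the resulting string still has to be evaluated by Prometheus.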
Why calculating CPU utilization with irate/rate gives inaccurate or wrong results
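In practice, when a node's actual CPU utilization is low, the first formula yields a value that differs substantially from the real one, because the formula contains a logical error. Take 1 - rate(node_cpu{mode="idle"}[5m]) as an example: it computes 1 - (CPU idle time within 5 minutes) / (total CPU running time of 5 minutes), that is, it assumes that the time the CPU spent in all states over the 5 minutes adds up to 5m.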
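The difference between the two formulas comes down to the denominator. As a sketch, write \Delta_m for the increase of the counter for mode m over a window of length T (these symbols are introduced here for illustration, and the extrapolation Prometheus applies at window boundaries is ignored):

```latex
\begin{align*}
U_{\text{rate}}  &= 1 - \frac{\Delta_{\text{idle}}}{T}, \\
U_{\text{ratio}} &= 1 - \frac{\Delta_{\text{idle}}}{\sum_{m} \Delta_{m}}, \\
U_{\text{rate}}  &= U_{\text{ratio}} \iff \sum_{m} \Delta_{m} = T
  \quad (\text{when } \Delta_{\text{idle}} \neq 0).
\end{align*}
```

The first formula is therefore only as good as the assumption that the per-mode counters together advance by exactly T seconds per core.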
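We tested on an idle 4-core machine. top shows node and per-core CPU utilization at around 1%. Evaluating sum(increase(node_cpu[5m])) by (cpu) at this point shows that the time spent in all states over the 5 minutes adds up to only about 50-60s.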
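After applying load with chaosd (chaosd attack stress cpu -l 50 -w 4), top shows node and per-core CPU utilization at around 75-85%. Evaluating sum(increase(node_cpu[5m])) by (cpu) at this point shows that the time spent in all states over the 5 minutes adds up to about 180-190s.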
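After increasing the load further (chaosd attack stress cpu -l 90 -w 4), top shows node and per-core CPU utilization at around 92-93%. Evaluating sum(increase(node_cpu[5m])) by (cpu) at this point shows that the time spent in all states over the 5 minutes adds up to about 280-290s.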
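This shows that the sum of the CPU state times approximates the node's elapsed running time only when CPU utilization is high. (The problem occurs when nohz=off has been added to the GRUB boot parameters in /boot/grub2/grub.cfg, which can be checked with cat /proc/cmdline. Without that parameter, irate/rate still gives the correct value.)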
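To make the size of the error concrete, here is a small arithmetic sketch. The totals and utilizations are midpoints of the ranges reported above, chosen for illustration rather than measured, and the idle increase is derived by assuming the idle share matches what top reports. The ratio-based result therefore equals the top value by construction; the point is how far the rate-based result drifts from it.

```python
WINDOW_SECONDS = 300  # length of the [5m] range used in the queries above

# (scenario, assumed sum of all mode increases in seconds, assumed top utilization)
OBSERVATIONS = [
    ("no load", 55.0, 0.01),
    ("chaosd -l 50", 185.0, 0.80),
    ("chaosd -l 90", 285.0, 0.925),
]


def rate_based(idle_increase, window=WINDOW_SECONDS):
    """First formula: idle time divided by wall-clock window length."""
    return 1 - idle_increase / window


def ratio_based(idle_increase, total_increase):
    """Second formula: idle time divided by the time summed over all modes."""
    return 1 - idle_increase / total_increase


for scenario, total, top_util in OBSERVATIONS:
    # Assumption: the idle counter accounts for the non-busy share of the total.
    idle = total * (1 - top_util)
    print(f"{scenario:13s} rate-based={rate_based(idle):.2f} "
          f"ratio-based={ratio_based(idle, total):.2f}")
```

Under these assumptions the rate-based value for the unloaded machine comes out near 0.82 instead of 0.01, while at the highest load the two formulas nearly agree.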
Conclusion
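In the nohz=off scenario described above, calculating CPU utilization with irate/rate is inaccurate, and the lower the CPU utilization, the larger the error.
Therefore, when calculating CPU utilization from node_exporter metrics, the second method should be used.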
The values of node_exporter's CPU metrics come from /proc/stat:
# https://man7.org/linux/man-pages/man5/proc.5.html
/proc/stat
kernel/system statistics. Varies with architecture.
Common entries include:
cpu 10132153 290696 3084719 46828483 16683 0 25195 0 175628 0
cpu0 1393280 32966 572056 13343292 6130 0 17875 0 23933 0
The amount of time, measured in units of USER_HZ
(1/100ths of a second on most architectures, use
sysconf(_SC_CLK_TCK) to obtain the right value),
that the system ("cpu" line) or the specific CPU
("cpuN" line) spent in various states:
user (1) Time spent in user mode.
nice (2) Time spent in user mode with low
priority (nice).
system (3) Time spent in system mode.
idle (4) Time spent in the idle task. This value
should be USER_HZ times the second entry in
the /proc/uptime pseudo-file.
iowait (since Linux 2.5.41)
(5) Time waiting for I/O to complete. This
value is not reliable, for the following
reasons:
• The CPU will not wait for I/O to
complete; iowait is the time that a task
is waiting for I/O to complete. When a
CPU goes into idle state for outstanding
task I/O, another task will be scheduled
on this CPU.
• On a multi-core CPU, the task waiting for
I/O to complete is not running on any
CPU, so the iowait of each CPU is
difficult to calculate.
• The value in this field may decrease in
certain conditions.
irq (since Linux 2.6.0)
(6) Time servicing interrupts.
softirq (since Linux 2.6.0)
(7) Time servicing softirqs.
steal (since Linux 2.6.11)
(8) Stolen time, which is the time spent in
other operating systems when running in a
virtualized environment
guest (since Linux 2.6.24)
(9) Time spent running a virtual CPU for
guest operating systems under the control of
the Linux kernel.
guest_nice (since Linux 2.6.33)
(10) Time spent running a niced guest
(virtual CPU for guest operating systems
under the control of the Linux kernel).
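The field list above maps directly onto the ratio-based calculation. The following is a minimal sketch, not node_exporter's actual implementation: the function names are hypothetical, the second snapshot is made up, and guest and guest_nice are left out of the total on the assumption that the kernel already counts guest time inside user and nice.

```python
# Field order of a "cpu"/"cpuN" line, as listed in proc(5).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")

# Assumption: guest time is already included in user/nice, so only the
# first eight fields are summed to avoid counting it twice.
TOTAL_FIELDS = FIELDS[:8]


def parse_cpu_line(line):
    """Split one cpu line into (name, {field: ticks})."""
    name, *values = line.split()
    return name, dict(zip(FIELDS, map(int, values)))


def ticks_to_seconds(ticks, user_hz=100):
    """Convert USER_HZ ticks to seconds; 100 is the common value."""
    return ticks / user_hz


def utilization(before, after):
    """1 - idle delta / delta summed over all states, between two snapshots."""
    deltas = {field: after[field] - before[field] for field in TOTAL_FIELDS}
    return 1 - deltas["idle"] / sum(deltas.values())


# Sample line taken from the man page excerpt above.
name, first = parse_cpu_line(
    "cpu0 1393280 32966 572056 13343292 6130 0 17875 0 23933 0")
# Hypothetical second snapshot: 90 more idle ticks and 10 more user ticks.
second = dict(first, idle=first["idle"] + 90, user=first["user"] + 10)

print(name, ticks_to_seconds(first["idle"]), utilization(first, second))
```

On a real Linux host the tick rate can be read with os.sysconf("SC_CLK_TCK") instead of relying on the default of 100. Because the denominator is the sum of the observed deltas, the result does not depend on how much wall-clock time passed between the snapshots.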