Below is a line-by-line breakdown of the output of perf stat -e cpu-clock ./server.
Output:
9,524.73 msec cpu-clock        # 1.001 CPUs utilized

9.516814366 seconds time elapsed

9.451367000 seconds user
0.054269000 seconds sys
1. 9,524.73 msec cpu-clock
- Meaning: the total CPU time the program actually consumed (user mode + kernel mode)
- Unit: milliseconds (msec)
- Value: 9,524.73 ms ≈ 9.52473 s
- Interpretation: the server process actually executed on the CPU for about 9.52 s (time spent blocked, e.g. waiting on I/O, is not included)
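To see the difference between on-CPU time and blocked time in isolation, the same event can be counted for a program that spends almost all of its time sleeping (sleep is used here purely as an illustration):

perf stat -e cpu-clock sleep 3    # cpu-clock stays at a few milliseconds, while time elapsed is about 3 seconds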
2. # 1.001 CPUs utilized
- Meaning: the CPU utilization metric
- Calculation: CPUs utilized = cpu-clock time / time elapsed
- Interpretation: while running, the program kept 1.001 CPU cores busy on average (a value of 2.0 would mean two cores fully utilized)
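Plugging in the numbers from the output above confirms the metric (a quick check with awk; the constants are simply copied from the output):

awk 'BEGIN { printf "%.3f CPUs utilized\n", 9524.73 / 9516.814366 }'    # prints 1.001 CPUs utilized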
3. 9.516814366 seconds time elapsed
- Meaning: the real (wall-clock) time from program start to exit
- Unit: seconds
- Interpretation: about 9.5168 s of real time passed between launching ./server and the process exiting
4. 9.451367000 seconds user
- Meaning: CPU time the program spent in user mode
- Unit: seconds
- Interpretation: the server spent 9.451 s running its own code (business logic, computation, etc.)
5. 0.054269000 seconds sys
- Meaning: CPU time the program spent in kernel mode
- Unit: seconds
- Interpretation: the kernel spent 0.054 s working on the server's behalf (system calls, interrupt handling, etc.)
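The user and sys figures can be cross-checked without perf: the shell's time builtin reports the same three quantities for a command run:

time ./server    # real ≈ time elapsed, user/sys ≈ the values reported by perf stat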
Relationships between the key metrics:
- Wall-clock time: time elapsed = 9.516 s
- CPU time: cpu-clock = 9.524 s, broken down into user = 9.451 s and sys = 0.054 s
- CPU utilization = 9.52473 / 9.51681 ≈ 100.1% (the program kept roughly one full CPU core busy)
- User/kernel split: user share = 9.451 / 9.524 ≈ 99.2%, kernel share = 0.054 / 9.524 ≈ 0.6%
Performance diagnosis:
- CPU-bound service: user-mode time dominates (≈99.2%), so the service spends its time mainly on its own computation
- Low system-call overhead: kernel-mode time is only ≈0.6%, indicating system calls (e.g. network I/O) are handled efficiently
- No blocking waits: cpu-clock (9.524 s) ≈ time elapsed (9.516 s), meaning the process almost never gave up the CPU to wait on I/O
- Single-core saturation: utilization sits at essentially one full core (1.001 CPUs), so the service is limited by single-core compute capacity
Optimization suggestions:
Focus on optimizing user-mode code (see the sketch below):
- Profile hot functions: perf record -g ./server && perf report
- Review algorithmic complexity
- Remove unnecessary computation
- Consider scaling out across multiple threads/processes
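A minimal hotspot-analysis workflow based on the commands above (a sketch: the symbol name in the last step is a placeholder for whatever perf report shows as hot, and useful call graphs require stack information in the binary, e.g. frame pointers or recording with --call-graph dwarf):

perf record -g ./server          # sample with call graphs into perf.data
perf report --sort symbol        # browse the hottest functions
perf annotate <hot_function>     # drill into one hot symbol at source/asm level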
Other:
#perf stat -e cpu-clock,task-clock,context-switches,cpu-migrations ./server
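The command above counts timing and scheduling events in one run; one possible extension adds hardware counters for IPC and cache behavior as well (event names taken from the perf list output shown later in this section):

perf stat -e cpu-clock,task-clock,context-switches,cpu-migrations,cycles,instructions,cache-references,cache-misses ./server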
[root@localhost ~]# perf -h
usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]
The most commonly used perf commands are:
annotate Read perf.data (created by perf record) and display annotated code
archive Create archive with object files with build-ids found in perf.data file
bench General framework for benchmark suites
buildid-cache Manage build-id cache.
buildid-list List the buildids in a perf.data file
c2c Shared Data C2C/HITM Analyzer.
config Get and set variables in a configuration file.
data Data file related processing
diff Read perf.data files and display the differential profile
evlist List the event names in a perf.data file
ftrace simple wrapper for kernel's ftrace functionality
inject Filter to augment the events stream with additional information
kallsyms Searches running kernel for symbols
kmem Tool to trace/measure kernel memory properties
kvm Tool to trace/measure kvm guest os
list List all symbolic event types
lock Analyze lock events
mem Profile memory accesses
record Run a command and record its profile into perf.data
report Read perf.data (created by perf record) and display the profile
sched Tool to trace/measure scheduler properties (latencies)
script Read perf.data (created by perf record) and display trace output
stat Run a command and gather performance counter statistics
test Runs sanity tests.
timechart Tool to visualize total system behavior during a workload
top System profiling tool.
probe Define new dynamic tracepoints
trace strace inspired tool
See 'perf help COMMAND' for more information on a specific command.
[root@localhost ~]#
[root@localhost ~]# perf stat -h
Usage: perf stat [<options>] [<command>]
-a, --all-cpus system-wide collection from all CPUs
-A, --no-aggr disable CPU count aggregation
-B, --big-num print large numbers with thousands' separators
-C, --cpu <cpu> list of cpus to monitor in system-wide
-c, --scale scale/normalize counters
-D, --delay <n> ms to wait before starting measurement after program start
-d, --detailed detailed run - start a lot of events
-e, --event <event> event selector. use 'perf list' to list available events
-G, --cgroup <name> monitor event in cgroup name only
-g, --group put the counters into a counter group
-I, --interval-print <n>
print counts at regular interval in ms (overhead is possible for values <= 100ms)
-i, --no-inherit child tasks do not inherit counters
-M, --metrics <metric/metric group list>
monitor specified metrics or metric groups (separated by ,)
-n, --null null run - dont start any counters
-o, --output <file> output file name
-p, --pid <pid> stat events on existing process id
-r, --repeat <n> repeat command and print average + stddev (max: 100, forever: 0)
-S, --sync call sync() before starting a run
-t, --tid <tid> stat events on existing thread id
-T, --transaction hardware transaction statistics
-v, --verbose be more verbose (show counter open errors, etc)
-x, --field-separator <separator>
print counts with custom separator
--append append to the output file
--filter <filter>
event filter
--interval-clear clear screen in between new interval
--interval-count <n>
print counts for fixed number of times
--log-fd <n> log output to fd, instead of stderr
--metric-only Only print computed metrics. No raw values
--no-merge Do not merge identical named events
--per-core aggregate counts per physical processor core
--per-socket aggregate counts per processor socket
--per-thread aggregate counts per thread
--post <command> command to run after to the measured command
--pre <command> command to run prior to the measured command
--smi-cost measure SMI cost
--table display details about each run (only with -r option)
--timeout <n> stop workload and print counts after a timeout period in ms (>= 10ms)
--topdown measure topdown level 1 statistics
[root@localhost ~]# perf stat -e
Error: switch `e' requires a value
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
[root@localhost ~]#
[root@localhost ~]# perf list | more
branch-misses [Hardware event]
bus-cycles [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
stalled-cycles-backend OR idle-cycles-backend [Hardware event]
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
alignment-faults [Software event]
bpf-output [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
page-faults OR faults [Software event]
task-clock [Software event]
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-dcache-store-misses [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
L1-icache-loads [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
iTLB-load-misses [Hardware cache event]
iTLB-loads [Hardware cache event]
armv8_pmuv3_0/br_mis_pred/ [Kernel PMU event]
armv8_pmuv3_0/br_mis_pred_retired/ [Kernel PMU event]
armv8_pmuv3_0/br_pred/ [Kernel PMU event]
armv8_pmuv3_0/br_retired/ [Kernel PMU event]
armv8_pmuv3_0/br_return_retired/ [Kernel PMU event]
armv8_pmuv3_0/bus_access/ [Kernel PMU event]
armv8_pmuv3_0/bus_cycles/ [Kernel PMU event]
armv8_pmuv3_0/cid_write_retired/ [Kernel PMU event]
armv8_pmuv3_0/cpu_cycles/ [Kernel PMU event]
armv8_pmuv3_0/dtlb_walk/ [Kernel PMU event]
armv8_pmuv3_0/exc_return/ [Kernel PMU event]
armv8_pmuv3_0/exc_taken/ [Kernel PMU event]
armv8_pmuv3_0/inst_retired/ [Kernel PMU event]
armv8_pmuv3_0/inst_spec/ [Kernel PMU event]
armv8_pmuv3_0/itlb_walk/ [Kernel PMU event]
armv8_pmuv3_0/l1d_cache/ [Kernel PMU event]
armv8_pmuv3_0/l1d_cache_refill/ [Kernel PMU event]
armv8_pmuv3_0/l1d_cache_wb/ [Kernel PMU event]
armv8_pmuv3_0/l1d_tlb/ [Kernel PMU event]
armv8_pmuv3_0/l1d_tlb_refill/ [Kernel PMU event]
armv8_pmuv3_0/l1i_cache/ [Kernel PMU event]
armv8_pmuv3_0/l1i_cache_refill/ [Kernel PMU event]
armv8_pmuv3_0/l1i_tlb/ [Kernel PMU event]
armv8_pmuv3_0/l1i_tlb_refill/ [Kernel PMU event]
armv8_pmuv3_0/l2d_cache/ [Kernel PMU event]
armv8_pmuv3_0/l2d_cache_refill/ [Kernel PMU event]
armv8_pmuv3_0/l2d_cache_wb/ [Kernel PMU event]
armv8_pmuv3_0/l2d_tlb/ [Kernel PMU event]
armv8_pmuv3_0/l2d_tlb_refill/ [Kernel PMU event]
armv8_pmuv3_0/l2i_cache/ [Kernel PMU event]
armv8_pmuv3_0/l2i_cache_refill/ [Kernel PMU event]
armv8_pmuv3_0/l2i_tlb/ [Kernel PMU event]
armv8_pmuv3_0/l2i_tlb_refill/ [Kernel PMU event]
armv8_pmuv3_0/ll_cache/ [Kernel PMU event]
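Any of the Kernel PMU events listed above can be passed to perf stat directly using the pmu/event/ syntax; for example, a sketch counting L1 data-cache accesses and refills on this ARMv8 machine (whether a given event is available depends on the CPU):

perf stat -e armv8_pmuv3_0/l1d_cache/,armv8_pmuv3_0/l1d_cache_refill/ ./server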