Linux System Performance Analysis Tools: perf

Below is a line-by-line breakdown of the output of perf stat -e cpu-clock ./server:

Output:

9,524.73 msec cpu-clock                 #    1.001 CPUs utilized

       9.516814366 seconds time elapsed

       9.451367000 seconds user
       0.054269000 seconds sys

1. 9,524.73 msec cpu-clock

  • Meaning
    Total CPU time the program actually consumed (user mode + kernel mode)

  • Unit: milliseconds (msec)

  • Value: 9,524.73 ms ≈ 9.52473 s

  • Interpretation
    The server process actually executed on a CPU for about 9.52 seconds (this excludes time spent blocked, e.g. waiting on I/O)


2. # 1.001 CPUs utilized

  • Meaning
    The CPU utilization metric

  • How it is computed
    CPUs utilized = cpu-clock time / time elapsed (wall-clock) time

  • Interpretation
    While running, the program used an average of 1.001 CPU cores; a worked check follows this item
    (a value of 2.0 would mean two cores fully utilized)
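
Plugging in the values from this run reproduces the figure perf prints:

  CPUs utilized = 9,524.73 msec / 9,516.814 msec ≈ 1.001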


3. 9.516814366 seconds time elapsed

  • Meaning
    Actual physical time from program start to exit (wall-clock time)

  • Unit: seconds

  • Interpretation
    About 9.5168 seconds elapsed between launching ./server and the process exiting


4. 9.451367000 seconds user

  • Meaning
    CPU time the program spent in user mode

  • Unit: seconds

  • Interpretation
    The server spent 9.451 seconds executing its own code (business logic, computation, etc.)


5. 0.054269000 seconds sys

  • Meaning
    CPU time the program spent in kernel mode

  • Unit: seconds

  • Interpretation
    The kernel spent 0.054 seconds servicing the process (system calls, interrupt handling, etc.)
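
As a cross-check, user + sys = 9.451 s + 0.054 s ≈ 9.506 s, close to the 9.525 s cpu-clock total; the small residual gap is expected and reflects timer and accounting granularity.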


Relationship between the key metrics:

  time elapsed (wall-clock)   9.516 s
  cpu-clock (user + sys)      9.524 s
    user                      9.451 s
    sys                       0.054 s

  1. CPU usage = 9.52473 / 9.51681 ≈ 100.1%
    (the program kept roughly one full CPU core busy)

  2. User/kernel split
    user share = 9.451 / 9.524 ≈ 99.2%
    kernel share = 0.054 / 9.524 ≈ 0.6%


Performance diagnosis:

  1. CPU-bound service
    User-mode time dominates (99.2%), so the service spends its time almost entirely on its own computation

  2. Low system-call overhead
    Kernel-mode time is only about 0.6%, indicating that system calls (e.g. network I/O) are cheap here

  3. No blocking waits
    cpu-clock (9.524 s) ≈ time elapsed (9.516 s),
    so the process almost never yielded the CPU to wait on I/O (a contrast example follows this list)

  4. Single core saturated
    CPU utilization sits at essentially 100% of one core (1.001 CPUs), so the service is bounded by single-core compute
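
For contrast, an I/O-bound or mostly idle workload shows cpu-clock far below wall-clock time. A quick way to see this on any system (figures are illustrative and will vary):

[root@localhost ~]# perf stat -e cpu-clock sleep 5

Here cpu-clock comes out at just a few milliseconds against roughly 5 seconds elapsed (on the order of 0.001 CPUs utilized), because the process spends nearly its whole lifetime blocked rather than running.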


Optimization suggestions:

! Focus on optimizing the user-mode code:
  - Profile hot functions: perf record -g ./server && perf report (a fuller sketch follows this list)
  - Review algorithmic complexity
  - Cut unnecessary computation
  - Consider scaling across multiple threads/processes
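
A minimal hot-function profiling workflow might look like this (the 99 Hz sampling rate is a common convention, not a requirement; ./server stands in for your binary):

[root@localhost ~]# perf record -g -F 99 ./server   # sample call stacks at 99 Hz until the server exits
[root@localhost ~]# perf report --stdio             # text report, functions sorted by overhead
[root@localhost ~]# perf annotate                   # per-instruction view of the hottest symbols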

Miscellaneous:

[root@localhost ~]# perf stat -e cpu-clock,task-clock,context-switches,cpu-migrations ./server
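
This variant counts scheduler activity alongside CPU time: a large context-switches count would undercut the "no blocking waits" conclusion above, and cpu-migrations shows how often the process was moved between cores. For more stable numbers, the -r option (documented in the help output below) repeats the run and prints the mean and standard deviation, e.g.:

[root@localhost ~]# perf stat -r 5 -e cpu-clock ./server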

[root@localhost ~]# perf -h

 usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

 The most commonly used perf commands are:
   annotate        Read perf.data (created by perf record) and display annotated code
   archive         Create archive with object files with build-ids found in perf.data file
   bench           General framework for benchmark suites
   buildid-cache   Manage build-id cache.
   buildid-list    List the buildids in a perf.data file
   c2c             Shared Data C2C/HITM Analyzer.
   config          Get and set variables in a configuration file.
   data            Data file related processing
   diff            Read perf.data files and display the differential profile
   evlist          List the event names in a perf.data file
   ftrace          simple wrapper for kernel's ftrace functionality
   inject          Filter to augment the events stream with additional information
   kallsyms        Searches running kernel for symbols
   kmem            Tool to trace/measure kernel memory properties
   kvm             Tool to trace/measure kvm guest os
   list            List all symbolic event types
   lock            Analyze lock events
   mem             Profile memory accesses
   record          Run a command and record its profile into perf.data
   report          Read perf.data (created by perf record) and display the profile
   sched           Tool to trace/measure scheduler properties (latencies)
   script          Read perf.data (created by perf record) and display trace output
   stat            Run a command and gather performance counter statistics
   test            Runs sanity tests.
   timechart       Tool to visualize total system behavior during a workload
   top             System profiling tool.
   probe           Define new dynamic tracepoints
   trace           strace inspired tool

 See 'perf help COMMAND' for more information on a specific command.

[root@localhost ~]# 
[root@localhost ~]# perf stat -h

 Usage: perf stat [<options>] [<command>]

    -a, --all-cpus        system-wide collection from all CPUs
    -A, --no-aggr         disable CPU count aggregation
    -B, --big-num         print large numbers with thousands' separators
    -C, --cpu <cpu>       list of cpus to monitor in system-wide
    -c, --scale           scale/normalize counters
    -D, --delay <n>       ms to wait before starting measurement after program start
    -d, --detailed        detailed run - start a lot of events
    -e, --event <event>   event selector. use 'perf list' to list available events
    -G, --cgroup <name>   monitor event in cgroup name only
    -g, --group           put the counters into a counter group
    -I, --interval-print <n>
                          print counts at regular interval in ms (overhead is possible for values <= 100ms)
    -i, --no-inherit      child tasks do not inherit counters
    -M, --metrics <metric/metric group list>
                          monitor specified metrics or metric groups (separated by ,)
    -n, --null            null run - dont start any counters
    -o, --output <file>   output file name
    -p, --pid <pid>       stat events on existing process id
    -r, --repeat <n>      repeat command and print average + stddev (max: 100, forever: 0)
    -S, --sync            call sync() before starting a run
    -t, --tid <tid>       stat events on existing thread id
    -T, --transaction     hardware transaction statistics
    -v, --verbose         be more verbose (show counter open errors, etc)
    -x, --field-separator <separator>
                          print counts with custom separator
        --append          append to the output file
        --filter <filter>
                          event filter
        --interval-clear  clear screen in between new interval
        --interval-count <n>
                          print counts for fixed number of times
        --log-fd <n>      log output to fd, instead of stderr
        --metric-only     Only print computed metrics. No raw values
        --no-merge        Do not merge identical named events
        --per-core        aggregate counts per physical processor core
        --per-socket      aggregate counts per processor socket
        --per-thread      aggregate counts per thread
        --post <command>  command to run after to the measured command
        --pre <command>   command to run prior to the measured command
        --smi-cost        measure SMI cost
        --table           display details about each run (only with -r option)
        --timeout <n>     stop workload and print counts after a timeout period in ms (>= 10ms)
        --topdown         measure topdown level 1 statistics

[root@localhost ~]# perf stat -e
 Error: switch `e' requires a value
 Usage: perf stat [<options>] [<command>]

    -e, --event <event>   event selector. use 'perf list' to list available events
[root@localhost ~]# 
[root@localhost ~]# perf list | more
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-store-misses                             [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  L1-icache-loads                                    [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  iTLB-loads                                         [Hardware cache event]
  armv8_pmuv3_0/br_mis_pred/                         [Kernel PMU event]
  armv8_pmuv3_0/br_mis_pred_retired/                 [Kernel PMU event]
  armv8_pmuv3_0/br_pred/                             [Kernel PMU event]
  armv8_pmuv3_0/br_retired/                          [Kernel PMU event]
  armv8_pmuv3_0/br_return_retired/                   [Kernel PMU event]
  armv8_pmuv3_0/bus_access/                          [Kernel PMU event]
  armv8_pmuv3_0/bus_cycles/                          [Kernel PMU event]
  armv8_pmuv3_0/cid_write_retired/                   [Kernel PMU event]
  armv8_pmuv3_0/cpu_cycles/                          [Kernel PMU event]
  armv8_pmuv3_0/dtlb_walk/                           [Kernel PMU event]
  armv8_pmuv3_0/exc_return/                          [Kernel PMU event]
  armv8_pmuv3_0/exc_taken/                           [Kernel PMU event]
  armv8_pmuv3_0/inst_retired/                        [Kernel PMU event]
  armv8_pmuv3_0/inst_spec/                           [Kernel PMU event]
  armv8_pmuv3_0/itlb_walk/                           [Kernel PMU event]
  armv8_pmuv3_0/l1d_cache/                           [Kernel PMU event]
  armv8_pmuv3_0/l1d_cache_refill/                    [Kernel PMU event]
  armv8_pmuv3_0/l1d_cache_wb/                        [Kernel PMU event]
  armv8_pmuv3_0/l1d_tlb/                             [Kernel PMU event]
  armv8_pmuv3_0/l1d_tlb_refill/                      [Kernel PMU event]
  armv8_pmuv3_0/l1i_cache/                           [Kernel PMU event]
  armv8_pmuv3_0/l1i_cache_refill/                    [Kernel PMU event]
  armv8_pmuv3_0/l1i_tlb/                             [Kernel PMU event]
  armv8_pmuv3_0/l1i_tlb_refill/                      [Kernel PMU event]
  armv8_pmuv3_0/l2d_cache/                           [Kernel PMU event]
  armv8_pmuv3_0/l2d_cache_refill/                    [Kernel PMU event]
  armv8_pmuv3_0/l2d_cache_wb/                        [Kernel PMU event]
  armv8_pmuv3_0/l2d_tlb/                             [Kernel PMU event]
  armv8_pmuv3_0/l2d_tlb_refill/                      [Kernel PMU event]
  armv8_pmuv3_0/l2i_cache/                           [Kernel PMU event]
  armv8_pmuv3_0/l2i_cache_refill/                    [Kernel PMU event]
  armv8_pmuv3_0/l2i_tlb/                             [Kernel PMU event]
  armv8_pmuv3_0/l2i_tlb_refill/                      [Kernel PMU event]
  armv8_pmuv3_0/ll_cache/                            [Kernel PMU event]

 
