Java troubleshooting: what to do about 100% memory, 100% CPU, and excessive Full GC

Collected articles worth keeping

| Title | Source (author / WeChat account) | URL | Date |
| --- | --- | --- | --- |
| Troubleshooting and resolving long GC pause problems | 占小狼 | https://mp.weixin.qq.com/s/fP–JJnkTR92NWdZtdEgqQ | 2019-03-25 |
| Troubleshooting a slow system, 100% CPU, and too many Full GCs | 芋道源码 | https://mp.weixin.qq.com/s/_tWm2G57vLgomvpNNHKAMA | 2019-03-01 |
| Sharing a Java memory leak investigation | Java基基 | https://mp.weixin.qq.com/s/M02Qk5OQ13xRytTK97SaFw | 2019-03-14 |
| Troubleshooting Full GC caused by HashMap under concurrency | 李小武 | http://blog.lichengwu.cn/java/2015/04/06/case-of-hashmap-in-concurrency/ | 2015-04-06 |
| Troubleshooting and fixing Full GC caused by Metaspace | 程序猿DD | https://mp.weixin.qq.com/s/rkTDMFkvBDZzT2fUfOjV_Q | 2019-06-14 |
| From a GC incident to how reflection works | 假笨说 | https://mp.weixin.qq.com/s/5H6UHcP6kvR2X5hTj_SBjA? | 2017-01-12 |
| Understanding the JVM in depth (16): JVM performance monitoring and diagnostic tools | 老周聊架构 | https://riemann.blog.csdn.net/article/details/104157865 | 2020-02-02 |

Common commands

1. Find your service's process ID (pid)

ps -ef | grep java   # or: jps
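
Every command below needs this pid, so it can help to capture it once in a shell variable. A minimal sketch, assuming the service's jar name contains "my-app" (a placeholder):

PID=$(jps -l | grep my-app | awk '{print $1}')   # "my-app" is a placeholder for your jar/main class name
echo "$PID"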

2. Check whether Full GCs are occurring (one sample every 5000 ms; the interval argument can be omitted for a single snapshot)

jstat -gcutil <pid> 5000

  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00 100.00  48.36  10.55  98.24  95.95     30    2.205     0    0.000    2.205
  0.00 100.00  70.42  10.55  98.24  95.95     30    2.205     0    0.000    2.205
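
In this output, S0/S1/E/O/M/CCS are the utilization percentages of the two survivor spaces, eden, the old generation, metaspace, and the compressed class space; YGC/YGCT and FGC/FGCT are the young/full GC counts and cumulative times in seconds; GCT is the total GC time. A steadily climbing FGC column is the warning sign. To correlate samples with wall-clock time, jstat's -t flag prefixes each sample:

jstat -t -gcutil <pid> 5000   # adds a Timestamp column (seconds since JVM start)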

3. Inspect heap usage

jmap -heap <pid>                  # JDK 8 and earlier
jcmd <pid> GC.heap_info           # JDK 9+ (jmap -heap was removed), e.g. jcmd 1964471 GC.heap_info
jhsdb jmap --heap --pid <pid>     # alternative on JDK 9+

jmap -heap 59191
Debugger attached successfully.
Server compiler detected.
JVM version is 25.45-b02

using thread-local object allocation.
Garbage-First (G1) GC with 2 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 4194304000 (4000.0MB)
   NewSize                  = 1363144 (1.2999954223632812MB)
   MaxNewSize               = 2516582400 (2400.0MB)
   OldSize                  = 5452592 (5.1999969482421875MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 1048576 (1.0MB)

Heap Usage:
G1 Heap:
   regions  = 4000
   capacity = 4194304000 (4000.0MB)
   used     = 556760056 (530.9677658081055MB)
   free     = 3637543944 (3469.0322341918945MB)
   13.274194145202637% used
G1 Young Generation:
Eden Space:
   regions  = 485
   capacity = 673185792 (642.0MB)
   used     = 508559360 (485.0MB)
   free     = 164626432 (157.0MB)
   75.54517133956386% used
Survivor Space:
   regions  = 3
   capacity = 3145728 (3.0MB)
   used     = 3145728 (3.0MB)
   free     = 0 (0.0MB)
   100.0% used
G1 Old Generation:
   regions  = 44
   capacity = 397410304 (379.0MB)
   used     = 45054968 (42.96776580810547MB)
   free     = 352355336 (336.03223419189453MB)
   11.337141374170308% used

4. Preserve the scene

Save a class-histogram snapshot: jmap -histo <pid> > histo.log
Save the JVM thread stacks: jstack <pid> > stack.log
Save a heap dump: jmap -dump:live,format=b,file=heap.bin <pid>
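
A minimal sketch that captures all three artifacts in one pass before the service is restarted (the pid and output directory are placeholders):

PID=<pid>
OUT=/tmp/jvm-$(date +%s); mkdir -p "$OUT"
jmap -histo "$PID" > "$OUT/histo.log"                   # per-class instance counts and sizes
jstack "$PID" > "$OUT/stack.log"                        # thread stacks at this instant
jmap -dump:live,format=b,file="$OUT/heap.bin" "$PID"    # heap dump of live objects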

jps -l # or: ps -ef | grep java

Generate a heap dump containing all objects (including unreachable ones):

jcmd <PID> GC.heap_dump -all /tmp/app-$(date +%Y%m%d-%H%M%S).hprof

Keep only live objects (the default behavior; it triggers a Full GC first, so the file is smaller):

jcmd <PID> GC.heap_dump /tmp/app-live.hprof
# equivalent to: jmap -dump:live,file=/tmp/app-live.hprof <PID>

Automatic dumps from the application (most useful for problems that only reproduce in production)

Dump automatically when an OOM occurs:
JAVA_TOOL_OPTIONS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dumps"
or add the same two JVM options to the startup command line.
When the OOM happens, a .hprof file is written to the HeapDumpPath directory.
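
A minimal startup sketch with those flags (app.jar is a placeholder; in HotSpot, %p in the path is replaced with the process ID):

java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/data/dumps/app-%p.hprof \
     -jar app.jar   # app.jar is a placeholder for your service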

Other tools

The top command

Here is a list that explains what each column means.

PID: A process’s process ID number.
USER: The process’s owner.
PR: The process’s priority. The lower the number, the higher the priority.
NI: The nice value of the process, which affects its priority.
VIRT: How much virtual memory the process is using.
RES: How much physical RAM the process is using, measured in kilobytes.
SHR: How much shared memory the process is using.
S: The current status of the process (zombie, sleeping, running, uninterruptible sleep, or traced).
%CPU: The percentage of the processor time used by the process.
%MEM: The percentage of physical RAM used by the process.
TIME+: How much processor time the process has used.
COMMAND: The name of the command that started the process.

top -Hp <pid>
shows the CPU usage of each thread inside the given process.

~# top -Hp 3023620

top - 16:30:28 up 378 days, 16:08,  3 users,  load average: 8.41, 9.51, 10.33
Threads: 334 total,   4 running, 330 sleeping,   0 stopped,   0 zombie
%Cpu(s): 65.7 us,  3.0 sy,  0.0 ni, 29.4 id,  0.2 wa,  1.0 hi,  0.7 si,  0.0 st
MiB Mem :   7609.4 total,    173.4 free,   6582.7 used,    853.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.    787.0 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                       
3024300 www       20   0 9945940   6.2g  23948 R  77.7  83.0   4:22.39 ForkJoinPool.co 

Convert the thread PID to hexadecimal (to match the nid field in jstack output)

printf "%x\n" 3024300
2e25ac
jstack 3023620 > jstack.txt

Search for the thread ID in jstack.txt

grep -A 30 "nid=0x2e25ac" jstack.txt
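
The two steps can also be combined into one line; a sketch reusing the PIDs from the example above:

# 3023620 = Java process pid, 3024300 = hot thread pid from top -Hp
jstack 3023620 | grep -A 30 "nid=0x$(printf '%x' 3024300)"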

Check which classes dominate the heap

$ jcmd 1964471 GC.class_histogram | head -n 30
1964471:
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:       2119026      187444648  [B (java.base@11.0.18)
   2:        499742       74634968  [Ljava.lang.Object; (java.base@11.0.18)
   3:       2076948       49846752  java.lang.String (java.base@11.0.18)
   4:        558674       49163312  java.lang.reflect.Method (java.base@11.0.18)
   5:       1062045       33985440  java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.18)
   6:        646640       31038720  org.aspectj.weaver.reflect.ShadowMatchImpl
   7:         44536       29215616  com.dianping.cat.io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
   8:       1184914       28437936  java.lang.Long (java.base@11.0.18)
   9:        701747       22455904  org.apache.shardingsphere.sql.parser.sql.common.segment.dml.expr.simple.ParameterMarkerExpressionSegment
  10:        646640       20692480  org.aspectj.weaver.patterns.ExposedState
  11:        848784       20370816  java.util.LinkedList$Node (java.base@11.0.18)
  12:          7196       15459944  [C (java.base@11.0.18)
  13:        554902       13379488  [Z (java.base@11.0.18)
  14:        554242       13301800  [Lorg.aspectj.weaver.ast.Var;
  15:        136193       11984984  com.fangguo.bizcore.dal.dataobject.shop.statistics.ShopTradeBarcodePictureStatisticsDO
  16:        107427       11172408  com.fangguo.bizcore.dal.dataobject.shop.statistics.ShopTradeBarcodeStatisticsDO
  17:         29796       10629768  [I (java.base@11.0.18)
  18:        429718       10313232  java.util.ArrayList (java.base@11.0.18)
  19:        240721        9628840  java.util.LinkedHashMap$Entry (java.base@11.0.18)
  20:        300062        9601984  java.util.HashMap$Node (java.base@11.0.18)
  21:        352186        8452464  java.time.LocalDate (java.base@11.0.18)
  22:        140653        7876568  java.util.LinkedHashMap (java.base@11.0.18)
  23:        245344        7851008  org.antlr.v4.runtime.atn.ATNConfig
  24:          2629        7356096  [Ljava.util.concurrent.ConcurrentHashMap$Node; (java.base@11.0.18)
  25:           207        6786288  [Ljava.util.concurrent.ForkJoinTask; (java.base@11.0.18)
  26:         56047        6566704  [Ljava.util.HashMap$Node; (java.base@11.0.18)
  27:         53336        6471904  java.lang.Class (java.base@11.0.18)
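
A single histogram only shows the current top consumers; for a suspected leak, growth matters more. A minimal sketch that diffs two snapshots taken a minute apart (the pid is a placeholder):

jcmd <pid> GC.class_histogram > histo1.log
sleep 60
jcmd <pid> GC.class_histogram > histo2.log
# classes whose #instances keep climbing between snapshots are leak suspects
diff histo1.log histo2.log | head -n 40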

Arthas

$ java -jar /app/arthas-boot.jar 3938402
[INFO] JAVA_HOME: /usr/lib/jvm/jdk-11-oracle-x64
[INFO] arthas-boot version: 3.6.9
[INFO] Process 113275 already using port 3658
[INFO] Process 113275 already using port 8563
[ERROR] The telnet port 3658 is used by process 113275 instead of target process 3938402, you will connect to an unexpected process.
[ERROR] 1. Try to restart arthas-boot, select process 113275, shutdown it first with running the 'stop' command.
[ERROR] 2. Or try to stop the existing arthas instance: java -jar arthas-client.jar 127.0.0.1 3658 -c "stop"
[ERROR] 3. Or try to use different telnet port, for example: java -jar arthas-boot.jar --telnet-port 9998 --http-port -1

As the hints suggest, if you hit this error you can start Arthas on different ports:

$ java -jar /app/arthas-boot.jar --telnet-port 9998 --http-port -1 3938402
[INFO] JAVA_HOME: /usr/lib/jvm/jdk-11-oracle-x64
[INFO] arthas-boot version: 3.6.9
[INFO] arthas home: /home/www/.arthas/lib/4.0.5/arthas
[INFO] Try to attach process 3938402
Picked up JAVA_TOOL_OPTIONS: 
[INFO] Attach process 3938402 success.
[INFO] arthas-client connect 127.0.0.1 9998
  ,---.  ,------. ,--------.,--.  ,--.  ,---.   ,---.                           
 /  O  \ |  .--. ''--.  .--'|  '--'  | /  O  \ '   .-'                          
|  .-.  ||  '--'.'   |  |   |  .--.  ||  .-.  |`.  `-.                          
|  | |  ||  |\  \    |  |   |  |  |  ||  | |  |.-'    |                         
`--' `--'`--' '--'   `--'   `--'  `--'`--' `--'`-----'                          

wiki        https://arthas.aliyun.com/doc                                       
tutorials   https://arthas.aliyun.com/doc/arthas-tutorials.html                 
version     4.0.5                                                               
main_class  /app/erp/backend/erp-shein-task/erp-shein-task.jar --spring.profile 
            s.active=shein,shein-prod,prod --mybatis-plus.configuration.log-imp 
            l=org.apache.ibatis.logging.nologging.NoLoggingImpl                 
pid         3938402                                                             
start_time  2025-06-25 20:49:10.180       

Show the 5 busiest threads

[arthas@3938402]$ thread -n 5
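
A few other standard Arthas commands that are useful at this point (see the wiki linked above):

[arthas@3938402]$ dashboard      # live overview of threads, memory, and GC
[arthas@3938402]$ thread -b      # find the thread holding the lock that blocks others
[arthas@3938402]$ thread <id>    # print the stack of a single thread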

Reference: https://www.deonsworld.co.za/2012/12/20/understanding-and-using-htop-monitor-system-resources/
