Bookmarked articles
Article | Source (WeChat official account or author) | URL | Date |
---|---|---|---|
Troubleshooting and fixing long GC pauses (一些长时间GC停顿问题的排查及解决办法) | 占小狼 | https://mp.weixin.qq.com/s/fP--JJnkTR92NWdZtdEgqQ | 2019-03-25 |
How to troubleshoot a slow system, 100% CPU, and excessive Full GCs (系统运行缓慢,CPU 100%,以及Full GC次数过多问题的排查思路) | 芋道源码 | https://mp.weixin.qq.com/s/_tWm2G57vLgomvpNNHKAMA | 2019-03-01 |
A Java memory leak investigation (分享一次 Java 内存泄漏的排查) | Java基基 | https://mp.weixin.qq.com/s/M02Qk5OQ13xRytTK97SaFw | 2019-03-14 |
Investigating Full GCs caused by HashMap under concurrency (并发环境下HashMap引起full gc排查) | 李小武 | http://blog.lichengwu.cn/java/2015/04/06/case-of-hashmap-in-concurrency/ | 2015-04-06 |
Investigating and fixing Full GCs caused by Metaspace (Metaspace 引起的 FullGC 问题排查过程及解决方案) | 程序猿DD | https://mp.weixin.qq.com/s/rkTDMFkvBDZzT2fUfOjV_Q | 2019-06-14 |
From a GC incident to how reflection works (从一起GC血案谈到反射原理) | 假笨说 | https://mp.weixin.qq.com/s/5H6UHcP6kvR2X5hTj_SBjA | 2017-01-12 |
Understanding the Java Virtual Machine in Depth (16): JVM performance monitoring and diagnostic tools (深入理解Java虚拟机:(十六) Java虚拟机的性能监控及诊断工具) | 老周聊架构 | https://riemann.blog.csdn.net/article/details/104157865 | 2020-02-02 |
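Common commands
1. Find the process ID (pid) of your service
ps -ef | grep java
or: jps
2. Check whether full GCs are occurring (prints one sample every 5000 ms; omit the interval to print a single sample)
jstat -gcutil (pid) 5000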
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
0.00 100.00 48.36 10.55 98.24 95.95 30 2.205 0 0.000 2.205
0.00 100.00 70.42 10.55 98.24 95.95 30 2.205 0 0.000 2.205
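To watch the full GC count without reading the whole table, the FGC column can be pulled out of the jstat output. The following is a minimal sketch; the function name `fgc_count` is made up for illustration, and it assumes the `-gcutil` header starts with `S0`, as in the sample above.

```shell
# Print the FGC (full GC count) value from the last sample of
# `jstat -gcutil` output read on stdin. The column is located by its
# header name because the set of columns varies between JDK versions.
fgc_count() {
  awk '
    $1 == "S0" { for (i = 1; i <= NF; i++) if ($i == "FGC") col = i; next }
    col        { last = $col; seen = 1 }
    END        { if (seen) print last }
  '
}

# Usage against a live JVM (not executed here):
#   jstat -gcutil "$pid" | fgc_count
```

Taking two readings some time apart and comparing them shows whether a full GC ran in between.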
3. Check heap memory usage
jmap -heap (pid)
On Java 11, use jcmd instead (jmap -heap was removed in JDK 9): jcmd 1964471 GC.heap_info
or: jhsdb jmap --heap --pid <PID>
jmap -heap 59191
Debugger attached successfully.
Server compiler detected.
JVM version is 25.45-b02
using thread-local object allocation.
Garbage-First (G1) GC with 2 thread(s)
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 4194304000 (4000.0MB)
NewSize = 1363144 (1.2999954223632812MB)
MaxNewSize = 2516582400 (2400.0MB)
OldSize = 5452592 (5.1999969482421875MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 1048576 (1.0MB)
Heap Usage:
G1 Heap:
regions = 4000
capacity = 4194304000 (4000.0MB)
used = 556760056 (530.9677658081055MB)
free = 3637543944 (3469.0322341918945MB)
13.274194145202637% used
G1 Young Generation:
Eden Space:
regions = 485
capacity = 673185792 (642.0MB)
used = 508559360 (485.0MB)
free = 164626432 (157.0MB)
75.54517133956386% used
Survivor Space:
regions = 3
capacity = 3145728 (3.0MB)
used = 3145728 (3.0MB)
free = 0 (0.0MB)
100.0% used
G1 Old Generation:
regions = 44
capacity = 397410304 (379.0MB)
used = 45054968 (42.96776580810547MB)
free = 352355336 (336.03223419189453MB)
11.337141374170308% used
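4. Preserve the scene (capture diagnostics before restarting)
Save a class histogram snapshot: jmap -histo (pid) > histo.log
Save the JVM thread stacks: jstack (pid) > stack.log
Save a JVM heap dump: jmap -dump:live,format=b,file=heap.bin <pid>
jps -l # or: ps -ef | grep java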
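Generate a heap dump that includes all objects, reachable or not
jcmd <PID> GC.heap_dump -all=true /tmp/app-$(date +%Y%m%d-%H%M%S).hprof
Keep only live objects (the jcmd default; triggers a Full GC first and gives a smaller file)
jcmd <PID> GC.heap_dump -all=false /tmp/app-live.hprof
# equivalent to jmap -dump:live,file=/tmp/app-live.hprof <PID>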
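Automatic dump by the application (most useful for capturing a production issue when it recurs)
Write a dump to disk automatically when an OOM occurs:
JAVA_TOOL_OPTIONS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dumps"
Alternatively, add the two JVM options above to the startup arguments.
When an OOM occurs, a .hprof file is written to the HeapDumpPath directory.
Other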
What each column in the top output means:
PID: A process’s process ID number.
USER: The process’s owner.
PR: The process’s priority. The lower the number, the higher the priority.
NI: The nice value of the process, which affects its priority.
VIRT: How much virtual memory the process is using.
RES: How much physical RAM the process is using, measured in kilobytes.
SHR: How much shared memory the process is using.
S: The current status of the process (zombied, sleeping, running, uninterruptedly sleeping, or traced).
%CPU: The percentage of the processor time used by the process.
%MEM: The percentage of physical RAM used by the process.
TIME+: How much processor time the process has used.
COMMAND: The name of the command that started the process.
top -Hp (pid)
Shows the CPU usage of each thread in the given process.
~# top -Hp 3023620
top - 16:30:28 up 378 days, 16:08, 3 users, load average: 8.41, 9.51, 10.33
Threads: 334 total, 4 running, 330 sleeping, 0 stopped, 0 zombie
%Cpu(s): 65.7 us, 3.0 sy, 0.0 ni, 29.4 id, 0.2 wa, 1.0 hi, 0.7 si, 0.0 st
MiB Mem : 7609.4 total, 173.4 free, 6582.7 used, 853.4 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 787.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3024300 www 20 0 9945940 6.2g 23948 R 77.7 83.0 4:22.39 ForkJoinPool.co
Convert the thread ID (the PID column above) to hexadecimal, the form jstack prints as nid
printf "%x\n" 3024300
2e25ac
jstack 3023620 > jstack.txt
Find the thread in jstack.txt
grep -A 30 "nid=0x2e25ac" jstack.txt
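The conversion and lookup steps above can be wrapped in two small helpers. This is only a sketch: the names `tid_to_nid` and `thread_stack` are hypothetical, and it assumes a jstack that prints `nid` in hexadecimal, as in the JDK 11 commands shown here.

```shell
# Convert a decimal thread ID from `top -Hp` into the nid=0x... form
# that jstack prints for each thread.
tid_to_nid() {
  printf 'nid=0x%x\n' "$1"
}

# Print one thread's stack from a saved jstack dump.
# $1 = decimal thread ID, $2 = dump file, $3 = lines of context (default 30).
# -w keeps a short nid from matching a longer one that shares its prefix.
thread_stack() {
  grep -w -A "${3:-30}" "$(tid_to_nid "$1")" "$2"
}

# Usage (not executed here):
#   jstack 3023620 > jstack.txt
#   thread_stack 3024300 jstack.txt
```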
Check memory usage by class
$ jcmd 1964471 GC.class_histogram | head -n 30
1964471:
num #instances #bytes class name (module)
-------------------------------------------------------
1: 2119026 187444648 [B (java.base@11.0.18)
2: 499742 74634968 [Ljava.lang.Object; (java.base@11.0.18)
3: 2076948 49846752 java.lang.String (java.base@11.0.18)
4: 558674 49163312 java.lang.reflect.Method (java.base@11.0.18)
5: 1062045 33985440 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.18)
6: 646640 31038720 org.aspectj.weaver.reflect.ShadowMatchImpl
7: 44536 29215616 com.dianping.cat.io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
8: 1184914 28437936 java.lang.Long (java.base@11.0.18)
9: 701747 22455904 org.apache.shardingsphere.sql.parser.sql.common.segment.dml.expr.simple.ParameterMarkerExpressionSegment
10: 646640 20692480 org.aspectj.weaver.patterns.ExposedState
11: 848784 20370816 java.util.LinkedList$Node (java.base@11.0.18)
12: 7196 15459944 [C (java.base@11.0.18)
13: 554902 13379488 [Z (java.base@11.0.18)
14: 554242 13301800 [Lorg.aspectj.weaver.ast.Var;
15: 136193 11984984 com.fangguo.bizcore.dal.dataobject.shop.statistics.ShopTradeBarcodePictureStatisticsDO
16: 107427 11172408 com.fangguo.bizcore.dal.dataobject.shop.statistics.ShopTradeBarcodeStatisticsDO
17: 29796 10629768 [I (java.base@11.0.18)
18: 429718 10313232 java.util.ArrayList (java.base@11.0.18)
19: 240721 9628840 java.util.LinkedHashMap$Entry (java.base@11.0.18)
20: 300062 9601984 java.util.HashMap$Node (java.base@11.0.18)
21: 352186 8452464 java.time.LocalDate (java.base@11.0.18)
22: 140653 7876568 java.util.LinkedHashMap (java.base@11.0.18)
23: 245344 7851008 org.antlr.v4.runtime.atn.ATNConfig
24: 2629 7356096 [Ljava.util.concurrent.ConcurrentHashMap$Node; (java.base@11.0.18)
25: 207 6786288 [Ljava.util.concurrent.ForkJoinTask; (java.base@11.0.18)
26: 56047 6566704 [Ljava.util.HashMap$Node; (java.base@11.0.18)
27: 53336 6471904 java.lang.Class (java.base@11.0.18)
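The #bytes column is hard to read at a glance. As a rough sketch, the histogram can be piped through awk to show each class's footprint in MiB; the helper name `histo_mib` is an assumption for illustration, and it expects the column layout shown above (rank, instances, bytes, class name).

```shell
# Read `jcmd <pid> GC.class_histogram` (or `jmap -histo`) output on stdin
# and print "<MiB> <class name>" for each data row. Data rows start with a
# rank such as "1:" and have at least four fields, so the pid line, the
# header, the separator and any total line are skipped.
histo_mib() {
  awk 'NF >= 4 && $1 ~ /^[0-9]+:$/ { printf "%.1f %s\n", $3 / 1048576, $4 }'
}

# Usage (not executed here):
#   jcmd 1964471 GC.class_histogram | head -n 30 | histo_mib
```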
Arthas
$ java -jar /app/arthas-boot.jar 3938402
[INFO] JAVA_HOME: /usr/lib/jvm/jdk-11-oracle-x64
[INFO] arthas-boot version: 3.6.9
[INFO] Process 113275 already using port 3658
[INFO] Process 113275 already using port 8563
[ERROR] The telnet port 3658 is used by process 113275 instead of target process 3938402, you will connect to an unexpected process.
[ERROR] 1. Try to restart arthas-boot, select process 113275, shutdown it first with running the 'stop' command.
[ERROR] 2. Or try to stop the existing arthas instance: java -jar arthas-client.jar 127.0.0.1 3658 -c "stop"
[ERROR] 3. Or try to use different telnet port, for example: java -jar arthas-boot.jar --telnet-port 9998 --http-port -1
If this error appears, start Arthas on a different port, as the message suggests:
$ java -jar /app/arthas-boot.jar --telnet-port 9998 --http-port -1 3938402
[INFO] JAVA_HOME: /usr/lib/jvm/jdk-11-oracle-x64
[INFO] arthas-boot version: 3.6.9
[INFO] arthas home: /home/www/.arthas/lib/4.0.5/arthas
[INFO] Try to attach process 3938402
Picked up JAVA_TOOL_OPTIONS:
[INFO] Attach process 3938402 success.
[INFO] arthas-client connect 127.0.0.1 9998
,---. ,------. ,--------.,--. ,--. ,---. ,---.
/ O \ | .--. ''--. .--'| '--' | / O \ ' .-'
| .-. || '--'.' | | | .--. || .-. |`. `-.
| | | || |\ \ | | | | | || | | |.-' |
`--' `--'`--' '--' `--' `--' `--'`--' `--'`-----'
wiki https://arthas.aliyun.com/doc
tutorials https://arthas.aliyun.com/doc/arthas-tutorials.html
version 4.0.5
main_class /app/erp/backend/erp-shein-task/erp-shein-task.jar --spring.profile
s.active=shein,shein-prod,prod --mybatis-plus.configuration.log-imp
l=org.apache.ibatis.logging.nologging.NoLoggingImpl
pid 3938402
start_time 2025-06-25 20:49:10.180
Show the 5 busiest threads
[arthas@3938402]$ thread -n 5
Reference: https://www.deonsworld.co.za/2012/12/20/understanding-and-using-htop-monitor-system-resources/