Troubleshooting Java Applications in Production

Finding the process ID with ps

# Basic usage
ps -ef | grep {app-keyword}

# Filter out the grep process itself
ps -ef | grep {app-keyword} | grep -v "grep"

# Print only the process ID
ps -ef | grep {app-keyword} | grep -v "grep" | awk '{print $2}'
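A common alternative to the extra `grep -v "grep"` stage is to wrap the first character of the keyword in brackets, so the grep process no longer matches its own command line. The sketch below assumes the keyword contains no regular-expression metacharacters; `bracket_pattern` and `my-app` are hypothetical names used only for illustration.

```shell
# Turn a keyword such as "my-app" into the pattern "[m]y-app".
# The pattern still matches "my-app", but the literal text "[m]y-app"
# on grep's own command line does not match it.
bracket_pattern() {
  rest=${1#?}              # keyword without its first character
  first=${1%"$rest"}       # the first character on its own
  printf '[%s]%s\n' "$first" "$rest"
}

# Usage (my-app is a placeholder keyword):
#   ps -ef | grep "$(bracket_pattern my-app)" | awk '{print $2}'
```

This keeps the pipeline one stage shorter and avoids hiding real processes whose command line happens to contain the word "grep".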

The jmap command

# Show heap memory usage
jmap -heap {pid}

# Combined with the ps pipeline above
ps -ef | grep {app-keyword} | grep -v "grep" | awk '{print $2}' | xargs jmap -heap

[root@xxx ~]# ps -ef | grep xxx | grep -v "grep" | awk '{print $2}' | xargs jmap -heap
Attaching to process ID 83112, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.212-b04

using thread-local object allocation.
Garbage-First (G1) GC with 8 thread(s)

Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 1073741824 (1024.0MB)
NewSize = 1363144 (1.2999954223632812MB)
MaxNewSize = 643825664 (614.0MB)
OldSize = 5452592 (5.1999969482421875MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 1048576 (1.0MB)

Heap Usage:
G1 Heap:
regions = 1024
capacity = 1073741824 (1024.0MB)
used = 325123808 (310.0622253417969MB)
free = 748618016 (713.9377746582031MB)
30.27951419353485% used
G1 Young Generation:
Eden Space:
regions = 49
capacity = 661651456 (631.0MB)
used = 51380224 (49.0MB)
free = 610271232 (582.0MB)
7.765451664025356% used
Survivor Space:
regions = 14
capacity = 14680064 (14.0MB)
used = 14680064 (14.0MB)
free = 0 (0.0MB)
100.0% used
G1 Old Generation:
regions = 269
capacity = 397410304 (379.0MB)
used = 259063520 (247.06222534179688MB)
free = 138346784 (131.93777465820312MB)
65.18792225377226% used

40192 interned Strings occupying 4681792 bytes.

# Dump a heap snapshot to an hprof file for analysis in MAT; typically used to track down out-of-memory errors and memory leaks
jmap -dump:format=b,file=./xxx.hprof {pid}
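When only the overall heap occupancy is of interest, the `jmap -heap` report shown earlier can be reduced to a single number. The following is a minimal sketch, assuming the G1 layout from the sample output, where the first `capacity` and `used` lines after `Heap Usage:` describe the whole heap; other collectors print a different layout. The function name `heap_used_percent` is made up for this example.

```shell
# Read `jmap -heap` output on stdin and print the overall heap usage
# as a percentage with one decimal place.
heap_used_percent() {
  awk '
    /Heap Usage:/ { in_usage = 1; next }
    # First capacity/used pair after "Heap Usage:" is the whole G1 heap.
    in_usage && $1 == "capacity" && !have_cap  { cap = $3;  have_cap = 1 }
    in_usage && $1 == "used"     && !have_used { used = $3; have_used = 1 }
    END { if (have_cap && cap > 0) printf "%.1f\n", used * 100 / cap }
  '
}

# Usage:
#   jmap -heap {pid} | heap_used_percent
```

For the sample report above (capacity 1073741824, used 325123808) this prints 30.3, which agrees with the percentage jmap itself reports.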

Checking garbage collection (GC) activity with jstat

# Print GC statistics every 2000 milliseconds, 20 times in total
jstat -gc {pid} 2000 20

[root@host-192-168-98-230 ~]# jstat -gc 83112 2000 20
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
0.0 14336.0 0.0 14336.0 646144.0 75776.0 388096.0 252991.7 102144.0 96846.7 12032.0 11092.7 620248 10258.578 44644 32701.778 42960.356
0.0 14336.0 0.0 14336.0 646144.0 75776.0 388096.0 252991.7 102144.0 96846.7 12032.0 11092.7 620248 10258.578 44644 32701.778 42960.356
0.0 14336.0 0.0 14336.0 646144.0 75776.0 388096.0 252991.7 102144.0 96846.7 12032.0 11092.7 620248 10258.578 44644 32701.778 42960.356
0.0 14336.0 0.0 14336.0 646144.0 75776.0 388096.0 252991.7 102144.0 96846.7 12032.0 11092.7 620248 10258.578 44644 32701.778 42960.356
0.0 14336.0 0.0 14336.0 646144.0 75776.0 388096.0 252991.7 102144.0 96846.7 12032.0 11092.7 620248 10258.578 44644 32701.778 42960.356
0.0 14336.0 0.0 14336.0 646144.0 75776.0 388096.0 252991.7 102144.0 96846.7 12032.0 11092.7 620248 10258.578 44644 32701.778 42960.356
0.0 14336.0 0.0 14336.0 646144.0 75776.0 388096.0 252991.7 102144.0 96846.7 12032.0 11092.7 620248 10258.578 44644 32701.778 42960.356
0.0 14336.0 0.0 14336.0 646144.0 75776.0 388096.0 252991.7 102144.0 96846.7 12032.0 11092.7 620248 10258.578 44644 32701.778 42960.356
0.0 14336.0 0.0 14336.0 646144.0 75776.0 388096.0 252991.7 102144.0 96846.7 12032.0 11092.7 620248 10258.578 44644 32701.778 42960.356
0.0 14336.0 0.0 14336.0 646144.0 75776.0 388096.0 252991.7 102144.0 96846.7 12032.0 11092.7 620248 10258.578 44644 32701.778 42960.356
0.0 14336.0 0.0 14336.0 646144.0 75776.0 388096.0 252991.7 102144.0 96846.7 12032.0 11092.7 620248 10258.578 44644 32701.778 42960.356
0.0 14336.0 0.0 14336.0 646144.0 76800.0 388096.0 252991.7 102144.0 96846.7 12032.0 11092.7 620248 10258.578 44644 32701.778 42960.356
0.0 14336.0 0.0 14336.0 646144.0 76800.0 388096.0 252991.7 102144.0 96846.7 12032.0 11092.7 620248 10258.578 44644 32701.778 42960.356
...

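# Column reference (sizes are in KB, times in seconds)
S0C: capacity of survivor space 0
S1C: capacity of survivor space 1
S0U: used size of survivor space 0
S1U: used size of survivor space 1
EC: capacity of the Eden space
EU: used size of the Eden space
OC: capacity of the old generation
OU: used size of the old generation
MC: capacity of the metaspace (the method area)
MU: used size of the metaspace
CCSC: capacity of the compressed class space
CCSU: used size of the compressed class space
YGC: number of young-generation GC events
YGCT: total time spent in young-generation GC
FGC: number of full GC events
FGCT: total time spent in full GC
GCT: total time spent in GC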
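The cumulative counters become easier to judge once they are turned into averages: YGCT divided by YGC gives the mean young-generation pause, and FGCT divided by FGC the mean full GC pause. Below is a rough sketch that assumes the first input line is the `jstat -gc` header row (any shell prompt line has to be stripped first); columns are looked up by name, so extra columns in other JDK versions should not matter. `gc_pause_averages` is a hypothetical helper name.

```shell
# Read `jstat -gc` output on stdin (header row first) and print the
# average pause per young and per full collection, in milliseconds,
# based on the last data row.
gc_pause_averages() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map column name to index
    NF > 1  {
      ygc = $(col["YGC"]); ygct = $(col["YGCT"])
      fgc = $(col["FGC"]); fgct = $(col["FGCT"])
    }
    END {
      if (ygc > 0) printf "young_avg_ms=%.1f\n", ygct * 1000 / ygc
      if (fgc > 0) printf "full_avg_ms=%.1f\n",  fgct * 1000 / fgc
    }
  '
}

# Usage:
#   jstat -gc {pid} 2000 20 | gc_pause_averages
```

Applied to the sample rows above, this yields roughly 16.5 ms per young collection and 732.5 ms per full collection; 44644 full GCs at that cost would be a strong hint that the old generation is under pressure.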

Taking thread snapshots with jstack to locate deadlocks, blocked threads, and similar problems

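jstack {pid}

# Typical workflow
1. top -H -p {pid}: list the threads of the Java process and note the ID (tid) of the misbehaving thread
2. printf '0x%x\n' {tid}: convert the thread ID to hexadecimal
3. jstack {pid} > ./xxx.log
4. Search the jstack log for the hexadecimal ID from step 2; it appears as nid=0x... in the thread's header line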
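Steps 2 and 4 of the workflow above can be combined into one small filter. This is only a sketch: it assumes the usual jstack layout in which every thread starts with a header line beginning with a double quote and ends at the next blank line. `thread_stack` is a made-up name, and 6699 in the usage line is a placeholder thread ID.

```shell
# $1    = decimal thread ID as shown by `top -H -p {pid}`
# stdin = jstack output
# Prints the header and stack frames of the matching thread only.
thread_stack() {
  nid=$(printf '0x%x' "$1")          # e.g. 6699 -> 0x1a2b
  awk -v want="nid=$nid" '
    /^"/ {                           # thread header line
      show = 0
      for (i = 1; i <= NF; i++) if ($i == want) show = 1
    }
    /^$/ { show = 0 }                # a blank line ends the thread block
    show
  '
}

# Usage:
#   jstack {pid} | thread_stack 6699
```

Comparing the whole `nid=...` field rather than searching for a substring avoids matching a longer ID that merely starts with the same hex digits.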

# One more thing
The jstack log can be uploaded to https://gceasy.io/index.jsp#features for visual analysis

Enabling GC logging

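Add the following to the Java startup options:
-XX:+PrintGC # basic log
-XX:+PrintGCDetails # detailed log
-Xloggc:/xxx/logs/gc.$$.log # write the log to a file; in a shell start script, $$ expands to the process ID of the shell running the script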
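In a start script, the flags listed above might be assembled as in the sketch below. The helper name `gc_log_opts`, the log directory, and `app.jar` are placeholders, not part of the original setup. Note that `$$` is the PID of the launching shell; it matches the JVM's PID only when the script starts Java with `exec`, which replaces the shell process.

```shell
# Print the GC logging flags for a given log directory.
# $$ is expanded by the shell, not by the JVM.
gc_log_opts() {
  log_dir=$1
  printf '%s %s %s\n' \
    "-XX:+PrintGC" \
    "-XX:+PrintGCDetails" \
    "-Xloggc:${log_dir}/gc.$$.log"
}

# Usage (left unquoted on purpose so the flags split into separate arguments):
#   exec java $(gc_log_opts /var/log/my-app) -jar app.jar
```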