轉載留存:
點擊查看原文鏈接
最近兩天測試環境有一個服務總是會掛(兩到三天一次),JVM虛擬機總是會崩潰。所以有必要了解JVM崩潰的原因是什么。
當JVM發生致命錯誤導致崩潰時,會生成一個hs_err_pid_xxx.log這樣的文件,該文件包含了導致 JVM crash 的重要信息,我們可以通過分析該文件定位到導致 JVM Crash 的原因,從而修復保證系統穩定。
默認情況下,該文件是生成在工作目錄下的,當然也可以通過 JVM 參數指定生成路徑:
-
XX:ErrorFile=/var/log/hs_err_pid<pid>.log
1
這個文件主要包含如下內容:
日志頭文件
導致 crash 的線程信息
所有線程信息
安全點和鎖信息
堆信息
本地代碼緩存
編譯事件
gc 相關記錄
jvm 內存映射
jvm 啟動參數
服務器信息
下面就根據這個文件內容逐步解析。
1|1日志頭文件
內容如下
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (
0xb
) at pc=
0x0000003797807a91
, pid=
29071
, tid=
139901421901568
#
# JRE version: Java(TM) SE Runtime Environment (
8.0
_45-b14) (build
1.8
.0
_45-b14)
# Java VM: Java HotSpot(TM)
64
-Bit Server VM (
25.45
-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libresolv.so
.2
+
0x7a91
] __libc_res_nquery+
0x1c1
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping,
try
"ulimit -c unlimited"
before starting Java again
#
# If you would like to submit a bug report, please visit:
# http:
//bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in
native
code.
# See problematic frame
for
where to report the bug.
#
這段內容主要簡述了導致 JVM Crash 的原因。常見的原因有 JVM 自身的 bug,應用程序錯誤,JVM 參數,服務器資源不足,JNI 調用錯誤等。當然還有一些版本和配置信息,
SIGSEGV (0xb) at pc
=
0x0000003797807a91, pid=29071, tid=139901421901568
1
非預期的錯誤被 JRE 檢測到了,其中
SIGSEGV :信號量
0xb :信號碼
pc=0x0000003797807a91 :程序計數器的值
pid=29071 :進程號
tid=139901421901568 :線程號
SIGSEGV(0xb) 表示 JVM Crash 時正在執行 JNI 代碼,常見的描述還有
EXCEPTION_ACCESS_VIOLATION
,該描述表示 JVM Crash 時正在執行 JVM 自身的代碼,這往往是因為 JVM 的 Bug 導致的 Crash;另一種常見的描述是
EXCEPTION_STACK_OVERFLOW
,該描述表示這是個棧溢出導致的錯誤,這往往是應用程序中存在深層遞歸導致的。
# JRE version: Java(TM) SE Runtime Environment (8.0_45-b14) (build 1.8.0_45-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.45-b02 mixed mode linux-amd64 compressed oops)
1
2
JRE 和 JVM 的版本信息
# Problematic frame:
# C [libresolv.so.2+0x7a91] __libc_res_nquery+0x1c1
1
2
這個信息比較重要,問題幀信息:
C
表示幀類型為本地幀,還有其他類型:
j : 解釋的Java幀
V : 虛擬機幀
v :虛擬機生成的存根棧幀
J:其他幀類型,包括編譯后的Java幀
[libresolv.so.2+0x7a91] __libc_res_nquery+0x1c1
和程序計數器(pc)表達的含義一樣,但是用的是本地so庫+偏移量的方式。
1|2導致 crash 的線程信息
內容如下:
--------------- T H R E A D ---------------
Current thread (
0x0000000001e94800
): JavaThread
"pool-1-thread-2"
[_thread_in_native, id=
30111
, stack(
0x00007f3d567e5000
,
0x00007f3d568e6000
)]
siginfo: si_signo:
11
(SIGSEGV), si_code:
1
(SEGV_MAPERR), si_addr:
0x0000000000000003
Registers:
RAX=
0x0000000000000000
, RBX=
0x0000000000000000
, RCX=
0x0000000000000000
, RDX=
0x0000000000000050
RSP=
0x00007f3d568e2280
, RBP=
0x00007f3d568e2570
, RSI=
0x0000000000000000
, RDI=
0x00000000ffffffff
R8 =
0x0000000000000000
, R9 =
0x0000000000000000
, R10=
0x000000000007a337
, R11=
0x0000000000000213
R12=
0x00007f3d568e2ef0
, R13=
0x00007f3d568e22b0
, R14=
0x0000000000000000
, R15=
0x00007f3d568e5db8
RIP=
0x0000003797807a91
, EFLAGS=
0x0000000000010246
, CSGSFS=
0x0000000000000033
, ERR=
0x0000000000000004
TRAPNO=
0x000000000000000e
Top of Stack: (sp=
0x00007f3d568e2280
)
0x00007f3d568e2280
: b8e4bfb900000800
00007
f3d568e3760
0x00007f3d568e2290
:
00007
f3d568e3758
00007
f3d568e377c
0x00007f3d568e22a0
:
00007
f3d568e3778
6
f6e6b6e56a88a58
0x00007f3d568e22b0
:
00000100000149
a0
7
a68710800000000
0x00007f3d568e22c0
:
6970067363642
d78
6
d6f63036e61676e
....省略
Instructions: (pc=
0x0000003797807a91
)
0x0000003797807a71
:
48
89
45
b8
48
8
b
4
d b8
0
f b6
51
03
89
d3
83
e3
0x0000003797807a81
:
0
f
75
0
d
0
f b7
49
06
66
c1 c9
08
66
85
c9
75
4
f
0x0000003797807a91
:
0
f b6
48
03
bf
0
f
00
00
00
40
20
cf
75
0
d
0
f b7
0x0000003797807aa1
:
70
06
66
c1 ce
08
66
85
f6
75
34
83
e1
0
f
83
e2
Register to memory mapping:
RAX=
0x0000000000000000
is an unknown value
RBX=
0x0000000000000000
is an unknown value
RCX=
0x0000000000000000
is an unknown value
RDX=
0x0000000000000050
is an unknown value
RSP=
0x00007f3d568e2280
is pointing into the stack
for
thread:
0x0000000001e94800
RBP=
0x00007f3d568e2570
is pointing into the stack
for
thread:
0x0000000001e94800
... 省略
Stack: [
0x00007f3d567e5000
,
0x00007f3d568e6000
], sp=
0x00007f3d568e2280
, free space=
1012
k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=
native
code)
C [libresolv.so
.2
+
0x7a91
] __libc_res_nquery+
0x1c1
C [libresolv.so
.2
+
0x7fd1
]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J
15056
java.net.Inet6AddressImpl.lookupAllHostAddr(Ljava/lang/String;)[Ljava/net/InetAddress; (
0
bytes) @
0x00007f3d7492af8c
[
0x00007f3d7492af40
+
0x4c
]
J
14966
C1 java.net.InetAddress.getAddressesFromNameService(Ljava/lang/String;Ljava/net/InetAddress;)[Ljava/net/InetAddress; (
245
bytes) @
0x00007f3d75466754
[
0x00007f3d754662c0
+
0x494
]
J
14291
C2 java.net.InetAddress.getAllByName(Ljava/lang/String;Ljava/net/InetAddress;)[Ljava/net/InetAddress; (
387
bytes) @
0x00007f3d7534b718
[
0x00007f3d7534ae20
+
0x8f8
]
J
14178
C1 java.net.InetSocketAddress.<init>(Ljava/lang/String;I)V (
47
bytes) @
0x00007f3d752ce0f4
[
0x00007f3d752cdec0
+
0x234
]
j sun.security.ssl.SSLSocketImpl.<init>(Lsun/security/ssl/SSLContextImpl;Ljava/lang/String;ILjava/net/InetAddress;I)V+
144
j sun.security.ssl.SSLSocketFactoryImpl.createSocket(Ljava/lang/String;ILjava/net/InetAddress;I)Ljava/net/Socket;+
13
j com.ufclub.daq.qhzx.utils.SSLProtocolSocketFactory.createSocket(Ljava/lang/String;ILjava/net/InetAddress;I)Ljava/net/Socket;+
15
.... 省略代碼
這部分內容包含出發 JVM 致命錯誤的線程詳細信息和線程棧。
線程信息
Current thread (
0x0000000001e94800
): JavaThread
"pool-1-thread-2"
[_thread_in_native, id=
30111
, stack(
0x00007f3d567e5000
,
0x00007f3d568e6000
)]
1
0x0000000001e94800:出錯的線程指針
JavaThread:線程類型,可能的類型包括
JavaThread:Java線程
VMThread : JVM 的內部線程
CompilerThread:用來調用JITing,實時編譯裝卸class 。 通常,jvm會啟動多個線程來處理這部分工作,線程名稱后面的數字也會累加,例如:CompilerThread1
GCTaskThread:執行gc的線程
WatcherThread:JVM 周期性任務調度的線程,是一個單例對象
ConcurrentMarkSweepThread:jvm在進行CMS GC的時候,會創建一個該線程去進行GC,該線程被創建的同時會創建一個SurrogateLockerThread(簡稱SLT)線程並且啟動它,SLT啟動之后,處於等待階段。CMST開始GC時,會發一個消息給SLT讓它去獲取Java層Reference對象的全局鎖:Lock
pool-1-thread-2:線程名稱
_thread_in_native:當前線程狀態。該描述還包含有:
_thread_in_native:線程當前狀態,狀態枚舉包括:
_thread_uninitialized:線程還沒有創建,它只在內存原因崩潰的時候才出現
_thread_new:線程已經被創建,但是還沒有啟動
_thread_in_native:線程正在執行本地代碼,一般這種情況很可能是本地代碼有問題
_thread_in_vm:線程正在執行虛擬機代碼
_thread_in_Java:線程正在執行解釋或者編譯后的Java代碼
_thread_blocked:線程處於阻塞狀態
…_trans:以_trans結尾,線程正處於要切換到其它狀態的中間狀態
id=30111:線程ID
stack(0x00007f3d567e5000,0x00007f3d568e6000):棧區間
siginfo
:
si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000003
1
表示導致虛擬機終止的非預期的信號信息。
Top of Stack: (sp=
0x00007f3d568e2280
)
0x00007f3d568e2280
: b8e4bfb900000800
00007
f3d568e3760
0x00007f3d568e2290
:
00007
f3d568e3758
00007
f3d568e377c
0x00007f3d568e22a0
:
00007
f3d568e3778
6
f6e6b6e56a88a58
0x00007f3d568e22b0
:
00000100000149
a0
7
a68710800000000
0x00007f3d568e22c0
:
6970067363642
d78
6
d6f63036e61676e
....省略
Instructions: (pc=
0x0000003797807a91
)
0x0000003797807a71
:
48
89
45
b8
48
8
b
4
d b8
0
f b6
51
03
89
d3
83
e3
0x0000003797807a81
:
0
f
75
0
d
0
f b7
49
06
66
c1 c9
08
66
85
c9
75
4
f
0x0000003797807a91
:
0
f b6
48
03
bf
0
f
00
00
00
40
20
cf
75
0
d
0
f b7
0x0000003797807aa1
:
70
06
66
c1 ce
08
66
85
f6
75
34
83
e1
0
f
83
e2
棧頂程序計數器旁的操作碼,它們可以被反匯編成系統崩潰前執行的指令。
Stack: [
0x00007f3d567e5000
,
0x00007f3d568e6000
], sp=
0x00007f3d568e2280
, free space=
1012
k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=
native
code)
C [libresolv.so
.2
+
0x7a91
] __libc_res_nquery+
0x1c1
C [libresolv.so
.2
+
0x7fd1
]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J
15056
java.net.Inet6AddressImpl.lookupAllHostAddr(Ljava/lang/String;)[Ljava/net/InetAddress; (
0
bytes) @
0x00007f3d7492af8c
[
0x00007f3d7492af40
+
0x4c
]
J
14966
C1 java.net.InetAddress.getAddressesFromNameService(Ljava/lang/String;Ljava/net/InetAddress;)[Ljava/net/InetAddress; (
245
bytes) @
0x00007f3d75466754
[
0x00007f3d754662c0
+
0x494
]
J
14291
C2 java.net.InetAddress.getAllByName(Ljava/lang/String;Ljava/net/InetAddress;)[Ljava/net/InetAddress; (
387
bytes) @
0x00007f3d7534b718
[
0x00007f3d7534ae20
+
0x8f8
]
J
14178
C1 java.net.InetSocketAddress.<init>(Ljava/lang/String;I)V (
47
bytes) @
0x00007f3d752ce0f4
[
0x00007f3d752cdec0
+
0x234
]
j sun.security.ssl.SSLSocketImpl.<init>(Lsun/security/ssl/SSLContextImpl;Ljava/lang/String;ILjava/net/InetAddress;I)V+
144
j sun.security.ssl.SSLSocketFactoryImpl.createSocket(Ljava/lang/String;ILjava/net/InetAddress;I)Ljava/net/Socket;+
13
j com.ufclub.daq.qhzx.utils.SSLProtocolSocketFactory.createSocket(Ljava/lang/String;ILjava/net/InetAddress;I)Ljava/net/Socket;+
15
.... 省略代碼
線程棧信息。包含了地址、棧頂、棧計數器和線程尚未使用的棧信息。到這里就基本上已經確定了問題所在原因了。
1|3線程信息
Java Threads: ( => current thread )
0x00007f3d5d0a0800
JavaThread
"logback-8"
daemon [_thread_blocked, id=
489
, stack(
0x00007f3d56de7000
,
0x00007f3d56ee8000
)]
0x00007f3d5d09f800
JavaThread
"logback-7"
daemon [_thread_blocked, id=
30974
, stack(
0x00007f3d53fc3000
,
0x00007f3d540c4000
)]
0x00007f3d5c47f800
JavaThread
"logback-6"
daemon [_thread_blocked, id=
25662
, stack(
0x00007f3d545c9000
,
0x00007f3d546ca000
)]
0x00007f3d5c2a4000
JavaThread
"logback-5"
daemon [_thread_blocked, id=
20922
, stack(
0x00007f3d552d2000
,
0x00007f3d553d3000
)]
0x0000000003291800
JavaThread
"logback-4"
daemon [_thread_blocked, id=
16768
, stack(
0x00007f3d542c6000
,
0x00007f3d543c7000
)]
0x0000000002320000
JavaThread
"logback-3"
daemon [_thread_blocked, id=
14730
, stack(
0x00007f3d54bcd000
,
0x00007f3d54cce000
)]
0x0000000002d05000
JavaThread
"logback-2"
daemon [_thread_blocked, id=
6569
, stack(
0x00007f3d549cb000
,
0x00007f3d54acc000
)]
所有線程信息,一目了然。_thread_blocked表示阻塞。
1|4安全點和鎖信息
VM state:not at safepoint (normal execution)
1
虛擬機狀態。not at safepoint 表示正常運行。其余狀態:
- at safepoint:所有線程都因為虛擬機等待狀態而阻塞,等待一個虛擬機操作完成;
- synchronizing:一個特殊的虛擬機操作,要求虛擬機內的其它線程保持等待狀態。
VM Mutex/Monitor currently owned by a thread: None
1
虛擬機的 Mutex 和 Monito r目前沒有被線程持有。Mutex 是虛擬機內部的鎖,而 Monitor 則關聯到了 Java 對象。
1|5堆信息
Heap:
PSYoungGen total
178688
K, used
25522
K [
0x00000000eab00000
,
0x00000000f8d80000
,
0x0000000100000000
)
eden space
177664
K,
13
% used [
0x00000000eab00000
,
0x00000000ec343d30
,
0x00000000f5880000
)
from space
1024
K,
65
% used [
0x00000000f8c80000
,
0x00000000f8d28d20
,
0x00000000f8d80000
)
to space
1024
K,
0
% used [
0x00000000f8b80000
,
0x00000000f8b80000
,
0x00000000f8c80000
)
ParOldGen total
360448
K, used
47193
K [
0x00000000c0000000
,
0x00000000d6000000
,
0x00000000eab00000
)
object space
360448
K,
13
% used [
0x00000000c0000000
,
0x00000000c2e16518
,
0x00000000d6000000
)
Metaspace used
79300
K, capacity
80628
K, committed
80936
K, reserved
1120256
K
class space used
9401
K, capacity
9645
K, committed
9768
K, reserved
1048576
K
Card table byte_map: [
0x00007f3d729f1000
,
0x00007f3d72bf2000
] byte_map_base:
0x00007f3d723f1000
新生代、老年代、元空間一目了然。
Card table
表示一種卡表,是 jvm 維護的一種數據結構,用於記錄更改對象時的引用,以便 gc 時遍歷更少的 table 和 root。
1|6本地代碼緩存
CodeCache: size
=
245760Kb used=41374Kb max_used=41402Kb free=204385Kb
bounds [0x00007f3d72fb2000, 0x00007f3d75872000, 0x00007f3d81fb2000]
total_blobs=12767 nmethods=12130 adapters=549
compilation: enabled
一塊用於編譯和保存本地代碼的內存。
1|7編譯事件
Compilation events (
10
events):
Event:
501041.566
Thread
0x0000000001b6e000
16334
3
sun.security.rsa.RSAKeyFactory::<init> (
5
bytes)
Event:
501041.566
Thread
0x0000000001b6e000
nmethod
16334
0x00007f3d74985790
code [
0x00007f3d74985900
,
0x00007f3d74985b10
]
Event:
501041.569
Thread
0x0000000001b6e000
16335
3
sun.security.pkcs.PKCS8Key::<init> (
5
bytes)
Event:
501041.570
Thread
0x0000000001b6e000
nmethod
16335
0x00007f3d74736290
code [
0x00007f3d74736400
,
0x00007f3d747365b0
]
Event:
501041.575
Thread
0x0000000001b6e000
16336
3
sun.security.ssl.BaseSSLSocketImpl::<init> (
15
bytes)
Event:
501041.576
Thread
0x0000000001b6e000
nmethod
16336
0x00007f3d73f9b450
code [
0x00007f3d73f9b5c0
,
0x00007f3d73f9b7e8
]
Event:
501041.578
Thread
0x0000000001b6e000
16337
3
javax.net.ssl.SSLSocket::<init> (
5
bytes)
Event:
501041.580
Thread
0x0000000001b6e000
nmethod
16337
0x00007f3d739c7210
code [
0x00007f3d739c7380
,
0x00007f3d739c7508
]
Event:
501041.582
Thread
0x0000000001b6e000
16338
3
javax.net.ssl.SNIServerName::<init> (
66
bytes)
Event:
501041.583
Thread
0x0000000001b6e000
nmethod
16338
0x00007f3d74c15cd0
code [
0x00007f3d74c15ea0
,
0x00007f3d74c164f8
]
記錄10次編譯事件。這里的信息也印證了上面的結論。
1|8gc 日志
GC Heap History (
10
events):
Event:
476166.948
GC heap before
{Heap before GC invocations=
149
(full
3
):
PSYoungGen total
194560
K, used
193984
K [
0x00000000eab00000
,
0x00000000f8e80000
,
0x0000000100000000
)
eden space
193536
K,
100
% used [
0x00000000eab00000
,
0x00000000f6800000
,
0x00000000f6800000
)
from space
1024
K,
43
% used [
0x00000000f8c80000
,
0x00000000f8cf0000
,
0x00000000f8d80000
)
to space
1024
K,
0
% used [
0x00000000f8d80000
,
0x00000000f8d80000
,
0x00000000f8e80000
)
ParOldGen total
360448
K, used
47161
K [
0x00000000c0000000
,
0x00000000d6000000
,
0x00000000eab00000
)
object space
360448
K,
13
% used [
0x00000000c0000000
,
0x00000000c2e0e518
,
0x00000000d6000000
)
Metaspace used
79243
K, capacity
80500
K, committed
80680
K, reserved
1120256
K
class space used
9400
K, capacity
9645
K, committed
9768
K, reserved
1048576
K
Event:
476166.985
GC heap after
Heap after GC invocations=
149
(full
3
):
PSYoungGen total
190464
K, used
448
K [
0x00000000eab00000
,
0x00000000f8e00000
,
0x0000000100000000
)
eden space
189952
K,
0
% used [
0x00000000eab00000
,
0x00000000eab00000
,
0x00000000f6480000
)
from space
512
K,
87
% used [
0x00000000f8d80000
,
0x00000000f8df0000
,
0x00000000f8e00000
)
to space
1024
K,
0
% used [
0x00000000f8c00000
,
0x00000000f8c00000
,
0x00000000f8d00000
)
ParOldGen total
360448
K, used
47161
K [
0x00000000c0000000
,
0x00000000d6000000
,
0x00000000eab00000
)
object space
360448
K,
13
% used [
0x00000000c0000000
,
0x00000000c2e0e518
,
0x00000000d6000000
)
Metaspace used
79243
K, capacity
80500
K, committed
80680
K, reserved
1120256
K
class space used
9400
K, capacity
9645
K, committed
9768
K, reserved
1048576
K
}
... 省略
同樣是記錄10次 GC。
1|9jvm 內存映射
Dynamic libraries:
00400000
-
00401000
r-xp
00000000
fd:
00
2108521
/usr/java/jdk1.8.0_45/bin/java
00600000
-
00601000
rw-p
00000000
fd:
00
2108521
/usr/java/jdk1.8.0_45/bin/java
019e9000
-04f5e000 rw-p
00000000
00
:
00
0
[heap]
c0000000-d6000000 rw-p
00000000
00
:
00
0
d6000000-eab00000 ---p
00000000
00
:
00
0
...
7f3d6df48000-7f3d6df6a000 r--s
0038e000
fd:
00
2108900
/usr/java/jdk1.8.0_45/jre/lib/ext/cldrdata.jar
7f3d6df6a000-7f3d6df73000 r--s 07db3000 fd:
00
2374798
/opt/risk/service/xxx-xxx-container/xxx-xxxx-container.jar
...
這些信息是虛擬機崩潰時的虛擬內存列表區域。它可以告訴你崩潰原因時哪些類庫正在被使用,位置在哪里,還有堆棧和守護頁信息。
00400000-00401000:內存區域
r-xp:權限,r/w/x/p/s分別表示讀/寫/執行/私有/共享
00000000:文件內的偏移量
fd:00:文件位置的majorID和minorID
2108521:索引節點號
/usr/java/jdk1.8.0_45/bin/java:文件位置
從
/opt/risk/service/xxx-xxx-container/xxx-xxxx-container.jar
我們可以確認是那個jar出問題了。
1|10jvm 啟動參數
VM Arguments:
jvm_args: -Xmx1024M -Xms512M -XX:PermSize=
128
M -XX:MaxPermSize=
256
M
java_command: /opt/risk/service/xxx-xxx-xxx-container/xxx-xxx-xxx-container.jar
java_class_path (initial): /opt/risk/service/xxx-xxx-xxx-container/xxx-xxx-xxx-container.jar
Launcher Type: SUN_STANDARD
Environment Variables:
JAVA_HOME=/usr/java/jdk1
.8
.0
_45
CLASSPATH=.:/usr/java/jdk1
.8
.0
_45/lib/dt.jar:/usr/java/jdk1
.8
.0
_45/lib/tools.jar
PATH=/usr/java/jdk1
.8
.0
_45/bin:/bin:/usr/local/erlang/bin:/usr/local/maven3/bin:/usr/local/git/bin:/usr/java/jdk1
.8
.0
_45/bin:/bin:/usr/local/erlang/bin:/usr/local/maven3/bin:/usr/local/git/bin:/usr/lib64/qt-
3.3
/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/root/bin
SHELL=/bin/bash
jvm 虛擬機參數和環境變量。