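Recently I noticed that our Java applications were using a lot of memory and CPU. My first guess was a problem in the business code, so I raised it with the developers, who said the code was fine. Then I found that more than a dozen microservices showed the same symptom, which made me suspect the JVM parameters needed tuning (in practice, just a handful of startup flags). The developers couldn't solve it either, so I had to take it on myself.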
Here is a summary of the troubleshooting steps:
First, I wrote a script (included at the end of this post) to narrow down where the problem was. Its output was:
[1] Busy(3.2%) thread(30444/0x76ec) stack of java process(30435) under user(root):
"VM Thread" os_prio=0 tid=0x00007f16800de800 nid=0x76ec runnable
[2] Busy(3.0%) thread(30442/0x76ea) stack of java process(30435) under user(root):
"Gang worker#3 (Parallel GC Threads)" os_prio=0 tid=0x00007f1680021800 nid=0x76ea runnable
[3] Busy(3.0%) thread(30441/0x76e9) stack of java process(30435) under user(root):
"Gang worker#2 (Parallel GC Threads)" os_prio=0 tid=0x00007f1680020000 nid=0x76e9 runnable
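You can also do the core of the script (full listing at the end) by hand for a single thread. `ps` and `top` report thread ids in decimal, but jstack prints them as a hex `nid`, so thread 30444 above appears as `nid=0x76ec`.

Below is a minimal sketch of that lookup. `to_nid` and `thread_block` are hypothetical helper names. The sketch assumes a HotSpot jstack dump in which each thread's entry ends with a blank line.

```shell
#!/bin/bash
# Hypothetical helpers for looking up one busy thread in a jstack dump by hand.

# Convert a decimal thread id (LWP, as shown by ps -L or top -H) to the
# hex "nid" format that jstack prints, e.g. 30444 -> 0x76ec.
to_nid() {
  printf '0x%x\n' "$1"
}

# Read a jstack dump on stdin and print the entry of one thread:
# from the line containing "nid=<hex> " up to the next blank line.
# The trailing space keeps 0x76ec from also matching e.g. 0x76ec1.
thread_block() {
  local nid
  nid=$(to_nid "$1")
  sed -n "/nid=${nid} /,/^$/p"
}
```

Usage: `jstack 30435 | thread_block 30444`, run as the same user as the Java process. Without the script, `top -H -p <pid>` shows per-thread CPU usage, which gives you the thread ids to look up.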
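The output shows that "VM Thread" is the thread consuming the most CPU. VM Thread is an internal JVM thread, not an application thread. It carries out operations the JVM must perform at a safepoint, including stop-the-world garbage collection. The next threads in the list, "Gang worker#N (Parallel GC Threads)", are parallel GC worker threads, which confirms that the JVM is doing a lot of GC work. So the cause is already fairly clear: heavy GC activity is slowing the application down. To find out what was driving so much GC, we used the jstat command to check memory usage: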
The output showed that Full GCs (FGC) were very frequent, and the total GC time (GCT) was long as well.
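To watch these counters yourself, `jstat -gcutil <pid> 1000 10` samples GC statistics every 1000 ms, 10 times. In that output:

- FGC is the Full GC count.
- FGCT is the time spent in Full GC.
- GCT is the total GC time, in seconds.

The helper below is a hypothetical sketch that extracts those columns by header name rather than by position. The exact column set varies between JDK versions, so check it against your own jstat output.

```shell
#!/bin/bash
# Hypothetical helper: read "jstat -gcutil" output on stdin and print the
# Full GC count (FGC), Full GC time (FGCT) and total GC time (GCT) per sample.
# Columns are located by header name; a repeated header line (jstat -h)
# simply refreshes the column map instead of being printed as data.
gc_summary() {
  awk '$1 == "S0" { for (i = 1; i <= NF; i++) col[$i] = i; next }
       { print "FGC=" $(col["FGC"]), "FGCT=" $(col["FGCT"]), "GCT=" $(col["GCT"]) }'
}
```

Usage: `jstat -gcutil <pid> 1000 10 | gc_summary`. An FGC value that keeps climbing between samples is the symptom described above.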
Next, let's look at how the heap is divided between the young and old generations (this is the output of jmap -heap):
Attaching to process ID 22651, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.211-b12

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 536870912 (512.0MB)
   NewSize                  = 503316480 (480.0MB)
   MaxNewSize               = 503316480 (480.0MB)
   OldSize                  = 33554432 (32.0MB)
   NewRatio                 = 2
   SurvivorRatio            = 4
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 419430400 (400.0MB)
   used     = 297926560 (284.1249084472656MB)
   free     = 121503840 (115.87509155273438MB)
   71.0312271118164% used
Eden Space:
   capacity = 335544320 (320.0MB)
   used     = 297926560 (284.1249084472656MB)
   free     = 37617760 (35.875091552734375MB)
   88.78903388977051% used
From Space:
   capacity = 83886080 (80.0MB)
   used     = 0 (0.0MB)
   free     = 83886080 (80.0MB)
   0.0% used
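From OldSize we can see that the old generation is only 32 MB, while NewSize is 480 MB. The young generation was set far too large and the old generation far too small, which is why Full GCs were so frequent.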
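You can also judge this from the New Generation and Eden Space sections. Ideally the used percentage is around 50%, and if it is far off from that, something is wrong as well.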
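The arithmetic behind this diagnosis can be sketched as follows. Because -Xmn pinned NewSize and MaxNewSize at 480 MB, the old generation only gets what is left of the 512 MB MaxHeapSize.

`heap_split` is a hypothetical helper. It assumes the JDK 8 `jmap -heap` layout shown above (`Name = bytes (MB)`) and that NewSize equals MaxNewSize.

```shell
#!/bin/bash
# Hypothetical helper: read JDK 8 "jmap -heap <pid>" output on stdin and
# show how MaxHeapSize is split between the young generation (NewSize)
# and what remains for the old generation. Assumes NewSize == MaxNewSize,
# as it is when the young generation is pinned with -Xmn.
heap_split() {
  awk '$1 == "MaxHeapSize" { max = $3 }
       $1 == "NewSize"     { young = $3 }
       END {
         mb = 1024 * 1024
         printf "young=%dMB old=%dMB\n", young / mb, (max - young) / mb
       }'
}
```

Usage: for the configuration above, `jmap -heap 22651 | heap_split` prints `young=480MB old=32MB`.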
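Finally, I reset -Xmn from 480m to 200m, and the problem went away. The target young:old generation ratio this time was about 1:2. (If the heap stays at 512 MB, -Xmn200m leaves 312 MB for the old generation, which is closer to 1:1.5.)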
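Here is a sketch of the two common ways to size the generations; the helper names are hypothetical.

- **Explicit -Xmn:** fixes the young generation size, and the old generation gets the rest of -Xmx.
- **-XX:NewRatio=N:** sizes the old generation at N times the young one, so NewRatio=2 means young:old = 1:2.

An explicit -Xmn takes precedence over NewRatio. That is why the jmap output above reports NewRatio = 2 alongside a 480 MB young generation.

```shell
#!/bin/bash
# Hypothetical helpers for the young/old split of a heap, sizes in MB.

# Explicit young generation size (-Xmn): the old generation gets the rest.
split_by_xmn() {
  local heap_mb=$1 young_mb=$2
  echo "young=${young_mb}MB old=$((heap_mb - young_mb))MB"
}

# -XX:NewRatio=N: old = N * young, so young = heap / (N + 1).
# Integer MB arithmetic only; the JVM aligns sizes, so real values may differ slightly.
split_by_newratio() {
  local heap_mb=$1 ratio=$2
  local young_mb=$((heap_mb / (ratio + 1)))
  echo "young=${young_mb}MB old=$((heap_mb - young_mb))MB"
}
```

For example:

- `split_by_xmn 512 200` prints `young=200MB old=312MB`.
- `split_by_newratio 512 2` prints `young=170MB old=342MB`.

A matching launch line would be `java -Xmx512m -Xmn200m -jar app.jar`, with `app.jar` as a placeholder.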
This is about the simplest kind of GC problem there is.
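To summarize, the causes of a Full GC are:
(1) Calling System.gc(). This only suggests that the JVM run a Full GC; it is not guaranteed to happen.
(2) Insufficient old-generation space. If the old generation is full and no GC runs, the next step is an OOM. (This is also where the terms Major GC and Full GC tend to get mixed up.)
(3) Insufficient method-area space (in JDK 8 the method area is implemented as Metaspace).
(4) The average size of objects promoted to the old generation by Minor GCs is larger than the old generation's available memory.
(5) Eden and survivor space 0 (From Space) are copied to survivor space 1 (To Space). An object larger than the free space in To Space is moved to the old generation, and if the old generation's free memory is smaller than that object, a Full GC follows. This is really just another case of insufficient old-generation space.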
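The GC log records a cause for each collection, so it shows which of these causes is actually firing. On JDK 8, you can enable logging with `-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<file>`. Full GC entries then look like `[Full GC (Allocation Failure) ...` or `[Full GC (System.gc()) ...`.

The hypothetical helper below counts Full GCs per cause. It assumes that log format and is only a sketch. For cause (1), `-XX:+DisableExplicitGC` makes the JVM ignore System.gc() calls.

```shell
#!/bin/bash
# Hypothetical helper: count Full GC events per cause in a JDK 8 GC log
# read on stdin (assumes -XX:+PrintGCDetails style entries such as
# "[Full GC (Allocation Failure) ..." or "[Full GC (System.gc()) ...").
full_gc_causes() {
  # "\)+" also swallows the double ")" at the end of "System.gc())".
  grep -oE 'Full GC \([^)]*\)+' | sort | uniq -c | sort -rn
}
```

Usage: `full_gc_causes < gc.log`. The most frequent cause is listed first.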
A few good troubleshooting blog posts:
https://blog.csdn.net/ym15229994318ym/article/details/106525945
https://www.cnblogs.com/three-fighter/p/14644152.html # this one is good
The script:
#!/bin/bash
readonly PROG=`basename $0`
readonly -a COMMAND_LINE=("$0" "$@")
usage() {
cat <<EOF
Usage: ${PROG} [OPTION]...
Find out the highest cpu consumed threads of java, and print the stack of these threads.
Example: ${PROG} -c 10
Options:
-p, --pid find out the highest cpu consumed threads from the specified java process,
default from all java processes.
-c, --count set the thread count to show, default is 5
-h, --help display this help and exit
EOF
exit $1
}
readonly ARGS=`getopt -n "$PROG" -a -o c:p:h -l count:,pid:,help -- "$@"`
[ $? -ne 0 ] && usage 1
eval set -- "${ARGS}"
while true; do
case "$1" in
-c|--count)
count="$2"
shift 2
;;
-p|--pid)
pid="$2"
shift 2
;;
-h|--help)
usage
;;
--)
shift
break
;;
esac
done
count=${count:-5}
redEcho() {
[ -c /dev/stdout ] && {
# if stdout is console, turn on color output.
echo -ne "\033[1;31m"
echo -n "$@"
echo -e "\033[0m"
} || echo "$@"
}
yellowEcho() {
[ -c /dev/stdout ] && {
# if stdout is console, turn on color output.
echo -ne "\033[1;33m"
echo -n "$@"
echo -e "\033[0m"
} || echo "$@"
}
blueEcho() {
[ -c /dev/stdout ] && {
# if stdout is console, turn on color output.
echo -ne "\033[1;36m"
echo -n "$@"
echo -e "\033[0m"
} || echo "$@"
}
# Check the existence of jstack command!
if ! which jstack &> /dev/null; then
[ -z "$JAVA_HOME" ] && {
redEcho "Error: jstack not found on PATH!"
exit 1
}
! [ -f "$JAVA_HOME/bin/jstack" ] && {
redEcho "Error: jstack not found on PATH and $JAVA_HOME/bin/jstack file does NOT exist!"
exit 1
}
! [ -x "$JAVA_HOME/bin/jstack" ] && {
redEcho "Error: jstack not found on PATH and $JAVA_HOME/bin/jstack is NOT executable!"
exit 1
}
export PATH="$JAVA_HOME/bin:$PATH"
fi
readonly uuid=`date +%s`_${RANDOM}_$$
cleanupWhenExit() {
rm /tmp/${uuid}_* &> /dev/null
}
trap "cleanupWhenExit" EXIT
printStackOfThreads() {
local line
local count=1
while IFS=" " read -a line ; do
local pid=${line[0]}
local threadId=${line[1]}
# jstack prints the native thread id (nid) in hex, while ps prints the LWP id in decimal
local threadId0x="0x`printf %x ${threadId}`"
local user=${line[2]}
local pcpu=${line[4]}
local jstackFile=/tmp/${uuid}_${pid}
[ ! -f "${jstackFile}" ] && {
{
if [ "${user}" == "${USER}" ]; then
jstack ${pid} > ${jstackFile}
else
if [ $UID == 0 ]; then
sudo -u ${user} jstack ${pid} > ${jstackFile}
else
redEcho "[$((count++))] Fail to jstack Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user})."
redEcho "User of java process($user) is not current user($USER), need sudo to run again:"
yellowEcho " sudo ${COMMAND_LINE[@]}"
echo
continue
fi
fi
} || {
redEcho "[$((count++))] Fail to jstack Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user})."
echo
rm ${jstackFile}
continue
}
}
blueEcho "[$((count++))] Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user}):"
# Print this thread's stack: from the line with its nid up to the next blank line
sed -n "/nid=${threadId0x} /,/^$/p" "${jstackFile}"
done
}
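# List all threads with CPU usage, keep java ones (or only the -p process), show the busiest
ps -Leo pid,lwp,user,comm,pcpu --no-headers | {
if [ -z "${pid}" ]; then
awk '$4=="java"{print $0}'
else
# "&&" requires both conditions; "A,B" would be an awk range pattern instead
awk -v "pid=${pid}" '$1==pid && $4=="java"{print $0}'
fi
} | sort -k5 -r -n | head --lines "${count}" | printStackOfThreads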