Monitoring JVMs (Microservice Processes) with Zabbix



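The old approach described here did not work very well, so I wrote a new template and script. See:

http://www.cnops.top/posts/748ad64f.html

If you are interested in the original approach, read on.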


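1. Zabbix Server configuration

The Zabbix server needs Java installed, and Zabbix must be built with the --enable-java configure option.

The configure options used for this installation were: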

./configure --prefix=/data/zabbix/ --enable-server --enable-agent --with-mysql --enable-ipv6 --with-net-snmp --with-libcurl --with-libxml2 --enable-java

2. Zabbix Agent configuration

The agent host needs zabbix_sender in addition to zabbix_agentd. Pick a suitable version from http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/.

Install it:

rpm -ivh http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/zabbix-sender-3.0.9-1.el7.x86_64.rpm

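3. How the monitoring works

Characteristics of the microservices:

1. Each process is started directly with java -jar service.jar and does not depend on Tomcat or any other web application server.

2. The number of microservices on each server is not fixed; services can be added or removed freely.

3. Each microservice already has several ports configured in its startup parameters (for example --server.port and --management.port).

Given this, monitoring the microservices in the traditional way would mean frequently adding and removing items by hand in the web UI, and port management on the servers would become messy.

So low-level discovery is used to find the microservices automatically, and the metrics of each microservice are pushed to the Zabbix server with zabbix_sender.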
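
As a rough illustration of the push side, the sketch below builds the zabbix_sender argument list for a single metric. The paths and the key format java.discovery_status[<jar>,<metric>] are taken from the jstat_status.py script later in this article; the helper name build_sender_argv is hypothetical, and the block is written for Python 3 rather than the Python 2 used by the original scripts.

```python
# Sketch only: build the zabbix_sender argument list for one metric.
# SENDER and CONF mirror the paths used by jstat_status.py in this article;
# build_sender_argv is a hypothetical helper name.
SENDER = "/usr/bin/zabbix_sender"
CONF = "/etc/zabbix/zabbix_agentd.conf"


def build_sender_argv(jar_name, metric, value):
    """Return the argv list that would push one value to a trapper item."""
    key = "java.discovery_status[%s,%s]" % (jar_name, metric)
    return [SENDER, "-c", CONF, "-k", key, "-o", str(value)]


if __name__ == "__main__":
    # Only prints the command; nothing is sent.
    print(" ".join(build_sender_argv("manageMiddle.jar", "YGC", 74)))
```

Passing the arguments as a list (rather than one shell string) means a jar name never has to be quoted for a shell.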

The Java version is JDK 1.8:

# java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

 

Information about each microservice is obtained mainly with jstat, for example:

 

# ps -ef|grep java
root     28131     1  0 11:17 ?        00:00:56 java -Xms100M -Xmx500M -Xmn150M -jar /data/work/service_jar/manageMiddle.jar --server.port=20000 --management.port=20001 --config.profile=test
root     28305     1  0 11:26 ?        00:00:51 java -Xms100M -Xmx300M -Xmn100M -jar /data/work/service_jar/resourceService.jar --server.port=18000 --management.port=18001 --config.profile=test
root     29067     1  0 11:59 ?        00:00:54 java -Xms100M -Xmx500M -Xmn150M -jar /data/work/service_jar/systemService.jar --server.port=21000 --management.port=21001 --config.profile=test
root     31345 29980  0 14:03 pts/0    00:00:00 grep --color=auto java
# jstat -gcutil 28131
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT   
 67.75   0.00  74.28  81.92  97.29  94.90     74    1.248     7    1.065    2.313
# jstat -gc 28131
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT   
14336.0 14848.0 9712.2  0.0   122880.0 91488.9   55808.0    45716.6   56704.0 55169.1 7296.0 6924.1     74    1.248   7      1.065    2.313
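
The two-line jstat output (a header row and a value row) maps naturally onto a dictionary, which is also how the collection script later in this article reads it. A minimal Python 3 sketch, using the -gcutil sample above as input; parse_jstat is a name chosen here for illustration:

```python
def parse_jstat(output):
    """Map each jstat column name to its value (values are kept as strings)."""
    lines = [line for line in output.splitlines() if line.strip()]
    header, values = lines[0].split(), lines[1].split()
    return dict(zip(header, values))


# Sample copied from the `jstat -gcutil 28131` output above.
SAMPLE = """\
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
 67.75   0.00  74.28  81.92  97.29  94.90     74    1.248     7    1.065    2.313
"""

if __name__ == "__main__":
    stats = parse_jstat(SAMPLE)
    print(stats["O"], stats["YGC"])  # prints: 81.92 74
```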

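Meaning of the columns in the output (jstat reports sizes in KB):

S0C: capacity of the first survivor space in the young generation (KB)
S1C: capacity of the second survivor space in the young generation (KB)
S0U: space currently used in the first survivor space (KB)
S1U: space currently used in the second survivor space (KB)
EC: capacity of Eden in the young generation (KB)
EU: space currently used in Eden (KB)
OC: capacity of the old generation (KB)
OU: space currently used in the old generation (KB)
PC: capacity of the permanent generation, Perm (KB)
PU: space currently used in the permanent generation (KB)
YGC: number of young-generation GCs from application start to the time of sampling
YGCT: time spent in young-generation GCs from application start to the time of sampling (s)
FGC: number of full GCs from application start to the time of sampling
FGCT: time spent in full GCs from application start to the time of sampling (s)
GCT: total time spent in GC from application start to the time of sampling (s)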

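NGCMN: minimum (initial) capacity of the young generation (KB)
NGCMX: maximum capacity of the young generation (KB)
NGC: current capacity of the young generation (KB)
OGCMN: minimum (initial) capacity of the old generation (KB)
OGCMX: maximum capacity of the old generation (KB)
OGC: current capacity of the old generation (KB)
PGCMN: minimum (initial) capacity of the permanent generation (KB)
PGCMX: maximum capacity of the permanent generation (KB)
PGC: current capacity of the permanent generation (KB)
S0: used percentage of the first survivor space's current capacity
S1: used percentage of the second survivor space's current capacity
E: used percentage of Eden's current capacity
O: used percentage of the old generation's current capacity
P: used percentage of the permanent generation's current capacity
S0CMX: maximum capacity of the first survivor space (KB)
S1CMX: maximum capacity of the second survivor space (KB)
ECMX: maximum capacity of Eden (KB)
DSS: desired survivor size (KB)
TT: tenuring threshold
MTT: maximum tenuring threshold

JDK 1.8 removed the permanent generation (Perm), so the P, PC, PU and PGC* columns no longer appear. The sample output above shows the Metaspace columns instead (M, CCS, MC, MU, CCSC, CCSU).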
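
To make the units concrete, here is a small Python 3 sketch that turns the jstat -gc sample above into byte values. The heap formula (Eden + both survivor spaces + old generation, times 1024) is the one used by the collection script later in this article; reading MC and MU as a JDK 8 stand-in for the Perm columns is an assumption of this sketch, not something the original script does.

```python
KB = 1024  # jstat reports sizes in KB

# Values copied from the `jstat -gc 28131` sample above.
GC_SAMPLE = {
    "S0C": 14336.0, "S1C": 14848.0, "S0U": 9712.2, "S1U": 0.0,
    "EC": 122880.0, "EU": 91488.9, "OC": 55808.0, "OU": 45716.6,
    "MC": 56704.0, "MU": 55169.1,
}


def heap_usage(gc):
    """Return (used_bytes, capacity_bytes, used_percent) for the Java heap."""
    used = (gc["EU"] + gc["S0U"] + gc["S1U"] + gc["OU"]) * KB
    capacity = (gc["EC"] + gc["S0C"] + gc["S1C"] + gc["OC"]) * KB
    return used, capacity, used / capacity * 100


def metaspace_usage(gc):
    """Return (used_bytes, capacity_bytes) for Metaspace (assumed Perm stand-in)."""
    return gc["MU"] * KB, gc["MC"] * KB


if __name__ == "__main__":
    used, capacity, percent = heap_usage(GC_SAMPLE)
    print("heap: %.0f of %.0f bytes (%.2f%%)" % (used, capacity, percent))
    print("metaspace used/capacity bytes:", metaspace_usage(GC_SAMPLE))
```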

4. Monitoring scripts

All microservices are placed in one fixed directory. The script that discovers them is:

# cat java_discovery.py 
#!/usr/bin/python
# Discover the microservice jars on this server and print them as Zabbix LLD JSON.
import glob
import json
import os

java_names_file = 'java_names.txt'
jar_dir = '/data/work/service_jar'   # directory that holds the service jars

if os.path.isfile(java_names_file):
    # Optional override: one service name per line; newlines and
    # surrounding colons are stripped.
    with open(java_names_file) as f:
        names = [line.strip().strip(':') for line in f]
else:
    # Default: every *.jar file name in jar_dir (empty list if there are none).
    names = sorted(os.path.basename(p) for p in glob.glob(os.path.join(jar_dir, '*.jar')))

javas = [{'{#JAVA_NAME}': name} for name in names if name]
print(json.dumps({'data': javas}, indent=4, separators=(',', ':')))

 

The directory used in the script can be changed as needed.

 

The output is JSON:

# python java_discovery.py 
{
    "data":[
        {
            "{#JAVA_NAME}":"insuranceService.jar"
        },
        {
            "{#JAVA_NAME}":"manageMiddle.jar"
        },
        {
            "{#JAVA_NAME}":"resourceService.jar"
        },
        {
            "{#JAVA_NAME}":"systemService.jar"
        }
    ]
}
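
When debugging discovery it can help to confirm that the script output is valid JSON and carries the expected macro. A minimal Python 3 sketch under that assumption; lld_names is a hypothetical helper, and the sample is the output above shortened to two entries:

```python
import json

# Discovery output as printed above, shortened to two entries.
SAMPLE = """
{
    "data":[
        {"{#JAVA_NAME}":"manageMiddle.jar"},
        {"{#JAVA_NAME}":"systemService.jar"}
    ]
}
"""


def lld_names(text, macro="{#JAVA_NAME}"):
    """Parse LLD JSON and return the value of `macro` from every entry."""
    doc = json.loads(text)
    return [entry[macro] for entry in doc["data"]]


if __name__ == "__main__":
    print(lld_names(SAMPLE))
```

A json.loads error or a KeyError here points at malformed script output rather than at the Zabbix side.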

 

The script that collects the metrics of a microservice and sends them with zabbix_sender is:

 

# cat jstat_status.py
#!/usr/bin/python
  
import subprocess
import sys
import os
  
__maintainer__ = "Francis"
  
jps = '/data/jdk1.8/bin/jps'
jstat = '/data/jdk1.8/bin/jstat'
zabbix_sender = "/usr/bin/zabbix_sender"
zabbix_conf = "/etc/zabbix/zabbix_agentd.conf"      
send_to_zabbix = 1
ip=os.popen("ifconfig|grep 'inet '|grep -v '127.0'|xargs|awk -F '[ :]' '{print $3}'").readline().rstrip()
serverip="172.19.138.53"
  
#"{#JAVA_NAME}":"tomcat_web_1"
  
  
def usage():
    """Display program usage"""
  
    print "\nUsage : ", sys.argv[0], " java_name alive|all"
    print "Modes : \n\talive : Return pid of running process\n\tall : Send jstat stats as well"
    sys.exit(1)
  
  
class Jprocess:
  
    def __init__(self, arg):
        # Raw data: the process name, its PID and the jstat columns.
        self.pdict = {
            "jpname": arg,
        }

        # Values pushed to Zabbix. Perm_* stay 0 on JDK 1.8, which has no Perm space.
        self.zdict = {
            "Heap_used": 0,
            "Heap_ratio": 0,
            "Heap_max": 0,
            "Perm_used": 0,
            "Perm_ratio": 0,
            "Perm_max": 0,
            "S0_used": 0,
            "S0_ratio": 0,
            "S0_max": 0,
            "S1_used": 0,
            "S1_ratio": 0,
            "S1_max": 0,
            "Eden_used": 0,
            "Eden_ratio": 0,
            "Eden_max": 0,
            "Old_used": 0,
            "Old_ratio": 0,
            "Old_max": 0,
            "YGC": 0,
            "YGCT": 0,
            "YGCT_avg": 0,
            "FGC": 0,
            "FGCT": 0,
            "FGCT_avg": 0,
            "GCT": 0,
            "GCT_avg": 0,
        }
  
  
    def chk_proc(self):
        # Find the PID of the java process whose command line contains jpname,
        # e.g. ps -ef|grep java|grep manageMiddle.jar|awk '{print $2}'
        pidarg = '''ps -ef|grep java|grep %s|grep -v grep | grep -v jstat_status.py |awk '{print $2}' ''' % (self.pdict['jpname'])
        pids = subprocess.check_output(pidarg, shell=True).split()
        # Use the first match; an empty string means no process was found.
        self.pdict['pid'] = pids[0] if pids else ""
        return self.pdict['pid']
  
    def get_jstats(self):
        if self.pdict['pid'] == "":
            return False
        self.pdict.update(self.fill_jstats("-gc"))
        self.pdict.update(self.fill_jstats("-gccapacity"))
        self.pdict.update(self.fill_jstats("-gcutil"))
  
#        print "\nDumping collected stat dictionary\n-----\n", self.pdict, "\n-----\n"
  
    def fill_jstats(self, opts):
#        print "\nGetting", opts, "stats for process", self.pdict['pid'], "with command : sudo", jstat, opts, self.pdict['pid'] ,"\n"
#        jstatout = subprocess.Popen(['sudo','-u','tomcat', jstat, opts, self.pdict['pid']], stdout=subprocess.PIPE)
        #print([jstat, opts, self.pdict['pid']])
        jstatout = subprocess.Popen([jstat, opts, self.pdict['pid']], stdout=subprocess.PIPE)
        stdout, stderr = jstatout.communicate()
        legend, data = stdout.split('\n',1)
        mydict = dict(zip(legend.split(), data.split()))
        return mydict
  
    def compute_jstats(self):
        if self.pdict['pid'] == "":
            return False
        self.zdict['S0_used'] = format(float(self.pdict['S0U']) * 1024,'0.2f')
        self.zdict['S0_max'] =  format(float(self.pdict['S0C']) * 1024,'0.2f')
        self.zdict['S0_ratio'] = format(float(self.pdict['S0']),'0.2f')
 
        self.zdict['S1_used'] = format(float(self.pdict['S1U']) * 1024,'0.2f')
        self.zdict['S1_max'] = format(float(self.pdict['S1C']) * 1024,'0.2f')
        self.zdict['S1_ratio'] = format(float(self.pdict['S1']),'0.2f')
  
        self.zdict['Old_used'] = format(float(self.pdict['OU']) * 1024,'0.2f')
        self.zdict['Old_max'] =  format(float(self.pdict['OC']) * 1024,'0.2f')
        self.zdict['Old_ratio'] = format(float(self.pdict['O']),'0.2f')
 
        self.zdict['Eden_used'] = format(float(self.pdict['EU']) * 1024,'0.2f')
        self.zdict['Eden_max'] = format(float(self.pdict['EC']) * 1024,'0.2f')
        self.zdict['Eden_ratio'] = format(float(self.pdict['E']),'0.2f')            
# self.zdict['Perm_used'] = format(float(self.pdict['PU']) * 1024,'0.2f')
# self.zdict['Perm_max'] = format(float(self.pdict['PC']) * 1024,'0.2f')
# self.zdict['Perm_ratio'] = format(float(self.pdict['P']),'0.2f')
                 
        self.zdict['Heap_used'] = format((float(self.pdict['EU']) + float(self.pdict['S0U']) + float(self.pdict['S1U'])  + float(self.pdict['OU'])) * 1024,'0.2f')
        self.zdict['Heap_max'] = format((float(self.pdict['EC']) + float(self.pdict['S0C']) + float(self.pdict['S1C'])  + float(self.pdict['OC'])) * 1024,'0.2f')
        self.zdict['Heap_ratio'] = format(float(self.zdict['Heap_used']) / float(self.zdict['Heap_max'])*100,'0.2f')
 
        self.zdict['YGC'] = self.pdict['YGC']
        self.zdict['FGC'] = self.pdict['FGC']
        self.zdict['YGCT'] = format(float(self.pdict['YGCT']),'0.3f')
        self.zdict['FGCT'] = format(float(self.pdict['FGCT']),'0.3f')
        self.zdict['GCT'] = format(float(self.pdict['GCT']),'0.3f') 
     
        if self.pdict['YGC'] == '0':
           self.zdict['YGCT_avg'] = '0'
        else: 
           self.zdict['YGCT_avg'] = format(float(self.pdict['YGCT'])/float(self.pdict['YGC']),'0.3f')
        if self.pdict['FGC'] == '0':
           self.zdict['FGCT_avg'] = '0'
        else:
           self.zdict['FGCT_avg'] = format(float(self.pdict['FGCT'])/float(self.pdict['FGC']),'0.3f')
        if self.pdict['YGC'] == '0' and self.pdict['FGC'] == '0':
           self.zdict['GCT_avg'] = '0' 
        else:
           self.zdict['GCT_avg'] = format(float(self.pdict['GCT'])/(float(self.pdict['YGC']) + float(self.pdict['FGC'])),'0.3f') 
                   
  
       # print "Dumping zabbix stat dictionary\n-----\n", self.zdict, "\n-----\n"
  
    def send_to_zabbix(self, metric):
        # Trapper item key, e.g. java.discovery_status[manageMiddle.jar,Heap_used]
        key = "java.discovery_status[" + self.pdict['jpname'] + "," + metric + "]"

        if self.pdict['pid'] != "" and send_to_zabbix > 0:
            try:
                # FNULL is the os.devnull handle opened by the main block below.
                subprocess.call([zabbix_sender, "-c", zabbix_conf, "-k", key, "-o", str(self.zdict[metric])], stdout=FNULL, stderr=FNULL, shell=False)
            except OSError as detail:
                print "Something went wrong while executing zabbix_sender : ", detail
        else:
            print "Simulation: the following command would be executed :\n", zabbix_sender, "-c", zabbix_conf, "-k", key, "-o", self.zdict[metric], "\n"
  
  
accepted_modes = ['alive', 'all']
  
 
if len(sys.argv) == 3 and sys.argv[2] in accepted_modes:
    java_name = sys.argv[1]
    mode = sys.argv[2]
else:
    usage()
  
  
#Check if process is running / Get PID
jproc = Jprocess(java_name) 
pid = jproc.chk_proc()
  
  
if pid != "" and mode == 'all':
    # Collect the jstat columns, derive the metrics and push each one.
    jproc.get_jstats()
    jproc.compute_jstats()
    FNULL = open(os.devnull, 'w')
    for key in jproc.zdict:
        jproc.send_to_zabbix(key)
    FNULL.close()
elif pid != "":
    # alive mode: print the PID of the running process, as the usage text states
    print pid
else:
    print 0
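
The per-collection averages that compute_jstats derives (YGCT_avg, FGCT_avg, GCT_avg) can be checked by hand. The Python 3 sketch below repeats the same arithmetic with the counters from the jstat sample earlier in this article; gc_averages is a name chosen for illustration, and a zero count yields 0 to avoid dividing by zero, as in the script.

```python
def gc_averages(ygc, ygct, fgc, fgct, gct):
    """Return (young_avg, full_avg, overall_avg) in seconds per collection."""
    def avg(total_seconds, count):
        # A count of 0 means no collection has happened yet.
        return total_seconds / count if count else 0.0

    return avg(ygct, ygc), avg(fgct, fgc), avg(gct, ygc + fgc)


if __name__ == "__main__":
    # YGC=74, YGCT=1.248, FGC=7, FGCT=1.065, GCT=2.313 from the sample output.
    young, full, overall = gc_averages(74, 1.248, 7, 1.065, 2.313)
    print("%.3f %.3f %.3f" % (young, full, overall))  # prints: 0.017 0.152 0.029
```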

The script that triggers the collection is:

# cat java_discovery_status_sender.py 
#!/usr/bin/python
# Run jstat_status.py in "all" mode for every discovered microservice jar.
import glob
import os
import subprocess

java_names_file = 'java_names.txt'
jar_dir = '/data/work/service_jar'
status_script = '/etc/zabbix/scripts/java/jstat_status.py'

if os.path.isfile(java_names_file):
    # Optional override: one service name per line, surrounding colons stripped.
    with open(java_names_file) as f:
        names = [line.strip().strip(':') for line in f]
else:
    names = sorted(os.path.basename(p) for p in glob.glob(os.path.join(jar_dir, '*.jar')))

devnull = open(os.devnull, 'w')
for name in names:
    if name:
        # Arguments are passed as a list, so jar names never go through a shell.
        # One failing service does not stop the loop.
        subprocess.call(['/usr/bin/python', status_script, name, 'all'], stdout=devnull)
devnull.close()
        #print(out)

Note that the Host name configured in the web UI must be exactly the same as the Hostname parameter in the agent configuration file.
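
Because zabbix_sender is called with -c and the agent configuration file, the host name it reports comes from that file. The Python 3 sketch below shows one way to read the value so it can be compared with the web UI by eye; agent_hostname is a hypothetical helper, and the sample text and the host name web-01 are made up for illustration.

```python
# SAMPLE_CONF is made-up config text; "web-01" is a hypothetical host name.
SAMPLE_CONF = """\
# Hostname=commented-out
Server=172.19.138.53
Hostname=web-01
"""


def agent_hostname(conf_text):
    """Return the first uncommented Hostname= value in the text, or None."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "Hostname":
            return value.strip()
    return None


if __name__ == "__main__":
    print(agent_hostname(SAMPLE_CONF))  # prints: web-01
```

On a real agent host the text would be read from /etc/zabbix/zabbix_agentd.conf, the path used by the scripts in this article.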

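Add java_discovery_status_sender.py to the system crontab. The entry below includes a user field (root), so it belongs in /etc/crontab or in a file under /etc/cron.d: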

*/1 * * * * root /usr/bin/python /etc/zabbix/scripts/java/java_discovery_status_sender.py

This runs once a minute and sends the data to the server.

5. UserParameter configuration

The path and contents are as follows:

# pwd
/etc/zabbix/zabbix_agentd.d
# cat userparameter_java_discovery_status.conf 
UserParameter=java.discovery,/usr/bin/python /etc/zabbix/scripts/java/java_discovery.py
UserParameter=java.discovery_status[*],/usr/bin/python /etc/zabbix/scripts/java/jstat_status.py $1 $2
UserParameter=java.discovery_status_sender,/usr/bin/python /etc/zabbix/scripts/java/java_discovery_status_sender.py

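UserParameter=java.discovery_status[*] and UserParameter=java.discovery_status_sender do the same job in the same way. Both run correctly when executed on the agent host, but both fail when called from the server with zabbix_get, and I have not found a better solution yet. That is why the monitoring data is pushed to the server on a schedule from crontab. If you can solve this problem or know a better approach, please contact me.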

Restart the Zabbix agent.

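6. Import the template

Template download: https://download.csdn.net/download/fjp824/10462387

Import the template and link it to the corresponding hosts.

On the Monitoring > Latest data page, filter by host to see the values received by the Zabbix trapper items.

Once the graphs display correctly, the monitoring setup is complete.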

 

