1. Introduction
Environment
Roles (all components run on one host):
172.16.133.82  InfluxDB
172.16.133.82  Grafana
172.16.133.82  jmxtrans
172.16.133.82  Kafka (node1)
Software versions:
influxdb-1.7.7.x86_64.rpm
grafana-6.2.5-1.x86_64.rpm
jmxtrans-266.rpm
kafka_2.12-0.10.2.1
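2. Configuration Plan
- jmxtrans can be deployed on every Kafka node or on a single machine. This setup uses a single machine: the cluster is small, and this keeps all configuration files in one place. For a large cluster, consider deploying jmxtrans on each node instead.
- The jmxtrans configuration files are split into global metrics (per Kafka node) and topic metrics. Global metrics use one file per node, named base_<node IP>.json, e.g. base_172.16.133.82.json. Topic metrics use one file per topic, e.g. falcon_monitor_us_82.json.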
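The two naming rules can be expressed as a small Python sketch. It assumes that the suffix 82 in falcon_monitor_us_82.json is the last octet of the broker IP. The example suggests this, but the text does not state it. The function names are hypothetical.

```python
import ipaddress


def base_config_name(broker_ip: str) -> str:
    """Global (per-broker) metrics file: base_<broker IP>.json."""
    ipaddress.IPv4Address(broker_ip)  # raises ValueError on a malformed address
    return f"base_{broker_ip}.json"


def topic_config_name(topic: str, broker_ip: str) -> str:
    """Per-topic metrics file: <topic>_<suffix>.json.

    Assumption: the suffix is the last octet of the broker IP,
    matching the example falcon_monitor_us_82.json.
    """
    last_octet = ipaddress.IPv4Address(broker_ip).packed[-1]
    return f"{topic}_{last_octet}.json"


if __name__ == "__main__":
    print(base_config_name("172.16.133.82"))                        # base_172.16.133.82.json
    print(topic_config_name("falcon_monitor_us", "172.16.133.82"))  # falcon_monitor_us_82.json
```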
3. Monitored Metrics
Global metrics
Incoming bytes per second
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec", "attr" : [ "Count" ], "resultAlias" : "BytesInPerSec", "tags" : {"application" : "BytesInPerSec"}
Outgoing bytes per second
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec", "attr" : [ "Count" ], "resultAlias" : "BytesOutPerSec", "tags" : {"application" : "BytesOutPerSec"}
Rejected bytes per second
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec", "attr" : [ "Count" ], "resultAlias" : "BytesRejectedPerSec", "tags" : {"application" : "BytesRejectedPerSec"}
Messages written per second
"obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec", "attr" : [ "Count" ], "resultAlias" : "MessagesInPerSec", "tags" : {"application" : "MessagesInPerSec"}
FetchFollower requests per second
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchFollower", "attr" : [ "Count" ], "resultAlias" : "RequestsPerSec", "tags" : {"request" : "FetchFollower"}
FetchConsumer requests per second
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer", "attr" : [ "Count" ], "resultAlias" : "RequestsPerSec", "tags" : {"request" : "FetchConsumer"}
Produce requests per second
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce", "attr" : [ "Count" ], "resultAlias" : "RequestsPerSec", "tags" : {"request" : "Produce"}
Memory usage
"obj" : "java.lang:type=Memory", "attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage" ], "resultAlias" : "MemoryUsage", "tags" : {"application" : "MemoryUsage"}
GC time and count
"obj" : "java.lang:type=GarbageCollector,name=*", "attr" : [ "CollectionCount", "CollectionTime" ], "resultAlias" : "GC", "tags" : {"application" : "GC"}
Thread usage
"obj" : "java.lang:type=Threading", "attr" : [ "PeakThreadCount", "ThreadCount" ], "resultAlias" : "Thread", "tags" : {"application" : "Thread"}
Maximum lag, in messages, between the follower replicas and the leader
"obj" : "kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica", "attr" : [ "Value" ], "resultAlias" : "ReplicaFetcherManager", "tags" : {"application" : "MaxLag"}
Number of partitions on this broker
"obj" : "kafka.server:type=ReplicaManager,name=PartitionCount", "attr" : [ "Value" ], "resultAlias" : "ReplicaManager", "tags" : {"application" : "PartitionCount"}
Number of under-replicated partitions
"obj" : "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions", "attr" : [ "Value" ], "resultAlias" : "ReplicaManager", "tags" : {"application" : "UnderReplicatedPartitions"}
Number of partitions for which this broker is the leader
"obj" : "kafka.server:type=ReplicaManager,name=LeaderCount", "attr" : [ "Value" ], "resultAlias" : "ReplicaManager", "tags" : {"application" : "LeaderCount"}
Total time of a FetchConsumer request (Count = number of requests, Max = longest time in ms)
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer", "attr" : [ "Count", "Max" ], "resultAlias" : "TotalTimeMs", "tags" : {"application" : "FetchConsumer"}
Total time of a FetchFollower request
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower", "attr" : [ "Count", "Max" ], "resultAlias" : "TotalTimeMs", "tags" : {"application" : "FetchFollower"}
Total time of a Produce request
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce", "attr" : [ "Count", "Max" ], "resultAlias" : "TotalTimeMs", "tags" : {"application" : "Produce"}
Topic metrics
Incoming bytes per second for falcon_monitor_us
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=falcon_monitor_us", "attr" : [ "Count" ], "resultAlias" : "falcon_monitor_us", "tags" : {"application" : "BytesInPerSec"}
Outgoing bytes per second for falcon_monitor_us
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=falcon_monitor_us", "attr" : [ "Count" ], "resultAlias" : "falcon_monitor_us", "tags" : {"application" : "BytesOutPerSec"}
Messages written per second for falcon_monitor_us
"obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=falcon_monitor_us", "attr" : [ "Count" ], "resultAlias" : "falcon_monitor_us", "tags" : {"application" : "MessagesInPerSec"}
Log end offset of each falcon_monitor_us partition
"obj" : "kafka.log:type=Log,name=LogEndOffset,topic=falcon_monitor_us,partition=*", "attr" : [ "Value" ], "resultAlias" : "falcon_monitor_us", "tags" : {"application" : "LogEndOffset"}
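Parameter reference
obj
The JMX ObjectName of the MBean to monitor.
attr
The attributes of that ObjectName to read, i.e. the values of the metric.
resultAlias
The metric name; in InfluxDB it becomes the MEASUREMENT name.
tags
Maps to InfluxDB tags, which distinguish different metrics stored in the same MEASUREMENT. Grafana panels filter on them, so tagging every metric is recommended.
For global metrics, each resultAlias maps to one MEASUREMENT, and every Kafka node writes that metric into the same MEASUREMENT. Metrics that share a resultAlias, such as the three RequestsPerSec queries, share one MEASUREMENT and are told apart by their tags. For topic metrics, all Kafka nodes write a topic's metrics into one MEASUREMENT named after the topic.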
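To make the mapping concrete, the Python sketch below renders one sample as an InfluxDB 1.x line-protocol point. The resultAlias becomes the measurement, the tags become tags, and the attribute values become fields. It is illustrative only: the host tag and the field names are assumptions. jmxtrans's InfluxDB writer adds its own tags and field names, which may differ.

```python
from typing import Dict, Optional


def _escape(s: str, chars: str) -> str:
    """Backslash-escape the characters that are special at a given line-protocol position."""
    for ch in chars:
        s = s.replace(ch, "\\" + ch)
    return s


def to_line_protocol(result_alias: str, tags: Dict[str, str],
                     fields: Dict[str, float], ts_ns: Optional[int] = None) -> str:
    """Format one point as: measurement,tag=value,... field=value,... [timestamp]."""
    measurement = _escape(result_alias, ", ")
    # Tags sorted by key, as InfluxDB recommends.
    tag_part = "".join(
        f",{_escape(k, ',= ')}={_escape(v, ',= ')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{_escape(k, ',= ')}={v}" for k, v in fields.items())
    line = f"{measurement}{tag_part} {field_part}"
    return line if ts_ns is None else f"{line} {ts_ns}"


if __name__ == "__main__":
    point = to_line_protocol("RequestsPerSec",
                             {"request": "Produce", "host": "172.16.133.82"},
                             {"Count": 12345.0})
    print(point)  # RequestsPerSec,host=172.16.133.82,request=Produce Count=12345.0
```

Because RequestsPerSec holds FetchFollower, FetchConsumer and Produce points side by side, a Grafana query has to filter on the request tag to draw just one of them.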
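4. Installation and Configuration
Kafka
jmxtrans collects Kafka metrics over JMX, so Kafka must be started with a JMX port open:
cd /data/kafka/bin/
JMX_PORT=9999 nohup ./kafka-server-start.sh ../config/server.properties >/dev/null 2>&1 &
Alternatively, find the heap settings in the startup script kafka-server-start.sh and add export JMX_PORT="9999":
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
    export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
    export JMX_PORT="9999"
fi
Because the export sits inside this if block, JMX_PORT is only set when KAFKA_HEAP_OPTS is not already defined in the environment.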
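Before moving on to jmxtrans, it can help to confirm that the broker is actually listening on the JMX port. Below is a minimal Python sketch using a plain TCP connect. It assumes the host and port 9999 from this setup. A successful connect only shows that the port is open, not that a full JMX/RMI session will succeed.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


if __name__ == "__main__":
    # Broker address and JMX port from the setup above; adjust as needed.
    print(port_is_open("172.16.133.82", 9999))
```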
InfluxDB
Create the jmxDB database and a 72-hour default retention policy:
[devuser@annie thirdparties]$ influx
Connected to http://localhost:8086 version 1.6.2
InfluxDB shell version: 1.7.7
> CREATE DATABASE "jmxDB"
> create retention policy "72_hour" on jmxDB duration 72h replication 1 DEFAULT
>
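The same two statements can also be sent without the interactive shell. The InfluxDB 1.x HTTP /query endpoint takes the statement in the q parameter of a POST. Below is a standard-library Python sketch, assuming InfluxDB listens on localhost:8086 with authentication disabled.

```python
import json
import urllib.parse
import urllib.request

INFLUX_URL = "http://localhost:8086/query"  # assumption: local InfluxDB 1.x, no auth

STATEMENTS = [
    'CREATE DATABASE "jmxDB"',
    'CREATE RETENTION POLICY "72_hour" ON "jmxDB" DURATION 72h REPLICATION 1 DEFAULT',
]


def build_request(statement: str, url: str = INFLUX_URL) -> urllib.request.Request:
    """POST the InfluxQL statement as a form-encoded q parameter."""
    body = urllib.parse.urlencode({"q": statement}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def run(statement: str) -> dict:
    """Send one statement and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(statement), timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for stmt in STATEMENTS:
        try:
            print(stmt, "->", run(stmt))
        except OSError as exc:  # URLError is a subclass of OSError
            print(stmt, "-> failed:", exc)
```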
jmxtrans
# Check whether jmxtrans is already installed
rpm -qa | grep jmx
# Uninstall it
rpm -e jmxXXXXXX
# Download
wget https://github.com/downloads/jmxtrans/jmxtrans/jmxtrans-20121016.145842.6a28c97fbb-0.noarch.rpm
# Install
rpm -ivh jmxtrans-20121016.145842.6a28c97fbb-0.noarch.rpm
# Start (configure the JSON files under /var/lib/jmxtrans first; must be run as root)
/etc/init.d/jmxtrans start
# or
./jmxtrans.sh start
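Notes:
The directories below are only defaults. jmxtrans.sh start does not use them by default, and /etc/init.d/jmxtrans start produces some errors.
jmxtrans install directory: /usr/share/jmxtrans
jmxtrans configuration file: /etc/sysconfig/jmxtrans
Default directory for JSON configs: /var/lib/jmxtrans/
Create json and logs directories in the install directory: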
cd /usr/share/jmxtrans
mkdir json
mkdir logs
Starting jmxtrans with /etc/init.d/jmxtrans start failed with the following errors:
Error 1:
Caused by: java.lang.IllegalArgumentException: Invalid type id 'com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory' (for id type 'Id.class'): no such class found
    at org.codehaus.jackson.map.jsontype.impl.ClassNameIdResolver.typeFromId(ClassNameIdResolver.java:89)
    at org.codehaus.jackson.map.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:73)
    at org.codehaus.jackson.map.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:65)
    at org.codehaus.jackson.map.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:81)
    at org.codehaus.jackson.map.deser.CollectionDeserializer.deserialize(CollectionDeserializer.java:118)
Solution:
Get the source from the GitHub repository linked on the official site and rebuild it. Then edit jmxtrans.conf and jmxtrans.sh so that they use the newly built jar instead of the bundled one:
git clone https://github.com/jmxtrans/jmxtrans.git
cd jmxtrans
mvn clean package -Dmaven.test.skip=true -DskipTests=true
# copy the built jmxtrans-271-all.jar into /usr/share/jmxtrans, then:
cd /usr/share/jmxtrans
vim jmxtrans.conf
#export JAR_FILE="/usr/share/jmxtrans/jmxtrans-all.jar"
export JAR_FILE="/usr/share/jmxtrans/jmxtrans-271-all.jar"
vim jmxtrans.sh
#JAR_FILE=${JAR_FILE:-"jmxtrans-all.jar"}
JAR_FILE=${JAR_FILE:-"jmxtrans-271-all.jar"}
Comparing the two jars shows that the rebuilt jar contains this class and the bundled one does not:
[devuser@annie jmxtrans]$ grep 'com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory' ./jmxtrans-271-all.jar
Binary file ./jmxtrans-271-all.jar matches
[devuser@annie jmxtrans]$ grep 'com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory' ./jmxtrans-all.jar
[devuser@annie jmxtrans]$
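grep most likely matches here for two reasons. Jar entry names are stored uncompressed in the archive, and the unescaped dots in the pattern also match the / separators of the entry path. A more direct check is to list the jar's entries. The Python sketch below does this with zipfile, using the jar paths from this article.

```python
import zipfile


def jar_has_class(jar_path: str, class_name: str) -> bool:
    """Check whether a jar contains the given fully qualified class."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__":
    cls = "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory"
    for jar in ("/usr/share/jmxtrans/jmxtrans-271-all.jar",
                "/usr/share/jmxtrans/jmxtrans-all.jar"):
        try:
            print(jar, jar_has_class(jar, cls))
        except (OSError, zipfile.BadZipFile) as exc:  # missing or not a zip/jar
            print(jar, "not readable:", exc)
```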
Error 2:
Starting jmxtrans: [ OK ]
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=384m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=384m; support was removed in 8.0
MaxTenuringThreshold of 16 is invalid; must be between 0 and 15
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Solution:
# In JDK 8 the maximum value of -XX:MaxTenuringThreshold is 15, but the default script sets 16
# (the PermSize/MaxPermSize lines are only warnings; the invalid threshold is what stops the JVM)
cd /usr/share/jmxtrans
vim jmxtrans.sh
# change -XX:MaxTenuringThreshold=16 to:
-XX:MaxTenuringThreshold=15
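The edit can also be scripted. The Python sketch below lowers any -XX:MaxTenuringThreshold value above 15 to 15, the upper bound named in the error message. It writes a .bak copy before changing the file. The script is hypothetical and not part of jmxtrans.

```python
import re
import sys

_FLAG = re.compile(r"-XX:MaxTenuringThreshold=(\d+)")


def clamp_tenuring_threshold(text: str, maximum: int = 15) -> str:
    """Lower every -XX:MaxTenuringThreshold=N with N > maximum to maximum."""
    def fix(m: "re.Match") -> str:
        return f"-XX:MaxTenuringThreshold={min(int(m.group(1)), maximum)}"
    return _FLAG.sub(fix, text)


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # e.g. /usr/share/jmxtrans/jmxtrans.sh
    with open(path, encoding="utf-8") as f:
        original = f.read()
    with open(path + ".bak", "w", encoding="utf-8") as f:
        f.write(original)  # keep a backup before editing
    with open(path, "w", encoding="utf-8") as f:
        f.write(clamp_tenuring_threshold(original))
```

Usage: python3 fix_tenuring.py /usr/share/jmxtrans/jmxtrans.sh (the script name is hypothetical).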
By default jmxtrans reads its configuration files from /var/lib/jmxtrans, so every configuration file for collecting Kafka metrics goes in that directory. The files used here follow the naming rules above:
[root@annie thirdparties]# cd /var/lib/jmxtrans
[root@annie jmxtrans]# ll
total 0
[root@annie jmxtrans]# pwd
/var/lib/jmxtrans
[root@annie jmxtrans]# wget http://qu2lhckc6.hn-bkt.clouddn.com/jmxtrans-kafka/base_172.16.133.82.json
[root@annie jmxtrans]# wget http://qu2lhckc6.hn-bkt.clouddn.com/jmxtrans-kafka/falcon_monitor_us_82.json
[root@annie jmxtrans]# ll
total 16
-rw-r--r-- 1 root root 8462 Jun  2 18:41 base_172.16.133.82.json
-rw-r--r-- 1 root root 2029 Jun  2 18:41 falcon_monitor_us_82.json
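jmxtrans reads every file in this directory, so a quick sanity check before starting it can save a restart. The Python sketch below parses each *.json file. It reports files that are not valid JSON, and files whose servers entries lack host, port or queries. These checks are a hypothetical minimum, not jmxtrans's own validation.

```python
import json
from pathlib import Path
from typing import List

CONFIG_DIR = Path("/var/lib/jmxtrans")  # default directory mentioned above


def check_configs(directory: Path = CONFIG_DIR) -> List[str]:
    """Return a list of problems found in the *.json files of directory."""
    problems: List[str] = []
    for path in sorted(directory.glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            problems.append(f"{path.name}: cannot parse ({exc})")
            continue
        servers = data.get("servers") if isinstance(data, dict) else None
        if not isinstance(servers, list) or not servers:
            problems.append(f"{path.name}: missing or empty 'servers' list")
            continue
        for i, server in enumerate(servers):
            for key in ("host", "port", "queries"):
                if key not in server:
                    problems.append(f"{path.name}: servers[{i}] has no '{key}'")
    return problems


if __name__ == "__main__":
    for line in check_configs() or ["all config files look OK"]:
        print(line)
```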
Start jmxtrans again:
/etc/init.d/jmxtrans start
The data then shows up in InfluxDB:
[devuser@annie jmxtrans]$ influx
Connected to http://localhost:8086 version 1.6.2
InfluxDB shell version: 1.7.7
> show DATABASES
name: databases
name
----
_internal
metrics
jmxDB
> use jmxDB
Using database jmxDB
> show MEASUREMENTS
name: measurements
name
----
BytesInPerSec
BytesOutPerSec
BytesRejectedPerSec
GC
MemoryUsage
MessagesInPerSec
ReplicaFetcherManager
ReplicaManager
RequestsPerSec
Thread
TotalTimeMs
jvmMemory
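The same check can be scripted against the HTTP API. The Python sketch below uses only the standard library and assumes InfluxDB 1.x on localhost:8086 without authentication. It runs SHOW MEASUREMENTS on jmxDB and extracts the names from the results, series, values structure that InfluxDB 1.x returns from /query.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def measurement_names(response: dict) -> List[str]:
    """Pull measurement names out of a /query response for SHOW MEASUREMENTS."""
    names: List[str] = []
    for result in response.get("results", []):
        for series in result.get("series", []):
            names.extend(row[0] for row in series.get("values", []))
    return names


def show_measurements(db: str = "jmxDB", base: str = "http://localhost:8086") -> List[str]:
    """Run SHOW MEASUREMENTS against db and return the measurement names."""
    query = urllib.parse.urlencode({"db": db, "q": "SHOW MEASUREMENTS"})
    with urllib.request.urlopen(f"{base}/query?{query}", timeout=5) as resp:
        return measurement_names(json.load(resp))


if __name__ == "__main__":
    try:
        print(show_measurements())
    except OSError as exc:  # server not reachable
        print("query failed:", exc)
```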
A side note:
If no data shows up at this point, drop the database and create it again; in this setup the data started arriving after that.
5. Grafana Configuration and Preview
Link: https://pan.baidu.com/s/1NGqdRYKRBCkzuAEESvnfCw  Extraction code: qtrv
Link: https://pan.baidu.com/s/1xMMOuMwRQsEmTrrUxJf6lw
References
jmxtrans introduction and installation
Kafka JMX monitoring with jmxtrans + influxdb + grafana (includes JSON template configuration files)