Flume ships with HTTP and Ganglia monitoring out of the box, but our monitoring is currently built on zabbix, so we added a zabbix monitoring module to Flume that plugs straight into the SA team's monitoring service.
We also pared down Flume's metrics, sending zabbix only the ones we need so as not to put load on the zabbix server. What we care about most is whether Flume writes the logs sent by the application side to HDFS in time, and the metrics we watch for that are listed below (a push sketch follows the list):
- Source: number of events received and number of events processed
- Channel: number of events backed up in the channel
- Sink: number of events already processed
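As a rough illustration of how these metrics can reach zabbix, here is a minimal push sketch built on zabbix_sender. The real module lives inside Flume; the item keys flume.source.accepted, flume.channel.size and flume.sink.drained are made-up names that would have to exist as trapper items on the zabbix server, and the script reads Flume's HTTP/JSON endpoint that is enabled later in this post.
#!/bin/bash
# sketch only: pull selected Flume metrics over HTTP and push them to zabbix
METRICS_URL="http://localhost:34545/metrics"
ZBX_SERVER="zabbix.example.com"   # placeholder
flat() {  # flatten the JSON into key:value lines (same sed pipeline as used below)
  curl -s "$METRICS_URL" | sed -e 's/\([,]\)\s*/\1\n/g' -e 's/[{}]/\n/g' -e 's/[",]//g'
}
get() { flat | grep "$1" | awk -F: '{print $2}' | head -1; }
zabbix_sender -z "$ZBX_SERVER" -s "$(hostname)" -k flume.source.accepted -o "$(get EventAcceptedCount)"
zabbix_sender -z "$ZBX_SERVER" -s "$(hostname)" -k flume.channel.size -o "$(get ChannelSize)"
zabbix_sender -z "$ZBX_SERVER" -s "$(hostname)" -k flume.sink.drained -o "$(get EventDrainSuccessCount)"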
Installing zabbix
http://my.oschina.net/yunnet/blog/173161
Monitoring Flume with zabbix
#JVM performance monitoring
Young GC counts
sudo /usr/local/jdk1.7.0_21/bin/jstat -gcutil $(pgrep java)|tail -1|awk '{print $6}'
Full GC counts
sudo /usr/local/jdk1.7.0_21/bin/jstat -gcutil $(pgrep java)|tail -1|awk '{print $8}'
JVM total memory usage
sudo /usr/local/jdk1.7.0_21/bin/jmap -histo $(pgrep java)|grep Total|awk '{print $3}'
JVM total instances usage
sudo /usr/local/jdk1.7.0_21/bin/jmap -histo $(pgrep java)|grep Total|awk '{print $2}'
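Note that pgrep java returns every Java PID on the box, so on a host running more than one JVM the commands above may grab the wrong process. A more defensive variant, assuming the agent was started through flume-ng (so the Flume main class appears on its command line):
FLUME_PID=$(pgrep -f org.apache.flume.node.Application | head -1)
sudo /usr/local/jdk1.7.0_21/bin/jstat -gcutil "$FLUME_PID" | tail -1 | awk '{print $6}'   # YGC
sudo /usr/local/jdk1.7.0_21/bin/jstat -gcutil "$FLUME_PID" | tail -1 | awk '{print $8}'   # FGC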
#flume application metrics monitoring
Start the agent with the JSON reporting options so the metrics can be reached at http://localhost:34545/metrics
bin/flume-ng agent -n consumer -c conf -f bin/conf.properties -Dflume.monitoring.type=http -Dflume.monitoring.port=34545 &
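For the home-grown zabbix module mentioned at the top, the same flag can point at a custom MonitorService implementation instead of the built-in http reporter; the class name below is only a placeholder:
bin/flume-ng agent -n consumer -c conf -f bin/conf.properties -Dflume.monitoring.type=com.example.flume.instrumentation.ZabbixServer &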
#generate some test data
for i in {1..100};do echo "exec test$i" >> /usr/logs/log.10;echo $i;done
#pretty-print the JSON output with a shell pipeline
curl http://localhost:34545/metrics 2>/dev/null|sed -e 's/\([,]\)\s*/\1\n/g' -e 's/[{}]/\n/g' -e 's/[",]//g'
SOURCE.kafka:
OpenConnectionCount:0
AppendBatchAcceptedCount:0
AppendBatchReceivedCount:0
Type:SOURCE
EventAcceptedCount:7252225
AppendReceivedCount:0
StopTime:0
EventReceivedCount:0
StartTime:1407731371546
AppendAcceptedCount:0
SINK.es:
BatchCompleteCount:10697
ConnectionFailedCount:0
EventDrainAttemptCount:7253061
ConnectionCreatedCount:1
BatchEmptyCount:226
Type:SINK
ConnectionClosedCount:0
EventDrainSuccessCount:7253061
StopTime:0
StartTime:1407731371546
BatchUnderflowCount:14857
SINK.hdp:
BatchCompleteCount:1290
ConnectionFailedCount:0
EventDrainAttemptCount:8057502
ConnectionCreatedCount:35787
BatchEmptyCount:54894
Type:SINK
ConnectionClosedCount:35609
EventDrainSuccessCount:8057502
StopTime:0
StartTime:1407731371545
BatchUnderflowCount:45433
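If jq happens to be installed, the same endpoint can also be inspected without the sed pipeline (just an alternative for eyeballing; the monitoring script below sticks with sed):
curl -s http://localhost:34545/metrics | jq .
curl -s http://localhost:34545/metrics | jq -r '.["SINK.hdp"].EventDrainSuccessCount'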
The script below takes the metric name as its $1 argument, e.g. EventDrainSuccessCount (works for any source, channel or sink metric); its other two lines pull the Total and StartTime fields the same way.
#the script file used to monitor flume
cat /opt/monitor_flume.sh
curl http://localhost:34545/metrics 2>/dev/null|sed -e 's/\([,]\)\s*/\1\n/g' -e 's/[{}]/\n/g' -e 's/[",]//g'|grep $1|awk -F: '{print $2}'
curl http://localhost:34545/metrics 2>/dev/null|sed -e 's/\([,]\)\s*/\1\n/g' -e 's/[{}]/\n/g' -e 's/[",]//g'|grep Total|awk -F: '{print $2}'
curl http://localhost:34545/metrics 2>/dev/null|sed -e 's/\([,]\)\s*/\1\n/g' -e 's/[{}]/\n/g' -e 's/[",]//g'|grep StartTime|awk -F: '{print $2}'
#deploy the keys in the zabbix agent configuration file
cat zabbix_flume_jdk.conf
UserParameter=ygc.counts,sudo /usr/local/jdk1.7.0_21/bin/jstat -gcutil $(pgrep java|head -1)|tail -1|awk '{print $6}'
UserParameter=fgc.counts,sudo /usr/local/jdk1.7.0_21/bin/jstat -gcutil $(pgrep java|head -1)|tail -1|awk '{print $8}'
UserParameter=jvm.memory.usage,sudo /usr/local/jdk1.7.0_21/bin/jmap -histo $(pgrep java|head -1)|grep Total|awk '{print $3}'
UserParameter=jvm.instances.usage,sudo /usr/local/jdk1.7.0_21/bin/jmap -histo $(pgrep java|head -1)|grep Total|awk '{print $2}'
UserParameter=flume.monitor[*],/bin/bash /opt/monitor_flume.sh $1
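Once the agent is restarted with this config, the keys can be spot-checked before creating items on the server side. A sketch, assuming zabbix_get is available on the server and the zabbix user has passwordless sudo for jstat/jmap:
# on the agent host
zabbix_agentd -t 'flume.monitor[EventDrainSuccessCount]'
zabbix_agentd -t ygc.counts
# from the zabbix server
zabbix_get -s <flume-host> -k 'flume.monitor[EventDrainSuccessCount]'
# the sudo calls above need something like this in /etc/sudoers:
#   zabbix ALL=(ALL) NOPASSWD: /usr/local/jdk1.7.0_21/bin/jstat, /usr/local/jdk1.7.0_21/bin/jmap
#   Defaults:zabbix !requiretty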