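To check the status of a running Hadoop cluster, we can use the Hadoop client and the web pages. But if we want to know how busy the cluster currently is, for example how many reads and writes it is serving, those tools cannot tell us. Fortunately, Hadoop can export its metrics to Ganglia. This article walks through how to configure Hadoop to work with Ganglia.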
Hadoop version: 1.2.1
OS version: CentOS 6.4
JDK version: jdk1.6.0_32
Ganglia version: 3.1.7
Environment
| Hostname | IP address | Roles |
| --- | --- | --- |
| hadoop1 | 192.168.124.135 | NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker |
| hadoop2 | 192.168.124.136 | DataNode, TaskTracker |
| hadoop3 | 192.168.124.137 | DataNode, TaskTracker |
| ganglia | 192.168.124.140 | gmetad, gmond, ganglia-web |
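Basic architecture
hadoop1, hadoop2, and hadoop3 send their metrics to gmond on the ganglia node. gmetad periodically polls gmond for the data, and the result is finally displayed through httpd.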
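To make the polling step concrete, here is a minimal sketch of reading gmond's state the way an outside client would. It assumes gmond.conf still contains the stock tcp_accept_channel on port 8649, on which gmond writes its whole state as XML to any client that connects. The function names fetch_gmond_xml and metrics_by_host are made up for this illustration and are not part of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="192.168.124.140", port=8649, timeout=5.0):
    """Read the XML dump gmond sends on its TCP accept channel.

    Assumption: the default tcp_accept_channel (port 8649) is enabled.
    gmond writes the document and then closes the connection.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_by_host(xml_bytes):
    """Map each HOST name to the sorted list of its METRIC names."""
    root = ET.fromstring(xml_bytes)
    return {
        host.get("NAME"): sorted(m.get("NAME") for m in host.iter("METRIC"))
        for host in root.iter("HOST")
    }
```

Once everything is running, calling metrics_by_host(fetch_gmond_xml()) from any machine that can reach the ganglia node should list hadoop1, hadoop2, and hadoop3 along with the metric names each one reports.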
Installing Ganglia
The default yum repositories do not include Ganglia, so the EPEL repository has to be installed first:
rpm -Uvh http://dl.Fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
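On the ganglia node, run the following in order:
yum install ganglia-gmetad
yum install ganglia-gmond
yum install ganglia-web
Once these three commands have finished, the whole Ganglia environment is ready, including httpd and PHP.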
Configuring Ganglia
vi /etc/ganglia/gmetad.conf and change data_source:
data_source "my_cluster" ganglia
vi /etc/ganglia/gmond.conf
Unicast mode
cluster {
  name = "my_cluster"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
  # This option tells gmond to use a source address
  # that resolves to the machine's hostname. Without
  # this, the metrics may appear to come from any
  # interface and the DNS names associated with
  # those IPs will be used to create the RRDs.
  #mcast_join = 239.2.11.71
  host = 192.168.124.140
  port = 8649
  ttl = 1
}
/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  #mcast_join = 239.2.11.71
  port = 8649
  #bind = 239.2.11.71
}
On each Hadoop node, vi conf/hadoop-metrics2.properties:
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
namenode.sink.ganglia.servers=192.168.124.140:8649
datanode.sink.ganglia.servers=192.168.124.140:8649
jobtracker.sink.ganglia.servers=192.168.124.140:8649
tasktracker.sink.ganglia.servers=192.168.124.140:8649
maptask.sink.ganglia.servers=192.168.124.140:8649
reducetask.sink.ganglia.servers=192.168.124.140:8649
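Since every daemon prefix needs its own servers line, it is easy to miss one when the file is copied to several nodes. The following is a small sketch of a check for that, assuming the file uses plain key=value lines as shown above. The helper names parse_properties and missing_ganglia_servers are invented for this example and are not part of Hadoop.

```python
# Daemon prefixes taken from the configuration shown in this article.
DAEMON_PREFIXES = (
    "namenode", "datanode", "jobtracker",
    "tasktracker", "maptask", "reducetask",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only: values such as the slope setting
        # contain further '=' characters.
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_ganglia_servers(text, prefixes=DAEMON_PREFIXES):
    """Return the prefixes that have no <prefix>.sink.ganglia.servers entry."""
    props = parse_properties(text)
    return [p for p in prefixes
            if not props.get(p + ".sink.ganglia.servers")]
```

Passing the contents of conf/hadoop-metrics2.properties to missing_ganglia_servers should return an empty list for the configuration above. Any prefix it does return is a daemon with no Ganglia server configured.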
Starting up
First stop the firewall: service iptables stop
Start httpd: service httpd start
Start gmetad: service gmetad start
Start gmond: service gmond start
Start the Hadoop cluster: bin/start-all.sh
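After the services are started, a quick way to confirm they are listening is to try a TCP connection to each one. The sketch below assumes the default ports are unchanged: 80 for httpd, 8649 for gmond's TCP accept channel, and 8651 for gmetad's xml_port. The names tcp_port_open and check_services are hypothetical helpers, not Ganglia tools.

```python
import socket

# Assumed default ports; adjust if gmond.conf, gmetad.conf, or httpd differ.
EXPECTED_PORTS = {"httpd": 80, "gmond": 8649, "gmetad": 8651}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def check_services(host, ports=EXPECTED_PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: tcp_port_open(host, port) for name, port in ports.items()}
```

For example, check_services("192.168.124.140") run from one of the Hadoop nodes shows at a glance whether a service failed to start or the firewall is still blocking it. Note that this only covers TCP. The metrics that Hadoop sends to port 8649 travel over UDP, which a connect test cannot verify.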
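Results
As the figure shows, information for ganglia, hadoop1, hadoop2, and hadoop3 is displayed successfully.
hadoop2 and hadoop3 both report for a DataNode and a TaskTracker, so they show the same set of metrics.
hadoop1 runs three more components than hadoop2 and hadoop3 (NameNode, SecondaryNameNode, and JobTracker), so it additionally shows dfs.FSNameSystem metrics, dfs.namenode metrics, mapred.Queue metrics, and mapred.jobtracker metrics.
Below we list the graphs for all metrics on the hadoop1 node, for anyone who is interested.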
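The comparison between nodes can also be done programmatically once the metric names of each host are available. This is only a sketch: it assumes metric names follow the context.record.metric pattern seen in the configuration (for example jvm.metrics.gcCount), and the function names are invented for illustration.

```python
def metric_groups(names):
    """Collapse metric names to their 'context.record' group prefix."""
    return sorted({".".join(n.split(".")[:2]) for n in names if "." in n})


def extra_groups(host_a_names, host_b_names):
    """Groups reported by host A that host B does not report."""
    return sorted(set(metric_groups(host_a_names)) - set(metric_groups(host_b_names)))
```

Given the metric names of hadoop1 and hadoop2, extra_groups would be expected to return the NameNode and JobTracker related groups that appear only on hadoop1.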
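Conclusions
- Security is not enabled on this Hadoop cluster, because the ugi metrics contain no data
- Some of Hadoop's parameter settings are visible
- Hadoop's current system state is visible, including whether the cluster is busy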