prometheus監控es,同樣采用exporter的方案。
項目地址:
elasticsearch_exporter:https://github.com/justwatchcom/elasticsearch_exporter
默認端口 9114
1、安裝部署
【1.0】封裝成系統服務、一鍵部署
前提,把二進制包復制過來放到當前目錄
vim es_exporter_install.sh
#!/bin/bash init(){ es_path_config=`ps -ef|grep elastic|grep "Des.path.conf"|sed 's# #\n#g'|grep "Des.path.conf"|awk -F'=' '{print $2}'`/ configfile=elasticsearch.yml ip=`cat ${es_path_config}${configfile}|grep network.host|awk -F":" '{print $2}'|sed 's/[[:space:]]//g'` port=`cat ${es_path_config}${configfile}|grep http.port|awk -F":" '{print $2}'|sed 's/[[:space:]]//g'` if [ ! "$ip" -o ! "$port" ];then echo "init is error,can't get the es's ip and port!" exit 1 fi if [ $ip == '0.0.0.0' ];then ip=127.0.0.1 fi tar -zxf elasticsearch_exporter-1.1.0.linux-amd64.tar.gz mv elasticsearch_exporter-1.1.0.linux-amd64 /usr/local/elasticsearch_exporter groupadd prometheus useradd -g prometheus -m -d /var/lib/prometheus -s /sbin/nologin prometheus chown -R prometheus.prometheus /usr/local/elasticsearch_exporter } run(){ if [ `uname -a |grep el7|wc -l` -eq 1 ];then cat << eof >/lib/systemd/system/es_exporter.service [Unit] Description=The es_exporter After=network.target [Service] PrivateTmp=true Type=simple User=prometheus ExecStart=/usr/local/elasticsearch_exporter/elasticsearch_exporter --es.uri=http://${ip}:${port} Restart=on-failure ExecStop=/bin/kill -s QUIT $MAINPID [Install] WantedBy=multi-user.target eof systemctl daemon-reload systemctl start es_exporter systemctl enable es_exporter elif [ `uname -a |grep el6|wc -l` -eq 1 ];then cat << eof >/etc/init.d/es_exporter #!/bin/bash # chkconfig: 2345 10 90 # description: es's exporter touch /var/log/es_exporter.log chown prometheus.prometheus /var/log/es_exporter.log es_path_config=`ps -ef|grep elastic|grep "Des.path.conf"|sed 's# #\n#g'|grep "Des.path.conf"|awk -F'=' '{print $2}'`/ configfile=elasticsearch.yml ip=`cat ${es_path_config}${configfile}|grep network.host|awk -F":" '{print $2}'|sed 's/[[:space:]]//g'` port=`cat ${es_path_config}${configfile}|grep http.port|awk -F":" '{print $2}'|sed 's/[[:space:]]//g'` if [ $ip == '0.0.0.0' ];then ip=127.0.0.1 fi su prometheus -s /bin/bash -c "/usr/local/elasticsearch_exporter/elasticsearch_exporter --es.uri=http://${ip}:${port} &" >> /var/log/es_exporter.log eof chown prometheus.prometheus /etc/init.d/es_exporter chmod +x /etc/init.d/es_exporter chkconfig --add es_exporter chkconfig --level 3 es_exporter on service es_exporter start else echo "your os not rel7/rel6,operator fail!" fi } main(){ init run } main ps -ef|grep elasticsearch_exporter
sh es_exporter_install.sh
【1.2】簡便一鍵部署、腳本方式
前提:把ES執行命令直接拿來放到一起
vim install_es.sh
#!/bin/bash mv elasticsearch_exporter /bin/elasticsearch_exporter chmod +x /bin/elasticsearch_exporter es_path_config=`ps -ef|grep elastic|grep "Des.path.conf"|sed 's# #\n#g'|grep "Des.path.conf"|awk -F'=' '{print $2}'`/ configfile=elasticsearch.yml ip=`cat ${es_path_config}${configfile}|grep network.host|awk -F":" '{print $2}'|sed 's/[[:space:]]//g'` port=`cat ${es_path_config}${configfile}|grep http.port|awk -F":" '{print $2}'|sed 's/[[:space:]]//g'` if [ ! "$ip" -o ! "$port" ];then echo 'init is error,can't get the es's ip and port!' exit 1 fi if [ $ip == '0.0.0.0' ];then ip=127.0.0.1 fi echo "nohup /bin/elasticsearch_exporter --es.uri="http://${ip}:${port}" --web.listen-address="0.0.0.0:9114" >>/var/log/es_exporter.log 2>&1 & " nohup /bin/elasticsearch_exporter --es.uri="http://${ip}:${port}" --web.listen-address="0.0.0.0:9114" >> /var/log/es_exporter.log 2>&1 & echo "nohup /bin/elasticsearch_exporter --es.uri="http://${ip}:${port}" --web.listen-address="0.0.0.0:9114" >>/var/log/es_exporter.log 2>&1 & " >>/etc/rc.local ps -ef|grep elasticsearch_exporter
【1.2】詳細步驟
接着分別在如上三台主機上進行如下配置:
wget https://github.com/justwatchcom/elasticsearch_exporter/releases/download/v1.1.0/elasticsearch_exporter-1.1.0.linux-amd64.tar.gz tar -zxf elasticsearch_exporter-1.1.0.linux-amd64.tar.gz mv elasticsearch_exporter-1.1.0.linux-amd64 /usr/local/elasticsearch_exporter
創建用戶等
groupadd prometheus useradd -g prometheus -m -d /var/lib/prometheus -s /sbin/nologin prometheus chown -R prometheus.prometheus /usr/local/elasticsearch_exporter
啟動監控客戶端:
nohup ./elasticsearch_exporter --web.listen-address ":9114" --es.uri http://192.168.75.21:9200 &
使用systemd管理:
cat << eof >>/lib/systemd/system/es_exporter.service [Unit] Description=The es_exporter After=network.target [Service] Type=simple User=prometheus ExecStart=/usr/local/elasticsearch_exporter/elasticsearch_exporter Restart=on-failure
ExecStop=/bin/kill -s QUIT $MAINPID
[Install]
WantedBy=multi-user.target
eof
啟動:
systemctl daemon-reload systemctl start es_exporter systemctl enable es_exporter
查看metrics:
curl 127.0.0.1:9114/metrics
2、配置 prometheus.yml 添加監控目標
vim /usr/local/prometheus/prometheus.yml
- job_name: 'elasticsearch'
scrape_interval: 60s
scrape_timeout: 30s
metrics_path: "/metrics"
static_configs:
- targets: ['192.168.75.21:9308']
labels:
service: elasticsearch
重啟服務。
systemctl restart prometheus
或者通過命令熱加載:
curl -XPOST localhost:9090/-/reload
3、配置 Grafana 的模板
模板通過json文件進行導入,文件就在解壓的包內。
參考地址:https://shenshengkun.github.io/posts/550bdf86.html
或者通過如下ID進行導入:2322以及其他。



4、開啟認證的啟動方式
如果es開啟了認證,那么啟動的時候需要將用戶名密碼加載進去:
elasticsearch_exporter --web.listen-address ":9308" --es.uri http://username:password@192.168.75.21:9200 &
其中使用的是monitoring的用戶密碼。
當然,除去這種命令行的啟動方式之外,還可以像上邊一樣,基於systemd進行管理,只需將認證的參數信息寫入到如下內容當中:
參考網址:https://github.com/justwatchcom/elasticsearch_exporter

cat /etc/default/elasticsearch_exporter [Unit] Description=The es_exporter After=network.target [Service] Type=simple User=prometheus ExecStart=/usr/local/elasticsearch_exporter/elasticsearch_exporter --web.listen-address ":9308" --es.uri=http://username:password@192.168.75.21:9200 Restart=on-failure [Install] WantedBy=multi-user.target

【5】【最佳實踐】es_alert.yml
groups: - name: ES告警 rules: - alert: ES-集群狀態變紅 expr: elasticsearch_cluster_health_status{color="red"}==1 for: 1m labels: severity: warning annotations: description: "主/副本分片分配有誤,該問題發生在集群:{{ $labels.cluster }}" - alert: ES-集群狀態變黃 expr: elasticsearch_cluster_health_status{color="yellow"}==1 for: 1m labels: severity: warning annotations: description: "主/副本分片分配有誤,該問題發生在集群:{{ $labels.cluster }}." - alert: ES-JVM堆內存使用過高 expr: round(elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"}*100,0.01)>85 for: 1m labels: severity: warning annotations: description: "JVM堆內存使用率超過80%\n當前:{{ $value }}" - alert: ES-集群健康狀態獲取失敗 expr: elasticsearch_cluster_health_up!=1 for: 1m labels: severity: warning annotations: description: "該ES節點,獲取集群監控狀態失敗 in cluster:[ {{ $labels.cluster }} ]" - alert: ES-太少節點運行 expr: elasticsearch_cluster_health_number_of_nodes < 5 for: 1m labels: severity: warning annotations: description: "ES集群運行的節點<5個(total 7) in cluster:[ {{ $labels.cluster }} ]\n當前運行節點個數:{{ $value }}" - alert: ES-GC平均執行次數過多 expr: rate(elasticsearch_jvm_gc_collection_seconds_count{}[5m])>5 for: 1m labels: severity: warning annotations: description: "JVM GC 1m內平均執行次數>5/s in cluster:[ {{ $labels.cluster }} ]\n當前:{{ $value }}/s" - alert: ES-GC平均運行時間過長 expr: round((node_filesystem_size_bytes{fstype=~"ext.?|xfs"} - node_filesystem_free_bytes{fstype=~"ext.?|xfs"}) * 100 / (node_filesystem_avail_bytes{fstype=~"ext.?|xfs"} + (node_filesystem_size_bytes{fstype=~"ext.?|xfs"} - node_filesystem_free_bytes{fstype=~"ext.?|xfs"})),0.1) > 90 for: 1m labels: severity: warning annotations: description: "ES 1m 內平均運行時間>0.3/s in cluster:[ {{ $labels.cluster }} ]\n當前:{{ $value }}/s" - alert: ES-JSON解析失敗 expr: elasticsearch_cluster_health_json_parse_failures>0 for: 5m labels: severity: warning annotations: description: "ES節點解析json失敗數 > 0 in cluster:[ {{ $labels.cluster }} ]\n當前:{{ $value }}" - alert: ES-斷路器觸發 expr: rate(elasticsearch_breakers_tripped{}[5m])>0 for: 1m labels: severity: warning annotations: description: "ES 斷路器觸發數 in cluster:[ {{ $labels.cluster }} ]> 0\n當前:{{ $value }}" - alert: ES-等待進程過多 expr: elasticsearch_cluster_health_number_of_pending_tasks>10 for: 1m labels: severity: warning annotations: description: "ES pending_tasks in cluster:[ {{ $labels.cluster }} ] > 10\n當前:{{ $value }}" - alert: ES-增加集群節點 expr: increase(elasticsearch_cluster_health_number_of_nodes[1m]) > 0 for: 1s labels: severity: warning annotations: description: "ES-增加集群節點 in cluster:[ {{ $labels.cluster }} ]\n增加個數:{{ $value }}" - alert: ES-減少集群節點 expr: increase(elasticsearch_cluster_health_number_of_nodes[1m]) > 0 for: 1s labels: severity: warning annotations: description: "ES-減少集群節點 in cluster:[ {{ $labels.cluster }} ]\n減少個數:{{ $value }}"
【6】【最佳實踐】grafana模板
模版:鏈接:https://pan.baidu.com/s/1mAtVhko18gD4LxdSkuCGEg 密碼:3mtd
【參考文檔】
基本安裝:轉自:https://www.cnblogs.com/fat-girl-spring/p/13143603.html

