First, the overall architecture:

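Flink App: sends its metrics out through a metrics reporter
Pushgateway: a Prometheus ecosystem component that accepts pushed metrics and exposes them for Prometheus to scrape
Prometheus: an open-source systems monitoring and alerting framework (see "Prometheus: Introduction and Practice")
Grafana: a cross-platform, open-source metrics analytics and visualization tool; it queries the collected data, displays it as dashboards, and can send alerts (see "Grafana, a Visualization Tool: Introduction and Installation")
Node_exporter: like Pushgateway, a Prometheus component; it collects host metrics such as CPU, memory, and disk usage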
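The path a metric takes is Flink App → Pushgateway → Prometheus. The minimal Python sketch below builds one push request to make that concrete. It assumes the Pushgateway HTTP API's `/metrics/job/<job>` endpoint and the Prometheus text exposition format. The metric name `demo_metric` and the host `venn:9091` are illustrative assumptions. This is not how Flink's reporter is implemented internally; it only shows the kind of request that ends up at the Pushgateway.

```python
import urllib.parse
import urllib.request


# Hypothetical helpers for illustration; not part of Flink or the Pushgateway.
def build_push_request(gateway, job, metrics, grouping=None):
    """Return (url, body) for pushing gauge values to a Pushgateway.

    gateway:  "host:port" of the Pushgateway (assumed, e.g. "venn:9091")
    job:      value of the job label in the grouping key
    metrics:  dict of metric name -> numeric value, all sent as gauges
    grouping: optional extra grouping labels, e.g. {"instance": "demo"}
    """
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    for name, value in (grouping or {}).items():
        path += "/" + urllib.parse.quote(name, safe="") + "/" + urllib.parse.quote(value, safe="")
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The text format requires the last line to end with a line feed.
    body = "\n".join(lines) + "\n"
    return f"http://{gateway}{path}", body


def push(gateway, job, metrics, grouping=None):
    """Send the metrics with PUT, which replaces the whole group on the Pushgateway."""
    url, body = build_push_request(gateway, job, metrics, grouping)
    req = urllib.request.Request(url, data=body.encode("utf-8"), method="PUT")
    req.add_header("Content-Type", "text/plain; version=0.0.4")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

Against a running Pushgateway, `push("venn:9091", "myJob", {"demo_metric": 1.5})` should make `demo_metric` appear on its page at port 9091. Prometheus then picks it up on the next scrape of the `pushgateway` job.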
Most of the installation steps below follow this blog post: https://www.cnblogs.com/xiao987334176/p/9930517.html#autoid-0-0-0
1. Pull the Docker images
docker pull prom/node-exporter
docker pull prom/pushgateway
docker pull prom/prometheus
docker pull grafana/grafana
List the downloaded images:
$ docker images
REPOSITORY           TAG      IMAGE ID       CREATED        SIZE
prom/prometheus      latest   d5b9d7ed160a   2 weeks ago    138MB
grafana/grafana      latest   a6e14b4109af   2 weeks ago    253MB
prom/pushgateway     latest   20e6dcae675f   4 weeks ago    19.2MB
prom/node-exporter   latest   e5a616e4b9cf   2 months ago   22.9MB
2. Edit prometheus.yml and create the Grafana data directory
$ mkdir /opt/grafana-storage  # Grafana data directory
$ cat /opt/prometheus/prometheus.yml  # Prometheus configuration
global:
  scrape_interval: 60s
  evaluation_interval: 60s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: prometheus

  - job_name: linux
    static_configs:
      - targets: ['venn:9100']
        labels:
          instance: localhost

  - job_name: 'pushgateway'
    static_configs:
      - targets: ['venn:9091']
        labels:
          instance: 'pushgateway'
3. Start each component
docker run -d -p 3000:3000 --name=grafana -v /opt/grafana-storage:/var/lib/grafana grafana/grafana
docker run -d -p 9100:9100 -v "/proc:/host/proc:ro" -v "/sys:/host/sys:ro" -v "/:/rootfs:ro" --net="host" prom/node-exporter
docker run -d -p 9090:9090 -v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
docker run -d -p 9091:9091 prom/pushgateway

Note: because node-exporter runs with --net="host", Docker discards the -p 9100:9100 mapping. node_exporter listens directly on port 9100 of the host.
Check the running containers:
$ docker ps
CONTAINER ID   IMAGE                COMMAND                  CREATED      STATUS      PORTS                    NAMES
4a689cf48e10   prom/pushgateway     "/bin/pushgateway"       5 days ago   Up 5 days   0.0.0.0:9091->9091/tcp   infallible_goldstine
fcc40433bf75   grafana/grafana      "/run.sh"                5 days ago   Up 5 days   0.0.0.0:3000->3000/tcp   grafana
8ba942d0cf35   prom/prometheus      "/bin/prometheus --c…"   5 days ago   Up 5 days   0.0.0.0:9090->9090/tcp   quizzical_colden
b84b0f4be2b2   prom/node-exporter   "/bin/node_exporter"     5 days ago   Up 5 days                            fervent_poitras
Check the listening ports:
$ netstat -apn | grep -E '9091|3000|9090|9100'
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp    0  0  172.17.0.1:39028        172.17.0.4:9091        ESTABLISHED  -
tcp6   0  0  :::9100                 :::*                   LISTEN       -
tcp6   0  0  :::3000                 :::*                   LISTEN       -
tcp6   0  0  :::9090                 :::*                   LISTEN       -
tcp6   0  0  :::9091                 :::*                   LISTEN       -
tcp6   0  0  192.168.229.129:45864   192.168.229.128:9091   TIME_WAIT    -
tcp6   0  0  192.168.229.129:45856   192.168.229.128:9091   TIME_WAIT    -
tcp6   0  0  192.168.229.129:45824   192.168.229.128:9091   TIME_WAIT    -
tcp6   0  0  192.168.229.129:45874   192.168.229.128:9091   TIME_WAIT    -
tcp6   0  0  192.168.229.129:45854   192.168.229.128:9091   TIME_WAIT    -
tcp6   0  0  192.168.229.129:45836   192.168.229.128:9091   TIME_WAIT    -
tcp6   0  0  192.168.229.129:45814   192.168.229.128:9091   TIME_WAIT    -
tcp6   0  0  192.168.229.128:9100    192.168.229.1:13405    ESTABLISHED  -
tcp6   0  0  192.168.229.129:45826   192.168.229.128:9091   TIME_WAIT    -
tcp6   0  0  192.168.229.129:45844   192.168.229.128:9091   TIME_WAIT    -
tcp6   0  0  192.168.229.128:9091    172.17.0.2:53930       ESTABLISHED  -
tcp6   0  0  192.168.229.129:45846   192.168.229.128:9091   TIME_WAIT    -
tcp6   0  0  192.168.229.128:9100    172.17.0.2:54776       ESTABLISHED  -
tcp6   0  0  192.168.229.129:45816   192.168.229.128:9091   TIME_WAIT    -
tcp6   0  0  192.168.229.129:45876   192.168.229.128:9091   ESTABLISHED  40846/java
tcp6   0  0  192.168.229.129:45834   192.168.229.128:9091   TIME_WAIT    -
tcp6   0  0  192.168.229.129:45866   192.168.229.128:9091   TIME_WAIT    -
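If `netstat` is not available, a rough substitute is to attempt a TCP connection to each port. Below is a small Python sketch of that check (`check_ports` is a hypothetical helper name). It only shows that something accepts connections on the port, not that it is the expected service.

```python
import socket


# Hypothetical helper for illustration: a portable stand-in for the netstat check.
def check_ports(host, ports, timeout=1.0):
    """Return {port: True/False} depending on whether a TCP connect succeeds."""
    result = {}
    for port in ports:
        try:
            # create_connection raises OSError (refused, unreachable, timeout) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                result[port] = True
        except OSError:
            result[port] = False
    return result
```

For example, `check_ports("venn", [9091, 3000, 9090, 9100])` should report all four ports (Pushgateway, Grafana, Prometheus, node_exporter) as open once the containers are up.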
4. Check each component's web page
node_exporter: ip:9100/metrics
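The page at ip:9100/metrics is plain text in the Prometheus exposition format: `# HELP`/`# TYPE` comment lines followed by `name{label="value",...} value` samples. The simplified Python parser below is a sketch for inspecting that output. It assumes well-formed input, does not unescape label values, and ignores timestamps. For real work, the `prometheus_client` package ships a proper parser (`prometheus_client.parser.text_string_to_metric_families`).

```python
import re

# Simplified sample line: name{label="value",...} value [timestamp]
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$')
# One label pair; escaped characters inside the value are kept as-is (not unescaped).
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value); comments and blank lines are skipped."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = _SAMPLE.match(line)
        if not m:
            continue  # this sketch silently skips lines it does not understand
        name, raw_labels, value = m.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))  # float() also accepts NaN and +Inf
    return samples
```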

Prometheus: ip:9090/targets

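If a target's State is not UP yet, wait a little. With the 60s scrape_interval configured above, it can take up to about a minute before the first scrape succeeds.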
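Instead of refreshing the targets page, you can read the same information from the Prometheus HTTP API at `/api/v1/targets`. A minimal Python sketch, assuming Prometheus is reachable at `http://localhost:9090` (adjust to your host). `unhealthy_targets` is a hypothetical helper that lists every active target whose `health` is not `up`.

```python
import json
import urllib.request


def fetch_targets(prometheus="http://localhost:9090"):
    """GET /api/v1/targets from the Prometheus HTTP API (base URL is an assumption)."""
    with urllib.request.urlopen(prometheus + "/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Hypothetical helper: health is "up", "down" or "unknown" (not scraped yet).
def unhealthy_targets(payload):
    """Return (job, instance, health, lastError) for each active target that is not up."""
    bad = []
    for target in payload["data"]["activeTargets"]:
        if target.get("health") != "up":
            labels = target.get("labels", {})
            bad.append((labels.get("job"), labels.get("instance"),
                        target.get("health"), target.get("lastError", "")))
    return bad
```

If `unhealthy_targets(fetch_targets("http://venn:9090"))` returns an empty list, every job in prometheus.yml is being scraped successfully.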
Grafana: ip:3000

Default username/password: admin/admin
Configuring the data source and creating a system-load dashboard are not repeated here; see the blog post: https://www.cnblogs.com/xiao987334176/p/9930517.html#autoid-0-0-0
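As an alternative to clicking through the UI, Grafana's HTTP API accepts data source definitions at `POST /api/datasources`. The sketch below only builds the request. It assumes the default admin/admin credentials and a Prometheus URL of `http://venn:9090`; the data source name `Prometheus` is an arbitrary choice. Grafana runs in its own container, so the URL must be reachable from inside that container. That is why the sketch uses a host name rather than `localhost`.

```python
import base64
import json
import urllib.request


# Hypothetical helper for illustration; credentials and URLs are assumptions.
def build_datasource_request(grafana, prometheus_url, user="admin", password="admin"):
    """Build a POST /api/datasources request that registers Prometheus in Grafana."""
    body = {
        "name": "Prometheus",   # arbitrary display name
        "type": "prometheus",
        "url": prometheus_url,  # must be reachable from the Grafana container
        "access": "proxy",      # Grafana's backend queries Prometheus on the browser's behalf
        "isDefault": True,
    }
    req = urllib.request.Request(
        grafana + "/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
    )
    req.add_header("Content-Type", "application/json")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)  # HTTP basic auth
    return req
```

Sending the request once with `urllib.request.urlopen(build_datasource_request("http://venn:3000", "http://venn:9090"))` should create the data source.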
5. Configure the Flink metrics reporter
Add the following to the Flink configuration file flink-conf.yaml:
##metrics
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: venn
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: myJob
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: false
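With `randomJobNameSuffix: true`, each run pushes under a new job name. With `deleteOnShutdown: false`, those metric groups stay on the Pushgateway after the job stops, so stale series can accumulate. The Pushgateway API removes a group with `DELETE /metrics/job/<job>`. Below is a minimal Python sketch for that cleanup. It assumes the reporter's default grouping key (only the `job` label). The job name `myJob_abc` is hypothetical; copy the real one from the Pushgateway page on port 9091.

```python
import urllib.parse
import urllib.request


# Hypothetical helpers for illustration; assumes the group is keyed by the job label only.
def build_delete_request(gateway, job):
    """DELETE /metrics/job/<job> removes that whole metric group from the Pushgateway."""
    url = f"http://{gateway}/metrics/job/{urllib.parse.quote(job, safe='')}"
    return urllib.request.Request(url, method="DELETE")


def delete_group(gateway, job):
    """Send the DELETE request and return the HTTP status code."""
    with urllib.request.urlopen(build_delete_request(gateway, job), timeout=5) as resp:
        return resp.status
```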
Start a job (the late-data handling example from the previous post):
flink run -m yarn-cluster -ynm LateDataProcess -yn 1 -c com.venn.stream.api.sideoutput.lateDataProcess.LateDataProcess jar/flinkDemo-1.0.jar
Check the job's web UI:

PS: the job had already been running for a while.
6. Configure Flink monitoring in Grafana
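The Flink reporter, Pushgateway, and Prometheus are configured above, and the Prometheus data source has already been added in Grafana. The Flink job's metrics are therefore available in Grafana through that data source without further setup.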
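Before building panels, it can help to check which Flink series Prometheus actually holds. The HTTP API endpoint `/api/v1/label/__name__/values` returns every metric name. Flink's Prometheus reporters typically prefix metric names with `flink_`. The sketch below assumes that prefix and a Prometheus base URL; both helper names are hypothetical.

```python
import json
import urllib.request


def fetch_metric_names(prometheus="http://localhost:9090"):
    """GET all metric names via /api/v1/label/__name__/values (base URL is an assumption)."""
    with urllib.request.urlopen(prometheus + "/api/v1/label/__name__/values", timeout=5) as resp:
        return json.load(resp)


# Hypothetical helper: keep only names with the assumed "flink_" prefix, sorted.
def flink_metric_names(payload, prefix="flink_"):
    return sorted(name for name in payload.get("data", []) if name.startswith(prefix))
```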
On the Grafana home page, click New dashboard to create a new dashboard.

After you select a metric, the corresponding chart appears.

At this point, Flink's metrics are displayed in Grafana.
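Flink metric names are long, so it helps to set the Legend field to show only what you need. Write {{key}}, replacing key with the label you want to display, for example: {{job_name}},{{operator_name}}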
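To illustrate the Legend field: each `{{key}}` placeholder is replaced with the value of that label on the series. The Python sketch below is a rough model of that substitution, not Grafana's code. Rendering a missing label as an empty string is an assumption, and the label values in the test are hypothetical.

```python
import re

# Matches {{label}} with optional spaces inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*\}\}")


# Hypothetical model of Grafana's legend templating; missing labels become "" (assumption).
def render_legend(template, labels):
    """Replace every {{label}} in template with the series' value for that label."""
    return _PLACEHOLDER.sub(lambda m: labels.get(m.group(1), ""), template)
```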
In Grafana, the legend then displays like this:

Save the dashboard, and you're done.
