A Spark Application Monitoring Solution: Monitoring Spark Applications with Prometheus and Grafana


After a Spark job is launched, we usually go through a jump host to the Spark UI to check on it. Once the number of jobs grows, this becomes a headache. It would be ideal if all job information could be collected in one place for monitoring.

According to the official Spark monitoring documentation, Spark only supports the following sinks out of the box:

Each instance can report to zero or more sinks. Sinks are contained in the org.apache.spark.metrics.sink package:

  • ConsoleSink: Logs metrics information to the console.
  • CSVSink: Exports metrics data to CSV files at regular intervals.
  • JmxSink: Registers metrics for viewing in a JMX console.
  • MetricsServlet: Adds a servlet within the existing Spark UI to serve metrics data as JSON data.
  • GraphiteSink: Sends metrics to a Graphite node.
  • Slf4jSink: Sends metrics to slf4j as log entries.
  • StatsdSink: Sends metrics to a StatsD node.

The commonly used InfluxDB and Prometheus are not among them...

A bit of googling shows that InfluxDB support requires a third-party package. The most useful reference is the article Monitoring Spark Streaming with InfluxDB and Grafana: you add the jar files and a configuration file when submitting the job. But success never comes that easily...

The data written to InfluxDB is named after the application_id, e.g. application_1533838659288_1030_1_jvm_heap_usage. In other words, each job's metrics land in their own series, so when displaying them in Grafana we would still have to configure them one by one, wouldn't we?

Obviously that is not the result I want. The end goal is: configure once, and every newly submitted job automatically shows up in the monitoring.

Google cures everything, and I eventually found a fairly clean solution: relay the data through graphite_exporter into Prometheus, and then visualize it with Grafana.

So at the moment there are two approaches that have been verified to work.

Option 1:

Write the monitoring data directly to InfluxDB and have Grafana read from it for display. Steps:

1. Add the following to conf/metrics.properties under the Spark directory:

master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

*.sink.influx.class=org.apache.spark.metrics.sink.InfluxDbSink
*.sink.influx.protocol=http
*.sink.influx.host=xx.xx.xx.xx
*.sink.influx.port=8086
*.sink.influx.database=sparkonyarn
*.sink.influx.auth=admin:admin
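
The sink above writes to the sparkonyarn database, which typically has to exist before the first job reports in. A minimal sketch, assuming an InfluxDB 1.x instance reachable at the host configured above and the standard influx CLI:

# Create the target database once, using the same host and credentials as *.sink.influx.* above
influx -host xx.xx.xx.xx -port 8086 -username admin -password admin \
  -execute 'CREATE DATABASE sparkonyarn'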

2. Add the following options when submitting the job, and make sure the jars below exist:

--files /spark/conf/metrics.properties \
--conf spark.metrics.conf=metrics.properties \
--jars /spark/jars/metrics-influxdb-1.1.8.jar,/spark/jars/spark-influx-sink-0.4.0.jar \
--conf spark.driver.extraClassPath=metrics-influxdb-1.1.8.jar:spark-influx-sink-0.4.0.jar \
--conf spark.executor.extraClassPath=metrics-influxdb-1.1.8.jar:spark-influx-sink-0.4.0.jar

Drawback: whenever the application_id changes, Grafana has to be reconfigured.

Option 2 (the one currently in use):

Use graphite_exporter to convert the raw data, via a mapping file, into Prometheus data with label dimensions.

1. Download graphite_exporter, unpack it, and run the command below. The graphite_exporter_mapping file has to be created by ourselves; it contains the metric mapping rules.

 nohup ./graphite_exporter --graphite.mapping-config=graphite_exporter_mapping &  

For example:

mappings:
- match: '*.*.jvm.*.*'
  name: jvm_memory_usage
  labels:
    application: $1
    executor_id: $2
    mem_type: $3
    qty: $4

This converts the data into metrics with the name jvm_memory_usage and the labels application, executor_id, mem_type and qty, for example:

application_1533838659288_1030.driver.jvm.heap.usage -> jvm_memory_usage{application="application_1533838659288_1030",executor_id="driver",mem_type="heap",qty="usage"}
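
To sanity-check the mapping before wiring Spark up, you can push a sample metric in Graphite plaintext format and look for the remapped result on the exporter's Prometheus endpoint. This sketch assumes the default ports (9109 for Graphite input, 9108 for the /metrics page) and a locally running exporter:

# Send one fake Spark metric through the Graphite plaintext interface (default port 9109)
echo "application_1533838659288_1030.driver.jvm.heap.usage 0.35 $(date +%s)" | nc -w 1 localhost 9109
# The remapped metric should now appear on the exporter's Prometheus endpoint (default port 9108)
curl -s http://localhost:9108/metrics | grep jvm_memory_usage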

2. Configure Prometheus to pull data from graphite_exporter, then restart Prometheus. In /path/to/prometheus/prometheus.yml:

scrape_configs:
  - job_name: 'spark'
    static_configs:
      - targets: ['localhost:9108']
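
After the restart, you can confirm that the new target is being scraped through the Prometheus HTTP API; this assumes Prometheus is listening on its default port 9090:

# List active scrape targets; the 'spark' job at localhost:9108 should be reported with health "up"
curl -s http://localhost:9090/api/v1/targets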

3. Add the following to conf/metrics.properties under the Spark directory:

master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.protocol=tcp
*.sink.graphite.host=xx.xx.xx.xx
*.sink.graphite.port=9109
*.sink.graphite.period=5
*.sink.graphite.unit=seconds

4. When submitting the Spark job, add --files /spark/conf/metrics.properties (a full submit sketch is shown below).
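
For reference, a minimal submit sketch with that option in place; the main class and application jar are placeholders, not part of the original setup:

# Placeholder application: replace com.example.MyApp and my-app.jar with your own
# Depending on the setup you may also need --conf spark.metrics.conf=metrics.properties, as in Option 1
spark-submit \
  --master yarn --deploy-mode cluster \
  --class com.example.MyApp \
  --files /spark/conf/metrics.properties \
  /path/to/my-app.jar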

5. Finally, create a Prometheus data source in Grafana and build the metrics panels you need. The end result: newly submitted jobs require no extra monitoring configuration; simply select the application_id to see the corresponding information.
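
To get the "just select the application_id" behaviour, one hedged sketch: define a Grafana template variable (named application here, an assumption of this example) that lists all values of the application label, and reference it in the panel queries. Using the metric and label names produced by the mapping above:

# Grafana template variable (type: Query, Prometheus data source), e.g. named "application":
#   label_values(jvm_memory_usage, application)
# Example panel query: heap usage per executor for the selected application
jvm_memory_usage{application="$application", mem_type="heap", qty="usage"}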

Required jar packages:

https://repo1.maven.org/maven2/com/izettle/metrics-influxdb/1.1.8/metrics-influxdb-1.1.8.jar

https://mvnrepository.com/artifact/com.palantir.spark.influx/spark-influx-sink

 

Template: the complete graphite_exporter_mapping file

mappings:
- match: '*.*.executor.filesystem.*.*'
  name: filesystem_usage
  labels:
    application: $1
    executor_id: $2
    fs_type: $3
    qty: $4

- match: '*.*.executor.threadpool.*'
  name: executor_tasks
  labels:
    application: $1
    executor_id: $2
    qty: $3

- match: '*.*.executor.jvmGCTime.count'
  name: jvm_gcTime_count
  labels:
    application: $1
    executor_id: $2

- match: '*.*.executor.*.*'
  name: executor_info
  labels:
    application: $1
    executor_id: $2
    type: $3
    qty: $4

- match: '*.*.jvm.*.*'
  name: jvm_memory_usage
  labels:
    application: $1
    executor_id: $2
    mem_type: $3
    qty: $4

- match: '*.*.jvm.pools.*.*'
  name: jvm_memory_pools
  labels:
    application: $1
    executor_id: $2
    mem_type: $3
    qty: $4

- match: '*.*.BlockManager.*.*'
  name: block_manager
  labels:
    application: $1
    executor_id: $2
    type: $3
    qty: $4

- match: '*.driver.DAGScheduler.*.*'
  name: DAG_scheduler
  labels:
    application: $1
    type: $2
    qty: $3

- match: '*.driver.*.*.*.*'
  name: task_info
  labels:
    application: $1
    task: $2
    type1: $3
    type2: $4
    qty: $5

 

References

https://github.com/palantir/spark-influx-sink

https://spark.apache.org/docs/latest/monitoring.html

https://www.linkedin.com/pulse/monitoring-spark-streaming-influxdb-grafana-christian-g%C3%BCgi

https://github.com/prometheus/prometheus/wiki/Default-port-allocations

https://github.com/prometheus/graphite_exporter

https://prometheus.io/download/

https://rokroskar.github.io/monitoring-spark-on-hadoop-with-prometheus-and-grafana.html

https://blog.csdn.net/lsshlsw/article/details/82670508

https://www.jianshu.com/p/274380bb0974

 

