Monitoring
We will only cover four ways of monitoring Spark here:
- Monitoring via the Spark UI
- Monitoring via the Spark HistoryServer UI
- Monitoring via the REST API
- Metrics
Monitoring via the Spark UI
The Spark web UI gives us a very useful job-monitoring view. By reading its pages carefully we can see detailed information about the jobs of a running Spark application, such as duration, GC time, and task launch time, all of which are worth watching in production. However, once the application finishes or crashes, the UI is gone and we can no longer see anything.
- When a Spark application is started, you can see the web UI address http://hadoop001:4040; try opening it.
- First run a join in Spark (a minimal spark-shell sketch follows this list).
- After the join has run, look at the web UI.
- The DAG visualization of the job is shown there (screenshot omitted).
- When we stop the Spark application and refresh http://hadoop001:4040, the page can no longer be opened.
- As a result we cannot see why a job died in production, and therefore cannot troubleshoot it from this UI.
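The join referred to above can be anything that triggers a shuffle. Here is a minimal sketch typed into spark-shell; the data and column names are made up purely for illustration:

// Inside spark-shell (spark.implicits._ is already imported there); data is illustrative only
val users  = Seq((1, "alice"), (2, "bob"), (3, "carol")).toDF("id", "name")
val orders = Seq((1, 100.0), (1, 35.5), (3, 20.0)).toDF("uid", "amount")

// The join causes a shuffle, so it shows up clearly on the Jobs, Stages and DAG pages
val joined = users.join(orders, users("id") === orders("uid"))
joined.show()

// Keep the shell open and browse http://hadoop001:4040 while the application is still alive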
Monitoring via the Spark HistoryServer UI
With the Spark HistoryServer we can view Spark applications that have already finished.
The HistoryServer must be configured before it can be used; be careful with the configuration, otherwise it will not work. The steps below follow the latest official documentation.
- Configuration 1: spark-defaults.conf
[hadoop@hadoop001 conf]$ pwd
/home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0/conf
[hadoop@hadoop001 conf]$ cp spark-defaults.conf.template spark-defaults.conf
[hadoop@hadoop001 conf]$ ll
total 52
-rw-r--r-- 1 hadoop hadoop  996 May  2 00:49 docker.properties.template
-rw-r--r-- 1 hadoop hadoop 1105 May  2 00:49 fairscheduler.xml.template
-rw-r--r-- 1 hadoop hadoop 1129 Jun  9 21:12 hive-site.xml
-rw-r--r-- 1 hadoop hadoop 2025 May  2 00:49 log4j.properties.template
-rw-r--r-- 1 hadoop hadoop 7801 May  2 00:49 metrics.properties.template
-rw-r--r-- 1 hadoop hadoop  865 May  2 00:49 slaves.template
-rw-r--r-- 1 hadoop hadoop 1406 Jun 18 22:09 spark-defaults.conf
-rw-r--r-- 1 hadoop hadoop 1292 May  2 00:49 spark-defaults.conf.template
-rwxr-xr-x 1 hadoop hadoop 4221 May  2 00:49 spark-env.sh.template
[hadoop@hadoop001 conf]$ vim spark-defaults.conf
- Enable the following two options in spark-defaults.conf (screenshot of the edited file omitted):
# enable event logging
spark.eventLog.enabled true
# where event logs are written; hadoop001:9000 must match whatever fs.defaultFS is set to
# in your Hadoop core-site.xml (/home/hadoop/app/hadoop/etc/hadoop/core-site.xml)
spark.eventLog.dir hdfs://hadoop001:9000/g6_directory
- Configuration 2: spark-env.sh
[hadoop@hadoop001 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@hadoop001 conf]$ ll
total 52
-rw-r--r-- 1 hadoop hadoop  996 May  2 00:49 docker.properties.template
-rw-r--r-- 1 hadoop hadoop 1105 May  2 00:49 fairscheduler.xml.template
-rw-r--r-- 1 hadoop hadoop 1129 Jun  9 21:12 hive-site.xml
-rw-r--r-- 1 hadoop hadoop 2025 May  2 00:49 log4j.properties.template
-rw-r--r-- 1 hadoop hadoop 7801 May  2 00:49 metrics.properties.template
-rw-r--r-- 1 hadoop hadoop  865 May  2 00:49 slaves.template
-rw-r--r-- 1 hadoop hadoop 1406 Jun 18 22:09 spark-defaults.conf
-rw-r--r-- 1 hadoop hadoop 1292 May  2 00:49 spark-defaults.conf.template
-rwxr-xr-x 1 hadoop hadoop 4581 Jun 18 22:07 spark-env.sh
-rwxr-xr-x 1 hadoop hadoop 4221 May  2 00:49 spark-env.sh.template
- As the official documentation explains (screenshot omitted), every spark.history.* parameter must be configured through SPARK_HISTORY_OPTS in spark-env.sh.
- The table below lists the spark.history.* parameters:
Property Name | Default | Meaning |
---|---|---|
spark.history.provider | org.apache.spark.deploy.history.FsHistoryProvider | Name of the class implementing the application history backend. Currently there is only one implementation, provided by Spark, which looks for application logs stored in the file system. |
spark.history.fs.logDirectory | file:/tmp/spark-events (the log directory) | For the filesystem history provider, the URL to the directory containing application event logs to load. This can be a local file:// path, an HDFS path hdfs://namenode/shared/spark-logs or that of an alternative filesystem supported by the Hadoop APIs. |
spark.history.fs.update.interval | 10s (how often the log directory is re-scanned) | The period at which the filesystem history provider checks for new or updated logs in the log directory. A shorter interval detects new applications faster, at the expense of more server load re-reading updated applications. As soon as an update has completed, listings of the completed and incomplete applications will reflect the changes. |
spark.history.retainedApplications | 50 (max applications cached in memory; extra ones must be re-read from disk) | The number of applications to retain UI data for in the cache. If this cap is exceeded, then the oldest applications will be removed from the cache. If an application is not in the cache, it will have to be loaded from disk if it is accessed from the UI. |
spark.history.ui.maxApplications | Int.MaxValue | The number of applications to display on the history summary page. Application UIs are still available by accessing their URLs directly even if they are not displayed on the history summary page. |
spark.history.ui.port | 18080 (default web UI port) | The port to which the web interface of the history server binds. |
spark.history.kerberos.enabled | false | Indicates whether the history server should use kerberos to login. This is required if the history server is accessing HDFS files on a secure Hadoop cluster. If this is true, it uses the configs spark.history.kerberos.principal and spark.history.kerberos.keytab. |
spark.history.kerberos.principal | (none) | Kerberos principal name for the History Server. |
spark.history.kerberos.keytab | (none) | Location of the kerberos keytab file for the History Server. |
spark.history.fs.cleaner.enabled | false (whether to periodically clean up event logs; in production this must be enabled) | Specifies whether the History Server should periodically clean up event logs from storage. |
spark.history.fs.cleaner.interval | 1d | How often the filesystem job history cleaner checks for files to delete. Files are only deleted if they are older than spark.history.fs.cleaner.maxAge. |
spark.history.fs.cleaner.maxAge | 7d | Job history files older than this will be deleted when the filesystem history cleaner runs. |
spark.history.fs.endEventReparseChunkSize | 1m | How many bytes to parse at the end of log files looking for the end event. This is used to speed up generation of application listings by skipping unnecessary parts of event log files. It can be disabled by setting this config to 0. |
spark.history.fs.inProgressOptimization.enabled | true | Enable optimized handling of in-progress logs. This option may leave finished applications that fail to rename their event logs listed as in-progress. |
spark.history.fs.numReplayThreads | 25% of available cores | Number of threads that will be used by history server to process event logs. |
spark.history.store.maxDiskUsage | 10g | Maximum disk usage for the local directory where the cache application history information are stored. |
spark.history.store.path | (none) | Local directory where to cache application history data. If set, the history server will store application data on disk instead of keeping it in memory. The data written to disk will be re-used in the event of a history server restart. |
Note: each option is passed as -Dx=y, where x is a spark.history.* parameter and y is the value assigned to it; the -D prefix is mandatory.
The hdfs://hadoop001:9000/g6_directory directory in SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop001:9000/g6_directory" must be the same path as spark.eventLog.dir in spark-defaults.conf from Configuration 1, because the logs have to be read from wherever they are written.
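As a sketch, a spark-env.sh entry might combine the mandatory log directory with a few of the optional parameters from the table above; the cleaner and retention values below are illustrative choices, not required ones:

# spark-env.sh -- every spark.history.* setting is passed through SPARK_HISTORY_OPTS as -Dkey=value
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop001:9000/g6_directory \
-Dspark.history.ui.port=18080 \
-Dspark.history.retainedApplications=50 \
-Dspark.history.fs.cleaner.enabled=true \
-Dspark.history.fs.cleaner.interval=1d \
-Dspark.history.fs.cleaner.maxAge=7d"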
Tip: the log directory must be created on HDFS in advance.
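For example, using the directory configured above:

[hadoop@hadoop001 ~]$ hdfs dfs -mkdir -p hdfs://hadoop001:9000/g6_directory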
Start the HistoryServer
[hadoop@hadoop001 spark-2.4.2-bin-2.6.0-cdh5.7.0]$ ./sbin/start-history-server.sh
[hadoop@hadoop001 spark-2.4.2-bin-2.6.0-cdh5.7.0]$ jps
17184 Jps
22609 ResourceManager
21604 NameNode
21860 DataNode
19210 HistoryServer    # the HistoryServer process we just started
22748 NodeManager
22236 SecondaryNameNode
After the HistoryServer is running, start a Spark application and then stop it. This time the information on the web UI does not disappear when the application ends. In fact, the hadoop001:18080 page only shows information for jobs after they have ended; while a job is still running it does not appear in this listing.
- (Screenshot of the HistoryServer UI omitted)
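To check that event logs really are being written, you can list the directory configured earlier; each finished application should leave an event-log file named after its application ID:

[hadoop@hadoop001 ~]$ hdfs dfs -ls hdfs://hadoop001:9000/g6_directory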
Stop the HistoryServer
[hadoop@hadoop001 spark-2.4.2-bin-2.6.0-cdh5.7.0]$ ./sbin/stop-history-server.sh
Monitoring via the REST API
- Job information can be filtered by completion status: at http://hadoop001:4040/api/v1/applications only running jobs are visible, completed jobs are not.
- At http://hadoop001:18080/api/v1/applications jobs in every state can be seen.
- Under the applications path you can filter by completion status, earliest start time, latest start time and so on, to pull out exactly the information you want (a few curl sketches follow this list).
- This is only a small part of the API; the full details are in the official documentation:
- http://spark.apache.org/docs/latest/monitoring.html
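A few illustrative curl calls against the endpoints above; <app-id> is a placeholder to be replaced by an ID returned from the first listing:

# all applications known to the HistoryServer (a running app serves the same API on port 4040)
curl http://hadoop001:18080/api/v1/applications

# filter by state and start date, as documented in the official monitoring page
curl "http://hadoop001:18080/api/v1/applications?status=completed&minDate=2019-06-18"

# drill into a single application
curl http://hadoop001:18080/api/v1/applications/<app-id>/jobs
curl http://hadoop001:18080/api/v1/applications/<app-id>/executors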
Metrics
- This method is rarely used in production; it tends to be used only by people who study Spark in depth, so it is not covered here.