Configuring the Spark HistoryServer under Hadoop HA


My Spark cluster is deployed on YARN. The underlying Hadoop deployment started out as a plain fully distributed setup, but it was later upgraded to HA mode with one active NameNode and one standby NameNode. The Spark HistoryServer configuration has to be changed accordingly, because it still points at a single fixed NameNode address; when that node is in the standby state, the HistoryServer fails to start with the following error:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:184)
    at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1719)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1350)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4132)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:838)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:821)

 

1 Spark HistoryServer configuration without Hadoop HA

1.1 Configure spark-defaults.conf

spark.eventLog.enabled              true
spark.eventLog.dir                  hdfs://1421-0002:9000/spark/sparklogs
spark.yarn.historyServer.address    1421-0002:18080
spark.serializer                    org.apache.spark.serializer.KryoSerializer
spark.executor.instances            4

These settings enable event logging, set the HDFS path where the event logs are stored, and give the address of the Spark history server.
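Spark expects the event log directory to already exist before applications write to it. A minimal preparation step, assuming the same path as above (the permission mode is only an example, adjust it to your own policy), could look like this:

# Create the event log directory on HDFS and make it writable to the users running Spark jobs
hdfs dfs -mkdir -p hdfs://1421-0002:9000/spark/sparklogs
hdfs dfs -chmod 777 hdfs://1421-0002:9000/spark/sparklogs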

1.2 Configure spark-env.sh

# Specify logDirectory here so that the path does not have to be given explicitly when running start-history-server.sh
export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=hdfs://1421-0002:9000/spark/sparklogs"

This sets the HDFS path where the event logs are stored, so the path does not have to be passed as an argument every time the server is started.
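For comparison, without SPARK_HISTORY_OPTS the directory would have to be supplied on every startup. Spark releases of that era accepted it as a positional argument (a form that was later deprecated in favour of spark.history.fs.logDirectory):

# Equivalent startup with the log directory given explicitly on the command line
start-history-server.sh hdfs://1421-0002:9000/spark/sparklogs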

1.3 Start the Spark History Server

start-history-server.sh 
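A quick way to check that the server actually came up, assuming the default web UI port 18080 used in the configuration above:

# The HistoryServer process should appear in jps, and the web UI should respond on port 18080
jps | grep HistoryServer
curl -s http://1421-0002:18080 | head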

 

2 Spark HistoryServer configuration under Hadoop HA

2.1 Modify spark-env.sh

# Specify logDirectory here so that the path does not have to be given explicitly when running start-history-server.sh
export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=hdfs://hadoop-cluster/spark/sparklogs"

The HDFS path is changed here. Previously there was only a single NameNode; under HA two NameNodes are configured, so the fixed address 1421-0002:9000 is replaced with the logical nameservice hadoop-cluster (the value of dfs.nameservices in hadoop/etc/hadoop/hdfs-site.xml). No port is specified, because the HDFS client resolves the nameservice to whichever NameNode is currently active.
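For reference, the nameservice only resolves if the HA client settings in hdfs-site.xml are visible to the history server (typically by pointing HADOOP_CONF_DIR at the Hadoop configuration directory). A minimal sketch of the relevant entries, assuming the nameservice hadoop-cluster and a second, hypothetical NameNode host 1421-0003:

<!-- Logical name of the HA cluster; this is what hdfs://hadoop-cluster refers to -->
<property>
  <name>dfs.nameservices</name>
  <value>hadoop-cluster</value>
</property>

<!-- The two NameNodes behind the nameservice (the second host name is an assumption) -->
<property>
  <name>dfs.ha.namenodes.hadoop-cluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hadoop-cluster.nn1</name>
  <value>1421-0002:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hadoop-cluster.nn2</name>
  <value>1421-0003:9000</value>
</property>

<!-- Lets HDFS clients fail over to whichever NameNode is currently active -->
<property>
  <name>dfs.client.failover.proxy.provider.hadoop-cluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

Presumably spark.eventLog.dir in spark-defaults.conf should be switched to the same hdfs://hadoop-cluster/spark/sparklogs form as well, so that applications also write their event logs through the HA client instead of to a fixed NameNode.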

 

