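My Spark cluster runs on YARN. The underlying Hadoop deployment was originally a plain fully distributed setup, but it was later upgraded to HA mode, with one active NameNode (NN) and one standby NN. The Spark HistoryServer configuration has to be updated to match. Without the change, the HistoryServer fails on startup with the following error when the NN it points at is in standby state: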
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:184)
    at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1719)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1350)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4132)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:838)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:821)
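The StandbyException in the trace says the request reached a NameNode that is currently in standby state. One way to confirm which NameNode is active is the hdfs haadmin -getServiceState command. The sketch below wraps it in a small helper. The helper name and the NameNode IDs passed to it (nn1, nn2) are assumptions for illustration; the real IDs are whatever dfs.ha.namenodes.<nameservice> lists in hdfs-site.xml.

```shell
# Print the HA state of each NameNode ID passed as an argument.
# The IDs are hypothetical; use the ones from dfs.ha.namenodes.<nameservice>.
print_namenode_states() {
  for nn_id in "$@"; do
    # hdfs haadmin -getServiceState prints "active" or "standby"
    printf '%s: %s\n' "$nn_id" "$(hdfs haadmin -getServiceState "$nn_id")"
  done
}

# Example (requires a configured Hadoop client):
#   print_namenode_states nn1 nn2
```

If the NameNode named in the HistoryServer's log directory URI reports standby, that would explain the error above.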
1 Spark HistoryServer configuration without Hadoop HA
1.1 Configure spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://1421-0002:9000/spark/sparklogs
spark.yarn.historyServer.address 1421-0002:18080
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.executor.instances 4
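This sets the HDFS path where the event logs are stored, as well as the address of the Spark history server.
1.2 Configure spark-env.sh
# Set logDirectory here so that start-history-server.sh does not need the path passed explicitly
export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=hdfs://1421-0002:9000/spark/sparklogs"
This specifies the HDFS path where the log files are stored, so the parameter does not have to be passed on every start.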
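The two files have to agree on the directory: applications write their event logs to spark.eventLog.dir, and the history server reads them from spark.history.fs.logDirectory. A minimal sketch for reading a value back from spark-defaults.conf so the two can be compared; the helper name spark_conf_value is made up for this example, and it only handles the whitespace-separated "key value" form used above.

```shell
# Print the value configured for a key in a spark-defaults.conf style file.
# $1 = path to the file, $2 = property name.
# Only whitespace-separated "key value" lines are handled; if the key appears
# more than once, the last occurrence is printed.
spark_conf_value() {
  awk -v key="$2" '$1 == key { value = $2 } END { if (value != "") print value }' "$1"
}

# Example (the conf path is an assumption about the install layout):
#   spark_conf_value "$SPARK_HOME/conf/spark-defaults.conf" spark.eventLog.dir
```

The printed URI should match the one given to -Dspark.history.fs.logDirectory in spark-env.sh.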
1.3 Start the Spark History Server
start-history-server.sh
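After the script returns, a quick way to see whether the server actually came up is to request its web UI on the address configured in spark.yarn.historyServer.address. A rough sketch, assuming curl is installed; history_server_is_up is a hypothetical helper name, not something Spark ships.

```shell
# Return 0 if the history server web UI answers on the given host:port.
# $1 = address, e.g. 1421-0002:18080 (the value used in spark-defaults.conf above).
history_server_is_up() {
  # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the page body
  curl -fs -o /dev/null "http://$1/"
}

# Example:
#   history_server_is_up 1421-0002:18080 && echo "history server is running"
```

If the check fails, the HistoryServer log is the place to look for a trace like the one at the top of this post.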
2 Spark HistoryServer configuration with Hadoop HA
2.1 Modify spark-env.sh
# Set logDirectory here so that start-history-server.sh does not need the path passed explicitly
export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=hdfs://hadoop-cluster/spark/sparklogs"
The HDFS path is what changes here. Previously there was a single NN; with HA there are two, so 1421-0002:9000 is replaced with hadoop-cluster, which is the value of dfs.nameservices in hadoop/etc/hadoop/hdfs-site.xml. No port is needed.
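The change amounts to replacing the host:port part of the URI with the nameservice ID. A small sketch of that substitution follows; to_nameservice_uri is a hypothetical helper, not part of Spark or Hadoop. The nameservice ID itself can be read with hdfs getconf -confKey dfs.nameservices. Only the spark-env.sh change is shown above. The spark.eventLog.dir entry in spark-defaults.conf uses the same host:port and would presumably need the same rewrite, but that is an inference rather than something covered here.

```shell
# Rewrite an hdfs://host:port/path URI to address an HA nameservice instead.
# $1 = original URI, $2 = nameservice ID (the value of dfs.nameservices).
to_nameservice_uri() {
  # Replace the authority (everything between "hdfs://" and the next "/").
  printf '%s\n' "$1" | sed -E "s#^hdfs://[^/]+#hdfs://$2#"
}

# Example (needs a configured Hadoop client for the getconf call):
#   ns="$(hdfs getconf -confKey dfs.nameservices)"
#   to_nameservice_uri hdfs://1421-0002:9000/spark/sparklogs "$ns"
```

The nameservice name only resolves if the Spark process can see the Hadoop client configuration, typically through HADOOP_CONF_DIR.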