Kafka: ZK + Kafka + Spark Streaming cluster setup (Part 3): installing Spark 2.2.1


For setting up and configuring the CentOS VMs, see Part 1 of this series: installing four CentOS machines under VMware, with host-to-VM connectivity and internet access from inside the VMs.

For installing hadoop 2.9.0, see Part 2 of this series.

For configuring hadoop 2.9.0 HA, see Part 10 of this series.

Servers used for the Spark installation:

192.168.0.120      master
192.168.0.121      slave1
192.168.0.122      slave2
192.168.0.123      slave3

Download the Spark package from the official site:

官網地址:http://spark.apache.org/downloads.html

Note: in the previous post we installed hadoop 2.9.0, but the download page's package-type options do not include a build for hadoop 2.9.0, so "Pre-built for Apache Hadoop 2.7 and later" is the closest choice.


There are quite a few Spark versions to choose from; pick "2.2.1 (Dec 01 2017)".

Download "spark-2.2.1-bin-hadoop2.7.tgz", upload it to /opt on master, and extract it:

[root@master opt]# tar -zxvf spark-2.2.1-bin-hadoop2.7.tgz 
[root@master opt]# ls
hadoop-2.9.0  hadoop-2.9.0.tar.gz  jdk1.8.0_171  jdk-8u171-linux-x64.tar.gz  scala-2.11.0  scala-2.11.0.tgz  spark-2.2.1-bin-hadoop2.7  spark-2.2.1-bin-hadoop2.7.tgz
[root@master opt]# 
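Before extracting, it is worth verifying the archive's integrity. A sketch (the official `.sha512` digest is published alongside the release on archive.apache.org; compare the two values by eye):

```shell
# Compute the local digest of the downloaded tarball and compare it
# against the official .sha512 file published for spark-2.2.1
sha512sum /opt/spark-2.2.1-bin-hadoop2.7.tgz
```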

Configuring Spark

[root@master opt]# ls
hadoop-2.9.0  hadoop-2.9.0.tar.gz  jdk1.8.0_171  jdk-8u171-linux-x64.tar.gz  scala-2.11.0  scala-2.11.0.tgz  spark-2.2.1-bin-hadoop2.7  spark-2.2.1-bin-hadoop2.7.tgz
[root@master opt]# cd spark-2.2.1-bin-hadoop2.7/conf/
[root@master conf]# ls
docker.properties.template  metrics.properties.template   spark-env.sh.template
fairscheduler.xml.template  slaves.template
log4j.properties.template   spark-defaults.conf.template
[root@master conf]# scp spark-env.sh.template spark-env.sh
[root@master conf]# ls
docker.properties.template  metrics.properties.template   spark-env.sh
fairscheduler.xml.template  slaves.template               spark-env.sh.template
log4j.properties.template   spark-defaults.conf.template
[root@master conf]# vi spark-env.sh

Append the following to the end of spark-env.sh (this is my configuration; adjust it to match your own environment):

export SCALA_HOME=/opt/scala-2.11.0
export JAVA_HOME=/opt/jdk1.8.0_171
export HADOOP_HOME=/opt/hadoop-2.9.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_IP=master
SPARK_LOCAL_DIRS=/opt/spark-2.2.1-bin-hadoop2.7
SPARK_DRIVER_MEMORY=1G

Note: when setting the number of CPU cores and the amount of memory for the Worker process, stay within the machine's actual hardware; if the configuration exceeds what the Worker node physically has, the Worker process will fail to start.
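For reference, those Worker limits map to standard variables in spark-env.sh; a sketch with illustrative values (size them to each worker VM's actual cores and RAM):

```shell
# Illustrative values -- adjust to the real hardware of each worker node
export SPARK_WORKER_CORES=2      # CPU cores each Worker may offer to executors
export SPARK_WORKER_MEMORY=1g    # total memory a Worker can hand out to executors
export SPARK_WORKER_INSTANCES=1  # number of Worker processes per node
```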

Copy the slaves template and fill in the slave hostnames:

[root@master conf]# scp slaves.template slaves
[root@master conf]# vi slaves

The contents:

#localhost
slave1
slave2
slave3

Distribute the configured spark-2.2.1-bin-hadoop2.7 directory to all the slaves:

scp -r /opt/spark-2.2.1-bin-hadoop2.7 spark@slave1:/opt/
scp -r /opt/spark-2.2.1-bin-hadoop2.7 spark@slave2:/opt/
scp -r /opt/spark-2.2.1-bin-hadoop2.7 spark@slave3:/opt/

Note: at this point /opt/spark-2.2.1-bin-hadoop2.7 does not yet exist on slave1, slave2, or slave3, and the spark user may not have permission to create it under /opt, so the copy can fail with a permission error.

One workaround: on each of slave1, slave2, and slave3, pre-create spark-2.2.1-bin-hadoop2.7 under /opt as root and grant it 777 permissions.
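A less permissive alternative to chmod 777: copy the directory as root, then hand ownership of the tree to the spark user (this assumes a spark user and group exist on each node):

```shell
# Run on each slave after the copy completes; avoids a world-writable tree
chown -R spark:spark /opt/spark-2.2.1-bin-hadoop2.7
```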

[root@slave1 opt]# mkdir spark-2.2.1-bin-hadoop2.7
[root@slave1 opt]# chmod 777 spark-2.2.1-bin-hadoop2.7
[root@slave1 opt]# 

After that, repeating the copy succeeds.

Starting Spark

Run the following from the Spark installation directory; on master that is /opt/spark-2.2.1-bin-hadoop2.7:

sbin/start-all.sh

Here I start Spark as a non-root account (the spark user), and hit a permission problem on master: the spark user cannot create the log directory:

[spark@master opt]$ cd /opt/spark-2.2.1-bin-hadoop2.7
[spark@master spark-2.2.1-bin-hadoop2.7]$ sbin/start-all.sh
mkdir: cannot create directory ‘/opt/spark-2.2.1-bin-hadoop2.7/logs’: Permission denied
chown: cannot access ‘/opt/spark-2.2.1-bin-hadoop2.7/logs’: No such file or directory
starting org.apache.spark.deploy.master.Master, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-master.out
/opt/spark-2.2.1-bin-hadoop2.7/sbin/spark-daemon.sh: line 128: /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-master.out: No such file or directory
failed to launch: nice -n 0 /opt/spark-2.2.1-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.master.Master --host master --port 7077 --webui-port 8080
tail: cannot open ‘/opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-master.out’ for reading: No such file or directory
full log in /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-master.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-slave1.out
slave3: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-slave3.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-slave2.out
[spark@master spark-2.2.1-bin-hadoop2.7]$ cd ..
[spark@master opt]$ su root
Password: 
[root@master opt]# chmod 777 spark-2.2.1-bin-hadoop2.7
[root@master opt]# su spark
[spark@master opt]$ cd spark-2.2.1-bin-hadoop2.7
[spark@master spark-2.2.1-bin-hadoop2.7]$ sbin/start-all.sh           
starting org.apache.spark.deploy.master.Master, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-master.out
slave2: org.apache.spark.deploy.worker.Worker running as process 3153.  Stop it first.
slave3: org.apache.spark.deploy.worker.Worker running as process 3076.  Stop it first.
slave1: org.apache.spark.deploy.worker.Worker running as process 3241.  Stop it first.
[spark@master spark-2.2.1-bin-hadoop2.7]$ sbin/stop-all.sh 
slave1: stopping org.apache.spark.deploy.worker.Worker
slave3: stopping org.apache.spark.deploy.worker.Worker
slave2: stopping org.apache.spark.deploy.worker.Worker
stopping org.apache.spark.deploy.master.Master
[spark@master spark-2.2.1-bin-hadoop2.7]$ sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-master.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-slave1.out
slave3: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-slave3.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-2.2.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-slave2.out

Fix: grant 777 permissions on master's Spark installation directory as well.

Verifying the Spark installation

Problems encountered during startup:

1) Running spark-shell in spark-on-YARN mode throws: ERROR cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful. See Part 6 of this series for the fix.

Check with jps: a successful start on master includes the following processes:

$ jps
7949 Jps
7328 SecondaryNameNode
7805 Master
7137 NameNode
7475 ResourceManager

A successful start on each slave includes:

$jps
3132 DataNode
3759 Worker
3858 Jps
3231 NodeManager

Open the Spark web UI: http://192.168.0.120:8080
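A quick headless check that the master UI is up (the address is this cluster's master; substitute your own):

```shell
# Prints 200 once the Master web UI is serving requests
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.0.120:8080
```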

Running the examples

Local test run with two threads (note: run-example expects its options before the example class; flags placed after the example's own arguments would be passed to the example itself):

[spark@master spark-2.2.1-bin-hadoop2.7]$ cd /opt/spark-2.2.1-bin-hadoop2.7
[spark@master spark-2.2.1-bin-hadoop2.7]$ ./bin/run-example --master "local[2]" SparkPi 10

Running on the Spark Standalone cluster

[spark@master spark-2.2.1-bin-hadoop2.7]$ cd /opt/spark-2.2.1-bin-hadoop2.7
[spark@master spark-2.2.1-bin-hadoop2.7]$ ./bin/spark-submit \
> --class org.apache.spark.examples.SparkPi \
> --master spark://master:7077 \
> examples/jars/spark-examples_2.11-2.2.1.jar \
> 100

The run can then be observed from the Spark web UI:

Running on YARN in cluster mode ("yarn-cluster" is a deprecated alias since Spark 2.0; the current form is --master yarn --deploy-mode cluster):

[spark@master spark-2.2.1-bin-hadoop2.7]$ cd /opt/spark-2.2.1-bin-hadoop2.7
[spark@master spark-2.2.1-bin-hadoop2.7]$ ./bin/spark-submit \
> --class org.apache.spark.examples.SparkPi \
> --master yarn --deploy-mode cluster \
> /opt/spark-2.2.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.2.1.jar \
> 10

Execution log (note the deprecation warning for --master yarn-cluster):

[spark@master hadoop-2.9.0]$ cd /opt/spark-2.2.1-bin-hadoop2.7
[spark@master spark-2.2.1-bin-hadoop2.7]$ ./bin/spark-submit \
> --class org.apache.spark.examples.SparkPi \
> --master yarn-cluster \
> /opt/spark-2.2.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.2.1.jar \
> 10
Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
18/06/30 22:55:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/06/30 22:55:37 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.0.120:8032
18/06/30 22:55:38 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
18/06/30 22:55:38 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2048 MB per container)
18/06/30 22:55:38 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
18/06/30 22:55:38 INFO yarn.Client: Setting up container launch context for our AM
18/06/30 22:55:38 INFO yarn.Client: Setting up the launch environment for our AM container
18/06/30 22:55:38 INFO yarn.Client: Preparing resources for our AM container
18/06/30 22:55:40 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/06/30 22:55:47 INFO yarn.Client: Uploading resource file:/opt/spark-2.2.1-bin-hadoop2.7/spark-f46b4dc7-8074-4bb3-babd-c3124d1a7e07/__spark_libs__1523582418834894726.zip -> hdfs://master:9000/user/spark/.sparkStaging/application_1530369937777_0001/__spark_libs__1523582418834894726.zip
18/06/30 22:56:02 INFO yarn.Client: Uploading resource file:/opt/spark-2.2.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.2.1.jar -> hdfs://master:9000/user/spark/.sparkStaging/application_1530369937777_0001/spark-examples_2.11-2.2.1.jar
18/06/30 22:56:02 INFO yarn.Client: Uploading resource file:/opt/spark-2.2.1-bin-hadoop2.7/spark-f46b4dc7-8074-4bb3-babd-c3124d1a7e07/__spark_conf__4967231916988729566.zip -> hdfs://master:9000/user/spark/.sparkStaging/application_1530369937777_0001/__spark_conf__.zip
18/06/30 22:56:02 INFO spark.SecurityManager: Changing view acls to: spark
18/06/30 22:56:02 INFO spark.SecurityManager: Changing modify acls to: spark
18/06/30 22:56:02 INFO spark.SecurityManager: Changing view acls groups to: 
18/06/30 22:56:02 INFO spark.SecurityManager: Changing modify acls groups to: 
18/06/30 22:56:02 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(spark); groups with view permissions: Set(); users  with modify permissions: Set(spark); groups with modify permissions: Set()
18/06/30 22:56:02 INFO yarn.Client: Submitting application application_1530369937777_0001 to ResourceManager
18/06/30 22:56:02 INFO impl.YarnClientImpl: Submitted application application_1530369937777_0001
18/06/30 22:56:03 INFO yarn.Client: Application report for application_1530369937777_0001 (state: ACCEPTED)
18/06/30 22:56:03 INFO yarn.Client: 
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1530370563128
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1530369937777_0001/
         user: spark
18/06/30 22:56:04 INFO yarn.Client: Application report for application_1530369937777_0001 (state: ACCEPTED)
[... identical ACCEPTED status lines, one per second, through 22:56:20 ...]
18/06/30 22:56:22 INFO yarn.Client: Application report for application_1530369937777_0001 (state: RUNNING)
18/06/30 22:56:22 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.0.121
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1530370563128
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1530369937777_0001/
         user: spark
18/06/30 22:56:23 INFO yarn.Client: Application report for application_1530369937777_0001 (state: RUNNING)
[... identical RUNNING status lines, one per second, through 22:57:00 ...]
18/06/30 22:57:01 INFO yarn.Client: Application report for application_1530369937777_0001 (state: FINISHED)
18/06/30 22:57:01 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.0.121
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1530370563128
         final status: SUCCEEDED
         tracking URL: http://master:8088/proxy/application_1530369937777_0001/
         user: spark
18/06/30 22:57:01 INFO util.ShutdownHookManager: Shutdown hook called
18/06/30 22:57:01 INFO util.ShutdownHookManager: Deleting directory /opt/spark-2.2.1-bin-hadoop2.7/spark-f46b4dc7-8074-4bb3-babd-c3124d1a7e07

The task's execution can be monitored from the hadoop YARN web UI:

You can also open http://slave1:8042 to inspect slave1's NodeManager:

 

Note: Spark on YARN supports two run modes, yarn-cluster and yarn-client. Broadly speaking, yarn-cluster is suited to production, while yarn-client is suited to interactive use and debugging, where you want to see the application's output quickly.
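In Spark 2.x syntax the two modes are selected with --deploy-mode; a sketch using the same example jar as above:

```shell
# Cluster mode: the driver runs inside an ApplicationMaster container on YARN
# (suited to production; output goes to the YARN container logs)
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode cluster \
  examples/jars/spark-examples_2.11-2.2.1.jar 10

# Client mode: the driver runs in this shell, so the result prints locally
# (suited to interactive use and debugging)
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode client \
  examples/jars/spark-examples_2.11-2.2.1.jar 10
```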

 

