Spark Installation Walkthrough


Prerequisites: the JDK and Scala are installed, and /etc/profile contains (in part):

 

JAVA_HOME=/home/Spark/husor/jdk
CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME
export CLASSPATH

HADOOP_HOME=/home/Spark/husor/hadoop
HBASE_HOME=/home/Spark/husor/hbase
SCALA_HOME=/home/Spark/husor/scala
SPARK_HOME=/home/Spark/husor/spark
PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
export HADOOP_HOME
export HBASE_HOME
export SCALA_HOME
export SPARK_HOME
"/etc/profile" 99L, 2415C written
[root@Master husor]# source /etc/profile
[root@Master husor]# echo $SPARK_HOME
/home/Spark/husor/spark
[root@Master husor]# echo $SCALA_HOME
/home/Spark/husor/scala
[root@Master husor]# scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
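The JDK on the PATH can be checked the same way:

java -version    # reports 1.7.0_71 in this cluster, matching the Spark shell banner later in this walkthrough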

  

 

1. Installing expect

Expect is a scripting language built on Tcl. It can be put to work in both interactive and non-interactive scenarios, but for automating interactive sessions in particular, nothing beats Expect.
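As a tiny illustration before the install (a sketch only; demo@somehost and secret are placeholders, not part of this cluster), a single shell command can make expect answer an ssh password prompt by itself:

expect -c 'spawn ssh demo@somehost; expect "*password*" { send "secret\r" }; interact'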

Step 1: Log in as root.

Step 2: Download the installation archives expect-5.43.0.tar.gz and tcl8.4.11-src.tar.gz.

Step 3: Unpack the archives.
       Unpack tcl8.4.11-src.tar.gz:
       tar -xvf tcl8.4.11-src.tar.gz
       This creates a tcl8.4.11 directory.

       Unpack expect-5.43.0.tar.gz:
       tar -xvf expect-5.43.0.tar.gz
       This creates an expect-5.43 directory.

Step 4: Install Tcl.
       Enter the tcl8.4.11/unix directory, then:
        a. Run sed -i "s/relid'/relid/" configure
        b. Run ./configure --prefix=/expect
        c. Run make
        d. Run make install
        e. Run mkdir -p /tools/lib
        f. Run cp tclConfig.sh /tools/lib/
        g. Export the /tools/bin directory into the environment:
           tclpath=/tools/bin
           export tclpath

Step 5: Install Expect.
        Enter the /soft/expect-5.43 directory and run:
        ./configure --prefix=/tools --with-tcl=/tools/lib --with-x=no
        If the last line of output reports:
        configure: error: Can't find Tcl private headers
        add a header-directory argument,
        --with-tclinclude=../tcl8.4.11/generic, i.e.:
        ./configure --prefix=/tools --with-tcl=/tools/lib --with-x=no --with-tclinclude=../tcl8.4.11/generic
        ../tcl8.4.11/generic is the path left by unpacking Tcl; make sure it exists.
        Run make
        Run make install
        The build generates the expect binary in /tools/bin.
        Run /tools/bin/expect; an expect1.1> prompt means Expect was installed successfully.

Step 6: Create a symbolic link:
        ln -s /tools/bin/expect /usr/bin/expect
        Check the link:
        ls -l /usr/bin/expect
        lrwxrwxrwx 1 root root 17 06-09 11:38 /usr/bin/expect -> /tools/bin/expect

        This symlink is what expect scripts reference in their header line, which names the interpreter that runs the script:
        #!/usr/bin/expect

2. Passwordless SSH login

On the host Master, run the following:

[Spark@Master ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /home/Spark/.ssh/id_rsa.
Your public key has been saved in /home/Spark/.ssh/id_rsa.pub.
The key fingerprint is:
c9:d0:1f:92:43:42:85:f1:c5:23:76:f8:df:80:e5:66 Spark@Master
The key's randomart image is:
+--[ RSA 2048]----+
| .++oo. |
| .=+o+ . |
| ..*+.= |
| o =o.E |
| S .+ o |
| . . |
| |
| |
| |
+-----------------+
[Spark@Master ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
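If ssh still prompts for a password after this step, the usual cause is permissions: sshd ignores authorized_keys files that are group- or world-writable, so it is often necessary to tighten them:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys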

Next, run the automated public-key distribution script SSH.sh to push the Master's public key to each slave node (Slave1, Slave2, ...).

Note: give the SSH.sh and NoPwdAccessSSH.exp script files execute permission first:

[Spark@Master test]$ chmod +x SSH.sh

[Spark@Master test]$ chmod +x NoPwdAccessSSH.exp

Then run SSH.sh:

[Spark@Master test]$ ./SSH.sh
spawn ssh-copy-id -i /home/Spark/.ssh/id_rsa.pub Spark@Master
The authenticity of host 'master (192.168.8.29)' can't be established.
RSA key fingerprint is f0:3f:04:51:36:b5:91:c7:fa:47:5a:49:bc:fd:fe:40.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master,192.168.8.29' (RSA) to the list of known hosts.
Now try logging into the machine, with "ssh 'Spark@Master'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

No Password Access Master is Succeed!!!
spawn ssh-copy-id -i /home/Spark/.ssh/id_rsa.pub Spark@Slave1
Spark@slave1's password:
Now try logging into the machine, with "ssh 'Spark@Slave1'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

No Password Access Slave1 is Succeed!!!
spawn ssh-copy-id -i /home/Spark/.ssh/id_rsa.pub Spark@Slave2
Spark@slave2's password:
Now try logging into the machine, with "ssh 'Spark@Slave2'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

No Password Access Slave2 is Succeed!!!
[Spark@Master test]$ ssh Slave1
Last login: Wed Nov 19 02:35:28 2014 from 192.168.8.29
Welcome to your pre-built HUSOR STANDARD WEB DEVELOP VM.

PHP5.3 (/usr/local/php-cgi) service:php-fpm
PHP5.4 (/usr/local/php-54) service:php54-fpm
Tengine1.4.6, mysql-5.5.29, memcached 1.4.15, tokyocabinet-1.4.48, tokyotyrant-1.1.41, httpsqs-1.7, coreseek-4.1

WEBROOT: /data/webroot/www/

[Spark@Slave1 ~]$ exit
logout
Connection to Slave1 closed.
[Spark@Master test]$ ssh Slave2
Last login: Wed Nov 19 01:48:01 2014 from 192.168.8.1
Welcome to your pre-built HUSOR STANDARD WEB DEVELOP VM.

PHP5.3 (/usr/local/php-cgi) service:php-fpm
PHP5.4 (/usr/local/php-54) service:php54-fpm
Tengine1.4.6, mysql-5.5.29, memcached 1.4.15, tokyocabinet-1.4.48, tokyotyrant-1.1.41, httpsqs-1.7, coreseek-4.1

WEBROOT: /data/webroot/www/

[Spark@Slave2 ~]$ 

The scripts used above are as follows:

SSH.sh

#!/bin/bash

# Resolve the absolute directory this script lives in.
bin=`which $0`
bin=`dirname ${bin}`
bin=`cd "$bin"; pwd`

if [ ! -x "$bin/NoPwdAccessSSH.exp" ]; then
  echo "Sorry, $bin/NoPwdAccessSSH.exp is not executable; please run: chmod +x $bin/NoPwdAccessSSH.exp"
  exit 1
fi

# Read the cluster roster; each line is host:user:password.
for hostInfo in $(cat $bin/SparkCluster); do

    host_name=$(echo "$hostInfo" | cut -f1 -d":")
    user_name=$(echo "$hostInfo" | cut -f2 -d":")
    user_pwd=$(echo "$hostInfo" | cut -f3 -d":")

    # Skip the local machine; compare hostnames, since SparkCluster
    # lists hostnames rather than IP addresses.
    local_host=$(hostname)
    if [ "$host_name" = "$local_host" ]; then
        continue
    else
        # Drive the interactive ssh-copy-id dialogue via expect.
        expect $bin/NoPwdAccessSSH.exp $host_name $user_name $user_pwd
    fi

    if [ $? -eq 0 ]; then
        echo "No Password Access $host_name is Succeed!!!"
    else
        echo "No Password Access $host_name is failed!!!"
    fi

done

NoPwdAccessSSH.exp

#!/usr/bin/expect -f

# Automate ssh-copy-id: answer the host-key confirmation and password prompts.

if { $argc < 3 } {
    puts stderr "Usage: $argv0 hostname username userpwd\n"
    exit 1
}

set hostname [lindex $argv 0]
set username [lindex $argv 1]
set userpwd  [lindex $argv 2]

spawn ssh-copy-id -i /home/Spark/.ssh/id_rsa.pub $username@$hostname

expect {
    "*yes/no*"   { send "yes\r"; exp_continue }
    "*password*" { send "$userpwd\r"; exp_continue }
    eof
}
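The expect script can also be invoked by hand for a single node, passing the same three fields that appear in the SparkCluster file below:

expect NoPwdAccessSSH.exp Slave1 Spark 111111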

The SparkCluster file read by SSH.sh lists one node per line as host:user:password:

Master:Spark:111111
Slave1:Spark:111111
Slave2:Spark:111111

3. Install Hadoop 2.4.1 (covered in detail elsewhere on this blog)

Note:

1> Install hadoop and the JDK under the newly created Spark user's directory (/home/Spark); doing otherwise leads to a series of permission problems.

2> Add execute permission to the files under hadoop's bin and sbin directories (chmod 777 *).

3> scp the hadoop directory configured on Master to the same /home/Spark directory of the same Spark user on every slave: scp -r /home/Spark/* Spark@SlaveX:/home/Spark

4> As root, add the hostname mappings to /etc/hosts on every node, as shown below.
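The /etc/hosts entries used throughout this cluster:

192.168.8.29 Master
192.168.8.30 Slave1
192.168.8.31 Slave2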

Issue 1:

Hadoop 2.2.0 - warning: You have loaded library /home/hadoop/2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard.

Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/Spark/hadoop2.4.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
localhost]
sed: -e expression #1, char 6: unknown option to `s'
HotSpot(TM): ssh: Could not resolve hostname HotSpot(TM): Name or service not known
64-Bit: ssh: Could not resolve hostname 64-Bit: Name or service not known
Java: ssh: Could not resolve hostname Java: Name or service not known
Server: ssh: Could not resolve hostname Server: Name or service not known
VM: ssh: Could not resolve hostname VM: Name or service not known
 
Reason:
The native libraries shipped in the prebuilt hadoop from the official site (e.g. lib/native/libhadoop.so.1.0.0) are compiled for 32-bit, so running them on a 64-bit system triggers the errors above.
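Whether a given libhadoop.so was built for 32-bit can be confirmed with the file utility; a 32-bit build reports something like "ELF 32-bit LSB shared object, Intel 80386":

file $HADOOP_HOME/lib/native/libhadoop.so.1.0.0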
 
Solution 1:
Recompile hadoop on the 64-bit system.
 
Solution 2:
As root, add the following to /etc/profile:
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
Apply the change immediately:
source /etc/profile
 
Solution 3:
Add the following lines to hadoop-env.sh and yarn-env.sh:
export HADOOP_HOME=/home/Spark/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
 
Format the namenode:
bin/hdfs namenode -format
Start or stop the namenode and datanodes:
sbin/start-dfs.sh -> sbin/stop-dfs.sh
Start or stop the resourcemanager and nodemanagers:
sbin/start-yarn.sh -> sbin/stop-yarn.sh
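After start-dfs.sh and start-yarn.sh, jps (shipped with the JDK) gives a quick sanity check; roughly, Master should show NameNode, SecondaryNameNode, and ResourceManager, and each slave should show DataNode and NodeManager:

jps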
 
Issue 2:
When browsing the Hadoop web UI from Windows 7 at http://Master:50070, Windows cannot resolve the hostname Master; the pages do load when Master's IP address is used instead.
Reason:
Windows 7 has no mapping for the hostname Master.
Solution:
Enter %systemroot%\system32\drivers\etc in the Windows Explorer address bar and press Enter to reveal the hosts file, then add the hostname mappings to it (192.168.8.29 Master, 192.168.8.30 Slave1, 192.168.8.31 Slave2).
 
4. Verify the web UIs

(Screenshots of the Hadoop web interfaces omitted; with the hosts entries above in place they are reachable at http://Master:50070.)
5. Spark cluster installation
 
Configure the spark-env.sh file by adding the following:

export JAVA_HOME=/home/Spark/husor/jdk
export HADOOP_HOME=/home/Spark/husor/hadoop
export HADOOP_CONF_DIR=/home/Spark/husor/hadoop/etc/hadoop
export SCALA_HOME=/home/Spark/husor/scala
export SPARK_MASTER_IP=Master
export SPARK_WORKER_MEMORY=512m

Configure the slaves file

Delete localhost and add the worker hostnames:

Slave1

Slave2
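With spark-env.sh and slaves distributed to every node (the spark install path is the same /home/Spark/husor/spark everywhere), the standalone cluster can be brought up from Master:

sbin/start-all.sh

By default the Master web UI at http://Master:8080 should then list Slave1 and Slave2 as workers.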

Verify that Spark starts

Launch the Spark shell:
[Spark@Master spark]$ bin/spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/11/20 12:17:42 INFO spark.SecurityManager: Changing view acls to: Spark,
14/11/20 12:17:42 INFO spark.SecurityManager: Changing modify acls to: Spark,
14/11/20 12:17:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
14/11/20 12:17:42 INFO spark.HttpServer: Starting HTTP Server
14/11/20 12:17:42 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/20 12:17:42 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:34246
14/11/20 12:17:42 INFO util.Utils: Successfully started service 'HTTP class server' on port 34246.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
14/11/20 12:17:52 INFO spark.SecurityManager: Changing view acls to: Spark,
14/11/20 12:17:52 INFO spark.SecurityManager: Changing modify acls to: Spark,
14/11/20 12:17:52 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )
14/11/20 12:17:53 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/11/20 12:17:54 INFO Remoting: Starting remoting
14/11/20 12:17:54 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:38507]
14/11/20 12:17:54 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@Master:38507]
14/11/20 12:17:54 INFO util.Utils: Successfully started service 'sparkDriver' on port 38507.
14/11/20 12:17:54 INFO spark.SparkEnv: Registering MapOutputTracker
14/11/20 12:17:54 INFO spark.SparkEnv: Registering BlockManagerMaster
14/11/20 12:17:54 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141120121754-651a
14/11/20 12:17:54 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 48273.
14/11/20 12:17:54 INFO network.ConnectionManager: Bound socket to port 48273 with id = ConnectionManagerId(Master,48273)
14/11/20 12:17:54 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
14/11/20 12:17:54 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/11/20 12:17:54 INFO storage.BlockManagerMasterActor: Registering block manager Master:48273 with 267.3 MB RAM
14/11/20 12:17:54 INFO storage.BlockManagerMaster: Registered BlockManager
14/11/20 12:17:54 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-7decc3d6-acce-4793-98c3-172c680de719
14/11/20 12:17:54 INFO spark.HttpServer: Starting HTTP Server
14/11/20 12:17:54 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/20 12:17:54 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:46326
14/11/20 12:17:54 INFO util.Utils: Successfully started service 'HTTP file server' on port 46326.
14/11/20 12:17:55 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/20 12:17:55 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/11/20 12:17:55 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
14/11/20 12:17:55 INFO ui.SparkUI: Started SparkUI at http://Master:4040
14/11/20 12:17:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/20 12:17:59 INFO executor.Executor: Using REPL class URI: http://192.168.8.29:34246
14/11/20 12:17:59 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@Master:38507/user/HeartbeatReceiver
14/11/20 12:17:59 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala> 
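Beyond the interactive shell, the bundled SparkPi example makes a quick end-to-end sanity check (run from the spark directory; the job prints an approximation of pi near the end of its output):

bin/run-example SparkPi 10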

 

