Summary of Spark Cluster Configuration Details


Change the owner and group of the installation directories:

sudo chown -R hadoop:hadoop spark-1.6.1-bin-hadoop2.6

sudo chown -R hadoop:hadoop jdk1.8.0_101

sudo chown -R hadoop:hadoop scala2.11.6
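After running the chown commands, ownership can be verified programmatically. A minimal Python sketch (the expected owner `hadoop:hadoop` comes from the commands above; for a self-contained demo it checks a temporary file owned by the current user):

```python
import grp
import os
import pwd
import tempfile

def owner_and_group(path):
    """Return the (user, group) names owning `path`."""
    st = os.stat(path)
    return pwd.getpwuid(st.st_uid).pw_name, grp.getgrgid(st.st_gid).gr_name

# Demo on a temp file; on the cluster you would pass e.g. "spark-1.6.1-bin-hadoop2.6".
with tempfile.NamedTemporaryFile() as f:
    user, group = owner_and_group(f.name)
    print(user, group)
```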

 

1. Under the /etc directory

vi hosts

192.168.xxx.xxx data6 (master node)

192.168.xxx.xxx data2 (worker node)

192.168.xxx.xxx data3 (worker node)

 

2. Under the spark/conf/ directory

vi slaves

data6

data2

data3

 

vi spark-env.sh

export JAVA_HOME=/app/jdk1.7

export SPARK_MASTER_IP=data6

export SPARK_WORKER_INSTANCES=1

export SPARK_WORKER_MEMORY=30g

export SPARK_WORKER_CORES=6

export SPARK_LOG_DIR=/data/tmp

export SPARK_PID_DIR=/data/tmp

export SPARK_DAEMON_JAVA_OPTS="-Djava.io.tmpdir=/home/tmp"

 

export PYSPARK_PYTHON=/opt/anaconda3/bin/python3

export PYSPARK_DRIVER_PYTHON=/opt/anaconda3/bin/ipython3

export PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip 0.0.0.0 --port 9999"

 

export PATH=$PATH:/usr/local/bin

export SPARK_CLASSPATH=/app/spark-1.6.1/lib/spark-examples-1.6.1-hadoop2.4.0.jar:/app/spark-1.6.1/lib/spark-assembly-1.6.1-hadoop2.4.0.jar:/app/spark-1.6.1/lib/spark-1.6.1-yarn-shuffle.jar:/app/spark-1.6.1/lib/nlp-lang-1.5.jar:/app/spark-1.6.1/lib/mysql-connector-java-5.1.26-bin.jar:/app/spark-1.6.1/lib/datanucleus-rdbms-3.2.9.jar:/app/spark-1.6.1/lib/datanucleus-core-3.2.10.jar:/app/spark-1.6.1/lib/datanucleus-api-jdo-3.2.6.jar:/app/spark-1.6.1/lib/ansj_seg-3.7.3-all-in-one.jar
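The long SPARK_CLASSPATH line is easy to get wrong by hand. A hedged sketch that builds it from a jar list instead (the jar names and the /app/spark-1.6.1/lib prefix are taken from the line above):

```python
import os

LIB_DIR = "/app/spark-1.6.1/lib"
JARS = [
    "spark-examples-1.6.1-hadoop2.4.0.jar",
    "spark-assembly-1.6.1-hadoop2.4.0.jar",
    "spark-1.6.1-yarn-shuffle.jar",
    "nlp-lang-1.5.jar",
    "mysql-connector-java-5.1.26-bin.jar",
    "datanucleus-rdbms-3.2.9.jar",
    "datanucleus-core-3.2.10.jar",
    "datanucleus-api-jdo-3.2.6.jar",
    "ansj_seg-3.7.3-all-in-one.jar",
]

# Join absolute jar paths with ':' exactly as spark-env.sh expects.
classpath = ":".join(os.path.join(LIB_DIR, j) for j in JARS)
print("export SPARK_CLASSPATH=" + classpath)
```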

 

vi hive-site.xml

<configuration>

<property>

<name>hive.metastore.uris</name>

<value>thrift://data6:9083</value>

<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>

</property>

 

<property>

<name>hive.server2.thrift.min.worker.threads</name>

<value>5</value>

<description>Minimum number of Thrift worker threads.</description>

</property>

 

<property>

<name>hive.server2.thrift.port</name>

<value>11000</value>

<description>Port number of the HiveServer2 Thrift interface.</description>

</property>

 

<property>

<name>hive.server2.thrift.max.worker.threads</name>

<value>500</value>

<description>Maximum number of Thrift worker threads.</description>

</property>

 

<property>

<name>hive.server2.thrift.bind.host</name>

<value>data6</value>

<description>Bind host on which to run the HiveServer2 Thrift interface.</description>

</property>

 

<property>

<name>mapred.reduce.tasks</name>

<value>40</value>

</property>

</configuration>
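A quick way to catch typos and duplicate property names in hive-site.xml is to parse it with the standard library. A minimal sketch (the inline XML mirrors two of the properties above; use ET.parse("hive-site.xml") for the real file):

```python
import xml.etree.ElementTree as ET

HIVE_SITE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://data6:9083</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>data6</value>
  </property>
</configuration>"""

root = ET.fromstring(HIVE_SITE)
props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

# Duplicate names silently overwrite each other in the dict, so compare counts.
names = [p.findtext("name") for p in root.iter("property")]
assert len(names) == len(set(names)), "duplicate property names in hive-site.xml"
print(props)
```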

 

vi log4j.properties

# Settings to quiet third party logs that are too verbose

log4j.logger.org.spark-project.jetty=WARN

log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR

log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO

log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

log4j.logger.parquet=ERROR

 

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support

log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL

log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

 

Spark Cluster Setup: Passwordless SSH Login

Machine preparation

There are three machines; the left column is the IP address, the right column the hostname, and each machine has a user named spark. Verify with ping that the three machines can reach each other.

192.168.248.150 spark-master
192.168.248.153 ubuntu-worker
192.168.248.155 spark-worker1

Add these entries to /etc/hosts on all three machines.
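To keep /etc/hosts consistent across all three machines, the entries can be checked with a small script. A sketch assuming the three hosts listed above:

```python
HOSTS_ENTRIES = """\
192.168.248.150 spark-master
192.168.248.153 ubuntu-worker
192.168.248.155 spark-worker1
"""

def parse_hosts(text):
    """Map hostname -> IP, skipping blank lines and comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping

hosts = parse_hosts(HOSTS_ENTRIES)
print(hosts["spark-master"])
```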

 

Configuration

We need spark-master to be able to log in to the ubuntu-worker and spark-worker1 machines without a password.

  1. Install ssh:

    sudo apt-get install openssh-server

  2. Generate a key pair:

    Run ssh-keygen -t rsa and press Enter at every prompt.

  3. Copy the id_rsa.pub file from the spark-master node to the other two nodes:

    scp id_rsa.pub spark@ubuntu-worker:~/.ssh/

  4. On each of the other two nodes, append the public key to the authorized keys file:

    cat id_rsa.pub >> authorized_keys

  5. On both workers, set the permissions of authorized_keys to 600 (or 644) and of the .ssh directory to 700:

    chmod 700 .ssh

    chmod 600 authorized_keys

  6. Verify:

    Log in to spark-master and run ssh ubuntu-worker in a terminal; if the login succeeds, the configuration works.
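The permission values in step 5 (700 for .ssh, 600 for authorized_keys) matter because sshd refuses to use key files that other users can read or write. A small sketch showing what those octal modes mean, using the standard library:

```python
import stat

# stat.filemode renders an octal mode the way `ls -l` does.
print(stat.filemode(stat.S_IFDIR | 0o700))  # .ssh directory
print(stat.filemode(stat.S_IFREG | 0o600))  # authorized_keys
print(stat.filemode(stat.S_IFREG | 0o644))  # the alternative also mentioned above
```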

 

HDFS download failures

Downloads from the NameNode's monitoring web page fail when it is accessed with IE on Windows. The fix is to edit C:\WINDOWS\system32\drivers\etc\hosts on the Windows machine and add the hostnames and IP addresses of the Hadoop cluster machines (usually found in /etc/hosts on the cluster), so that IE can resolve them.

 

If the NameNode does not start, format it:

hadoop namenode -format

then start HDFS again.

