2. Hadoop Deployment
2.1 Installing Hadoop (can be done on all three machines in parallel)
- Download Hadoop 2.7.7 (hadoop-2.7.7.tar.gz).
- Extract it with tar -zxvf hadoop-2.7.7.tar.gz, then create the tmp, dfs, dfs/name, dfs/node and dfs/data directories under the Hadoop home directory:
cd /opt/hadoop-2.7.7
mkdir tmp
mkdir dfs
mkdir dfs/name
mkdir dfs/node
mkdir dfs/data
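The five mkdir calls above can be collapsed into a single idempotent command with mkdir -p (a convenience sketch; HADOOP_BASE is just a local shorthand for the extraction directory, not a variable Hadoop itself reads):

```shell
# Create all working directories in one go; -p creates missing parents
# and makes the command safe to re-run. HADOOP_BASE is only a shorthand.
HADOOP_BASE="${HADOOP_BASE:-/opt/hadoop-2.7.7}"
mkdir -p "$HADOOP_BASE/tmp" "$HADOOP_BASE/dfs/name" "$HADOOP_BASE/dfs/node" "$HADOOP_BASE/dfs/data"
ls "$HADOOP_BASE/dfs"
```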
2.2 Hadoop Configuration
All of the following files are edited under hadoop-2.7.7/etc/hadoop.
2.2.1 Edit hadoop-env.sh and set JAVA_HOME to the JDK installation directory:
export JAVA_HOME=/opt/jdk
2.2.2 Edit core-site.xml and add the following,
where master is the hostname and /opt/hadoop-2.7.7/tmp is the directory created earlier:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop-2.7.7/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.groups</name>
    <value>*</value>
  </property>
</configuration>
2.2.3 Edit hdfs-site.xml and add the following,
where master is the hostname, and
file:/opt/hadoop-2.7.7/dfs/name and file:/opt/hadoop-2.7.7/dfs/data are the directories created earlier:
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop-2.7.7/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop-2.7.7/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
Copy mapred-site.xml.template and rename it to mapred-site.xml:
cp mapred-site.xml.template mapred-site.xml
2.2.4 Edit mapred-site.xml and add the following,
where master is the hostname:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
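One caveat worth knowing: start-all.sh (used in section 2.3) starts HDFS and YARN but not the JobHistory server whose addresses are configured above. In Hadoop 2.x it is launched separately from hadoop-2.7.7/sbin:

```shell
# Start the MapReduce JobHistory server; it serves the 10020/19888
# addresses configured in mapred-site.xml. Run from hadoop-2.7.7/sbin
# on the master node after the cluster is up.
./mr-jobhistory-daemon.sh start historyserver
```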
2.2.5 Edit yarn-site.xml and add the following,
where master is the hostname:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
2.2.6 Edit the slaves file and add the cluster nodes (one per line; add more lines for more machines):
master
slave1
slave2
2.2.7 Distributing the Hadoop configuration across the cluster
Instead of repeating steps 2.2.1-2.2.6 on every machine, the configured etc/hadoop directory can be copied to the other nodes:
cd /opt/hadoop-2.7.7/etc
scp -r hadoop root@<other-hostname>:/opt/hadoop-2.7.7/etc
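With several worker nodes the copy can be scripted. The loop below is a dry run: it only prints one scp command per node so the targets can be checked first; remove the leading echo to actually copy. The hostnames slave1 and slave2 come from the slaves file above; adjust them to your cluster.

```shell
# Dry run: print one scp command per worker node.
# Drop the "echo" in front of scp to perform the copy for real.
for node in slave1 slave2; do
  echo scp -r /opt/hadoop-2.7.7/etc/hadoop "root@$node:/opt/hadoop-2.7.7/etc"
done
```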
2.3 Starting Hadoop
1. Format a new filesystem. In hadoop-2.7.7/bin, run:
./hadoop namenode -format
(In Hadoop 2.x this command is deprecated; ./hdfs namenode -format is the current form and does the same thing.)
2. Start Hadoop. In hadoop-2.7.7/sbin, run:
./start-all.sh
Output like the following indicates a successful start:
root@master:/opt/hadoop-2.7.7/sbin# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-slave2.out
master: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-master.out
slave1: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-resourcemanager-master.out
slave2: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-slave2.out
slave1: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-slave1.out
master: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-master.out
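Another quick check is jps, which ships with the JDK and lists running Java processes. Since master also appears in the slaves file, master should show NameNode, SecondaryNameNode, ResourceManager, DataNode and NodeManager, while the workers show DataNode and NodeManager. A sketch for the master node:

```shell
# Flag any expected Hadoop daemon that jps does not report on this node.
# The daemon list assumes master also runs as a worker, as configured above.
for d in NameNode SecondaryNameNode ResourceManager DataNode NodeManager; do
  jps | grep -qw "$d" && echo "$d: running" || echo "$d: MISSING"
done
```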
2.4 Checking the Hadoop Cluster
Method 1: in hadoop-2.7.7/bin, run:
./hdfs dfsadmin -report
Check the number of live datanodes; for example, "Live datanodes (3)" means all three nodes started successfully.
root@master:/opt/hadoop-2.7.7/bin# ./hdfs dfsadmin -report
Configured Capacity: 621051420672 (578.40 GB)
Present Capacity: 577317355520 (537.67 GB)
DFS Remaining: 577317281792 (537.67 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (3):
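Beyond the report, a small write/read round trip confirms that HDFS actually accepts data (a sketch; the /smoke path is an arbitrary choice for this check):

```shell
# Write a local file into HDFS and read it back; run from hadoop-2.7.7/bin.
echo "hello hdfs" > /tmp/hello.txt
./hdfs dfs -mkdir -p /smoke
./hdfs dfs -put -f /tmp/hello.txt /smoke/
./hdfs dfs -cat /smoke/hello.txt
```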
Method 2: open the YARN web UI on port 8088: http://192.168.241.132:8088/cluster/nodes

Method 3: open the HDFS web UI on port 50070: http://192.168.241.132:50070/

3. Spark Deployment
3.1 Installing Spark (can be done on all three machines in parallel)
- Download spark-2.1.0-bin-hadoop2.7.tgz and extract it under /opt.
- Add the Spark environment variables to /etc/profile:
export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
export PATH=$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH
3.2 Spark Configuration
1. In spark-2.1.0-bin-hadoop2.7/conf, copy spark-env.sh.template and rename it to spark-env.sh:
cp spark-env.sh.template spark-env.sh
Edit spark-env.sh and add the following:
export JAVA_HOME=/opt/jdk
export SPARK_MASTER_IP=192.168.241.132
export SPARK_WORKER_MEMORY=8g
export SPARK_WORKER_CORES=4
export SPARK_EXECUTOR_MEMORY=4g
export HADOOP_HOME=/opt/hadoop-2.7.7/
export HADOOP_CONF_DIR=/opt/hadoop-2.7.7/etc/hadoop
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/jdk/jre/lib/amd64
2. Copy slaves.template to slaves:
cp slaves.template slaves
Edit the slaves file and add the following (one per line; add more lines for more machines):
master
slave1
slave2
3.3 Distributing the Spark Configuration
The spark-2.1.0-bin-hadoop2.7/conf directory can be copied to the other machines, so step 3.2 does not have to be repeated on each node:
scp -r conf root@<other-hostname>:/opt/spark-2.1.0-bin-hadoop2.7
3.4 Starting Spark
In spark-2.1.0-bin-hadoop2.7/sbin, run:
./start-all.sh
3.5 Checking the Spark Cluster
Open the Spark master web UI: http://192.168.241.134:8080/
Note: for the Spark cluster to work, the configuration on the worker nodes must match that on the master node.
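A common end-to-end check is to submit the bundled SparkPi example to the standalone master. The jar path below matches the spark-2.1.0-bin-hadoop2.7 distribution layout; adjust the Scala/Spark version suffix if yours differs, and 7077 is the default standalone master port:

```shell
# Submit the SparkPi example to the standalone cluster; a line like
# "Pi is roughly 3.14..." in the output means the job ran on the cluster.
cd /opt/spark-2.1.0-bin-hadoop2.7
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://master:7077 \
  examples/jars/spark-examples_2.11-2.1.0.jar 100
```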
