Installation
This installation is based on CentOS 7, using a non-minimal install with some of the Server services and the Development Tools group selected. Everything is done as the root user, because the operating system's permission and security behavior at service startup differs when another user is used.
Step 1: Download Hadoop from hadoop.apache.org
Choose one of the recommended download mirrors:
https://hadoop.apache.org/releases.html
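For example, a direct download from the Apache archive (the URL below assumes the standard archive layout; any mirror listed on the releases page works the same way):
# cd /root/Download
# wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz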
Step 2: Download the JDK
http://www.oracle.com/technetwork/pt/java/javase/downloads/jdk8-downloads-2133151.html
Note that Java 9 is not yet compatible with Hadoop 3 (and possibly not with any Hadoop release).
Step 4: Extract the downloaded archives
Extract the JDK archive:
# tar -zxvf /root/Download/jdk-8u192-linux-x64.tar.gz -C /opt
Extract the Hadoop archive:
# tar -zxvf /root/Download/hadoop-3.1.1.tar.gz -C /opt
Step 5: Install JSVC
# rpm -ivh apache-commons-daemon-jsvc-1.0.13-7.el7.x86_64.rpm
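If you do not have the rpm on hand, the same package can usually be installed straight from the CentOS repositories (the package name below is assumed from the rpm file name above; verify it with yum search first):
# yum install -y apache-commons-daemon-jsvc   ## install jsvc from the distribution repository
# which jsvc                                  ## should print /usr/bin/jsvc, matching JSVC_HOME in Step 9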
Step 6: Set the hostnames
# vi /etc/hosts
Add aliases for all of the servers involved:
192.168.154.116 master
192.168.154.117 slave1
192.168.154.118 slave2
Set this machine's hostname:
# vi /etc/hostname
master
## The first line of this file must be the hostname; anything from the second line onward is meaningless to the OS and to Hadoop.
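On CentOS 7 the same result can also be achieved with hostnamectl, which writes /etc/hostname for you; this is offered as an alternative, not part of the original steps:
# hostnamectl set-hostname master   ## updates /etc/hostname and the running hostname
# hostname                          ## verify; should print master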
Step 7: Set up SSH mutual trust (passwordless login)
Note that I am configuring the root user here, so the home directory below is /root.
If you are configuring another user xxxx, the home directory is /home/xxxx/ instead.
#Run the following commands on the master node:
# ssh-keygen -t rsa -P '' #press Enter through the prompts until the key pair is generated
scp /root/.ssh/id_rsa.pub root@slave1:/root/.ssh/id_rsa.pub.master #copy id_rsa.pub from master to the slave host and rename it id_rsa.pub.master
scp /root/.ssh/id_rsa.pub root@slave2:/root/.ssh/id_rsa.pub.master #same as above; from here on slaveN stands for slave1 and slave2
scp /etc/hosts root@slaveN:/etc/hosts #push the same hosts file to every node so the hosts can resolve each other by name
#Run the following commands on the corresponding hosts:
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys #on the master host
cat /root/.ssh/id_rsa.pub.master >> /root/.ssh/authorized_keys #on each slaveN host
The master host can now log in to the other hosts without a password, so the startup scripts run on master and any scp commands no longer prompt for one.
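A quick way to confirm the trust works, plus the permission tightening sshd is strict about (the chmod lines are a precaution and are harmless if the permissions are already correct):
# chmod 700 /root/.ssh && chmod 600 /root/.ssh/authorized_keys   ## sshd ignores keys with loose permissions
# ssh slave1 hostname   ## should print slave1 without prompting for a password
# ssh slave2 hostname   ## should print slave2 without prompting for a password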
Step 8: Add environment variables
# vi ~/.bash_profile
PATH=/usr/local/webserver/mysql/bin:/usr/python/bin:/opt/hadoop-3.1.1/etc/hadoop:/opt/jdk-10.0.2/bin:/opt/hadoop-3.1.1/bin:/opt/hadoop-3.1.1/sbin:$PATH:$HOME/bin:/opt/spark/bin:/opt/spark/sbin:/opt/hive/bin:/opt/flume/bin:/opt/kafka/bin
export PATH
JAVA_HOME=/opt/jdk-10.0.2   ## point this at the JDK directory you actually extracted (e.g. /opt/jdk1.8.0_192 for the JDK 8u192 tarball from Step 2), and adjust the matching entry in PATH above
export JAVA_HOME
export HADOOP_HOME=/opt/hadoop-3.1.1
export LD_LIBRARY_PATH=/usr/local/lib:/usr/python/lib:/usr/local/webserver/mysql/lib
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin
export HIVE_HOME=/opt/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=$PATH:$HIVE_HOME/bin
export YARN_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export SQOOP_HOME=/opt/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export FLUME_HOME=/opt/flume
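After saving the file, reload it in the current shell and sanity-check the result (the version check assumes the Hadoop and JDK paths above are correct):
# source ~/.bash_profile
# echo $HADOOP_HOME   ## should print /opt/hadoop-3.1.1
# hadoop version      ## should report Hadoop 3.1.1 if PATH and JAVA_HOME are set correctly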
Step 9: Edit /opt/hadoop-3.1.1/etc/hadoop/hadoop-env.sh
Add:
export JAVA_HOME=/opt/jdk-10.0.2
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
export JSVC_HOME=/usr/bin
Step 10: Edit /opt/hadoop-3.1.1/etc/hadoop/core-site.xml
<configuration>
    <!-- HDFS (NameNode) RPC address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.154.116:9000</value>
    </property>
    <!-- Base directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-3.1.1/tmp</value>
    </property>
</configuration>
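Once the daemons are running (Step 20), the effective setting can be confirmed with hdfs getconf; this is just a convenience check, not part of the original steps:
# hdfs getconf -confKey fs.defaultFS   ## should print hdfs://192.168.154.116:9000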
Step 11: Edit /opt/hadoop-3.1.1/etc/hadoop/hdfs-site.xml
<configuration>
    <!-- HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- NameNode HTTP address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>master:50070</value>
    </property>
    <!-- SecondaryNameNode HTTP address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave1:50090</value>
    </property>
    <!-- NameNode metadata directory -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop-3.1.1/name</value>
    </property>
    <!-- DataNode block storage directory -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop-3.1.1/data</value>
    </property>
</configuration>
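The storage directories referenced above do not exist in a fresh extraction. Hadoop will normally create them itself, but creating them up front is a harmless precaution:
# mkdir -p /opt/hadoop-3.1.1/tmp /opt/hadoop-3.1.1/name /opt/hadoop-3.1.1/data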
Step 12: Edit /opt/hadoop-3.1.1/etc/hadoop/mapred-site.xml
If this file does not exist, copy it from the template: cp /opt/hadoop-3.1.1/etc/hadoop/mapred-site.xml.template /opt/hadoop-3.1.1/etc/hadoop/mapred-site.xml (in Hadoop 3.1.1 the file is normally already present).
<configuration>
    <!-- Tell the MapReduce framework to run on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop-3.1.1</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop-3.1.1</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop-3.1.1</value>
    </property>
</configuration>
Step 13: Edit /opt/hadoop-3.1.1/etc/hadoop/yarn-site.xml
<configuration>
    <!-- Reducers fetch map output via mapreduce_shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log aggregation directory -->
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/opt/hadoop-3.1.1/logs</value>
    </property>
    <!-- The node that runs the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
</configuration>
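With log aggregation enabled, the logs of finished applications can be fetched with the yarn CLI; note that yarn.nodemanager.remote-app-log-dir is resolved on the default filesystem (HDFS here), not on the local disk. The application id below is a placeholder:
# yarn logs -applicationId application_1234567890123_0001   ## use a real application id from the ResourceManager UI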
Step 14: Edit /opt/hadoop-3.1.1/etc/hadoop/masters
192.168.154.116
Step 15: Edit /opt/hadoop-3.1.1/etc/hadoop/slaves
192.168.154.116
Step 16: Edit /opt/hadoop-3.1.1/etc/hadoop/workers
192.168.154.116
## Built on a single VM in cluster fashion, this is effectively a pseudo-distributed setup. With multiple machines configured the same way and mutual SSH trust between them, it becomes a truly distributed cluster. (Hadoop 3's startup scripts read the workers file; the masters and slaves files are legacy and not strictly required in 3.1.1.)
Step 17: Edit /opt/hadoop-3.1.1/etc/hadoop/yarn-env.sh
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root
Step 18: Reboot the system
reboot (or init 6)
Step 19: Format HDFS (the NameNode)
# cd /opt/hadoop-3.1.1/etc/hadoop/
# hdfs namenode -format
Format only once: repeated formatting can leave the DataNodes unable to register because their stored cluster ID no longer matches the NameNode. If you do need to reformat, delete the name/data directories first and then format again.
Step 20: Start HDFS and YARN
# cd /opt/hadoop-3.1.1
# sbin/start-dfs.sh
# sbin/start-yarn.sh
Step 21: Check that the installation succeeded
#jps
12006 NodeManager
11017 NameNode
11658 ResourceManager
13068 Jps
11197 DataNode
11389 SecondaryNameNode
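If all six processes are present, the web UIs should also respond; the NameNode port follows the explicit setting in hdfs-site.xml, and 8088 is YARN's default ResourceManager UI port:
NameNode UI: http://master:50070
ResourceManager UI: http://master:8088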
Step 22: Upload a file as a test
# cd ~
# vi helloworld.txt
# hdfs dfs -put helloworld.txt helloworld.txt
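The relative destination resolves to the user's HDFS home directory (/user/root here), which may not exist on a freshly formatted cluster. The commands below create it, verify the upload, and run the bundled wordcount example to exercise YARN and MapReduce end to end (the example jar path is the one shipped inside the Hadoop 3.1.1 distribution):
# hdfs dfs -mkdir -p /user/root    ## create the HDFS home directory if it is missing
# hdfs dfs -ls /user/root          ## helloworld.txt should be listed after the put
# yarn jar /opt/hadoop-3.1.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount helloworld.txt wcout
# hdfs dfs -cat wcout/part-r-00000 ## word counts from the test file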