1. Cluster planning:
192.168.1.252   palo252   NameNode + DataNode
192.168.1.253   palo253   ResourceManager (YARN) + DataNode + SecondaryNameNode
192.168.1.254   palo254   DataNode
2. Configure a static IP address
vi /etc/sysconfig/network-scripts/ifcfg-eth0
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
NAME=eth0
UUID=7ac09286-c35b-4f15-a9ba-701c093832bf
DEVICE=eth0
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_PRIVACY=no
ONBOOT=yes
DNS1=192.168.1.1
IPADDR=192.168.1.252   # set a different address on each of the three machines
PREFIX=24
GATEWAY=192.168.1.1
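After editing the file, the network service has to be restarted for the static address to take effect. A minimal sketch, assuming a CentOS 7 style host (as implied by the systemctl/yum commands used throughout this guide):

# apply the new network configuration
systemctl restart network
# confirm that eth0 now has the expected address
ip addr show eth0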
3. Set the hostnames:
192.168.1.252
hostnamectl set-hostname palo252
hostnamectl --static set-hostname palo252
192.168.1.253
hostnamectl set-hostname palo253
hostnamectl --static set-hostname palo253
192.168.1.254
hostnamectl set-hostname palo254
hostnamectl --static set-hostname palo254
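An optional quick check on each machine after setting the name:

hostnamectl status   # the "Static hostname" line should show palo252 / palo253 / palo254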
4. Edit the hosts file
vi /etc/hosts
127.0.0.1       localhost
::1             localhost
192.168.1.252   palo252
192.168.1.253   palo253
192.168.1.254   palo254
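An optional sanity check, run from any node, to make sure every hostname resolves and is reachable:

for h in palo252 palo253 palo254; do
    ping -c 1 "$h" > /dev/null && echo "$h reachable" || echo "$h NOT reachable"
done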
5. Install the JDK (all nodes)
Download it from the Oracle website.
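Alternatively, a sketch assuming the standard CentOS 7 repositories are available: the distribution's OpenJDK 8 packages also work, and they match the JAVA_HOME path used later in hadoop-env.sh:

# install OpenJDK 8 (the full JDK, not only the JRE) on every node
yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
java -version   # verify the installation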
6. Passwordless SSH login
Precondition: install the SSH server if it is not already available.
# install the ssh client and ssh server
sudo yum install -y openssh-clients openssh-server
# enable the ssh server to start at boot
systemctl enable sshd.service
# start the ssh server now
systemctl start sshd.service
A) Generate a key pair on each machine and collect the public keys under 192.168.1.252:/home/workspace
192.168.1.252:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
cp ~/.ssh/authorized_keys /home/workspace/authorized_keys252
rm -rf ~/.ssh/authorized_keys   # remove the local authorized_keys; it will be replaced by the merged file
192.168.1.253:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys 192.168.1.252:/home/workspace/authorized_keys253
rm -rf ~/.ssh/authorized_keys   # remove the local authorized_keys; it will be replaced by the merged file
192.168.1.254:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys 192.168.1.252:/home/workspace/authorized_keys254
rm -rf ~/.ssh/authorized_keys   # remove the local authorized_keys; it will be replaced by the merged file
B) On 192.168.1.252, merge all the public keys into a single file
cat /home/workspace/authorized_keys252 >> /home/workspace/authorized_keys
cat /home/workspace/authorized_keys253 >> /home/workspace/authorized_keys
cat /home/workspace/authorized_keys254 >> /home/workspace/authorized_keys
C) Copy the merged key file to every host in the cluster
scp /home/workspace/authorized_keys 192.168.1.253:~/.ssh/
scp /home/workspace/authorized_keys 192.168.1.254:~/.ssh/
cp /home/workspace/authorized_keys ~/.ssh/    # we are already on 252, so cp is used instead of scp
Note: you can also use ssh-copy-id -i ~/.ssh/id_rsa.pub {ip or hostname} to copy a public key to a remote machine.
Taking this cluster as an example, steps A, B and C above can also be replaced by the following procedure:
Run the following on 192.168.1.252, 192.168.1.253 and 192.168.1.254; this generates the private key and distributes the public key in one go.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   # generate the local key pair
ssh-copy-id -i ~/.ssh/id_rsa.pub palo252   # copy the local public key to palo252; it is appended to ~/.ssh/authorized_keys on the remote machine (the file is created if it does not exist)
ssh-copy-id -i ~/.ssh/id_rsa.pub palo253   # same, for palo253
ssh-copy-id -i ~/.ssh/id_rsa.pub palo254   # same, for palo254
D) On each machine:
chmod 755 ~                        # permissions of the current user's home directory
chmod 700 ~/.ssh/                  # permissions of the .ssh directory
chmod 600 ~/.ssh/id_rsa            # permissions of id_rsa
chmod 644 ~/.ssh/id_rsa.pub        # permissions of id_rsa.pub
chmod 644 ~/.ssh/authorized_keys   # permissions of authorized_keys
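At this point passwordless login should work between all three machines. A quick check, run from any node (each command should print the remote hostname without prompting for a password):

ssh palo252 hostname
ssh palo253 hostname
ssh palo254 hostname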
Notes:
If SSH login fails or still asks for a password, check the sshd log at /var/log/secure.
You may see log entries such as the following:
Jul 22 14:20:33 v138020.go sshd[4917]: Authentication refused: bad ownership or modes for directory /home/edw
In that case fix the permissions: for security reasons, sshd requires certain ownership and permissions on the user's home directory and key files. If they are wrong, passwordless SSH login will not work.
The home directory must be 755 or 700; it must not be 77x.
The .ssh directory should normally be 755 or 700.
id_rsa.pub and authorized_keys should normally be 644.
id_rsa must be 600.
The SSH-related log can be inspected with:
cat /var/log/secure
7. Configure Hadoop
7-1) Extract the archive
Download URL: https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
tar xzvf hadoop-2.7.3.tar.gz -C /opt/
7-2) Create the data directories (they must exist beforehand, otherwise startup will fail)
mkdir -p /opt/hadoop-2.7.3/data/full/tmp/
mkdir -p /opt/hadoop-2.7.3/data/full/tmp/dfs/name
mkdir -p /opt/hadoop-2.7.3/data/full/tmp/dfs/data
7-3) Edit the configuration files under /opt/hadoop-2.7.3/etc/hadoop
cd /opt/hadoop-2.7.3/etc/hadoop   # change to the configuration directory
7-3-1) core-site.xml
<configuration>
    <!-- address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://palo252:9000</value>
        <description>hdfs://palo252:9000</description>
    </property>
    <!-- base directory for files generated while Hadoop is running -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.7.3/data/full/tmp</value>
        <description>Base directory that many other Hadoop paths depend on. If hdfs-site.xml does not configure the NameNode and DataNode storage locations, they default to subdirectories of this path.</description>
    </property>
    <!-- enable webhdfs -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
        <description>enable webhdfs</description>
    </property>
    <!-- use the hadoop native library -->
    <property>
        <name>hadoop.native.lib</name>
        <value>true</value>
        <description>Should native hadoop libraries, if present, be used.</description>
    </property>
</configuration>
7-3-2) yarn-site.xml
<configuration>
    <property>
        <!-- how reducers fetch data -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <!-- hostname of the YARN ResourceManager -->
        <name>yarn.resourcemanager.hostname</name>
        <value>palo253</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>palo253:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>palo253:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>palo253:8031</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>10240</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>
</configuration>
7-3-3) slaves
palo252
palo253
palo254
7-3-4) mapred-site.xml
<configuration>
    <!-- run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>palo252:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>palo252:19888</value>
    </property>
</configuration>
7-3-5) hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Must not be larger than the number of DataNodes; the default is 3.</description>
    </property>
    <!-- address of the SecondaryNameNode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>palo253:50090</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>file:/opt/hadoop-2.7.3/data/full/tmp/dfs/data</value>
        <description>Directory in which HDFS stores its data blocks. It can be set to a list of directories on different partitions, which spreads HDFS across those partitions.</description>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>file:/opt/hadoop-2.7.3/data/full/tmp/dfs/name</value>
        <description>Directory in which HDFS stores its metadata. If set to multiple directories, each directory holds a copy of the metadata.</description>
    </property>
    <!-- Hadoop proxy user -->
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
        <description>"*" means the proxy user "hadoop" may access the HDFS cluster from any node.</description>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
        <description>Groups the proxy user may impersonate.</description>
    </property>
</configuration>
7-3-6) hadoop-env.sh
Set JAVA_HOME:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64
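The configuration above is edited on one node, but the Hadoop directory must be identical on all three machines. One way to distribute it, sketched here under the assumption that every node uses the same /opt layout, is to copy it from 252 with scp:

# run on palo252 after finishing the configuration
scp -r /opt/hadoop-2.7.3 palo253:/opt/
scp -r /opt/hadoop-2.7.3 palo254:/opt/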
8. Configure the environment variables (required on every machine)
vi /etc/profile
Append the following at the end of the file:
##### set jdk environment
export JAVA_HOME=/usr/java/jdk1.8.0_172-amd64
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

##### set hadoop_home environment
export HADOOP_HOME=/opt/hadoop-2.7.3
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=${YARN_HOME}/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

### enable hadoop native library
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
Run source /etc/profile in the terminal so that the new environment variables take effect.
source /etc/profile   # make the environment variables take effect immediately
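A quick check that the variables are in effect (run on every machine):

java -version    # should print the installed JDK version
hadoop version   # should print Hadoop 2.7.3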
9. Start the cluster:
NameNode (master, 252):
# format the NameNode
hdfs namenode -format
# start HDFS
start-dfs.sh # (master 252)
# start YARN on the YARN node (253)
# Note: if the NameNode and the ResourceManager are not on the same machine,
# do not start YARN on the NameNode;
# start it on the machine where the ResourceManager runs.
start-yarn.sh
# verify that everything started:
jps   # list the running Java processes
http://namenode:50070/
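Beyond jps and the NameNode web UI, a few additional checks can confirm that the DataNodes and NodeManagers registered correctly (the ResourceManager web UI is normally at http://palo253:8088/ with the default ports; the /tmp/smoke path below is just an example):

# report HDFS capacity and the registered DataNodes
hdfs dfsadmin -report
# list the YARN nodes registered with the ResourceManager
yarn node -list
# simple read/write smoke test against HDFS
hdfs dfs -mkdir -p /tmp/smoke
hdfs dfs -put /etc/hosts /tmp/smoke/
hdfs dfs -cat /tmp/smoke/hosts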
10. Ways to start and stop Hadoop
1) Start or stop each service component individually.
   HDFS components:
   hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode
   YARN components:
   yarn-daemon.sh start|stop resourcemanager|nodemanager
2) Start or stop each module as a whole (the SSH setup above is a prerequisite) — the usual way:
   start|stop-dfs.sh
   start|stop-yarn.sh
3) Start or stop everything at once (not recommended):
   start|stop-all.sh
4) Start the history server (it can be started on any node):
mr-jobhistory-daemon.sh start|stop historyserver
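To validate the whole stack (HDFS + YARN + MapReduce) end to end, the examples jar that ships with Hadoop 2.7.3 can be used; the jar path below follows the standard layout under $HADOOP_HOME:

# estimate pi with 5 map tasks and 100 samples per task
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 5 100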