Big Data: Hadoop Installation
_(Configuring Hadoop 3.2.2 on CentOS 7)_

1. Install CentOS 7
2. Download and extract the JDK and Hadoop
(do this before cloning the VM)
- Create the installation directory

cd /
mkdir software
- Download the archives
wget https://downloads.apache.org/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
wget https://repo.huaweicloud.com/java/jdk/8u202-b08/jdk-8u202-linux-x64.tar.gz
- Extract

tar -zxvf hadoop-3.2.2.tar.gz -C /software/
tar -zxvf jdk-8u202-linux-x64.tar.gz -C /software/
3. Configure the environment variables
- System-wide environment variables

vi /etc/profile

export JAVA_HOME=/software/jdk1.8.0_202
export HADOOP_HOME=/software/hadoop-3.2.2
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
- Apply the configuration
source /etc/profile
- Or use per-user environment variables instead

vi ~/.bash_profile

export JAVA_HOME=/software/jdk1.8.0_202
export JAVA_BIN=$JAVA_HOME/bin
export JAVA_LIB=$JAVA_HOME/lib
export CLASSPATH=.:$JAVA_LIB/tools.jar:$JAVA_LIB/dt.jar
export HADOOP_HOME=/software/hadoop-3.2.2
PATH=$PATH:$JAVA_BIN:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH
- Apply the user's environment variables
source ~/.bash_profile
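After sourcing, it is worth confirming that the variables really landed on PATH before moving on. A minimal sanity-check sketch (`path_has` is a helper defined here, not a standard command):

```shell
# Report whether a given directory appears on $PATH.
path_has() {
  case ":$PATH:" in *":$1:"*) return 0 ;; *) return 1 ;; esac
}

if [ -n "$JAVA_HOME" ] && path_has "$JAVA_HOME/bin"; then
  echo "Java environment OK"
else
  echo "JAVA_HOME not set, or its bin/ is missing from PATH" >&2
fi

if [ -n "$HADOOP_HOME" ] && path_has "$HADOOP_HOME/bin"; then
  echo "Hadoop environment OK"
else
  echo "HADOOP_HOME not set, or its bin/ is missing from PATH" >&2
fi
```

`java -version` and `hadoop version` should also both resolve once this passes.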
4. Set a static IP
(recommended on every machine; the procedure is the same on each)
- View the network interface with ifconfig
ifconfig
- Edit the interface configuration file
vi /etc/sysconfig/network-scripts/ifcfg-ens33
Change:
BOOTPROTO="static"   (change the existing value to static)
Add:
IPADDR=192.168.158.137   (your own choice)
GATEWAY=192.168.158.2    (determined by the virtual NIC settings --> the gateway IP)
DNS1=192.168.158.2
- Restart the network
service network restart
- Disable the firewall
systemctl disable firewalld
- After rebooting, check the firewall status
systemctl status firewalld
5. Set the hostname and hosts
(do this on every machine)
- Set the hostname
vi /etc/hostname
Delete the existing contents and enter your own name; mine is node0 (a different name on each host)
- Add the other nodes
vi /etc/hosts
192.168.158.137 node0
192.168.158.138 node1
192.168.158.139 node2
192.168.158.140 node3
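Since every node needs an identical hosts file, one option is to generate the four lines from a single list and append the output on each machine (a sketch; `hosts_fragment` is a helper name invented here, and the base address matches the static IPs chosen above):

```shell
# Print one "IP hostname" line per node, counting up from .137.
hosts_fragment() {
  base=192.168.158
  i=137
  for name in node0 node1 node2 node3; do
    printf '%s.%s %s\n' "$base" "$i" "$name"
    i=$((i + 1))
  done
}

hosts_fragment                   # review the output first
# hosts_fragment >> /etc/hosts   # then append on each node (as root)
```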
6. Passwordless SSH login
(on every host)
- Quick setup with a script

Create a file 1.sh in the home directory and copy the block below into it:

cd ~
vi 1.sh
- Paste in the following (covering all four nodes from the hosts file, so start-all.sh can reach node3 too):

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub node0
ssh-copy-id -i ~/.ssh/id_rsa.pub node1
ssh-copy-id -i ~/.ssh/id_rsa.pub node2
ssh-copy-id -i ~/.ssh/id_rsa.pub node3
- Run 1.sh

cd ~
bash 1.sh
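A quick way to confirm the key copy worked on every node: `BatchMode=yes` makes ssh fail immediately instead of prompting for a password, so any node still requiring one shows up at once (a sketch; run it from the node that holds the key):

```shell
# Try a passwordless login to each node and report the result.
for host in node0 node1 node2 node3; do
  if ssh -o BatchMode=yes -o ConnectTimeout=3 "$host" true; then
    echo "$host: passwordless login OK"
  else
    echo "$host: still prompting (re-run ssh-copy-id)"
  fi
done
```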
7. Create the directories you need
- Create a logs directory
(optional: Hadoop creates it on first run; if you do create one, do it on every machine)
Decide for yourself where the logs directory goes
- Create the directories the configuration files need
(likewise, identical on every host)

cd /
mkdir data
cd data
mkdir hadoop
cd hadoop
mkdir hdfs tmp
cd hdfs
mkdir name data
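The same tree can be built in a single command: `mkdir -p` creates any missing parents and is safe to re-run, which makes it convenient to paste on all four nodes:

```shell
# Create the NameNode, DataNode, and tmp directories in one step.
mkdir -p /data/hadoop/hdfs/name /data/hadoop/hdfs/data /data/hadoop/tmp
```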
8. Edit the Hadoop configuration files
In the /software/hadoop-3.2.2/etc/hadoop directory
- hadoop-env.sh

export JAVA_HOME=/software/jdk1.8.0_202
export HADOOP_HOME=/software/hadoop-3.2.2
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
- core-site.xml

<configuration>
  <property>
    <!-- Required: the default filesystem (decouples the storage layer from the compute layer) -->
    <!-- The value is a URI: use the built-in HDFS; the port is usually 9000 -->
    <name>fs.defaultFS</name>
    <value>hdfs://node0:9000</value>
  </property>
  <property>
    <!-- Required: Hadoop's local working directory, holding temporary data for Hadoop processes; pick your own path -->
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
</configuration>
- hdfs-site.xml

(uses the directories created earlier)

<configuration>
  <!-- Number of copies HDFS keeps of each block (so one node failing loses nothing); optional, default 3 -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- Address of the NameNode web UI; the default port is already 9870, so this is optional if unchanged -->
  <property>
    <name>dfs.namenode.http-address</name>
    <value>node0:9870</value>
  </property>
  <!-- Where the DataNode stores its data; the default depends on environment variables, so a path you created yourself is easier to manage -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/hdfs/data</value>
  </property>
  <!-- Where the NameNode stores its data; same recommendation as above -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/hdfs/name</value>
  </property>
</configuration>
- mapred-site.xml

<configuration>
  <!-- Required: the resource-scheduling platform MapReduce uses. The default is local, which only ever runs jobs on a single machine, never on the cluster -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Needed on 3.2 and later; without it MapReduce jobs may fail. Remember to use your own paths -->
  <property>
    <name>mapreduce.application.classpath</name>
    <value>
      /software/hadoop-3.2.2/etc/hadoop,
      /software/hadoop-3.2.2/share/hadoop/common/*,
      /software/hadoop-3.2.2/share/hadoop/common/lib/*,
      /software/hadoop-3.2.2/share/hadoop/hdfs/*,
      /software/hadoop-3.2.2/share/hadoop/hdfs/lib/*,
      /software/hadoop-3.2.2/share/hadoop/mapreduce/*,
      /software/hadoop-3.2.2/share/hadoop/mapreduce/lib/*,
      /software/hadoop-3.2.2/share/hadoop/yarn/*,
      /software/hadoop-3.2.2/share/hadoop/yarn/lib/*
    </value>
  </property>
</configuration>
- yarn-site.xml

<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- Required: which host runs the ResourceManager (YARN's master) -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node0</value>
  </property>
  <!-- Required: how MapReduce programs fetch data; empty by default -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
- workers

node0
node1
node2
node3
9. Distribute the configuration files
- Go to the configuration directory
cd /software/hadoop-3.2.2/etc
scp -r hadoop root@node1:/software/hadoop-3.2.2/etc/
scp -r hadoop root@node2:/software/hadoop-3.2.2/etc/
scp -r hadoop root@node3:/software/hadoop-3.2.2/etc/
10. Format the NameNode
- Initialize Hadoop

hdfs namenode -format
(the older form `hadoop namenode -format` still works, but prints a deprecation warning in Hadoop 3.x)
11. Start and verify
- Start
start-all.sh
- Check the running Java processes on each node

jps

With the configuration above, node0 should show NameNode, SecondaryNameNode, ResourceManager, DataNode, and NodeManager (node0 is also listed as a worker); the other nodes should show DataNode and NodeManager.
12. Access the web UIs from Windows
- Job port (master-node IP + port 8088)
- Resource management port
  Cannot be reached; this is an issue with the prebuilt binary release, and does not occur if you compile Hadoop yourself
Advanced configuration
- How to get the classpath (needed for yarn.application.classpath)

hadoop classpath
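`hadoop classpath` prints one long colon-separated line, while the XML `<value>` entries below are comma-separated, so a small transform bridges the two (`to_xml_value` is a helper name invented here):

```shell
# Convert a colon-separated classpath into comma-separated entries.
to_xml_value() { tr ':' ','; }

# On a real cluster:  hadoop classpath | to_xml_value
# Shape of the conversion on a sample string:
echo '/software/hadoop-3.2.2/etc/hadoop:/software/hadoop-3.2.2/share/hadoop/common/*' | to_xml_value
# → /software/hadoop-3.2.2/etc/hadoop,/software/hadoop-3.2.2/share/hadoop/common/*
```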
- Additions to yarn-site.xml

<property>
  <name>yarn.application.classpath</name>
  <value>
    /software/hadoop-3.2.2/etc/hadoop,
    /software/hadoop-3.2.2/share/hadoop/common/*,
    /software/hadoop-3.2.2/share/hadoop/common/lib/*,
    /software/hadoop-3.2.2/share/hadoop/hdfs/*,
    /software/hadoop-3.2.2/share/hadoop/hdfs/lib/*,
    /software/hadoop-3.2.2/share/hadoop/mapreduce/*,
    /software/hadoop-3.2.2/share/hadoop/mapreduce/lib/*,
    /software/hadoop-3.2.2/share/hadoop/yarn/*,
    /software/hadoop-3.2.2/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>node0</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm2</name>
  <value>node0</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>node0</value>
</property>
- Additions to mapred-site.xml

<property>
  <name>yarn.application.classpath</name>
  <value>
    /software/hadoop-3.2.2/etc/hadoop,
    /software/hadoop-3.2.2/share/hadoop/common/*,
    /software/hadoop-3.2.2/share/hadoop/common/lib/*,
    /software/hadoop-3.2.2/share/hadoop/hdfs/*,
    /software/hadoop-3.2.2/share/hadoop/hdfs/lib/*,
    /software/hadoop-3.2.2/share/hadoop/mapreduce/*,
    /software/hadoop-3.2.2/share/hadoop/mapreduce/lib/*,
    /software/hadoop-3.2.2/share/hadoop/yarn/*,
    /software/hadoop-3.2.2/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/software/hadoop-3.2.2</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/software/hadoop-3.2.2</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/software/hadoop-3.2.2</value>
</property>