Big Data: Hadoop Installation
_(Configuring Hadoop 3.2.2 on CentOS 7)_
1. Install CentOS 7
2. Download and extract the JDK and Hadoop
(do this before cloning the VM)
- Create an installation directory
cd /
mkdir software
- Download
wget https://downloads.apache.org/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
wget https://repo.huaweicloud.com/java/jdk/8u202-b08/jdk-8u202-linux-x64.tar.gz
- Extract
tar -zxvf hadoop-3.2.2.tar.gz -C /software/
tar -zxvf jdk-8u202-linux-x64.tar.gz -C /software/
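Optionally, the Hadoop tarball can be checked against its published checksum first; Apache publishes a .sha512 file next to each release (the URL below assumes the standard downloads.apache.org layout):
wget https://downloads.apache.org/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz.sha512
sha512sum -c hadoop-3.2.2.tar.gz.sha512    # if the file format is not accepted, compare the hash by hand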
3. Configure environment variables
- System-wide environment variables
vi /etc/profile
export JAVA_HOME=/software/jdk1.8.0_202
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/software/hadoop-3.2.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
- Apply the configuration
source /etc/profile
- Or use per-user environment variables
vi ~/.bash_profile
export JAVA_HOME=/software/jdk1.8.0_202
export JAVA_BIN=$JAVA_HOME/bin
export JAVA_LIB=$JAVA_HOME/lib
export CLASSPATH=.:$JAVA_LIB/tools.jar:$JAVA_LIB/dt.jar
export HADOOP_HOME=/software/hadoop-3.2.2
PATH=$PATH:$JAVA_BIN:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH
- Apply the user environment variables
source ~/.bash_profile
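To confirm that the variables took effect, a quick check:
java -version     # should report java version "1.8.0_202"
hadoop version    # should report Hadoop 3.2.2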
5. Set a static IP
(recommended on every machine; the procedure is the same)
- Check the NIC with ifconfig
ifconfig
- Open the NIC configuration file
vi /etc/sysconfig/network-scripts/ifcfg-ens33
Modify:
BOOTPROTO="static"    (change the original value to static)
Add:
IPADDR=192.168.158.137    (your choice)
GATEWAY=192.168.158.2     (tied to the virtual NIC settings --> the gateway IP value)
DNS1=192.168.158.2
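For reference, a minimal complete ifcfg-ens33 under these assumptions (this guide's example addresses; keep your file's existing UUID/HWADDR lines, and the /24 netmask is assumed) might look like:
TYPE="Ethernet"
BOOTPROTO="static"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"              # make sure the NIC comes up at boot
IPADDR=192.168.158.137
NETMASK=255.255.255.0     # assumed /24 network
GATEWAY=192.168.158.2
DNS1=192.168.158.2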
- Restart the network service
service network restart
- Disable the firewall
systemctl stop firewalld       # stops it immediately; disable alone only takes effect after a reboot
systemctl disable firewalld
- After rebooting, check the firewall status
systemctl status firewalld
6. Change the hostname and hosts file
(do this on all machines)
- Change the hostname
vi /etc/hostname
Delete everything and write your own name; mine is node0 (each host gets a different name).
- Add the other nodes
vi /etc/hosts
192.168.158.137 node0
192.168.158.138 node1
192.168.158.139 node2
192.168.158.140 node3
7. Passwordless SSH login
(do this on all hosts)
- Use a script to set it up quickly
Create a file 1.sh in your home directory and copy the lines below into it:
cd ~
vi 1.sh
- Paste in the following (node3 added here; it appears in hosts above and in workers below):
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub node0
ssh-copy-id -i ~/.ssh/id_rsa.pub node1
ssh-copy-id -i ~/.ssh/id_rsa.pub node2
ssh-copy-id -i ~/.ssh/id_rsa.pub node3
- Run 1.sh
cd ~
bash 1.sh
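As an alternative sketch, the same can be done non-interactively with a loop (assumes the default key path and an empty passphrase are acceptable):
# generate the key once, without prompts
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# push it to every node (each copy still asks for that node's password once)
for h in node0 node1 node2 node3; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub "$h"
done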
8. Create the required directories
- Create a logs directory
(optional: Hadoop creates it on first run; if you do create it, create it on every machine)
Decide for yourself where the logs directory should live.
- Create the directories the configuration files need
(again, create the same ones on every host)
cd /
mkdir data
cd data
mkdir hadoop
cd hadoop
mkdir hdfs tmp
cd hdfs
mkdir name data
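Equivalently, a single command creates the whole tree (same paths as above):
mkdir -p /data/hadoop/tmp /data/hadoop/hdfs/name /data/hadoop/hdfs/data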
9. Edit the Hadoop configuration files
(in the /software/hadoop-3.2.2/etc/hadoop directory)
- hadoop-env.sh
export JAVA_HOME=/software/jdk1.8.0_202
export HADOOP_HOME=/software/hadoop-3.2.2
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
- core-site.xml
<configuration>
  <property>
    <!-- Required: the default file system (decouples the storage layer from the compute layer) -->
    <!-- The value is a URI: use the built-in HDFS; the port is usually 9000 -->
    <name>fs.defaultFS</name>
    <value>hdfs://node0:9000</value>
  </property>
  <property>
    <!-- Required: Hadoop's local working directory, for the temporary data of Hadoop processes; you may pick your own -->
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
</configuration>
- hdfs-site.xml
(the directories referenced here must exist; they were created earlier)
<configuration>
  <!-- Number of replicas for HDFS data (guards against a single machine failing); optional, default is 3 -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- Address of the NameNode web UI; the default port is already 9870, so this can be omitted if unchanged -->
  <property>
    <name>dfs.namenode.http-address</name>
    <value>node0:9870</value>
  </property>
  <!-- Path where a DataNode stores its data; the default depends on environment variables, so a self-created path is easier to manage -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/hdfs/data</value>
  </property>
  <!-- Path where the NameNode stores its data; same reasoning as above -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/hdfs/name</value>
  </property>
</configuration>
- mapred-site.xml
<configuration>
  <!-- Required: the resource-scheduling platform MapReduce runs on; the default is local, which only runs single-machine, never on the cluster -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Needed from 3.2 on; without it, MapReduce jobs may fail. Remember to use your own paths -->
  <property>
    <name>mapreduce.application.classpath</name>
    <value>
      /software/hadoop-3.2.2/etc/hadoop,
      /software/hadoop-3.2.2/share/hadoop/common/*,
      /software/hadoop-3.2.2/share/hadoop/common/lib/*,
      /software/hadoop-3.2.2/share/hadoop/hdfs/*,
      /software/hadoop-3.2.2/share/hadoop/hdfs/lib/*,
      /software/hadoop-3.2.2/share/hadoop/mapreduce/*,
      /software/hadoop-3.2.2/share/hadoop/mapreduce/lib/*,
      /software/hadoop-3.2.2/share/hadoop/yarn/*,
      /software/hadoop-3.2.2/share/hadoop/yarn/lib/*
    </value>
  </property>
</configuration>
- yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- Required: which host runs YARN's master (the ResourceManager) -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node0</value>
  </property>
  <!-- Required: how MapReduce programs fetch data; the default is empty -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
- workers
node0
node1
node2
node3
10. Distribute the configuration files
- Go to the configuration directory
cd /software/hadoop-3.2.2/etc
scp -r hadoop root@node1:/software/hadoop-3.2.2/etc/
scp -r hadoop root@node2:/software/hadoop-3.2.2/etc/
scp -r hadoop root@node3:/software/hadoop-3.2.2/etc/
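The same distribution as a loop, for convenience (assumes the node names above):
cd /software/hadoop-3.2.2/etc
for h in node1 node2 node3; do
  scp -r hadoop "root@$h:/software/hadoop-3.2.2/etc/"
done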
11. Format the NameNode
- Initialize Hadoop (on the NameNode host, node0)
hdfs namenode -format    # "hadoop namenode -format" still works but is deprecated in Hadoop 3
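If the format step is ever repeated, clear the old directories on every node first; otherwise DataNodes keep the previous cluster ID and refuse to register (a sketch, using the paths from the configuration above):
rm -rf /data/hadoop/hdfs/name/* /data/hadoop/hdfs/data/* /data/hadoop/tmp/*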
12. Start and check
- Start
start-all.sh
- Check
jps
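Since node0 is both the master and listed in workers, jps on node0 should show roughly the processes below; worker-only nodes show just DataNode, NodeManager, and Jps:
NameNode
SecondaryNameNode
ResourceManager
DataNode
NodeManager
Jps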
13. Access the web UIs from Windows
- Job UI (master node IP, port 8088), e.g. http://192.168.158.137:8088
- Resource management UI
Not reachable with the prebuilt binaries because of a build issue; compiling Hadoop from source avoids the problem.
Advanced configuration
How to obtain the classpath:
hadoop classpath
(its output is what yarn.application.classpath needs)
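A quick way to make that output easier to paste into the XML value:
hadoop classpath | tr ':' '\n'    # one entry per line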
- Add to yarn-site.xml
<property>
  <name>yarn.application.classpath</name>
  <value>
    /software/hadoop-3.2.2/etc/hadoop,
    /software/hadoop-3.2.2/share/hadoop/common/*,
    /software/hadoop-3.2.2/share/hadoop/common/lib/*,
    /software/hadoop-3.2.2/share/hadoop/hdfs/*,
    /software/hadoop-3.2.2/share/hadoop/hdfs/lib/*,
    /software/hadoop-3.2.2/share/hadoop/mapreduce/*,
    /software/hadoop-3.2.2/share/hadoop/mapreduce/lib/*,
    /software/hadoop-3.2.2/share/hadoop/yarn/*,
    /software/hadoop-3.2.2/share/hadoop/yarn/lib/*
  </value>
</property>
<!-- the .rm1/.rm2 suffixes below are HA-style property names; with a single ResourceManager they are normally unnecessary -->
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>node0</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm2</name>
  <value>node0</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>node0</value>
</property>
- Add to mapred-site.xml
<property>
  <name>yarn.application.classpath</name>
  <value>
    /software/hadoop-3.2.2/etc/hadoop,
    /software/hadoop-3.2.2/share/hadoop/common/*,
    /software/hadoop-3.2.2/share/hadoop/common/lib/*,
    /software/hadoop-3.2.2/share/hadoop/hdfs/*,
    /software/hadoop-3.2.2/share/hadoop/hdfs/lib/*,
    /software/hadoop-3.2.2/share/hadoop/mapreduce/*,
    /software/hadoop-3.2.2/share/hadoop/mapreduce/lib/*,
    /software/hadoop-3.2.2/share/hadoop/yarn/*,
    /software/hadoop-3.2.2/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/software/hadoop-3.2.2</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/software/hadoop-3.2.2</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/software/hadoop-3.2.2</value>
</property>
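After distributing the updated configs and restarting the cluster, a quick smoke test with the bundled example job confirms the classpath settings (the jar path is the standard location inside the 3.2.2 tarball):
hadoop jar /software/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar pi 2 4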