The previous post covered installing the virtual machines; now we configure the environment and install Hadoop.
Note: this walkthrough uses three machines because the default HDFS replication factor is 3; a single-node pseudo-distributed setup is a different exercise. With fewer nodes you would lower dfs.replication accordingly.
I set up three machines.
1. Create the hadoop user and the hadoopgroup group
groupadd -g 102 hadoopgroup                       # create the group
useradd -d /opt/hadoop -u 10201 -g 102 hadoop     # create the user
passwd hadoop                                     # set the user's password
2. Install an FTP tool
yum -y install vsftpd
Start vsftpd: systemctl start vsftpd.service
Stop vsftpd: systemctl stop vsftpd.service
Restart vsftpd: systemctl restart vsftpd.service
[root@venn08 ~]# systemctl start vsftpd.service    # start; no output on success
[root@venn08 ~]# ps -ef|grep vsft                  # the process exists, so an FTP client can connect
root       1257      1  0 09:41 ?        00:00:00 /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf
root       1266   1125  0 09:42 pts/0    00:00:00 grep --color=auto vsft
[root@venn08 ~]# systemctl restart vsftpd.service
Note: with vsftpd, system users can log in over FTP right after installation, with the same permissions they have on the system; no extra configuration is needed.
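Optionally (my addition, not in the original steps), make vsftpd start at boot and confirm it is running:

systemctl enable vsftpd.service    # start vsftpd automatically at boot
systemctl status vsftpd.service    # should report active (running)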
3. Install the JDK and Hadoop
Copy the downloaded JDK and Hadoop archives to the server, extract them, and rename the directories.
[hadoop@venn05 ~]$ pwd
/opt/hadoop
[hadoop@venn05 ~]$ ll
drwxr-xr-x. 11 hadoop hadoopgroup       172 Apr  3 20:49 hadoop3
-rw-r--r--.  1 hadoop hadoopgroup 307606299 Apr  2 22:30 hadoop-3.0.1.tar.gz
drwxr-xr-x.  8 hadoop hadoopgroup       255 Apr  1  2016 jdk1.8
-rw-r--r--.  1 hadoop hadoopgroup 181367942 May 26  2016 jdk-8u91-linux-x64.tar.gz
Renaming the directories just makes the paths easier to type; a sketch of the extract-and-rename commands follows below.
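A minimal sketch of this step, assuming the archives unpack to their usual top-level directory names (jdk1.8.0_91 and hadoop-3.0.1):

tar -zxf jdk-8u91-linux-x64.tar.gz    # unpacks to jdk1.8.0_91
mv jdk1.8.0_91 jdk1.8                 # shorten the path
tar -zxf hadoop-3.0.1.tar.gz          # unpacks to hadoop-3.0.1
mv hadoop-3.0.1 hadoop3               # shorten the path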
4. Configure the Java and Hadoop environment variables
Append the Java and Hadoop environment variables at the end of .bashrc; just be careful to get the paths right.
[hadoop@venn05 ~]$ vim .bashrc
[hadoop@venn05 ~]$ more .bashrc
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=

# User specific aliases and functions

# jdk
export JAVA_HOME=/opt/hadoop/jdk1.8
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

# hadoop
export HADOOP_HOME=/opt/hadoop/hadoop3
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
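To apply the new variables in the current shell and confirm they took effect (a quick check, not part of the original steps):

source ~/.bashrc    # reload the profile in the current session
java -version       # should print the JDK 1.8.0_91 version string
hadoop version      # should print Hadoop 3.0.1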
5. Switch to root and edit /etc/hosts on each machine
[root@venn05 hadoop]# vim /etc/hosts
[root@venn05 hadoop]# more /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.5 venn05
192.168.1.6 venn06
192.168.1.7 venn07
Do the same on the other machines.
6. Create SSH keys
[hadoop@venn08 ~]$ mkdir .ssh    # create the .ssh directory
[hadoop@venn08 ~]$ cd .ssh/
[hadoop@venn08 .ssh]$ ls
[hadoop@venn08 .ssh]$ pwd
/opt/hadoop/.ssh
[hadoop@venn08 .ssh]$ ssh-keygen -t rsa    # create the SSH key pair; press Enter at every prompt
Generating public/private rsa key pair.
Enter file in which to save the key (/opt/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /opt/hadoop/.ssh/id_rsa.
Your public key has been saved in /opt/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:rrlJxkA6o4eKwDKWjbx5CgyH+9EzbUiyfFHnJzgTL5w hadoop@venn08
The key's randomart image is:
+---[RSA 2048]----+
(randomart image omitted)
+----[SHA256]-----+
[hadoop@venn08 .ssh]$ ll    # check the result
total 8
-rw-------. 1 hadoop hadoopgroup 1679 Apr 24 10:17 id_rsa        # private key, stays on this machine
-rw-r--r--. 1 hadoop hadoopgroup  395 Apr 24 10:17 id_rsa.pub    # public key, copied to the other machines
Run the steps above on every machine so each one has its own SSH key.
6、合並每台機器的公鑰,放到每台機器上
On venn05:
  append its public key to the file:  cat id_rsa.pub >> authorized_keys
  pass the file to venn06:            scp authorized_keys hadoop@venn06:~/.ssh/authorized_keys
On venn06:
  append venn06's public key:         cat id_rsa.pub >> authorized_keys
  pass the file to venn07:            scp authorized_keys hadoop@venn07:~/.ssh/authorized_keys
On venn07:
  append its public key:              cat id_rsa.pub >> authorized_keys
  send the complete file to venn05:   scp authorized_keys hadoop@venn05:~/.ssh/authorized_keys
  and to venn06:                      scp authorized_keys hadoop@venn06:~/.ssh/authorized_keys
The same pattern extends to more machines.
That completes the setup: the hadoop user can now ssh between all machines without a password.
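As an aside, ssh-copy-id (shipped with OpenSSH on CentOS) can replace the manual cat/scp round trip; a sketch of the equivalent, plus a quick verification:

# run on each machine, once per peer; appends the local public key to the remote authorized_keys
ssh-copy-id hadoop@venn05
ssh-copy-id hadoop@venn06
ssh-copy-id hadoop@venn07

ssh venn06 date    # should print the date without prompting for a password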
8. Edit the Hadoop environment file: hadoop-env.sh
Go to /opt/hadoop/hadoop3/etc/hadoop, open hadoop-env.sh, and set:
export JAVA_HOME=/opt/hadoop/jdk1.8    # point at the JDK
9. Edit the Hadoop core configuration file, core-site.xml, and add the following
<configuration>
    <!-- HDFS temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/hadoop3/tmp</value>
    </property>
    <!-- default HDFS address and port -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://venn05:8020</value>
    </property>
</configuration>
10. Edit yarn-site.xml and add the following
<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- cluster master -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>venn05</value>
    </property>
    <!-- auxiliary service that runs on each NodeManager -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- environment variables containers may inherit, instead of the NodeManager defaults -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ</value>
    </property>
    <!-- disable virtual-memory checking; needed on VMs, otherwise containers get killed -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
</configuration>
11. Edit mapred-site.xml and add the following
<configuration>
    <!-- local = run locally, classic = the classic MapReduce framework, yarn = the new framework -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- must keep the original value if map/reduce tasks use native libraries (compression, etc.).
         When this is empty, the command used to set the execution environment depends on the OS:
         Linux:   LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native
         Windows: PATH=%PATH%;%HADOOP_COMMON_HOME%\bin -->
    <property>
        <name>mapreduce.admin.user.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop/hadoop3</value>
    </property>
    <!-- environment for the AM (ApplicationMaster); without it MapReduce jobs may fail -->
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop/hadoop3</value>
    </property>
</configuration>
12. Edit hdfs-site.xml and add the following
<configuration>
    <!-- HDFS web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>venn05:50070</value>
    </property>
    <!-- replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- whether HDFS permission checking is enabled; false disables it -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <!-- block size, in bytes by default; the suffixes k, m, g, t, p, e are also accepted -->
    <property>
        <name>dfs.blocksize</name>
        <!-- 128m -->
        <value>134217728</value>
    </property>
</configuration>
13. Edit the workers file
[hadoop@venn05 hadoop]$ more workers
venn05
venn06
venn07
The first entry, venn05, is the master; listing it here means it also runs a DataNode and NodeManager.
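If you prefer to create the file non-interactively, a small sketch (same content as above):

cat > /opt/hadoop/hadoop3/etc/hadoop/workers <<'EOF'
venn05
venn06
venn07
EOF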
At this point the Hadoop master is fully configured.
14. scp .bashrc, the JDK, and Hadoop to every node
Go to the hadoop home directory with cd ~, then:
scp -r .bashrc jdk1.8 hadoop3 hadoop@192.168.1.8:/opt/hadoop/
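Repeat the copy for each remaining node; a loop sketch (the hostnames resolve via the /etc/hosts entries added earlier):

for node in venn06 venn07; do
    scp -r .bashrc jdk1.8 hadoop3 hadoop@${node}:/opt/hadoop/
done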
At this point the Hadoop cluster installation is complete.
15. Start Hadoop
Format the namespace:
hdfs namenode -format
Start the cluster:
start-all.sh
Output:
[hadoop@venn05 ~]$ start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [venn05]
Starting datanodes
Starting secondary namenodes [venn05]
Starting resourcemanager
Starting nodemanagers
Check the processes with jps:
[hadoop@venn05 ~]$ jps
5904 Jps
5733 NodeManager
4871 NameNode
5431 ResourceManager
5211 SecondaryNameNode
[hadoop@venn05 ~]$
Check the status on the other nodes:
[hadoop@venn06 hadoop]$ jps
3093 NodeManager
3226 Jps
2973 DataNode
Hadoop started successfully.
Check the YARN web console:
http://venn05:8088/cluster
Check the HDFS web console:
http://venn05:50070/dfshealth.html#tab-overview
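As a final smoke test (my addition, not part of the original steps), confirm the datanodes report in and that HDFS accepts writes:

hdfs dfsadmin -report                  # should list three live datanodes
echo "hello hadoop" > /tmp/hello.txt
hdfs dfs -mkdir -p /test               # create a test directory
hdfs dfs -put /tmp/hello.txt /test/    # upload the file
hdfs dfs -cat /test/hello.txt          # read it back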
That's it for today. Tomorrow I'll write about NTP and the pitfalls I ran into during setup.