Lazy Notes: Setting Up a Hadoop 2.7.1 Cluster
2016-07-02 13:15:45
- Summary
- Apart from configuring hosts and passwordless SSH between the machines, install everything on a single machine first
- Once that machine is configured, clone the VM, then set up hosts and passwordless SSH
- When I set this up at work I used a 32-bit JDK; Hadoop's native libraries would not load and I wasted a lot of time compiling them myself, so make sure the JDK is 64-bit (a quick check follows this list)
- Set up passwordless SSH. Nothing else is special; just watch the owner/group of the files, which need not be "hadoop", set it to suit your own setup
- sudo chown -R hadoop /opt
- sudo chgrp -R hadoop /opt
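Before moving on, it is worth confirming that the native libraries actually load under a 64-bit JDK. A minimal sketch, assuming the /opt paths used later in this post:

/opt/lib/jdk8/bin/java -version        # should report a "64-Bit Server VM"
/opt/hadoop/bin/hadoop checknative -a  # lists which native libraries (libhadoop.so, zlib, ...) were found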
- Prepare the files
- VM installation and configuration
We need three VMs. Install one VM first, download Hadoop, set up the JDK and the environment variables, then clone it.
- Install the first VM
- I won't walk through the installation itself; here is what to watch out for
- Enable the shared clipboard
- Add a second network adapter set to "Host-Only", so the VM ends up with two adapters: one NAT and one Host-Only.
- The installer has a step that downloads language packs; just Skip it.
- Install the required software and do some setup; this is actually the most tedious part
- Check which groups you are in: groups
- sudo chown -R hadoop /opt
- sudo chgrp -R hadoop /opt
- Install openssh-server (Linux Mint does not ship it by default): sudo apt-get install openssh-server
- Disable the firewall: sudo ufw disable
- Check the firewall status: sudo ufw status (it should report inactive)
- Install vim: sudo apt-get install vim
- Change the hostname (the three machines should get distinct hostnames; I use master-hadoop, slave1-hadoop and slave2-hadoop so they are easy to tell apart; a minimal example follows)
- Debian family: vi /etc/hostname
- Red Hat family: vi /etc/sysconfig/network
- Reboot
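For example, on the Debian family, a minimal sketch assuming this box will become the master:

echo "master-hadoop" | sudo tee /etc/hostname   # takes effect after the reboot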
- Install the JDK
- Connect to the VM with MobaXterm (check the IP yourself; the first VM should be 192.168.56.101)
- Create a lib directory to hold components we will need, such as the JDK
- mkdir /opt/lib
- Upload the downloaded JDK to /opt/lib (with MobaXterm you can simply drag and drop it in)
- Extract the JDK: tar -zxvf jdk-8u92-linux-x64.tar.gz
- mv jdk1.8.0_92 jdk8 to rename the folder
- Look at the resulting directory structure; note that owner and group are both hadoop (they don't have to be hadoop, but all Hadoop-related files and directories should belong to the same group to avoid permission problems)
hadoop@hadoop-pc / $ cd /opt/
hadoop@hadoop-pc /opt $ ll
total 16
drwxr-xr-x  4 hadoop hadoop 4096 Jul  2 00:33 ./
drwxr-xr-x 23 root   root   4096 Jul  1 23:23 ../
drwxr-xr-x  3 hadoop hadoop 4096 Nov 29  2015 firefox/
drwxr-xr-x  3 hadoop hadoop 4096 Jul  2 01:04 lib/
hadoop@hadoop-pc /opt $ cd lib/
hadoop@hadoop-pc /opt/lib $ ll
total 177156
drwxr-xr-x 3 hadoop hadoop      4096 Jul  2 01:04 ./
drwxr-xr-x 4 hadoop hadoop      4096 Jul  2 00:33 ../
drwxr-xr-x 8 hadoop hadoop      4096 Apr  1 12:20 jdk8/
-rw-rw-r-- 1 hadoop hadoop 181389058 Jul  2 01:00 jdk-8u92-linux-x64.tar.gz
hadoop@hadoop-pc /opt/lib $ mkdir package
hadoop@hadoop-pc /opt/lib $ mv jdk-8u92-linux-x64.tar.gz package/
hadoop@hadoop-pc /opt/lib $ ll
total 16
drwxr-xr-x 4 hadoop hadoop 4096 Jul  2 01:08 ./
drwxr-xr-x 4 hadoop hadoop 4096 Jul  2 00:33 ../
drwxr-xr-x 8 hadoop hadoop 4096 Apr  1 12:20 jdk8/
drwxrwxr-x 2 hadoop hadoop 4096 Jul  2 01:08 package/
hadoop@hadoop-pc /opt/lib $ cd jdk8/
hadoop@hadoop-pc /opt/lib/jdk8 $ ll
total 25916
drwxr-xr-x 8 hadoop hadoop     4096 Apr  1 12:20 ./
drwxr-xr-x 4 hadoop hadoop     4096 Jul  2 01:08 ../
drwxr-xr-x 2 hadoop hadoop     4096 Apr  1 12:17 bin/
-r--r--r-- 1 hadoop hadoop     3244 Apr  1 12:17 COPYRIGHT
drwxr-xr-x 4 hadoop hadoop     4096 Apr  1 12:17 db/
drwxr-xr-x 3 hadoop hadoop     4096 Apr  1 12:17 include/
-rwxr-xr-x 1 hadoop hadoop  5090294 Apr  1 11:33 javafx-src.zip*
drwxr-xr-x 5 hadoop hadoop     4096 Apr  1 12:17 jre/
drwxr-xr-x 5 hadoop hadoop     4096 Apr  1 12:17 lib/
-r--r--r-- 1 hadoop hadoop       40 Apr  1 12:17 LICENSE
drwxr-xr-x 4 hadoop hadoop     4096 Apr  1 12:17 man/
-r--r--r-- 1 hadoop hadoop      159 Apr  1 12:17 README.html
-rw-r--r-- 1 hadoop hadoop      525 Apr  1 12:17 release
-rw-r--r-- 1 hadoop hadoop 21104834 Apr  1 12:17 src.zip
-rwxr-xr-x 1 hadoop hadoop   110114 Apr  1 11:33 THIRDPARTYLICENSEREADME-JAVAFX.txt*
-r--r--r-- 1 hadoop hadoop   177094 Apr  1 12:17 THIRDPARTYLICENSEREADME.txt
hadoop@hadoop-pc /opt/lib/jdk8 $
- Set JAVA_HOME and the environment variables by editing /etc/profile (the additions sit at the end, below the #ADD HERE marker)
# /etc/profile: system-wide .profile file for the Bourne shell (sh(1))
# and Bourne compatible shells (bash(1), ksh(1), ash(1), ...).

if [ "$PS1" ]; then
  if [ "$BASH" ] && [ "$BASH" != "/bin/sh" ]; then
    # The file bash.bashrc already sets the default PS1.
    # PS1='\h:\w\$ '
    if [ -f /etc/bash.bashrc ]; then
      . /etc/bash.bashrc
    fi
  else
    if [ "`id -u`" -eq 0 ]; then
      PS1='# '
    else
      PS1='$ '
    fi
  fi
fi

# The default umask is now handled by pam_umask.
# See pam_umask(8) and /etc/login.defs.

if [ -d /etc/profile.d ]; then
  for i in /etc/profile.d/*.sh; do
    if [ -r $i ]; then
      . $i
    fi
  done
  unset i
fi

#ADD HERE
JAVA_HOME=/opt/lib/jdk8
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME
export CLASSPATH
export PATH
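To pick up the new variables in the current shell without logging out, re-read the profile (standard shell behaviour, nothing Hadoop-specific):

source /etc/profile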
- Check the Java version and the environment variables
hadoop@hadoop-pc / $ java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
hadoop@hadoop-pc / $ echo $JAVA_HOME
/opt/lib/jdk8
hadoop@hadoop-pc / $ echo $CLASSPATH
.:/opt/lib/jdk8/lib/dt.jar:/opt/lib/jdk8/lib/tools.jar
hadoop@hadoop-pc / $ echo $PATH
/opt/lib/jdk8/bin:/opt/lib/jdk8/bin:/opt/lib/jdk8/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
hadoop@hadoop-pc / $
- Create the directories Hadoop will use (a one-command version follows this list)
- tmp directory
- mkdir /opt/hadoop-tmp
- hdfs directory
- mkdir /opt/hadoop-dfs
- name directory
- mkdir /opt/hadoop-dfs/name
- data directory
- mkdir /opt/hadoop-dfs/data
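The same four directories can be created in one go; mkdir -p creates missing parents such as /opt/hadoop-dfs automatically:

mkdir -p /opt/hadoop-tmp /opt/hadoop-dfs/name /opt/hadoop-dfs/data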
- Upload Hadoop
- Use MobaXterm to upload the Hadoop tarball to /opt, or run this inside /opt: wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
- tar -zxvf hadoop-2.7.1.tar.gz
- mv hadoop-2.7.1.tar.gz lib/package/ to archive the tarball under package
- mv hadoop-2.7.1 hadoop to rename the folder (a quick version check follows)
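As a quick sanity check that the unpacked tree runs against the JDK configured earlier (hadoop version is a standard subcommand):

/opt/hadoop/bin/hadoop version   # should print "Hadoop 2.7.1" plus build details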
- Edit Hadoop's configuration files
- The configuration files live under /opt/hadoop/etc/hadoop
- core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop-tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
- hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop-dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop-dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
- Note: dfs.replication is 3 here, but this cluster has only two datanodes, so HDFS will report under-replicated blocks; a value of 2 would match this layout.
- mapred-site.xml
- cp mapred-site.xml.template mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
- yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
- slaves
slave1
slave2
- hadoop-env.sh
- Set JAVA_HOME:
export JAVA_HOME=/opt/lib/jdk8
- yarn-env.sh
- Add the JAVA_HOME environment variable:
export JAVA_HOME=/opt/lib/jdk8
- At this point the first VM is just about configured. Clone it twice (a full clone, and reinitialize the MAC addresses) to get three VMs: master, slave1 and slave2.
- Change the hostnames of slave1 and slave2 to slave1-hadoop and slave2-hadoop
- Edit the hosts file on all three machines:
192.168.56.101 master
192.168.56.102 slave1
192.168.56.103 slave2
- Your IPs may differ; check each VM's actual IP (a quick resolution check follows)
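A quick way to confirm the hosts entries resolve, using plain ping and the IPs assumed above:

ping -c 1 slave1   # should resolve to 192.168.56.102 and get a reply
ping -c 1 slave2   # likewise for 192.168.56.103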
- Configure master to log in to the other two machines, and to itself, without a password
- Run the following on master
- ssh-keygen -t rsa -P '', accepting all the defaults; type the account password whenever one of the ssh-copy-id steps below prompts for it
- ssh-copy-id hadoop@master
- ssh-copy-id hadoop@slave1
- ssh-copy-id hadoop@slave2
- When that's done, test with ssh slave1; normally it should log you into slave1 without asking for a password
hadoop@master-hadoop ~ $ ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
5c:c9:4c:0c:b6:28:eb:21:b9:6f:db:6e:3f:ee:0d:9a hadoop@master-hadoop
The key's randomart image is:
+--[ RSA 2048]----+
|            oo.  |
|           o =.. |
|        . . . =  |
|       . o .   . |
|        o o S    |
|         + .     |
|      . . .      |
|     ....o.o     |
|    .o+E++..     |
+-----------------+
hadoop@master-hadoop ~ $ ssh-copy-id hadoop@slave1
The authenticity of host 'slave1 (192.168.56.102)' can't be established.
ECDSA key fingerprint is d8:fc:32:ed:a7:2c:e1:c7:d7:15:89:b9:f6:97:fb:c3.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop@slave1'"
and check to make sure that only the key(s) you wanted were added.
- Format the namenode (a quick check of the result follows the command)
- ./bin/hdfs namenode -format
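If the format succeeded, the configured name directory is populated. A quick check (exact file names can vary between versions):

ls /opt/hadoop-dfs/name/current   # expect metadata files such as fsimage_* and VERSION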
- Start Hadoop to verify the setup
- ./sbin/start-all.sh
- A healthy startup log looks like this:
hadoop@master-hadoop /opt/hadoop/sbin $ ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /opt/hadoop/logs/hadoop-hadoop-namenode-master-hadoop.out
slave1: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-slave1-hadoop.out
slave2: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-slave2-hadoop.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-hadoop-secondarynamenode-master-hadoop.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/logs/yarn-hadoop-resourcemanager-master-hadoop.out
slave1: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-slave1-hadoop.out
slave2: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-slave2-hadoop.out
- Check jps on each of the three nodes
hadoop@master-hadoop /opt/hadoop/sbin $ jps
5858 ResourceManager
5706 SecondaryNameNode
5514 NameNode
6108 Jps

hadoop@slave2-hadoop ~ $ jps
3796 Jps
3621 NodeManager
3510 DataNode

hadoop@slave1-hadoop ~ $ jps
3786 Jps
3646 NodeManager
3535 DataNode
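Two further checks beyond jps, with ports following the Hadoop 2.x defaults and the yarn-site.xml above:

./bin/hdfs dfsadmin -report   # run from /opt/hadoop; both datanodes should be listed as live
# Web UIs, reachable from the host over the Host-Only network:
#   http://192.168.56.101:50070   (NameNode web UI, HDFS default port in 2.x)
#   http://192.168.56.101:8088    (YARN ResourceManager, set in yarn-site.xml above)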
- Everything is in order; the installation is complete