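1. About this article:
This test sets up a pseudo-distributed Hadoop installation on a single virtual machine. Pseudo-distributed mode simulates a Hadoop cluster on one host: it is not truly distributed; instead, each Hadoop daemon runs as a separate Java process on the same machine. Hadoop itself does not distinguish between pseudo-distributed and fully distributed operation, and the two configurations are very similar. The only difference is that in pseudo-distributed mode everything is configured on a single machine, so the datanode and the namenode are the same host.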
Environment:
Operating system: Red Hat 5.4 x86
Hadoop version: hadoop-0.20.2
JDK version: jdk1.7
2. Installing the JDK and configuring the Java environment variables
----First, extract the archive----
[root@localhost ~]# tar -zxvf jdk-7u9-linux-i586.tar.gz
----Rename the directory----
[root@localhost ~]# mv jdk1.7.0_09 /jdk1.7
----Add the following lines to /etc/profile----
[root@localhost ~]# vi /etc/profile

export JAVA_HOME=/jdk1.7
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$PATH
----Verify that jdk1.7 is installed (run "source /etc/profile" or log in again first so that the new variables take effect)----
[root@localhost ~]# java -version
java version "1.7.0_09"
Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
Java HotSpot(TM) Client VM (build 23.5-b02, mixed mode)
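The environment variables set above can also be checked by a script before Hadoop is configured. The following is a minimal sketch for a POSIX shell; the function name check_java_home is a hypothetical helper and is not part of the JDK or Hadoop.

```shell
# check_java_home DIR (hypothetical helper, for illustration only)
# Succeeds only if DIR is non-empty and DIR/bin/java is an executable file.
check_java_home() {
    dir="$1"
    if [ -z "$dir" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$dir/bin/java" ]; then
        echo "no executable java under $dir/bin" >&2
        return 1
    fi
    echo "JAVA_HOME looks valid: $dir"
}

# Usage: pass the value exported in /etc/profile.
check_java_home "${JAVA_HOME:-}" || echo "fix /etc/profile, then run: source /etc/profile"
```

On the machine described here, the call should succeed once /etc/profile has been reloaded and JAVA_HOME is /jdk1.7.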
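3. Setting up passwordless SSH
Hadoop relies on SSH: the namenode host uses SSH to start the namenode and datanode processes. In pseudo-distributed mode the datanode and the namenode are the same machine, so passwordless SSH to localhost must be configured.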
[root@localhost bin]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
2f:eb:6c:c5:c5:3b:0b:26:a4:7f:0f:7a:d7:3b:5e:e5 root@localhost.localdomain
You have mail in /var/spool/mail/root
[root@localhost bin]# cd
[root@localhost ~]# cd .ssh
[root@localhost .ssh]# ls
authorized_keys id_rsa id_rsa.pub known_hosts
[root@localhost .ssh]# cat id_rsa.pub > authorized_keys
[root@localhost .ssh]# ssh 192.168.20.150
Last login: Fri Apr 26 11:07:21 2013 from 192.168.20.103
[root@localhost ~]# ssh localhost
Last login: Fri Apr 26 12:45:43 2013 from master
Note: > replaces any keys already in authorized_keys; use >> if existing entries must be kept.
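The step that copies the public key into authorized_keys can be made repeatable, so that running it twice neither duplicates the key nor discards other keys. This is a minimal sketch under that assumption; add_key_once is a hypothetical helper name, and the permission change reflects the common sshd default of rejecting key files that are writable by other users.

```shell
# add_key_once PUBKEY_FILE AUTH_FILE (hypothetical helper)
# Append the public key to AUTH_FILE unless that exact line is already
# present, then restrict the file to its owner.
add_key_once() {
    pub="$1"
    auth="$2"
    touch "$auth"
    # -F fixed strings, -x whole-line match, -q quiet, -f patterns from file
    if ! grep -qxFf "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    chmod 600 "$auth"
}

# Usage on the machine from this article:
#   add_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
#   ssh localhost    # should log in without asking for a password
```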
4. Configuring Hadoop
4.1 Download hadoop-0.20.2.tar.gz and extract it under the /123 directory
[root@localhost 123]# tar -zxvf hadoop-0.20.2.tar.gz
4.2 Change to /123/hadoop-0.20.2/conf and edit the Hadoop configuration files
4.3 Configure hadoop-env.sh
[root@localhost conf]# pwd
/123/hadoop-0.20.2/conf
[root@localhost conf]# vi hadoop-env.sh

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
----The following line is the one that was added----
export JAVA_HOME=/jdk1.7

# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=
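Instead of editing hadoop-env.sh by hand in vi, the JAVA_HOME line can be written by a script. The sketch below is one possible way to do it; set_java_home is a hypothetical helper, and it assumes the directory name contains no "|" or "&" characters, which are special in the sed expression it uses.

```shell
# set_java_home FILE DIR (hypothetical helper)
# Make sure FILE has an active "export JAVA_HOME=DIR" line: rewrite an
# existing export line if there is one, otherwise append a new line.
set_java_home() {
    file="$1"
    home="$2"
    if grep -q '^export JAVA_HOME=' "$file"; then
        scratch=$(mktemp)
        sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$home|" "$file" > "$scratch"
        cat "$scratch" > "$file"   # write back in place, keeping permissions
        rm -f "$scratch"
    else
        printf 'export JAVA_HOME=%s\n' "$home" >> "$file"
    fi
}

# Usage with the paths from this article:
#   set_java_home /123/hadoop-0.20.2/conf/hadoop-env.sh /jdk1.7
```

Commented-out lines such as "# export JAVA_HOME=..." are left untouched, so the file stays readable next to its original template.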
4.4 Configure core-site.xml
[root@localhost conf]# cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.20.150:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/123/hadooptmp</value>
  </property>
</configuration>
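The same core-site.xml can be generated from shell variables, which keeps the address and the temporary directory in one place. This is only a sketch: CONF_DIR, NAMENODE_URI and HADOOP_TMP_DIR are names chosen for this example, and CONF_DIR defaults to a scratch directory so that running the sketch does not touch a real installation.

```shell
# Values taken from this article; the variable names are illustrative.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
NAMENODE_URI="hdfs://192.168.20.150:9000"
HADOOP_TMP_DIR="/123/hadooptmp"

# The EOF delimiter is unquoted so the shell expands the variables
# inside the here-document.
cat > "$CONF_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>$NAMENODE_URI</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$HADOOP_TMP_DIR</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/core-site.xml"
```

To write into the installation from this article, set CONF_DIR=/123/hadoop-0.20.2/conf before running it.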
4.6 Configure hdfs-site.xml
[root@localhost conf]# cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/123/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/123/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
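dfs.replication is set to 1 because this setup has a single datanode; a higher value could never be satisfied and HDFS would report under-replicated blocks. The sketch below compares a replication factor with the number of hosts in a slaves file. check_replication is a hypothetical helper, and skipping blank lines and "#" comment lines is a choice made for this sketch.

```shell
# check_replication REPLICATION SLAVES_FILE (hypothetical helper)
# Fail when the replication factor is larger than the number of datanode
# hosts listed in SLAVES_FILE.
check_replication() {
    repl="$1"
    slaves="$2"
    # Count lines that are neither blank nor comments.
    nodes=$(grep -cvE '^[[:space:]]*(#|$)' "$slaves")
    if [ "$repl" -gt "$nodes" ]; then
        echo "replication $repl exceeds $nodes datanode(s)" >&2
        return 1
    fi
    echo "replication $repl is fine for $nodes datanode(s)"
}

# Usage with the paths from this article:
#   check_replication 1 /123/hadoop-0.20.2/conf/slaves
```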
4.7 Configure mapred-site.xml
[root@localhost conf]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
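After editing the three site files it is useful to read a value back and confirm it is what was intended. The following sketch does that with awk; get_property is a hypothetical helper, and it is a line-based scan that assumes each name and value element sits on a single line, as in the files above, rather than a real XML parser.

```shell
# get_property FILE NAME (hypothetical helper)
# Print the <value> that follows <name>NAME</name> in a Hadoop site file.
get_property() {
    file="$1"
    name="$2"
    awk -v want="$name" '
        /<name>/ {
            n = $0
            sub(/.*<name>/, "", n)
            sub(/<\/name>.*/, "", n)
            found = (n == want)      # remember whether this is our property
        }
        found && /<value>/ {
            v = $0
            sub(/.*<value>/, "", v)
            sub(/<\/value>.*/, "", v)
            print v
            exit
        }
    ' "$file"
}

# Usage with the paths from this article:
#   get_property /123/hadoop-0.20.2/conf/mapred-site.xml mapred.job.tracker
```

An unknown property name prints nothing, which makes the function easy to use in a test such as [ -n "$(get_property ...)" ].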
4.8 Configure the masters and slaves files
[root@localhost conf]# cat masters
192.168.20.150
[root@localhost conf]# cat slaves
192.168.20.150
Note: in pseudo-distributed mode the namenode (master) and the datanode (slave) are the same server, so both files contain the same IP address.
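Because both files hold the same single address, they can be written in one loop. A minimal sketch, assuming the address from this article; CONF_DIR and NODE are names chosen for the example, and CONF_DIR defaults to a scratch directory.

```shell
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"   # real location: /123/hadoop-0.20.2/conf
NODE="192.168.20.150"

# Write the same host into both files.
for f in masters slaves; do
    printf '%s\n' "$NODE" > "$CONF_DIR/$f"
done

# cmp -s is silent and succeeds only if the two files are identical.
if cmp -s "$CONF_DIR/masters" "$CONF_DIR/slaves"; then
    echo "masters and slaves both list $NODE"
fi
```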
4.9 Edit the hostname mappings in /etc/hosts
[root@localhost conf]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.20.150 master
192.168.20.150 slave
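To confirm that the names master and slave map to the expected address, the hosts file can be queried directly. This is a sketch only; host_ip is a hypothetical helper that reads the file itself and does not consult DNS or any other name service.

```shell
# host_ip HOSTS_FILE NAME (hypothetical helper)
# Print the IP address of the first entry in HOSTS_FILE that lists NAME.
host_ip() {
    awk -v name="$2" '
        { sub(/#.*/, "") }     # strip comments; awk re-splits the fields
        {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "$1"
}

# Usage:
#   host_ip /etc/hosts master
#   host_ip /etc/hosts slave
```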
4.10 Create the directories referenced in the files edited above
[root@localhost conf]# mkdir -p /123/hadooptmp
[root@localhost conf]# mkdir -p /123/hdfs/name
[root@localhost conf]# mkdir -p /123/hdfs/data
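The three mkdir commands can be combined into a loop that also checks each directory is writable, since the daemons need to write to all of them. A minimal sketch; BASE is a name chosen for this example and defaults to a scratch directory instead of /123.

```shell
BASE="${BASE:-$(mktemp -d)}"   # this article uses /123

for d in hadooptmp hdfs/name hdfs/data; do
    mkdir -p "$BASE/$d"
    if [ -d "$BASE/$d" ] && [ -w "$BASE/$d" ]; then
        echo "ok: $BASE/$d"
    else
        echo "not writable: $BASE/$d" >&2
    fi
done
```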
5. Starting Hadoop and verifying the setup
5.1 Format the namenode
[root@localhost bin]# ./hadoop namenode -format
13/04/26 11:08:05 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost.localdomain/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /123/hdfs/name ? (Y or N) Y
13/04/26 11:08:09 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
13/04/26 11:08:09 INFO namenode.FSNamesystem: supergroup=supergroup
13/04/26 11:08:09 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/04/26 11:08:09 INFO common.Storage: Image file of size 94 saved in 0 seconds.
13/04/26 11:08:09 INFO common.Storage: Storage directory /123/hdfs/name has been successfully formatted.
13/04/26 11:08:09 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
5.2 Start all Hadoop daemons
[root@localhost bin]# ./start-all.sh
starting namenode, logging to /123/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-localhost.localdomain.out
192.168.20.150: starting datanode, logging to /123/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-localhost.localdomain.out
192.168.20.150: starting secondarynamenode, logging to /123/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /123/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-localhost.localdomain.out
192.168.20.150: starting tasktracker, logging to /123/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-localhost.localdomain.out
5.3 Use the jps command to check that all Hadoop daemons have started.
[root@localhost bin]# jps
15219 JobTracker
15156 SecondaryNameNode
15495 Jps
15326 TaskTracker
15044 DataNode
14959 NameNode
5.4 Check the cluster status:
[root@localhost bin]# ./hadoop dfsadmin -report
Configured Capacity: 19751522304 (18.4 GB)
Present Capacity: 14953619456 (13.93 GB)
DFS Remaining: 14953582592 (13.93 GB)
DFS Used: 36864 (36 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 192.168.20.150:50010
Decommission Status : Normal
Configured Capacity: 19751522304 (18.4 GB)
DFS Used: 36864 (36 KB)
Non DFS Used: 4797902848 (4.47 GB)
DFS Remaining: 14953582592(13.93 GB)
DFS Used%: 0%
DFS Remaining%: 75.71%
Last contact: Fri Apr 26 13:06:15 CST 2013
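The jps listing and the dfsadmin report above are read by eye, but both checks can be scripted. The sketch below is one way to do it; missing_daemons and live_datanodes are hypothetical helpers that read the command output on standard input, and the list of five daemon names is taken from the jps output shown in this article.

```shell
# missing_daemons (hypothetical helper)
# Read jps output on stdin and print each expected daemon that is absent.
# The exit status is non-zero if at least one daemon is missing.
missing_daemons() {
    out=$(cat)
    rc=0
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare the whole second column, so that "NameNode" is not
        # satisfied by a "SecondaryNameNode" line.
        if ! printf '%s\n' "$out" | awk -v d="$d" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "$d"
            rc=1
        fi
    done
    return $rc
}

# live_datanodes (hypothetical helper)
# Read the dfsadmin report on stdin and print N from the line
# "Datanodes available: N (...)".
live_datanodes() {
    awk '/^Datanodes available:/ { print $3; exit }'
}

# Usage on the running node, from the Hadoop bin directory:
#   jps | missing_daemons && echo "all daemons are running"
#   ./hadoop dfsadmin -report | live_datanodes
```

For the single-node setup in this article, missing_daemons should print nothing and live_datanodes should print 1.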