Pseudo-distributed mode:
Hadoop can run on a single node in pseudo-distributed mode, using separate Java processes to simulate the different node types of a distributed cluster.
1. Install Hadoop
Make sure the JDK and ssh are already installed on the system.
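A quick sanity check, assuming a Red Hat-style system like the one in the prompts below (command names may differ on other distributions):

java -version          # should report the installed JDK version
service sshd status    # sshd must be running for the SSH setup later on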
1) Download Hadoop from the official site: http://hadoop.apache.org/ — I used hadoop-1.1.1-bin.tar.gz here.
2) Put the downloaded archive in the /softs directory.
3) Extract hadoop-1.1.1-bin.tar.gz into the /usr directory:
[root@localhost usr]# tar -zxvf /softs/hadoop-1.1.1-bin.tar.gz
[root@localhost usr]# ls
bin  etc  games  hadoop-1.1.1  include  java  lib  libexec  local  lost+found  sbin  share  src  tmp
[root@localhost usr]#
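Optional: to run the Hadoop scripts from any directory, you can add bin/ to PATH (purely a convenience; the rest of this walkthrough sticks to explicit bin/ paths):

echo 'export PATH=$PATH:/usr/hadoop-1.1.1/bin' >> /root/.bashrc
source /root/.bashrc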
2. Configure Hadoop
1) Edit /usr/hadoop-1.1.1/conf/hadoop-env.sh: find the export JAVA_HOME line and change it to your JDK installation path:
export JAVA_HOME=/usr/java/jdk1.6.0_38
2) Configure /usr/hadoop-1.1.1/conf/core-site.xml as follows (fs.default.name is the URI clients use to reach the NameNode):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
3) Configure /usr/hadoop-1.1.1/conf/hdfs-site.xml as follows (dfs.replication is set to 1 because there is only a single DataNode):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
4) Configure /usr/hadoop-1.1.1/conf/mapred-site.xml as follows (mapred.job.tracker is the address the JobTracker listens on):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
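If libxml2's xmllint happens to be installed, it offers a quick way to catch XML typos in the three files before starting anything:

xmllint --noout /usr/hadoop-1.1.1/conf/core-site.xml    # silent when the file parses cleanly
xmllint --noout /usr/hadoop-1.1.1/conf/hdfs-site.xml
xmllint --noout /usr/hadoop-1.1.1/conf/mapred-site.xml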
3. Passwordless SSH setup
Hadoop uses ssh to manage its daemons (even on a single node), and it must be able to log in without being prompted for a password, so we switch to key-based authentication.
1) Generate a key pair:
[root@localhost ~]# ssh-keygen -t rsa
At the prompts Enter passphrase (empty for no passphrase): and Enter same passphrase again:, type nothing and just press Enter.
[root@localhost ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
16:54:ed:23:0c:04:fa:74:1b:b0:b5:eb:c3:87:43:52 root@localhost.localdomain
The key's randomart image is:
(randomart image omitted)
[root@localhost ~]#
As the output shows, the key pair has been saved under /root/.ssh/ (private key id_rsa, public key id_rsa.pub).
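As an aside, the prompts can be skipped entirely with a non-interactive one-liner (-P '' supplies the empty passphrase, -f the output file):

ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa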
2) Change into the /root/.ssh directory and run:
[root@localhost .ssh]# cp id_rsa.pub authorized_keys
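If ssh still prompts for a password after this, the usual culprit is file permissions: sshd refuses keys when ~/.ssh or authorized_keys is too open. The conventional settings are:

chmod 700 /root/.ssh                    # directory readable only by the owner
chmod 600 /root/.ssh/authorized_keys    # key file likewise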
3) Then run:
[root@localhost .ssh]# ssh localhost
This confirms that ssh now connects without asking for a password:
[root@localhost .ssh]# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is e5:44:06:97:b4:66:ba:89:40:95:ba:23:0a:06:2a:74.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Tue Jan 15 22:08:06 2013 from 192.168.0.101
Hello,man
[root@localhost ~]#
4. Run Hadoop
1) Format the distributed filesystem:
[root@localhost hadoop-1.1.1]# bin/hadoop namenode -format
[root@localhost hadoop-1.1.1]# bin/hadoop namenode -format
13/01/15 23:56:53 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.1.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1411108; compiled by 'hortonfo' on Mon Nov 19 10:48:11 UTC 2012
************************************************************/
13/01/15 23:56:54 INFO util.GSet: VM type       = 32-bit
13/01/15 23:56:54 INFO util.GSet: 2% max memory = 19.33375 MB
13/01/15 23:56:54 INFO util.GSet: capacity      = 2^22 = 4194304 entries
13/01/15 23:56:54 INFO util.GSet: recommended=4194304, actual=4194304
13/01/15 23:56:55 INFO namenode.FSNamesystem: fsOwner=root
13/01/15 23:56:55 INFO namenode.FSNamesystem: supergroup=supergroup
13/01/15 23:56:55 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/01/15 23:56:55 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/01/15 23:56:55 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/01/15 23:56:55 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/01/15 23:56:55 INFO common.Storage: Image file of size 110 saved in 0 seconds.
13/01/15 23:56:55 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-root/dfs/name/current/edits
13/01/15 23:56:55 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-root/dfs/name/current/edits
13/01/15 23:56:55 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
13/01/15 23:56:55 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
[root@localhost hadoop-1.1.1]#
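Note in the log that the metadata landed under /tmp/hadoop-root — Hadoop's working directory defaults to /tmp/hadoop-${user.name}, and /tmp is often wiped on reboot. To keep HDFS data across reboots, you can set hadoop.tmp.dir in core-site.xml before formatting (the path below is only an example):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop/tmp</value>
</property>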
2) Start the Hadoop daemons:
[root@localhost hadoop-1.1.1]# bin/start-all.sh
[root@localhost hadoop-1.1.1]# bin/start-all.sh
starting namenode, logging to /usr/hadoop-1.1.1/libexec/../logs/hadoop-root-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /usr/hadoop-1.1.1/libexec/../logs/hadoop-root-datanode-localhost.localdomain.out
localhost: starting secondarynamenode, logging to /usr/hadoop-1.1.1/libexec/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /usr/hadoop-1.1.1/libexec/../logs/hadoop-root-jobtracker-localhost.localdomain.out
localhost: starting tasktracker, logging to /usr/hadoop-1.1.1/libexec/../logs/hadoop-root-tasktracker-localhost.localdomain.out
[root@localhost hadoop-1.1.1]#
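You can confirm that all five daemons came up with jps, which ships with the JDK:

jps    # should list NameNode, DataNode, SecondaryNameNode, JobTracker and
       # TaskTracker (plus Jps itself); if one is missing, check its log
       # under /usr/hadoop-1.1.1/logs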
Once the daemons are up, the NameNode web UI is reachable at http://localhost:50070/.
The NameNode is the HDFS master daemon: it records how files are split into blocks and which DataNodes hold each block, keeping all of this metadata in memory so that namespace management and metadata I/O are centralized in one place.
The JobTracker web UI is at http://localhost:50030/.
The JobTracker daemon is the bridge between user applications and Hadoop: once a job is submitted to the cluster, the JobTracker works out which files will be processed and assigns the resulting tasks to nodes.
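With everything running, a quick end-to-end smoke test is the bundled pi estimator, run from /usr/hadoop-1.1.1 (the examples jar name below matches the 1.1.1 tarball layout; confirm it with ls if in doubt):

bin/hadoop jar hadoop-examples-1.1.1.jar pi 2 10    # 2 map tasks, 10 samples each; prints an estimate of pi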
3) Stop the Hadoop daemons:
[root@localhost hadoop-1.1.1]# bin/stop-all.sh
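Afterwards, jps can double as a shutdown check:

jps    # only Jps itself should remain; a lingering daemon can be
       # inspected via its log under /usr/hadoop-1.1.1/logs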