First, prepare the following environment and software:
Win7 (64-bit), cygwin 1.7.9-1, jdk-6u25-windows-x64.zip, hadoop-0.20.2.tar.gz
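1. Install the JDK on Win7 as usual, and make sure the Java environment variables are set.
The main ones are JAVA_HOME, PATH, and CLASSPATH.
(If you are not sure how to set them, a quick web search will show you.)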
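As a quick sanity check from the Cygwin shell, the sketch below lists which of those variables are still unset. The helper name missing_vars is made up for this illustration; it is not part of the JDK, Cygwin, or Hadoop.

```shell
# Hypothetical helper: print the name of every listed environment
# variable that is unset or empty; return non-zero if any are missing.
missing_vars() {
    rc=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}

# Usage:
#   missing_vars JAVA_HOME PATH CLASSPATH
```

If nothing is printed, all three variables are visible inside Cygwin.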
2. Next, install Hadoop. I am using version 0.20.2. For convenience,
I put the archive directly under /home in cygwin64 for now (normally it should go under /usr)
and unpacked it with tar:
lenovo@lenovo-PC /home $ tar -zxvf hadoop-0.20.2.tar.gz
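3. Installing Hadoop alone is not enough; a little configuration is also needed. There are four main configuration files,
all in the conf subdirectory of the Hadoop installation directory:
hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml
The details of each change follow.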
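(1) Edit hadoop-env.sh:
This step is simple: just set JAVA_HOME to the JDK installation directory.
The only changed line is the export JAVA_HOME line, shown here as it looks after the edit.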
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
export JAVA_HOME=/cygdrive/d/android/java/jdk1.7.0_15

# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000

# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS

# Extra ssh options. Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored. $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from. Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes. See 'man nice'.
# export HADOOP_NICENESS=10
(Note: the path here must not be a Windows-style path such as d:\java\jdk1.7.0_15; use the Linux-style form /cygdrive/d/java/jdk1.7.0_15.)
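The path translation in that note is mechanical, and a small sketch makes the rule explicit. The function name win_to_cygdrive is invented for this example, and it only handles simple drive-letter paths; inside Cygwin, the bundled cygpath -u tool does the same conversion.

```shell
# Hypothetical helper: turn a Windows path like d:\java\jdk1.7.0_15
# into Cygwin's /cygdrive/<drive>/... form.
win_to_cygdrive() {
    # First character is the drive letter; lower-case it.
    drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')
    # Everything after "X:", with backslashes turned into forward slashes.
    rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
    printf '/cygdrive/%s%s\n' "$drive" "$rest"
}

win_to_cygdrive 'd:\java\jdk1.7.0_15'   # prints /cygdrive/d/java/jdk1.7.0_15
```

The printed value is what goes after export JAVA_HOME= in hadoop-env.sh.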
(2) Edit core-site.xml:
The added code was marked in red (the listing has not been preserved).
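Since the listing for this step is missing, here is a sketch of what a typical single-node core-site.xml looks like. The value hdfs://localhost:9000 is an assumption taken from Hadoop's pseudo-distributed quick start, not from this post, and write_core_site is a name made up for the example.

```shell
# Hypothetical helper: write a minimal core-site.xml into the given conf directory.
# ASSUMPTION: fs.default.name = hdfs://localhost:9000 (the usual quick-start
# value); the original post's own value is not known.
write_core_site() {
    cat > "$1/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}

# Usage, from the Hadoop installation directory:
#   write_core_site conf
```

Editing the file by hand works just as well; the function only saves retyping.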
(3) Edit hdfs-site.xml (set the replication factor to 1)
The added code was marked in red (the listing has not been preserved).
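The step only says that the replica count is set to 1, so the sketch below assumes this is done through the standard dfs.replication property. The helper name write_hdfs_site is hypothetical.

```shell
# Hypothetical helper: write an hdfs-site.xml that keeps one copy of each block.
# ASSUMPTION: the replica count is set via dfs.replication, as in Hadoop's
# single-node quick start.
write_hdfs_site() {
    cat > "$1/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}

# Usage, from the Hadoop installation directory:
#   write_hdfs_site conf
```

A value of 1 suits a single machine, where there is only one DataNode to hold each block.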
(4) Edit mapred-site.xml (set the JobTracker address)
The added code is the <property> element.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
4. Verify the installation and run Hadoop
(1) Verify the installation
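The original check for this step is not shown. One possible check, sketched below, confirms that the unpacked tree has an executable bin/hadoop and the four configuration files named earlier, then asks Hadoop for its version. check_hadoop_layout is a name invented for this example.

```shell
# Hypothetical helper: verify that a directory looks like an unpacked,
# configured Hadoop 0.20.x tree.
check_hadoop_layout() {
    home=$1
    [ -x "$home/bin/hadoop" ] || { echo "missing: bin/hadoop"; return 1; }
    for f in hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml; do
        [ -f "$home/conf/$f" ] || { echo "missing: conf/$f"; return 1; }
    done
    echo "layout ok"
}

# Usage, from the Hadoop installation directory:
#   check_hadoop_layout . && bin/hadoop version
```

If JAVA_HOME in hadoop-env.sh is wrong, bin/hadoop version is usually where it first shows up.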
(2) Format HDFS and start Hadoop
Format the NameNode with bin/hadoop namenode -format. The option must be typed with a plain ASCII hyphen. The run captured below used an en dash (–format) instead; the NameNode did not recognize it (note args = [▒Cformat]), printed its usage line, and shut down without formatting anything.
$ bin/hadoop namenode –format
15/07/09 10:47:51 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = lenovo-PC/192.168.41.1
STARTUP_MSG:   args = [▒Cformat]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Usage: java NameNode [-format] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint]
15/07/09 10:47:51 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at lenovo-PC/192.168.41.1
************************************************************/
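In the captured run above, the argument is echoed back as [▒Cformat], which points to a non-ASCII dash having been pasted in place of the hyphen. A small guard like the following sketch can catch that before the command runs. The name has_non_ascii is hypothetical.

```shell
# Hypothetical helper: succeed if the argument contains any byte outside
# printable ASCII (space through tilde), such as a pasted en dash.
has_non_ascii() {
    # Delete every printable ASCII byte; anything left over is suspicious.
    leftover=$(printf '%s' "$1" | LC_ALL=C tr -d ' -~')
    [ -n "$leftover" ]
}

# Usage:
#   if has_non_ascii "$opt"; then echo "retype the option by hand: $opt"; fi
```

Retyping the option rather than copying it from a web page avoids the problem altogether.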
$ bin/start-all.sh
starting namenode, logging to /home/hadoop-0.20.2/bin/../logs/hadoop-lenovo-namenode-lenovo-PC.out
localhost: /home/hadoop-0.20.2/bin/slaves.sh: line 61: ssh: command not found
localhost: /home/hadoop-0.20.2/bin/slaves.sh: line 61: ssh: command not found
starting jobtracker, logging to /home/hadoop-0.20.2/bin/../logs/hadoop-lenovo-jobtracker-lenovo-PC.out
localhost: /home/hadoop-0.20.2/bin/slaves.sh: line 61: ssh: command not found
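The start-all.sh output above reports "ssh: command not found" three times, once for each daemon that slaves.sh tries to launch over ssh. A pre-flight check along these lines would flag the gap before starting anything; missing_commands is a made-up helper name, and the openssh package is what Cygwin's installer uses to provide ssh.

```shell
# Hypothetical helper: print the name of each required command
# that cannot be found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# start-all.sh reaches the worker daemons through ssh, so check for it first.
missing=$(missing_commands ssh)
if [ -n "$missing" ]; then
    echo "not found on PATH: $missing (install the openssh package via Cygwin setup)"
fi
```

Installing ssh is only part of the job: the scripts also expect to log in to localhost without a password prompt.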
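(3) Check on Hadoop
From the command line:
(Note: under Win7, the DataNode and TaskTracker processes do not show up in cygwin. I took this to be a cygwin problem, but the "ssh: command not found" errors printed by bin/start-all.sh above suggest a more likely cause: those daemons are launched over ssh and probably never started.)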
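One way to do the command-line check, assuming the JDK's jps tool is on PATH, is to compare its output against the five daemons a Hadoop 0.20.x single-node setup normally runs. The function name missing_daemons is invented for this sketch.

```shell
# Hypothetical helper: read `jps` output ("<pid> <ClassName>" per line) on
# stdin and print each expected Hadoop 0.20.x daemon that is absent.
missing_daemons() {
    listing=$(cat)
    for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # Compare the second column exactly, so "NameNode" is not
        # satisfied by "SecondaryNameNode".
        printf '%s\n' "$listing" |
            awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$d"
    done
}

# Usage:
#   jps | missing_daemons
```

An empty result means all five daemons are running; any name printed is a daemon whose log under logs/ is worth reading.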
You can now view the result in a web browser:
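The post does not give the addresses. In Hadoop 0.20.x the NameNode web UI defaults to port 50070 and the JobTracker UI to port 50030, so a rough probe could look like the sketch below. It assumes curl is installed, and probe_ui is a hypothetical helper name.

```shell
# Hypothetical helper: report whether a web UI answers at the given URL.
# ASSUMPTION: curl is available; the ports in the usage lines are the
# Hadoop 0.20.x defaults, not values taken from the original post.
probe_ui() {
    if curl -s -o /dev/null --max-time 5 "$2"; then
        echo "$1: up at $2"
    else
        echo "$1: not reachable at $2"
    fi
}

# Usage:
#   probe_ui NameNode   http://localhost:50070/
#   probe_ui JobTracker http://localhost:50030/
```

Opening the same two URLs in a browser shows the cluster summary and job pages.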
(4) Stop Hadoop
bin/stop-all.sh
Copyright notice: parts of this article draw on material found online. If you have any concerns, please get in touch. Thank you.