0. Install the JDK
Follow any of the online tutorials for installing the JDK on OS X.
1. Download and install Hadoop
a) Download from:
http://hadoop.apache.org
b) Configure the ssh environment
In a terminal, run: ssh localhost
If an error message appears, the current user does not have remote-login permission; this is the system default, for security reasons.
To change it: open System Preferences --> Sharing --> check Remote Login, and set "Allow access for: All users".
Run "ssh localhost" again; after entering your password and confirming, the ssh login should succeed.
c) Passwordless ssh login
On the command line, run:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
ssh-keygen generates a key pair; -t sets the key type, -P supplies the passphrase (empty here), and -f names the output file.
This command creates two files under ~/.ssh/, id_dsa and id_dsa.pub, which are the ssh private and public keys.
Next, append the public key to the authorized keys by running:
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
(Note: recent OpenSSH releases disable DSA keys by default; if ssh still prompts for a password, use the RSA variant below.)
********************************************************************************
Passwordless login to localhost (RSA variant):
1. ssh-keygen -t rsa    (press Enter at each prompt)
2. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. Add to ~/.ssh/config:
   Host localhost
     AddKeysToAgent yes
     UseKeychain yes
     IdentityFile ~/.ssh/id_rsa
Test with ssh localhost; it should no longer prompt for a password.
********************************************************************************
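The RSA steps above can be sketched end to end as a small script. This is a demo sketch, not the exact commands from the text: it uses a scratch directory so it will not touch your real ~/.ssh; for real use, substitute $HOME/.ssh.

```shell
# Passwordless-ssh setup sketch; a scratch dir stands in for ~/.ssh.
SSH_DIR="${TMPDIR:-/tmp}/ssh-demo"
rm -rf "$SSH_DIR" && mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"

# Generate an RSA key pair with an empty passphrase (-N '').
ssh-keygen -t rsa -N '' -f "$SSH_DIR/id_rsa" -q

# Authorize the public key; sshd ignores the file unless permissions are strict.
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"

ls "$SSH_DIR"
```

With the real ~/.ssh, `ssh localhost` should then log in without a password prompt.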
d) Set environment variables
Before actually starting Hadoop, three files need to be configured.
But before that, add the following entries to your ~/.bash_profile.
On the command line, run:
open ~/.bash_profile
and append:
# hadoop
export HADOOP_HOME=/Users/YourUserName/Documents/Dev/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
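To check that the variables take effect, reload the profile and confirm the Hadoop directories are on PATH. A minimal sketch, using a demo profile file (the install path is the same assumed location as above) so it does not modify your real ~/.bash_profile:

```shell
# Append the two exports to a demo profile file and verify them.
PROFILE="${TMPDIR:-/tmp}/bash_profile_demo"
cat > "$PROFILE" <<'EOF'
# hadoop
export HADOOP_HOME=/Users/YourUserName/Documents/Dev/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF

# Load it in the current shell and confirm PATH now contains the bin dir.
. "$PROFILE"
echo "$PATH" | grep -o 'hadoop-2.7.3/bin'    # prints: hadoop-2.7.3/bin
```

In real use, run `. ~/.bash_profile` (or open a new terminal) and then `hadoop version` to confirm the binary resolves.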
e) Configure hadoop-env.sh
In the ${HADOOP_HOME}/etc/hadoop directory, open hadoop-env.sh and confirm the following settings, uncommenting where noted:
export JAVA_HOME=${JAVA_HOME}
export HADOOP_HEAPSIZE=2000    (uncomment)
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"    (uncomment)
f) Configure core-site.xml - sets the NameNode host and port, plus hadoop.tmp.dir as the base for temporary directories:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/YourUserName/Documents/Dev/hadoop-2.7.3/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
g) Configure hdfs-site.xml - sets the HDFS replication factor; since everything runs on a single node, set it to 1:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
h) Configure mapred-site.xml - sets the JobTracker host and port:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://localhost:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
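Note that mapred.job.tracker and the tasktracker properties are Hadoop 1.x names. Hadoop 2.7.3 still accepts them, but when MapReduce runs under YARN (started with start-yarn.sh below), the commonly recommended mapred-site.xml is the simpler:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

Either variant works for a single-node sandbox; the file above mirrors the older tutorials this guide follows.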
i) Format HDFS
With the configuration above in place, HDFS can be formatted.
On the command line, run:
cd $HADOOP_HOME/bin
hadoop namenode -format
(In Hadoop 2.x this is a deprecated alias for hdfs namenode -format; both work.)
If the output ends with a line like "Storage directory ... has been successfully formatted.", HDFS has been set up successfully.
j) Start Hadoop
cd ${HADOOP_HOME}/sbin
start-dfs.sh
start-yarn.sh
k) Verify Hadoop
If no errors occurred during startup,
run jps on the command line once startup completes.
The output should look something like this (PIDs will differ):
3761 DataNode
4100 Jps
3878 SecondaryNameNode
3673 NameNode
4074 NodeManager
3323 ResourceManager
If all of the daemons above appear, congratulations, you have successfully installed and started Hadoop!
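A small check like the following can confirm that all five daemons are present. Here it parses the sample output above; in real use you would capture `jps` itself:

```shell
# Verify that every expected Hadoop daemon appears in jps output.
# JPS_OUT holds the sample from above; in real use: JPS_OUT="$(jps)"
JPS_OUT='3761 DataNode
4100 Jps
3878 SecondaryNameNode
3673 NameNode
4074 NodeManager
3323 ResourceManager'

missing=0
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  echo "$JPS_OUT" | grep -qw "$d" || { echo "missing: $d"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all daemons running"    # prints: all daemons running
```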
Finally, we can verify over HTTP in a browser.
Open:
http://localhost:8088/
The YARN ResourceManager web UI should appear.
Open:
http://localhost:50070/
The HDFS NameNode web UI should appear.
2. Troubleshooting common errors
The namenode fails to start, with:
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-javoft/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
The cause is in core-site.xml:
you must override hadoop.tmp.dir to a directory of your own, e.g.:
...
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/javoft/Documents/hadoop/hadoop-${user.name}</value>
</property>
...
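A sketch of applying the fix (paths here are examples; adapt them to your own install). It writes the hadoop.tmp.dir override into a scratch core-site.xml so you can see the shape; after editing the real file you would re-run the namenode format step:

```shell
# Write a core-site.xml with hadoop.tmp.dir pointing at a directory we own.
CONF="${TMPDIR:-/tmp}/core-site-demo.xml"
HADOOP_TMP="${TMPDIR:-/tmp}/hadoop-demo-tmp"
mkdir -p "$HADOOP_TMP"            # the directory must exist and be writable

cat > "$CONF" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$HADOOP_TMP</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
EOF

grep -A1 'hadoop.tmp.dir' "$CONF"
```

Then re-run `hdfs namenode -format` (against the real ${HADOOP_HOME}/etc/hadoop/core-site.xml) and restart with start-dfs.sh.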