I already have a Cloudera CDH cluster installed (tutorial here: http://www.cnblogs.com/pojishou/p/6267616.html), but it eats far too much memory, and the bundled component versions are not selectable. If you just want to study the technology on a single machine with limited memory, I recommend installing a plain Apache cluster to play with. For production, Cloudera is of course the way to go, unless you have a very strong operations team.
This time I set up 3 virtual machine nodes with 4 GB of RAM each. If your host machine only has 8 GB, 3 nodes with 2 GB each should also be fine.
Apache Hadoop cluster offline installation and deployment (Part 1) — Hadoop (HDFS, YARN, MR) installation: http://www.cnblogs.com/pojishou/p/6366542.html
Apache Hadoop cluster offline installation and deployment (Part 2) — Spark-2.1.0 on Yarn installation: http://www.cnblogs.com/pojishou/p/6366570.html
Apache Hadoop cluster offline installation and deployment (Part 3) — HBase installation: http://www.cnblogs.com/pojishou/p/6366806.html
〇、Preparing the installation files
Hadoop 2.7.3:http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
一、Preparing the virtual machines
Set static IP addresses, /etc/hosts, passwordless SSH login, scp, sudo, disable the firewall, and configure yum and NTP time sync (details omitted here).
Java installation (omitted here).
Reference: http://www.cnblogs.com/pojishou/p/6267616.html
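For completeness, the skipped prep steps can be sketched roughly as follows. This is a sketch under assumptions: hostnames node00/node01/node02 from this series, and CentOS 6-style service commands; adjust for your own distribution.

```shell
# Passwordless SSH from node00 to every node (run once on node00):
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for h in node00 node01 node02; do
  ssh-copy-id "$h"          # pushes the public key; asks for the password once
done

# Disable the firewall (CentOS 6 syntax; CentOS 7 uses systemctl/firewalld):
service iptables stop
chkconfig iptables off

# Keep clocks in sync so HDFS/YARN logs don't complain about clock skew:
yum install -y ntp
service ntpd start
chkconfig ntpd on
```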
二、Installing Hadoop
1、Extract the archive
tar -zxvf hadoop-2.7.3.tar.gz -C /opt/
ln -s /opt/hadoop-2.7.3 /opt/hadoop
2、Edit the configuration files
(1)、hadoop-env.sh
vi /opt/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/opt/java
(2)、core-site.xml
vi /opt/hadoop/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node00:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
  </property>
</configuration>
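If you expect to rename the NameNode host later, a file like the one above can also be generated from a variable instead of edited by hand. This is just a convenience sketch; NN_HOST and CONF_DIR are made-up names for this script, not anything Hadoop itself reads.

```shell
# Hypothetical templating helper -- NN_HOST and CONF_DIR are our own
# variables. CONF_DIR defaults to the current directory for a dry run;
# point it at /opt/hadoop/etc/hadoop on the real node.
NN_HOST=${NN_HOST:-node00}
CONF_DIR=${CONF_DIR:-.}
cat > "$CONF_DIR/core-site.xml" <<EOF
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$NN_HOST:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
  </property>
</configuration>
EOF
```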
(3)、hdfs-site.xml
vi /opt/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop/data/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop/data/data</value>
  </property>
  <property>
    <!-- only node01/node02 run DataNodes, so keep replication at 2 or lower -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <!-- current name of the deprecated dfs.secondary.http.address -->
    <name>dfs.namenode.secondary.http-address</name>
    <value>node00:50090</value>
  </property>
</configuration>
(4)、mapred-site.xml
vi /opt/hadoop/etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
(5)、yarn-site.xml
vi /opt/hadoop/etc/hadoop/yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node00</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
(6)、slaves

vi /opt/hadoop/etc/hadoop/slaves
node01
node02
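With all the files edited on node00, the configured installation still has to reach the slave nodes. A minimal sketch, assuming the passwordless SSH set up earlier and the same /opt layout on every node:

```shell
# Run on node00: push the whole configured install to each slave
# and recreate the /opt/hadoop symlink there.
for h in node01 node02; do
  scp -r /opt/hadoop-2.7.3 "$h":/opt/
  ssh "$h" "ln -s /opt/hadoop-2.7.3 /opt/hadoop"
done
```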
3、Initialize (format) HDFS
/opt/hadoop/bin/hdfs namenode -format
4、Start the cluster
/opt/hadoop/sbin/start-all.sh
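A quick way to verify the startup, before running any job, is to check which Java daemons came up. Roughly, this is what you should see (run the commands on node00):

```shell
# List running Java daemons on this node:
jps
# On node00 you should see, roughly: NameNode, SecondaryNameNode, ResourceManager.
# On node01/node02: DataNode and NodeManager.

# Cluster-wide HDFS health check -- "Live datanodes" should report 2:
/opt/hadoop/bin/hdfs dfsadmin -report
```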
5、Test
/opt/hadoop/bin/hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 5 10
If it prints an estimate of pi, everything is working.
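Beyond the pi job, a simple file round-trip through HDFS is a cheap extra sanity check; the paths here are arbitrary examples:

```shell
# Write a small local file, push it into HDFS, and read it back:
echo hello > /tmp/hello.txt
/opt/hadoop/bin/hadoop fs -mkdir -p /test
/opt/hadoop/bin/hadoop fs -put /tmp/hello.txt /test/
/opt/hadoop/bin/hadoop fs -cat /test/hello.txt
```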