Software Versions
Hadoop version: hadoop-2.6.0-cdh5.7.0
VMware version: VMware Workstation 9 or 10
Linux system: CentOS 6.4-6.5, or Ubuntu version ubuntu-14.04.1-desktop-i386
JDK version: JDK 1.7.0_79
The version requirements for the last three items are not strict; note, however, that if you use HBase 1.0.0, JDK 1.8 or later is required.
Installation Guide
1. Installing VMware
VMware Workstation is a piece of software that, once installed, can be used to create virtual machines. You then install an operating system inside each virtual machine and applications on top of that; using the virtual system feels just like operating a real computer.
Download it directly from the official VMware website: http://www.vmware.com/cn/products/workstation/workstation-evaluation
If this link no longer works because the official site has changed, search for "VMware" in a search engine to find the download page; avoid downloading from unofficial sites.
The trial edition can be used for 30 days after installation.
2. Installing Ubuntu
Open VMware and click "Create a New Virtual Machine".
Select "Typical".
Click "Browse".
Select the Ubuntu installation image.
For now create only two virtual machines, and be careful to name them Ubuntu1 and Ubuntu2. You may use names of your own, but many of the configuration files that follow would then need matching changes, which causes unnecessary trouble.
Also memorize the password you choose; it will be used frequently later.
3. Installing VMware Tools
Ubuntu will report that a CD has been inserted into the virtual CD-ROM drive.
Double-click to open the CD and copy VMwareTools-9.6.1-1378637.tar.gz from it to the desktop; copying works much the same as on Windows.
Right-click the archive and click "Extract Here".
Open a terminal from the Ubuntu menu and run:
cd Desktop/vmware-tools-distrib/
sudo ./vmware-install.pl
Enter your password when sudo asks for it (on a fresh Ubuntu install root is locked, so this is your own user's password), press Enter through all the prompts, and reboot the system.
Note 1: After Ubuntu is installed, the root account is locked by default; you can neither log in as root nor "su" to root.
Allowing su to root is very simple; the method is shown below.
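A minimal sketch of the standard Ubuntu approach (run as your normal, sudo-capable user):
sudo passwd root   # set a password for root; you are prompted for the new password twice
su -               # su to root now works, using the password just set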
Note 2: After installing Ubuntu, refresh the package lists:
sudo apt-get update
This makes installing software afterwards much more convenient.
4. Creating a Shared Folder
To create a folder shared between the host and the virtual machine:
1) Click Virtual Machine -> Settings, then Options -> Shared Folders, select "Always enabled", and click the "Add" button.
2) Click "Next".
3) Choose the path of the shared folder (this is a path on the host machine) and click "Next".
4) Select "Enable this share" and click "Finish".
5) Click "OK".
6) The shared folder can then be found inside the guest at VMware's shared-folder mount point, as sketched below.
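On Linux guests with VMware Tools installed, shared folders normally appear under /mnt/hgfs. A quick check (the share name "share" is only an example):
ls /mnt/hgfs          # lists all enabled shared folders
ls /mnt/hgfs/share    # contents of a share named "share"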
5. Creating Users
Create the hadoop user group: sudo addgroup hadoop
Create the hduser user: sudo adduser -ingroup hadoop hduser
Note: give hduser the same password as your main user so it is easy to remember.
Grant sudo rights to hduser: sudo gedit /etc/sudoers, and below the line "root ALL=(ALL) ALL" add:
hduser ALL=(ALL) ALL
After this is done, reboot the machine: sudo reboot
Then log in as hduser. (A safer way to edit the sudoers configuration is sketched below.)
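Editing /etc/sudoers with gedit is risky: a syntax error can lock you out of sudo entirely. Two safer alternatives on a stock Ubuntu install:
sudo visudo                # edits /etc/sudoers with a syntax check before saving
sudo adduser hduser sudo   # or simply add hduser to the existing "sudo" group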
6. Cloning Ubuntu
Install the second Ubuntu machine by cloning:
1) Right-click the installed Ubuntu VM and choose Manage -> Clone.
2) Click "Next".
3) Select "The current state in the virtual machine" and click "Next".
4) Select "Create a full clone" and click "Next".
5) Enter the new virtual machine's name and location, then click "Finish".
6) Click "Close" to complete the clone.
7. Host Configuration
The Hadoop cluster uses two machines in three roles: one Master and two Slaves. The virtual machine Ubuntu1 serves as both Master and Slave; Ubuntu2 serves as a Slave only.
Configure the hostname. On Ubuntu, change the machine name with: sudo gedit /etc/hostname, setting it to Ubuntu1. After the change, reboot, then run the hostname command to check that the new name took effect.
At this point you can clone the machine to produce the second copy (shut it down first, then use the VMware menu: Virtual Machine -> Manage -> Clone).
Note: change the clone's hostname to Ubuntu2.
Configure the hosts file. Look up the IP addresses of Ubuntu1 and Ubuntu2 with: ifconfig
Open the hosts file: sudo gedit /etc/hosts, and add the following:
192.168.xxx.xxx Ubuntu1
192.168.xxx.xxx Ubuntu2
Note: replace these with the actual IP addresses of your own machines.
On Ubuntu1, run: ping Ubuntu2. If the ping succeeds, the setup is correct; a loop that checks both names at once is sketched below.
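For convenience, name resolution and reachability for both nodes can be verified in one shot (run on each machine):
for h in Ubuntu1 Ubuntu2; do ping -c 2 "$h"; done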
8. Passwordless SSH Configuration
Install the SSH server (the SSH client is installed by default): sudo apt-get install openssh-server
On Ubuntu1, generate a public/private key pair: ssh-keygen -t rsa -P ""
Check that /home/hduser/.ssh now contains id_rsa and id_rsa.pub.
Append the public key to authorized_keys: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys, then test passwordless login: ssh localhost
For passwordless login to Ubuntu2, run on Ubuntu1: ssh-copy-id Ubuntu2, then check that /home/hduser/.ssh on Ubuntu2 contains authorized_keys.
On Ubuntu1, run: ssh Ubuntu2. The first login asks for a password; later logins should not.
For Ubuntu2 to log in to Ubuntu1 without a password, repeat the same steps on Ubuntu2.
Note: if passwordless login fails, the most likely cause is wrong directory or file permissions. SSH refuses keys whose directory or file is writable by others, so tighten the permissions as sketched below rather than loosening them with chmod 777.
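The standard permissions expected by sshd:
chmod 700 /home/hduser/.ssh                  # only the owner may access the key directory
chmod 600 /home/hduser/.ssh/authorized_keys  # only the owner may read/write the key file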
Note:
Running sudo apt-get install openssh-server can fail with a package-not-found error.
The cause is that the package lists built from /etc/apt/sources.list are stale and need refreshing.
To refresh them, run: sudo apt-get -y update
After the update finishes, sudo apt-get install openssh-server works without problems.
9. Java Environment Configuration
Make the /opt folder writable: sudo chmod 777 /opt
Place the JDK package in /opt/ and run the installer with root privileges: sudo ./jdk-6u45-linux-i586.bin
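If the .bin installer is not yet marked executable, the full sequence would look roughly like this (package name as distributed in the original course materials):
cd /opt
chmod +x jdk-6u45-linux-i586.bin
sudo ./jdk-6u45-linux-i586.bin   # unpacks the JDK into /opt/jdk1.6.0_45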
Configure the JDK environment variables: sudo gedit /etc/profile, copy the following into the file, and save:
# java
export JAVA_HOME=/opt/jdk1.6.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
Apply the configuration: source /etc/profile
Run: java -version. If the Java version number is printed, the installation succeeded.
10. Hadoop Fully Distributed Cluster Installation
10.1 Installation
Place the Hadoop package hadoop-2.6.0.tar.gz in the /home/hduser directory, extract it there, and rename the extracted directory to hadoop (a possible command sequence is sketched after the profile snippet below). Then configure the Hadoop environment variables: run sudo gedit /etc/profile and copy the following into the file:
# hadoop
export HADOOP_HOME=/home/hduser/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
Run: source /etc/profile
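For reference, the extract-and-rename step might look like this (assuming the tarball sits in /home/hduser):
cd /home/hduser
tar -zxvf hadoop-2.6.0.tar.gz   # extracts to hadoop-2.6.0
mv hadoop-2.6.0 hadoop          # rename so the paths in this guide match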
Note: perform all of the steps above on both Ubuntu1 and Ubuntu2.
10.2 Configuration
Seven configuration files are involved, all under /home/hduser/hadoop/etc/hadoop; each can be edited with gedit.
(1) Enter the Hadoop configuration directory:
cd /home/hduser/hadoop/etc/hadoop/
(2) Configure hadoop-env.sh --> set JAVA_HOME
gedit hadoop-env.sh
Add the following:
# The java implementation to use.
export JAVA_HOME=/opt/jdk1.6.0_45
(3) Configure yarn-env.sh --> set JAVA_HOME
Add the following:
# some Java parameters
export JAVA_HOME=/opt/jdk1.6.0_45
(4) Configure the slaves file --> add the slave nodes
(delete the original localhost entry)
Add the following, one hostname per line:
Ubuntu1
Ubuntu2
(5) Configure core-site.xml --> add the Hadoop core configuration
(the HDFS port is 9000; the temporary directory is file:/home/hduser/hadoop/tmp)
Add the following:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Ubuntu1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hduser/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.native.lib</name>
    <value>true</value>
    <description>Should native hadoop libraries, if present, be used.</description>
  </property>
</configuration>
(6) Configure hdfs-site.xml --> add the HDFS configuration
(namenode and datanode addresses and directory locations)
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>Ubuntu1:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hduser/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hduser/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
(7) Configure mapred-site.xml --> add the MapReduce configuration
(use the YARN framework; set the jobhistory address and its web address)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>Ubuntu1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>Ubuntu1:19888</value>
  </property>
</configuration>
(8) Configure yarn-site.xml --> add the YARN configuration
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Ubuntu1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Ubuntu1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Ubuntu1:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>Ubuntu1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Ubuntu1:8088</value>
  </property>
</configuration>
(9) Copy the configured /home/hduser/hadoop/etc/hadoop folder from Ubuntu1 to the corresponding location on Ubuntu2 (delete Ubuntu2's original /home/hduser/hadoop/etc/hadoop folder first):
scp -r /home/hduser/hadoop/etc/hadoop/ hduser@Ubuntu2:/home/hduser/hadoop/etc/
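A quick sanity check that the two nodes now hold identical configuration files (purely illustrative):
md5sum /home/hduser/hadoop/etc/hadoop/*.xml
ssh Ubuntu2 'md5sum /home/hduser/hadoop/etc/hadoop/*.xml'
# the two checksum lists should match line for line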
10.3 Verification
Now verify that the Hadoop configuration is correct.
(1) Format the namenode:
hduser@Ubuntu1:~$ cd hadoop
hduser@Ubuntu1:~/hadoop$ ./bin/hdfs namenode -format
hduser@Ubuntu2:~$ cd hadoop
hduser@Ubuntu2:~/hadoop$ ./bin/hdfs namenode -format
Note: the message "successfully formatted" in the output indicates success.
(2) Start HDFS:
hduser@Ubuntu1:~/hadoop$ ./sbin/start-dfs.sh
15/04/27 04:18:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [Ubuntu1]
Ubuntu1: starting namenode, logging to /home/hduser/hadoop/logs/hadoop-hduser-namenode-Ubuntu1.out
Ubuntu1: starting datanode, logging to /home/hduser/hadoop/logs/hadoop-hduser-datanode-Ubuntu1.out
Ubuntu2: starting datanode, logging to /home/hduser/hadoop/logs/hadoop-hduser-datanode-Ubuntu2.out
Starting secondary namenodes [Ubuntu1]
Ubuntu1: starting secondarynamenode, logging to /home/hduser/hadoop/logs/hadoop-hduser-secondarynamenode-Ubuntu1.out
15/04/27 04:19:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Check the Java processes with jps (the Java Virtual Machine Process Status Tool):
hduser@Ubuntu1:~/hadoop$ jps
8008 NameNode
8443 Jps
8158 DataNode
8314 SecondaryNameNode
(3) Stop HDFS:
hduser@Ubuntu1:~/hadoop$ ./sbin/stop-dfs.sh
Stopping namenodes on [Ubuntu1]
Ubuntu1: stopping namenode
Ubuntu1: stopping datanode
Ubuntu2: stopping datanode
Stopping secondary namenodes [Ubuntu1]
Ubuntu1: stopping secondarynamenode
Check the Java processes:
hduser@Ubuntu1:~/hadoop$ jps
8850 Jps
(4) Start YARN:
hduser@Ubuntu1:~/hadoop$ ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hduser/hadoop/logs/yarn-hduser-resourcemanager-Ubuntu1.out
Ubuntu2: starting nodemanager, logging to /home/hduser/hadoop/logs/yarn-hduser-nodemanager-Ubuntu2.out
Ubuntu1: starting nodemanager, logging to /home/hduser/hadoop/logs/yarn-hduser-nodemanager-Ubuntu1.out
Check the Java processes:
hduser@Ubuntu1:~/hadoop$ jps
8911 ResourceManager
9247 Jps
9034 NodeManager
(5) Stop YARN:
hduser@Ubuntu1:~/hadoop$ ./sbin/stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
Ubuntu1: stopping nodemanager
Ubuntu2: stopping nodemanager
no proxyserver to stop
Check the Java processes:
hduser@Ubuntu1:~/hadoop$ jps
9542 Jps
(6) Check the cluster status:
First start the cluster: ./sbin/start-dfs.sh
hduser@Ubuntu1:~/hadoop$ ./bin/hdfs dfsadmin -report
Configured Capacity: 39891361792 (37.15 GB)
Present Capacity: 28707627008 (26.74 GB)
DFS Remaining: 28707569664 (26.74 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 192.168.159.132:50010 (Ubuntu2)
Hostname: Ubuntu2
Decommission Status : Normal
Configured Capacity: 19945680896 (18.58 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 5575745536 (5.19 GB)
DFS Remaining: 14369906688 (13.38 GB)
DFS Used%: 0.00%
DFS Remaining%: 72.05%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Apr 27 04:26:09 PDT 2015

Name: 192.168.159.131:50010 (Ubuntu1)
Hostname: Ubuntu1
Decommission Status : Normal
Configured Capacity: 19945680896 (18.58 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 5607989248 (5.22 GB)
DFS Remaining: 14337662976 (13.35 GB)
DFS Used%: 0.00%
DFS Remaining%: 71.88%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Apr 27 04:26:08 PDT 2015
(7) View the HDFS web interface at: http://Ubuntu1:50070/
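If no browser is available inside the guest, the page can also be fetched from the command line as a quick sanity check (assuming the web UI is up; note that opening the URL from the host's own browser requires adding the Ubuntu1 entry to the host's hosts file as well):
curl -s http://Ubuntu1:50070/ | head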
11. Running the WordCount Program
(1) Create a file directory:
hduser@Ubuntu1:~$ mkdir file
(2) Create file1.txt and file2.txt under file and add content (in the GUI, or from the terminal as sketched below).
Fill in the following content:
file1.txt content: Hello world hi HADOOP
file2.txt content: Hello hadoop hi CHINA
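Without the GUI, the same two files can be created directly from the terminal:
echo "Hello world hi HADOOP" > file/file1.txt
echo "Hello hadoop hi CHINA" > file/file2.txt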
After creating them, check:
hduser@Ubuntu1:~$ cat file/file1.txt
Hello world hi HADOOP
hduser@Ubuntu1:~$ cat file/file2.txt
Hello hadoop hi CHINA
(3) Create the /input2 directory in HDFS:
hduser@Ubuntu1:~/hadoop$ ./bin/hadoop fs -mkdir /input2
(4) Copy file1.txt and file2.txt to the HDFS /input2 directory:
hduser@Ubuntu1:~/hadoop$ ./bin/hadoop fs -put file/file*.txt /input2
(5) Check that file1.txt and file2.txt exist in HDFS:
hduser@Ubuntu1:~/hadoop$ bin/hadoop fs -ls /input2/
Found 2 items
-rw-r--r--   2 hduser supergroup         21 2015-04-27 05:54 /input2/file1.txt
-rw-r--r--   2 hduser supergroup         24 2015-04-27 05:54 /input2/file2.txt
(6) Run the wordcount program.
First start HDFS and YARN, then run:
hduser@Ubuntu1:~/hadoop$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input2/ /output2/wordcount1
15/04/27 05:57:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/27 05:57:17 INFO client.RMProxy: Connecting to ResourceManager at Ubuntu1/192.168.159.131:8032
15/04/27 05:57:19 INFO input.FileInputFormat: Total input paths to process : 2
15/04/27 05:57:19 INFO mapreduce.JobSubmitter: number of splits:2
15/04/27 05:57:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1430138907536_0001
15/04/27 05:57:20 INFO impl.YarnClientImpl: Submitted application application_1430138907536_0001
15/04/27 05:57:20 INFO mapreduce.Job: The url to track the job: http://Ubuntu1:8088/proxy/application_1430138907536_0001/
15/04/27 05:57:20 INFO mapreduce.Job: Running job: job_1430138907536_0001
15/04/27 05:57:32 INFO mapreduce.Job: Job job_1430138907536_0001 running in uber mode : false
15/04/27 05:57:32 INFO mapreduce.Job:  map 0% reduce 0%
15/04/27 05:57:43 INFO mapreduce.Job:  map 100% reduce 0%
15/04/27 05:57:58 INFO mapreduce.Job:  map 100% reduce 100%
15/04/27 05:57:59 INFO mapreduce.Job: Job job_1430138907536_0001 completed successfully
15/04/27 05:57:59 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=84
        FILE: Number of bytes written=317849
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=247
        HDFS: Number of bytes written=37
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=16813
        Total time spent by all reduces in occupied slots (ms)=12443
        Total time spent by all map tasks (ms)=16813
        Total time spent by all reduce tasks (ms)=12443
        Total vcore-seconds taken by all map tasks=16813
        Total vcore-seconds taken by all reduce tasks=12443
        Total megabyte-seconds taken by all map tasks=17216512
        Total megabyte-seconds taken by all reduce tasks=12741632
    Map-Reduce Framework
        Map input records=2
        Map output records=8
        Map output bytes=75
        Map output materialized bytes=90
        Input split bytes=202
        Combine input records=8
        Combine output records=7
        Reduce input groups=5
        Reduce shuffle bytes=90
        Reduce input records=7
        Reduce output records=5
        Spilled Records=14
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=622
        CPU time spent (ms)=2000
        Physical memory (bytes) snapshot=390164480
        Virtual memory (bytes) snapshot=1179254784
        Total committed heap usage (bytes)=257892352
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=45
    File Output Format Counters
        Bytes Written=37
(7) View the results:
hduser@Ubuntu1:~/hadoop$ ./bin/hdfs dfs -cat /output2/wordcount1/*
CHINA   1
Hello   2
hadoop  2
hi      2
world   1
——————————————
If you see the results above, you have successfully installed Hadoop!