Hadoop 2.3 + Hive 0.12 Cluster Deployment


0 Machine Roles

 

IP              Role
192.168.1.106   NameNode, DataNode, NodeManager, ResourceManager
192.168.1.107   SecondaryNameNode, NodeManager, DataNode
192.168.1.108   NodeManager, DataNode
192.168.1.106   HiveServer

1 Set Up Passwordless SSH

    To configure HDFS, the first step is to set up passwordless SSH between the machines. For convenience, we configure passwordless login in both directions between every pair of nodes.

(1) Generate an RSA key pair

ssh-keygen -t rsa

Press Enter through all prompts until the key randomart image is printed. This creates the RSA private key id_rsa and public key id_rsa.pub in the /home/user/.ssh directory.

(2) Copy the SSH public keys of every node into the /home/user/.ssh/authorized_keys file; the file should end up identical on all three machines.
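One way to do this is sketched below, run on each node in turn; it assumes the login user is "user" (matching the /home/user/.ssh path above) and uses the IPs from the machine table, so adjust both to your environment:

cat ~/.ssh/id_rsa.pub | ssh user@192.168.1.106 "cat >> ~/.ssh/authorized_keys"
cat ~/.ssh/id_rsa.pub | ssh user@192.168.1.107 "cat >> ~/.ssh/authorized_keys"
cat ~/.ssh/id_rsa.pub | ssh user@192.168.1.108 "cat >> ~/.ssh/authorized_keys"
chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys

After every node has run these commands, each authorized_keys file contains all three public keys.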

(3) Switch to the root user and edit /etc/ssh/sshd_config so that the following options are set:

RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile      .ssh/authorized_keys

(4) Restart the ssh service: service sshd restart

(5) Use ssh to log in to the other nodes remotely; if no password prompt appears, ssh is configured successfully.
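A quick check from 192.168.1.106 might look like this (user name and IPs taken from the setup above):

ssh user@192.168.1.107 hostname
ssh user@192.168.1.108 hostname

Both commands should print the remote hostname without asking for a password.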

2 Install Hadoop 2.3

    After extracting the Hadoop 2.3 tar package locally, the main work is editing the configuration files under etc/hadoop. The most important ones are listed below.
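For reference, extracting the archive and locating the configuration directory might look like the following sketch; the archive name and the /home/sdc install path are assumptions based on the paths used in the configuration below:

tar -zxvf hadoop-2.3.0.tar.gz -C /home/sdc
cd /home/sdc/hadoop-2.3.0/etc/hadoop
cp mapred-site.xml.template mapred-site.xml   # mapred-site.xml usually does not exist yet
ls core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml hadoop-env.sh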

(1) core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/sdc/tmp/hadoop-${user.name}</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.1.106:9000</value>
    </property>
</configuration>

(2) hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>192.168.1.107:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/sdc/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/sdc/dfs/data</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

(3) hadoop-env.sh

The main change is to set JAVA_HOME:

export JAVA_HOME=/usr/local/jdk1.6.0_27

(4) mapred-site.xml

<configuration>
    <property>
        <!-- Use YARN as the resource allocation and task management framework -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <!-- JobHistory Server address -->
        <name>mapreduce.jobhistory.address</name>
        <value>centos1:10020</value>
    </property>
    <property>
        <!-- JobHistory web UI address -->
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>centos1:19888</value>
    </property>
    <property>
        <!-- Maximum number of streams merged at once when sorting files -->
        <name>mapreduce.task.io.sort.factor</name>
        <value>100</value>
    </property>
    <property>
        <!-- Number of parallel copies during the reduce shuffle phase -->
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>50</value>
    </property>
    <property>
        <name>mapred.system.dir</name>
        <value>file:/home/sdc/Data/mr/system</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>file:/home/sdc/Data/mr/local</value>
    </property>
    <property>
        <!-- Memory each Map task requests from the ResourceManager -->
        <name>mapreduce.map.memory.mb</name>
        <value>1536</value>
    </property>
    <property>
        <!-- JVM options for each map-phase container -->
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
    </property>
    <property>
        <!-- Memory each Reduce task requests from the ResourceManager -->
        <name>mapreduce.reduce.memory.mb</name>
        <value>2048</value>
    </property>
    <property>
        <!-- JVM options for each reduce-phase container -->
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx1536M</value>
    </property>
    <property>
        <!-- Memory limit used while sorting -->
        <name>mapreduce.task.io.sort.mb</name>
        <value>512</value>
    </property>
</configuration>

    Pay attention to the memory settings above. The JVM heap of a container (-Xmx) should be smaller than the memory the task requests, and each request must fit within what a NodeManager can allocate, otherwise MapReduce jobs may fail to run. With the values above, a map task requests a 1536 MB container but caps its heap at 1024 MB, leaving headroom for non-heap memory; a reduce task requests 2048 MB with a 1536 MB heap, which exactly matches the per-node limit set in yarn-site.xml below.

(5) yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>centos1:8080</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>centos1:8081</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>centos1:8082</value>
    </property>
    <property>
        <!-- Total memory each NodeManager can allocate to containers -->
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>${hadoop.tmp.dir}/nodemanager/remote</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>${hadoop.tmp.dir}/nodemanager/logs</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>centos1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>centos1:8088</value>
    </property>
</configuration>

 

    In addition, after setting the HADOOP_HOME environment variable, copy the configured Hadoop directory to all nodes. The sbin directory contains the start-all.sh script; run it to start the cluster.
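A possible sequence is sketched below; the install path and user name are assumptions based on the paths above, and the NameNode is formatted only once, before the very first start:

scp -r /home/sdc/hadoop-2.3.0 sdc@192.168.1.107:/home/sdc/
scp -r /home/sdc/hadoop-2.3.0 sdc@192.168.1.108:/home/sdc/
cd $HADOOP_HOME
bin/hdfs namenode -format    # only before the first start
sbin/start-all.sh
jps                          # NameNode, ResourceManager, DataNode, NodeManager should appear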

    After startup completes, the following two web UIs are available:

http://192.168.1.106:8088/cluster

 

http://192.168.1.106:50070/dfshealth.html

 

Check HDFS with a few basic commands:
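For example, the following standard HDFS shell commands (run from $HADOOP_HOME) create a directory, upload a file, and read it back:

bin/hadoop fs -mkdir -p /test
bin/hadoop fs -put etc/hadoop/core-site.xml /test
bin/hadoop fs -ls /test
bin/hadoop fs -cat /test/core-site.xml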

3 Install Hive 0.12

    After extracting the Hive tar package, first configure the HIVE_HOME environment variable, then modify a few configuration files:

(1) hive-env.sh

Set the HADOOP_HOME variable in it to the value used on this system.
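For example (the install path is an assumption; use the value of HADOOP_HOME on your system):

export HADOOP_HOME=/home/sdc/hadoop-2.3.0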

(2) hive-site.xml

  • Modify the hive.server2.thrift.sasl.qop property.

  • Change the value of hive.metastore.schema.verification to false (a snippet appears after this list).

        When enabled, this setting enforces metastore schema consistency: it verifies that the schema version recorded in the metastore matches the version of the Hive jars, and it disables automatic schema migration, so the user must upgrade Hive and migrate the schema manually. When disabled, Hive only issues a warning if the versions differ.

  • Change Hive's metadata store to MySQL:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?characterEncoding=UTF-8</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.PersistenceManagerFactoryClass</name>
  <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
  <description>class implementing the jdo persistence</description>
</property>

<property>
  <name>javax.jdo.option.DetachAllOnCommit</name>
  <value>true</value>
  <description>detaches all objects from session so that they can be used after transaction is committed</description>
</property>

<property>
  <name>javax.jdo.option.NonTransactionalRead</name>
  <value>true</value>
  <description>reads outside of transactions</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123</value>
  <description>password to use against metastore database</description>
</property>
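As mentioned in the second bullet above, the schema verification switch would look like this in hive-site.xml:

<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>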

    

    Start the hive script under bin and run a few Hive statements:
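For example (a sketch; the table t_test and its columns are made up for illustration):

bin/hive
hive> CREATE TABLE t_test (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
hive> SHOW TABLES;
hive> SELECT * FROM t_test;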

4 Install MySQL 5.6

 See http://www.cnblogs.com/Scott007/p/3572604.html

5 Running the Pi Example and a Hive Table Computation

    In the bin subdirectory of the Hadoop installation, run the Pi calculation example that ships with Hadoop:

./hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar pi 10 10

The run log is:

Number of Maps  = 10
Samples per Map = 10
14/03/20 23:50:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
14/03/20 23:50:06 INFO client.RMProxy: Connecting to ResourceManager at centos1/192.168.1.106:8080
14/03/20 23:50:07 INFO input.FileInputFormat: Total input paths to process : 10
14/03/20 23:50:07 INFO mapreduce.JobSubmitter: number of splits:10
14/03/20 23:50:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1395323769116_0001
14/03/20 23:50:08 INFO impl.YarnClientImpl: Submitted application application_1395323769116_0001
14/03/20 23:50:08 INFO mapreduce.Job: The url to track the job: http://centos1:8088/proxy/application_1395323769116_0001/
14/03/20 23:50:08 INFO mapreduce.Job: Running job: job_1395323769116_0001
14/03/20 23:50:18 INFO mapreduce.Job: Job job_1395323769116_0001 running in uber mode : false
14/03/20 23:50:18 INFO mapreduce.Job:  map 0% reduce 0%
14/03/20 23:52:21 INFO mapreduce.Job:  map 10% reduce 0%
14/03/20 23:52:27 INFO mapreduce.Job:  map 20% reduce 0%
14/03/20 23:52:32 INFO mapreduce.Job:  map 30% reduce 0%
14/03/20 23:52:34 INFO mapreduce.Job:  map 40% reduce 0%
14/03/20 23:52:37 INFO mapreduce.Job:  map 50% reduce 0%
14/03/20 23:52:41 INFO mapreduce.Job:  map 60% reduce 0%
14/03/20 23:52:43 INFO mapreduce.Job:  map 70% reduce 0%
14/03/20 23:52:46 INFO mapreduce.Job:  map 80% reduce 0%
14/03/20 23:52:48 INFO mapreduce.Job:  map 90% reduce 0%
14/03/20 23:52:51 INFO mapreduce.Job:  map 100% reduce 0%
14/03/20 23:52:59 INFO mapreduce.Job:  map 100% reduce 100%
14/03/20 23:53:02 INFO mapreduce.Job: Job job_1395323769116_0001 completed successfully
14/03/20 23:53:02 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=226
        FILE: Number of bytes written=948145
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2670
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=43
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=10
        Launched reduce tasks=1
        Data-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=573584
        Total time spent by all reduces in occupied slots (ms)=20436
        Total time spent by all map tasks (ms)=286792
        Total time spent by all reduce tasks (ms)=10218
        Total vcore-seconds taken by all map tasks=286792
        Total vcore-seconds taken by all reduce tasks=10218
        Total megabyte-seconds taken by all map tasks=440512512
        Total megabyte-seconds taken by all reduce tasks=20926464
    Map-Reduce Framework
        Map input records=10
        Map output records=20
        Map output bytes=180
        Map output materialized bytes=280
        Input split bytes=1490
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=280
        Reduce input records=20
        Reduce output records=0
        Spilled Records=40
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=710
        CPU time spent (ms)=71800
        Physical memory (bytes) snapshot=6531928064
        Virtual memory (bytes) snapshot=19145916416
        Total committed heap usage (bytes)=5696757760
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1180
    File Output Format Counters
        Bytes Written=97
Job Finished in 175.556 seconds
Estimated value of Pi is 3.20000000000000000000

    If the job fails to run, there is a problem with the HDFS configuration.

    Executing statements such as count in Hive triggers MapReduce jobs:
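For example, a simple count over the hypothetical table created earlier:

hive> SELECT COUNT(*) FROM t_test;

This compiles to a MapReduce job that can be tracked on the YARN web UI at port 8088.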

 

    If an error similar to the following appears at run time:

Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

it means there is a problem with the metadata store, possibly for one of the following two reasons:

(1) There is a problem with the metadata storage directories on HDFS:

 $HADOOP_HOME/bin/hadoop fs -mkdir -p    /tmp
 $HADOOP_HOME/bin/hadoop fs -mkdir -p    /user/hive/warehouse
 $HADOOP_HOME/bin/hadoop fs -chmod g+w   /tmp
 $HADOOP_HOME/bin/hadoop fs -chmod g+w   /user/hive/warehouse

(2) The MySQL grants are wrong:

Execute the following commands in MySQL; this simply grants privileges on the Hive database in MySQL:

grant all on db.* to hive@'%' identified by 'password';         (allows the user to connect to MySQL remotely)
grant all on db.* to hive@'localhost' identified by 'password'; (allows the user to connect to MySQL locally)
flush privileges;
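To verify the grant, a connection test might look like this; the host and database name are assumptions taken from the javax.jdo.option.ConnectionURL above:

mysql -h localhost -u hive -p hive
# inside the MySQL prompt, "show tables;" should list the metastore tables once Hive has initialized them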

    To determine which of the two is the cause, check the Hive logs.
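With the default hive-log4j.properties, the log is written under the system temp directory, so something like the following may help (the location is an assumption about the default configuration):

tail -n 100 /tmp/$(whoami)/hive.log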

 

-------------------------------------------------------------------------------

If you found this post helpful, please click [Recommend] at the bottom right.

If you want to repost this article, please credit the source.

If you have comments or suggestions about this post, feel free to leave a message.

Thank you for reading, and please follow my future posts.

