0 Machine Roles

IP            | Role
192.168.1.106 | NameNode, DataNode, NodeManager, ResourceManager
192.168.1.107 | SecondaryNameNode, NodeManager, DataNode
192.168.1.108 | NodeManager, DataNode
192.168.1.106 | HiveServer
1 Set Up Passwordless SSH
Before configuring HDFS, the machines must be able to SSH to one another without passwords. For convenience, we set up bidirectional passwordless login between all of the machines.
(1) Generate an RSA key pair:
ssh-keygen -t rsa
Press Enter at every prompt until the key's randomart image is printed. This produces the RSA private key id_rsa and public key id_rsa.pub in the /home/user/.ssh directory.
(2) Append the SSH public key of every node to the /home/user/.ssh/authorized_keys file on each of the three machines.
(3) Switch to the root user and edit /etc/ssh/sshd_config to enable:
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
(4) Restart the ssh daemon: service sshd restart
(5) Log in to a remote node over ssh to verify:
If the login succeeds without a password prompt, ssh is configured correctly.
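The key generation and distribution steps above can be sketched as follows. This is a minimal sketch: the `user` account name is taken from the paths above, and running the loop on every node gives the bidirectional setup described.

```shell
# Generate a key pair non-interactively (no passphrase), if one does not exist yet.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Push this node's public key to every node (including itself) so that
# login works in both directions among the three machines.
for host in 192.168.1.106 192.168.1.107 192.168.1.108; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub user@"$host"
done

# Verify: this should print the remote hostname without asking for a password.
ssh user@192.168.1.107 hostname
```

`ssh-copy-id` simply appends the public key to the remote `~/.ssh/authorized_keys`, which is the same thing step (2) does by hand.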
2 Install Hadoop 2.3
After extracting the hadoop 2.3 tarball, the main work is editing the configuration files, all of which live under etc/hadoop. The most important ones are listed below.
(1)core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/sdc/tmp/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.106:9000</value>
  </property>
</configuration>
(2)hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>192.168.1.107:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/sdc/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/sdc/dfs/data</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
(3)hadoop-env.sh
The main change here is setting JAVA_HOME:
export JAVA_HOME=/usr/local/jdk1.6.0_27
(4)mapred-site.xml
<configuration>
  <property>
    <!-- Use YARN as the resource-allocation and task-management framework -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <!-- JobHistory Server address -->
    <name>mapreduce.jobhistory.address</name>
    <value>centos1:10020</value>
  </property>
  <property>
    <!-- JobHistory web UI address -->
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>centos1:19888</value>
  </property>
  <property>
    <!-- Maximum number of streams merged at once when sorting files -->
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <!-- Number of parallel copies during the reduce shuffle phase -->
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>50</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>file:/home/sdc/Data/mr/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>file:/home/sdc/Data/mr/local</value>
  </property>
  <property>
    <!-- Memory each Map Task requests from the RM -->
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <!-- JVM options for each Map-stage container -->
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024M</value>
  </property>
  <property>
    <!-- Memory each Reduce Task requests from the RM -->
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <!-- JVM options for each Reduce-stage container -->
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1536M</value>
  </property>
  <property>
    <!-- Memory limit for in-task sorting -->
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
</configuration>
Pay attention to the memory settings above: each container's JVM heap (-Xmx) must be smaller than the memory the container requests, and the requested size must stay within the maximum allocation the scheduler allows; otherwise MapReduce tasks may fail to run.
(5)yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>centos1:8080</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>centos1:8081</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>centos1:8082</value>
  </property>
  <property>
    <!-- Total memory each NodeManager can allocate to containers -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>${hadoop.tmp.dir}/nodemanager/remote</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>${hadoop.tmp.dir}/nodemanager/logs</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>centos1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>centos1:8088</value>
  </property>
</configuration>
Finally, after setting the HADOOP_HOME environment variable, copy the hadoop directory to all of the nodes. The sbin directory contains a start-all.sh script; run it to start the cluster:
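A minimal sketch of distributing the install and starting the cluster. The hostnames centos2/centos3 and the install path are assumptions for illustration, and the format step applies only to a first-time setup (it wipes any existing NameNode metadata):

```shell
# Copy the configured Hadoop directory to the other two nodes
# (centos2/centos3 are assumed hostnames for 192.168.1.107/108).
scp -r /home/sdc/hadoop-2.3.0 sdc@centos2:/home/sdc/
scp -r /home/sdc/hadoop-2.3.0 sdc@centos3:/home/sdc/

# First-time setup only: format the NameNode, then start HDFS and YARN.
$HADOOP_HOME/bin/hdfs namenode -format
$HADOOP_HOME/sbin/start-all.sh

# Each node should now list its daemons (NameNode, DataNode,
# ResourceManager, NodeManager, ...) in the jps output.
jps
```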
Once startup completes, two web UIs are available:
http://192.168.1.106:8088/cluster
http://192.168.1.106:50070/dfshealth.html
Run a few of the simplest commands to sanity-check HDFS:
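For example (the file name below is a placeholder; any small local file will do):

```shell
# List the filesystem root; this should succeed and, initially, be empty.
$HADOOP_HOME/bin/hadoop fs -ls /

# Round-trip a small file through HDFS.
echo "hello hdfs" > /tmp/hello.txt
$HADOOP_HOME/bin/hadoop fs -put /tmp/hello.txt /hello.txt
$HADOOP_HOME/bin/hadoop fs -cat /hello.txt
$HADOOP_HOME/bin/hadoop fs -rm /hello.txt

# Check capacity and live DataNodes across the three machines.
$HADOOP_HOME/bin/hdfs dfsadmin -report
```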
3 Install Hive 0.12
After extracting the Hive tarball, first set the HIVE_HOME environment variable, then edit a few configuration files:
(1)hive-env.sh
Set the HADOOP_HOME variable in it to the value used on this system.
(2)hive-site.xml
- Change the hive.server2.thrift.sasl.qop property to:
- Set hive.metastore.schema.verification to false.
This property enforces metastore schema consistency. When enabled, it verifies that the schema version stored in the metastore matches the version in the Hive jars, and it disables automatic schema migration, so the user must upgrade Hive and migrate the schema manually. When disabled, Hive only issues a warning on a version mismatch.
- Point Hive's metadata store at MySQL:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?characterEncoding=UTF-8</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.PersistenceManagerFactoryClass</name>
  <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
  <description>class implementing the jdo persistence</description>
</property>

<property>
  <name>javax.jdo.option.DetachAllOnCommit</name>
  <value>true</value>
  <description>detaches all objects from session so that they can be used after transaction is committed</description>
</property>

<property>
  <name>javax.jdo.option.NonTransactionalRead</name>
  <value>true</value>
  <description>reads outside of transactions</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123</value>
  <description>password to use against metastore database</description>
</property>
Start the hive script under bin and run a few Hive statements:
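For instance (the table name and schema here are made up for illustration):

```shell
# Run a few statements through the Hive CLI in $HIVE_HOME/bin.
$HIVE_HOME/bin/hive -e "
  SHOW DATABASES;
  CREATE TABLE IF NOT EXISTS test_tbl (id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
  SHOW TABLES;
  DESCRIBE test_tbl;
"
```

If the metastore is wired up correctly, the CREATE TABLE is recorded in the MySQL database configured above.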
4 Install MySQL 5.6
See http://www.cnblogs.com/Scott007/p/3572604.html
5 Run the Pi Example and a Hive Query
In the bin subdirectory of the Hadoop installation, run the bundled pi example that ships with hadoop:
./hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar pi 10 10
The run log:
Number of Maps  = 10
Samples per Map = 10
14/03/20 23:50:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
14/03/20 23:50:06 INFO client.RMProxy: Connecting to ResourceManager at centos1/192.168.1.106:8080
14/03/20 23:50:07 INFO input.FileInputFormat: Total input paths to process : 10
14/03/20 23:50:07 INFO mapreduce.JobSubmitter: number of splits:10
14/03/20 23:50:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1395323769116_0001
14/03/20 23:50:08 INFO impl.YarnClientImpl: Submitted application application_1395323769116_0001
14/03/20 23:50:08 INFO mapreduce.Job: The url to track the job: http://centos1:8088/proxy/application_1395323769116_0001/
14/03/20 23:50:08 INFO mapreduce.Job: Running job: job_1395323769116_0001
14/03/20 23:50:18 INFO mapreduce.Job: Job job_1395323769116_0001 running in uber mode : false
14/03/20 23:50:18 INFO mapreduce.Job:  map 0% reduce 0%
14/03/20 23:52:21 INFO mapreduce.Job:  map 10% reduce 0%
14/03/20 23:52:27 INFO mapreduce.Job:  map 20% reduce 0%
14/03/20 23:52:32 INFO mapreduce.Job:  map 30% reduce 0%
14/03/20 23:52:34 INFO mapreduce.Job:  map 40% reduce 0%
14/03/20 23:52:37 INFO mapreduce.Job:  map 50% reduce 0%
14/03/20 23:52:41 INFO mapreduce.Job:  map 60% reduce 0%
14/03/20 23:52:43 INFO mapreduce.Job:  map 70% reduce 0%
14/03/20 23:52:46 INFO mapreduce.Job:  map 80% reduce 0%
14/03/20 23:52:48 INFO mapreduce.Job:  map 90% reduce 0%
14/03/20 23:52:51 INFO mapreduce.Job:  map 100% reduce 0%
14/03/20 23:52:59 INFO mapreduce.Job:  map 100% reduce 100%
14/03/20 23:53:02 INFO mapreduce.Job: Job job_1395323769116_0001 completed successfully
14/03/20 23:53:02 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=226
        FILE: Number of bytes written=948145
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2670
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=43
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=10
        Launched reduce tasks=1
        Data-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=573584
        Total time spent by all reduces in occupied slots (ms)=20436
        Total time spent by all map tasks (ms)=286792
        Total time spent by all reduce tasks (ms)=10218
        Total vcore-seconds taken by all map tasks=286792
        Total vcore-seconds taken by all reduce tasks=10218
        Total megabyte-seconds taken by all map tasks=440512512
        Total megabyte-seconds taken by all reduce tasks=20926464
    Map-Reduce Framework
        Map input records=10
        Map output records=20
        Map output bytes=180
        Map output materialized bytes=280
        Input split bytes=1490
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=280
        Reduce input records=20
        Reduce output records=0
        Spilled Records=40
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=710
        CPU time spent (ms)=71800
        Physical memory (bytes) snapshot=6531928064
        Virtual memory (bytes) snapshot=19145916416
        Total committed heap usage (bytes)=5696757760
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1180
    File Output Format Counters
        Bytes Written=97
Job Finished in 175.556 seconds
Estimated value of Pi is 3.20000000000000000000
If the job does not run at all, there is a problem with your HDFS configuration!
Running statements such as count in Hive triggers MapReduce jobs:
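A count query of the kind meant here, against a hypothetical table name:

```shell
# A simple aggregate: Hive compiles this into a MapReduce job, which
# appears in the YARN web UI (centos1:8088) while it runs.
$HIVE_HOME/bin/hive -e "SELECT COUNT(*) FROM test_tbl;"
```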
If you see an error like the following while running a query:
Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
the metadata store has a problem, likely for one of two reasons:
(1) The HDFS directories Hive stores data in have a problem; create them and grant group write permission:
$HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
(2) The MySQL privileges are wrong:
Run the following in MySQL; this grants the hive user privileges on the Hive metastore database:
grant all on db.* to hive@'%' identified by 'password';  (lets the user connect to MySQL remotely)
grant all on db.* to hive@'localhost' identified by 'password';  (lets the user connect to MySQL locally)
flush privileges;
To find out which of the two it is, check the Hive log.
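By default, Hive's log4j configuration writes the log under /tmp/<user> (hive.log.dir in conf/hive-log4j.properties); assuming that default, a quick way to look:

```shell
# Tail the Hive log for the current user; the path is the hive-log4j
# default (hive.log.dir=/tmp/${user.name}) and may differ on your setup.
tail -n 100 /tmp/$(whoami)/hive.log

# Search for the metastore error and the lines around its root cause.
grep -i -A 5 "HiveMetaStoreClient" /tmp/$(whoami)/hive.log | tail -n 20
```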