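Integrating Hive with HBase lets you take advantage of HBase's storage features, such as row-level updates and column indexing. During integration, keep the HBase jars consistent: the HBase jars Hive uses must match the HBase version you are running. The integration works by having the two systems communicate through their public APIs, and this communication relies mainly on the hive-hbase-handler jar.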
The integration steps are as follows:
1. Copy hbase-common-0.96.2-hadoop2.jar and zookeeper-3.4.5.jar from HBASE_HOME into the HIVE_HOME/lib folder, overwriting any existing copies.
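Step 1 can be scripted. The following is a minimal Python sketch, assuming the jars sit in $HBASE_HOME/lib and that HBASE_HOME and HIVE_HOME are set as environment variables; the function name copy_jars is hypothetical. It uses shutil.copy2, which replaces an existing file at the destination, matching the "overwrite" in step 1.

```python
import os
import shutil
from pathlib import Path

# Jar names from step 1; use the versions that ship with your HBase.
JARS = ["hbase-common-0.96.2-hadoop2.jar", "zookeeper-3.4.5.jar"]


def copy_jars(hbase_lib, hive_lib, jars=JARS):
    """Copy each jar from hbase_lib into hive_lib, overwriting existing copies."""
    copied = []
    for name in jars:
        src = Path(hbase_lib) / name
        if not src.is_file():
            raise FileNotFoundError(f"jar not found: {src}")
        dst = Path(hive_lib) / name
        shutil.copy2(src, dst)  # replaces dst if it already exists
        copied.append(dst)
    return copied


if __name__ == "__main__":
    hbase_home = os.environ.get("HBASE_HOME")
    hive_home = os.environ.get("HIVE_HOME")
    if hbase_home and hive_home:
        # Assumption: the jars are under $HBASE_HOME/lib.
        for path in copy_jars(Path(hbase_home) / "lib", Path(hive_home) / "lib"):
            print("copied", path)
    else:
        print("set HBASE_HOME and HIVE_HOME first")
```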
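2. Edit hive-site.xml under HIVE_HOME/conf and add the following properties, adjusting paths and versions to your installation (write $HIVE_HOME out as the actual absolute path; it is not expanded as an environment variable in this file):
<property>
  <name>hive.querylog.location</name>
  <value>$HIVE_HOME/logs</value>
</property>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///hive-0.7.1/lib/hive-hbase-handler-0.7.1.jar,file:///hive-0.7.1/lib/hbase-common-0.96.2-hadoop2.jar,file:///hive-0.7.1/lib/zookeeper-3.4.5.jar</value>
</property>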
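The hive.aux.jars.path value is a comma-separated list of file:// URIs. A small Python sketch that builds that value and renders it as a property element might look like this; the helper names are hypothetical, and the jar paths are only the example paths from this step, to be replaced with your own absolute paths.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Example jar locations; replace them with the absolute paths on your machine.
JARS = [
    "/hive-0.7.1/lib/hive-hbase-handler-0.7.1.jar",
    "/hive-0.7.1/lib/hbase-common-0.96.2-hadoop2.jar",
    "/hive-0.7.1/lib/zookeeper-3.4.5.jar",
]


def aux_jars_path(jars):
    """Join absolute jar paths into a comma-separated list of file:// URIs."""
    # as_uri() requires an absolute path and raises ValueError otherwise.
    return ",".join(PurePosixPath(j).as_uri() for j in jars)


def property_xml(name, value):
    """Render one hive-site.xml <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    print(property_xml("hive.aux.jars.path", aux_jars_path(JARS)))
```

The printed element goes inside the configuration element of hive-site.xml.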
3. Copy hbase-common-0.96.2-hadoop2.jar into hadoop/lib on every Hadoop node (including the master).
4. Copy hbase-site.xml from hbase/conf into hadoop/conf on every Hadoop node (including the master).
Note: if steps 3 and 4 are skipped, running Hive is likely to fail with an error like the following:
org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.
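Since step 4 is what puts the HBase (and therefore ZooKeeper) settings on each node, one quick check is to confirm that the copied hbase-site.xml actually sets hbase.zookeeper.quorum. A minimal Python sketch that reads one property from a Hadoop-style configuration file; the script name check_hbase_site.py and the function read_property are hypothetical.

```python
import os
import sys
import xml.etree.ElementTree as ET
from typing import Optional


def read_property(root: ET.Element, name: str) -> Optional[str]:
    """Return the <value> of the named <property>, or None if it is absent."""
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


if __name__ == "__main__":
    if len(sys.argv) == 2 and os.path.isfile(sys.argv[1]):
        root = ET.parse(sys.argv[1]).getroot()
        quorum = read_property(root, "hbase.zookeeper.quorum")
        print(quorum if quorum else "hbase.zookeeper.quorum is not set")
    else:
        print("usage: python check_hbase_site.py /path/to/hadoop/conf/hbase-site.xml")
```

Run it on each node against the hadoop/conf/hbase-site.xml from step 4; a missing value suggests the file was not copied or lacks the ZooKeeper settings.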
5. Start Hive
Single node: bin/hive -hiveconf hbase.master=master:60000
If hive.aux.jars.path is not set in hive-site.xml, pass the jars on the command line instead:
hive --auxpath /opt/mapr/hive/hive-0.7.1/lib/hive-hbase-handler-0.7.1.jar,/opt/mapr/hive/hive-0.7.1/lib/hbase-0.90.4.jar,/opt/mapr/hive/hive-0.7.1/lib/zookeeper-3.3.2.jar -hiveconf hbase.master=localhost:60000
Cluster: bin/hive -hiveconf hbase.zookeeper.quorum=node1,node2,node3 (list all ZooKeeper nodes)
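In testing, adding the following to hive-site.xml made it possible to start Hive with HBase without any extra arguments. (Per its own description, hive.zookeeper.quorum is used for Hive's read/write locks; the setting that corresponds to the -hiveconf option above is hbase.zookeeper.quorum, which can be set in hive-site.xml the same way.)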
<property>
  <name>hive.zookeeper.quorum</name>
  <value>node1,node2,node3</value>
  <description>The list of zookeeper servers to talk to. This is only needed for read/write locks.</description>
</property>
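The startup variants in this step differ only in their arguments. A Python sketch that assembles them as an argument list; the function hive_command is hypothetical, and the binary path and option values are the examples from this step.

```python
import shlex


def hive_command(hive_bin="bin/hive", aux_jars=None, hbase_master=None, zk_quorum=None):
    """Assemble the Hive CLI arguments used in step 5."""
    cmd = [hive_bin]
    if aux_jars:
        # As in the --auxpath example above: plain paths joined by commas.
        cmd += ["--auxpath", ",".join(aux_jars)]
    if hbase_master:
        cmd += ["-hiveconf", f"hbase.master={hbase_master}"]
    if zk_quorum:
        # List every ZooKeeper node in the cluster.
        cmd += ["-hiveconf", "hbase.zookeeper.quorum=" + ",".join(zk_quorum)]
    return cmd


if __name__ == "__main__":
    print(shlex.join(hive_command(hbase_master="master:60000")))
    print(shlex.join(hive_command(zk_quorum=["node1", "node2", "node3"])))
```

To actually launch Hive, pass the list to subprocess.run instead of printing it; keeping the arguments as a list avoids shell quoting problems.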
6. Test after startup
(1) Create the HBase table hbase_student:
hbase> create 'hbase_student', 'info'
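(2) Create the Hive external table hive_student, mapped to hbase_student:
Integrating Hive with HBase requires a mapping between the Hive table and the HBase table: the Hive table's columns and column types are associated with the HBase table's column families and column qualifiers.
Every field in the Hive table must be mapped to HBase, but the Hive table does not need to include every HBase column.
The HBase row key is represented by choosing one Hive field and mapping it to :key; a column inside a column family is written as cf:q (column family:qualifier).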
CREATE EXTERNAL TABLE hive_student (rowkey string, name string, age int, phone string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name,info:age,info:phone")
TBLPROPERTIES ("hbase.table.name" = "hbase_student");
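To make the mapping rules concrete, here is a Python sketch that pairs each Hive column with its entry in hbase.columns.mapping and checks the rules described above: one entry per Hive column, exactly one :key, and cf:qualifier for the rest. It is an illustration, not Hive's own parser; parse_mapping is a hypothetical name, and any other mapping forms the handler may accept are not covered.

```python
from typing import Dict, List, Optional, Tuple


def parse_mapping(hive_columns: List[str], mapping: str) -> Dict[str, Optional[Tuple[str, str]]]:
    """Map each Hive column to (family, qualifier), or None for the row key."""
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries"
        )
    result: Dict[str, Optional[Tuple[str, str]]] = {}
    key_columns = []
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            key_columns.append(column)
            result[column] = None  # this column holds the HBase row key
            continue
        family, sep, qualifier = entry.partition(":")
        if not sep or not family or not qualifier:
            raise ValueError(f"expected cf:qualifier, got {entry!r}")
        result[column] = (family, qualifier)
    if len(key_columns) != 1:
        raise ValueError(f"exactly one column must map to :key, got {key_columns}")
    return result


if __name__ == "__main__":
    columns = ["rowkey", "name", "age", "phone"]
    print(parse_mapping(columns, ":key,info:name,info:age,info:phone"))
```

Entries are matched to columns by position, which is why the column list in the CREATE TABLE statement and the mapping string must stay in the same order.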
7. Import and verify the data:
(1) Create the external source table data_student:
CREATE EXTERNAL TABLE data_student (rowkey string, name string, age int, phone string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/test/hbase/tsv/input/';
(2) Load the data into hbase_student through hive_student:
SET hive.hbase.bulk=true;
INSERT OVERWRITE TABLE hive_student SELECT rowkey, name, age, phone FROM data_student;
Note: if you get java.lang.IllegalArgumentException: Property value must not be null, you need Hive 0.13.0 or later.
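To try the import end to end, data_student needs tab-separated files in /test/hbase/tsv/input/. A Python sketch that emits rows in that layout (rowkey, name, age, phone); the rows are made-up sample data and to_tsv_line is a hypothetical helper.

```python
import sys

# Made-up sample rows in data_student column order: rowkey, name, age, phone.
ROWS = [
    ("1001", "alice", 20, "555-0101"),
    ("1002", "bob", 22, "555-0102"),
]


def to_tsv_line(row):
    """Format one row as a tab-separated line, as FIELDS TERMINATED BY '\\t' expects."""
    fields = [str(value) for value in row]
    for field in fields:
        # A tab or newline inside a value would shift or split the columns.
        if "\t" in field or "\n" in field:
            raise ValueError(f"value contains a delimiter: {field!r}")
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    sys.stdout.writelines(to_tsv_line(row) for row in ROWS)
```

For example, redirect the output to student.tsv, upload it with hdfs dfs -put student.tsv /test/hbase/tsv/input/, run the INSERT above, and then check the result with SELECT * FROM hive_student; in Hive or scan 'hbase_student' in the HBase shell.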