Connecting Hive to HBase
My versions:
HADOOP 2.4.1
HBase 0.98.6.1
Hive 0.13.1
About HBase 0.98.6.1
It seems I still have not installed HBase entirely correctly: 0.98.6.1 is built against Hadoop 2.2, while I am running Hadoop 2.4.1.
This mismatch causes assorted problems. For example, importing data into HBase with importtsv
throws errors. My temporary workaround is to replace the hadoop-* 2.2 jars bundled with HBase with the corresponding Hadoop 2.4.1 jars. After that, the import runs without errors.
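A sketch of that jar swap, assuming HBase and Hadoop are installed under /usr/local (both paths are assumptions; adjust them to your layout):
# Hypothetical install locations.
export HBASE_HOME=/usr/local/hbase
export HADOOP_HOME=/usr/local/hadoop
# Move the Hadoop 2.2 jars bundled with HBase out of the way.
mkdir -p $HBASE_HOME/lib/hadoop-2.2-backup
mv $HBASE_HOME/lib/hadoop-*2.2*.jar $HBASE_HOME/lib/hadoop-2.2-backup/
# Copy in the matching 2.4.1 jars from the live Hadoop installation.
find $HADOOP_HOME/share/hadoop -name 'hadoop-*2.4.1*.jar' -exec cp {} $HBASE_HOME/lib/ \;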
The problem
First, create a table in HBase:
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.0, r1231986, Mon Jan 16 13:16:35 UTC 2012
hbase(main):001:0>
hbase(main):001:0> create 'bar', 'cf'
0 row(s) in 0.1200 seconds
hbase(main):002:0>
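To have some data to query from Hive later, a couple of sample cells can go in as well (the row key and values are made up for illustration):
hbase(main):002:0> put 'bar', 'row1', 'cf:c1', 'value-a'
hbase(main):003:0> put 'bar', 'row1', 'cf:c2', 'value-b'
hbase(main):004:0> scan 'bar'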
Then connect Hive to this table using Hive's HBaseStorageHandler. The DDL statement is:
hive> CREATE EXTERNAL TABLE foo(rowkey STRING, a STRING, b STRING)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1,cf:c2')
    > TBLPROPERTIES ('hbase.table.name' = 'bar');
This failed with the following error:
14/10/24 19:31:43 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.io.IOException: Attempt to start meta tracker failed.
at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:201)
at org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:230)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:277)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:293)
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:162)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:554)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:547)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy9.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:613)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4189)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:197)
... 33 more
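The root cause is that the Hive client cannot reach ZooKeeper to read /hbase/meta-region-server. One quick connectivity check is the ZooKeeper CLI bundled with HBase; a sketch, where slave1..slave3 are my quorum hosts:
hbase zkcli -server slave1:2181,slave2:2181,slave3:2181
ls /hbase
get /hbase/meta-region-server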
After a long search, I finally found a fix.
The fix
HBaseIntegration uses the hive-hbase-handler-x.y.z.jar module.
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
The handler requires Hadoop 0.20 or higher, and has only been tested with dependency versions hadoop-0.20.x, hbase-0.92.0 and zookeeper-3.3.4. If you are not using hbase-0.92.0, you will need to rebuild the handler with the HBase jar matching your version, and change the --auxpath above accordingly. Failure to use matching versions will lead to misleading connection failures such as MasterNotRunningException since the HBase RPC protocol changes often.
Using this HBaseStorageHandler requires some extra jars, whose paths are passed with --auxpath. But that approach, as described on the cwiki, is cumbersome and error-prone.
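For reference, the --auxpath style launch looks roughly like this (the local jar paths are assumptions for illustration):
hive --auxpath /usr/local/hive/lib/hive-hbase-handler-0.13.1.jar,/usr/local/hive/lib/hbase-common-0.98.6.1-hadoop2.jar,/usr/local/hive/lib/zookeeper-3.4.6.jar -hiveconf hbase.zookeeper.quorum=slave1,slave2,slave3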
The HBaseBulkLoad page, however, also uses extra jars, and its way of wiring them in is much simpler:
https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad
Add necessary JARs
You will need to add a couple jar files to your path. First, put them in DFS:
hadoop dfs -put /usr/lib/hive/lib/hbase-VERSION.jar /user/hive/hbase-VERSION.jar
hadoop dfs -put /usr/lib/hive/lib/hive-hbase-handler-VERSION.jar /user/hive/hive-hbase-handler-VERSION.jar
Then add them to your hive-site.xml:
<property>
<name>hive.aux.jars.path</name>
<value>/user/hive/hbase-VERSION.jar,/user/hive/hive-hbase-handler-VERSION.jar</value>
</property>
Setting the jar paths directly in hive-site.xml is much more convenient.
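A sketch of the upload step, assuming the jars sit under the local $HIVE_HOME/lib and $HBASE_HOME/lib (adjust to wherever yours actually are):
hadoop fs -mkdir -p /user/hive/lib
hadoop fs -put $HBASE_HOME/lib/hbase-common-0.98.6.1-hadoop2.jar /user/hive/lib/
hadoop fs -put $HIVE_HOME/lib/hive-hbase-handler-0.13.1.jar /user/hive/lib/
hadoop fs -put $HBASE_HOME/lib/zookeeper-3.4.6.jar /user/hive/lib/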
After uploading the files to HDFS, I added the following configuration:
<property>
<name>hive.aux.jars.path</name>
<value>/user/hive/lib/hbase-common-0.98.6.1-hadoop2.jar,/user/hive/lib/hive-hbase-handler-0.13.1.jar,/user/hive/lib/zookeeper-3.4.6.jar</value>
<description>The location of the plugin jars that contain implementations of user defined functions and serdes.</description>
</property>
With these changes in place, restart Hive:
# nohup hive --service metastore > $HIVE_HOME/log/hive_metastore.log &
# nohup hive --service hiveserver > $HIVE_HOME/log/hiveserver.log &
# ./hive -hiveconf hbase.zookeeper.quorum=slave1,slave2,slave3
The last step, ./hive -hiveconf hbase.zookeeper.quorum=slave1,slave2,slave3, must not be omitted; it is the key to a successful start.
As for what this option does, in the expert's original words:
You need to tell Hive where to find the zookeepers quorum which would elect the HBase master
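Presumably the quorum could also be set once in hive-site.xml rather than on every launch (untested on my setup):
<property>
<name>hbase.zookeeper.quorum</name>
<value>slave1,slave2,slave3</value>
</property>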
Now run the DDL again in the Hive shell:
hive> CREATE EXTERNAL TABLE foo(rowkey STRING, a STRING, b STRING)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1,cf:c2')
    > TBLPROPERTIES ('hbase.table.name' = 'bar');
No error this time; the external table is created successfully!
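As a quick sanity check (assuming the sample cells written into 'bar' earlier):
hive> SELECT * FROM foo;
This should return row1 with value-a and value-b mapped to columns a and b.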
Table definitions in Hive
Some related Hive concepts:
[Managed table] A managed table is one for which the definition is primarily managed in Hive's metastore, and for whose data storage Hive is responsible.
[External table] An external table is one whose definition is managed in some external catalog, and whose data Hive does not own (i.e. it will not be deleted when the table is dropped).
[Native / non-native] A native table is one Hive can manage and access without a storage handler; a non-native table is one that requires a storage handler.
These two distinctions (managed vs. external and native vs. non-native) are orthogonal.
Hence, there are four possibilities for base tables:
managed native: what you get by default with CREATE TABLE
external native: what you get with CREATE EXTERNAL TABLE when no STORED BY clause is specified
managed non-native: what you get with CREATE TABLE when a STORED BY clause is specified; Hive stores the definition in its metastore, but does not create any files itself; instead, it calls the storage handler with a request to create a corresponding object structure
external non-native: what you get with CREATE EXTERNAL TABLE when a STORED BY clause is specified; Hive registers the definition in its metastore and calls the storage handler to check that it matches the primary definition in the other system
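A sketch of what the four forms look like in DDL (the table names and column mappings are made up):
-- managed native: the default
CREATE TABLE t1 (k STRING, v STRING);
-- external native: EXTERNAL without STORED BY
CREATE EXTERNAL TABLE t2 (k STRING, v STRING) LOCATION '/data/t2';
-- managed non-native: STORED BY without EXTERNAL; Hive asks the handler to create the HBase table
CREATE TABLE t3 (k STRING, v STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:v');
-- external non-native: EXTERNAL plus STORED BY, like the foo table above
CREATE EXTERNAL TABLE t4 (k STRING, v STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:v')
TBLPROPERTIES ('hbase.table.name' = 'bar');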
One more thing: shutting down Hive
Hive does not seem to ship a shutdown script. For now my method is to find Hive's two PIDs (the metastore and hiveserver processes) and kill them directly... crude but effective.
# netstat -lnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 21415/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 12601/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 884/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 960/master
tcp 0 0 0.0.0.0:9083 0.0.0.0:* LISTEN 21100/java
tcp 0 0 192.168.129.63:9000 0.0.0.0:* LISTEN 12601/java
tcp 0 0 192.168.129.63:9001 0.0.0.0:* LISTEN 12783/java
tcp 0 0 :::22 :::* LISTEN 884/sshd
tcp 0 0 ::ffff:192.168.129.63:8088 :::* LISTEN 12939/java
tcp 0 0 ::1:25 :::* LISTEN 960/master
tcp 0 0 ::ffff:192.168.129.63:8030 :::* LISTEN 12939/java
tcp 0 0 ::ffff:192.168.129.63:8031 :::* LISTEN 12939/java
tcp 0 0 ::ffff:192.168.129.63:60000 :::* LISTEN 20610/java
tcp 0 0 ::ffff:192.168.129.63:8032 :::* LISTEN 12939/java
tcp 0 0 ::ffff:192.168.129.63:8033 :::* LISTEN 12939/java
tcp 0 0 :::60010 :::* LISTEN 20610/java
(Active UNIX domain sockets listing omitted; not relevant here)
# kill -9 21100
# kill -9 21415
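A slightly less manual version greps the known Hive ports (9083 for the metastore, 10000 for hiveserver); a sketch that assumes nothing else is listening on those ports:
for port in 9083 10000; do
  pid=$(netstat -lnp 2>/dev/null | grep ":$port " | awk '{print $NF}' | cut -d/ -f1)
  [ -n "$pid" ] && kill -9 "$pid"
done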
References
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
http://stackoverflow.com/questions/23658600/error-while-creating-an-hive-table-on-top-of-an-hbase-table