Troubleshooting Errors When Connecting Hive to HBase


Connecting Hive to HBase

My versions:

HADOOP 2.4.1
HBase 0.98.6.1
Hive 0.13.1

About HBase 0.98.6.1
It seems I still haven't got HBase installed entirely correctly: 0.98.6.1 is built against Hadoop 2.2, while I'm running Hadoop 2.4.1. All sorts of problems come up in use; for example, importing data into HBase with importtsv throws errors. My stopgap fix was to replace the hadoop-* 2.2 jars bundled with HBase with the jars from Hadoop 2.4.1. After that, it ran without errors.
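The jar swap can be sketched as a small shell helper. This is an improvised workaround, not an official procedure; the directory layout and backup location are assumptions you should adapt to your installation:

```shell
# Hypothetical helper for the jar swap described above: move HBase's
# bundled hadoop-* jars aside as a backup, then copy in the jars from
# the Hadoop install. Both directory arguments are assumptions about
# your layout.
swap_hadoop_jars() {
  local hbase_lib="$1" hadoop_jar_dir="$2"
  mkdir -p "$hbase_lib/hadoop-jars-backup"
  mv "$hbase_lib"/hadoop-*.jar "$hbase_lib/hadoop-jars-backup/"
  cp "$hadoop_jar_dir"/hadoop-*.jar "$hbase_lib/"
}

# Example invocation (paths are illustrative):
# swap_hadoop_jars /usr/local/hbase-0.98.6.1/lib \
#                  /usr/local/hadoop-2.4.1/share/hadoop/common
```

Keeping the old jars in a backup directory makes it easy to roll back if the mixed versions misbehave.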

The Problem

First, create a table in the HBase shell:

$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.0, r1231986, Mon Jan 16 13:16:35 UTC 2012
hbase(main):001:0>

hbase(main):001:0> create 'bar', 'cf'
0 row(s) in 0.1200 seconds
hbase(main):002:0>

Then connect Hive to this HBase table using Hive's HBaseStorageHandler; the DDL is:

hive>CREATE EXTERNAL TABLE foo(rowkey STRING, a STRING, b STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1,cf:c2') TBLPROPERTIES ('hbase.table.name' = 'bar');
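The hbase.columns.mapping string pairs Hive columns with HBase targets positionally, which for the DDL above reads as:

```sql
-- How the mapping pairs up, position by position:
--   Hive column   HBase target
--   rowkey    <-> :key    (the HBase row key)
--   a         <-> cf:c1   (column family cf, qualifier c1)
--   b         <-> cf:c2   (column family cf, qualifier c2)
```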

This produced the following error:

14/10/24 19:31:43 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead


FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.io.IOException: Attempt to start meta tracker failed.
	at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:201)
	at org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:230)
	at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:277)
	at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:293)
	at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:162)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:554)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:547)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
	at com.sun.proxy.$Proxy9.createTable(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:613)
	at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4189)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:197)
	... 33 more

After a long search, I finally found the fix.


The Fix

HBaseIntegration is provided by the hive-hbase-handler-x.y.z.jar module:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

The handler requires Hadoop 0.20 or higher, and has only been tested with dependency versions hadoop-0.20.x, hbase-0.92.0 and zookeeper-3.3.4. If you are not using hbase-0.92.0, you will need to rebuild the handler with the HBase jar matching your version, and change the --auxpath above accordingly. Failure to use matching versions will lead to misleading connection failures such as MasterNotRunningException since the HBase RPC protocol changes often.

Using this HBaseStorageHandler requires some extra jars, which the wiki passes in with --auxpath. But that method is complicated and error-prone in practice.

However, the HBaseBulkLoad page also needs extra jars, and its way of adding them is much simpler:
https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad

Add necessary JARs
You will need to add a couple jar files to your path. First, put them in DFS:

hadoop dfs -put /usr/lib/hive/lib/hbase-VERSION.jar /user/hive/hbase-VERSION.jar
hadoop dfs -put /usr/lib/hive/lib/hive-hbase-handler-VERSION.jar /user/hive/hive-hbase-handler-VERSION.jar

Then add them to your hive-site.xml:

<property>
  <name>hive.aux.jars.path</name>
  <value>/user/hive/hbase-VERSION.jar,/user/hive/hive-hbase-handler-VERSION.jar</value>
</property>

Setting the jar paths directly in hive-site.xml is much more convenient.
After uploading the files to HDFS, I added the following configuration:

<property>
  <name>hive.aux.jars.path</name>
  <value>/user/hive/lib/hbase-common-0.98.6.1-hadoop2.jar,/user/hive/lib/hive-hbase-handler-0.13.1.jar,/user/hive/lib/zookeeper-3.4.6.jar</value>
  <description>The location of the plugin jars that contain implementations of user defined functions and serdes.</description>
</property>

With that change in place, restart Hive:

#nohup hive --service metastore > $HIVE_HOME/log/hive_metastore.log & 
 
#nohup hive --service hiveserver > $HIVE_HOME/log/hiveserver.log & 

#./hive -hiveconf hbase.zookeeper.quorum=slave1,slave2,slave3 

The last step, #./hive -hiveconf hbase.zookeeper.quorum=slave1,slave2,slave3, must not be omitted; it is what makes the startup succeed.

On what that last line does, in the words of the Stack Overflow answer linked in the references:

You need to tell Hive where to find the zookeepers quorum which would elect the HBase master
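If you'd rather not pass the quorum on every launch, the same property can presumably also be set once in hive-site.xml. This is a sketch based on standard Hadoop-style configuration, not something tested in this post; verify it against your own deployment:

```xml
<!-- Assumed alternative to the -hiveconf flag: pin the ZooKeeper
     quorum that HBase registers with in hive-site.xml. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>slave1,slave2,slave3</value>
</property>
```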

Now run the DDL again in the Hive shell:

hive>CREATE EXTERNAL TABLE foo(rowkey STRING, a STRING, b STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1,cf:c2') TBLPROPERTIES ('hbase.table.name' = 'bar');

No error this time; the external table is added successfully!


Table definitions in Hive

Relevant Hive concepts:

  • Managed table: one for which the definition is primarily managed in Hive's metastore, and for whose data storage Hive is responsible.
  • External table: one whose definition is managed in some external catalog, and whose data Hive does not own (i.e. it will not be deleted when the table is dropped).
  • Native table: one Hive can store and read on its own, without a storage handler.
  • Non-native table: one that requires a storage handler (such as the HBase handler used here).

These two distinctions (managed vs. external and native vs. non-native) are orthogonal.
Hence, there are four possibilities for base tables:

  • managed native: what you get by default with CREATE TABLE
  • external native: what you get with CREATE EXTERNAL TABLE when no STORED BY clause is specified
  • managed non-native: what you get with CREATE TABLE when a STORED BY clause is specified; Hive stores the definition in its metastore, but does not create any files itself; instead, it calls the storage handler with a request to create a corresponding object structure
  • external non-native: what you get with CREATE EXTERNAL TABLE when a STORED BY clause is specified; Hive registers the definition in its metastore and calls the storage handler to check that it matches the primary definition in the other system
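The four combinations can be sketched as DDL; the table and column names below are invented for illustration:

```sql
-- managed native: plain CREATE TABLE
CREATE TABLE t_managed (id INT);

-- external native: EXTERNAL, no STORED BY; Hive does not own the files
CREATE EXTERNAL TABLE t_external (id INT) LOCATION '/data/t_external';

-- managed non-native: STORED BY asks the storage handler to create
-- the backing object (here an HBase table)
CREATE TABLE t_hbase_managed (rowkey STRING, v STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:v');

-- external non-native: the same shape as the foo table created earlier
CREATE EXTERNAL TABLE t_hbase_external (rowkey STRING, v STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:v')
TBLPROPERTIES ('hbase.table.name' = 'bar');
```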

One more thing: shutting down Hive

Hive doesn't seem to ship a shutdown script. My interim approach is to find Hive's PIDs (there are two processes, the metastore and HiveServer) and simply kill them... crude but effective.

# netstat -lnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
tcp        0      0 0.0.0.0:10000               0.0.0.0:*                   LISTEN      21415/java          
tcp        0      0 0.0.0.0:50070               0.0.0.0:*                   LISTEN      12601/java          
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      884/sshd            
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      960/master          
tcp        0      0 0.0.0.0:9083                0.0.0.0:*                   LISTEN      21100/java          
tcp        0      0 192.168.129.63:9000         0.0.0.0:*                   LISTEN      12601/java          
tcp        0      0 192.168.129.63:9001         0.0.0.0:*                   LISTEN      12783/java          
tcp        0      0 :::22                       :::*                        LISTEN      884/sshd            
tcp        0      0 ::ffff:192.168.129.63:8088  :::*                        LISTEN      12939/java          
tcp        0      0 ::1:25                      :::*                        LISTEN      960/master          
tcp        0      0 ::ffff:192.168.129.63:8030  :::*                        LISTEN      12939/java          
tcp        0      0 ::ffff:192.168.129.63:8031  :::*                        LISTEN      12939/java          
tcp        0      0 ::ffff:192.168.129.63:60000 :::*                        LISTEN      20610/java          
tcp        0      0 ::ffff:192.168.129.63:8032  :::*                        LISTEN      12939/java          
tcp        0      0 ::ffff:192.168.129.63:8033  :::*                        LISTEN      12939/java          
tcp        0      0 :::60010                    :::*                        LISTEN      20610/java          
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node PID/Program name    Path
unix  2      [ ACC ]     STREAM     LISTENING     8318   1/init              @/com/ubuntu/upstart
unix  2      [ ACC ]     STREAM     LISTENING     10389  850/dbus-daemon     /var/run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     10698  960/master          public/cleanup
unix  2      [ ACC ]     STREAM     LISTENING     10705  960/master          private/tlsmgr
unix  2      [ ACC ]     STREAM     LISTENING     10709  960/master          private/rewrite
unix  2      [ ACC ]     STREAM     LISTENING     10713  960/master          private/bounce
unix  2      [ ACC ]     STREAM     LISTENING     10717  960/master          private/defer
unix  2      [ ACC ]     STREAM     LISTENING     10721  960/master          private/trace
unix  2      [ ACC ]     STREAM     LISTENING     10725  960/master          private/verify
unix  2      [ ACC ]     STREAM     LISTENING     10729  960/master          public/flush
unix  2      [ ACC ]     STREAM     LISTENING     10733  960/master          private/proxymap
unix  2      [ ACC ]     STREAM     LISTENING     10737  960/master          private/proxywrite
unix  2      [ ACC ]     STREAM     LISTENING     10741  960/master          private/smtp
unix  2      [ ACC ]     STREAM     LISTENING     10745  960/master          private/relay
unix  2      [ ACC ]     STREAM     LISTENING     10749  960/master          public/showq
unix  2      [ ACC ]     STREAM     LISTENING     10753  960/master          private/error
unix  2      [ ACC ]     STREAM     LISTENING     10757  960/master          private/retry
unix  2      [ ACC ]     STREAM     LISTENING     10761  960/master          private/discard
unix  2      [ ACC ]     STREAM     LISTENING     10765  960/master          private/local
unix  2      [ ACC ]     STREAM     LISTENING     10769  960/master          private/virtual
unix  2      [ ACC ]     STREAM     LISTENING     10773  960/master          private/lmtp
unix  2      [ ACC ]     STREAM     LISTENING     10777  960/master          private/anvil
unix  2      [ ACC ]     STREAM     LISTENING     10781  960/master          private/scache
#kill -9 21100
#kill -9 21415
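The manual netstat-and-kill routine above can be wrapped in a small helper. This is an improvised sketch, not an official Hive shutdown script; it assumes HiveServer listens on port 10000 and the metastore on 9083, as in the output above:

```shell
# Improvised helper: pull the PID listening on a given TCP port out of
# `netstat -lnp` output (read from stdin). Column 4 is the local
# address, and the last column is "PID/Program name".
pid_on_port() {
  awk -v p=":$1" '$4 ~ p"$" { split($NF, a, "/"); print a[1]; exit }'
}

# Usage: kill HiveServer (port 10000) and the metastore (port 9083).
# netstat -lnp | pid_on_port 10000 | xargs -r kill
# netstat -lnp | pid_on_port 9083  | xargs -r kill
```

Anchoring the port pattern with "$" avoids matching 10000 against, say, port 100001, and exiting after the first hit keeps the output to a single PID.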

References

https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
http://stackoverflow.com/questions/23658600/error-while-creating-an-hive-table-on-top-of-an-hbase-table

