Using complex types in Impala (tables created in Hive):
If a table created in Hive contains complex types (array, struct, map) and its storage format is text or the default (stored as textfile), the table cannot be queried from Impala.
Workaround:
Create a second table with identical columns, changing stored as textfile to stored as parquet, then copy the source data into it (insert into tablename2 select * from tablename1). The new table can then be queried from Impala.
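As a sketch of the workaround, assuming a source table tablename1 with one complex column (the column names here are illustrative placeholders, matching nothing in particular):

```sql
-- Run in Hive: build a Parquet copy of the textfile table.
CREATE TABLE tablename2 (
  id string,
  tags array<struct<k:string, v:int>>   -- example complex column
) STORED AS PARQUET;

INSERT INTO tablename2 SELECT * FROM tablename1;
```

After the copy, run INVALIDATE METADATA tablename2; in impala-shell so that Impala picks up the new table before querying it.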
Querying complex types:
Unlike Hive, Impala does not use explode for array, map, and struct types; instead it uses the following form:
select order_id,rooms.room_id, days.day_id,days.price from test2,test2.rooms,test2.rooms.days;
In effect, each complex-typed column is treated as a nested sub-table that is joined to its parent table.
Table structure:
test2 (
  order_id string,
  rooms array<struct<
    room_id:string,
    days:array<struct<day_id:string, price:int>>
  >>
)
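For contrast with the Impala join form above, the same flattening of this schema would be written in Hive with LATERAL VIEW and explode (a sketch; the aliases are arbitrary):

```sql
-- Hive equivalent of the Impala query: each nested array is
-- flattened with LATERAL VIEW explode instead of being joined.
SELECT t.order_id, r.room_id, d.day_id, d.price
FROM test2 t
LATERAL VIEW explode(t.rooms) rv AS r
LATERAL VIEW explode(r.days) dv AS d;
```

In Impala the implicit-join form (test2, test2.rooms, test2.rooms.days) expresses the same traversal of the nested collections.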
Integrating Impala with HBase:
To integrate Impala with HBase, the HBase RowKey and columns are mapped to fields of an Impala table. Impala stores its metadata in the Hive Metastore, and as with Hive, the integration is done through an external (EXTERNAL) table.
Create the table in HBase:

...
TableName tname = TableName.valueOf("students");
HTableDescriptor tDescriptor = new HTableDescriptor(tname);
HColumnDescriptor family = new HColumnDescriptor("core");
tDescriptor.addFamily(family);
admin.createTable(tDescriptor);

// Insert rows:
...
HTable htable = (HTable) connection.getTable(tname);
// Do not flush the write buffer automatically
htable.setAutoFlush(false);
for (int i = 1; i < 50; i++) {
    Put put = new Put(Bytes.toBytes("lisi" + format.format(i)));
    // Skip the write-ahead log for faster bulk loading
    put.setWriteToWAL(false);
    put.addColumn(Bytes.toBytes("core"), Bytes.toBytes("math"), Bytes.toBytes(format.format(i)));
    put.addColumn(Bytes.toBytes("core"), Bytes.toBytes("english"), Bytes.toBytes(format.format(Math.random() * i)));
    put.addColumn(Bytes.toBytes("core"), Bytes.toBytes("chinese"), Bytes.toBytes(format.format(Math.random() * i)));
    htable.put(put);
    if (i % 2000 == 0) {
        htable.flushCommits();
    }
}
// Flush the remaining buffered puts (the loop never reaches 2000)
htable.flushCommits();
Create the external table in Hive:

...
state.execute("create external table if not exists students (" +
        "user_name string, " +
        "core_math string, " +
        "core_english string, " +
        "core_chinese string )" +
        "row format serde 'org.apache.hadoop.hive.hbase.HBaseSerDe' " +
        "stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' " +
        "with serdeproperties ('hbase.columns.mapping'=':key,core:math,core:english,core:chinese') " +
        "tblproperties('hbase.table.name'='students')");
...
In the DDL above, the WITH SERDEPROPERTIES clause maps the Hive external-table fields to HBase columns: ":key" corresponds to the HBase RowKey (values of the form "lisi****"), and the remaining entries are columns of the column family core. Finally, TBLPROPERTIES names the HBase table being mapped.
Synchronize metadata in Impala:
Impala shares the Hive Metastore, so the metadata must be synchronized by running the following command in impala-shell:
INVALIDATE METADATA;
After that, the table mapped from HBase is visible and queryable in Impala.
Note: Impala supports SELECT and INSERT, but not single-row DELETE/UPDATE statements; Impala cannot modify non-Kudu tables. Other operations are similar to Hive.
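A few statements illustrating this against the mapped students table (a sketch; the inserted row values are made up):

```sql
-- Works: Impala can read and insert through to the HBase-backed table.
SELECT user_name, core_math FROM students LIMIT 5;
INSERT INTO students VALUES ('lisi_new', '10', '20', '30');

-- Fails: single-row UPDATE/DELETE is only supported on Kudu tables.
-- UPDATE students SET core_math = '99' WHERE user_name = 'lisi_new';
```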
Working with Impala from Java:
Maven dependencies:

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
</dependency>
<dependency>
    <groupId>com.cloudera.impala</groupId>
    <artifactId>jdbc</artifactId>
    <version>2.5.31</version>
</dependency>
Java code:

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.sql.*;

/**
 * @Author:Xavier
 * @Data:2019-02-22 13:34
 **/
public class ImpalaOptionTest {
    private String driverName = "com.cloudera.impala.jdbc41.Driver";
    private String url = "jdbc:impala://datanode02:21050/xavierdb";
    private Connection conn = null;
    private Statement state = null;
    private ResultSet res = null;

    @Before
    public void init() throws ClassNotFoundException, SQLException {
        Class.forName(driverName);
        conn = DriverManager.getConnection(url, "impala", "impala");
        state = conn.createStatement();
    }

    // Show databases/tables, or query the mapped table
    @Test
    public void test() throws SQLException {
        // res = state.executeQuery("show databases");
        // res = state.executeQuery("show tables");
        res = state.executeQuery("select * from students");
        while (res.next()) {
            System.out.println(res.getString(1));
        }
    }

    // Release resources
    @After
    public void destroy() throws SQLException {
        if (res != null) res.close();   // the original closed state here by mistake
        if (state != null) state.close();
        if (conn != null) conn.close();
    }
}