Using complex types in Impala (tables created in Hive):
If a table created in Hive contains complex types (array, struct, map) and its storage format is text or the default (stored as textfile), the table cannot be queried from Impala.
Workaround:
Create a second table with identical columns, changing stored as textfile to stored as parquet, then copy the source data into it (insert into tablename2 select * from tablename1). The new table can then be queried from Impala.
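As a sketch of the workaround, assuming a source table tablename1 with one complex column (the column names here are illustrative placeholders, matching nothing in particular):

```sql
-- Run in Hive: build a Parquet copy of the textfile table.
CREATE TABLE tablename2 (
  id string,
  tags array<struct<k:string, v:int>>   -- example complex column
) STORED AS PARQUET;

INSERT INTO tablename2 SELECT * FROM tablename1;
```

After the copy, run INVALIDATE METADATA tablename2; in impala-shell so that Impala picks up the new table before querying it.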
Querying complex types:
Unlike Hive, Impala does not use explode for array, map, and struct types; instead it uses the following form:
select order_id,rooms.room_id, days.day_id,days.price from test2,test2.rooms,test2.rooms.days;
In effect, each complex-typed column is treated as a nested sub-table that is joined to its parent table.
Table structure:
test2 (
  order_id string,
  rooms array<struct<
    room_id:string,
    days:array<struct<day_id:string, price:int>>
  >>
)
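For contrast with the Impala join form above, the same flattening of this schema would be written in Hive with LATERAL VIEW and explode (a sketch; the aliases are arbitrary):

```sql
-- Hive equivalent of the Impala query: each nested array is
-- flattened with LATERAL VIEW explode instead of being joined.
SELECT t.order_id, r.room_id, d.day_id, d.price
FROM test2 t
LATERAL VIEW explode(t.rooms) rv AS r
LATERAL VIEW explode(r.days) dv AS d;
```

In Impala the implicit-join form (test2, test2.rooms, test2.rooms.days) expresses the same traversal of the nested collections.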
Integrating Impala with HBase:
To integrate Impala with HBase, the HBase RowKey and columns are mapped to fields of an Impala table. Impala stores its metadata in the Hive Metastore, and as with Hive, the integration is done through an external (EXTERNAL) table.
Create the table in HBase:

...
TableName tname = TableName.valueOf("students");
HTableDescriptor tDescriptor = new HTableDescriptor(tname);
HColumnDescriptor family = new HColumnDescriptor("core");
tDescriptor.addFamily(family);
admin.createTable(tDescriptor);

// Insert rows:
...
HTable htable = (HTable) connection.getTable(tname);
// Do not flush the write buffer automatically
htable.setAutoFlush(false);
for (int i = 1; i < 50; i++) {
    Put put = new Put(Bytes.toBytes("lisi" + format.format(i)));
    // Skip the write-ahead log for faster bulk loading
    put.setWriteToWAL(false);
    put.addColumn(Bytes.toBytes("core"), Bytes.toBytes("math"), Bytes.toBytes(format.format(i)));
    put.addColumn(Bytes.toBytes("core"), Bytes.toBytes("english"), Bytes.toBytes(format.format(Math.random() * i)));
    put.addColumn(Bytes.toBytes("core"), Bytes.toBytes("chinese"), Bytes.toBytes(format.format(Math.random() * i)));
    htable.put(put);
    if (i % 2000 == 0) {
        htable.flushCommits();
    }
}
// Flush the remaining buffered puts (the loop never reaches 2000)
htable.flushCommits();
Create the external table in Hive:

...
state.execute("create external table if not exists students (" +
        "user_name string, " +
        "core_math string, " +
        "core_english string, " +
        "core_chinese string )" +
        "row format serde 'org.apache.hadoop.hive.hbase.HBaseSerDe' " +
        "stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' " +
        "with serdeproperties ('hbase.columns.mapping'=':key,core:math,core:english,core:chinese') " +
        "tblproperties('hbase.table.name'='students')");
...
In the DDL above, the WITH SERDEPROPERTIES clause maps the Hive external-table fields to HBase columns: ":key" corresponds to the HBase RowKey (values of the form "lisi****"), and the remaining entries are columns of the column family core. Finally, TBLPROPERTIES names the HBase table being mapped.
Synchronize metadata in Impala:
Impala shares the Hive Metastore, so the metadata must be synchronized by running the following command in impala-shell:
INVALIDATE METADATA;
After that, the table mapped from HBase is visible and queryable in Impala.
Note: Impala supports SELECT and INSERT, but not single-row DELETE/UPDATE statements; Impala cannot modify non-Kudu tables. Other operations are similar to Hive.
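A few statements illustrating this against the mapped students table (a sketch; the inserted row values are made up):

```sql
-- Works: Impala can read and insert through to the HBase-backed table.
SELECT user_name, core_math FROM students LIMIT 5;
INSERT INTO students VALUES ('lisi_new', '10', '20', '30');

-- Fails: single-row UPDATE/DELETE is only supported on Kudu tables.
-- UPDATE students SET core_math = '99' WHERE user_name = 'lisi_new';
```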
Working with Impala from Java:
Maven dependencies:

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
</dependency>
<dependency>
    <groupId>com.cloudera.impala</groupId>
    <artifactId>jdbc</artifactId>
    <version>2.5.31</version>
</dependency>
Java code:

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.sql.*;

/**
 * @Author:Xavier
 * @Data:2019-02-22 13:34
 **/
public class ImpalaOptionTest {
    private String driverName = "com.cloudera.impala.jdbc41.Driver";
    private String url = "jdbc:impala://datanode02:21050/xavierdb";
    private Connection conn = null;
    private Statement state = null;
    private ResultSet res = null;

    @Before
    public void init() throws ClassNotFoundException, SQLException {
        Class.forName(driverName);
        conn = DriverManager.getConnection(url, "impala", "impala");
        state = conn.createStatement();
    }

    // Show databases/tables, or query the mapped table
    @Test
    public void test() throws SQLException {
        // res = state.executeQuery("show databases");
        // res = state.executeQuery("show tables");
        res = state.executeQuery("select * from students");
        while (res.next()) {
            System.out.println(res.getString(1));
        }
    }

    // Release resources
    @After
    public void destroy() throws SQLException {
        if (res != null) res.close();   // the original closed state here by mistake
        if (state != null) state.close();
        if (conn != null) conn.close();
    }
}