Pentaho Mondrian Support for Hive


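Requirements

We are considering supporting MDX queries directly on top of Big Data solutions such as Hive or Impala, so this post investigates how well Mondrian supports Hive.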

Environment Setup

Hive environment: hive-0.10-cdh4.2.1. Libraries used by the client program: mondrian-3.6.0 and olap4j-1.2.0-SNAPSHOT.

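Data Preparation

The data comes from a data set found online and consists of four tables:

Customer - customer dimension table
Product - product dimension table
ProductType - product type dimension table
Sale - sales record table

To make it easier to verify the test data and the MDX results, a copy of the data is also loaded into MySQL so that its results can be compared with the Hive query results.

Creating the Tables and Data in MySQL

The SQL statements: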

create database hive_test;

use hive_test;

/** Customer table */
create table Customer (
       cusId int not null,
       gender char(1) null,
       constraint PK_CUSTOMER primary key(cusId)
);

/** Product table */
create table Product (
       proId int not null,
       proTypeId int null,
       proName varchar(32) null,
       constraint PK_PRODUCT primary key(proId)
);

/** Product type table */
create table ProductType (
       proTypeId int not null,
       proTypeName varchar(32) null,
       constraint PK_PRODUCTTYPE primary key (proTypeId)
);

/** Sales record table */
create table Sale (
       saleId int not null,
       proId int null,
       cusId int null,
       unitPrice float null,
       number int null,
       constraint PK_SALE primary key(saleId)
);

insert into Customer(cusId,gender) values(1,'F');
insert into Customer(cusId,gender) values(2,'M');
insert into Customer(cusId,gender) values(3,'M');
insert into Customer(cusId,gender) values(4,'F');


insert into ProductType(proTypeId,proTypeName) values(1,'electrical');
insert into ProductType(proTypeId,proTypeName) values(2,'digital');
insert into ProductType(proTypeId,proTypeName) values(3,'furniture');

insert into Product(proId,proTypeId,proName) values(1,1,'washing machine');
insert into Product(proId,proTypeId,proName) values(2,1,'television');
insert into Product(proId,proTypeId,proName) values(3,2,'mp3');
insert into Product(proId,proTypeId,proName) values(4,2,'mp4');
insert into Product(proId,proTypeId,proName) values(5,2,'camera');
insert into Product(proId,proTypeId,proName) values(6,3,'chair');
insert into Product(proId,proTypeId,proName) values(7,3,'desk');
insert into Sale(saleId,proId,cusId,unitPrice,number) values(1,1,1,340.34,2);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(2,1,2,140.34,1);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(3,2,3,240.34,3);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(4,3,4,540.34,4);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(5,4,1,80.34,5);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(6,5,2,90.34,26);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(7,6,3,140.34,7);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(8,7,4,640.34,28);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(9,6,1,140.34,29);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(10,7,2,740.34,29);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(11,5,3,30.34,28);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(12,4,4,1240.34,72);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(13,3,1,314.34,27);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(14,3,2,45.34,27);

Preparing Test Data in Hive

A Hive test environment (hive-0.10-cdh4.2.1) was set up in a virtual machine. The statements:

create database mondrian;
use mondrian;
create table Sale (saleId INT, proId INT, cusId INT, unitPrice FLOAT, number INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
create table Product (proId INT, proTypeId INT, proName STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
create table ProductType (proTypeId INT, proTypeName STRING)   ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
create table Customer (cusId INT, gender STRING)  ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

# Contents of the Customer file
1,F
2,M
3,M
4,F
load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Customer" OVERWRITE into table Customer;

# Contents of the ProductType file
1,electrical
2,digital
3,furniture
load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/ProductType" into table ProductType;

# Contents of the Product data file
1,1,washing machine
2,1,television
3,2,mp3
4,2,mp4
5,2,camera
6,3,chair
7,3,desk
load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Product" into table Product;

# Contents of the Sale data file
1,1,1,340.34,2
2,1,2,140.34,1
3,2,3,240.34,3
4,3,4,540.34,4
5,4,1,80.34,5
6,5,2,90.34,26
7,6,3,140.34,7
8,7,4,640.34,28
9,6,1,140.34,29
10,7,2,740.34,29
11,5,3,30.34,28
12,4,4,1240.34,72
13,3,1,314.34,27
14,3,2,45.34,27
load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Sale" into table Sale;

 

Metadata Definition

The metadata definitions for the Cube, the Measures, and so on are as follows:

<Schema name="hello">
<Cube name="Sales">
<!--  Fact table  -->
<Table name="Sale"/>
<!--  Customer dimension  -->
<Dimension name="cusGender" foreignKey="cusId">
<Hierarchy hasAll="true" allMemberName="allGender" primaryKey="cusId">
<Table name="Customer"/>
<Level name="gender" column="gender"/>
</Hierarchy>
</Dimension>
<!--  Product type dimension  -->
<Dimension name="proType" foreignKey="proId">
<Hierarchy hasAll="true" allMemberName="allPro" primaryKey="proId" primaryKeyTable="Product">
<Join leftKey="proTypeId" rightKey="proTypeId">
<Table name="Product"/>
<Table name="ProductType"/>
</Join>
<Level name="proTypeId" column="proTypeId" nameColumn="proTypeName" uniqueMembers="true" table="ProductType"/>
<Level name="proId" column="proId" nameColumn="proName" uniqueMembers="true" table="Product"/>
</Hierarchy>
</Dimension>
<Measure name="numb" column="number" aggregator="sum" datatype="Numeric"/>
<Measure name="totalSale" aggregator="sum" formatString="$ #,##0.00">
<!--  Column whose value is unitPrice*number  -->
<MeasureExpression>
<SQL dialect="generic">unitPrice*number</SQL>
</MeasureExpression>
</Measure>
<CalculatedMember name="averPri" dimension="Measures">
<Formula>[Measures].[totalSale] / [Measures].[numb]</Formula>
<CalculatedMemberProperty name="FORMAT_STRING" value="$ #,##0.00"/>
</CalculatedMember>
</Cube>
</Schema>

Test MDX

1. Query the total number of units sold, the average price, and the total sales amount across all product types

"select " + "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns," + "{([proType].[allPro],[cusGender].[allGender])} " + "on rows " + "from [Sales]" 
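As a cross-check that is independent of both MySQL and Hive, the values this query should return can be computed directly from the 14 sample Sale rows listed earlier. The sketch below is only an illustration: the class name ExpectedMeasures is hypothetical, and it uses exact decimal arithmetic, whereas unitPrice is stored in a FLOAT column in both databases, so the figures returned by the databases may differ in the last digits.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper: recomputes numb, totalSale and averPri from the sample Sale rows. */
public final class ExpectedMeasures {
    private ExpectedMeasures() {}

    // {unitPrice, number} for the 14 sample Sale rows, in saleId order.
    private static final String[][] SALES = {
        {"340.34", "2"},  {"140.34", "1"},  {"240.34", "3"},   {"540.34", "4"},
        {"80.34", "5"},   {"90.34", "26"},  {"140.34", "7"},   {"640.34", "28"},
        {"140.34", "29"}, {"740.34", "29"}, {"30.34", "28"},   {"1240.34", "72"},
        {"314.34", "27"}, {"45.34", "27"}
    };

    /** Sum of the number column, matching the "numb" measure. */
    public static int numb() {
        int total = 0;
        for (String[] row : SALES) {
            total += Integer.parseInt(row[1]);
        }
        return total;
    }

    /** Sum of unitPrice*number, matching the "totalSale" measure expression. */
    public static BigDecimal totalSale() {
        BigDecimal total = BigDecimal.ZERO;
        for (String[] row : SALES) {
            total = total.add(new BigDecimal(row[0]).multiply(new BigDecimal(row[1])));
        }
        return total;
    }

    /** totalSale / numb, matching the "averPri" calculated member, rounded to two decimals. */
    public static BigDecimal averPri() {
        return totalSale().divide(BigDecimal.valueOf(numb()), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println("numb      = " + numb());
        System.out.println("totalSale = " + totalSale());
        System.out.println("averPri   = " + averPri());
    }
}
```

Comparing these numbers with the MySQL result and then with the Hive result makes it easier to tell whether a discrepancy comes from the data, the schema definition, or the Hive connection.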

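Hive Support

Ways to Establish a Connection

There are two ways to establish a connection:

Using Mondrian's own DriverManager to obtain a Connection instance

Mondrian's built-in API

// Connection, DriverManager, Query, and Result here are all types from Mondrian's own API (package mondrian.olap), not java.sql.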
        Connection connection = DriverManager.getConnection(
                "Provider=mondrian;" +
                "Jdbc=jdbc:hive2://node02:10000/mondrian;" +
                "JdbcUser=;JdbcPassword=;" +
                "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" +
                "JdbcDrivers=org.apache.hive.jdbc.HiveDriver", null);

        Query query = connection.parseQuery(
                "select \n" +
                        "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns,\n" +
                        "{([proType].[allPro],[cusGender].[allGender])} \n" +
                        "on rows\n" +
                        "from [Sales]\n");

        @SuppressWarnings("deprecation")
        Result result = connection.execute(query);
        PrintWriter pw = new PrintWriter(System.out);
        result.print(pw);
        pw.flush();

  To connect to MySQL instead, just replace the connectString passed to getConnection with the following:

Connection connection =  DriverManager.getConnection(
                "Provider=mondrian;" +
                        "Jdbc=jdbc:mysql://localhost:3306/hive_test; JdbcUser=root;" +
                        "JdbcPassword=123;" +
                        "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" +
                        "JdbcDrivers=com.mysql.jdbc.Driver", null);
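The Hive and MySQL connect strings above differ only in the JDBC URL, the credentials, and the driver class. A small helper along the following lines could keep the two variants consistent. This is a sketch only: the class MondrianConnectString and its build method are hypothetical names, and it does no escaping, so it assumes none of the values contains a semicolon.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: assembles a Mondrian connect string from its parts. */
public final class MondrianConnectString {
    private MondrianConnectString() {}

    public static String build(String jdbcUrl, String user, String password,
                               String catalog, String driverClass) {
        // LinkedHashMap keeps the properties in the same order as the examples above.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Provider", "mondrian");
        props.put("Jdbc", jdbcUrl);
        props.put("JdbcUser", user);
        props.put("JdbcPassword", password);
        props.put("Catalog", catalog);
        props.put("JdbcDrivers", driverClass);

        // Join as key=value pairs separated by semicolons; values are not escaped.
        StringJoiner joiner = new StringJoiner(";");
        props.forEach((key, value) -> joiner.add(key + "=" + value));
        return joiner.toString();
    }
}
```

The returned string would then be passed as the first argument of getConnection in place of the hand-concatenated literal.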

 

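  Connecting to MySQL during testing worked without problems, but connecting to Hive through the same API did not. Reading the source code showed that the process works like this: Mondrian first tries to take a Connection instance from the connection pool and, if none is available, creates one through a Factory and puts it into the pool. When Mondrian creates the Factory it sets two properties, autoCommit and readOnly. RDBMS drivers handle these without trouble, but the setters for these two properties in HiveConnection, the class provided by Hive's JDBC driver, are implemented rather oddly. setReadOnly always throws, and setAutoCommit throws when asked to enable autocommit: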

  public void setReadOnly(boolean readOnly) throws SQLException {
    // TODO Auto-generated method stub
    throw new SQLException("Method not supported");
  }
  public void setAutoCommit(boolean autoCommit) throws SQLException {
    if (autoCommit) {
      throw new SQLException("enabling autocommit is not supported");
    }
  }

 

After commenting out the two lines that throw these exceptions and rebuilding the jar, the MDX query runs to completion.
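A possible alternative to patching the Hive jar is to wrap the JDBC connection in a dynamic proxy that swallows the exceptions from these two setters and delegates everything else unchanged. The following is a minimal sketch using only the JDK. The class name LenientConnections is hypothetical, and this is not what the original test did. How such a wrapper would be handed to Mondrian (for example through a DataSource) is not shown and would need to be checked against the Mondrian version in use.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: makes setReadOnly/setAutoCommit failures non-fatal. */
public final class LenientConnections {
    private LenientConnections() {}

    // Setters whose SQLException is ignored instead of propagated.
    private static final List<String> TOLERATED = Arrays.asList("setReadOnly", "setAutoCommit");

    public static Connection lenient(Connection target) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                // Delegate every call to the real connection.
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                Throwable cause = e.getCause();
                if (TOLERATED.contains(method.getName()) && cause instanceof SQLException) {
                    // Both setters return void, so null is a valid proxy result.
                    return null;
                }
                // Any other failure is rethrown as the original exception.
                throw cause;
            }
        };
        return (Connection) Proxy.newProxyInstance(
                LenientConnections.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                handler);
    }
}
```

The trade-off is that a silently ignored setReadOnly or setAutoCommit call leaves the connection in whatever mode the driver defaults to, which is acceptable for read-only MDX queries but should not be assumed safe in general.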

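Using the JDK's native DriverManager to obtain a Connection instance

  You can also obtain a Connection from the JDK's native DriverManager, unwrap it into an olap4j OlapConnection, and then execute the MDX through that. A connection example: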

 Class.forName("mondrian.olap4j.MondrianOlap4jDriver");
 Connection nativeConn = DriverManager.getConnection("jdbc:mondrian:Jdbc=jdbc:hive2://node02:10000/mondrian; JdbcUser=;" +
         "JdbcPassword=;" +
         "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" +
         "JdbcDrivers=org.apache.hive.jdbc.HiveDriver");

 OlapConnection olapConn = nativeConn.unwrap(OlapConnection.class);

 if (olapConn == null) {
     throw new IllegalStateException("Connection is null");
 }
 OlapStatement statement = olapConn.createStatement();
 CellSet cellSet = statement.executeOlapQuery("select " +
         "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns," +
         "{([proType].[allPro],[cusGender].[allGender])} " +
         "on rows " +
         "from [Sales]") ;
 // Format the result as a rectangular text grid.
 RectangularCellSetFormatter formatter =
         new RectangularCellSetFormatter(false);

 // Print out.
 PrintWriter writer = new PrintWriter(System.out);
 formatter.format(cellSet, writer);
 writer.flush();
 statement.close();
 olapConn.close();
 nativeConn.close();

 

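Specifying the Database

  Like an RDBMS, Hive has the concept of a database. With Hive's ordinary Java API, even if a database is specified in the connection string, the database actually used is not the one you specified but the one last used by the current client or thread (note: not default). A Hive client therefore generally has to run use database first. However, neither OlapConnection nor the Connection provided by Mondrian supports the "use database" operation. The temporary workaround is to run use database through the ordinary Java API before each MDX query, switching to the database that needs to be queried.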
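The workaround described above could be sketched with plain JDBC as follows. The class HiveDatabaseSwitch and its methods are hypothetical names, and the check on the database name is an added assumption intended to keep arbitrary text out of the statement. Whether the switch carries over to the connection Mondrian later uses depends on how the Hive server tracks the current database, as described above.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper: issues "use <database>" over a plain JDBC connection. */
public final class HiveDatabaseSwitch {
    private HiveDatabaseSwitch() {}

    /** Builds the statement text, accepting only simple identifiers (an assumption). */
    public static String useStatement(String database) {
        if (database == null || !database.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("Unexpected database name: " + database);
        }
        return "use " + database;
    }

    /** Runs the statement on an already opened JDBC connection to Hive. */
    public static void useDatabase(Connection hiveConnection, String database) throws SQLException {
        try (Statement statement = hiveConnection.createStatement()) {
            statement.execute(useStatement(database));
        }
    }
}
```

In the test setup from this post, the call would be made with a connection opened on jdbc:hive2://node02:10000/mondrian and the database name mondrian, immediately before running the MDX query.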

