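Requirements
We are considering supporting MDX queries directly on Big Data solutions such as Hive or Impala, so this note investigates how well Mondrian supports Hive.
Environment setup
Hive environment: hive-0.10-cdh4.2.1. Client-side libraries: mondrian-3.6.0 and olap4j-1.2.0-SNAPSHOT.
Data preparation
The data set comes from an example found online and consists of four tables:
Customer - customer dimension table
Product - product dimension table
ProductType - product type dimension table
Sale - sales record table
A copy of the data is also loaded into MySQL. This makes it easy to check the test data and the correctness of the MDX, because the Hive query results can be compared against it.
Creating the tables and data in MySQL
SQL statements: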

create database hive_test;
use hive_test;

/** Customer information table */
create table Customer (
  cusId int not null,
  gender char(1) null,
  constraint PK_CUSTOMER primary key (cusId)
);

/** Product table */
create table Product (
  proId int not null,
  proTypeId int null,
  proName varchar(32) null,
  constraint PK_PRODUCT primary key (proId)
);

/** Product type table */
create table ProductType (
  proTypeId int not null,
  proTypeName varchar(32) null,
  constraint PK_PRODUCTTYPE primary key (proTypeId)
);

/** Sales record table */
create table Sale (
  saleId int not null,
  proId int null,
  cusId int null,
  unitPrice float null,
  number int null,
  constraint PK_SALE primary key (saleId)
);

insert into Customer(cusId,gender) values(1,'F');
insert into Customer(cusId,gender) values(2,'M');
insert into Customer(cusId,gender) values(3,'M');
insert into Customer(cusId,gender) values(4,'F');

insert into ProductType(proTypeId,proTypeName) values(1,'electrical');
insert into ProductType(proTypeId,proTypeName) values(2,'digital');
insert into ProductType(proTypeId,proTypeName) values(3,'furniture');

insert into Product(proId,proTypeId,proName) values(1,1,'washing machine');
insert into Product(proId,proTypeId,proName) values(2,1,'television');
insert into Product(proId,proTypeId,proName) values(3,2,'mp3');
insert into Product(proId,proTypeId,proName) values(4,2,'mp4');
insert into Product(proId,proTypeId,proName) values(5,2,'camera');
insert into Product(proId,proTypeId,proName) values(6,3,'chair');
insert into Product(proId,proTypeId,proName) values(7,3,'desk');

insert into Sale(saleId,proId,cusId,unitPrice,number) values(1,1,1,340.34,2);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(2,1,2,140.34,1);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(3,2,3,240.34,3);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(4,3,4,540.34,4);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(5,4,1,80.34,5);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(6,5,2,90.34,26);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(7,6,3,140.34,7);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(8,7,4,640.34,28);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(9,6,1,140.34,29);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(10,7,2,740.34,29);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(11,5,3,30.34,28);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(12,4,4,1240.34,72);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(13,3,1,314.34,27);
insert into Sale(saleId,proId,cusId,unitPrice,number) values(14,3,2,45.34,27);
Preparing the test data in Hive
Set up a Hive test environment in a virtual machine, using hive-0.10-cdh4.2.1. The statements are:

create database mondrian;
use mondrian;

create table Sale (saleId INT, proId INT, cusId INT, unitPrice FLOAT, number INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
create table Product (proId INT, proTypeId INT, proName STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
create table ProductType (proTypeId INT, proTypeName STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
create table Customer (cusId INT, gender STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

-- Customer file contents:
-- 1,F
-- 2,M
-- 3,M
-- 4,F
load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Customer" OVERWRITE into table Customer;

-- ProductType file contents:
-- 1,electrical
-- 2,digital
-- 3,furniture
load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/ProductType" into table ProductType;

-- Product data file contents:
-- 1,1,washing machine
-- 2,1,television
-- 3,2,mp3
-- 4,2,mp4
-- 5,2,camera
-- 6,3,chair
-- 7,3,desk
load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Product" into table Product;

-- Sale data file contents:
-- 1,1,1,340.34,2
-- 2,1,2,140.34,1
-- 3,2,3,240.34,3
-- 4,3,4,540.34,4
-- 5,4,1,80.34,5
-- 6,5,2,90.34,26
-- 7,6,3,140.34,7
-- 8,7,4,640.34,28
-- 9,6,1,140.34,29
-- 10,7,2,740.34,29
-- 11,5,3,30.34,28
-- 12,4,4,1240.34,72
-- 13,3,1,314.34,27
-- 14,3,2,45.34,27
load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Sale" into table Sale;
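The LOAD DATA statements above expect four comma-delimited text files, one row per line. The sketch below writes them from Java, so the file layout matches the FIELDS TERMINATED BY "," clause exactly. HiveDataFiles and writeAll are hypothetical names. The output directory is passed as an argument rather than assumed. The rows are copied from the file contents listed above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class HiveDataFiles {
    // Rows copied from the data files listed above; fields are comma-separated
    // to match ROW FORMAT DELIMITED FIELDS TERMINATED BY ",".
    static final List<String> CUSTOMER = Arrays.asList("1,F", "2,M", "3,M", "4,F");
    static final List<String> PRODUCT_TYPE = Arrays.asList("1,electrical", "2,digital", "3,furniture");
    static final List<String> PRODUCT = Arrays.asList(
            "1,1,washing machine", "2,1,television", "3,2,mp3", "4,2,mp4",
            "5,2,camera", "6,3,chair", "7,3,desk");
    static final List<String> SALE = Arrays.asList(
            "1,1,1,340.34,2", "2,1,2,140.34,1", "3,2,3,240.34,3", "4,3,4,540.34,4",
            "5,4,1,80.34,5", "6,5,2,90.34,26", "7,6,3,140.34,7", "8,7,4,640.34,28",
            "9,6,1,140.34,29", "10,7,2,740.34,29", "11,5,3,30.34,28", "12,4,4,1240.34,72",
            "13,3,1,314.34,27", "14,3,2,45.34,27");

    // Writes one file per table, named as in the LOAD DATA paths.
    static void writeAll(Path dir) throws IOException {
        Files.createDirectories(dir);
        write(dir.resolve("Customer"), CUSTOMER);
        write(dir.resolve("ProductType"), PRODUCT_TYPE);
        write(dir.resolve("Product"), PRODUCT);
        write(dir.resolve("Sale"), SALE);
    }

    // Uses "\n" line endings regardless of the platform the program runs on.
    private static void write(Path file, List<String> rows) throws IOException {
        String content = String.join("\n", rows) + "\n";
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HiveDataFiles <output-dir>");
            return;
        }
        writeAll(Paths.get(args[0]));
    }
}
```

To match the LOAD DATA statements as written, pass /home/hzwangxx/cdh4/hive/myTmp as the output directory.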
Metadata definition
The Cube, Measure, and other metadata are defined as follows:
<Schema name="hello">
  <Cube name="Sales">
    <!-- Fact table -->
    <Table name="Sale"/>
    <!-- Customer dimension -->
    <Dimension name="cusGender" foreignKey="cusId">
      <Hierarchy hasAll="true" allMemberName="allGender" primaryKey="cusId">
        <Table name="Customer"/>
        <Level name="gender" column="gender"/>
      </Hierarchy>
    </Dimension>
    <!-- Product type dimension -->
    <Dimension name="proType" foreignKey="proId">
      <Hierarchy hasAll="true" allMemberName="allPro" primaryKey="proId" primaryKeyTable="Product">
        <join leftKey="proTypeId" rightKey="proTypeId">
          <Table name="Product"/>
          <Table name="ProductType"/>
        </join>
        <Level name="proTypeId" column="proTypeId" nameColumn="proTypeName" uniqueMembers="true" table="ProductType"/>
        <Level name="proId" column="proId" nameColumn="proName" uniqueMembers="true" table="Product"/>
      </Hierarchy>
    </Dimension>
    <Measure name="numb" column="number" aggregator="sum" datatype="Numeric"/>
    <Measure name="totalSale" aggregator="sum" formatString="$ #,##0.00">
      <!-- Column whose value is unitPrice*number -->
      <MeasureExpression>
        <SQL dialect="generic">unitPrice*number</SQL>
      </MeasureExpression>
    </Measure>
    <CalculatedMember name="averPri" dimension="Measures">
      <Formula>[Measures].[totalSale] / [Measures].[numb]</Formula>
      <CalculatedMemberProperty name="FORMAT_STRING" value="$ #,##0.00"/>
    </CalculatedMember>
  </Cube>
</Schema>
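A mistake in the catalog typically only surfaces when Mondrian loads it at connect time. It can therefore help to parse the schema file up front and list the measures that MDX is allowed to reference. The sketch below does this with the JDK's DOM parser. SchemaMembers and measureNames are hypothetical names. The sketch only looks at Measure elements, plus CalculatedMember elements whose dimension is Measures, which covers everything this schema defines.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SchemaMembers {
    // Parses a Mondrian schema (throwing on malformed XML) and returns the
    // qualified names of its stored and calculated measures, in document order.
    static List<String> measureNames(String schemaXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        List<String> names = new ArrayList<>();
        NodeList measures = doc.getElementsByTagName("Measure");
        for (int i = 0; i < measures.getLength(); i++) {
            Element m = (Element) measures.item(i);
            names.add("[Measures].[" + m.getAttribute("name") + "]");
        }
        NodeList calcs = doc.getElementsByTagName("CalculatedMember");
        for (int i = 0; i < calcs.getLength(); i++) {
            Element c = (Element) calcs.item(i);
            if ("Measures".equals(c.getAttribute("dimension"))) {
                names.add("[Measures].[" + c.getAttribute("name") + "]");
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java SchemaMembers <schema.xml>");
            return;
        }
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(measureNames(xml));
    }
}
```

For the schema above, the sketch should return [Measures].[numb], [Measures].[totalSale] and [Measures].[averPri], which are the three measures used in the test MDX.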
Test MDX
1. Query the total number of units sold, the average price, and the total sales amount across all product types
"select " +
"{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns," +
"{([proType].[allPro],[cusGender].[allGender])} " +
"on rows " +
"from [Sales]"
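Query 1 selects the All member of both dimensions, so its three cells should equal plain aggregates over every Sale row. That gives a reference for checking both the Hive and the MySQL runs. The sketch below recomputes these aggregates in Java with BigDecimal, using the rows inserted above. ExpectedTotals is a hypothetical name. Rounding averPri to two decimals with HALF_UP is an assumption meant to line up with the $ #,##0.00 format string.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ExpectedTotals {
    // {unitPrice, number} for saleId 1..14, copied from the Sale inserts above.
    static final String[][] SALES = {
        {"340.34", "2"}, {"140.34", "1"}, {"240.34", "3"}, {"540.34", "4"},
        {"80.34", "5"}, {"90.34", "26"}, {"140.34", "7"}, {"640.34", "28"},
        {"140.34", "29"}, {"740.34", "29"}, {"30.34", "28"}, {"1240.34", "72"},
        {"314.34", "27"}, {"45.34", "27"}
    };

    // [Measures].[numb] = sum(number)
    static int numb() {
        int total = 0;
        for (String[] row : SALES) {
            total += Integer.parseInt(row[1]);
        }
        return total;
    }

    // [Measures].[totalSale] = sum(unitPrice * number), in exact decimal arithmetic
    static BigDecimal totalSale() {
        BigDecimal total = BigDecimal.ZERO;
        for (String[] row : SALES) {
            total = total.add(new BigDecimal(row[0]).multiply(new BigDecimal(row[1])));
        }
        return total;
    }

    // [Measures].[averPri] = totalSale / numb, rounded to two decimals (assumed HALF_UP)
    static BigDecimal averPri() {
        return totalSale().divide(BigDecimal.valueOf(numb()), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println("numb=" + numb() + " totalSale=" + totalSale() + " averPri=" + averPri());
    }
}
```

Running it prints numb=288 totalSale=150770.92 averPri=523.51. unitPrice is a FLOAT column in both MySQL and Hive, so the sums the databases produce may differ slightly from these exact decimal values in the low-order digits.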
Hive support
Ways to obtain a Connection
There are two ways to obtain a Connection:
Using Mondrian's own DriverManager to obtain a Connection instance
Mondrian's built-in API:
import java.io.PrintWriter;

import mondrian.olap.Connection;
import mondrian.olap.DriverManager;
import mondrian.olap.Query;
import mondrian.olap.Result;

// Connection, DriverManager, Query and Result here are all Mondrian API types (package mondrian.olap)
Connection connection = DriverManager.getConnection(
    "Provider=mondrian;" +
    "Jdbc=jdbc:hive2://node02:10000/mondrian;" +
    "JdbcUser=;JdbcPassword=;" +
    "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" +
    "JdbcDrivers=org.apache.hive.jdbc.HiveDriver",
    null);
Query query = connection.parseQuery(
    "select \n" +
    "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns,\n" +
    "{([proType].[allPro],[cusGender].[allGender])} \n" +
    "on rows\n" +
    "from [Sales]\n");
@SuppressWarnings("deprecation")
Result result = connection.execute(query);
PrintWriter pw = new PrintWriter(System.out);
result.print(pw);
pw.flush();
connection.close();
To connect to MySQL instead, only the connect string passed to getConnection needs to change:
Connection connection = DriverManager.getConnection(
    "Provider=mondrian;" +
    "Jdbc=jdbc:mysql://localhost:3306/hive_test;" +
    "JdbcUser=root;" +
    "JdbcPassword=123;" +
    "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" +
    "JdbcDrivers=com.mysql.jdbc.Driver",
    null);
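The Hive and MySQL connect strings differ only in the Jdbc URL, the credentials and JdbcDrivers. Building the string from its parts makes that switch explicit. In the sketch below, MondrianConnectString and build are hypothetical names. Values containing ';' are rejected rather than quoted, because the sketch does not implement the connect string's quoting rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MondrianConnectString {
    // Joins key=value pairs with ';' in the same order as the examples above.
    static String build(String jdbcUrl, String user, String password,
                        String catalog, String driverClass) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Provider", "mondrian");
        props.put("Jdbc", jdbcUrl);
        props.put("JdbcUser", user);
        props.put("JdbcPassword", password);
        props.put("Catalog", catalog);
        props.put("JdbcDrivers", driverClass);
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            // A ';' inside a value would split it into two properties.
            if (e.getValue().indexOf(';') >= 0) {
                throw new IllegalArgumentException(e.getKey() + " must not contain ';'");
            }
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String catalog = "/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml";
        System.out.println(build("jdbc:hive2://node02:10000/mondrian", "", "", catalog,
                "org.apache.hive.jdbc.HiveDriver"));
        System.out.println(build("jdbc:mysql://localhost:3306/hive_test", "root", "123", catalog,
                "com.mysql.jdbc.Driver"));
    }
}
```

The olap4j route described later uses the same key=value list, prefixed with jdbc:mondrian: and without Provider=mondrian.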
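The test against MySQL ran without problems, but the same API failed against Hive. After downloading the source code, I found that Mondrian obtains a connection as follows. It first takes a Connection instance from a connection pool. If none is available, it creates one through a Factory and puts it into the pool. When Mondrian creates that Factory, it sets two properties: autoCommit and readOnly. RDBMS drivers handle these without trouble. However, HiveConnection, the Connection implementation in Hive's JDBC driver, implements both setters oddly. setReadOnly always throws an exception, and setAutoCommit throws whenever autocommit is being enabled: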
public void setReadOnly(boolean readOnly) throws SQLException {
  // TODO Auto-generated method stub
  throw new SQLException("Method not supported");
}

public void setAutoCommit(boolean autoCommit) throws SQLException {
  if (autoCommit) {
    throw new SQLException("enabling autocommit is not supported");
  }
}
After commenting out these two throw statements and rebuilding the jar, the MDX runs to completion.
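Patching and rebuilding the Hive JDBC jar works. An alternative that leaves the jar untouched is to wrap the connection in a java.lang.reflect.Proxy that turns the two failing setters into no-ops. The sketch below only demonstrates this wrapping technique. LenientConnection and fakeHiveConnection are hypothetical names; fakeHiveConnection is a stand-in that mimics the two HiveConnection methods quoted above. How to get Mondrian's connection pool to use the wrapped connection (for example through a small delegating java.sql.Driver) is an assumption that has not been tried here.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;

public class LenientConnection {
    // Wraps a JDBC Connection so that SQLExceptions thrown by setReadOnly and
    // setAutoCommit are swallowed; every other call is delegated unchanged.
    static Connection wrap(Connection target) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                Throwable cause = e.getCause();
                if (cause instanceof SQLException && isIgnoredSetter(method)) {
                    return null; // both setters return void
                }
                throw cause; // rethrow the real exception, not the reflection wrapper
            }
        };
        return (Connection) Proxy.newProxyInstance(
                LenientConnection.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                handler);
    }

    private static boolean isIgnoredSetter(Method method) {
        String name = method.getName();
        return name.equals("setReadOnly") || name.equals("setAutoCommit");
    }

    // Hypothetical stand-in that behaves like the two HiveConnection setters quoted above.
    static Connection fakeHiveConnection() {
        return (Connection) Proxy.newProxyInstance(
                LenientConnection.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    switch (method.getName()) {
                        case "setReadOnly":
                            throw new SQLException("Method not supported");
                        case "setAutoCommit":
                            if ((Boolean) args[0]) {
                                throw new SQLException("enabling autocommit is not supported");
                            }
                            return null;
                        case "isClosed":
                            return false;
                        default:
                            throw new UnsupportedOperationException(method.getName());
                    }
                });
    }

    public static void main(String[] args) throws SQLException {
        Connection conn = wrap(fakeHiveConnection());
        conn.setAutoCommit(true); // would throw on the unwrapped connection
        conn.setReadOnly(true);   // would throw on the unwrapped connection
        System.out.println("closed=" + conn.isClosed());
    }
}
```

Swallowing these errors is harmless in this setting because Hive of this version has no transactions to commit. Any other SQLException still reaches the caller.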
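Using the JDK's own DriverManager to obtain a Connection instance
Alternatively, obtain a Connection from the JDK's own DriverManager, unwrap it into an olap4j OlapConnection, and then execute the MDX through that. A connection example: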
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;

import org.olap4j.CellSet;
import org.olap4j.OlapConnection;
import org.olap4j.OlapStatement;
import org.olap4j.layout.RectangularCellSetFormatter;

// Register Mondrian's olap4j driver with java.sql.DriverManager
Class.forName("mondrian.olap4j.MondrianOlap4jDriver");
Connection nativeConn = DriverManager.getConnection(
    "jdbc:mondrian:Jdbc=jdbc:hive2://node02:10000/mondrian;" +
    "JdbcUser=;" +
    "JdbcPassword=;" +
    "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" +
    "JdbcDrivers=org.apache.hive.jdbc.HiveDriver");
OlapConnection olapConn = nativeConn.unwrap(OlapConnection.class);
if (olapConn == null) {
  throw new IllegalStateException("Connection is null");
}
OlapStatement statement = olapConn.createStatement();
CellSet cellSet = statement.executeOlapQuery(
    "select " +
    "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns," +
    "{([proType].[allPro],[cusGender].[allGender])} " +
    "on rows " +
    "from [Sales]");
// Formatter.
RectangularCellSetFormatter formatter = new RectangularCellSetFormatter(false);
// Print out.
PrintWriter writer = new PrintWriter(System.out);
formatter.format(cellSet, writer);
writer.flush();
statement.close();
olapConn.close();
nativeConn.close();
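Specifying the database
Like an RDBMS, Hive has the concept of a database. The connection string for Hive's ordinary Java API can name a database, but the session does not actually default to it. Instead, it uses whichever database the current client or thread used last (note: not necessarily "default"). So with a Hive client you generally have to run "use database" first. However, neither OlapConnection nor Mondrian's Connection supports a "use database" operation. The temporary workaround is to run "use database" through the ordinary Java API before each MDX query, switching to the database that needs to be queried.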
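The workaround can be sketched with plain JDBC. HiveUseDatabase and useDatabase are hypothetical names. The sketch assumes, as described above, that a "use" issued this way determines which database the following Mondrian or olap4j query sees. The database name is checked against \w+ before it is concatenated into the statement, because a name cannot be passed as a JDBC bind parameter.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.regex.Pattern;

public class HiveUseDatabase {
    // Only plain identifiers are accepted, since the name is concatenated
    // into the statement text.
    private static final Pattern DB_NAME = Pattern.compile("\\w+");

    // Issues "use <db>" over an ordinary Hive JDBC connection; per the
    // workaround above, this is done before each MDX query.
    static void useDatabase(Connection hiveConn, String db) throws SQLException {
        if (db == null || !DB_NAME.matcher(db).matches()) {
            throw new IllegalArgumentException("invalid database name: " + db);
        }
        try (Statement stmt = hiveConn.createStatement()) {
            stmt.execute("use " + db);
        }
    }
}
```

For example, call useDatabase(conn, "mondrian") on a connection opened with DriverManager.getConnection("jdbc:hive2://node02:10000/mondrian", "", "") before running the MDX query.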