Testing join operations between PostgreSQL and a remote Hive.
Test Environment
CentOS 6.8
HDP 2.4 cluster, with HiveServer2 running on the host named hdp
Postgres by BigSQL (pg96)
Installation Steps
Since Postgres by BigSQL ships a precompiled hadoop_fdw, it only needs to be installed with its pgc command. Otherwise you would have to compile hadoop_fdw from source; I gave up on that because the build kept failing on missing dependencies. For compiling, refer to the build instructions.
Download the package:
$ wget http://oscg-downloads.s3.amazonaws.com/packages/postgresql-9.6.2-2-x64-bigsql.rpm
Install the rpm package with sudo:
$ sudo yum localinstall postgresql-9.6.2-2-x64-bigsql.rpm
PostgreSQL is installed to /opt/postgresql/pg96, and all of the libraries it uses live in the /opt/postgresql/pg96/lib directory, which reduces the chance of conflicts and other incompatibilities. You can add --prefix to install the package to a location you specify.
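For example, a relocated install might look like this (just a sketch: it assumes the BigSQL rpm is built as relocatable, and /opt/pg96custom is only a placeholder path):
$ sudo rpm -ivh --prefix=/opt/pg96custom postgresql-9.6.2-2-x64-bigsql.rpm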
You can also combine the previous two steps into one:
$ sudo yum install http://oscg-downloads.s3.amazonaws.com/packages/postgresql-9.6.2-2-x64-bigsql.rpm
Configure and Initialize the PostgreSQL Server
Run the following command with sudo:
$ sudo /opt/postgresql/pgc start pg96
Using the Database
Load the postgres environment variables:
$ . /opt/postgresql/pg96/pg96.env
Check the status of pg96:
$ sudo /opt/postgresql/pgc status
Connect to the database:
$ /opt/postgresql/pg96/bin/psql -U postgres -d postgres
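As a quick sanity check (a minimal sketch using the same paths as above), you can confirm from the shell that the 9.6 server is the one answering:
$ /opt/postgresql/pg96/bin/psql -U postgres -c "SELECT version();"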
Prepare the Environment Before Installing Hadoop FDW
- A Hadoop cluster whose default Hive port 10000 is reachable from other machines (HDP is used here)
- Copy the following two jar files from the Hadoop cluster to the PostgreSQL server machine; here I put them under /opt/hadoop/hive-client-lib (create the directory if it does not exist). A copy sketch follows the directory tree below.
/usr/hdp/2.4.0.0-169/
|--- hadoop/
|    `--- hadoop-common-2.7.1.2.4.0.0-169.jar
`--- hive/
     `--- lib/
          `--- hive-jdbc-1.2.1000.2.4.0.0-169-standalone.jar
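If the jars are not on the PostgreSQL server yet, something like the following can copy them over (a sketch run on the PostgreSQL host; it assumes SSH access to hdp, write permission under /opt/hadoop, and the HDP 2.4 paths shown above):
$ mkdir -p /opt/hadoop/hive-client-lib
$ scp hdp:/usr/hdp/2.4.0.0-169/hadoop/hadoop-common-2.7.1.2.4.0.0-169.jar /opt/hadoop/hive-client-lib/
$ scp hdp:/usr/hdp/2.4.0.0-169/hive/lib/hive-jdbc-1.2.1000.2.4.0.0-169-standalone.jar /opt/hadoop/hive-client-lib/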
List the jar files on the PostgreSQL server:
$ ls /opt/hadoop/hive-client-lib/
hadoop-common-2.7.1.2.4.0.0-169.jar hive-jdbc-1.2.1000.2.4.0.0-169-standalone.jar
- Test the JDBC connection to Hive
On the PostgreSQL host, create a small JDBC program HiveJdbcClient.java with the following content:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcClient {
    // HiveServer2 on host "hdp", default port 10000
    private static final String url = "jdbc:hive2://hdp:10000";
    private static final String user = "hive";
    private static final String password = "123456";
    private static final String query = "SHOW DATABASES";
    private static final String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        // Load the Hive JDBC driver
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }
        // Connect, run the test query, and print each database name
        Connection con = DriverManager.getConnection(url, user, password);
        Statement stmt = con.createStatement();
        System.out.println("Running: " + query);
        ResultSet res = stmt.executeQuery(query);
        while (res.next()) {
            System.out.println(res.getString(1));
        }
    }
}
Note: the hostname hdp and its corresponding IP must be mapped in /etc/hosts.
Compile it:
javac HiveJdbcClient.java
Run the program with the following command:
java -cp .:$(echo /opt/hadoop/hive-client-lib/*.jar | tr ' ' :) HiveJdbcClient
The last two lines of output are:
Running: SHOW DATABASES
default
- Assuming the JDK is installed at /opt/jdk1.8.0_111, run the following command:
ln -s /opt/jdk1.8.0_111/jre/lib/amd64/server/libjvm.so /opt/postgresql/pg96/lib/libjvm.so
- Add the following two lines to /etc/profile, then source it:
export LD_LIBRARY_PATH=/opt/jdk1.8.0_111/jre/lib/amd64/server:$LD_LIBRARY_PATH
export HADOOP_FDW_CLASSPATH=/opt/postgresql/pg96/lib/postgresql/Hadoop_FDW.jar:$(echo /opt/hadoop/hive-client-lib/*.jar | tr ' ' :)
Here LD_LIBRARY_PATH adds the directory containing libjvm.so to the library search path, and Hadoop_FDW.jar will appear in /opt/postgresql/pg96/lib/postgresql after hadoop_fdw is installed in a later step.
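Before restarting, a quick sanity check of the setup above can look like this (a sketch using the assumed paths):
$ ls -l /opt/postgresql/pg96/lib/libjvm.so
$ echo $LD_LIBRARY_PATH
$ echo $HADOOP_FDW_CLASSPATH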
After all of the configuration above is done, restart the pg96 service with the following commands:
cd /opt/postgresql
./pgc restart pg96
Install and Enable Hadoop-FDW
./pgc install hadoop_fdw2-pg96
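After the install, the Hadoop_FDW.jar referenced by HADOOP_FDW_CLASSPATH should be in place; a quick check (path as assumed earlier):
$ ls /opt/postgresql/pg96/lib/postgresql/Hadoop_FDW.jar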
Create the table needed for the test on the machine where Hive runs:
hive> show databases;
OK
default
hive> create table test_fdw(id int, height float);
hive> insert into test_fdw values(1, 1.68);
hive> select * from test_fdw;
OK
1 1.68
Connect to pg96 with:
/opt/postgresql/pg96/bin/psql -U postgres
CREATE EXTENSION hadoop_fdw;
CREATE SERVER hadoop_server FOREIGN DATA WRAPPER hadoop_fdw
OPTIONS (HOST 'hdp', PORT '10000');
CREATE USER MAPPING FOR PUBLIC SERVER hadoop_server;
create foreign table foreign_hive(
id int,
height float)
server hadoop_server OPTIONS (TABLE 'test_fdw');
select * from foreign_hive;
id | height
----+------------------
1 | 1.67999994754791
(1 row)
Test a Join Between Hive and Local PostgreSQL
Create a table in PostgreSQL:
create table local_postgresql (id int, name text);
insert into local_postgresql values(1, 'li'),(2, 'wang');
Test the join query:
select * from foreign_hive join local_postgresql on foreign_hive.id = local_postgresql.id;
id | height | id | name
----+------------------+----+------
1 | 1.67999994754791 | 1 | li
(1 row)
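As a further sketch (assuming the same two tables as above), the join can also be run non-interactively from the shell, e.g. with an extra filter, which is convenient for scripting the test:
$ /opt/postgresql/pg96/bin/psql -U postgres -c "SELECT l.name, f.height FROM local_postgresql l JOIN foreign_hive f ON f.id = l.id WHERE f.height > 1.5;"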