Recently my company had a requirement: query Hive based on the parameters a user selects on a page, with a data volume of up to roughly 3 million rows and a 3-second response target. I won't go into the other technologies involved here; let's talk about integrating Hive JDBC into Spring Boot. There are complete integration guides online, but following them I kept hitting one problem after another: logging-jar conflicts one moment, Jetty issues the next, all kinds of annoying exceptions. So I'm taking this opportunity to describe how I did the integration.
First, the relevant dependencies from my pom.xml:
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid-spring-boot-starter</artifactId>
    <version>1.1.16</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.1</version>
    <exclusions>
        <exclusion>
            <groupId>org.eclipse.jetty.aggregate</groupId>
            <artifactId>jetty-all</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-shims</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
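Before wiring the pool into Spring, it can help to confirm that the Hive JDBC driver alone can reach HiveServer2. Below is a minimal standalone sketch of such a check; the host, port, database and credentials are placeholders that you would replace with your own environment's values.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveConnectionSmokeTest {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver shipped with hive-jdbc
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Placeholder URL and credentials -- replace with your HiveServer2 address
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://XXX:10000/test", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println("HiveServer2 reachable, result: " + rs.getInt(1));
            }
        }
    }
}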
Connecting to the database of course requires a DataSource configuration; here I'm using Alibaba's Druid connection pool.
import javax.sql.DataSource;

import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

import com.alibaba.druid.pool.DruidDataSource;

@Configuration
@ConfigurationProperties(prefix = "hive")
public class HiveDruidConfig {

    private String url;
    private String user;
    private String password;
    private String driverClassName;
    private int initialSize;
    private int minIdle;
    private int maxWait;
    private int timeBetweenEvictionRunsMillis;
    private int minEvictableIdleTimeMillis;
    private String validationQuery;
    private boolean testWhileIdle;
    private boolean testOnBorrow;
    private boolean testOnReturn;
    private boolean poolPreparedStatements;
    private int maxPoolPreparedStatementPerConnectionSize;

    @Bean(name = "hiveDruidDataSource")
    @Qualifier("hiveDruidDataSource")
    public DruidDataSource dataSource() {
        DruidDataSource datasource = new DruidDataSource();
        datasource.setUrl(url);
        datasource.setUsername(user);
        datasource.setPassword(password);
        datasource.setDriverClassName(driverClassName);
        // pool configuration
        datasource.setInitialSize(initialSize);
        datasource.setMinIdle(minIdle);
        datasource.setMaxWait(maxWait);
        datasource.setTimeBetweenEvictionRunsMillis(timeBetweenEvictionRunsMillis);
        datasource.setMinEvictableIdleTimeMillis(minEvictableIdleTimeMillis);
        datasource.setValidationQuery(validationQuery);
        datasource.setTestWhileIdle(testWhileIdle);
        datasource.setTestOnBorrow(testOnBorrow);
        datasource.setTestOnReturn(testOnReturn);
        datasource.setPoolPreparedStatements(poolPreparedStatements);
        datasource.setMaxPoolPreparedStatementPerConnectionSize(maxPoolPreparedStatementPerConnectionSize);
        return datasource;
    }

    // Getters and setters for the fields above are omitted here;
    // note that @ConfigurationProperties binding needs the setters to be present.

    @Bean(name = "hiveDruidTemplate")
    public JdbcTemplate hiveDruidTemplate(@Qualifier("hiveDruidDataSource") DataSource dataSource) {
        return new JdbcTemplate(dataSource);
    }
}
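Since Lombok is already among the dependencies, one way to avoid hand-writing the omitted getters and setters is to let Lombok generate them. This is only a sketch of that variant, not part of the original configuration; the class name here is just for illustration.

import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Configuration;

// @Data generates the getters/setters that @ConfigurationProperties needs for binding
@Configuration
@ConfigurationProperties(prefix = "hive")
@Data
public class HiveDruidProperties {
    private String url;
    private String user;
    private String password;
    private String driverClassName;
    // ... remaining pool fields exactly as in HiveDruidConfig above
}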
With the configuration in place, we can write the repository class.
HiveRepository.java
import java.sql.PreparedStatement;
import java.sql.SQLException;

import javax.annotation.PostConstruct;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.PreparedStatementSetter;
import org.springframework.stereotype.Service;

@Service
public class HiveRepository {

    @Autowired
    @Qualifier("hiveDruidTemplate")
    private JdbcTemplate hiveJdbcTemplate;

    /**
     * <li>Description: create the bus_receiver table if it does not exist yet</li>
     */
    @PostConstruct
    public void createTable() {
        /* CREATE TABLE statement */
        StringBuilder sql = new StringBuilder("create table IF NOT EXISTS ");
        sql.append("bus_receiver ");
        sql.append("(id BIGINT comment 'primary key ID' " +
                ",name STRING comment 'name' " +
                ",address STRING comment 'address'" +
                ",en_name STRING comment 'name in pinyin'" +
                ",member_family INT comment 'family members'" +
                ",createDate DATE comment 'creation time') ");
        sql.append(" ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'"); // field delimiter
        sql.append(" STORED AS TEXTFILE"); // store as plain text
        hiveJdbcTemplate.execute(sql.toString());
    }

    /**
     * <li>Description: load a data file already in HDFS into bus_receiver</li>
     *
     * @param pathFile HDFS path of the file to load
     */
    public void loadData(String pathFile) {
        String sql = "LOAD DATA INPATH '" + pathFile + "' INTO TABLE bus_receiver";
        hiveJdbcTemplate.execute(sql);
    }

    /**
     * <li>Description: insert a single record into bus_receiver</li>
     *
     * @param busReceiverEntity the entity to insert
     */
    public void insert(BusReceiverEntity busReceiverEntity) {
        hiveJdbcTemplate.update("insert into bus_receiver(id,name,address,en_name,member_family) values(?,?,?,?,?)",
                new PreparedStatementSetter() {
                    @Override
                    public void setValues(PreparedStatement ps) throws SQLException {
                        ps.setLong(1, busReceiverEntity.getId());
                        ps.setString(2, busReceiverEntity.getName());
                        ps.setString(3, busReceiverEntity.getAddress());
                        ps.setString(4, busReceiverEntity.getEnName());
                        ps.setInt(5, busReceiverEntity.getMemberFamily());
                    }
                }
        );
    }

    /**
     * <li>Description: empty the table by overwriting it with an empty result set
     * (Hive has no plain DELETE on non-ACID tables)</li>
     */
    public void deleteAll() {
        String sql = "insert overwrite table bus_receiver select * from bus_receiver where 1=0";
        hiveJdbcTemplate.execute(sql);
    }
}
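The original requirement is about querying by user-selected parameters, which the repository above does not show. A minimal sketch of such a query method that could be added to HiveRepository, assuming a filter on the address column (the method name and filter column are just for illustration):

// Additional imports needed in HiveRepository: java.util.List, java.util.Map
public List<Map<String, Object>> findByAddress(String address) {
    // queryForList returns one Map per row, keyed by column name
    return hiveJdbcTemplate.queryForList(
            "select id, name, address, en_name, member_family from bus_receiver where address = ?",
            address);
}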
Finally, the configuration file:
hive:
  url: jdbc:hive2://XXX:10000/test
  driver-class-name: org.apache.hive.jdbc.HiveDriver
  filters: stat
  initialSize: 2
  maxWait: 60000
  timeBetweenEvictionRunsMillis: 60000
  minEvictableIdleTimeMillis: 300000
  validationQuery: SELECT 1
  testWhileIdle: true
  testOnBorrow: false
  testOnReturn: false
  poolPreparedStatements: false
  maxPoolPreparedStatementPerConnectionSize: 200
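To confirm that the whole wiring (properties, Druid pool, JdbcTemplate) works at startup, a small runner can execute a trivial query against Hive once the context is ready. This is only a verification sketch I find handy, not part of the original setup:

import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

@Configuration
public class HiveStartupCheck {

    @Bean
    public CommandLineRunner hiveSmokeTest(@Qualifier("hiveDruidTemplate") JdbcTemplate hiveDruidTemplate) {
        // Runs once after the application context has started
        return args -> System.out.println(
                "Hive connectivity check: " + hiveDruidTemplate.queryForObject("SELECT 1", Integer.class));
    }
}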
One thing to note: when starting the project, you need to put the servlet-api jar under the JAVA_HOME/jre/lib/ext directory.
The solution above is not entirely original.
