HBase aggregation operations

Reposted article. 2017-04-30 11:33

HBase itself provides aggregation methods, so the aggregation can be carried out on the server side.

The CoprocessorProtocol mechanism in HBase

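The principle behind CoprocessorProtocol is fairly simple and resembles a MapReduce framework. The client splits a scan into requests aimed at multiple regions and sends them to those regions in parallel. The client then performs a reduce step over the partial results to obtain the final result.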
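The scatter and reduce steps described above can be sketched in plain Java. This is only a simulation under stated assumptions: no HBase classes are used, each "region" is modelled as a list of long values, and the names ScatterGatherSketch, regionMax and maxAcrossRegions are hypothetical and do not come from HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java simulation of the scatter/reduce pattern; not HBase code.
class ScatterGatherSketch {

    // Hypothetical stand-in for the per-region work: returns the local
    // maximum, or null when the region holds no values.
    public static Long regionMax(List<Long> regionValues) {
        Long max = null;
        for (Long v : regionValues) {
            if (max == null || v > max) {
                max = v;
            }
        }
        return max;
    }

    // Client side: submit one task per region in parallel, then reduce
    // the partial maxima into a single result.
    public static Long maxAcrossRegions(List<List<Long>> regions) {
        if (regions.isEmpty()) {
            return null;
        }
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (List<Long> region : regions) {
                tasks.add(() -> regionMax(region));
            }
            Long max = null;
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                Long value = partial.get();
                // Regions without data contribute null and are skipped.
                if (value != null && (max == null || value > max)) {
                    max = value;
                }
            }
            return max;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("region task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Long>> regions = Arrays.asList(
                Arrays.asList(20L, 35L),
                Arrays.asList(100L),
                new ArrayList<Long>());
        System.out.println(maxAcrossRegions(regions)); // prints 100
    }
}
```

Only one partial value per region travels back to the client, which is the point of doing the aggregation on the server side.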

First, an example: HBase's AggregationClient can compute simple statistics over a single column.

Java code

@Test
public void testAggregationClient() throws Throwable {
    // Interprets each cell value as a long.
    LongColumnInterpreter columnInterpreter = new LongColumnInterpreter();
    AggregationClient aggregationClient = new AggregationClient(
            CommonConfig.getConfiguration());

    // The scan selects exactly one column.
    Scan scan = new Scan();
    scan.addColumn(ColumnFamilyName, QName1);

    Long max = aggregationClient.max(TableNameBytes, columnInterpreter,
            scan);
    Assert.assertTrue(max.longValue() == 100);

    Long min = aggregationClient.min(TableNameBytes, columnInterpreter,
            scan);
    Assert.assertTrue(min.longValue() == 20);

    Long sum = aggregationClient.sum(TableNameBytes, columnInterpreter,
            scan);
    Assert.assertTrue(sum.longValue() == 120);

    Long count = aggregationClient.rowCount(TableNameBytes,
            columnInterpreter, scan);
    Assert.assertTrue(count.longValue() == 4);
}

Now look at the HBase source code, in AggregateImplementation.

Java code

@Override
public <T, S> T getMax(ColumnInterpreter<T, S> ci, Scan scan)
        throws IOException {
    T temp;
    T max = null;
    InternalScanner scanner = ((RegionCoprocessorEnvironment) getEnvironment())
            .getRegion().getScanner(scan);
    List<KeyValue> results = new ArrayList<KeyValue>();
    byte[] colFamily = scan.getFamilies()[0];
    byte[] qualifier = scan.getFamilyMap().get(colFamily).pollFirst();
    // qualifier can be null.
    try {
        boolean hasMoreRows = false;
        do {
            hasMoreRows = scanner.next(results);
            for (KeyValue kv : results) {
                temp = ci.getValue(colFamily, qualifier, kv);
                max = (max == null || (temp != null && ci.compare(temp, max) > 0)) ? temp : max;
            }
            results.clear();
        } while (hasMoreRows);
    } finally {
        scanner.close();
    }
    log.info("Maximum from this region is "
            + ((RegionCoprocessorEnvironment) getEnvironment()).getRegion()
                    .getRegionNameAsString() + ": " + max);
    return max;
}

Here, because of these two lines:

Java code

byte[] colFamily = scan.getFamilies()[0];
byte[] qualifier = scan.getFamilyMap().get(colFamily).pollFirst();

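the aggregate functions that ship with HBase can only compute statistics over a single column: only the first column family of the scan, and the first qualifier in that family, are read.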
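The effect of those two lines can be illustrated with a small sketch. It assumes a simplified stand-in for the scan's family map, using String keys in place of byte[], and the names SingleColumnSketch and selectColumn are hypothetical. It shows that however many columns are added, only one family and one qualifier are picked.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Simplified model of the column selection; Strings replace byte[].
class SingleColumnSketch {

    // Mirrors the two lines above: take the first family, then poll
    // the first qualifier of that family. Everything else is ignored.
    public static String[] selectColumn(TreeMap<String, NavigableSet<String>> familyMap) {
        String family = familyMap.firstKey();
        NavigableSet<String> qualifiers = familyMap.get(family);
        // As in the original, the qualifier can be null.
        String qualifier = (qualifiers == null) ? null : qualifiers.pollFirst();
        return new String[] {family, qualifier};
    }

    public static void main(String[] args) {
        TreeMap<String, NavigableSet<String>> familyMap = new TreeMap<>();
        familyMap.put("cf", new TreeSet<>(Arrays.asList("q1", "q2")));
        familyMap.put("cf2", new TreeSet<>(Arrays.asList("q3")));

        String[] selected = selectColumn(familyMap);
        // cf:q2 and cf2:q3 were requested too, but are never used.
        System.out.println(selected[0] + ":" + selected[1]); // prints cf:q1
    }
}
```

Note that pollFirst() also removes the returned qualifier from the set, so the call changes the map it reads from.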

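When we want to aggregate over multiple columns and count rows at the same time, the options are:
1 Scan all the rows and let the program do the aggregation and the count itself.
2 Use AggregationClient and call it several times to collect all the results. Because these are separate calls, the results can be inconsistent with one another if the data changes between calls.
3 Extend CoprocessorProtocol ourselves.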
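The idea shared by options 1 and 3, computing every column's statistics and the row count in one pass over the same rows, can be sketched as follows. This is a minimal plain-Java illustration, not HBase code: rows are modelled as maps from a column name to a long value, and the names MultiColumnAggregateSketch, ColumnStats, Result and aggregate are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Single-pass aggregation over several columns; not HBase code.
class MultiColumnAggregateSketch {

    static final class ColumnStats {
        long max = Long.MIN_VALUE; // meaningful only when cellCount > 0
        long min = Long.MAX_VALUE; // meaningful only when cellCount > 0
        long sum = 0;
        long cellCount = 0;

        void accept(long value) {
            max = Math.max(max, value);
            min = Math.min(min, value);
            sum += value;
            cellCount++;
        }
    }

    static final class Result {
        long rowCount = 0;
        final Map<String, ColumnStats> columns = new LinkedHashMap<>();
    }

    // Visits each row once, updating the row count and every requested
    // column's statistics together.
    public static Result aggregate(List<Map<String, Long>> rows, List<String> columnNames) {
        Result result = new Result();
        for (String name : columnNames) {
            result.columns.put(name, new ColumnStats());
        }
        for (Map<String, Long> row : rows) {
            result.rowCount++;
            for (String name : columnNames) {
                Long value = row.get(name);
                if (value != null) { // a row may lack some columns
                    result.columns.get(name).accept(value);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> r1 = new HashMap<>();
        r1.put("q1", 20L);
        r1.put("q2", 1L);
        Map<String, Long> r2 = new HashMap<>();
        r2.put("q1", 100L);
        Map<String, Long> r3 = new HashMap<>();
        r3.put("q2", 5L);

        Result result = aggregate(Arrays.asList(r1, r2, r3), Arrays.asList("q1", "q2"));
        System.out.println("rows=" + result.rowCount); // prints rows=3
        ColumnStats q1 = result.columns.get("q1");
        System.out.println("q1 max=" + q1.max + " min=" + q1.min + " sum=" + q1.sum);
        // prints q1 max=100 min=20 sum=120
    }
}
```

Because all values come from the same pass, the count and the per-column statistics describe the same set of rows, which is what repeated AggregationClient calls cannot guarantee.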

This is an HBase integration plugin on GitHub.

This feature has been integrated into simplehbase.
https://github.com/zhang-xzhi/simplehbase
