HBase Study (9): Huawei's Secondary Index (Principles)

This article is reposted (view original). 2016-01-29 15:11 | 3276 | HBase

Source: http://my.oschina.net/u/923508/blog/413129

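This is Huawei's secondary index solution, which has been open-sourced. Below is a post found online that explains how it works, shared here for everyone.

Having read the code carefully, I found that it is best treated as a reference: integrating it into an existing cluster is somewhat difficult, because it makes quite a few modifications to the HBase source code.

Source code: https://github.com/Huawei-Hadoop/hindex

The rest of this post analyzes the approach.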

1. Overall architecture

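In this architecture, index details are specified in the Client Ext, information is collected in the Balancer, and the secondary index data is managed in a Coprocessor.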

[Figure: architecture of Huawei's HBase secondary index]

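2. Table creation

When a user table is created, an index table is created with it. Index regions correspond one-to-one with user regions, and each index region is placed on the same region server as its user region.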
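The placement rule can be sketched as a tiny assignment function: every index region goes to the server that hosts the user region with the same start key. This is a hypothetical model, not hindex's Balancer code; the class name, the string-keyed maps and the server names are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not hindex's balancer: each index region is assigned to
// the server hosting the user region with the same start key.
class CoLocationSketch {

    /**
     * @param userRegionServers    user-table region start key -> hosting region server
     * @param indexRegionStartKeys start keys of the index-table regions
     * @return index region start key -> region server to assign it to
     */
    static Map<String, String> assignIndexRegions(Map<String, String> userRegionServers,
                                                  Iterable<String> indexRegionStartKeys) {
        Map<String, String> plan = new LinkedHashMap<>();
        for (String startKey : indexRegionStartKeys) {
            String server = userRegionServers.get(startKey);
            if (server == null) {
                // The one-to-one correspondence is broken: no user region to pair with
                throw new IllegalStateException("no user region starts at '" + startKey + "'");
            }
            plan.put(startKey, server);
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, String> userRegionServers = new HashMap<>();
        userRegionServers.put("", "rs1");  // the first region has an empty start key
        userRegionServers.put("m", "rs2");
        System.out.println(assignIndexRegions(userRegionServers, Arrays.asList("", "m")));
    }
}
```

Pairing by start key only works if index and user region boundaries stay aligned, which is why the split handling described in section 5 matters.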

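3. Insert operation

After a row is inserted into the user table, a Coprocessor writes the indexed columns to the index table. The row key of an index entry is: region start key + index name + indexed column value + user table row key. Prefixing the region start key keeps index data under the same distribution rule as the user data, so each index region sits on the same region server as its user region, which saves one RPC at query time.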
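A minimal sketch of that row key layout as plain byte concatenation. The class and helper names are hypothetical, and hindex's actual encoding may differ: with variable-length values, a real layout needs fixed-length padding or separators so the user row key can be split off again.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the index row key described above:
// regionStartKey + indexName + indexedValue + userRowKey
class IndexRowKeySketch {

    static byte[] indexRowKey(byte[] regionStartKey, byte[] indexName,
                              byte[] indexedValue, byte[] userRowKey) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Region start key first: the entry sorts into the index region paired
        // with the user region that holds the row
        out.write(regionStartKey, 0, regionStartKey.length);
        out.write(indexName, 0, indexName.length);
        // Entries with the same value are adjacent, so lookups become range scans
        out.write(indexedValue, 0, indexedValue.length);
        // User row key last: it is what a query ultimately needs
        out.write(userRowKey, 0, userRowKey.length);
        return out.toByteArray();
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = indexRowKey(utf8("m"), utf8("idx_name"), utf8("zhangsan"), utf8("row1"));
        System.out.println(new String(key, StandardCharsets.UTF_8));
    }
}
```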

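4. Scan operation

When a query arrives, a coprocessor hook first scans the index table for the range of matching rows, and then reads the corresponding rows of the user table to obtain the final data.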
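The two-phase lookup can be modeled inside a single region with in-memory sorted maps standing in for the co-located index and user regions. This is not hindex's coprocessor code: the `\u0000` separator between key parts (which makes the prefix match exact), the class and index names are assumptions, and the region start key is dropped because it is identical for every entry in one region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of an index-assisted scan inside ONE region:
// phase 1 range-scans the co-located index region, phase 2 reads the user rows.
class IndexedScanSketch {

    static final char SEP = '\u0000'; // assumed separator between key parts

    // index row key (indexName SEP value SEP userRowKey) -> user row key
    final TreeMap<String, String> indexRegion = new TreeMap<>();
    // user row key -> value of the indexed column
    final TreeMap<String, String> userRegion = new TreeMap<>();

    /** Stands in for a Put plus the coprocessor writing the matching index entry. */
    void put(String indexName, String userRowKey, String value) {
        userRegion.put(userRowKey, value);
        indexRegion.put(indexName + SEP + value + SEP + userRowKey, userRowKey);
    }

    /** Equality query: "rowKey=value" for every user row whose column equals value. */
    List<String> scanByValue(String indexName, String value) {
        String prefix = indexName + SEP + value + SEP;
        List<String> rows = new ArrayList<>();
        // Phase 1: index keys are sorted, so the matches form one contiguous range
        for (Map.Entry<String, String> e : indexRegion.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // past the end of the matching range
            }
            // Phase 2: local lookup in the user region (same server, no extra RPC)
            String rowKey = e.getValue();
            rows.add(rowKey + "=" + userRegion.get(rowKey));
        }
        return rows;
    }

    public static void main(String[] args) {
        IndexedScanSketch region = new IndexedScanSketch();
        region.put("name_idx", "1", "zhangsan");
        region.put("name_idx", "2", "lisi");
        region.put("name_idx", "3", "lisi");
        System.out.println(region.scanByValue("name_idx", "lisi"));
    }
}
```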

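5. Split handling

To keep the user table and the index table on the same RS, both automatic and manual splits of the index table are disabled; the index table splits only when the user table splits. When a user region splits, the corresponding index region is divided according to the user rows its entries point to. In the index table's second daughter, the leading part of each row key (the old region start key) is rewritten to the start key of the matching user daughter region, that is, the user table's split row key.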
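The index side of a split can be sketched with strings: entries are routed by the user row they point to, and entries moving to the second daughter get their start-key prefix replaced. This is a hypothetical model of the rule described above, not hindex's split code; the `Entry` type and all names are assumptions, and the row key layout follows section 3 (start key + index name + value + user row key).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the index side of a user-table split.
// Index row key = regionStartKey + rest, where rest = indexName + value + userRowKey.
class IndexSplitSketch {

    /** One index entry: its full row key and the user row key it points to. */
    static final class Entry {
        final String rowKey;
        final String userRowKey;

        Entry(String rowKey, String userRowKey) {
            this.rowKey = rowKey;
            this.userRowKey = userRowKey;
        }
    }

    /**
     * Divides one index region's entries between two daughters. Entries whose user
     * row is at or after splitKey go to the second daughter, and their leading
     * oldStartKey is rewritten to splitKey, the second user daughter's start key.
     */
    static List<List<String>> split(List<Entry> entries, String oldStartKey, String splitKey) {
        List<String> first = new ArrayList<>();
        List<String> second = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.rowKey.startsWith(oldStartKey)) {
                throw new IllegalArgumentException("entry not in this region: " + e.rowKey);
            }
            if (e.userRowKey.compareTo(splitKey) < 0) {
                first.add(e.rowKey); // unchanged: still under the old start key
            } else {
                second.add(splitKey + e.rowKey.substring(oldStartKey.length()));
            }
        }
        // Rewritten keys must be sorted again before they form a region's data
        Collections.sort(first);
        Collections.sort(second);
        List<List<String>> daughters = new ArrayList<>();
        daughters.add(first);
        daughters.add(second);
        return daughters;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("aidxv1apple", "apple"),
                new Entry("aidxv2pear", "pear"));
        System.out.println(split(entries, "a", "m"));
    }
}
```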

6. Performance

Query performance improves greatly, while insert performance drops by about 10%.

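In summary, this post gave a rough analysis of the table creation, data insertion and query steps in Huawei's coprocessor-based secondary index for HBase, to provide an overall picture. It can serve as a reference in practice.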

Reposted from: http://www.dengchuanhua.com/167.html

————————————————————————————————————————————————————————————

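Secondary index implementation approaches: http://www.aboutyun.com/thread-14201-1-1.html

HBase's first-level index is the rowkey: data can only be retrieved by rowkey. To run combined queries over the columns in HBase column families, an HBase secondary index scheme is needed to support multi-condition queries.
Common secondary index schemes include:
1. MapReduce approach
2. ITHBASE approach
3. IHBASE approach
4. Coprocessor approach
5. Solr + HBase approach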
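Whatever the scheme, the core idea is the same: a secondary index maps a column value back to the rowkeys holding it, and a multi-condition (AND) query intersects the rowkey sets of several indexes before any data rows are read. The class below is a hypothetical in-memory model of that idea, independent of the five schemes; the column names such as `city` are illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: one secondary index per column, each mapping
// column value -> rowkeys that hold that value.
class MultiConditionSketch {

    private final Map<String, Map<String, Set<String>>> indexes = new HashMap<>();

    /** Records that rowKey has the given value in the given column. */
    void index(String rowKey, String column, String value) {
        indexes.computeIfAbsent(column, c -> new HashMap<>())
               .computeIfAbsent(value, v -> new TreeSet<>())
               .add(rowKey);
    }

    /** Rowkeys satisfying ALL column=value conditions (logical AND). */
    Set<String> query(Map<String, String> conditions) {
        Set<String> result = null;
        for (Map.Entry<String, String> c : conditions.entrySet()) {
            Set<String> hits = indexes.getOrDefault(c.getKey(), Collections.emptyMap())
                                      .getOrDefault(c.getValue(), Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits); // the first condition seeds the result
            } else {
                result.retainAll(hits);       // later conditions intersect it
            }
        }
        if (result == null) {
            return new TreeSet<>();           // no conditions: no rows
        }
        return result;
    }

    public static void main(String[] args) {
        MultiConditionSketch s = new MultiConditionSketch();
        s.index("1", "name", "zhangsan");
        s.index("1", "city", "beijing");
        s.index("2", "name", "lisi");
        s.index("2", "city", "shanghai");
        s.index("3", "name", "lisi");
        s.index("3", "city", "beijing");
        Map<String, String> where = new HashMap<>();
        where.put("name", "lisi");
        where.put("city", "beijing");
        // Only these rowkeys would then be read from the data table
        System.out.println(s.query(where));
    }
}
```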

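MapReduce approach

IndexBuilder: builds the index with MapReduce.
Pros: builds the index concurrently, in batches.
Cons: cannot build the index in real time.

Example:
Original table: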

```
row 1    f1:name    zhangsan
row 2    f1:name    lisi
row 3    f1:name    wangwu
```

Index table:

```
row zhangsan    f1:id    1
row lisi        f1:id    2
row wangwu      f1:id    3
```

Demo

```java
package IndexDouble;

import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.GenericOptionsParser;

public class IndexBuilder {

    private final Configuration conf;

    private IndexBuilder(String rootDir, String zkServer, String port) {
        conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", rootDir);
        conf.set("hbase.zookeeper.quorum", zkServer);
        conf.set("hbase.zookeeper.property.clientPort", port);
    }

    static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {

        // Columns to index: original-table qualifier -> index table name.
        // byte[] uses identity equals/hashCode, so key the map with a byte comparator.
        private final Map<byte[], ImmutableBytesWritable> indexes =
                new TreeMap<byte[], ImmutableBytesWritable>(Bytes.BYTES_COMPARATOR);

        private byte[] family;

        // Runs once before any map() call: read the job parameters.
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            String tableName = conf.get("tableName");
            family = Bytes.toBytes(conf.get("columnFamily"));
            for (String qualifier : conf.getStrings("qualifiers")) {
                // One index table per column, named tableName + "-" + qualifier
                indexes.put(Bytes.toBytes(qualifier),
                        new ImmutableBytesWritable(Bytes.toBytes(tableName + "-" + qualifier)));
            }
        }

        // The index table's rowkey is the original column value;
        // its f1:id cell stores the original table's rowkey.
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            for (Map.Entry<byte[], ImmutableBytesWritable> index : indexes.entrySet()) {
                // Value of family:qualifier in this row of the original table
                byte[] val = value.getValue(family, index.getKey());
                if (val != null) {
                    Put put = new Put(val);
                    // Put.add is deprecated from HBase 1.0 on; use addColumn there
                    put.add(Bytes.toBytes("f1"), Bytes.toBytes("id"), key.copyBytes());
                    // MultiTableOutputFormat writes the Put to the table named by the key
                    context.write(index.getValue(), put);
                }
            }
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        IndexBuilder builder = new IndexBuilder("hdfs://hadoop1:8020/hbase", "hadoop1", "2181");
        String[] otherArgs = new GenericOptionsParser(builder.conf, args).getRemainingArgs();
        if (otherArgs.length < 3) {
            System.err.println("Usage: IndexBuilder <TableName> <ColumnFamily> <Qualifier> [<Qualifier>...]");
            System.exit(-1);
        }
        String tableName = otherArgs[0];
        String columnFamily = otherArgs[1];
        // Every argument after the column family is a column to index
        String[] qualifiers = new String[otherArgs.length - 2];
        System.arraycopy(otherArgs, 2, qualifiers, 0, qualifiers.length);

        builder.conf.set("tableName", tableName);
        builder.conf.set("columnFamily", columnFamily);
        builder.conf.setStrings("qualifiers", qualifiers);

        Job job = Job.getInstance(builder.conf, tableName);
        job.setJarByClass(IndexBuilder.class);

        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes(columnFamily));
        scan.setCaching(500);       // fetch rows in larger batches per RPC
        scan.setCacheBlocks(false); // a full-table MR scan should not fill the block cache

        // Also sets TableInputFormat and the mapper's output key/value classes
        TableMapReduceUtil.initTableMapperJob(tableName, scan,
                MyMapper.class, ImmutableBytesWritable.class, Put.class, job);
        job.setOutputFormatClass(MultiTableOutputFormat.class);
        job.setNumReduceTasks(0);   // map-only: no reduce phase is needed

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Create the original table:

```
hbase(main):002:0> create 'studentinfo', 'f1'
0 row(s) in 0.6520 seconds

=> Hbase::Table - studentinfo
hbase(main):003:0> put 'studentinfo', '1', 'f1:name', 'zhangsan'
0 row(s) in 0.1640 seconds

hbase(main):004:0> put 'studentinfo', '2', 'f1:name', 'lisi'
0 row(s) in 0.0240 seconds

hbase(main):005:0> put 'studentinfo', '3', 'f1:name', 'wangwu'
0 row(s) in 0.0290 seconds

hbase(main):006:0> scan 'studentinfo'
ROW                      COLUMN+CELL
 1                       column=f1:name, timestamp=1436262175823, value=zhangsan
 2                       column=f1:name, timestamp=1436262183922, value=lisi
 3                       column=f1:name, timestamp=1436262189250, value=wangwu
3 row(s) in 0.0530 seconds
```

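Create the index table (its name must be tableName-qualifier, which is the name the mapper's setup() computes):

```
hbase(main):007:0> create 'studentinfo-name', 'f1'
0 row(s) in 0.7740 seconds

=> Hbase::Table - studentinfo-name
```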

Execution result:

[Figure: execution result]

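ITHBASE approach

Pros: ITHBase (Indexed Transactional HBase) is a transactional, indexed extension of HBase.
Cons: requires modifying HBase itself, and it has not been updated for several years.
http://github.com/hbase-trx/hbase-transactional-tableindexed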

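IHBASE approach

Pros: IHBase (Indexed HBase) is an extension of HBase that supports faster scans.
Cons: requires modifying HBase itself.
How it works: when a MemStore fills up and is flushed to disk, IHBase intercepts the flush and builds an index for that MemStore's data; the index is stored in the table as a separate column family (CF). During a scan, IHBase uses the markers in the index column to speed the scan up.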
http://github.com/ykulbak/ihbase
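Flush-time indexing can be illustrated with a toy model: each flush indexes the MemStore snapshot once, so later scans over flushed data jump to matching rows instead of reading every row, while unflushed data still has to be checked row by row. This is not IHBase's code; it keeps the per-flush index in memory rather than as a column family, ignores deletes and multiple versions of a row across flushes, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of flush-time indexing (not IHBase code).
class FlushIndexSketch {

    // One value -> rowkeys index per flushed MemStore snapshot
    private final List<Map<String, Set<String>>> fileIndexes = new ArrayList<>();
    private TreeMap<String, String> memstore = new TreeMap<>(); // rowkey -> value

    void put(String rowKey, String value) {
        memstore.put(rowKey, value);
    }

    /** "Intercepts" the flush: index the snapshot, then start a fresh MemStore. */
    void flush() {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> e : memstore.entrySet()) {
            index.computeIfAbsent(e.getValue(), v -> new TreeSet<>()).add(e.getKey());
        }
        fileIndexes.add(index);
        memstore = new TreeMap<>(); // the flushed rows themselves are not modeled further
    }

    /** Rowkeys whose value equals the target. */
    Set<String> scanForValue(String value) {
        Set<String> hits = new TreeSet<>();
        for (Map<String, Set<String>> index : fileIndexes) {
            // Flushed data: go straight to the matching rows via the index
            hits.addAll(index.getOrDefault(value, Collections.emptySet()));
        }
        // Unflushed data has no index yet, so it is checked row by row
        for (Map.Entry<String, String> e : memstore.entrySet()) {
            if (e.getValue().equals(value)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        FlushIndexSketch region = new FlushIndexSketch();
        region.put("1", "zhangsan");
        region.put("2", "lisi");
        region.flush();
        region.put("3", "lisi");
        System.out.println(region.scanForValue("lisi"));
    }
}
```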

Coprocessor approach

HIndex: Huawei's secondary index for HBase
http://github.com/Huawei-Hadoop/hindex

The solution is 100% Java, compatible with Apache HBase 0.94.8, and open-sourced under the ASL (Apache License).

The following capabilities are currently supported:
1. multiple indexes on a table,
2. multi-column indexes,
3. indexes based on part of a column value,
4. equality and range condition scans using an index, and
5. bulk loading data into an indexed table (indexing is done during the bulk load).

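Solr + HBase approach

Solr is a standalone, enterprise-grade search application server that exposes a Web-service-like API. Users can submit XML documents in a specific format to the search server over HTTP to build the index, or issue search requests with HTTP GET and receive the results as XML.
Solr is a high-performance full-text search server written in Java 5 and built on Lucene. It extends Lucene with a richer query language, is configurable and extensible, is optimized for query performance, and provides a complete administration interface, which makes it an excellent full-text search engine.

HBase undoubtedly has its strengths, but natively it supports millisecond-level fast retrieval only by rowkey and cannot handle combined queries over multiple fields.
The principle of Solr-based multi-condition queries on HBase is simple: index in Solr the rowkey together with the HBase fields used in filter conditions; use Solr's multi-condition query to quickly obtain the rowkeys that match the filter; then read those rows from HBase by rowkey.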
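The two-step flow can be sketched with in-memory stand-ins: a search index that stores the filter fields plus the rowkey and answers the multi-field filter, and a rowkey-addressed table playing the role of HBase. Neither class uses the real SolrJ or HBase client APIs; the `SearchIndex` type, the `rowkey` field name and the map-based table are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-step lookup: a search index answers the multi-field filter
// and returns rowkeys; the table side is then read by rowkey only.
class SolrHBaseSketch {

    /** Stand-in for Solr: each document holds the filter fields plus the HBase rowkey. */
    static final class SearchIndex {
        private final List<Map<String, String>> docs = new ArrayList<>();

        void add(String rowKey, Map<String, String> fields) {
            Map<String, String> doc = new HashMap<>(fields);
            doc.put("rowkey", rowKey);
            docs.add(doc);
        }

        /** Rowkeys of the documents matching every field=value condition. */
        List<String> findRowKeys(Map<String, String> conditions) {
            List<String> keys = new ArrayList<>();
            for (Map<String, String> doc : docs) {
                boolean match = true;
                for (Map.Entry<String, String> c : conditions.entrySet()) {
                    match &= c.getValue().equals(doc.get(c.getKey()));
                }
                if (match) {
                    keys.add(doc.get("rowkey"));
                }
            }
            return keys;
        }
    }

    /** Step 1: ask the index for rowkeys; step 2: fetch the full rows by rowkey. */
    static List<Map<String, String>> query(SearchIndex index,
                                           Map<String, Map<String, String>> table,
                                           Map<String, String> conditions) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String rowKey : index.findRowKeys(conditions)) {
            Map<String, String> row = table.get(rowKey); // a Get by rowkey in HBase
            if (row != null) { // the index may lag behind the table
                rows.add(row);
            }
        }
        return rows;
    }
}
```

Keeping only filter fields and the rowkey in the search index, and the full rows in HBase, is the division of labor the paragraph above describes; the null check reflects that the two stores are updated separately and can drift apart.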

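Disclaimer!

Articles reposted on this site are for personal study and reference only, and this site assumes no legal responsibility for their copyright. If your privacy rights have been infringed, please contact this site at yoyou2525@163.com to have the content removed.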
