HBase Basics: Common Filters in the hbase shell


Create the table

create 'test1', 'lf', 'sf'

-- lf: column family of LONG values (binary value)
-- sf: column family of STRING values

Load the data

put 'test1', 'user1|ts1', 'sf:c1', 'sku1'
put 'test1', 'user1|ts2', 'sf:c1', 'sku188'
put 'test1', 'user1|ts3', 'sf:s1', 'sku123'

put 'test1', 'user2|ts4', 'sf:c1', 'sku2'
put 'test1', 'user2|ts5', 'sf:c2', 'sku288'
put 'test1', 'user2|ts6', 'sf:s1', 'sku222'

 

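The rowkey combines a user (userX) with the time of the action (tsX).

The column name records which action was performed, and the cell value records which product (skuXXX) it was performed on. For example, c1: click from homepage; c2: click from ad; s1: search from homepage; b1: buy.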
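The rowkey layout can be tried out without an HBase cluster. The following is a minimal plain-Ruby sketch, not part of the original walkthrough; the helpers `make_rowkey` and `parse_rowkey` are hypothetical names introduced only for illustration. It shows how a key is assembled from the user and timestamp parts, and that sorting the keys as strings (a stand-in for HBase's byte-wise row ordering, valid for ASCII keys like these) places all events of one user next to each other.

```ruby
# Hypothetical helpers illustrating the "user|timestamp" rowkey layout.
SEPARATOR = '|'

# Build a rowkey such as "user1|ts2" from its two parts.
def make_rowkey(user, ts)
  "#{user}#{SEPARATOR}#{ts}"
end

# Split a rowkey back into [user, ts].
def parse_rowkey(rowkey)
  rowkey.split(SEPARATOR, 2)
end

UNSORTED_KEYS = [
  make_rowkey('user2', 'ts4'),
  make_rowkey('user1', 'ts3'),
  make_rowkey('user1', 'ts1')
].freeze

# HBase keeps rows sorted by rowkey bytes; a string sort models that here.
SORTED_KEYS = UNSORTED_KEYS.sort

p SORTED_KEYS                 # => ["user1|ts1", "user1|ts3", "user2|ts4"]
p parse_rowkey('user1|ts3')   # => ["user1", "ts3"]
```

Because the user comes first in the key, a prefix or range scan on the user part reads one contiguous block of rows.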

Query examples

Whose value equals sku188?

scan 'test1', FILTER=>"ValueFilter(=,'binary:sku188')"

ROW                          COLUMN+CELL                                                                       
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=sku188 

Whose value contains 88?

scan 'test1', FILTER=>"ValueFilter(=,'substring:88')"

ROW                          COLUMN+CELL                                                                       
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=sku188                               
 user2|ts5                   column=sf:c2, timestamp=1409122355030, value=sku288 
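The difference between the `binary:` and `substring:` comparators used in the two scans above can be modelled in plain Ruby. This is only an illustrative sketch over an in-memory copy of the sample data, not how HBase evaluates filters internally; the constant `CELLS` and the helper names are assumptions made for the example.

```ruby
# In-memory copy of the sample data: [rowkey, column, value].
CELLS = [
  ['user1|ts1', 'sf:c1', 'sku1'],
  ['user1|ts2', 'sf:c1', 'sku188'],
  ['user1|ts3', 'sf:s1', 'sku123'],
  ['user2|ts4', 'sf:c1', 'sku2'],
  ['user2|ts5', 'sf:c2', 'sku288'],
  ['user2|ts6', 'sf:s1', 'sku222']
].freeze

# Models ValueFilter(=,'binary:...'): the whole value must be equal.
def binary_equal(cells, expected)
  cells.select { |_row, _col, value| value == expected }
end

# Models ValueFilter(=,'substring:...'): the value must contain the text.
# SubstringComparator ignores case, hence the downcase on both sides.
def substring_match(cells, text)
  cells.select { |_row, _col, value| value.downcase.include?(text.downcase) }
end

p binary_equal(CELLS, 'sku188').map(&:first)   # => ["user1|ts2"]
p substring_match(CELLS, '88').map(&:first)    # => ["user1|ts2", "user2|ts5"]
```

Note that `binary:sku1` would match only the value `sku1`, while `substring:sku1` would also match `sku188` and `sku123`.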

Users who came in through an ad click (column c2) and whose value contains 88

scan 'test1', FILTER=>"ColumnPrefixFilter('c2') AND ValueFilter(=,'substring:88')"

ROW                          COLUMN+CELL                                                                       
 user2|ts5                   column=sf:c2, timestamp=1409122355030, value=sku288

Users who came in through search (column name starting with s) and whose value contains 123 or 222

scan 'test1', FILTER=>"ColumnPrefixFilter('s') AND ( ValueFilter(=,'substring:123') OR ValueFilter(=,'substring:222') )"

ROW                          COLUMN+CELL                                                                       
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123                               
 user2|ts6                   column=sf:s1, timestamp=1409122355970, value=sku222
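The way AND, OR and parentheses combine in the filter string above can be mirrored with ordinary boolean logic. The sketch below is a plain-Ruby model over a copy of the sample data; `SAMPLE_CELLS` and the helper names are hypothetical. One detail worth making explicit: ColumnPrefixFilter is applied to the column qualifier (the part after `sf:`), not to the family name, which is why `sf:c1` does not match the prefix `s`.

```ruby
# In-memory copy of the sample data: [rowkey, column, value].
SAMPLE_CELLS = [
  ['user1|ts1', 'sf:c1', 'sku1'],
  ['user1|ts2', 'sf:c1', 'sku188'],
  ['user1|ts3', 'sf:s1', 'sku123'],
  ['user2|ts4', 'sf:c1', 'sku2'],
  ['user2|ts5', 'sf:c2', 'sku288'],
  ['user2|ts6', 'sf:s1', 'sku222']
].freeze

# Models ColumnPrefixFilter: test the qualifier, i.e. the part after "family:".
def column_prefix?(column, prefix)
  column.split(':', 2).last.start_with?(prefix)
end

# Models ValueFilter with a substring comparator (case-insensitive).
def value_contains?(value, text)
  value.downcase.include?(text.downcase)
end

# ColumnPrefixFilter('s') AND ( substring:123 OR substring:222 )
def search_hits(cells)
  cells.select do |_row, column, value|
    column_prefix?(column, 's') &&
      (value_contains?(value, '123') || value_contains?(value, '222'))
  end
end

p search_hits(SAMPLE_CELLS).map(&:first)   # => ["user1|ts3", "user2|ts6"]
```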

Rowkeys that start with user1

scan 'test1', FILTER => "PrefixFilter ('user1')"

ROW                          COLUMN+CELL                                                                       
 user1|ts1                   column=sf:c1, timestamp=1409122354868, value=sku1                                 
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=sku188                               
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123

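FirstKeyOnlyFilter: a row can hold several columns, and each column can hold several versions; this filter returns only the first cell of each row (the first version of the first column).
KeyOnlyFilter: returns only the key of each cell, without the value.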

scan 'test1', FILTER=>"FirstKeyOnlyFilter() AND ValueFilter(=,'binary:sku188') AND KeyOnlyFilter()"

ROW                          COLUMN+CELL                                                                       
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=  

Starting from user1|ts2, find all rowkeys that start with user1

scan 'test1', {STARTROW=>'user1|ts2', FILTER => "PrefixFilter ('user1')"}

ROW                          COLUMN+CELL                                                                       
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=sku188                               
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123 

Starting from user1|ts2, scan up to the first rowkey that starts with user2 (STOPROW itself is excluded)

scan 'test1', {STARTROW=>'user1|ts2', STOPROW=>'user2'}

ROW                          COLUMN+CELL                                                                       
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=sku188                               
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123
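The two range queries above rely on rows being ordered by key: STARTROW is inclusive, STOPROW is exclusive, and PrefixFilter keeps only keys with the given prefix. A minimal plain-Ruby model of that behaviour follows; `ROWKEYS`, `range_scan` and `prefix_scan` are hypothetical names, and string comparison stands in for HBase's byte-wise key comparison.

```ruby
# Rowkeys of the six sample rows.
ROWKEYS = %w[
  user1|ts1 user1|ts2 user1|ts3
  user2|ts4 user2|ts5 user2|ts6
].freeze

# Models {STARTROW => ..., STOPROW => ...}: start inclusive, stop exclusive.
def range_scan(rowkeys, start_row, stop_row)
  rowkeys.sort.select { |key| key >= start_row && key < stop_row }
end

# Models {STARTROW => ..., FILTER => "PrefixFilter('...')"}.
def prefix_scan(rowkeys, prefix, start_row = prefix)
  rowkeys.sort.select { |key| key >= start_row && key.start_with?(prefix) }
end

p range_scan(ROWKEYS, 'user1|ts2', 'user2')    # => ["user1|ts2", "user1|ts3"]
p prefix_scan(ROWKEYS, 'user1', 'user1|ts2')   # => ["user1|ts2", "user1|ts3"]
```

Every key beginning with `user2` sorts at or after the string `user2`, so using `user2` as STOPROW ends the scan right after the last `user1` row.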

Rowkeys that contain ts3

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.filter.RowFilter
scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('ts3'))}

ROW                          COLUMN+CELL                                                                       
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123 

Rowkeys that contain ts

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.filter.RowFilter
scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('ts'))}

ROW                          COLUMN+CELL                                                                       
 user1|ts1                   column=sf:c1, timestamp=1409122354868, value=sku1                                 
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=sku188                               
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123                               
 user2|ts4                   column=sf:c1, timestamp=1409122354998, value=sku2                                 
 user2|ts5                   column=sf:c2, timestamp=1409122355030, value=sku288                               
 user2|ts6                   column=sf:s1, timestamp=1409122355970, value=sku222  

Add one more test row

put 'test1', 'user2|err', 'sf:s1', 'sku999'

Query the rowkeys of the form user<digits>|ts<digits>. The newly added test row does not match the regular expression, so it is not returned.

import org.apache.hadoop.hbase.filter.RegexStringComparator
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.filter.RowFilter
scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),RegexStringComparator.new('^user\d+\|ts\d+$'))}

ROW                          COLUMN+CELL                                                                       
 user1|ts1                   column=sf:c1, timestamp=1409122354868, value=sku1                                 
 user1|ts2                   column=sf:c1, timestamp=1409122354918, value=sku188                               
 user1|ts3                   column=sf:s1, timestamp=1409122354954, value=sku123                               
 user2|ts4                   column=sf:c1, timestamp=1409122354998, value=sku2                                 
 user2|ts5                   column=sf:c2, timestamp=1409122355030, value=sku288                               
 user2|ts6                   column=sf:s1, timestamp=1409122355970, value=sku222  
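The pattern can be checked against the rowkeys in plain Ruby before running the scan. This is a sketch only: RegexStringComparator uses Java regular expressions, whereas the code below uses Ruby's engine, which behaves the same for a simple single-line pattern like this one. `ROW_PATTERN`, `CANDIDATE_KEYS` and `matching_rowkeys` are hypothetical names.

```ruby
# Same pattern as in the scan: "user" + digits, a literal "|", "ts" + digits.
ROW_PATTERN = /^user\d+\|ts\d+$/

CANDIDATE_KEYS = %w[
  user1|ts1 user1|ts2 user1|ts3
  user2|ts4 user2|ts5 user2|ts6
  user2|err
].freeze

# Keep only the keys the pattern matches.
def matching_rowkeys(keys)
  keys.select { |key| key =~ ROW_PATTERN }
end

p matching_rowkeys(CANDIDATE_KEYS).include?('user2|err')   # => false
p matching_rowkeys(CANDIDATE_KEYS).size                    # => 6
```

The key `user2|err` fails because `err` is not `ts` followed by digits; the backslash before `|` matters, since an unescaped `|` would mean alternation.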

Add test data

put 'test1', 'user1|ts9', 'sf:b1', 'sku1'

Columns whose name starts with b1 and whose value is sku1

scan 'test1', FILTER=>"ColumnPrefixFilter('b1') AND ValueFilter(=,'binary:sku1')"

ROW                          COLUMN+CELL                                                                       
 user1|ts9                   column=sf:b1, timestamp=1409124908668, value=sku1

Using SingleColumnValueFilter: rows whose column sf:b1 has the value sku1

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.util.Bytes
scan 'test1', {COLUMNS => 'sf:b1', FILTER => SingleColumnValueFilter.new(Bytes.toBytes('sf'), Bytes.toBytes('b1'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('sku1'))}

ROW                          COLUMN+CELL                                                                       
 user1|ts9                   column=sf:b1, timestamp=1409124908668, value=sku1
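SingleColumnValueFilter differs from ValueFilter in that it decides per row rather than per cell, and by default it does not drop rows that lack the tested column (this is controlled by the filter's `setFilterIfMissing` option, which defaults to false). In the scan above, restricting the scan with `COLUMNS => 'sf:b1'` appears to be what keeps rows without `sf:b1` out of the result. The plain-Ruby sketch below models that behaviour on a few sample rows; `TABLE_ROWS` and `single_column_value_scan` are hypothetical names, and the model is a simplification rather than HBase's actual implementation.

```ruby
# A few sample rows: rowkey => { column => value }.
TABLE_ROWS = {
  'user1|ts1' => { 'sf:c1' => 'sku1' },
  'user1|ts9' => { 'sf:b1' => 'sku1' },
  'user2|ts6' => { 'sf:s1' => 'sku222' }
}.freeze

# Models SingleColumnValueFilter with an EQUAL comparison on one column.
# Rows that have the column pass only if the value matches.
# Rows that lack the column pass unless filter_if_missing is true.
def single_column_value_scan(rows, column, expected, filter_if_missing: false)
  rows.select do |_rowkey, columns|
    if columns.key?(column)
      columns[column] == expected
    else
      !filter_if_missing
    end
  end.keys
end

p single_column_value_scan(TABLE_ROWS, 'sf:b1', 'sku1')
# => ["user1|ts1", "user1|ts9", "user2|ts6"]
p single_column_value_scan(TABLE_ROWS, 'sf:b1', 'sku1', filter_if_missing: true)
# => ["user1|ts9"]
```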

 

Using hbase zkcli

hbase zkcli
ls /
[hbase, zookeeper]

[zk: hadoop000:2181(CONNECTED) 1] ls /hbase
[meta-region-server, backup-masters, table, draining, region-in-transition, running, table-lock, master, namespace, hbaseid, online-snapshot, replication, splitWAL, recovering-regions, rs]
[zk: hadoop000:2181(CONNECTED) 2] ls /hbase/table
[member, test1, hbase:meta, hbase:namespace]
[zk: hadoop000:2181(CONNECTED) 3] ls /hbase/table/test1
[]
[zk: hadoop000:2181(CONNECTED) 4] get /hbase/table/test1
?master:60000}l$??lPBUF
cZxid = 0x107
ctime = Wed Aug 27 14:52:21 HKT 2014
mZxid = 0x10b
mtime = Wed Aug 27 14:52:22 HKT 2014
pZxid = 0x107
cversion = 0
dataVersion = 2
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 31
numChildren = 0

 

