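Introduction to TTL in HBase


1. Definition

TTL (Time to Live) sets how long data is kept before it expires.

2. How it works

The following uses a Column Family (CF) level TTL as the example: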

hbase(main):001:0> desc 'wxy:test'
Table wxy:test is ENABLED                                                                               
wxy:test                                                                                                
COLUMN FAMILIES DESCRIPTION                                                                             
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS =
> '2', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOC
KSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                           
{NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSIO
N => 'NONE', VERSIONS => '5', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOC
KSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                           
2 row(s) in 0.9730 seconds

The default TTL of a CF is FOREVER, meaning the data never expires.

  • Change the TTL. The CF-level TTL is expressed in seconds:

hbase(main):003:0> disable 'wxy:test'
0 row(s) in 1.3500 seconds

hbase(main):004:0> alter 'wxy:test', {NAME=>'f1', TTL => '100'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1780 seconds

hbase(main):002:0> desc 'wxy:test'
Table wxy:test is DISABLED
wxy:test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS =
> '2', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOC
KSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSIO
N => 'NONE', VERSIONS => '5', TTL => '100 SECONDS (1 MINUTE 40 SECOND)', MIN_VERSIONS => '0', KEEP_DELET
ED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
2 row(s) in 0.0680 seconds

hbase(main):003:0> enable 'wxy:test'
0 row(s) in 0.2460 seconds

 

  • Scan the existing values:
hbase(main):007:0> scan 'wxy:test'
ROW                         COLUMN+CELL                                                                 
 r1                         column=cf:name, timestamp=1503047499079, value=lisi4                        
 r1                         column=cf:sex, timestamp=1502788726648, value=male                          
 r2                         column=cf:age, timestamp=1503041691183, value=20                            
 r3                         column=cf:age, timestamp=1503041723715, value=23                            
 r4                         column=cf:name, timestamp=1503041738224, value=Alex                         
4 row(s) in 0.1140 seconds

 

  • Update the table by writing a cell into f1:
hbase(main):007:0> put 'wxy:test' ,'r4','f1:address','shandi'
0 row(s) in 0.2590 seconds

hbase(main):008:0> scan 'wxy:test'
ROW                         COLUMN+CELL                                                                 
 r1                         column=cf:name, timestamp=1503047499079, value=lisi4                        
 r1                         column=cf:sex, timestamp=1502788726648, value=male                          
 r2                         column=cf:age, timestamp=1503041691183, value=20                            
 r3                         column=cf:age, timestamp=1503041723715, value=23                            
 r4                         column=cf:name, timestamp=1503041738224, value=Alex                         
 r4                         column=f1:address, timestamp=1505976958276, value=shandi                    
4 row(s) in 0.0680 seconds
  • Scan the table after 30 seconds:
hbase(main):012:0> scan 'wxy:test'
ROW                         COLUMN+CELL                                                                 
 r1                         column=cf:name, timestamp=1503047499079, value=lisi4                        
 r1                         column=cf:sex, timestamp=1502788726648, value=male                          
 r2                         column=cf:age, timestamp=1503041691183, value=20                            
 r3                         column=cf:age, timestamp=1503041723715, value=23                            
 r4                         column=cf:name, timestamp=1503041738224, value=Alex                         
 r4                         column=f1:address, timestamp=1505976958276, value=shandi                    
4 row(s) in 0.0460 seconds

hbase(main):013:0> scan 'wxy:test'
ROW                         COLUMN+CELL                                                                 
 r1                         column=cf:name, timestamp=1503047499079, value=lisi4                        
 r1                         column=cf:sex, timestamp=1502788726648, value=male                          
 r2                         column=cf:age, timestamp=1503041691183, value=20                            
 r3                         column=cf:age, timestamp=1503041723715, value=23                            
 r4                         column=cf:name, timestamp=1503041738224, value=Alex                         
 r4                         column=f1:address, timestamp=1505976958276, value=shandi                    
4 row(s) in 0.0390 seconds

As shown above, two consecutive scans return the same data.

  • Scan the table after 100 seconds:
hbase(main):019:0> scan 'wxy:test'
ROW                         COLUMN+CELL                                                                 
 r1                         column=cf:name, timestamp=1503047499079, value=lisi4                        
 r1                         column=cf:sex, timestamp=1502788726648, value=male                          
 r2                         column=cf:age, timestamp=1503041691183, value=20                            
 r3                         column=cf:age, timestamp=1503041723715, value=23                            
 r4                         column=cf:name, timestamp=1503041738224, value=Alex                         
4 row(s) in 0.0280 seconds

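  The f1 column of r4 is gone. This is how TTL works.

     The TTL is measured from the time the column was last updated (the cell's timestamp), not from the time it was first created;

     Note also that after 100 seconds row r4 was not removed as a whole: only its f1 column expired, while its other columns, such as those in cf, were kept. TTL is evaluated per cell.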
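The expiry rule described above can be sketched as a small model. This is not HBase code: the function name `is_expired` and the explicit `now_ms` argument are assumptions made for illustration. It compares a cell's age, measured from its timestamp, against a CF TTL given in seconds, using the timestamp of the f1:address cell from the scan output.

```python
def is_expired(cell_ts_ms: int, cf_ttl_seconds: int, now_ms: int) -> bool:
    """Return True once the cell's age exceeds the CF TTL.

    Hypothetical model of the rule in the text, not an HBase API:
    age is measured from the cell's timestamp (its last write).
    """
    age_ms = now_ms - cell_ts_ms
    return age_ms > cf_ttl_seconds * 1000


# Timestamp of r4's f1:address cell in the scan output above.
written_ms = 1505976958276

# 30 seconds after the put: still inside the 100-second TTL.
print(is_expired(written_ms, 100, written_ms + 30_000))   # False
# 101 seconds after the put: past the TTL, so the cell is no longer returned.
print(is_expired(written_ms, 100, written_ms + 101_000))  # True

# Rewriting the cell gives it a new timestamp, which restarts the clock.
rewritten_ms = written_ms + 90_000
print(is_expired(rewritten_ms, 100, written_ms + 101_000))  # False
```

The last call illustrates why the TTL counts from the last update: a cell rewritten 90 seconds after the first put is only 11 seconds old at the 101-second mark.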

        

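     If a store file contains only expired rows, the file is deleted during minor compaction (see HBase compaction). Setting hbase.store.delete.expired.storefile to false, or setting the minimum number of versions (MIN_VERSIONS) to a value other than 0, disables this feature. The default minimum number of versions is 0: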

hbase(main):001:0> desc 'wxy:test'
Table wxy:test is ENABLED                                                                               
wxy:test                                                                                                
COLUMN FAMILIES DESCRIPTION                                                                             
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS =
> '2', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOC
KSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                           
{NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSIO
N => 'NONE', VERSIONS => '5', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOC
KSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                           
2 row(s) in 0.9730 seconds
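The store-file rule above can be modelled roughly as follows. This is a simplified sketch, not HBase's compaction code: the function `can_drop_store_file` and its parameters are hypothetical names, with `delete_expired_storefile` standing in for hbase.store.delete.expired.storefile and `min_versions` for MIN_VERSIONS.

```python
def can_drop_store_file(newest_cell_ts_ms: int,
                        cf_ttl_seconds: int,
                        now_ms: int,
                        delete_expired_storefile: bool = True,
                        min_versions: int = 0) -> bool:
    """Decide whether a whole store file could be dropped.

    Hypothetical simplification of the behaviour described in the text:
    the file qualifies only if the feature is enabled, MIN_VERSIONS is 0,
    and even its newest cell is older than the TTL.
    """
    if not delete_expired_storefile or min_versions != 0:
        return False
    return now_ms - newest_cell_ts_ms > cf_ttl_seconds * 1000


now_ms = 1_000_000_000            # arbitrary clock value for the example
old_file_newest = now_ms - 200_000  # newest cell is 200 s old
fresh_file_newest = now_ms - 50_000  # newest cell is 50 s old

print(can_drop_store_file(old_file_newest, 100, now_ms))                   # True
print(can_drop_store_file(fresh_file_newest, 100, now_ms))                 # False
print(can_drop_store_file(old_file_newest, 100, now_ms, min_versions=1))   # False
print(can_drop_store_file(old_file_newest, 100, now_ms,
                          delete_expired_storefile=False))                 # False
```

Checking only the newest cell is enough in this model, because if the newest cell has expired then every older cell in the file has expired too.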

   

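     Note: it is safer to disable the table before altering its schema. HBase does accept an alter on an enabled table, but in the test below the existing records were no longer returned afterwards. Those rows carry timestamps that are probably far older than the new 100-second TTL, so they may simply have expired under the new TTL rather than been cleared by the online alter. See the following test: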

hbase(main):004:0> scan 'test'
ROW                         COLUMN+CELL
 row1                       column=cf:a, timestamp=1500967679327, value=value1
 row2                       column=cf:b, timestamp=1500967692945, value=value2
 row3                       column=cf:c, timestamp=1500967715743, value=value3
3 row(s) in 0.2490 seconds

hbase(main):005:0> desc 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS =
> '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOC
KSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.0880 seconds

hbase(main):006:0> alter 'test',{NAME => 'cf',TTL => '100'}
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
0 row(s) in 2.2200 seconds

hbase(main):007:0> scan 'test'
ROW                         COLUMN+CELL
0 row(s) in 0.0190 seconds
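One way to read the test above is that a newly set TTL also applies to cells that already exist, so rows written long before the alter stop being returned. The sketch below models that reading only. The helper `visible_cells` is a hypothetical name, and the clock value is an assumption, since the transcript does not say when the alter was run.

```python
from typing import List, Optional, Tuple

# (row, column, timestamp_ms, value), copied from the scan output above.
Cell = Tuple[str, str, int, str]


def visible_cells(cells: List[Cell],
                  cf_ttl_seconds: Optional[int],
                  now_ms: int) -> List[Cell]:
    """Return the cells a scan would still show under the given CF TTL.

    Hypothetical model: None stands for TTL => 'FOREVER'.
    """
    if cf_ttl_seconds is None:
        return list(cells)
    cutoff_ms = now_ms - cf_ttl_seconds * 1000
    return [cell for cell in cells if cell[2] >= cutoff_ms]


rows: List[Cell] = [
    ("row1", "cf:a", 1500967679327, "value1"),
    ("row2", "cf:b", 1500967692945, "value2"),
    ("row3", "cf:c", 1500967715743, "value3"),
]

# Assumed clock: one hour after row3 was written (not stated in the source).
now_ms = 1500967715743 + 3_600_000

print(len(visible_cells(rows, None, now_ms)))  # 3, TTL is FOREVER
print(len(visible_cells(rows, 100, now_ms)))   # 0, every cell is older than 100 s
```

Under this assumption the empty scan follows from the TTL alone, whether or not the table was disabled first.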

 

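3. Granularity

In earlier versions the TTL granularity was the column family. Newer versions support cell tags, so a TTL can also be set at the cell level (to be verified).

(See http://hbase.apache.org/book.html#ttl and https://issues.apache.org/jira/browse/HBASE-10560)

Differences between the cell TTL and the Column Family TTL:

  • The Column Family TTL is expressed in seconds; the cell TTL is expressed in milliseconds.
  • If a cell-level TTL is set, it overrides the CF TTL, but it cannot extend the cell's lifetime beyond the CF-level TTL.

       Quoted from http://hbase.apache.org/book.html#ttl: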

  • Cell TTLs are expressed in units of milliseconds instead of seconds.

  • A cell TTLs cannot extend the effective lifetime of a cell beyond a ColumnFamily level TTL setting.
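The two rules quoted above (different units, and a cell TTL that cannot outlive the CF TTL) can be combined into one small calculation. This is an illustrative sketch rather than HBase code: `effective_ttl_ms` is a hypothetical helper, and `None` is used here to stand for FOREVER or for no cell TTL being set.

```python
from typing import Optional


def effective_ttl_ms(cf_ttl_seconds: Optional[int],
                     cell_ttl_ms: Optional[int]) -> Optional[int]:
    """Return the lifetime that applies to a cell, in milliseconds.

    Hypothetical model: the CF TTL is given in seconds, the cell TTL in
    milliseconds, and the shorter of the two wins. None means no limit.
    """
    cf_ttl_ms = None if cf_ttl_seconds is None else cf_ttl_seconds * 1000
    if cell_ttl_ms is None:
        return cf_ttl_ms
    if cf_ttl_ms is None:
        return cell_ttl_ms
    return min(cf_ttl_ms, cell_ttl_ms)


print(effective_ttl_ms(100, 30_000))   # 30000: the shorter cell TTL applies
print(effective_ttl_ms(100, 500_000))  # 100000: capped by the CF TTL
print(effective_ttl_ms(100, None))     # 100000: no cell TTL, CF TTL applies
print(effective_ttl_ms(None, 30_000))  # 30000: CF is FOREVER, cell TTL applies
```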

       Quoted from the author's comments on https://issues.apache.org/jira/browse/HBASE-10560:

We can keep the existing column level definition and enforcement mechanism and extend it to look for a TTL cell tag during compaction. If one is found, it can override the CF setting. TTL overrides can be passed up to the server in an operation attribute.

 

References:

http://blog.csdn.net/wulantian/article/details/41010947

http://hbase.apache.org/book.html#ttl

https://issues.apache.org/jira/browse/HBASE-10560

 

