We know that an HBase table can have one or more column families, but why is it said that fewer column families are better?
From the official documentation:
HBase currently does not do well with anything above two or three column families so keep the number of column families in your schema low. Currently, flushing and compactions are done on a per Region basis so if one column family is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed even though the amount of data they carry is small. When many column families exist the flushing and compaction interaction can make for a bunch of needless i/o (To be addressed by changing flushing and compaction to work on a per column family basis).
To recap how an HBase table is organized: each table is split into multiple regions, each region holding a subset of the table's data, and regions are distributed across the RegionServers of the cluster.
Within a region, the data of each column family forms a Store. Each Store consists of one MemStore and multiple HFiles (one column family corresponds to one MemStore and N HFiles).
When the flush conditions are met, each MemStore is flushed to disk, producing one HFile; and as HFiles accumulate, the background minor-compaction thread is triggered to merge them.
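The region / Store / MemStore / HFile layering described above can be sketched with a toy model. This is an illustrative simulation only; the class and field names are assumptions, not HBase's actual implementation:

```python
# Toy model of HBase storage layering: one Store per column family,
# each Store holding one in-memory MemStore and a list of on-disk HFiles.
# Illustrative only -- not the real HBase code.

class Store:
    def __init__(self, cf_name):
        self.cf_name = cf_name
        self.memstore_bytes = 0   # current MemStore size
        self.hfiles = []          # sizes of HFiles produced by flushes

    def write(self, nbytes):
        self.memstore_bytes += nbytes

    def flush(self):
        # Each flush turns the entire MemStore into one new HFile.
        if self.memstore_bytes > 0:
            self.hfiles.append(self.memstore_bytes)
            self.memstore_bytes = 0

class Region:
    def __init__(self, cf_names):
        self.stores = {cf: Store(cf) for cf in cf_names}

    def flush_all(self):
        # Flush is per region: every Store flushes together,
        # even column families whose MemStores are nearly empty.
        for store in self.stores.values():
            store.flush()

region = Region(["cf_big", "cf_small"])
region.stores["cf_big"].write(128 * 1024 * 1024)  # cf_big reaches the limit
region.stores["cf_small"].write(1024)             # cf_small holds only 1 KB
region.flush_all()
# Both column families now have a flushed HFile -- cf_small's is tiny.
print([(cf, s.hfiles) for cf, s in sorted(region.stores.items())])
```

Note how `cf_small` ends up with a 1 KB HFile it never needed to write; this is exactly the wasted I/O the documentation warns about.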
Here is the key point: both flush and compaction are performed at the region level!
For example, at flush time, if a region has multiple MemStores (multiple column families), once any one MemStore meets a flush condition, all the other MemStores are flushed along with it, even if they hold very little data. This causes a lot of unnecessary I/O. The flush triggers are:
- MemStore level: when any single MemStore in a region reaches the limit (hbase.hregion.memstore.flush.size, default 128 MB), a MemStore flush is triggered.
- Region level: when the combined size of all MemStores in a region reaches the limit (hbase.hregion.memstore.block.multiplier * hbase.hregion.memstore.flush.size, default 2 * 128 MB = 256 MB), a flush is triggered.
- RegionServer level: when the combined size of all MemStores on a RegionServer reaches the limit (hbase.regionserver.global.memstore.upperLimit * hbase_heapsize, default 40% of the JVM heap), some MemStores are flushed. Flushing proceeds from the largest MemStore down: the region with the largest MemStore is flushed first, then the next largest, until total MemStore usage drops below the lower threshold (hbase.regionserver.global.memstore.lowerLimit * hbase_heapsize, default 38% of the JVM heap).
- When the number of HLogs on a RegionServer reaches the limit (configurable via hbase.regionserver.maxlogs), the system selects the region(s) covered by the oldest HLog and flushes them.
- Periodic flush: by default HBase flushes MemStores every hour, to ensure no MemStore goes unpersisted for too long. To avoid all MemStores flushing at the same moment, the periodic flush adds a random delay of roughly 20,000 ms.
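The three size-based triggers above can be sketched as a small check function, using the default values from the text. The function and variable names are assumptions made for illustration; this mirrors the logic of the triggers, not HBase's actual code:

```python
# Sketch of the size-based flush triggers, with the defaults cited above.
# Illustrative only -- names and structure are assumptions.

FLUSH_SIZE = 128 * 1024 * 1024   # hbase.hregion.memstore.flush.size (128 MB)
BLOCK_MULTIPLIER = 2             # hbase.hregion.memstore.block.multiplier
GLOBAL_UPPER = 0.40              # hbase.regionserver.global.memstore.upperLimit

def should_flush_region(memstore_sizes, heap_bytes, rs_total_memstore):
    """Return which trigger levels fire for this region.

    memstore_sizes: per-column-family MemStore sizes within the region.
    rs_total_memstore: sum of all MemStores on the whole RegionServer.
    """
    reasons = []
    if any(size >= FLUSH_SIZE for size in memstore_sizes):
        reasons.append("memstore-level")       # any single MemStore >= 128 MB
    if sum(memstore_sizes) >= BLOCK_MULTIPLIER * FLUSH_SIZE:
        reasons.append("region-level")         # region total >= 256 MB
    if rs_total_memstore >= GLOBAL_UPPER * heap_bytes:
        reasons.append("regionserver-level")   # RS total >= 40% of heap
    return reasons

# One CF at 128 MB is enough to trigger a flush; the other CF's
# 1 MB MemStore will be flushed along with it (region-level behavior).
print(should_flush_region([128 * 1024**2, 1 * 1024**2],
                          heap_bytes=8 * 1024**3,
                          rs_total_memstore=500 * 1024**2))
```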
Likewise, because compaction is also region-based, it produces unnecessary I/O in the same way. The trigger for a minor compaction, per the official documentation:
hbase.hstore.compactionThreshold — If more than this number of HStoreFiles exists in any one HStore (one HStoreFile is written per flush of MemStore), a compaction is run to rewrite all HStoreFiles as one. Larger numbers put off compaction, but when it runs, it takes longer to complete. Default: 3
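The threshold quoted above can be illustrated with a minimal simulation. This is a sketch of the trigger condition only, not HBase's actual compaction file-selection algorithm:

```python
# Sketch of the minor-compaction trigger: once a Store accumulates more
# than hbase.hstore.compactionThreshold HFiles (default 3), a compaction
# rewrites them into one file. Illustrative simulation only.

COMPACTION_THRESHOLD = 3  # hbase.hstore.compactionThreshold

def maybe_compact(hfile_sizes):
    """Merge all HFiles into one if the file count exceeds the threshold."""
    if len(hfile_sizes) > COMPACTION_THRESHOLD:
        return [sum(hfile_sizes)]   # rewritten as a single larger HFile
    return hfile_sizes

print(maybe_compact([10, 12, 9]))       # 3 files: below trigger, unchanged
print(maybe_compact([10, 12, 9, 11]))   # 4 files: compacted into one
```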
Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., number of rows). If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA’s data will likely be spread across many, many regions (and RegionServers). This makes mass scans for ColumnFamilyA less efficient.
(We know that region splits are governed by hbase.hregion.max.filesize, but the split is not triggered by the region's total size across all of its files: it fires once the files of any single store (column family) grow to that size. So when ColumnFamilyB drives a split, ColumnFamilyA's data may still be very, very small, yet it is split along with the region. This produces more small files on HDFS, scattered across more regions and RegionServers.)
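The split behavior described above can be sketched as follows. This is an illustrative simulation under stated assumptions (a 10 GB max.filesize is assumed here; real splits choose a midkey and create two daughter regions rather than halving file sizes):

```python
# Sketch of region splitting: the split fires when one column family's
# files grow past hbase.hregion.max.filesize, but it applies to the whole
# region -- every column family's data is divided, even a tiny one.
# Illustrative only; not HBase's actual split policy implementation.

MAX_FILESIZE = 10 * 1024**3  # hbase.hregion.max.filesize (10 GB assumed)

def should_split(region_hfiles):
    """region_hfiles: {cf_name: [hfile sizes]}. Split if ANY file is too big."""
    return any(size >= MAX_FILESIZE
               for sizes in region_hfiles.values()
               for size in sizes)

region = {
    "cf_b": [11 * 1024**3],   # one 11 GB file -> triggers the split
    "cf_a": [4 * 1024**2],    # only 4 MB of data, but it splits too
}
daughters = []
if should_split(region):
    # Both daughters get a share of every CF; cf_a's pieces are tiny files.
    daughters = [{cf: [s / 2 for s in sizes] for cf, sizes in region.items()}
                 for _ in range(2)]
print(len(daughters), sorted(daughters[0]))
```

The end result matches the text: `cf_a` is fragmented into small files across two regions even though it never approached the split threshold.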