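Some details of the HBase region split operation. The split steps themselves are covered in plenty of documentation; this post focuses on how the RegionServer chooses the split point.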
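First, a recommendation: Hannibal, an open-source tool for viewing the distribution of HBase regions in a web UI. I suggest running Hannibal under daemontools so that it is restarted automatically if it exits unexpectedly; an earlier post of mine describes how to manage processes with daemontools.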
Suppose there is an HBase table like the one shown below, in which one region is relatively large. That region can be split manually.
HBase's physical storage hierarchy looks like this:
Table                     (HBase table)
    Region                (Regions for the table)
        Store             (Store per ColumnFamily for each Region for the table)
            MemStore      (MemStore for each Store for each Region for the table)
            StoreFile     (StoreFiles for each Store for each Region for the table)
                Block     (Blocks within a StoreFile within a Store for each Region for the table)
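A common split policy is ConstantSizeRegionSplitPolicy. Its setting hbase.hregion.max.filesize refers to the size of a single store (one store per column family), not to the size of the whole region. On HDFS, a store is the directory:
/<hdfs-dir>/<hbasetable>/<xxx(part of region-id)>/<column-family>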
The check for whether a split is needed runs when a memstore is flushed to store files and when several store files are compacted.
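The size check can be sketched as follows. This is a simplified, self-contained model, not the HBase source: the class `ConstantSizeSplitCheck`, the type `StoreInfo`, and the 10 GB threshold are hypothetical names and values chosen for illustration. Refusing to split while any store still holds reference files is my assumption about where the policy applies that rule. The point to notice is that the threshold is compared against each store individually, so a region whose total size exceeds the limit may still not split.

```java
import java.util.Arrays;
import java.util.List;

public class ConstantSizeSplitCheck {

    static final long GB = 1024L * 1024 * 1024;

    /** Hypothetical stand-in for an HBase store (one per column family). */
    static final class StoreInfo {
        final String family;
        final long sizeBytes;
        final boolean hasReferences;

        StoreInfo(String family, long sizeBytes, boolean hasReferences) {
            this.family = family;
            this.sizeBytes = sizeBytes;
            this.hasReferences = hasReferences;
        }
    }

    /**
     * Returns true if at least one store is larger than maxFileSize.
     * Assumption: a store that still holds reference files blocks the split.
     */
    static boolean shouldSplit(List<StoreInfo> stores, long maxFileSize) {
        boolean foundBigStore = false;
        for (StoreInfo store : stores) {
            if (store.hasReferences) {
                return false;
            }
            if (store.sizeBytes > maxFileSize) {
                foundBigStore = true;
            }
        }
        return foundBigStore;
    }

    public static void main(String[] args) {
        // Example value standing in for hbase.hregion.max.filesize.
        long maxFileSize = 10 * GB;

        // 12 GB in total, but no single store is over the limit.
        List<StoreInfo> twoMediumStores = Arrays.asList(
                new StoreInfo("cf1", 6 * GB, false),
                new StoreInfo("cf2", 6 * GB, false));

        // One store alone is over the limit.
        List<StoreInfo> oneBigStore = Arrays.asList(
                new StoreInfo("cf1", 12 * GB, false),
                new StoreInfo("cf2", 1 * GB, false));

        System.out.println(shouldSplit(twoMediumStores, maxFileSize)); // false
        System.out.println(shouldSplit(oneBigStore, maxFileSize));     // true
    }
}
```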
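To pick the split point, the RegionServer finds the largest store that contains no reference files, then finds the largest store file inside that store, and uses the middle row key (midkey) of that store file as the split point.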
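The selection rule just described can be modeled in a few lines. This is a minimal sketch under stated assumptions, not HBase code: `SplitPointSketch`, `FileInfo`, and `StoreModel` are hypothetical types, row keys are plain strings, and each file's midkey is supplied directly rather than read from a file.

```java
import java.util.Arrays;
import java.util.List;

public class SplitPointSketch {

    /** Hypothetical stand-in for a store file. */
    static final class FileInfo {
        final long sizeBytes;
        final String midKey;
        final boolean reference;

        FileInfo(long sizeBytes, String midKey, boolean reference) {
            this.sizeBytes = sizeBytes;
            this.midKey = midKey;
            this.reference = reference;
        }
    }

    /** Hypothetical stand-in for a store (one per column family). */
    static final class StoreModel {
        final List<FileInfo> files;

        StoreModel(List<FileInfo> files) {
            this.files = files;
        }

        long size() {
            long total = 0;
            for (FileInfo f : files) {
                total += f.sizeBytes;
            }
            return total;
        }

        /** Midkey of the largest file, or null if any file is a reference. */
        String splitPoint() {
            FileInfo largest = null;
            for (FileInfo f : files) {
                if (f.reference) {
                    return null; // a store with references cannot be split
                }
                if (largest == null || f.sizeBytes > largest.sizeBytes) {
                    largest = f;
                }
            }
            return largest == null ? null : largest.midKey;
        }
    }

    /** Split point of the largest store that is able to offer one. */
    static String chooseSplitPoint(List<StoreModel> stores) {
        String best = null;
        long largestStoreSize = 0;
        for (StoreModel s : stores) {
            String candidate = s.splitPoint();
            long storeSize = s.size();
            if (candidate != null && largestStoreSize < storeSize) {
                best = candidate;
                largestStoreSize = storeSize;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        StoreModel cf1 = new StoreModel(Arrays.asList(
                new FileInfo(100, "row300", false),
                new FileInfo(400, "row120", false)));
        StoreModel cf2 = new StoreModel(Arrays.asList(
                new FileInfo(50, "row500", false)));

        // cf1 is the largest store and its 400-byte file is the largest file.
        System.out.println(chooseSplitPoint(Arrays.asList(cf1, cf2))); // row120
    }
}
```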
RegionSplitPolicy.java

    Iterator i$ = stores.values().iterator();
    while (i$.hasNext()) {
        Store s = (Store) i$.next();
        byte[] splitPoint = s.getSplitPoint();
        long storeSize = s.getSize();
        if (splitPoint != null && largestStoreSize < storeSize) {
            splitPointFromLargestStore = splitPoint;
            largestStoreSize = storeSize;
        }
    }
Store.java

    public byte[] getSplitPoint() {
        long e = 0L;                 // size of the largest store file seen so far
        StoreFile largestSf = null;
        Iterator r = this.storefiles.iterator();
        StoreFile midkey;            // decompiler-chosen name; this is the current store file
        while (r.hasNext()) {
            midkey = (StoreFile) r.next();
            org.apache.hadoop.hbase.regionserver.StoreFile.Reader mk;
            if (midkey.isReference()) {
                // A store that contains a reference file yields no split point.
                assert false : "getSplitPoint() called on a region that can\'t split!";
                mk = null;
                return (byte[]) mk;
            }
            mk = midkey.getReader();
            if (mk == null) {
                LOG.warn("Storefile " + midkey + " Reader is null");
            } else {
                long fk = mk.length();
                if (fk > e) {
                    e = fk;
                    largestSf = midkey;
                }
            }
        }
        org.apache.hadoop.hbase.regionserver.StoreFile.Reader r1 = largestSf.getReader();
        if (r1 == null) {
            LOG.warn("Storefile " + largestSf + " Reader is null");
            midkey = null;
            return (byte[]) midkey;
        }
        // The split point is the midkey of the largest store file.
        byte[] midkey1 = r1.midkey();
        // ... (rest omitted)
    }
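So a split does not actually cut the region into two exactly equal halves: the split point is the midkey of one store file only, which is not necessarily the median of the data across all store files and stores in the region.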
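A small worked example makes the imbalance concrete. It is a sketch under simplifying assumptions: `SplitBalanceSketch` is a hypothetical class, row keys are integers, a file's midkey is taken to be the key at the middle index, and sizes are counted in rows rather than bytes. Rows with keys below the split point go to the first daughter region and the rest go to the second.

```java
public class SplitBalanceSketch {

    /** Sorted keys from..toExclusive-1, standing in for one store file. */
    static int[] range(int from, int toExclusive) {
        int[] keys = new int[toExclusive - from];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = from + i;
        }
        return keys;
    }

    /** Assumption: the midkey is the key at the middle index of the file. */
    static int midKey(int[] sortedKeys) {
        return sortedKeys[sortedKeys.length / 2];
    }

    /** The file with the most rows. */
    static int[] largestFile(int[][] files) {
        int[] largest = files[0];
        for (int[] f : files) {
            if (f.length > largest.length) {
                largest = f;
            }
        }
        return largest;
    }

    /** Row counts of the two daughters: {keys < splitPoint, keys >= splitPoint}. */
    static long[] daughterSizes(int[][] files, int splitPoint) {
        long left = 0;
        long right = 0;
        for (int[] f : files) {
            for (int key : f) {
                if (key < splitPoint) {
                    left++;
                } else {
                    right++;
                }
            }
        }
        return new long[] {left, right};
    }

    public static void main(String[] args) {
        // One store with two files: 60 rows (keys 0..59) and 40 rows (keys 100..139).
        int[][] files = {range(0, 60), range(100, 140)};

        int splitPoint = midKey(largestFile(files));
        long[] sizes = daughterSizes(files, splitPoint);

        System.out.println("splitPoint=" + splitPoint
                + ", left=" + sizes[0] + ", right=" + sizes[1]);
        // prints: splitPoint=30, left=30, right=70
    }
}
```

The larger file is halved evenly, but every row of the smaller file lands in the second daughter, so the region splits 30/70 rather than 50/50.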
References:
http://blog.javachen.com/2014/01/16/hbase-region-split-policy.html
http://www.cnblogs.com/niurougan/articles/3975463.html
http://hbase.group.iteye.com/group/topic/40359