CDH 6.1.0 HBase: RegionTooBusyException on batch puts


【Problem Description】

Recently, while using OGG to sync MySQL data into HBase, one heavily-written table (roughly 30 million+ operation records per day on that single table) caused the client to report RegionTooBusyException.

OGG sync data flow:

  MySQL  ->  OGG  ->  Kafka  ->  consumer  ->  HBase

OGG reads the MySQL binlog, converts each change into a specific JSON format, and sends it to Kafka; to preserve operation ordering, each table's records are sent to a fixed partition. A Java consumer then writes the Kafka data into HBase. When one table's change volume grew too large, inserting rows into HBase one at a time was too slow, so we switched to batched / concurrent puts (sketched after the stack trace below), at which point the following exception appeared:

[2019-10-26 16:34:03][(AsyncRequestFutureImpl.java:765)] INFO id=1, table=P2P.tbBorrowerBill, attempt=8/16, failed=35ops, last exception=org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: StoreTooBusy,P2P.tbBorrowerBill,89900059934,1566331971733.44e09c036996460648ebccc8a1e5f1b7.:DPP Above parallelPutToStoreThreadLimit(10)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1063)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:966)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:929)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2680)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42014)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
on worker03-cdh-prd-bjidc,16020,1571711463301, tracking started null, retrying after=10080ms, replay=35ops
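
For reference, the batch-put path in our consumer looks roughly like the sketch below. This is a minimal illustration rather than the actual consumer code: the column family "cf", the row keys, and the values are made up; only the table name and the 134-column width come from this post.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchPutSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("P2P.tbBorrowerBill"))) {
            // Build one batch of puts from (here: fabricated) Kafka records.
            List<Put> batch = new ArrayList<>();
            for (int row = 0; row < 35; row++) {
                Put put = new Put(Bytes.toBytes("row-" + row));
                // 134 columns per put -- above the protector's default
                // 100-column "dense columns" threshold.
                for (int col = 0; col < 134; col++) {
                    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("c" + col), Bytes.toBytes("v"));
                }
                batch.add(put);
            }
            // One multi RPC per region server; many concurrent callers doing this
            // against the same Store is what trips StoreHotnessProtector.
            table.put(batch);
        }
    }
}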

【Troubleshooting】

The exception indicates we hit a concurrency limit. Reducing the batch size and concurrency made the error go away, but put throughput did not improve.

After some googling, we found the exception is thrown by StoreHotnessProtector.java; the throwing code fragment is:

if (tooBusyStore != null) {
  String msg =
      "StoreTooBusy," + this.region.getRegionInfo().getRegionNameAsString() + ":" + tooBusyStore
          + " Above parallelPutToStoreThreadLimit(" + this.parallelPutToStoreThreadLimit + ")";
  if (LOG.isTraceEnabled()) {
    LOG.trace(msg);
  }
  throw new RegionTooBusyException(msg);
}

The class's Javadoc comment reads:

/**
 * StoreHotnessProtector is designed to help limit the concurrency of puts with dense columns, it
 * does best-effort to avoid exhausting all RS's handlers. When a lot of clients write requests with
 * dense (hundreds) columns to a Store at the same time, it will lead to blocking of RS because CSLM
 * degrades when concurrency goes up. It's not a kind of throttling. Throttling is user-oriented,
 * while StoreHotnessProtector is system-oriented, RS-self-protected mechanism.
 *
 * There are three key parameters:
 *
 * 1. parallelPutToStoreThreadLimitCheckMinColumnCount: If the amount of columns exceed this
 * threshold, the HotProtector will work, 100 by default
 *
 * 2. parallelPutToStoreThreadLimit: The amount of concurrency allowed to write puts to a Store at
 * the same time.
 *
 * 3. parallelPreparePutToStoreThreadLimit: The amount of concurrency allowed to
 * prepare writing puts to a Store at the same time.
 *
 * Notice that our writing pipeline includes three key process: MVCC acquire, writing MemStore, and
 * WAL. Only limit the concurrency of writing puts to Store(parallelPutToStoreThreadLimit) is not
 * enough since the actual concurrency of puts may still exceed the limit when MVCC contention or
 * slow WAL sync happens. This is why parallelPreparePutToStoreThreadLimit is needed.
 *
 * This protector is enabled by default and could be turned off by setting
 * hbase.region.store.parallel.put.limit to 0, supporting online configuration change.
 */

The last paragraph of the comment shows that hbase.region.store.parallel.put.limit can be set to adjust the parallel put limit. It also explains our symptom: each of our puts carries 134 columns, above the default 100-column threshold (parallelPutToStoreThreadLimitCheckMinColumnCount), so the protector engages and rejects puts once more than the default 10 threads are writing to one Store — the "Above parallelPutToStoreThreadLimit(10)" in the exception message.
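
The comment's last sentence also says the setting supports online configuration change. As a cluster-wide alternative to the per-table change below (a sketch only; we did not take this route), one could set hbase.region.store.parallel.put.limit in hbase-site.xml on the RegionServers and then reload the configuration from the HBase shell without a restart:

update_all_config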

【Solution】
We modified the table configuration with the following HBase shell commands:

disable 'UserCredit.tbCreditOperationRecord'
alter 'UserCredit.tbCreditOperationRecord', CONFIGURATION => {'hbase.region.store.parallel.put.limit.min.column.count' => 200, 'hbase.region.store.parallel.put.limit' => 100}
enable 'UserCredit.tbCreditOperationRecord'
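
To verify that the new settings took effect, the table descriptor (including its CONFIGURATION map) can be inspected in the shell:

describe 'UserCredit.tbCreditOperationRecord'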

This raises hbase.region.store.parallel.put.limit.min.column.count, the minimum column count at which the protector activates, from the default 100 to 200; our 134-column puts now fall below the threshold and skip the check entirely.

It also raises hbase.region.store.parallel.put.limit, the number of threads allowed to write puts to a single Store in parallel, from the default 10 to 100.
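
For completeness, the same per-table change can also be made programmatically. A minimal sketch using the HBase 2.x Admin API (we used the shell; this assumes an already-created Connection):

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class RaisePutLimits {
    // conn: an already-created HBase client Connection
    static void raisePutLimits(Connection conn) throws Exception {
        TableName tn = TableName.valueOf("UserCredit.tbCreditOperationRecord");
        try (Admin admin = conn.getAdmin()) {
            TableDescriptor current = admin.getDescriptor(tn);
            TableDescriptor updated = TableDescriptorBuilder.newBuilder(current)
                // The same two keys as the shell 'alter' above.
                .setValue("hbase.region.store.parallel.put.limit.min.column.count", "200")
                .setValue("hbase.region.store.parallel.put.limit", "100")
                .build();
            admin.modifyTable(updated); // regions are reopened to apply the change
        }
    }
}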

 

With these table-level changes, the cluster sustains about 2,000 puts per second on this single table, each put carrying 134 columns, which meets our real-time sync requirement.
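
For scale: 2,000 puts/s × 134 columns ≈ 268,000 cells per second written into this table's MemStores, which is precisely the dense-column hot-write pattern the protector guards against.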

 

【Summary】

To sum up: CDH 6.1.0's default per-Store write-concurrency protection is fairly restrictive for wide (dense-column) rows; raising the parallel put limit and the column-count threshold in the table configuration solved the problem.

