CDH 6.1.0 HBase: RegionTooBusyException on batch puts


【Problem Description】

Recently, while using OGG to replicate MySQL data into HBase, one heavily written table (roughly 30 million change records per day) caused the client to report RegionTooBusyException.

OGG replication flow:

  MySQL  ->  OGG  ->  Kafka  ->  consumer  ->  HBase

OGG reads the MySQL binlog, converts each change into a JSON message, and publishes it to Kafka; to preserve operation ordering, each table's messages are sent to a fixed partition. A Java consumer then writes the Kafka records into HBase. When a single table's change volume is this large, single-row puts are too slow, so we switched to batched/concurrent puts, at which point the following exception appeared:

[2019-10-26 16:34:03][(AsyncRequestFutureImpl.java:765)] INFO id=1, table=P2P.tbBorrowerBill, attempt=8/16, failed=35ops, last exception=org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: StoreTooBusy,P2P.tbBorrowerBill,89900059934,1566331971733.44e09c036996460648ebccc8a1e5f1b7.:DPP Above parallelPutToStoreThreadLimit(10)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1063)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:966)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:929)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2680)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42014)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
on worker03-cdh-prd-bjidc,16020,1571711463301, tracking started null, retrying after=10080ms, replay=35ops
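The pause visible in the log ("retrying after=10080ms" on attempt 8/16) comes from the HBase client's exponential-backoff table (HConstants.RETRY_BACKOFF, scaled by hbase.client.pause, default 100 ms, plus up to ~1% random jitter). A simplified sketch of that computation, not the actual AsyncRequestFutureImpl code:

```java
/** Simplified sketch of HBase client retry backoff (see ConnectionUtils.getPauseTime). */
final class RetryBackoff {
    // Multiplier table from HConstants.RETRY_BACKOFF.
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    /** Base pause (hbase.client.pause, default 100 ms) scaled by the attempt's multiplier. */
    static long pauseTime(long basePauseMs, int attempt) {
        int idx = Math.min(attempt, RETRY_BACKOFF.length - 1);
        // The real client additionally adds up to ~1% random jitter, which is
        // why the log shows 10080 ms rather than exactly 10000 ms.
        return basePauseMs * RETRY_BACKOFF[idx];
    }
}
```

With the default 100 ms base pause, attempt 8 maps to multiplier 100, i.e. a 10 s pause before jitter, matching the log line above.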

【Troubleshooting】

The exception points to a concurrency limit. After reducing the batch size and the number of concurrent writers, the error stopped appearing, but put throughput did not improve.

Some googling shows the exception is thrown by StoreHotnessProtector.java; the relevant snippet:

if (tooBusyStore != null) {
  String msg =
      "StoreTooBusy," + this.region.getRegionInfo().getRegionNameAsString() + ":" + tooBusyStore
          + " Above parallelPutToStoreThreadLimit(" + this.parallelPutToStoreThreadLimit + ")";
  if (LOG.isTraceEnabled()) {
    LOG.trace(msg);
  }
  throw new RegionTooBusyException(msg);
}

The class's Javadoc reads:

/**
 * StoreHotnessProtector is designed to help limit the concurrency of puts with dense columns, it
 * does best-effort to avoid exhausting all RS's handlers. When a lot of clients write requests with
 * dense (hundreds) columns to a Store at the same time, it will lead to blocking of RS because CSLM
 * degrades when concurrency goes up. It's not a kind of throttling. Throttling is user-oriented,
 * while StoreHotnessProtector is system-oriented, RS-self-protected mechanism.
 *
 * There are three key parameters:
 *
 * 1. parallelPutToStoreThreadLimitCheckMinColumnCount: If the amount of columns exceed this
 * threshold, the HotProtector will work, 100 by default
 *
 * 2. parallelPutToStoreThreadLimit: The amount of concurrency allowed to write puts to a Store at
 * the same time.
 *
 * 3. parallelPreparePutToStoreThreadLimit: The amount of concurrency allowed to
 * prepare writing puts to a Store at the same time.
 *
 * Notice that our writing pipeline includes three key process: MVCC acquire, writing MemStore, and
 * WAL. Only limit the concurrency of writing puts to Store(parallelPutToStoreThreadLimit) is not
 * enough since the actual concurrency of puts may still exceed the limit when MVCC contention or
 * slow WAL sync happens. This is why parallelPreparePutToStoreThreadLimit is needed.
 *
 * This protector is enabled by default and could be turned off by setting
 * hbase.region.store.parallel.put.limit to 0, supporting online configuration change.
 */

As the last paragraph of the Javadoc says, the per-Store parallelism limit can be changed via the hbase.region.store.parallel.put.limit setting, and the change can be applied online.
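Besides a per-table override, the limit can also be raised cluster-wide in hbase-site.xml (the values here are illustrative, mirroring the per-table change we ended up making; since the property supports online configuration change, it should be reloadable via the shell's update_all_config without a restart):

```xml
<!-- hbase-site.xml: cluster-wide defaults (illustrative values) -->
<property>
  <name>hbase.region.store.parallel.put.limit</name>
  <value>100</value> <!-- default 10; 0 disables the protector entirely -->
</property>
<property>
  <name>hbase.region.store.parallel.put.limit.min.column.count</name>
  <value>200</value> <!-- default 100; protector only engages above this column count -->
</property>
```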

【Solution】
Change the table configuration with the following HBase shell commands:

disable 'UserCredit.tbCreditOperationRecord'
alter 'UserCredit.tbCreditOperationRecord', CONFIGURATION => {'hbase.region.store.parallel.put.limit.min.column.count' => '200', 'hbase.region.store.parallel.put.limit' => '100'}
enable 'UserCredit.tbCreditOperationRecord'

This raises hbase.region.store.parallel.put.limit.min.column.count, the minimum column count at which the protector engages, from the default 100 to 200,

and hbase.region.store.parallel.put.limit, the per-Store put concurrency within a region, from the default 10 to 100.

 

With these table-level settings, the cluster sustains about 2,000 puts per second on this single table, each put carrying 134 columns, which satisfies our real-time replication requirement.
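On the consumer side, the batched puts can be organized as a small buffer that flushes every N records. Below is a self-contained sketch of that batching logic, with the actual HBase call (table.put(List&lt;Put&gt;)) abstracted behind a java.util.function.Consumer so it runs without a cluster; the class and method names are illustrative, not taken from the original consumer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Accumulates records and flushes them downstream in fixed-size batches. */
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;   // with the real client: puts -> table.put(puts)
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffer one record; flush automatically once the batch is full. */
    void add(T record) {
        pending.add(record);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Push any remaining records (call on consumer shutdown or an empty poll). */
    void flush() {
        if (!pending.isEmpty()) {
            sink.accept(new ArrayList<>(pending));  // hand off a copy, then reset
            pending.clear();
        }
    }
}
```

A Kafka consumer loop would call add for each record and flush when a poll returns empty; with the real client, the sink would wrap table.put in retry handling so a residual RegionTooBusyException is retried rather than dropped.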

 

【Summary】

In short, CDH 6.1.0's default per-Store write-concurrency limit is quite conservative; raising the parallelism limit and the column-count threshold in the table configuration resolves the issue.

