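The old HBase client API follows a logic quite different from traditional JDBC. The new API is much closer to the JDBC model and makes Connection management more explicit.
In the old API, the client also had to configure when a Put is sent to the server: immediately, one Put at a time, or accumulated and sent as a batch. New users were often unclear about this.
The new API introduces BufferedMutator, which provides a more efficient and clearer way to write.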
Comparison of interface names in HBase 0.98 and HBase 1.0
As an example, a write with the old API looks like this:
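The original listing is not present here, so the following is only a minimal sketch of a 0.98-style write. The table name "foo", the column family "f", the row key and the class name are placeholders chosen for illustration, and the code needs an hbase-client 0.98 dependency and a reachable cluster to run.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative sketch of the old (0.98-style) API; names are placeholders.
public class OldApiPutExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // The table handle is constructed directly; connection handling stays implicit.
    HTable table = new HTable(conf, "foo");
    try {
      Put put = new Put(Bytes.toBytes("row1"));
      // Put.add(family, qualifier, value) is the 0.98-era method name.
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      // With auto-flush left at its default, the Put is sent to the server right away.
      table.put(put);
    } finally {
      table.close();
    }
  }
}
```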
A write with the new API looks like this:
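Again as a sketch rather than the author's original listing, a write with the 1.0-style API might look as follows. The table name "foo", the column family "f", the row key and the class name are placeholders, and the code assumes an hbase-client 1.0 or later dependency and a reachable cluster.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative sketch of the new (1.0-style) API; names are placeholders.
public class NewApiPutExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // The Connection is created explicitly, and the Table handle is obtained from it.
    // try-with-resources closes the Table first, then the Connection.
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("foo"))) {
      Put put = new Put(Bytes.toBytes("row1"));
      put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      // Table.put is synchronous: it returns after the write has been sent.
      table.put(put);
    }
  }
}
```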
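As you can see, the new API first establishes a connection, then obtains a handle for the target table, and only then performs operations on it. Both of the examples above are synchronous writes.
Now let's look at the batched, asynchronous write interface: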
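org.apache.hadoop.hbase.client.BufferedMutator is used to operate on a single HBase table. It plays a role similar to Table, but is intended for batched, asynchronous writes.
BufferedMutator replaces what HTable's setAutoFlush(false) used to do.
A BufferedMutator instance is obtained from a Connection instance. When you are done with it, call close() on the BufferedMutator, which also flushes any edits still in the buffer. A BufferedMutator is configured through BufferedMutatorParams.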
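As a small illustration of that configuration step, the sketch below sets a write buffer size through BufferedMutatorParams before obtaining the mutator. The 4 MB buffer size, the table name "foo", the column family "f", the row keys and the class name are arbitrary values picked for the example, not recommendations.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative sketch; buffer size and names are assumptions made for this example.
public class BufferedMutatorConfigSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // Configure the mutator: target table plus a client-side write buffer of 4 MB.
    BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("foo"))
        .writeBufferSize(4L * 1024 * 1024);
    try (Connection conn = ConnectionFactory.createConnection(conf);
         BufferedMutator mutator = conn.getBufferedMutator(params)) {
      for (int i = 0; i < 1000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
        // mutate() only buffers the edit; it may or may not trigger a send.
        mutator.mutate(put);
      }
      // Optional explicit flush; close() at the end of the try block flushes as well.
      mutator.flush();
    }
  }
}
```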
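MapReduce jobs are a typical use case for BufferedMutator. A MapReduce job benefits from batched writes but has no natural point at which to flush.
BufferedMutator receives the Puts from the MapReduce job, batches them based on a heuristic (such as the accumulated size of the Puts), and submits the batches asynchronously, so the MapReduce logic continues without interruption.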
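To make the idea of size-triggered, asynchronous batching concrete, here is a toy model that uses only the JDK. It is not HBase code and does not reflect how BufferedMutator is implemented internally; the class SizeTriggeredBuffer and all of its methods are hypothetical names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model of size-triggered, asynchronous batching. Hypothetical class, not HBase code. */
public class SizeTriggeredBuffer {
  private final long thresholdBytes;
  private final ExecutorService sender = Executors.newSingleThreadExecutor();
  private final List<List<byte[]>> sentBatches = new CopyOnWriteArrayList<>();
  private List<byte[]> pending = new ArrayList<>();
  private long pendingBytes = 0;

  public SizeTriggeredBuffer(long thresholdBytes) {
    this.thresholdBytes = thresholdBytes;
  }

  /** Buffers one edit and hands the batch to the sender thread once the threshold is reached. */
  public synchronized void mutate(byte[] edit) {
    pending.add(edit);
    pendingBytes += edit.length;
    if (pendingBytes >= thresholdBytes) {
      flush();
    }
  }

  /** Submits whatever is buffered as one batch and returns without waiting for it. */
  public synchronized void flush() {
    if (pending.isEmpty()) {
      return;
    }
    final List<byte[]> batch = pending;
    pending = new ArrayList<>();
    pendingBytes = 0;
    // Recording the batch stands in for the batched RPC to the server.
    sender.execute(() -> sentBatches.add(batch));
  }

  /** Flushes the remainder and waits for batches that are still in flight. */
  public void close() {
    flush();
    sender.shutdown();
    try {
      sender.awaitTermination(1, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public List<List<byte[]>> sentBatches() {
    return sentBatches;
  }

  public static void main(String[] args) {
    SizeTriggeredBuffer buffer = new SizeTriggeredBuffer(10);
    for (int i = 0; i < 7; i++) {
      buffer.mutate(new byte[4]); // 4-byte edits: every third edit crosses the 10-byte threshold
    }
    buffer.close();
    System.out.println(buffer.sentBatches().size() + " batches"); // prints "3 batches"
  }
}
```

The caller of mutate() never waits for a batch to be delivered, which is the property that lets the producing job keep running. The trade-off is that anything still held in the pending list has not been sent yet.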
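BufferedMutator can also be used in less common circumstances. In a MapReduce batch job, each thread has its own BufferedMutator.
A single BufferedMutator can also be used in high-volume online systems to batch Puts, with the caveat that under extreme circumstances, such as a JVM crash or machine failure, edits still sitting in the buffer may be lost.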
Path in the official source tree: /hbase-2.0.4/hbase-examples/src/main/java/org/apache/hadoop/hbase/client/example/BufferedMutatorExample.java
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.hbase.client.example;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.yetus.audience.InterfaceAudience;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * An example of using the {@link BufferedMutator} interface.
 */
@InterfaceAudience.Private
public class BufferedMutatorExample extends Configured implements Tool {

  private static final Logger LOG = LoggerFactory.getLogger(BufferedMutatorExample.class);

  private static final int POOL_SIZE = 10;
  private static final int TASK_COUNT = 100;
  private static final TableName TABLE = TableName.valueOf("foo");
  private static final byte[] FAMILY = Bytes.toBytes("f");

  @Override
  public int run(String[] args) throws InterruptedException, ExecutionException, TimeoutException {

    /** a callback invoked when an asynchronous write fails. */
    final BufferedMutator.ExceptionListener listener = new BufferedMutator.ExceptionListener() {
      @Override
      public void onException(RetriesExhaustedWithDetailsException e, BufferedMutator mutator) {
        for (int i = 0; i < e.getNumExceptions(); i++) {
          LOG.info("Failed to sent put " + e.getRow(i) + ".");
        }
      }
    };
    BufferedMutatorParams params = new BufferedMutatorParams(TABLE)
        .listener(listener);

    //
    // step 1: create a single Connection and a BufferedMutator, shared by all worker threads.
    //
    try (final Connection conn = ConnectionFactory.createConnection(getConf());
         final BufferedMutator mutator = conn.getBufferedMutator(params)) {

      /** worker pool that operates on BufferedTable instances */
      final ExecutorService workerPool = Executors.newFixedThreadPool(POOL_SIZE);
      List<Future<Void>> futures = new ArrayList<>(TASK_COUNT);

      for (int i = 0; i < TASK_COUNT; i++) {
        futures.add(workerPool.submit(new Callable<Void>() {
          @Override
          public Void call() throws Exception {
            //
            // step 2: each worker sends edits to the shared BufferedMutator instance. They all use
            // the same backing buffer, call-back "listener", and RPC executor pool.
            //
            Put p = new Put(Bytes.toBytes("someRow"));
            p.addColumn(FAMILY, Bytes.toBytes("someQualifier"), Bytes.toBytes("some value"));
            mutator.mutate(p);
            // do work... maybe you want to call mutator.flush() after many edits to ensure any of
            // this worker's edits are sent before exiting the Callable
            return null;
          }
        }));
      }

      //
      // step 3: clean up the worker pool, shut down.
      //
      for (Future<Void> f : futures) {
        f.get(5, TimeUnit.MINUTES);
      }
      workerPool.shutdown();
    } catch (IOException e) {
      // exception while creating/destroying Connection or BufferedMutator
      LOG.info("exception while creating/destroying Connection or BufferedMutator", e);
    } // BufferedMutator.close() ensures all work is flushed. Could be the custom listener is
      // invoked from here.
    return 0;
  }

  public static void main(String[] args) throws Exception {
    ToolRunner.run(new BufferedMutatorExample(), args);
  }
}