Preface
1. Creating a table (done by the master)
- The client first needs the master's address (the master registers its address in ZooKeeper when it starts), so the client contacts ZooKeeper to look up the master.
- The client then talks to the master, and the master creates the table (including its column families, whether blocks are cached, the maximum number of stored versions, whether to compress, and so on).
2. Reading, writing and deleting data
- The client talks directly to the RegionServers to read, write and delete data.
- Writes and deletes are both appended with markers (a delete only writes a tombstone); the data is physically removed only during a compaction.
3. Version information
API basics
CRUD operations:
put: insert a single row or multiple rows
get: read data; scan()/getScanner is used for range reads
delete: delete data
batch(): process several operations in one batch (see the sketch after this list)
scan: similar to a cursor in a relational database
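A minimal sketch of batch(), assuming the 0.94-era client API and the cfg Configuration defined in the configuration section below (row keys and values are made up; exception handling omitted):

HTable table = new HTable(cfg, "wishTest");
List<Row> actions = new ArrayList<Row>();        // org.apache.hadoop.hbase.client.Row

Put put = new Put("3".getBytes());               // Put, Get and Delete all implement Row
put.add("name".getBytes(), null, "monica".getBytes());
actions.add(put);
actions.add(new Get("1".getBytes()));            // read row "1" back in the same batch
actions.add(new Delete("2".getBytes()));         // delete row "2"

Object[] results = new Object[actions.size()];
table.batch(actions, results);                   // results[i] corresponds to actions.get(i)
table.close();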
Commonly used HTable methods (a short usage sketch follows this list):
void close(): call close() once you are done with an HTable instance (it implicitly flushes buffered writes via flushCommits)
byte[] getTableName(): returns the table name
Configuration getConfiguration(): returns the Configuration used by this HTable instance
HTableDescriptor getTableDescriptor(): returns the table schema
static boolean isTableEnabled(table): checks whether the table is enabled (i.e. not disabled)
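A short, hedged usage sketch of the methods above (it assumes the cfg Configuration defined in the next section and a table named wishTest; exception handling omitted):

HTable table = new HTable(cfg, "wishTest");
System.out.println(new String(table.getTableName()));        // table name
System.out.println(table.getTableDescriptor());              // schema: families and their settings
Configuration used = table.getConfiguration();               // the Configuration this instance was built with
System.out.println(HTable.isTableEnabled(cfg, "wishTest"));  // static check: false while the table is disabled
table.close();                                               // flushes any buffered writes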
Configuration
HBaseConfiguration represents HBase's configuration. It has two constructors:
public HBaseConfiguration() — the default constructor reads the configuration from hbase-default.xml and hbase-site.xml; if those files are not on the classpath, you have to set the properties yourself
public HBaseConfiguration(final Configuration c)
eg:
static Configuration cfg = HBaseConfiguration.create();
static {
    cfg.set("hbase.zookeeper.quorum", "192.168.1.95");
    cfg.set("hbase.zookeeper.property.clientPort", "2181");
}
Note: instantiating HBaseConfiguration() directly is deprecated, so the approach below is no longer recommended:
static HBaseConfiguration cfg = null;
static {
    Configuration HBASE_CONFIG = new Configuration();
    HBASE_CONFIG.set("hbase.zookeeper.quorum", "192.168.1.95");
    HBASE_CONFIG.set("hbase.zookeeper.property.clientPort", "2181");
    cfg = new HBaseConfiguration(HBASE_CONFIG);
}
Creating a table
Use the createTable method of an HBaseAdmin instance.
eg:
public static void createTable(String tableName) {
    System.out.println("************start create table**********");
    try {
        HBaseAdmin hBaseAdmin = new HBaseAdmin(cfg);
        if (hBaseAdmin.tableExists(tableName)) { // if the table already exists, delete it first, then recreate it
            hBaseAdmin.disableTable(tableName);
            hBaseAdmin.deleteTable(tableName);
            System.out.println(tableName + " is exist");
        }
        HTableDescriptor tableDescriptor = new HTableDescriptor(tableName); // represents the table schema
        tableDescriptor.addFamily(new HColumnDescriptor("name"));   // add column families
        tableDescriptor.addFamily(new HColumnDescriptor("age"));
        tableDescriptor.addFamily(new HColumnDescriptor("gender"));
        hBaseAdmin.createTable(tableDescriptor);
    } catch (MasterNotRunningException e) {
        e.printStackTrace();
    } catch (ZooKeeperConnectionException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println("*****end create table*************");
}
public static void main(String[] args) {
    try {
        String tablename = "wishTest";
        HBaseTest.createTable(tablename);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The log output:
************start create table**********
14/05/18 14:14:22 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
14/05/18 14:14:22 INFO zookeeper.ZooKeeper: Client environment:host.name=LJ-PC
14/05/18 14:14:22 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_11
14/05/18 14:14:22 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
14/05/18 14:14:22 INFO zookeeper.ZooKeeper: Client environment:java.home=D:\java\jdk1.6.0_11\jre
14/05/18 14:14:22 INFO zookeeper.ZooKeeper: Client environment:java.class.path=....
...
14/05/18 14:14:22 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 6560@LJ-PC
14/05/18 14:14:22 INFO zookeeper.ClientCnxn: Socket connection established to hadoop/192.168.1.95:2181, initiating session
14/05/18 14:14:22 INFO zookeeper.ClientCnxn: Session establishment complete on server hadoop/192.168.1.95:2181, sessionid = 0x460dd23bda0007, negotiated timeout = 180000
14/05/18 14:14:22 INFO zookeeper.ZooKeeper: Session: 0x460dd23bda0007 closed
14/05/18 14:14:22 INFO zookeeper.ClientCnxn: EventThread shut down
*****end create table*************
Verify the result on the CentOS server (for example with list/describe in the HBase shell) or in the HBase web UI.
Other useful HTableDescriptor methods (a combined sketch follows these two lists):
- setMaxFileSize: sets the maximum region size
- setMemStoreFlushSize: sets the memstore size at which data is flushed to HDFS; the default is 64 MB
- public void addFamily(final HColumnDescriptor family)
Other useful HColumnDescriptor methods:
- setTimeToLive: sets the TTL (in seconds); expired data is removed automatically
- setInMemory: whether to keep this family in memory; useful for small, hot tables. Off by default
- setBloomFilter: whether to use a Bloom filter, which speeds up random reads. Off by default
- setCompressionType: sets the compression algorithm. No compression by default
- setScope(scope): the replication scope for cluster replication. Disabled (0) by default
- setBlocksize(blocksize): the block size, 64 KB by default. Small blocks favour random reads but can inflate the block index and exhaust memory; large blocks favour sequential reads
- setMaxVersions: the maximum number of versions kept per cell. The default is 3 and the maximum is Integer.MAX_VALUE, but keeping too many versions can cause out-of-memory errors during compaction
- setBlockCacheEnabled: whether blocks may be cached. Enabled by default; the blocks of recently read data are kept in the block cache, marked single on first access and promoted to multi when hit again
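A combined, hedged sketch of the setters listed above (0.94-era API; the table name "wishTest2", family "info" and all values are illustrative, not part of the original example; exception handling omitted):

HTableDescriptor td = new HTableDescriptor("wishTest2");
td.setMaxFileSize(512L * 1024 * 1024);           // split a region once it grows past ~512 MB
td.setMemStoreFlushSize(128L * 1024 * 1024);     // flush the memstore to HDFS at ~128 MB

HColumnDescriptor cf = new HColumnDescriptor("info");
cf.setMaxVersions(1);                            // keep only the newest version of each cell
cf.setTimeToLive(7 * 24 * 3600);                 // TTL in seconds; expired cells go away after compaction
cf.setInMemory(true);                            // prefer keeping this family's blocks in memory
cf.setBlocksize(16 * 1024);                      // smaller blocks for random-read-heavy access
cf.setBlockCacheEnabled(true);
td.addFamily(cf);

new HBaseAdmin(cfg).createTable(td);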
Inserting data
The table is obtained through HTable. Note: HTable is not thread-safe, so when several threads insert concurrently, HTablePool is recommended.
Data is inserted with put, either one row at a time or as a list of rows; the put methods are:
public void put(final Put put) throws IOException
public void put(final List<Put> puts) throws IOException
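A hedged sketch of the List<Put> overload (same cfg as above; row keys and values are made up; exception handling omitted):

List<Put> puts = new ArrayList<Put>();
Put p1 = new Put("4".getBytes());
p1.add("name".getBytes(), null, "ross".getBytes());
Put p2 = new Put("5".getBytes());
p2.add("name".getBytes(), null, "rachel".getBytes());
puts.add(p1);
puts.add(p2);
new HTable(cfg, "wishTest").put(puts);           // both rows are submitted in one call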
Commonly used Put methods:
- add: adds a cell (family, qualifier, value)
- setTimeStamp: sets the default timestamp for all cells in this Put; any cell without an explicit timestamp uses this value. If it is never called, HBase uses the current time for cells whose timestamp was not specified
- setWriteToWAL: WAL stands for Write-Ahead Log; this controls whether HBase writes the log entry before applying the insert. It is on by default; turning it off improves performance, but data may be lost if the system fails (the RegionServer handling the insert crashes)
The following two HTable settings affect insert performance:
- setAutoFlush:
AutoFlush controls whether every Put is submitted to the HBase server as soon as it is called. The default is true, so every Put is submitted immediately; for single-row Puts this means more I/O and lower throughput. When issuing many Puts, set setAutoFlush(false) on the HTable, otherwise every Put sends a request to the RegionServer. With autoFlush = false, a request is only sent once the write buffer is full; you can also flush explicitly by calling flushCommits(), and HTable's close() triggers flushCommits() as well
- setWriteBufferSize:
The write buffer size only matters when AutoFlush is false. The default is 2 MB: once the buffered data exceeds 2 MB, it is automatically submitted to the server. A sketch combining these two settings follows.
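A hedged sketch of bulk loading with these two settings (same cfg as above; exception handling omitted):

HTable table = new HTable(cfg, "wishTest");
table.setAutoFlush(false);                       // buffer Puts on the client instead of sending each one
table.setWriteBufferSize(4L * 1024 * 1024);      // auto-submit once ~4 MB of Puts have accumulated
for (int i = 0; i < 10000; i++) {
    Put put = new Put(String.valueOf(i).getBytes());
    put.add("name".getBytes(), null, ("user" + i).getBytes());
    table.put(put);                              // goes into the local write buffer
}
table.flushCommits();                            // push whatever is still buffered
table.close();                                   // close() calls flushCommits() as well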
eg:
public static void insert(String tableName) {
    System.out.println("************start insert ************");
    HTablePool pool = new HTablePool(cfg, 1000);
    Put put = new Put("1".getBytes()); // one Put represents one row; each row has a unique rowkey, here "1"
    put.add("name".getBytes(), null, "wish".getBytes());     // first column of this row
    put.add("age".getBytes(), null, "20".getBytes());        // second column
    put.add("gender".getBytes(), null, "female".getBytes()); // third column
    try {
        pool.getTable(tableName).put(put);
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println("************end insert************");
}
public static void main(String[] args) {
    try {
        String tablename = "wishTest";
        HBaseTest.insert(tablename);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The log output:
************start insert ************
14/05/18 15:01:17 WARN hbase.HBaseConfiguration: instantiating HBaseConfiguration() is deprecated. Please use HBaseConfiguration#create() to construct a plain Configuration
14/05/18 15:01:17 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
14/05/18 15:01:17 INFO zookeeper.ZooKeeper: Client environment:host.name=LJ-PC
14/05/18 15:01:17 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_11
14/05/18 15:01:17 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
14/05/18 15:01:17 INFO zookeeper.ZooKeeper: Client environment:java.home=D:\java\jdk1.6.0_11\jre
.....
14/05/18 15:01:17 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
14/05/18 15:01:17 INFO zookeeper.ZooKeeper: Client environment:os.name=Windows Vista
14/05/18 15:01:17 INFO zookeeper.ZooKeeper: Client environment:os.arch=x86
14/05/18 15:01:17 INFO zookeeper.ZooKeeper: Client environment:os.version=6.1
14/05/18 15:01:17 INFO zookeeper.ZooKeeper: Client environment:user.name=root
14/05/18 15:01:17 INFO zookeeper.ZooKeeper: Client environment:user.home=C:\Users\LJ
14/05/18 15:01:17 INFO zookeeper.ZooKeeper: Client environment:user.dir=D:\java\eclipse4.3-jee-kepler-SR1-win32\workspace\hadoop
14/05/18 15:01:17 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=192.168.1.95:2181 sessionTimeout=180000 watcher=hconnection
14/05/18 15:01:17 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 1252@LJ-PC
14/05/18 15:01:17 INFO zookeeper.ClientCnxn: Opening socket connection to server hadoop/192.168.1.95:2181. Will not attempt to authenticate using SASL (無法定位登錄配置)
14/05/18 15:01:17 INFO zookeeper.ClientCnxn: Socket connection established to hadoop/192.168.1.95:2181, initiating session
14/05/18 15:01:17 INFO zookeeper.ClientCnxn: Session establishment complete on server hadoop/192.168.1.95:2181, sessionid = 0x460dd23bda000b, negotiated timeout = 180000
************end insert************
The inserted row can now be verified in the HBase shell or in the web UI.
Querying data
There are single-row reads and batch reads: a single row is read with get, while HTable's getScanner performs a batch (scan) read.
public Result get(final Get get) // single-row read
public ResultScanner getScanner(final Scan scan) // batch read (scan)
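Both a Get and a Scan can be narrowed before being sent to the RegionServer; a hedged sketch (the family and row range are illustrative):

Get get = new Get("1".getBytes());
get.addFamily("name".getBytes());                // fetch only the name family

Scan scan = new Scan();
scan.setStartRow("1".getBytes());                // inclusive
scan.setStopRow("3".getBytes());                 // exclusive
scan.addFamily("name".getBytes());
scan.setCaching(100);                            // rows fetched per RPC while iterating the scanner

These objects are then passed to get(get) and getScanner(scan) exactly as in the examples below.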
eg: single-row read:
public static void querySingle(String tableName) {
    HTablePool pool = new HTablePool(cfg, 1000);
    try {
        Get get = new Get("1".getBytes()); // look up by rowkey
        Result r = pool.getTable(tableName).get(get);
        System.out.println("rowkey:" + new String(r.getRow()));
        for (KeyValue keyValue : r.raw()) {
            System.out.println("column:" + new String(keyValue.getFamily())
                    + " value:" + new String(keyValue.getValue()));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
public static void main(String[] args) {
    try {
        String tablename = "wishTest";
        HBaseTest.querySingle(tablename);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Output:
rowkey:1
column:age value:20
column:gender value:female
column:name value:wish
eg: scan (all rows):
public static void queryAll(String tableName) {
    HTablePool pool = new HTablePool(cfg, 1000);
    try {
        ResultScanner rs = pool.getTable(tableName).getScanner(new Scan());
        for (Result r : rs) {
            System.out.println("rowkey:" + new String(r.getRow()));
            for (KeyValue keyValue : r.raw()) {
                System.out.println("column:" + new String(keyValue.getFamily())
                        + " value:" + new String(keyValue.getValue()));
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
public static void main(String[] args) {
    try {
        String tablename = "wishTest";
        HBaseTest.queryAll(tablename);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Output:
rowkey:1
column:age value:20
column:gender value:female
column:name value:wish
rowkey:112233bbbcccc
column:age value:20
column:gender value:female
column:name value:wish
rowkey:2
column:age value:20
column:gender value:female
column:name value:rain
Deleting data
Use HTable's delete to remove data:
public void delete(final Delete delete)
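A Delete does not have to remove the whole row; a hedged sketch of finer-grained deletes (family and qualifier names are illustrative; exception handling omitted):

HTable table = new HTable(cfg, "wishTest");
Delete d = new Delete("1".getBytes());
d.deleteColumn("score".getBytes(), "Math".getBytes());   // only the newest version of score:Math
d.deleteFamily("gender".getBytes());                     // every cell of the gender family in this row
table.delete(d);
table.close();

The example below deletes a complete row instead.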
eg:
public static void deleteRow(String tablename, String rowkey) {
    try {
        HTable table = new HTable(cfg, tablename);
        List<Delete> list = new ArrayList<Delete>();
        Delete d1 = new Delete(rowkey.getBytes());
        list.add(d1);
        table.delete(list);
        System.out.println("Row deleted successfully!");
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public static void main(String[] args) {
    try {
        String tablename = "wishTest";
        System.out.println("**************************** data before delete **********************");
        HBaseTest.queryAll(tablename);
        HBaseTest.deleteRow(tablename, "112233bbbcccc");
        System.out.println("**************************** data after delete **********************");
        HBaseTest.queryAll(tablename);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Output:
**************************** data before delete **********************
rowkey:1
column:age value:20
column:gender value:female
column:name value:wish
rowkey:112233bbbcccc
column:age value:20
column:gender value:female
column:name value:wish
rowkey:2
column:age value:20
column:gender value:female
column:name value:rain
Row deleted successfully!
**************************** data after delete **********************
rowkey:1
column:age value:20
column:gender value:female
column:name value:wish
rowkey:2
column:age value:20
column:gender value:female
column:name value:rain
Dropping a table
As in the hbase shell, a table must be disabled before it can be deleted: disableTable disables it and deleteTable then drops it.
As with table creation, this is done through HBaseAdmin.
eg:
public static void dropTable(String tableName) {
    try {
        HBaseAdmin admin = new HBaseAdmin(cfg);
        admin.disableTable(tableName);
        admin.deleteTable(tableName);
        System.out.println("table: " + tableName + " dropped successfully!");
    } catch (MasterNotRunningException e) {
        e.printStackTrace();
    } catch (ZooKeeperConnectionException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public static void main(String[] args) {
    try {
        String tablename = "wishTest";
        HBaseTest.dropTable(tablename);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Output:
table: wishTest dropped successfully!
Additional notes
1. To add a column family to an existing table, use HBaseAdmin's addColumn method.
eg: note that the table must be disabled first:
/*
 * Add a column family to an existing table; the table must be disabled first.
 */
public static void addMyColumn(String tableName, String columnFamily) {
    System.out.println("************start add column ************");
    HBaseAdmin hBaseAdmin = null;
    try {
        hBaseAdmin = new HBaseAdmin(cfg);
        hBaseAdmin.disableTable(tableName);
        HColumnDescriptor hd = new HColumnDescriptor(columnFamily);
        hBaseAdmin.addColumn(tableName, hd);
    } catch (MasterNotRunningException e) {
        e.printStackTrace();
    } catch (ZooKeeperConnectionException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            hBaseAdmin.enableTable(tableName);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    System.out.println("************end add Column ************");
}
2. Put.add can also take a qualifier, i.e. a column inside the column family.
eg:
put.add("score".getBytes(), "Math".getBytes(), "90".getBytes()); // the fourth column of this row: qualifier "Math" in family "score"
public static void insert(String tableName) {
    System.out.println("************start insert ************");
    HTablePool pool = new HTablePool(cfg, 1000);
    // HTable table = (HTable) pool.getTable(tableName);
    Put put = new Put("6".getBytes()); // one Put per row; a second row needs a new Put; the rowkey is the value passed to the constructor
    put.add("name".getBytes(), null, "Joey".getBytes());     // first column of this row
    put.add("age".getBytes(), null, "20".getBytes());        // second column
    put.add("gender".getBytes(), null, "male".getBytes());   // third column
    put.add("score".getBytes(), "Math".getBytes(), "90".getBytes());      // family "score", qualifier "Math"
    put.add("score".getBytes(), "English".getBytes(), "100".getBytes());  // family "score", qualifier "English"
    put.add("score".getBytes(), "Chinese".getBytes(), "100".getBytes());  // the second argument is the qualifier
    try {
        pool.getTable(tableName).put(put);
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println("************end insert************");
}
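To read a single qualifier back from the row inserted above, Result.getValue can be used; a hedged sketch (same pool, cfg and tableName as in the method above; exception handling omitted):

Get get = new Get("6".getBytes());
Result r = pool.getTable(tableName).get(get);
byte[] math = r.getValue("score".getBytes(), "Math".getBytes());
System.out.println("score:Math = " + new String(math));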
Complete code
package wish.hbase;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
//import org.apache.hadoop.hbase.io.BatchUpdate;

public class HBaseCRUD {

    static Configuration cfg = HBaseConfiguration.create();
    static {
        cfg.set("hbase.zookeeper.quorum", "192.168.1.95");
        cfg.set("hbase.zookeeper.property.clientPort", "2181");
    }

    /**
     * Create a table.
     */
    public static void createTable(String tableName) {
        System.out.println("************start create table**********");
        try {
            HBaseAdmin hBaseAdmin = new HBaseAdmin(cfg);
            if (hBaseAdmin.tableExists(tableName)) { // if the table already exists, delete it first, then recreate it
                hBaseAdmin.disableTable(tableName);
                hBaseAdmin.deleteTable(tableName);
                System.out.println(tableName + " is exist");
            }
            HTableDescriptor tableDescriptor = new HTableDescriptor(tableName); // represents the table schema
            tableDescriptor.addFamily(new HColumnDescriptor("name")); // add column families
            tableDescriptor.addFamily(new HColumnDescriptor("age"));
            tableDescriptor.addFamily(new HColumnDescriptor("gender"));
            hBaseAdmin.createTable(tableDescriptor);
        } catch (MasterNotRunningException e) {
            e.printStackTrace();
        } catch (ZooKeeperConnectionException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("*****end create table*************");
    }

    /*
     * Add a column family to an existing table; the table must be disabled first.
     */
    public static void addMyColumn(String tableName, String columnFamily) {
        System.out.println("************start add column ************");
        HBaseAdmin hBaseAdmin = null;
        try {
            hBaseAdmin = new HBaseAdmin(cfg);
            hBaseAdmin.disableTable(tableName);
            HColumnDescriptor hd = new HColumnDescriptor(columnFamily);
            hBaseAdmin.addColumn(tableName, hd);
        } catch (MasterNotRunningException e) {
            e.printStackTrace();
        } catch (ZooKeeperConnectionException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                hBaseAdmin.enableTable(tableName);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        System.out.println("************end add Column ************");
    }

    /*
     * Insert data.
     */
    public static void insert(String tableName) {
        System.out.println("************start insert ************");
        HTablePool pool = new HTablePool(cfg, 1000);
        // HTable table = (HTable) pool.getTable(tableName);
        Put put = new Put("6".getBytes()); // one Put per row; a second row needs a new Put; the rowkey is the value passed to the constructor
        put.add("name".getBytes(), null, "Joey".getBytes());     // first column of this row
        put.add("age".getBytes(), null, "20".getBytes());        // second column
        put.add("gender".getBytes(), null, "male".getBytes());   // third column
        put.add("score".getBytes(), "Math".getBytes(), "90".getBytes());     // the second argument is the qualifier
        put.add("score".getBytes(), "English".getBytes(), "100".getBytes());
        put.add("score".getBytes(), "Chinese".getBytes(), "100".getBytes());
        try {
            pool.getTable(tableName).put(put);
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("************end insert************");
    }

    /*
     * Scan all rows.
     */
    public static void queryAll(String tableName) {
        HTablePool pool = new HTablePool(cfg, 1000);
        try {
            ResultScanner rs = pool.getTable(tableName).getScanner(new Scan());
            for (Result r : rs) {
                System.out.println("rowkey:" + new String(r.getRow()));
                for (KeyValue keyValue : r.raw()) {
                    System.out.println("column:" + new String(keyValue.getFamily())
                            + " qualifier:" + new String(keyValue.getQualifier())
                            + " value:" + new String(keyValue.getValue()));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /*
     * Read a single row.
     */
    public static void querySingle(String tableName) {
        HTablePool pool = new HTablePool(cfg, 1000);
        try {
            Get get = new Get("1".getBytes()); // look up by rowkey
            Result r = pool.getTable(tableName).get(get);
            System.out.println("rowkey:" + new String(r.getRow()));
            for (KeyValue keyValue : r.raw()) {
                System.out.println("column:" + new String(keyValue.getFamily())
                        + " value:" + new String(keyValue.getValue()));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /*
     * Delete a row.
     */
    public static void deleteRow(String tablename, String rowkey) {
        try {
            HTable table = new HTable(cfg, tablename);
            List<Delete> list = new ArrayList<Delete>();
            Delete d1 = new Delete(rowkey.getBytes());
            list.add(d1);
            table.delete(list);
            System.out.println("Row deleted successfully!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /*
     * Drop a table.
     */
    public static void dropTable(String tableName) {
        try {
            HBaseAdmin admin = new HBaseAdmin(cfg);
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
            System.out.println("table: " + tableName + " dropped successfully!");
        } catch (MasterNotRunningException e) {
            e.printStackTrace();
        } catch (ZooKeeperConnectionException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        try {
            String tableName = "wishTest1";
            // HBaseCRUD.addMyColumn(tableName, "score");
            HBaseCRUD.queryAll(tableName);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}