HBase Installation, Client Testing, and Notes

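Massive storage
Columnar storage
Easy to scale

HBase's scalability shows up in two areas: scaling the upper processing layer (RegionServer) and scaling the storage layer (HDFS).
Adding RegionServer machines scales HBase horizontally, increasing its processing capacity and the number of Regions it can serve.

Note: a RegionServer manages Regions and handles client requests; this is described in more detail later.
Adding DataNode machines expands the storage layer, increasing HBase's storage capacity and the read/write throughput of the back-end storage.
High concurrency
Sparse

Sparseness refers to the flexibility of HBase columns: within a column family you can define any number of columns, and a column whose value is empty takes up no storage space.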
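
The sparseness described above can be pictured with a small sketch. This is only a toy model, not HBase code: the class `SparseRowSketch` and its methods are hypothetical names, and a plain sorted map stands in for the real storage. The point is that a column that was never written has no entry at all.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a sparse row: only columns that were written are stored. */
final class SparseRowSketch {
    // "family:qualifier" -> value; a column that was never written has no entry.
    private final Map<String, String> cells = new TreeMap<>();

    void put(String family, String qualifier, String value) {
        cells.put(family + ":" + qualifier, value);
    }

    String get(String family, String qualifier) {
        return cells.get(family + ":" + qualifier); // null when never written
    }

    int storedCellCount() {
        return cells.size();
    }

    public static void main(String[] args) {
        SparseRowSketch row1001 = new SparseRowSketch();
        row1001.put("info", "name", "Nick");

        SparseRowSketch row1002 = new SparseRowSketch();
        row1002.put("info", "name", "Janna");
        row1002.put("info", "age", "20");

        // Row 1001 has no info:age column, so nothing is stored for it.
        System.out.println(row1001.storedCellCount());
        System.out.println(row1002.storedCellCount());
    }
}
```

Two rows in the same column family can therefore carry different sets of columns without any placeholder for the missing ones.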

HBase Architecture

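Client
The Client contains the interfaces for accessing HBase, and it maintains a cache to speed up access, for example a cache of the .META. metadata.
ZooKeeper
HBase uses ZooKeeper for master high availability, RegionServer monitoring, the entry point to metadata, and maintenance of cluster configuration. Specifically:
ZooKeeper ensures that only one master is running in the cluster; if the master fails, a new master is chosen through a competition mechanism to provide service
ZooKeeper monitors RegionServer state; when a RegionServer fails, the Master is notified through a callback that the RegionServer has come online or gone offline
ZooKeeper stores the unified entry address of the metadata
HMaster
The master node's main responsibilities are:
Assigning Regions to RegionServers
Maintaining load balance across the whole cluster
Maintaining the cluster's metadata
Detecting failed Regions and reassigning them to healthy RegionServers
Coordinating the splitting of the corresponding HLog when a RegionServer fails
HRegionServer
The HRegionServer handles user read and write requests directly; it is the node that does the real work. Its functions are:
Managing the Regions that the master assigns to it
Handling read and write requests from clients
Interacting with the underlying HDFS and storing data in HDFS
Splitting Regions after they grow large
Merging StoreFiles
HDFS
HDFS provides the underlying data storage service for HBase and also supports HBase high availability (the HLog is stored in HDFS). Its functions are:
Providing the underlying distributed storage service for metadata and table data
Keeping multiple replicas of the data, which ensures high reliability and high availability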

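Roles in HBase

HMaster

Functions

1. Monitor RegionServers

2. Handle RegionServer failover

3. Handle metadata changes

4. Handle region assignment or transfer

5. Balance the data load during idle time

6. Publish its own location to clients through ZooKeeper

RegionServer

Functions

1. Store HBase's actual data

2. Handle the Regions assigned to it

3. Flush the cache to HDFS

4. Maintain the HLog

5. Perform compactions

6. Handle Region splits

Other components

1. Write-Ahead logs

The record of HBase modifications. When data is written to HBase, it is not written directly to disk; it stays in memory for a while (both the time and the data-size threshold are configurable). Keeping data only in memory raises the risk of data loss, so the data is first written to a file called the Write-Ahead logfile and only then written to memory. If the system fails, the data can be rebuilt from this log file.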
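
The log-first ordering can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the HBase implementation: `WalSketch` is a hypothetical class, and an in-memory list passed in by the caller stands in for the durable log file that would survive a crash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy write-ahead log: every edit is appended to the log before it reaches memory. */
final class WalSketch {
    // Stands in for the durable log file (assumption: it survives a crash).
    private final List<String[]> log;
    // In-memory store; its contents are lost when the process dies.
    private final Map<String, String> memStore = new TreeMap<>();

    /** Starting up with an existing log replays it, which rebuilds the in-memory state. */
    WalSketch(List<String[]> durableLog) {
        this.log = durableLog;
        for (String[] edit : durableLog) {
            memStore.put(edit[0], edit[1]);
        }
    }

    void put(String key, String value) {
        log.add(new String[] {key, value}); // 1. append to the log first
        memStore.put(key, value);           // 2. then apply the edit in memory
    }

    String get(String key) {
        return memStore.get(key);
    }

    public static void main(String[] args) {
        List<String[]> disk = new ArrayList<>();
        WalSketch beforeCrash = new WalSketch(disk);
        beforeCrash.put("1001/info:name", "Nick");

        // Simulated crash: the old object is discarded, only the log remains.
        WalSketch afterRestart = new WalSketch(disk);
        System.out.println(afterRestart.get("1001/info:name"));
    }
}
```

Because the append happens before the in-memory update, any edit visible in memory is guaranteed to be in the log as well.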

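2. Region

A shard of an HBase table. An HBase table is split into regions by RowKey value, and the regions are stored on RegionServers; a single RegionServer can hold multiple different regions.

3. Store

HFiles are stored in a Store; one Store corresponds to one column family of an HBase table.

4. MemStore

As the name suggests, this is in-memory storage that holds the current data operations: once data has been saved in the WAL, the RegionServer stores the key-value pairs in memory.

5. HFile

This is the actual physical file that holds the raw data on disk. A StoreFile is stored in HDFS in HFile format.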

HBase Installation

ZooKeeper deployed and running

bin/zkServer.sh start

Hadoop deployed and running

sbin/start-dfs.sh

sbin/start-yarn.sh

Installing HBase

1. Extract to the target directory

tar -zxvf hbase-1.3.1-bin.tar.gz -C /opt/module

2. Configuration files

Changes to hbase-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144
export HBASE_MANAGES_ZK=false

Changes to hbase-site.xml

<configuration>
	<property>
		<name>hbase.rootdir</name>
		<value>hdfs://hadoop102:9000/hbase</value>
	</property>

	<property>
		<name>hbase.cluster.distributed</name>
		<value>true</value>
	</property>

   <!-- Changed after 0.98: earlier versions had no .port property, and the default port was 60000 -->
	<property>
		<name>hbase.master.port</name>
		<value>16000</value>
	</property>

	<property>   
		<name>hbase.zookeeper.quorum</name>
	     <value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
	</property>

	<property>   
		<name>hbase.zookeeper.property.dataDir</name>
	     <value>/opt/module/zookeeper-3.4.10/zkData</value>
	</property>
</configuration>

regionservers:
```
hadoop100
hadoop101
hadoop102
```

Symlink the Hadoop configuration files into HBase

ln -s /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml /opt/module/hbase/conf/core-site.xml

ln -s /opt/module/hadoop-2.7.2/etc/hadoop/hdfs-site.xml /opt/module/hbase/conf/hdfs-site.xml

Sync to the other servers

xsync.sh hbase/

Starting the services

Startup method 1

bin/hbase-daemon.sh start master
bin/hbase-daemon.sh start regionserver

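If the clocks of the cluster nodes are not synchronized, the RegionServer fails to start and throws ClockOutOfSyncException.

Fix 1: synchronize the clocks (see the time synchronization setup in the Hadoop notes).

Fix 2: set the hbase.master.maxclockskew property to a larger value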

<property>
        <name>hbase.master.maxclockskew</name>
        <value>180000</value>
        <description>Time difference of regionserver from master</description>
 </property>

Startup method 2

bin/start-hbase.sh

bin/stop-hbase.sh

View the HBase web page

http://hadoop100:16010

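HBase Shell Operations

Basic operations

Enter the client command line

bin/hbase shell
Help

help
List the tables in the current database

list
To delete characters in the shell, use Ctrl+Backspace.

Table operations

Create a table

create 'student','info'

Insert data into the table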

hbase(main):003:0> put 'student','1001','info:sex','male'
hbase(main):004:0> put 'student','1001','info:age','18'
hbase(main):005:0> put 'student','1002','info:name','Janna'
hbase(main):006:0> put 'student','1002','info:sex','female'
hbase(main):007:0> put 'student','1002','info:age','20'

Scan the table

hbase(main):008:0> scan 'student'
hbase(main):009:0> scan 'student',{STARTROW => '1001', STOPROW  => '1001'}
hbase(main):010:0> scan 'student',{STARTROW => '1001'}

View the table structure

hbase(main):011:0> describe 'student'

Update the data of specified fields

hbase(main):012:0> put 'student','1001','info:name','Nick'
hbase(main):013:0> put 'student','1001','info:age','100'

View the data of a specified row or a specified "column family:column"

hbase(main):014:0> get 'student','1001'
hbase(main):015:0> get 'student','1001','info:name'

Count the rows in the table
```
 count 'student'
```

Delete data

Delete all data for a rowkey:
hbase(main):016:0> deleteall 'student','1001'
Delete one column of a rowkey:
hbase(main):017:0> delete 'student','1002','info:sex'

Truncate the table

truncate 'student'

Drop the table

First put the table into the disabled state:
hbase(main):019:0> disable 'student'
Only then can the table be dropped:
hbase(main):020:0> drop 'student'
Note: dropping the table directly fails with: ERROR: Table student is enabled. Disable it first.

Alter table information

hbase(main):022:0> alter 'student',{NAME=>'info',VERSIONS=>3}
hbase(main):022:0> get 'student','1001',{COLUMN=>'info:name',VERSIONS=>3}

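HBase Data Structure

RowKey

As in other NoSQL databases, the RowKey is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:

1. By a single RowKey

2. By a range of RowKeys

3. By a full table scan

The RowKey can be any string (maximum length 64 KB; in practice it is usually 10-100 bytes). Internally, HBase stores the RowKey as a byte array. Data is stored sorted by the RowKey's lexicographic order (byte order). When designing RowKeys, make full use of this sorted storage and place rows that are often read together next to each other (locality).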
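
Byte-order sorting has a practical consequence for numeric keys, which the following sketch tries to show. It is a plain JDK illustration, not HBase code: `RowKeyOrderSketch`, `compareBytes`, `sortAsRowKeys`, and `zeroPad` are hypothetical helper names, and UTF-8 encoding of the key strings is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Shows how keys sort when compared as unsigned bytes, as row keys are. */
final class RowKeyOrderSketch {
    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shared prefix sorts before the longer key
    }

    /** Returns a copy of the keys sorted in byte order. */
    static List<String> sortAsRowKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((x, y) -> compareBytes(
                x.getBytes(StandardCharsets.UTF_8),
                y.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    /** Left-pads a numeric id with zeros so that byte order matches numeric order. */
    static String zeroPad(long id, int width) {
        return String.format("%0" + width + "d", id);
    }

    public static void main(String[] args) {
        // Unpadded numeric strings: byte order differs from numeric order.
        System.out.println(sortAsRowKeys(Arrays.asList("9", "10", "100")));
        // Zero-padded keys keep numerically adjacent rows next to each other.
        System.out.println(sortAsRowKeys(
                Arrays.asList(zeroPad(9, 4), zeroPad(10, 4), zeroPad(100, 4))));
    }
}
```

Fixed-width keys are one common way to make a range of RowKeys correspond to a contiguous numeric range; other designs (prefixes, reversed fields) follow the same byte-order rule.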

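Column Family

Every column in an HBase table belongs to a column family. Column families are part of the table schema (columns are not) and must be defined before the table is used. Column names are prefixed with their column family; for example, courses:history and courses:math both belong to the courses column family.

Cell

The unit uniquely determined by {rowkey, column family:column, version}. Data in a cell has no type; it is all stored as bytes.

Keywords: untyped, bytes

Time Stamp

In HBase, the storage unit determined by rowkey and column is called a cell. Each cell holds multiple versions of the same piece of data, and the versions are indexed by timestamp. The timestamp is a 64-bit integer. It can be assigned by HBase automatically at write time, in which case it is the current system time in milliseconds, or it can be assigned explicitly by the client. An application that needs to avoid version conflicts must generate unique timestamps itself. Within each cell, versions are sorted in reverse chronological order, so the newest data comes first.

To avoid the management burden (storage and indexing) caused by too many versions, HBase provides two ways to reclaim versions: keep the last n versions of the data, or keep the versions from a recent period (for example, the last seven days). Users can configure this per column family.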
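
The "keep the last n versions" rule and the newest-first ordering can be modelled with a reverse-ordered sorted map. This is a minimal sketch, assuming versions are trimmed eagerly on every write, which is a simplification chosen for the example; `VersionedCellSketch` is a hypothetical class, not part of the HBase API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Toy cell that keeps at most maxVersions values, newest timestamp first. */
final class VersionedCellSketch {
    private final int maxVersions;
    // timestamp -> value, ordered from newest to oldest.
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());

    VersionedCellSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        // Writing with an existing timestamp overwrites that version.
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // the last entry holds the oldest timestamp
        }
    }

    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    /** All retained values, newest first. */
    List<String> allVersions() {
        return new ArrayList<>(versions.values());
    }

    public static void main(String[] args) {
        VersionedCellSketch name = new VersionedCellSketch(3);
        name.put(1L, "a");
        name.put(2L, "b");
        name.put(3L, "c");
        name.put(4L, "d");
        System.out.println(name.latest());
        System.out.println(name.allVersions());
    }
}
```

A time-based retention rule would follow the same shape, dropping entries whose timestamp is older than a cutoff instead of counting them.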

Namespace

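1) Table: all tables are members of a namespace, that is, every table belongs to some namespace; if none is specified, the table goes in the default namespace.

2) RegionServer group: a namespace contains a default RegionServer group.

3) Permission: a namespace lets us define access control lists (ACLs), for example for creating, reading, deleting, and updating tables.

4) Quota: a namespace can enforce a limit on the number of regions it may contain.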

HBase Internals

Read path

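1) The Client first accesses ZooKeeper to find the location of the region that holds the meta table, and then reads the meta table. The meta table in turn stores the region information of the user tables;

2) Using the namespace, table name, and rowkey, it finds the corresponding region information in the meta table;

3) It finds the RegionServer that hosts this region;

4) It looks up the corresponding region;

5) It first looks for the data in the MemStore; if the data is not there, it reads from the BlockCache;

6) If the BlockCache does not have it either, it reads from the StoreFile (this ordering is for read efficiency);

7) Data read from a StoreFile is not returned directly to the client; it is first written to the BlockCache and then returned to the client.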
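
Steps 5 to 7 above amount to a tiered lookup with a cache fill on the slowest tier. The sketch below follows that description literally; `ReadPathSketch` is a hypothetical class, and three hash maps stand in for the MemStore, the BlockCache, and the on-disk StoreFile.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy read path: MemStore first, then BlockCache, then StoreFile with a cache fill. */
final class ReadPathSketch {
    final Map<String, String> memStore = new HashMap<>();
    final Map<String, String> blockCache = new HashMap<>();
    // Stands in for data on disk (assumption: the slowest tier).
    final Map<String, String> storeFile = new HashMap<>();

    String get(String key) {
        String value = memStore.get(key);
        if (value != null) {
            return value; // step 5: found in the MemStore
        }
        value = blockCache.get(key);
        if (value != null) {
            return value; // step 5: found in the BlockCache
        }
        value = storeFile.get(key); // step 6: fall back to the StoreFile
        if (value != null) {
            blockCache.put(key, value); // step 7: cache it before returning
        }
        return value;
    }

    public static void main(String[] args) {
        ReadPathSketch region = new ReadPathSketch();
        region.storeFile.put("1002/info:name", "Janna");
        System.out.println(region.get("1002/info:name"));
        // The second read of the same key is served from the BlockCache.
        System.out.println(region.blockCache.containsKey("1002/info:name"));
    }
}
```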

Write path

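1) The Client sends a write request to the HRegionServer;

2) The HRegionServer writes the data to the HLog (write-ahead log), for durability and recovery;

3) The HRegionServer writes the data to memory (MemStore);

4) It reports success back to the Client.

Data flush process

1) When the data in the MemStore reaches the threshold (128 MB by default, 64 MB in older versions), the data is flushed to disk, the in-memory data is removed, and the historical data in the HLog is deleted;

2) The data is stored in HDFS;

3) A marker is recorded in the HLog.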
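
The threshold-triggered flush can be sketched as follows. This is an illustration under simplifying assumptions, not HBase code: `FlushSketch` is a hypothetical class, size is approximated by character counts rather than real bytes, the threshold is a constructor argument instead of the 128 MB default, and an in-memory list of immutable sorted maps stands in for files on HDFS.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy MemStore that writes a sorted snapshot once a size threshold is reached. */
final class FlushSketch {
    private final long thresholdChars;
    private long memStoreChars = 0;
    private final TreeMap<String, String> memStore = new TreeMap<>();
    // Each element stands in for one immutable store file.
    private final List<SortedMap<String, String>> storeFiles = new ArrayList<>();

    FlushSketch(long thresholdChars) {
        this.thresholdChars = thresholdChars;
    }

    void put(String key, String value) {
        String previous = memStore.put(key, value);
        if (previous == null) {
            memStoreChars += key.length();      // new key
        } else {
            memStoreChars -= previous.length(); // value replaced
        }
        memStoreChars += value.length();
        if (memStoreChars >= thresholdChars) {
            flush();
        }
    }

    private void flush() {
        // Write the sorted contents out as one file, then clear the memory.
        storeFiles.add(Collections.unmodifiableSortedMap(new TreeMap<>(memStore)));
        memStore.clear();
        memStoreChars = 0;
    }

    int storeFileCount() {
        return storeFiles.size();
    }

    int memStoreEntries() {
        return memStore.size();
    }

    public static void main(String[] args) {
        FlushSketch store = new FlushSketch(10);
        store.put("a", "12345"); // 6 characters, below the threshold
        store.put("b", "12345"); // 12 characters, triggers a flush
        System.out.println(store.storeFileCount());
        System.out.println(store.memStoreEntries());
    }
}
```

Each flush produces a new file, which is why the number of store files grows over time and later needs merging.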

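Data compaction process

1) When the number of data blocks reaches 4, a compaction is triggered; the Region loads the data blocks locally and merges them;

2) When the merged data exceeds 256 MB, the Region is split, and the resulting Regions are assigned to different HRegionServers;

3) When an HRegionServer goes down, its HLog is split and handed to different HRegionServers to load, and .META. is updated;

4) Note: the HLog is synchronized to HDFS.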
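
The merge in step 1 can be pictured as combining several sorted files into one, with newer data winning over older data for the same key. The sketch below shows only that idea; `CompactionSketch.compact` is a hypothetical helper, sorted maps stand in for store files, and version retention and delete markers are left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy compaction: merges several sorted files into a single sorted file. */
final class CompactionSketch {
    /** Files are listed oldest first; for a repeated key the newest file's value wins. */
    static SortedMap<String, String> compact(List<SortedMap<String, String>> oldestFirst) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (SortedMap<String, String> file : oldestFirst) {
            merged.putAll(file); // later (newer) files overwrite earlier entries
        }
        return merged;
    }

    public static void main(String[] args) {
        SortedMap<String, String> older = new TreeMap<>();
        older.put("1001/info:age", "18");
        older.put("1002/info:age", "20");

        SortedMap<String, String> newer = new TreeMap<>();
        newer.put("1001/info:age", "100");

        System.out.println(compact(Arrays.asList(older, newer)));
    }
}
```

After the merge, a read has one file to consult instead of several, which is the motivation for compacting.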

HBase API Operations

Environment setup

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.3.1</version>
</dependency>

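HBaseAPI

Get the Configuration object

// Imports used by the snippets in this section
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public static Configuration conf;
static{
	// Create the configuration with HBaseConfiguration's static factory method
	conf = HBaseConfiguration.create();
	conf.set("hbase.zookeeper.quorum", "192.168.9.102");
	conf.set("hbase.zookeeper.property.clientPort", "2181");
}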

Check whether a table exists

public static boolean isTableExist(String tableName) throws MasterNotRunningException,
 ZooKeeperConnectionException, IOException{
	// Managing and accessing tables in HBase requires an HBaseAdmin object.
	// Alternative through a Connection:
	//Connection connection = ConnectionFactory.createConnection(conf);
	//HBaseAdmin admin = (HBaseAdmin) connection.getAdmin();
	HBaseAdmin admin = new HBaseAdmin(conf);
	try {
		return admin.tableExists(tableName);
	} finally {
		// Release the admin's resources
		admin.close();
	}
}

Create a table

public static void createTable(String tableName, String... columnFamily) throws
 MasterNotRunningException, ZooKeeperConnectionException, IOException{
	HBaseAdmin admin = new HBaseAdmin(conf);
	// Check whether the table already exists
	if(isTableExist(tableName)){
		System.out.println("Table " + tableName + " already exists");
		//System.exit(0);
	}else{
		// Create the table descriptor; the table name is wrapped in a TableName
		HTableDescriptor descriptor = new HTableDescriptor(TableName.valueOf(tableName));
		// Add each column family
		for(String cf : columnFamily){
			descriptor.addFamily(new HColumnDescriptor(cf));
		}
		// Create the table from the descriptor
		admin.createTable(descriptor);
		System.out.println("Table " + tableName + " created successfully!");
	}
	admin.close();
}

Delete a table

public static void dropTable(String tableName) throws MasterNotRunningException,
 ZooKeeperConnectionException, IOException{
	HBaseAdmin admin = new HBaseAdmin(conf);
	if(isTableExist(tableName)){
		// A table must be disabled before it can be deleted
		admin.disableTable(tableName);
		admin.deleteTable(tableName);
		System.out.println("Table " + tableName + " deleted successfully!");
	}else{
		System.out.println("Table " + tableName + " does not exist!");
	}
	admin.close();
}

Insert data into a table

public static void addRowData(String tableName, String rowKey, String columnFamily, String
 column, String value) throws IOException{
	// Create the HTable object
	HTable hTable = new HTable(conf, tableName);
	// Build a Put for the given row
	Put put = new Put(Bytes.toBytes(rowKey));
	// Add the column family, column, and value to the Put
	put.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(column), Bytes.toBytes(value));
	hTable.put(put);
	hTable.close();
	System.out.println("Data inserted successfully");
}

Delete multiple rows

public static void deleteMultiRow(String tableName, String... rows) throws IOException{
	HTable hTable = new HTable(conf, tableName);
	List<Delete> deleteList = new ArrayList<Delete>();
	for(String row : rows){
		Delete delete = new Delete(Bytes.toBytes(row));
		deleteList.add(delete);
	}
	hTable.delete(deleteList);
	hTable.close();
}

Get all data

public static void getAllRows(String tableName) throws IOException{
	HTable hTable = new HTable(conf, tableName);
	// Create the object used to scan the regions
	Scan scan = new Scan();
	// Obtain a ResultScanner from the HTable
	ResultScanner resultScanner = hTable.getScanner(scan);
	for(Result result : resultScanner){
		Cell[] cells = result.rawCells();
		for(Cell cell : cells){
			// Row key
			System.out.println("Row key: " + Bytes.toString(CellUtil.cloneRow(cell)));
			// Column family, column, and value
			System.out.println("Column family: " + Bytes.toString(CellUtil.cloneFamily(cell)));
			System.out.println("Column: " + Bytes.toString(CellUtil.cloneQualifier(cell)));
			System.out.println("Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
		}
	}
	// Release the scanner and the table
	resultScanner.close();
	hTable.close();
}

Get a single row

public static void getRow(String tableName, String rowKey) throws IOException{
	HTable table = new HTable(conf, tableName);
	Get get = new Get(Bytes.toBytes(rowKey));
	//get.setMaxVersions(); return all versions
	//get.setTimeStamp(); return the version with the given timestamp
	Result result = table.get(get);
	for(Cell cell : result.rawCells()){
		System.out.println("Row key: " + Bytes.toString(result.getRow()));
		System.out.println("Column family: " + Bytes.toString(CellUtil.cloneFamily(cell)));
		System.out.println("Column: " + Bytes.toString(CellUtil.cloneQualifier(cell)));
		System.out.println("Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
		System.out.println("Timestamp: " + cell.getTimestamp());
	}
	table.close();
}

Get a specified "column family:column" of a row

public static void getRowQualifier(String tableName, String rowKey, String family, String
 qualifier) throws IOException{
	HTable table = new HTable(conf, tableName);
	Get get = new Get(Bytes.toBytes(rowKey));
	// Restrict the Get to a single column
	get.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier));
	Result result = table.get(get);
	for(Cell cell : result.rawCells()){
		System.out.println("Row key: " + Bytes.toString(result.getRow()));
		System.out.println("Column family: " + Bytes.toString(CellUtil.cloneFamily(cell)));
		System.out.println("Column: " + Bytes.toString(CellUtil.cloneQualifier(cell)));
		System.out.println("Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
	}
	table.close();
}
