Starting from the Hadoop cluster built in HDP 2.4 Installation (Part 5): Cluster and Component Installation, this article changes the default configuration so that HBase stores its data in Azure Blob Storage.
Contents:
- Overview
- Configuration
- Verification
- FAQ
Overview:
- hadoop-azure provides Hadoop integration with Azure Blob Storage. It requires the hadoop-azure.jar package, which is already included in the HDP 2.4 installation package, as shown in the figure below:
- Once the configuration is in place, all data reads and writes go to the Azure Blob Storage account.
- Multiple Azure Blob Storage accounts can be configured; the integration implements the standard Hadoop FileSystem interface.
- File system paths are referenced with URLs using the wasb scheme (see the example after this list).
- Tested on both Linux and Windows. Tested at scale.
- Azure Blob Storage involves three concepts:
- Storage Account: All access is done through a storage account
- Container: A container is a grouping of multiple blobs. A storage account may have multiple containers. In Hadoop, an entire file system hierarchy is stored in a single container. It is also possible to configure multiple containers, effectively presenting multiple file systems that can be referenced using distinct URLs.
- Blob: A file of any type and size. In Hadoop, files are stored in blobs. The internal implementation also uses blobs to persist the file system hierarchy and other metadata.
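- For example, with the storage account and container created later in this article (both named localhbase, on the China Azure endpoint), paths can be addressed explicitly with wasb URLs; the commands below are only a sketch of the addressing scheme:
# List the root of the container "localhbase" in the account "localhbase"
hdfs dfs -ls wasb://localhbase@localhbase.blob.core.chinacloudapi.cn/
# Once fs.defaultFS points at the same container, relative paths resolve against it
hdfs dfs -ls /hbase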
Configuration:
- In the Azure China portal (https://manage.windowsazure.cn), create a blob storage account; as shown in the figure below, it is named localhbase.
- Configure the Azure Blob Storage access credentials (the account key) and switch the default file system in the local Hadoop core-site.xml file, as follows:
<property>
  <name>fs.defaultFS</name>
  <value>wasb://localhbase@localhbase.blob.core.chinacloudapi.cn</value>
</property>
<property>
  <name>fs.azure.account.key.localhbase.blob.core.chinacloudapi.cn</name>
  <value>YOUR ACCESS KEY</value>
</property>
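- A quick sanity check after this change (hypothetical commands): create a directory through the HDFS shell and confirm it appears as blobs under the localhbase container in the Azure portal:
# These should now operate against the blob storage container rather than local HDFS
hdfs dfs -mkdir -p /tmp/wasb-smoke-test
hdfs dfs -ls /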
- In most Hadoop clusters the core-site.xml file is world-readable. For better security, the key can be stored in encrypted form and decrypted by a program you provide; the optional, security-oriented configuration for this scenario is:
<property>
  <name>fs.azure.account.keyprovider.localhbase.blob.core.chinacloudapi.cn</name>
  <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
</property>
<property>
  <name>fs.azure.account.key.localhbase.blob.core.chinacloudapi.cn</name>
  <value>YOUR ENCRYPTED ACCESS KEY</value>
</property>
<property>
  <name>fs.azure.shellkeyprovider.script</name>
  <value>PATH TO DECRYPTION PROGRAM</value>
</property>
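- The Hadoop documentation describes ShellDecryptionKeyProvider as running the configured script with the encrypted key passed as an argument and taking the script's standard output as the decrypted key. The script below is only a minimal sketch under that assumption; the decryption command and passphrase file are hypothetical:
#!/bin/bash
# decrypt_wasb_key.sh - invoked by ShellDecryptionKeyProvider; $1 is the encrypted account key
ENCRYPTED_KEY="$1"
# Replace with your real decryption mechanism; this example assumes the key was
# base64-encoded after AES encryption with a passphrase kept in a root-only file
echo -n "$ENCRYPTED_KEY" | base64 -d | openssl enc -d -aes-256-cbc -pass file:/etc/hadoop/conf/.wasb_passphrase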
- The Azure Blob Storage interface for Hadoop supports two kinds of blobs: block blobs and page blobs. Block blobs are the default kind and are good for most big-data use cases, such as input data for Hive, Pig, and analytical MapReduce jobs.
- Page blob handling in hadoop-azure was introduced to support HBase log files. Page blobs can be written any number of times, whereas block blobs can only be appended to 50,000 times before you run out of blocks and your writes fail. That does not work for HBase logs, so page blob support was added to overcome this limitation.
- Page blobs can be up to 1 TB in size, larger than the maximum 200 GB size for block blobs.
- In order to have the files you create be page blobs, you must set the configuration variable fs.azure.page.blob.dir to a comma-separated list of folder names, for example:
<property>
  <name>fs.azure.page.blob.dir</name>
  <value>/hbase/WALs,/hbase/oldWALs,/mapreducestaging,/hbase/MasterProcWALs,/atshistory,/tezstaging,/ams/hbase</value>
</property>
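- After HBase has been restarted against the new file system, a simple spot check (hypothetical, using the WAL folder configured above) is to confirm that write-ahead log files are actually being created under one of the listed directories:
hdfs dfs -ls /hbase/WALs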
Verification:
- All of the parameter changes above are made in Ambari; restart the services that depend on them.
- Command: hdfs dfs -ls /hbase/data/default; as shown in the figure below, there is no data yet.
- See HBase (Part 3): Importing Azure HDInsight HBase Table Data into a Local HBase to import the test table data; after the import it looks like the figure below:
- 命令:./hbase hbck -repair -ignorePreCheckPermission
- 命令: hbase shell
- Check the data; if it looks as in the figure below, everything is OK.
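- A minimal hbase shell session for this check might look like the following; the table name test_table is hypothetical, so substitute one of the imported tables:
hbase shell
list
scan 'test_table', {LIMIT => 5}
count 'test_table'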
- Verify the data with our own query tool, as shown in the figure below; the development of that tool is covered in the next chapter.
- Reference: https://hadoop.apache.org/docs/current/hadoop-azure/index.html
FAQ:
- Do not run the Ambari Metrics Collector on the same machine as an HBase RegionServer.
- HA must be configured before the data directory is switched to wasb.
- Add the following to Hadoop core-site.xml, otherwise the MapReduce2 component will not start (note that impl is lowercase):
<property>
  <name>fs.AbstractFileSystem.wasb.impl</name>
  <value>org.apache.hadoop.fs.azure.Wasb</value>
</property>
- On a self-built local cluster with HA configured, after changing the cluster file system to wasb and copying the original HBase cluster's physical file directories to the newly created blob storage, inserting data into an indexed table through Phoenix fails; modify hbase-site.xml as follows:
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>