Starting from the Hadoop cluster built in HDP 2.4 Installation (Part 5): Cluster and Component Installation, this article changes the default configuration so that HBase stores its data in Azure Blob Storage.
Contents:
- Overview
- Configuration
- Verification
- FAQ
Overview:
- hadoop-azure provides Hadoop integration with Azure Blob Storage. It requires the hadoop-azure.jar package, which is already included in the HDP 2.4 installation package, as shown in the figure below:
- Once the configuration is in place, all data reads and writes go to the Azure Blob Storage account.
- Multiple Azure Blob Storage accounts can be configured; the integration implements the standard Hadoop FileSystem interface.
- File system paths are referenced with URLs using the wasb scheme (see the example after this list).
- Tested on both Linux and Windows. Tested at scale.
- Azure Blob Storage involves three concepts:
- Storage Account: All access is done through a storage account
- Container: A container is a grouping of multiple blobs. A storage account may have multiple containers. In Hadoop, an entire file system hierarchy is stored in a single container. It is also possible to configure multiple containers, effectively presenting multiple file systems that can be referenced using distinct URLs.
- Blob: A file of any type and size. In Hadoop, files are stored in blobs. The internal implementation also uses blobs to persist the file system hierarchy and other metadata.
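- For example, with the storage account and container created later in this article (both named localhbase, on the China Azure endpoint), paths can be addressed explicitly with wasb URLs; the commands below are only a sketch of the addressing scheme:
# List the root of the container "localhbase" in the account "localhbase"
hdfs dfs -ls wasb://localhbase@localhbase.blob.core.chinacloudapi.cn/
# Once fs.defaultFS points at the same container, relative paths resolve against it
hdfs dfs -ls /hbase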
Configuration:
- In the Azure China portal (https://manage.windowsazure.cn), create a blob storage account; as shown in the figure below, it is named localhbase.
- Configure the Azure Blob Storage access credentials (the account key) and switch the default file system in the local Hadoop core-site.xml file, as follows:
<property>
  <name>fs.defaultFS</name>
  <value>wasb://localhbase@localhbase.blob.core.chinacloudapi.cn</value>
</property>
<property>
  <name>fs.azure.account.key.localhbase.blob.core.chinacloudapi.cn</name>
  <value>YOUR ACCESS KEY</value>
</property>
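- A quick sanity check after this change (hypothetical commands): create a directory through the HDFS shell and confirm it appears as blobs under the localhbase container in the Azure portal:
# These should now operate against the blob storage container rather than local HDFS
hdfs dfs -mkdir -p /tmp/wasb-smoke-test
hdfs dfs -ls /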
- In most Hadoop clusters the core-site.xml file is world-readable. For better security, the key can be stored in encrypted form and decrypted by a program you provide; the optional, security-oriented configuration for this scenario is:
<property>
  <name>fs.azure.account.keyprovider.localhbase.blob.core.chinacloudapi.cn</name>
  <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
</property>
<property>
  <name>fs.azure.account.key.localhbase.blob.core.chinacloudapi.cn</name>
  <value>YOUR ENCRYPTED ACCESS KEY</value>
</property>
<property>
  <name>fs.azure.shellkeyprovider.script</name>
  <value>PATH TO DECRYPTION PROGRAM</value>
</property>
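- The Hadoop documentation describes ShellDecryptionKeyProvider as running the configured script with the encrypted key passed as an argument and taking the script's standard output as the decrypted key. The script below is only a minimal sketch under that assumption; the decryption command and passphrase file are hypothetical:
#!/bin/bash
# decrypt_wasb_key.sh - invoked by ShellDecryptionKeyProvider; $1 is the encrypted account key
ENCRYPTED_KEY="$1"
# Replace with your real decryption mechanism; this example assumes the key was
# base64-encoded after AES encryption with a passphrase kept in a root-only file
echo -n "$ENCRYPTED_KEY" | base64 -d | openssl enc -d -aes-256-cbc -pass file:/etc/hadoop/conf/.wasb_passphrase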
- The Azure Blob Storage interface for Hadoop supports two kinds of blobs: block blobs and page blobs. Block blobs are the default kind and are good for most big-data use cases, such as input data for Hive, Pig, and analytical MapReduce jobs.
- Page blob handling in hadoop-azure was introduced to support HBase log files. Page blobs can be written any number of times, whereas block blobs can only be appended to 50,000 times before you run out of blocks and your writes fail. That does not work for HBase logs, so page blob support was added to overcome this limitation.
- Page blobs can be up to 1 TB in size, larger than the maximum 200 GB size for block blobs.
- In order to have the files you create be page blobs, you must set the configuration variable fs.azure.page.blob.dir to a comma-separated list of folder names, for example:
<property>
  <name>fs.azure.page.blob.dir</name>
  <value>/hbase/WALs,/hbase/oldWALs,/mapreducestaging,/hbase/MasterProcWALs,/atshistory,/tezstaging,/ams/hbase</value>
</property>
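- After HBase has been restarted against the new file system, a simple spot check (hypothetical, using the WAL folder configured above) is to confirm that write-ahead log files are actually being created under one of the listed directories:
hdfs dfs -ls /hbase/WALs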
Verification:
- All of the parameter changes above are made in Ambari; restart the services that depend on them.
- Command: hdfs dfs -ls /hbase/data/default; as shown in the figure below, there is no data yet.
- See HBase (Part 3): Importing Azure HDInsight HBase Table Data into a Local HBase to import the test table data; after the import it looks like the figure below:
- 命令:./hbase hbck -repair -ignorePreCheckPermission
- 命令: hbase shell
- Check the data; if it looks as in the figure below, everything is OK.
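- A minimal hbase shell session for this check might look like the following; the table name test_table is hypothetical, so substitute one of the imported tables:
hbase shell
list
scan 'test_table', {LIMIT => 5}
count 'test_table'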
- Verify the data with our own query tool, as shown in the figure below; the development of that tool is covered in the next chapter.
- Reference: https://hadoop.apache.org/docs/current/hadoop-azure/index.html
FAQ:
- Do not run the Ambari Metrics Collector on the same machine as an HBase RegionServer.
- HA must be configured before the data directory is switched to wasb.
- Add the following to Hadoop core-site.xml, otherwise the MapReduce2 component will not start (note that impl is lowercase):
<property>
  <name>fs.AbstractFileSystem.wasb.impl</name>
  <value>org.apache.hadoop.fs.azure.Wasb</value>
</property>
- On a self-built local cluster with HA configured, after changing the cluster file system to wasb and copying the original HBase cluster's physical file directories to the newly created blob storage, inserting data into an indexed table through Phoenix fails; modify hbase-site.xml as follows:
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>