hadoop (4): Configuring Azure Blob Storage for a Local HBase Cluster


      Starting from the Hadoop cluster built in HDP 2.4 Installation (5): Cluster and Component Installation, change the default configuration so that HBase stores its data in Azure Blob Storage.

Contents:

  • Overview
  • Configuration
  • Verification
  • FAQ

Overview:


  • hadoop-azure provides the integration between Hadoop and Azure Blob Storage. It requires the hadoop-azure.jar package, which is already bundled in the HDP 2.4 distribution.
  • Once configured, all data read and written is stored in the Azure Blob Storage account.
  • Multiple Azure Blob Storage accounts can be configured; the integration implements the standard Hadoop FileSystem interface.
  • File system paths are referenced as URLs using the wasb scheme (see the example after this list).
  • Tested on both Linux and Windows, and tested at scale.
  • Azure Blob Storage involves three concepts:
    1. Storage Account: All access is done through a storage account.
    2. Container: A container is a grouping of multiple blobs. A storage account may have multiple containers. In Hadoop, an entire file system hierarchy is stored in a single container. It is also possible to configure multiple containers, effectively presenting multiple file systems that can be referenced using distinct URLs.
    3. Blob: A file of any type and size. In Hadoop, files are stored in blobs. The internal implementation also uses blobs to persist the file system hierarchy and other metadata.
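  Putting these together, a wasb URL names the container, the storage account endpoint, and the path inside the container. A generic pattern and a concrete example follow (the path component is illustrative; the account and container match the configuration below):

    wasb[s]://<container>@<account>.blob.core.windows.net/<path>
    wasb://localhbase@localhbase.blob.core.chinacloudapi.cn/hbase/data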

Configuration:


  • In the China Azure portal (https://manage.windowsazure.cn), create a Blob storage account; here it is named localhbase.
  • Configure the access key for the Azure Blob Storage account and switch the default file system to wasb in the local Hadoop core-site.xml, as follows (a quick smoke test follows the snippet):
    <property>
      <name>fs.defaultFS</name>
      <value>wasb://localhbase@localhbase.blob.core.chinacloudapi.cn</value>
    </property>
    <property>
      <name>fs.azure.account.key.localhbase.blob.core.chinacloudapi.cn</name>
      <value>YOUR ACCESS KEY</value>
    </property>
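  After the change is saved and the affected services are restarted, a quick smoke test (assuming the account and container above) confirms that Hadoop can reach the storage account:

    # should list the root of the blob-backed file system without errors
    hadoop fs -ls wasb://localhbase@localhbase.blob.core.chinacloudapi.cn/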
  • On most Hadoop clusters the core-site.xml file is world-readable. To protect the access key, you can store it in encrypted form and have it decrypted at run time by a program you configure. The settings for this optional, security-motivated setup are shown below, followed by a sketch of a decryption script:

    <property>
      <name>fs.azure.account.keyprovider.localhbase.blob.core.chinacloudapi.cn</name>
      <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
    </property>
    <property>
      <name>fs.azure.account.key.localhbase.blob.core.chinacloudapi.cn</name>
      <value>YOUR ENCRYPTED ACCESS KEY</value>
    </property>
    <property>
      <name>fs.azure.shellkeyprovider.script</name>
      <value>PATH TO DECRYPTION PROGRAM</value>
    </property>
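  ShellDecryptionKeyProvider runs the configured script with the encrypted key appended as the last argument, and uses the script's standard output as the decrypted key. A minimal sketch of such a script, assuming (hypothetically) that the key was encrypted with openssl using a passphrase file readable only by the service user:

    #!/bin/sh
    # Hypothetical decryption helper; point fs.azure.shellkeyprovider.script at this file.
    # $1 is the encrypted access key passed in by ShellDecryptionKeyProvider;
    # print the decrypted key on stdout and nothing else.
    echo "$1" | openssl enc -aes-256-cbc -a -d -pass file:/etc/hadoop/conf/.wasb_secret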
  • The Azure Blob Storage interface for Hadoop supports two kinds of blobs: block blobs and page blobs. Block blobs are the default and suit most big-data use cases, such as input data for Hive, Pig, and analytical MapReduce jobs.

  • Page blob handling in hadoop-azure was introduced to support HBase log files. Page blobs can be written any number of times, whereas a block blob can only be appended to 50,000 times before you run out of blocks and writes fail. That does not work for HBase logs, so page blob support was introduced to overcome this limitation.

  • Page blobs can be up to 1 TB in size, larger than the 200 GB maximum for block blobs.

  • To have the files you create stored as page blobs, set the configuration variable fs.azure.page.blob.dir to a comma-separated list of folder names:

    <property>
       <name>fs.azure.page.blob.dir</name>
       <value>/hbase/WALs,/hbase/oldWALs,/mapreducestaging,/hbase/MasterProcWALs,/atshistory,/tezstaging,/ams/hbase</value>
    </property>

Verification:
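  A basic check, assuming the configuration above: write a file through the default (wasb) file system, read it back, and confirm the blob appears in the localhbase container in the Azure portal.

    # write a small file into the blob-backed file system and read it back
    echo "wasb smoke test" > /tmp/wasb-test.txt
    hadoop fs -put /tmp/wasb-test.txt /tmp/
    hadoop fs -cat /tmp/wasb-test.txt
    # after HBase restarts, its directories (WALs, data, ...) should appear under /hbase
    hadoop fs -ls /hbase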


FAQ


  • Do not run the Ambari Metrics Collector on the same machine as an HBase RegionServer.
  • HA must be configured before the data directories are switched to wasb.
  • Add the following to the Hadoop core-site.xml, otherwise the MapReduce2 component will not start (note that impl is lower-case):
    <property>         
      <name>fs.AbstractFileSystem.wasb.impl</name>                           
      <value>org.apache.hadoop.fs.azure.Wasb</value> 
    </property>
  • On a locally built cluster with HA configured, the cluster FS was switched to wasb and the original HBase cluster's files were copied into the newly created Blob storage account. Afterwards, inserting rows into an indexed table through Phoenix failed; fix it by adding the following to hbase-site.xml:

    <property>         
      <name>hbase.regionserver.wal.codec</name>                           
      <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value> 
    </property>