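HBase Cluster Setup, a Roundup of All Configuration and Tuning Parameters, and Running the API Code

Reposted article, 2016-05-31 01:30, tag: HBase

Recently, to make development easier, I set up a three-node Hadoop cluster and an HBase cluster on my own virtual machines. I won't go into detail on building the Hadoop and ZooKeeper clusters here, since my earlier notes cover that. This post collects the HBase configuration parameters for future reference.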

First, run vi ~/.bash_profile to add the HBase environment variables, then run source ~/.bash_profile so they take effect immediately.

1. Edit hbase-env.sh

Because I use an external ZooKeeper ensemble, HBASE_MANAGES_ZK is set to false here. The settings:

# The java implementation to use.  Java 1.7+ required.
 export JAVA_HOME=/usr/local/yangsy/jdk1.7.0_55

# Extra Java CLASSPATH elements.  Optional.
 export HBASE_CLASSPATH=/usr/local/hbase-1.0.2/conf

# Tell HBase whether it should manage it's own instance of Zookeeper or not.
 export HBASE_MANAGES_ZK=false

2. Edit hbase-site.xml

<configuration>
  <!-- Directory in HDFS where HBase stores its data -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/usr/local/hadoop-2.6.0/hbaseData</value>
  </property>
  <!-- Run HBase in fully distributed (cluster) mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- Legacy Master address setting: host:port, not an hdfs:// URL (clients find the Master through ZooKeeper) -->
  <property>
    <name>hbase.master</name>
    <value>master:60000</value>
  </property>
  <!-- Port of the HBase Master web UI (60010 was the pre-1.0 default; HBase 1.x defaults to 16010) -->
  <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
  </property>
  <!-- Port clients use to connect to ZooKeeper -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2183</value>
  </property>
  <!-- ZooKeeper quorum hosts (an odd number of nodes is recommended) -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
  </property>
  <!-- Same as dataDir in ZooKeeper's zoo.cfg: where snapshots are stored -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/zookeeper-3.4.6/data</value>
  </property>
  <!-- ZooKeeper session timeout, in milliseconds -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>
</configuration>
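Since the file is plain XML, one quick sanity check before starting HBase is to parse hbase-site.xml and print the name/value pairs it really defines. The sketch below does that with the JDK's DOM parser only. SiteXmlReader is a hypothetical name, and this is not Hadoop's Configuration class: it ignores ${...} variable substitution, <final> and everything else Hadoop layers on top.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists <property> name/value pairs from an hbase-site.xml-style document.
final class SiteXmlReader {

    // Returns name -> value in document order; throws IllegalArgumentException on malformed XML.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    props.put(name, childText(property, "value"));
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed configuration file", e);
        }
    }

    // Trimmed text of the first descendant element with the given tag, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<!-- run in cluster mode -->"
                + "<property><name>hbase.cluster.distributed</name><value>true</value></property>"
                + "<property><name>hbase.zookeeper.quorum</name><value>master,slave1,slave2</value></property>"
                + "</configuration>";
        System.out.println(readProperties(xml));
    }
}
```

To check the real file, read it first, for example with new String(Files.readAllBytes(Paths.get("/usr/local/hbase-1.0.2/conf/hbase-site.xml")), StandardCharsets.UTF_8), and pass the string to readProperties.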

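Note: if you use an external ZooKeeper ensemble, copy ZooKeeper's zoo.cfg into HBase's conf directory; HBase loads it automatically at startup. Likewise, if Hadoop runs as an HA cluster, copy core-site.xml and hdfs-site.xml into HBase's conf directory too; otherwise the RegionServers will fail after startup with an UnknownHostException, because they cannot resolve the HDFS nameservice.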

3. Edit regionservers

slave1
slave2

4. Start the Hadoop cluster, the ZooKeeper ensemble, and HBase

First make sure ZooKeeper is running properly: in ZooKeeper's bin directory, run ./zkServer.sh status to check its state.
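Besides zkServer.sh status, each ZooKeeper server can be probed over the network with the four-letter command ruok, which a healthy server answers with imok. The sketch below assumes four-letter commands are enabled (they are by default in ZooKeeper 3.4.x; 3.5.3 and later require whitelisting them via 4lw.commands.whitelist). ZkProbe is a hypothetical name; the hosts and port 2183 are the ones from this article's hbase-site.xml.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: sends ZooKeeper's "ruok" four-letter command and checks for "imok".
final class ZkProbe {

    static boolean isOk(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // The server writes its short reply and then closes the connection.
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[16];
            int total = 0;
            int n;
            while (total < buf.length && (n = in.read(buf, total, buf.length - total)) != -1) {
                total += n;
            }
            return "imok".equals(new String(buf, 0, total, StandardCharsets.US_ASCII).trim());
        } catch (IOException e) {
            return false; // refused, timed out, unknown host or reset: treat as not healthy
        }
    }

    public static void main(String[] args) {
        for (String host : new String[] {"master", "slave1", "slave2"}) {
            System.out.println(host + ": " + (isOk(host, 2183, 2000) ? "imok" : "no answer"));
        }
    }
}
```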

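5. Check the HBase Master log for startup errors

6. Check the RegionServer log on each slave node for errors

7. Startup looks successful, so now the fun can begin.

Finally, I looked up the HBase configuration parameters and summarize them below, for tuning later once I know the system better.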

hbase.rootdir

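This directory is shared by the region servers and is where HBase persists its data. The URL must be fully qualified, including the filesystem scheme. For example, to use the '/hbase' directory in HDFS with the NameNode running on port 9000 of namenode.example.org, set it to hdfs://namenode.example.org:9000/hbase. By default HBase writes to /tmp; if you don't change this setting, data will be lost on restart.

Default: file:///tmp/hbase-${user.name}/hbase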
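The "fully qualified" requirement can be checked mechanically with java.net.URI: the value needs a scheme (hdfs://, file://) and, for HDFS, a NameNode host and a path. A minimal sketch; RootDirCheck is a hypothetical name and these are only the checks described in the text, not HBase's own validation.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: sanity checks for an hbase.rootdir value.
final class RootDirCheck {

    // True if the value parses as a URI with a scheme, and for hdfs:// also has a host and a non-empty path.
    static boolean isFullyQualified(String rootDir) {
        final URI uri;
        try {
            uri = new URI(rootDir);
        } catch (URISyntaxException e) {
            return false; // e.g. a stray space inside the URL
        }
        if (uri.getScheme() == null) {
            return false;
        }
        if ("hdfs".equals(uri.getScheme())) {
            return uri.getHost() != null && uri.getPath() != null && !uri.getPath().isEmpty();
        }
        return true;
    }

    // True if the data would end up under /tmp, which may be wiped on reboot.
    static boolean isUnderTmp(String rootDir) {
        try {
            String path = new URI(rootDir).getPath();
            return path != null && (path.equals("/tmp") || path.startsWith("/tmp/"));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "hdfs://namenode.example.org:9000/hbase",
            "hdfs://master:9000/usr/local/hadoop-2.6.0/hbaseData",
            "/hbase",
            "file:///tmp/hbase-root/hbase"
        };
        for (String c : candidates) {
            System.out.println(c + " -> qualified=" + isFullyQualified(c) + ", underTmp=" + isUnderTmp(c));
        }
    }
}
```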

hbase.master.port

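The port the HBase Master listens on.

Default: 60000

hbase.cluster.distributed

The mode HBase runs in: false for standalone mode, true for distributed mode. If false, HBase and ZooKeeper run in the same JVM.

Default: false

hbase.tmp.dir

Temporary directory on the local filesystem. Consider moving it to a more permanent location (/tmp is cleared on reboot).

Default: /tmp/hbase-${user.name}

hbase.master.info.port

Port of the HBase Master web UI. Set it to -1 if you don't want the UI to run.

Default: 60010

hbase.master.info.bindAddress

Address the HBase Master web UI binds to.

Default: 0.0.0.0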

hbase.client.write.buffer

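Default size, in bytes, of the HTable client write buffer. A larger buffer uses more memory on both the client and the server, because the server instantiates the write buffer it receives in order to process it. The benefit is fewer RPCs. To estimate the server-side memory used, compute hbase.client.write.buffer * hbase.regionserver.handler.count.

Default: 2097152 (2 MB)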
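The server-side estimate is a single multiplication, shown here with the defaults listed in this article (2 MB buffer, 10 handlers), which comes to 20 MiB per RegionServer in the worst case where every handler holds a full buffer at once. WriteBufferEstimate is a hypothetical name.

```java
// Hypothetical helper: worst-case server-side memory taken by client write buffers.
final class WriteBufferEstimate {

    // Each RPC handler may be holding one full client write buffer at the same time.
    static long serverSideBytes(long writeBufferBytes, int handlerCount) {
        return writeBufferBytes * handlerCount;
    }

    public static void main(String[] args) {
        long bytes = serverSideBytes(2097152L, 10); // hbase.client.write.buffer x hbase.regionserver.handler.count
        System.out.println(bytes + " bytes = " + bytes / (1024 * 1024) + " MiB");
    }
}
```

Raising either the buffer size or the handler count scales this figure linearly, so the two settings are best tuned together.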

hbase.regionserver.port

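Port the HBase RegionServer binds to.

Default: 60020

hbase.regionserver.info.port

Port of the HBase RegionServer web UI. Set it to -1 if you don't want the RegionServer UI to run.

Default: 60030

hbase.regionserver.info.port.auto

Whether the Master or RegionServer UI should search for a free port to bind to when hbase.regionserver.info.port is already in use. Useful for testing; off by default.

Default: false

hbase.regionserver.info.bindAddress

IP address of the HBase RegionServer web UI.

Default: 0.0.0.0

hbase.regionserver.class

The interface used by the RegionServer. Clients use it when opening a proxy to a region server.

Default: org.apache.hadoop.hbase.ipc.HRegionInterface

hbase.client.pause

General client pause value, in milliseconds. Mostly used as the time to wait before retrying a failed operation, such as a failed get or a region lookup.

Default: 1000

hbase.client.retries.number

Maximum number of retries. Operations such as region lookups, Gets, and updates can fail and need to be retried; this is the maximum number of times they are retried.

Default: 10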
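Together, hbase.client.pause and hbase.client.retries.number describe a retry-with-pause loop. The sketch below shows that pattern with a fixed pause; the real HBase client is more elaborate (it lengthens the pause on later attempts), and RetrySketch is a hypothetical name.

```java
import java.util.function.Supplier;

// Hypothetical helper: retry an operation with a fixed pause between attempts.
final class RetrySketch {

    static <T> T callWithRetries(Supplier<T> op, int maxAttempts, long pauseMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(pauseMs); // plays the role of hbase.client.pause
                }
            }
        }
        throw last; // every attempt failed: surface the final error
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

With the defaults listed above, a call would look roughly like callWithRetries(op, 10, 1000), i.e. at most about ten seconds of pauses before the error reaches the caller.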

hbase.client.scanner.caching

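Number of rows fetched from the server in one RPC when next is called on a scanner and the row is not already in the client-side cache. Larger values make scanners faster but use more memory, and the next calls that have to refill the cache take longer and longer. If they take long enough they can time out, for example by exceeding hbase.regionserver.lease.period.

Default: 1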
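The effect of caching on round trips is a ceiling division: a scan over N rows needs about N / caching calls to the server. The sketch ignores the extra call that detects the end of the scan; ScannerCachingMath is a hypothetical name. In client code the same knob can also be set per scan with Scan.setCaching(int).

```java
// Hypothetical helper: round trips needed for a full scan at a given caching value.
final class ScannerCachingMath {

    // Ceiling of rows / caching; ignores the extra call that detects the end of the scan.
    static long rpcCount(long rows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching + " -> " + rpcCount(10_000, caching) + " RPCs");
        }
    }
}
```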

hbase.client.keyvalue.maxsize

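Maximum size of a single KeyValue instance, i.e. an upper bound on a single entry in a store file. Because a KeyValue cannot be split, this limit keeps an oversized value from making a region unsplittable. It is sensible to set it to a fraction of the maximum region size, ideally one that divides it evenly. A value of 0 or less disables the check. The default is 10 MB.

Default: 10485760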
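Both rules from the description fit in a few lines: the size check itself (disabled by a non-positive limit) and the "divides the maximum region size evenly" suggestion. KeyValueSizeCheck is a hypothetical name, and the region size used in main is the hbase.hregion.max.filesize default listed further down.

```java
// Hypothetical helper mirroring the two rules described for hbase.client.keyvalue.maxsize.
final class KeyValueSizeCheck {

    // A non-positive limit disables the check.
    static boolean accepts(long keyValueBytes, long maxSize) {
        return maxSize <= 0 || keyValueBytes <= maxSize;
    }

    // The suggestion in the text: a limit that divides the maximum region size evenly.
    static boolean dividesRegionSize(long maxSize, long maxRegionBytes) {
        return maxSize > 0 && maxRegionBytes % maxSize == 0;
    }

    public static void main(String[] args) {
        long maxRegion = 268435456L; // hbase.hregion.max.filesize default, 256 MiB
        System.out.println(dividesRegionSize(10485760L, maxRegion)); // 10 MiB default
        System.out.println(dividesRegionSize(8388608L, maxRegion));  // 8 MiB
    }
}
```

With the defaults in this article the first line prints false, because 256 MiB is not a whole multiple of 10 MiB, while 8 MiB divides it exactly 32 times.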

hbase.regionserver.lease.period

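HRegion server lease period for clients, i.e. the timeout threshold, in milliseconds. By default a client must report in within this period or it is considered dead.

Default: 60000

hbase.regionserver.handler.count

Number of RPC server instances (handlers) spun up on each RegionServer. The Master uses the same property for its own handler count.

Default: 10

hbase.regionserver.msginterval

Interval between messages from a RegionServer to the Master, in milliseconds.

Default: 3000

hbase.regionserver.optionallogflushinterval

How often the HLog is synced to HDFS, in milliseconds, if it has not already been synced because enough entries accumulated. The default is 1 second.

Default: 1000

hbase.regionserver.regionSplitLimit

Once the number of regions reaches this value, regions stop splitting. This is not a hard limit on the number of regions, but a guideline for when splitting should stop. The default is MAX_INT, i.e. splitting is never prevented.

Default: 2147483647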

hbase.regionserver.logroll.period

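Period at which the commit log is rolled, regardless of how many edits it contains, in milliseconds (1 hour by default).

Default: 3600000

hbase.regionserver.hlog.reader.impl

Implementation class of the HLog file reader.

Default: org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader

hbase.regionserver.hlog.writer.impl

Implementation class of the HLog file writer.

Default: org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter

hbase.regionserver.thread.splitcompactcheckfrequency

How often, in milliseconds, the region server checks for splits and compactions.

Default: 20000

hbase.regionserver.nbreservationblocks

Number of reserved memory blocks (translator's note: like a strategic oil reserve). When an out-of-memory error occurs, this memory is released so the RegionServer can clean up before it shuts down.

Default: 4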

hbase.zookeeper.dns.interface

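Name of the network interface from which a ZooKeeper server reports its IP address, when DNS is used.

Default: default

hbase.zookeeper.dns.nameserver

Host name or IP address of the DNS name server that a ZooKeeper server uses to determine the host name it uses to communicate with the Master.

Default: default

hbase.regionserver.dns.interface

Name of the network interface from which a RegionServer reports its IP address, when DNS is used.

Default: default

hbase.regionserver.dns.nameserver

Host name or IP address of the DNS name server a RegionServer uses to determine the host name it uses to communicate with the Master.

Default: default

hbase.master.dns.interface

Name of the network interface from which the Master reports its IP address, when DNS is used.

Default: default

hbase.master.dns.nameserver

Host name or IP address of the DNS name server the Master uses to determine the host name it communicates under.

Default: default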

hbase.balancer.period

Interval, in milliseconds, at which the Master runs the region balancer.

Default: 300000

hbase.regions.slop

Rebalance when any RegionServer holds more than average + (average * slop) regions.

Default: 0
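Read literally, the slop rule compares each server's region count with average × (1 + slop). The sketch below encodes only that comparison; the real balancer's logic is more involved, and SlopCheck is a hypothetical name.

```java
// Hypothetical helper: the over-load test described for hbase.regions.slop.
final class SlopCheck {

    static boolean needsRebalance(int[] regionsPerServer, double slop) {
        if (regionsPerServer.length == 0) {
            return false;
        }
        long total = 0;
        for (int r : regionsPerServer) {
            total += r;
        }
        double average = (double) total / regionsPerServer.length;
        double ceiling = average + average * slop;
        for (int r : regionsPerServer) {
            if (r > ceiling) {
                return true; // this server is above average + average * slop
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] load = {6, 5, 4};
        System.out.println(needsRebalance(load, 0.0)); // average 5, ceiling 5
        System.out.println(needsRebalance(load, 0.5)); // ceiling 7.5
    }
}
```

With slop 0, any server above the exact average triggers a rebalance; a positive slop gives the cluster some tolerance before regions are moved.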

hbase.master.logcleaner.ttl

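Maximum time, in milliseconds, an HLog can stay in the .oldlogdir directory; after that a Master thread deletes it.

Default: 600000

hbase.master.logcleaner.plugins

Comma-separated list of LogCleanerDelegate classes run by the LogsCleaner service. These WAL/HLog cleaners are called in order, so put the cleaner that prunes the most HLog files first. To use your own LogCleanerDelegate, add it to HBase's classpath and list its fully qualified class name here. Always keep the default cleaner in the list.

Default: org.apache.hadoop.hbase.master.TimeToLiveLogCleaner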
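The cleaner list behaves like a series of vetoes: an archived log is deleted only if every delegate in the list agrees, so a delegate that rules out many files early saves work for the ones after it. The sketch models that with a TTL delegate matching hbase.master.logcleaner.ttl. CleanerDelegate, TtlDelegate and CleanerChain are hypothetical stand-ins, not HBase's actual LogCleanerDelegate API.

```java
import java.util.List;

// Hypothetical stand-in for a log cleaner delegate (not HBase's real interface).
interface CleanerDelegate {
    boolean isDeletable(long archivedAtMs, long nowMs);
}

// Keeps a log until it has been in the archive for longer than ttlMs.
final class TtlDelegate implements CleanerDelegate {
    private final long ttlMs;

    TtlDelegate(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    @Override
    public boolean isDeletable(long archivedAtMs, long nowMs) {
        return nowMs - archivedAtMs > ttlMs;
    }
}

final class CleanerChain {
    // Delegates are consulted in order; the first "keep" vote stops the check.
    static boolean isDeletable(List<CleanerDelegate> chain, long archivedAtMs, long nowMs) {
        for (CleanerDelegate delegate : chain) {
            if (!delegate.isDeletable(archivedAtMs, nowMs)) {
                return false;
            }
        }
        return true;
    }
}
```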

hbase.regionserver.global.memstore.upperLimit

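Maximum total size of all memstores in a single region server, as a fraction of the heap. Above this, new updates are blocked and flushes are forced.

Default: 0.4

hbase.regionserver.global.memstore.lowerLimit

When memstores are forced to flush to free memory, flushing continues until their total size drops below this mark. The default is 35% of the heap. Setting it equal to hbase.regionserver.global.memstore.upperLimit causes the minimum possible amount of flushing when updates are blocked by the memstore limit. (Translator's note: as soon as a flush runs, the total drops below the lower limit and flushing stops.)

Default: 0.35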
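Both limits are fractions of the RegionServer's maximum heap (-Xmx), so the byte thresholds are simple products. The sketch assumes a 4 GB heap purely for illustration, which gives roughly 1.6 GB for the upper limit and 1.4 GB for the lower one; MemstoreLimits is a hypothetical name.

```java
// Hypothetical helper: turns the global memstore fractions into byte thresholds.
final class MemstoreLimits {

    static long bytes(long maxHeapBytes, double fraction) {
        return Math.round(maxHeapBytes * fraction);
    }

    public static void main(String[] args) {
        long heap = 4L * 1024 * 1024 * 1024; // assumed -Xmx4g RegionServer
        long upper = bytes(heap, 0.4);  // updates blocked and flushes forced above this
        long lower = bytes(heap, 0.35); // forced flushing stops below this
        System.out.println("block updates above " + upper / (1024 * 1024) + " MiB");
        System.out.println("flush down to " + lower / (1024 * 1024) + " MiB");
    }
}
```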

hbase.server.thread.wakefrequency

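Sleep interval, in milliseconds, for service threads that wake up periodically, such as the log roller.

Default: 10000

hbase.hregion.memstore.flush.size

A memstore is flushed to disk once its size exceeds this number of bytes. The check is done by a thread that runs every hbase.server.thread.wakefrequency.

Default: 67108864 (64 MB)

hbase.hregion.preclose.flush.size

If the memstores in a region are larger than this when the region is closed, a "pre-flush" is run first to clear them before the region is taken offline. Once a region is offline it accepts no writes, and flushing a large memstore can take a long time. The pre-flush empties the memstore while the region is still online, so the flush performed during the final close is fast.

Default: 5242880

hbase.hregion.memstore.block.multiplier

Block updates once a memstore reaches hbase.hregion.memstore.block.multiplier times hbase.hregion.memstore.flush.size bytes. This prevents runaway memstores during update spikes. Without an upper bound, the memstore grows so large that the resulting flush files take a long time to compact or split or, worse, the server runs out of memory. (Translator's note: in-memory writes outpace the disk, so updates have to wait. The upstream description appears to contain an error here.)

Default: 2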
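The two per-region thresholds combine into three states: below the flush size nothing happens, above it a flush is triggered, and at multiplier × flush size updates are blocked. With the defaults that means a flush above 64 MiB and blocking at 128 MiB. A sketch of that decision under the article's defaults; MemstoreState is a hypothetical name.

```java
// Hypothetical helper: what happens to a region's memstore at a given size.
final class MemstoreState {

    enum Action { NONE, FLUSH, BLOCK_UPDATES }

    static Action actionFor(long memstoreBytes, long flushSizeBytes, int blockMultiplier) {
        if (memstoreBytes >= flushSizeBytes * blockMultiplier) {
            return Action.BLOCK_UPDATES; // writes wait until a flush brings the size down
        }
        if (memstoreBytes > flushSizeBytes) {
            return Action.FLUSH;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        long flush = 67108864L; // hbase.hregion.memstore.flush.size default, 64 MiB
        int multiplier = 2;     // hbase.hregion.memstore.block.multiplier default
        for (long mb : new long[] {32, 80, 128}) {
            System.out.println(mb + " MiB -> " + actionFor(mb * 1024 * 1024, flush, multiplier));
        }
    }
}
```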

hbase.hregion.memstore.mslab.enabled

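Experimental: enables the MemStore-Local Allocation Buffer (MSLAB), which prevents heap fragmentation under heavy write load and can reduce the frequency of GC pauses (a GC may stop the world). (Translator's note: it works by pre-allocating memory instead of allocating every value from the heap individually.)

Default: false

hbase.hregion.max.filesize

Maximum HStoreFile size. If any one column family's HStoreFile grows to this size, the HRegion is split in two. The default is 256 MB.

Default: 268435456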
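Combined with hbase.regionserver.regionSplitLimit above, the split rule as described fits in one method. This models the text only: newer HBase versions ship other split policies by default that can split earlier. SplitCheck is a hypothetical name.

```java
// Hypothetical helper combining hbase.hregion.max.filesize and hbase.regionserver.regionSplitLimit.
final class SplitCheck {

    static boolean shouldSplit(long[] storeFileSizes, long maxFileSize, int regionCount, int regionSplitLimit) {
        if (regionCount >= regionSplitLimit) {
            return false; // soft cap reached: stop splitting
        }
        for (long size : storeFileSizes) {
            if (size >= maxFileSize) {
                return true; // one column family's store has grown to the limit
            }
        }
        return false;
    }
}
```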

hbase.hstore.compactionThreshold

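If any one HStore has more than this number of HStoreFiles (one HStoreFile is written per memstore flush), a compaction runs that rewrites all of them into a single file. Larger values postpone compaction, but each compaction then takes longer.

Default: 3

hbase.hstore.blockingStoreFiles

If any one HStore has more than this number of HStoreFiles (one per memstore flush), updates to that HRegion are blocked until a compaction completes or until hbase.hstore.blockingWaitTime is exceeded.

Default: 7

hbase.hstore.blockingWaitTime

How long an HRegion blocks updates after hitting the StoreFile limit set by hbase.hstore.blockingStoreFiles. Once this time has elapsed, the HRegion stops blocking updates even if the compaction has not finished. The default is 90 s.

Default: 90000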
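The three hbase.hstore.* settings above interact, so it helps to see them as one decision on the store-file count and on how long updates have already been blocked. A sketch of the rules exactly as described; StoreFilePolicy and its enum values are hypothetical names.

```java
// Hypothetical helper: the store-file thresholds from hbase.hstore.* as one decision.
final class StoreFilePolicy {

    enum Decision { NONE, COMPACT, COMPACT_AND_BLOCK_UPDATES, COMPACT_UPDATES_RESUMED }

    static Decision decide(int storeFiles, int compactionThreshold, int blockingStoreFiles,
                           long blockedForMs, long blockingWaitTimeMs) {
        if (storeFiles > blockingStoreFiles) {
            // Updates stay blocked until compaction finishes or the wait time runs out.
            return blockedForMs < blockingWaitTimeMs
                    ? Decision.COMPACT_AND_BLOCK_UPDATES
                    : Decision.COMPACT_UPDATES_RESUMED;
        }
        if (storeFiles > compactionThreshold) {
            return Decision.COMPACT;
        }
        return Decision.NONE;
    }
}
```

With the defaults (3, 7 and 90000 ms), a store with 4 to 7 files is compacted without blocking writes, and one with 8 or more blocks writes for at most 90 seconds.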

hbase.hstore.compaction.max

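Maximum number of HStoreFiles compacted in each "minor" compaction.

Default: 10

hbase.hregion.majorcompaction

Interval, in milliseconds, between major compactions of all HStoreFiles in a region. The default is 1 day; 0 disables automatic major compactions.

Default: 86400000

hbase.mapreduce.hfileoutputformat.blocksize

In MapReduce, HFileOutputFormat writes storefiles/hfiles, and this is the minimum HFile block size it emits. Normally, when HBase writes HFiles, the block size comes from the table schema (HColumnDescriptor), but the MapReduce output format has no access to the schema, so it takes the block size from this setting. The smaller the block size, the larger the index and the less data a random access has to fetch. Lower it if your cells are small and you need faster random access to individual cells.

Default: 65536

hfile.block.cache.size

Fraction of the maximum heap (-Xmx setting) allocated to the HFile/StoreFile block cache. The default is 20%; 0 disables the cache.

Default: 0.2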
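The block cache and the global memstore upper limit are both fractions of the same heap, so they have to be sized together. To our knowledge, HBase 1.x refuses to start a RegionServer when the two add up to more than 0.8, keeping at least 20% of the heap for everything else; the sketch below encodes that assumed rule. HeapBudget is a hypothetical name.

```java
// Hypothetical helper: block cache + memstore share of the heap.
final class HeapBudget {

    // Combined ceiling assumed from HBase 1.x's startup check.
    static final double MAX_COMBINED_FRACTION = 0.8;

    static boolean fits(double memstoreUpperLimit, double blockCacheFraction) {
        return memstoreUpperLimit + blockCacheFraction <= MAX_COMBINED_FRACTION;
    }

    static long blockCacheBytes(long maxHeapBytes, double blockCacheFraction) {
        return Math.round(maxHeapBytes * blockCacheFraction);
    }

    public static void main(String[] args) {
        System.out.println(fits(0.4, 0.2)); // defaults listed in this article
        System.out.println(fits(0.5, 0.4)); // too greedy
    }
}
```

Growing the block cache for read-heavy workloads therefore usually means shrinking the memstore limit by a similar amount, and vice versa for write-heavy ones.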

hbase.hash.type

Hashing algorithm to use, with two options: murmur (MurmurHash) and jenkins (JenkinsHash). It is used by Bloom filters.

Default: murmur

hbase.master.keytab.file

Path to the Kerberos keytab file the HMaster uses to log in. (Translator's note: HBase uses Kerberos for security.)

Default: (none)

hbase.master.kerberos.principal

For example "hbase/_HOST@EXAMPLE.COM". The Kerberos principal name used to run the HMaster process. It should have the form user/hostname@DOMAIN. If "_HOST" is used as the hostname portion, it is replaced with the actual hostname of the running instance.

Default: (none)

hbase.regionserver.keytab.file

Path to the Kerberos keytab file the HRegionServer uses to log in.

Default: (none)

hbase.regionserver.kerberos.principal

For example "hbase/_HOST@EXAMPLE.COM". The Kerberos principal name used to run the HRegionServer process. It should have the form user/hostname@DOMAIN. If "_HOST" is used as the hostname portion, it is replaced with the actual hostname of the running instance. An entry for this principal must exist in the file specified by hbase.regionserver.keytab.file.

Default: (none)
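The _HOST substitution lets every node share one hbase-site.xml while each process logs in with its own principal. A sketch of the substitution; PrincipalResolver is a hypothetical name, and lower-casing the host is an assumption here (as far as we know, Hadoop's own substitution does the same).

```java
import java.util.Locale;

// Hypothetical helper: substitutes the "_HOST" placeholder in a service principal.
final class PrincipalResolver {

    static String resolve(String principal, String hostname) {
        // Expected shape: user/hostname@REALM
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@', slash + 1);
        if (slash < 0 || at < 0 || !principal.substring(slash + 1, at).equals("_HOST")) {
            return principal; // no placeholder: use as-is
        }
        return principal.substring(0, slash + 1)
                + hostname.toLowerCase(Locale.ROOT)
                + principal.substring(at);
    }
}
```

For example, resolve("hbase/_HOST@EXAMPLE.COM", "RS1.example.com") yields hbase/rs1.example.com@EXAMPLE.COM, and that principal must then be present in the configured keytab.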

zookeeper.session.timeout

ZooKeeper session timeout, in milliseconds. HBase passes this value to the ZooKeeper ensemble as its suggested maximum session timeout. See http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions: "The client sends a requested timeout, the server responds with the timeout that it can give the client."

Default: 180000
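The request/response wording matters: by default a ZooKeeper server only grants session timeouts between 2 × tickTime and 20 × tickTime (adjustable with minSessionTimeout and maxSessionTimeout in zoo.cfg). A sketch of that clamp, assuming tickTime=2000 as in ZooKeeper's sample zoo.cfg; ZkSessionTimeout is a hypothetical name.

```java
// Hypothetical helper: the session timeout a ZooKeeper server would grant by default.
final class ZkSessionTimeout {

    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;  // default minSessionTimeout
        int max = 20 * tickTimeMs; // default maxSessionTimeout
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        int tickTime = 2000; // assumed zoo.cfg tickTime
        System.out.println(negotiate(180000, tickTime)); // default zookeeper.session.timeout
        System.out.println(negotiate(60000, tickTime));  // value used in this article's hbase-site.xml
    }
}
```

Under that assumption, neither the 180 s default nor the 60 s used earlier in this article is granted in full: both come back as 40 s unless maxSessionTimeout is raised on the ZooKeeper side.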

zookeeper.znode.parent

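Root znode for HBase in ZooKeeper. All of HBase's ZooKeeper znodes that are configured with a relative path go under this node. By default all of HBase's znode paths are relative, so they all end up under this node.

Default: /hbase

zookeeper.znode.rootserver

Znode that holds the location of the root region. It is written by the Master and read by clients and region servers. If a relative path is given, its parent is ${zookeeper.znode.parent}; by default this means the root region location is stored at /hbase/root-region-server.

Default: root-region-server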
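Resolving a relative znode name against zookeeper.znode.parent is plain path joining. The sketch also assumes that an absolute name (starting with /) is used unchanged, which the text implies but does not state; ZNodePaths is a hypothetical name.

```java
// Hypothetical helper: resolves a znode name against zookeeper.znode.parent.
final class ZNodePaths {

    static String resolve(String parent, String znode) {
        if (znode.startsWith("/")) {
            return znode; // absolute paths are used as given (assumption)
        }
        return parent.endsWith("/") ? parent + znode : parent + "/" + znode;
    }
}
```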

hbase.zookeeper.quorum

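Comma-separated list of servers in the ZooKeeper quorum, for example "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". The default, localhost, is meant for pseudo-distributed mode and must be changed for a fully distributed setup. If HBASE_MANAGES_ZK is set in hbase-env.sh, these are the servers on which HBase starts and stops ZooKeeper.

Default: localhost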
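Two small calculations sit behind the quorum setting. ZooKeeper stays available while a majority of the ensemble is up, so n servers tolerate (n - 1) / 2 failures; that is why an odd ensemble size is usually recommended, since 4 servers tolerate no more failures than 3. Clients also combine the host list with hbase.zookeeper.property.clientPort into a host:port connect string. QuorumMath is a hypothetical name; main uses the hosts and port from this article's setup.

```java
// Hypothetical helper: ZooKeeper ensemble arithmetic.
final class QuorumMath {

    // Servers that can fail while a majority is still up.
    static int toleratedFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2;
    }

    // "a,b,c" + port -> "a:port,b:port,c:port"
    static String connectString(String quorum, int clientPort) {
        StringBuilder sb = new StringBuilder();
        for (String host : quorum.split(",")) {
            if (host.trim().isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(host.trim()).append(':').append(clientPort);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(connectString("master,slave1,slave2", 2183));
        for (int n = 3; n <= 5; n++) {
            System.out.println(n + " servers tolerate " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```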

hbase.zookeeper.peerport

Port ZooKeeper peers use to talk to each other. For details see: http://hadoop.apache.org/zookeep ... ReplicatedZooKeeper

Default: 2888

hbase.zookeeper.leaderport

Port ZooKeeper uses for leader election. For details see: http://hadoop.apache.org/zookeep ... ReplicatedZooKeeper

Default: 3888

hbase.zookeeper.property.initLimit

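Property from ZooKeeper's configuration file zoo.cfg: the number of ticks the initial synchronization phase may take.

Default: 10

hbase.zookeeper.property.syncLimit

Property from ZooKeeper's zoo.cfg: the number of ticks that may pass between sending a request and receiving an acknowledgment.

Default: 5

hbase.zookeeper.property.dataDir

Property from ZooKeeper's zoo.cfg: the directory where snapshots are stored.

Default: ${hbase.tmp.dir}/zookeeper

hbase.zookeeper.property.clientPort

Property from ZooKeeper's zoo.cfg: the port clients connect to.

Default: 2181

hbase.zookeeper.property.maxClientCnxns

Property from ZooKeeper's zoo.cfg: limit on the number of concurrent connections that a single client, identified by IP address, may make to a single member of the ZooKeeper ensemble. Set it high to avoid ZooKeeper connection problems in standalone and pseudo-distributed mode.

Default: 2000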

hbase.rest.port

Port of the HBase REST server.

Default: 8080

hbase.rest.readonly

Sets the mode the REST server runs in. Possible values: false: all HTTP methods are allowed (GET/PUT/POST/DELETE); true: only GET is allowed.

Default: false
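The two modes amount to a small method filter. A sketch of the rule as described (only the four listed methods in read-write mode, only GET in read-only mode); RestReadOnlyFilter is a hypothetical name and not part of HBase's REST server.

```java
import java.util.Locale;

// Hypothetical helper: which HTTP methods the REST server accepts, per hbase.rest.readonly.
final class RestReadOnlyFilter {

    static boolean isAllowed(String httpMethod, boolean readOnly) {
        String m = httpMethod.toUpperCase(Locale.ROOT);
        if (readOnly) {
            return m.equals("GET");
        }
        return m.equals("GET") || m.equals("PUT") || m.equals("POST") || m.equals("DELETE");
    }
}
```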

HBase API code

With the cluster up, I ran various tests to practice the HBase API.

package HbaseTest;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Created by root on 5/30/16.
 * Exercises the HBase 1.x client API against table "yangsy" with column family "f1".
 * Each method opens its own Connection for simplicity; connections are heavyweight,
 * so a real application would create one and share it.
 */
public class HbaseTest {
    private static final TableName TABLE = TableName.valueOf("yangsy");
    private static final byte[] FAMILY = Bytes.toBytes("f1");

    private Configuration conf;

    // Loads hbase-default.xml and hbase-site.xml from the classpath.
    public void init() {
        conf = HBaseConfiguration.create();
    }

    // Creates the table with one column family unless it already exists.
    public void createTable() {
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            if (admin.tableExists(TABLE)) {
                System.out.println("table already exists!");
                return;
            }
            HTableDescriptor desc = new HTableDescriptor(TABLE);
            desc.addFamily(new HColumnDescriptor(FAMILY));
            admin.createTable(desc);
            System.out.println("create table success");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Full table scan: prints every cell of every row.
    public void query() {
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TABLE);
             ResultScanner scanner = table.getScanner(new Scan())) {
            for (Result rs : scanner) {
                System.out.println("rowkey:" + Bytes.toString(rs.getRow()));
                for (Cell cell : rs.rawCells()) {
                    System.out.println("column:" + Bytes.toString(CellUtil.cloneFamily(cell)));
                    System.out.println("columnQualifier:" + Bytes.toString(CellUtil.cloneQualifier(cell)));
                    System.out.println("columnValue:" + Bytes.toString(CellUtil.cloneValue(cell)));
                    System.out.println("----------------------------");
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Point lookup of a single row by its row key.
    public void queryByRowKey() {
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TABLE)) {
            Result rs = table.get(new Get(Bytes.toBytes("1445320222118")));
            System.out.println("yangsy the value of rowkey:1445320222118");
            for (Cell cell : rs.rawCells()) {
                System.out.println("family:" + Bytes.toString(CellUtil.cloneFamily(cell)));
                System.out.println("value:" + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Writes two rows with three columns each in one batched put.
    public void insertData() {
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TABLE)) {
            Put put1 = new Put(Bytes.toBytes("1445320222118"));
            put1.addColumn(FAMILY, Bytes.toBytes("Column_1"), Bytes.toBytes("123"));
            put1.addColumn(FAMILY, Bytes.toBytes("Column_2"), Bytes.toBytes("456"));
            put1.addColumn(FAMILY, Bytes.toBytes("Column_3"), Bytes.toBytes("789"));
            Put put2 = new Put(Bytes.toBytes("1445320222119"));
            put2.addColumn(FAMILY, Bytes.toBytes("Column_1"), Bytes.toBytes("321"));
            put2.addColumn(FAMILY, Bytes.toBytes("Column_2"), Bytes.toBytes("654"));
            put2.addColumn(FAMILY, Bytes.toBytes("Column_3"), Bytes.toBytes("987"));
            List<Put> puts = new ArrayList<Put>();
            puts.add(put1);
            puts.add(put2);
            table.put(puts);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        HbaseTest test = new HbaseTest();
        test.init();
        test.createTable();
        test.insertData();
        test.query();
        test.queryByRowKey();
    }
}
