Apache HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. It is a NoSQL database and an open-source implementation of the ideas in Google's Bigtable, capable of hosting large-scale structured storage clusters on inexpensive PC servers. It uses Hadoop HDFS as its file storage system, Hadoop MapReduce to process HBase's massive data, and ZooKeeper to coordinate the server cluster. The Apache HBase website has detailed introductory documentation.
Installing and deploying a fully distributed Apache HBase cluster is not complicated. The detailed process is as follows:
1. Plan the HBase cluster nodes
This walkthrough uses 4 nodes, on which we configure an HBase Master, a backup Master, and RegionServers. The nodes run CentOS 6.9, and the processes are planned as follows:
Host | IP | Node processes
---|---|---
hd1 | 172.17.0.1 | Master, Zookeeper
hd2 | 172.17.0.2 | Master-backup, RegionServer, Zookeeper
hd3 | 172.17.0.3 | RegionServer, Zookeeper
hd4 | 172.17.0.4 | RegionServer
2. Install JDK, Zookeeper, and Hadoop
On every server node, disable the firewall and set SELinux to disabled.
Install JDK, Zookeeper, and an Apache Hadoop distributed cluster (for the detailed process, see my other post: a step-by-step guide to building an Apache Hadoop 2.8 distributed cluster).
After installation, set the following environment variables; they will be needed when installing and configuring HBase.
export JAVA_HOME=/usr/java/jdk1.8.0_131
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/home/ahadoop/hadoop-2.8.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export ZOOKEEPER_HOME=/home/ahadoop/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin
3. Install NTP to keep time consistent across the server nodes
If the clocks on the server nodes drift apart, HBase may misbehave; the HBase website stresses this point explicitly. Here, the first node, hd1, is set up as the NTP server: it synchronizes its time from the National Time Service Center, and the other nodes (hd2, hd3, hd4) act as clients that synchronize from hd1.
(1) Install NTP
# Install the NTP service
yum -y install ntp

# Enable it at boot
chkconfig --add ntpd
chkconfig ntpd on
Start the NTP service:
service ntpd start
(2) Configure the NTP server
On node hd1, edit the /etc/ntp.conf file to configure the NTP service; the specific changes are marked by the comments below.
vi /etc/ntp.conf
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery

# Permit all access over the loopback interface.  This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1

# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Add the network range that is allowed to send requests
restrict 172.17.0.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
# Upstream servers to synchronize the clock from
server 210.72.145.44 prefer  # National Time Service Center, China
server 202.112.10.36         # 1.cn.pool.ntp.org
server 59.124.196.83         # 0.asia.pool.ntp.org

#broadcast 192.168.1.255 autokey        # broadcast server
#broadcastclient                        # broadcast client
#broadcast 224.0.1.1 autokey            # multicast server
#multicastclient 224.0.1.1              # multicast client
#manycastserver 239.255.254.254         # manycast server
#manycastclient 239.255.254.254 autokey # manycast client

# Allow the upstream time servers to adjust this machine's clock
restrict 210.72.145.44 nomodify notrap noquery
restrict 202.112.10.36 nomodify notrap noquery
restrict 59.124.196.83 nomodify notrap noquery

# If the external time servers are unreachable, serve local time instead
server 127.0.0.1 # local clock
fudge 127.0.0.1 stratum 10

# Enable public key cryptography.
#crypto

includefile /etc/ntp/crypto/pw

# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys

# Specify the key identifiers which are trusted.
#trustedkey 4 8 42

# Specify the key identifier to use with the ntpdc utility.
#requestkey 8

# Specify the key identifier to use with the ntpq utility.
#controlkey 8

# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats

# Disable the monitoring facility to prevent amplification attacks using ntpdc
# monlist command when default restrict does not include the noquery flag. See
# CVE-2013-5211 for more details.
# Note: Monitoring will not be disabled with the limited restriction flag.
disable monitor
Restart the NTP service:
service ntpd restart
Then check the NTP status:
[root@31d48048cb1e ahadoop]# service ntpd status
ntpd dead but pid file exists
This reveals an error. It turns out ntpd has a built-in limit: it only synchronizes when the offset from the NTP server is within 1000 seconds. The node's clock was off by more than 1000 seconds, so the operating system time must first be set manually to within 1000 seconds of the NTP server before the service can synchronize.
# If the OS time zone is wrong, fix it first (Asia/Shanghai)
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

# Set the date and time manually
date -s 20170703
date -s 15:32:00
There is also another small trick: right after installing the NTP service, fetch the exact time once from a public time server, which avoids the manual adjustment. The command is:
ntpdate -u pool.ntp.org
[Note] If you run this time synchronization inside Docker, the system reports an error:
9 Jan 05:13:57 ntpdate[7299]: step-systime: Operation not permitted
This error means the system refuses to let you set the time. A Docker container shares the host's kernel, and changing the system time is a kernel-level operation, so by default the time cannot be changed from inside a Docker container.
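If you really do need to set the clock inside a container, one option (an illustrative sketch; the image name is only an example, and granting kernel capabilities should be done with care) is to start the container with the SYS_TIME capability:

# Grant the kernel capability that clock changes require (illustrative)
docker run --cap-add SYS_TIME -it centos:6 /bin/bash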
(3) Configure the NTP clients
On nodes hd2, hd3, and hd4, edit the /etc/ntp.conf file to configure the NTP client; the specific changes are marked by the comments below.
vi /etc/ntp.conf
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery

# Permit all access over the loopback interface.  This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1

# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
# Synchronize time from the server node
server 172.17.0.1
restrict 172.17.0.1 nomodify notrap noquery

# If synchronization fails, fall back to local time
server 127.0.0.1
fudge 127.0.0.1 stratum 10

#broadcast 192.168.1.255 autokey        # broadcast server
#broadcastclient                        # broadcast client
#broadcast 224.0.1.1 autokey            # multicast server
#multicastclient 224.0.1.1              # multicast client
#manycastserver 239.255.254.254         # manycast server
#manycastclient 239.255.254.254 autokey # manycast client

# Enable public key cryptography.
#crypto

includefile /etc/ntp/crypto/pw

# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys

# Specify the key identifiers which are trusted.
#trustedkey 4 8 42

# Specify the key identifier to use with the ntpdc utility.
#requestkey 8

# Specify the key identifier to use with the ntpq utility.
#controlkey 8

# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats

# Disable the monitoring facility to prevent amplification attacks using ntpdc
# monlist command when default restrict does not include the noquery flag. See
# CVE-2013-5211 for more details.
# Note: Monitoring will not be disabled with the limited restriction flag.
disable monitor
Restart the NTP service:
service ntpd restart
After it starts, check the time synchronization status:
$ ntpq -p
$ ntpstat
4. Adjust ulimit
The Apache HBase documentation recommends raising the ulimit when running HBase, to increase the number of files that can be open at once: at least 10,000 for nofile, but preferably 10,240 ("It is recommended to raise the ulimit to at least 10,000, but more likely 10,240, because the value is usually expressed in multiples of 1024.").
Edit the /etc/security/limits.conf file and append the nofile (number of open files) and nproc (number of processes) limits at the end, as follows:
vi /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
After the change, reboot the server for it to take effect:
reboot
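After logging back in, you can quickly verify that the new limits are active for your session (the values should match the limits.conf entries above):

# Check the per-user open-file and process limits
ulimit -n   # expected: 65536
ulimit -u   # expected: 65536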
5. Install and configure Apache HBase
The Apache HBase website provides documentation of the default settings and example configurations; it is worth reading before you configure anything.
This walkthrough uses a standalone ZooKeeper setup, shared with Hadoop; for the specific ZooKeeper configuration, see my other post. HBase also supports using its built-in ZooKeeper service, but in a production environment a separate deployment is recommended, as it makes day-to-day administration easier.
(1) Download Apache HBase
Download the latest binary release from the official website: hbase-1.2.6-bin.tar.gz
Then extract it:
tar -zxvf hbase-1.2.6-bin.tar.gz
Configure the environment variables:
vi ~/.bash_profile
export HBASE_HOME=/home/ahadoop/hbase-1.2.6
export PATH=$PATH:$HBASE_HOME/bin

# Reload the profile so the variables take effect
source ~/.bash_profile
(2) Copy the hdfs-site.xml configuration file
Copy $HADOOP_HOME/etc/hadoop/hdfs-site.xml to the $HBASE_HOME/conf directory so that HDFS and HBase see the same settings; this is the approach recommended on the official website. The documentation gives an example: if HDFS is configured with a replication factor of 5 while the default is 3, and the up-to-date hdfs-site.xml has not been copied into $HBASE_HOME/conf, HBase will write 3 replicas. The two sides then disagree, which can lead to errors.
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $HBASE_HOME/conf/
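As a quick sanity check (illustrative; dfs.replication may not be set explicitly in your hdfs-site.xml, in which case the default of 3 applies), you can compare the replication setting on both sides:

# The two files should show the same dfs.replication value, if any
grep -A1 dfs.replication $HADOOP_HOME/etc/hadoop/hdfs-site.xml
grep -A1 dfs.replication $HBASE_HOME/conf/hdfs-site.xml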
(3) Configure hbase-site.xml
Using the built-in ZooKeeper:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave1,slave2</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/hbase/zookeeper/data</value>
</property>
</configuration>
Using a standalone ZooKeeper:
Edit $HBASE_HOME/conf/hbase-site.xml:
<configuration>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hd1,hd2,hd3</value>
    <description>Comma separated list of servers in the ZooKeeper quorum.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/ahadoop/zookeeper-data</value>
    <description>Note: this ZooKeeper data directory is shared with Hadoop HA,
    i.e. it must match the one configured in zoo.cfg.
    Property from ZooKeeper config zoo.cfg.
    The directory where the snapshot is stored.
    </description>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hd1:9000/hbase</value>
    <description>The directory shared by RegionServers.
    The official documentation stresses repeatedly that you should not create
    this directory in advance; HBase creates it itself. Otherwise HBase will
    attempt a migration and raise errors.
    As for the port, some setups use 8020 and some 9000; check the configuration
    in $HADOOP_HOME/etc/hadoop/hdfs-site.xml. In this walkthrough it is set by
    dfs.namenode.rpc-address.hdcluster.nn1 and dfs.namenode.rpc-address.hdcluster.nn2.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>Distributed cluster setting; set it to true here, or false for
    a single-node setup.
    The mode the cluster will be in. Possible values are
    false: standalone and pseudo-distributed setups with managed ZooKeeper
    true: fully-distributed with unmanaged ZooKeeper Quorum (see hbase-env.sh)
    </description>
  </property>
</configuration>
(4) Configure the regionservers file
Edit the $HBASE_HOME/conf/regionservers file and enter the hostnames of the nodes that will run a RegionServer:
hd2
hd3
hd4
(5) Configure the backup-masters file (standby master nodes)
HBase supports running multiple master nodes, so there is no single point of failure, but only one master can be active at a time; the rest are backup masters. Edit the $HBASE_HOME/conf/backup-masters file and enter the hostnames of the standby master nodes:
hd2
(6) Configure the hbase-env.sh file
Edit $HBASE_HOME/conf/hbase-env.sh to configure the environment variables. Since this walkthrough uses a separately configured ZooKeeper, set HBASE_MANAGES_ZK to false:
export HBASE_MANAGES_ZK=false
At this point, the HBase configuration is complete.
6. Start Apache HBase
You can start the whole cluster with the $HBASE_HOME/bin/start-hbase.sh command; to use it, the cluster nodes must have passwordless SSH login set up, so the script can reach the different nodes and start their services. A sketch of the key setup follows.
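A minimal sketch of generating and distributing SSH keys (assuming the ahadoop user from this walkthrough; run on hd1, and repeat from any node that needs to start services on the others):

# Generate a key pair; accept the defaults and leave the passphrase empty
ssh-keygen -t rsa

# Copy the public key to every node in the cluster, including hd1 itself
ssh-copy-id ahadoop@hd1
ssh-copy-id ahadoop@hd2
ssh-copy-id ahadoop@hd3
ssh-copy-id ahadoop@hd4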
To get a deeper understanding of the HBase startup process, this walkthrough starts the daemons on each node one by one. Inspecting the start-hbase.sh script shows the startup order inside it:
if [ "$distMode" == 'false' ] then "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $commandToRun master $@ else "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" $commandToRun zookeeper "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $commandToRun master "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \ --hosts "${HBASE_REGIONSERVERS}" $commandToRun regionserver "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \ --hosts "${HBASE_BACKUP_MASTERS}" $commandToRun master-backup fi
That is, it uses the hbase-daemon.sh / hbase-daemons.sh commands to start zookeeper, master, regionserver, and master-backup, in that order.
So we follow the same order and start the daemons on each node.
Before starting HBase, Hadoop must already be running, so that HBase can initialize itself and read the data stored on HDFS.
(1) Start ZooKeeper (nodes hd1, hd2, hd3)
zkServer.sh start &
(2) Start the Hadoop distributed cluster (for the cluster's configuration and node plan, see my other post)
# Start the journalnodes (hd1, hd2, hd3)
hdfs journalnode &

# Start the active namenode (hd1)
hdfs namenode &

# Start the standby namenode (hd2)
hdfs namenode &

# Start the ZookeeperFailoverController (hd1, hd2)
hdfs zkfc &

# Start the datanodes (hd2, hd3, hd4)
hdfs datanode &
(3) Start the HBase master (hd1)
hbase-daemon.sh start master &
(4) Start the HBase regionservers (hd2, hd3, hd4)
hbase-daemon.sh start regionserver &
(5) Start the HBase backup master (hd2)
hbase-daemon.sh start master --backup &
Oddly, $HBASE_HOME/bin/start-hbase.sh says the command to start a backup master is:
"$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \ --hosts "${HBASE_BACKUP_MASTERS}" $commandToRun master-backup
But running that command directly fails with an error that the class master-backup cannot be loaded:
[ahadoop@1620d6ed305d ~]$ hbase-daemon.sh start master-backup &
[5] 1113
[ahadoop@1620d6ed305d ~]$ starting master-backup, logging to /home/ahadoop/hbase-1.2.6/logs/hbase-ahadoop-master-backup-1620d6ed305d.out
Error: Could not find or load main class master-backup
After some digging, the following command turned out to be the way to start a backup master:
hbase-daemon.sh start master --backup &
With the steps above, the HBase cluster has been started successfully; you can run the jps command on each node to see which HBase processes are running.
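For reference, a jps listing on hd1 should look roughly like this (an illustration only; the PIDs are made up, and the set of processes follows the node plan above):

$ jps
2113 QuorumPeerMain            # ZooKeeper
2245 NameNode                  # HDFS active namenode
2398 DFSZKFailoverController   # zkfc
2561 HMaster                   # HBase active master
2763 JournalNode
2988 Jps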
After startup, inspect the /hbase directory in both HDFS and ZooKeeper; both have been initialized and the corresponding files have been written, as shown below:
[ahadoop@ee8319514df6 ~]$ hadoop fs -ls /hbase
17/07/02 13:14:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 7 items
drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 12:55 /hbase/.tmp
drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 12:55 /hbase/MasterProcWALs
drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 13:03 /hbase/WALs
drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 12:55 /hbase/data
-rw-r--r--   3 ahadoop supergroup         42 2017-07-02 12:55 /hbase/hbase.id
-rw-r--r--   3 ahadoop supergroup          7 2017-07-02 12:55 /hbase/hbase.version
drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 12:55 /hbase/oldWALs
[ahadoop@31d48048cb1e ~]$ zkCli.sh -server hd1:2181
Connecting to hd1:2181
2017-07-05 11:31:44,663 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
2017-07-05 11:31:44,667 [myid:] - INFO [main:Environment@100] - Client environment:host.name=31d48048cb1e
2017-07-05 11:31:44,668 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.8.0_131
2017-07-05 11:31:44,672 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2017-07-05 11:31:44,673 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.8.0_131/jre
2017-07-05 11:31:44,674 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/home/ahadoop/zookeeper-3.4.10/bin/../build/classes:/home/ahadoop/zookeeper-3.4.10/bin/../build/lib/*.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/slf4j-api-1.6.1.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/netty-3.10.5.Final.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/log4j-1.2.16.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/jline-0.9.94.jar:/home/ahadoop/zookeeper-3.4.10/bin/../zookeeper-3.4.10.jar:/home/ahadoop/zookeeper-3.4.10/bin/../src/java/lib/*.jar:/home/ahadoop/zookeeper-3.4.10/bin/../conf:.:/usr/java/jdk1.8.0_131/lib:/usr/java/jdk1.8.0_131/lib/dt.jar:/usr/java/jdk1.8.0_131/lib/tools.jar:/home/ahadoop/apache-ant-1.10.1/lib
2017-07-05 11:31:44,674 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2017-07-05 11:31:44,675 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2017-07-05 11:31:44,675 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2017-07-05 11:31:44,678 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2017-07-05 11:31:44,679 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2017-07-05 11:31:44,679 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.10.105-1.el6.elrepo.x86_64
2017-07-05 11:31:44,680 [myid:] - INFO [main:Environment@100] - Client environment:user.name=ahadoop
2017-07-05 11:31:44,680 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/ahadoop
2017-07-05 11:31:44,681 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/home/ahadoop
2017-07-05 11:31:44,686 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=hd1:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@799f7e29
Welcome to ZooKeeper!
2017-07-05 11:31:44,724 [myid:] - INFO [main-SendThread(31d48048cb1e:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server 31d48048cb1e/172.17.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2017-07-05 11:31:44,884 [myid:] - INFO [main-SendThread(31d48048cb1e:2181):ClientCnxn$SendThread@876] - Socket connection established to 31d48048cb1e/172.17.0.1:2181, initiating session
[zk: hd1:2181(CONNECTED) 0] 2017-07-05 11:31:44,912 [myid:] - INFO [main-SendThread(31d48048cb1e:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server 31d48048cb1e/172.17.0.1:2181, sessionid = 0x15d10c18fc70002, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: hd1:2181(CONNECTED) 1] ls /hbase
[replication, meta-region-server, rs, splitWAL, backup-masters, table-lock, flush-table-proc, region-in-transition, online-snapshot, running, recovering-regions, draining, hbaseid, table]
7. Testing HBase
Use hbase shell to enter HBase's interactive command-line interface, where you can test things out:
hbase shell
(1) Check the cluster status and node count
hbase(main):001:0> status
1 active master, 1 backup masters, 4 servers, 0 dead, 0.5000 average load
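The status command also takes a format argument when you want more or less detail (standard shell variants; output omitted):

status 'simple'
status 'detailed'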
(2) Create a table
hbase(main):002:0> create 'testtable','c1','c2'
0 row(s) in 1.4850 seconds

=> Hbase::Table - testtable
The arguments to HBase's create command are the table name followed by one or more column family names (column family 1, column family 2, column family 3, and so on).
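A column family can also be passed as a dictionary to set per-family options, for example to keep multiple versions of each cell (shown for reference, not run in this session; 't1' is just an example name):

create 't1', {NAME => 'c1', VERSIONS => 3}, 'c2'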
(3) List the table
hbase(main):003:0> list 'testtable'
TABLE
testtable
1 row(s) in 0.0400 seconds

=> ["testtable"]
(4) Insert data
hbase(main):004:0> put 'testtable','row1','c1','row1_c1_value'
0 row(s) in 0.2230 seconds

hbase(main):005:0> put 'testtable','row2','c2:s1','row1_c2_s1_value'
0 row(s) in 0.0310 seconds

hbase(main):006:0> put 'testtable','row2','c2:s2','row1_c2_s2_value'
0 row(s) in 0.0170 seconds
The arguments to the put command are the table name, row key, column (a column family, optionally followed by a colon and a qualifier within that family), and the cell value.
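A single cell can later be read back by naming its column in a get, using the same syntax (output omitted; see the get examples below):

get 'testtable', 'row2', {COLUMN => 'c2:s1'}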
(5) Scan the whole table
hbase(main):007:0> scan 'testtable'
ROW                   COLUMN+CELL
 row1                 column=c1:, timestamp=1499225862922, value=row1_c1_value
 row2                 column=c2:s1, timestamp=1499225869471, value=row1_c2_s1_value
 row2                 column=c2:s2, timestamp=1499225870375, value=row1_c2_s2_value
2 row(s) in 0.0820 seconds
(6) Query data by key
hbase(main):008:0> get 'testtable','row1'
COLUMN                CELL
 c1:                  timestamp=1499225862922, value=row1_c1_value
1 row(s) in 0.0560 seconds

hbase(main):009:0> get 'testtable','row2'
COLUMN                CELL
 c2:s1                timestamp=1499225869471, value=row1_c2_s1_value
 c2:s2                timestamp=1499225870375, value=row1_c2_s2_value
2 row(s) in 0.0350 seconds
(7) Disable a table
The disable command takes a table offline; once disabled, the table cannot be used. For example, a full table scan against it fails with an error, as shown below.
hbase(main):010:0> disable 'testtable'
0 row(s) in 2.3090 seconds

hbase(main):011:0> scan 'testtable'
ROW                   COLUMN+CELL

ERROR: testtable is disabled.

Here is some help for this command:
Scan a table; pass table name and optionally a dictionary of scanner
specifications.  Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, ROWPREFIXFILTER, TIMESTAMP,
MAXLENGTH or COLUMNS, CACHE or RAW, VERSIONS, ALL_METRICS or METRICS
(... the rest of the shell's built-in help for the scan command is omitted here ...)
(8) Re-enable a table
The enable command brings a table back online; once enabled, the table can be operated on again, for example with a full table scan:
hbase(main):012:0> enable 'testtable'
0 row(s) in 1.2800 seconds

hbase(main):013:0> scan 'testtable'
ROW                   COLUMN+CELL
 row1                 column=c1:, timestamp=1499225862922, value=row1_c1_value
 row2                 column=c2:s1, timestamp=1499225869471, value=row1_c2_s1_value
 row2                 column=c2:s2, timestamp=1499225870375, value=row1_c2_s2_value
2 row(s) in 0.0590 seconds
(9) Drop a table
The drop command deletes a table, but only if the table has been disabled first; otherwise it fails with an error, as shown below.
hbase(main):014:0> drop 'testtable'

ERROR: Table testtable is enabled. Disable it first.

Here is some help for this command:
Drop the named table. Table must first be disabled:

  hbase> drop 't1'
  hbase> drop 'ns1:t1'
Disable the table first and then drop it, and the deletion goes through:
hbase(main):008:0> disable 'testtable'
0 row(s) in 2.3170 seconds

hbase(main):012:0> drop 'testtable'
0 row(s) in 1.2740 seconds
(10) Exit the hbase shell
quit
That concludes this simple test run with the hbase shell.
8. The HBase web UI
HBase also provides a web UI that gives a convenient view of the cluster's status.
Enter http://172.17.0.1:16010 in a browser (the default port is 16010) to reach the management page, as shown in the figure below.
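A quick command-line check that the master UI is listening (illustrative, using the master address from this walkthrough):

curl -sI http://172.17.0.1:16010 | head -n 1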
To view the tables in HBase, click Table Details in the menu bar at the top to see all table information, as shown in the figure below.
The Tables section on the home page also lists the table names; click one to view that table's details, as shown in the figure below.
Under Tables, click System Tables to view the system tables, which mainly hold metadata and namespaces, as shown in the figure below.
That completes the detailed walkthrough of configuring an Apache HBase cluster and testing it. Feedback and corrections are welcome; let's learn from each other.