1. HBase Installation and Configuration
1.1 Pseudo-Distributed Mode Installation
In pseudo-distributed mode, all of the HBase roles are deployed on a single machine: HMaster, HRegionServer, and ZooKeeper all run on one host to simulate a cluster.
First, prepare the HBase installation package. I am using HBase 0.94.7 here, which I have uploaded to Baidu Netdisk (URL: http://pan.baidu.com/s/1pJ3HTY7).
(1) Copy the HBase package to the virtual machine hadoop-master via FTP, then unpack it, rename the directory, and set the environment variables:
① Unpack: tar -zvxf hbase-0.94.7-security.tar.gz
② Rename: mv hbase-0.94.7-security hbase
③ Set environment variables: vim /etc/profile, append the lines below, then apply the changes with: source /etc/profile
export HBASE_HOME=/usr/local/hbase
export PATH=.:$HADOOP_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$PATH
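As a quick sanity check (a minimal sketch, assuming HBase was unpacked to /usr/local/hbase as above), confirm that the environment variables are in effect:
source /etc/profile
echo $HBASE_HOME        # should print /usr/local/hbase
hbase version           # should print the HBase 0.94.7 version banner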
(2) Go to the hbase/conf directory and edit the hbase-env.sh file:
export JAVA_HOME=/usr/local/jdk
export HBASE_MANAGES_ZK=true  # tell HBase to manage its own ZooKeeper instance; in fully distributed mode this must be set to false
(3) Still in the hbase/conf directory, edit the hbase-site.xml file:
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop-master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop-master</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
(4) [Optional step] Edit the regionservers file and change localhost to the hostname: hadoop-master
(5) Start HBase: start-hbase.sh
Note: as explained in the previous post, HBase runs on top of Hadoop HDFS, so make sure Hadoop is running before starting HBase. Hadoop is started with: start-all.sh
(6) Verify that HBase has started: jps
The jps output above shows three new Java processes: HMaster, HRegionServer, and HQuorumPeer.
You can also check the HBase web UI at: http://hadoop-master:60010
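Another quick check (a minimal sketch; the exact output depends on your environment) is to open the HBase shell and ask for the cluster status and version:
hbase shell
>status
>version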
1.2 Fully Distributed Mode Installation
This installation takes the pseudo-distributed setup from section 1.1 and modifies it into a fully distributed cluster. The layout of the experimental cluster is shown in the figure below:
As the figure shows, the HMaster role runs on 192.168.80.100 (hostname: hadoop-master), while the two HRegionServer roles run on 192.168.80.101 (hostname: hadoop-slave1) and 192.168.80.102 (hostname: hadoop-slave2).
(1) Modify a few key configuration files on the hadoop-master server:
① Edit hbase/conf/hbase-env.sh and change the last line to:
export HBASE_MANAGES_ZK=false  # do not use the ZooKeeper instance bundled with HBase
② Edit hbase/conf/regionservers and replace the original hadoop-master with:
hadoop-slave1
hadoop-slave2
(2) Copy the entire hbase directory and the /etc/profile file from hadoop-master to hadoop-slave1 and hadoop-slave2:
scp -r /usr/local/hbase hadoop-slave1:/usr/local/
scp -r /usr/local/hbase hadoop-slave2:/usr/local/
scp /etc/profile hadoop-slave1:/etc/
scp /etc/profile hadoop-slave2:/etc/
(3) Apply the configuration file on hadoop-slave1 and hadoop-slave2:
source /etc/profile
(4) On hadoop-master, start Hadoop, ZooKeeper, and HBase, in this order:
start-all.sh
zkServer.sh start
start-hbase.sh
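As a quick sanity check (assuming the hostnames above and passwordless SSH between the nodes), run jps on each node; hadoop-master should show an HMaster process alongside the Hadoop daemons, and each slave should show an HRegionServer process:
jps
ssh hadoop-slave1 jps
ssh hadoop-slave2 jps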
(5) Check the cluster status in the HBase web UI:
2. Basic HBase Shell Commands
2.1 DDL: Creating and Dropping Tables
(1) Create a table:
>create 'users','user_id','address','info'
#This creates a table named users with three column families: user_id, address, and info
Get the description of the users table:
>describe 'users'
(2) List all tables:
>list
(3) Drop a table: dropping a table in HBase takes two steps: first disable it, then drop it
>disable 'users'
>drop 'users'
2.2 DML: Insert, Delete, Query, and Update
(1) Insert records: put
>put 'users','xiaoming','info:age','24'
>put 'users','xiaoming','info:birthday','1987-06-17'
>put 'users','xiaoming','info:company','alibaba'
>put 'users','xiaoming','address:country','china'
>put 'users','xiaoming','address:province','zhejiang'
>put 'users','xiaoming','address:city','hangzhou'
(2) Scan all records in the users table: scan
>scan 'users'
(3) Get a single record
① Get all data for one row key (id)
>get 'users','xiaoming'
② Get all data in one column family for a row key
>get 'users','xiaoming','info'
③ Get the data of a single column within a column family for a row key
>get 'users','xiaoming','info:age'
(4) Update a record: again with put
For example, update xiaoming's age in the users table to 29:
>put 'users','xiaoming','info:age','29'
>get 'users','xiaoming','info:age'
(5) Delete records: delete and deleteall
① Delete the 'info:age' field of row xiaoming
>delete 'users','xiaoming','info:age'
② Delete the entire row for xiaoming
>deleteall 'users','xiaoming'
2.3 Other Useful Commands
(1) count: count the number of rows
>count 'users'
(2) truncate: empty the specified table (internally it disables, drops, and recreates the table)
>truncate 'users'
3. HBase Java API Operations
3.1 Preparation
(1) Add the HBase project jar to the build path
(2) Add all dependency jars under HBase/lib
3.2 Essential for HBase Java development: getting the Configuration
/*
 * Get the HBase configuration
 */
private static Configuration getConfiguration() {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://hadoop-master:9000/hbase");
    // This must be set when running from Eclipse, otherwise the cluster cannot be located
    conf.set("hbase.zookeeper.quorum", "hadoop-master");
    return conf;
}
3.3 DDL Operations with HBaseAdmin
(1) Create a table
/*
 * Create a table
 */
private static void createTable() throws IOException {
    HBaseAdmin admin = new HBaseAdmin(getConfiguration());
    if (admin.tableExists(TABLE_NAME)) {
        System.out.println("The table already exists!");
    } else {
        HTableDescriptor tableDesc = new HTableDescriptor(TABLE_NAME);
        tableDesc.addFamily(new HColumnDescriptor(FAMILY_NAME));
        admin.createTable(tableDesc);
        System.out.println("Create table success!");
    }
}
(2) Drop a table
/*
 * Drop a table
 */
private static void dropTable(String tableName) throws IOException {
    HBaseAdmin admin = new HBaseAdmin(getConfiguration());
    if (admin.tableExists(tableName)) {
        try {
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
            System.out.println("Delete " + tableName + " success!");
        } catch (IOException e) {
            e.printStackTrace();
            System.out.println("Delete " + tableName + " failed!");
        }
    }
}
3.4 DML Operations with HTable
(1) Insert a record
public static void putRecord(String tableName, String row, String columnFamily,
        String column, String data) throws IOException {
    HTable table = new HTable(getConfiguration(), tableName);
    Put p1 = new Put(Bytes.toBytes(row));
    p1.add(Bytes.toBytes(columnFamily), Bytes.toBytes(column), Bytes.toBytes(data));
    table.put(p1);
    System.out.println("put '" + row + "','" + columnFamily + ":" + column + "','" + data + "'");
}
(2) Read a record
public static void getRecord(String tableName, String row) throws IOException {
    HTable table = new HTable(getConfiguration(), tableName);
    Get get = new Get(Bytes.toBytes(row));
    Result result = table.get(get);
    System.out.println("Get: " + result);
}
(3) Full table scan
public static void scan(String tableName) throws IOException {
    HTable table = new HTable(getConfiguration(), tableName);
    Scan scan = new Scan();
    ResultScanner scanner = table.getScanner(scan);
    for (Result result : scanner) {
        System.out.println("Scan: " + result);
    }
}
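To tie sections 3.2 to 3.4 together, here is a minimal driver sketch. The class name HBaseDemo and the constants TABLE_NAME and FAMILY_NAME are assumptions added for illustration; they do not appear in the snippets above, and the methods are assumed to live in the same class:
package hbase;

import java.io.IOException;

// Hypothetical driver class; assumes the methods from sections 3.2-3.4 are defined in this class
public class HBaseDemo {

    // Assumed constants referenced by createTable() above
    private static final String TABLE_NAME = "users";
    private static final String FAMILY_NAME = "info";

    public static void main(String[] args) throws IOException {
        createTable();                                               // create 'users' with column family 'info'
        putRecord(TABLE_NAME, "xiaoming", FAMILY_NAME, "age", "24"); // write one cell
        getRecord(TABLE_NAME, "xiaoming");                           // read the row back
        scan(TABLE_NAME);                                            // full table scan
        // dropTable(TABLE_NAME);                                    // uncomment to clean up
    }

    // ... getConfiguration(), createTable(), dropTable(), putRecord(), getRecord(), scan() as above ...
}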
3.5 API in Practice: Loading Call Detail Records
Building on the mobile internet access logs from the fifth post of these notes, "Processing Mobile Access Logs with a Custom Type", the task is to load the logs into HBase via MapReduce. The structure of the log records is shown in the figure below (the file can be downloaded from: http://pan.baidu.com/s/1dDzqHWX).
(1) Create a table named wlan_log via the HBase shell:
> create 'wlan_log','cf'
To keep things simple, only a single column family, cf, is defined here.
(2) Create a new class in Eclipse named BatchImportJob; its code is shown below:

package hbase;

import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class BatchImportJob {

    static class BatchImportMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        SimpleDateFormat dateformat1 = new SimpleDateFormat("yyyyMMddHHmmss");
        Text v2 = new Text();

        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            final String[] splited = value.toString().split("\t");
            try {
                final Date date = new Date(Long.parseLong(splited[0].trim()));
                final String dateFormat = dateformat1.format(date);
                // Row key = mobile number + ":" + formatted timestamp
                String rowKey = splited[1] + ":" + dateFormat;
                v2.set(rowKey + "\t" + value.toString());
                context.write(key, v2);
            } catch (NumberFormatException e) {
                final Counter counter = context.getCounter("BatchImportJob", "ErrorFormat");
                counter.increment(1L);
                System.out.println("Parse error: " + splited[0] + " " + e.getMessage());
            }
        }
    }

    static class BatchImportReducer extends TableReducer<LongWritable, Text, NullWritable> {
        protected void reduce(LongWritable key, java.lang.Iterable<Text> values, Context context)
                throws java.io.IOException, InterruptedException {
            for (Text text : values) {
                final String[] splited = text.toString().split("\t");
                final Put put = new Put(Bytes.toBytes(splited[0]));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("date"), Bytes.toBytes(splited[1]));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("msisdn"), Bytes.toBytes(splited[2]));
                // The remaining fields can be stored the same way with put.add(...)
                context.write(NullWritable.get(), put);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final Configuration configuration = new Configuration();
        // Set the ZooKeeper quorum
        configuration.set("hbase.zookeeper.quorum", "hadoop-master");
        // Set the target HBase table name
        configuration.set(TableOutputFormat.OUTPUT_TABLE, "wlan_log");
        // Increase this value to keep HBase from timing out
        configuration.set("dfs.socket.timeout", "180000");

        final Job job = new Job(configuration, "HBaseBatchImportJob");

        job.setMapperClass(BatchImportMapper.class);
        job.setReducerClass(BatchImportReducer.class);
        // Set the map output types; the reduce output types do not need to be set
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);

        job.setInputFormatClass(TextInputFormat.class);
        // No output path is set; instead, set the output format class
        job.setOutputFormatClass(TableOutputFormat.class);

        FileInputFormat.setInputPaths(job,
                "hdfs://hadoop-master:9000/testdir/input/HTTP_20130313143750.dat");

        boolean success = job.waitForCompletion(true);
        if (success) {
            System.out.println("Batch import to HBase success!");
            System.exit(0);
        } else {
            System.out.println("Batch import to HBase failed!");
            System.exit(1);
        }
    }
}
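To run the import (a minimal sketch; the jar name batchimport.jar is only an assumed example of how the project might be exported from Eclipse), make sure the log file is already in HDFS at the path used in the code, then submit the job with the hadoop command:
hadoop fs -ls /testdir/input/HTTP_20130313143750.dat
hadoop jar batchimport.jar hbase.BatchImportJob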
After the job finishes, verify the import by running the list command in the HBase shell:
(3) Create another class in Eclipse named MobileLogQueryApp to query the stored wlan_log table; its code is shown below:

package hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class MobileLogQueryApp {

    private static final String TABLE_NAME = "wlan_log";
    private static final String FAMILY_NAME = "cf";

    /**
     * Basic usage examples of the HBase Java API
     */
    public static void main(String[] args) throws Exception {
        scan(TABLE_NAME, "13600217502");
        System.out.println();
        scanPeriod(TABLE_NAME, "136");
    }

    /*
     * Query all access records of the phone number 13600217502
     */
    public static void scan(String tableName, String mobileNum) throws IOException {
        HTable table = new HTable(getConfiguration(), tableName);
        Scan scan = new Scan();
        // Row keys have the form <mobile>:<timestamp>; '/' and ':' bracket the digit range in ASCII order
        scan.setStartRow(Bytes.toBytes(mobileNum + ":/"));
        scan.setStopRow(Bytes.toBytes(mobileNum + "::"));
        ResultScanner scanner = table.getScanner(scan);
        int i = 0;
        for (Result result : scanner) {
            System.out.println("Scan: " + i + " " + result);
            i++;
        }
    }

    /*
     * Query all access records for a number segment (prefix), e.g. 136
     */
    public static void scanPeriod(String tableName, String period) throws IOException {
        HTable table = new HTable(getConfiguration(), tableName);
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes(period + "/"));
        scan.setStopRow(Bytes.toBytes(period + ":"));
        scan.setMaxVersions(1);
        ResultScanner scanner = table.getScanner(scan);
        int i = 0;
        for (Result result : scanner) {
            System.out.println("Scan: " + i + " " + result);
            i++;
        }
    }

    /*
     * Get the HBase configuration
     */
    private static Configuration getConfiguration() {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://hadoop-master:9000/hbase");
        // This must be set when running from Eclipse, otherwise the cluster cannot be located
        conf.set("hbase.zookeeper.quorum", "hadoop-master");
        return conf;
    }
}
This performs two kinds of queries: by a specific phone number, and by a phone number segment (prefix) range. The execution results are shown below:
References
(1) Wu Chao, 《Hadoop深入淺出》: http://www.superwu.cn
(2) 新城主力唱好, HBase Java API: http://www.cnblogs.com/NicholasLee/archive/2012/09/13/2683432.html