Setting Up the Hive Service with Cloudera Manager


                                      Author: Yin Zhengjie

Copyright notice: this is original work. Please do not repost without permission; violations will be pursued under the law.


I. Installing the Hive Environment

1>. Enter the CM Add Service wizard

2>. Select the Hive service to install

3>. Select Hive's dependencies; the first option is fine here (Hive can run not only on MapReduce but also on compute engines such as Tez)

4>. Assign roles for Hive

    Hive Metastore is the service that manages and stores Hive's metadata: it holds basic information about databases as well as table definitions. To persist this metadata reliably, the Metastore usually writes it to a relational database. The default is the embedded Derby database (single-session, suitable only for testing), and users can switch to another database such as MySQL as needed.

    Recommended reading (an introduction to the Hive Metastore): https://www.cnblogs.com/yinzhengjie/p/10836132.html
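
    For reference, when the Metastore is backed by MySQL the relevant hive-site.xml entries look roughly like the sketch below. On a CM-managed cluster the wizard generates this configuration for you, so it is shown only for orientation; the host, database name, and credentials are the ones used later in this walkthrough, and your values may differ.

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://node101.yinzhengjie.org.cn:3306/hive</value>  <!-- JDBC URL of the metastore database -->
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>yinzhengjie</value>
</property>
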
    HCatalog is the table and storage management layer for Hadoop; it makes it easier for users of different tools (Pig, MapReduce) to read and write tabular data.

    HCatalog graduated from the Apache Incubator and was merged into the Hive project on March 26, 2013.

    Hive 0.11.0 was the first release to include HCatalog (it is installed together with Hive). CDH 5.15.1 ships Hive version 1.1.0+cdh5.15.1+1395, i.e. Apache Hive 1.1.0.

    HCatalog's table abstraction presents users with a relational view of the data in the Hadoop Distributed File System (HDFS) and ensures that they need not care where the data is stored or in which format, be it RCFile, plain text, SequenceFile, or ORC.

    HCatalog can read and write files in any format for which a SerDe (serializer-deserializer) exists. Out of the box it supports the RCFile, CSV, JSON, SequenceFile, and ORC formats; to use a custom format you must supply the InputFormat, OutputFormat, and SerDe.
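
    As a sketch of what a non-default format looks like, the following hypothetical table uses the JSON SerDe that ships with HCatalog (on some Hive versions the hive-hcatalog-core jar may first need to be added to the session classpath); a fully custom format would additionally name your own InputFormat and OutputFormat classes:

hive> CREATE TABLE page_view_json(
    >     view_time String,
    >     country String,
    >     userid String)
    > ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    > STORED AS TEXTFILE;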


    HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. It provides read and write interfaces for Pig and MapReduce, and it uses Hive's command-line interface for issuing data-definition and metadata-exploration commands.
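
    Concretely, HCatalog's command-line tool accepts Hive DDL directly. A minimal smoke test might look like the sketch below, assuming the hcat script from the hive-hcatalog package is on the PATH:

[root@node101.yinzhengjie.org.cn ~]# hcat -e "SHOW TABLES;"               #run a metadata command through HCatalog
[root@node101.yinzhengjie.org.cn ~]# hcat -e "DESCRIBE page_view;"        #describe a table once it exists
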
    HiveServer2 (HS2) is a server-side interface that enables remote clients to run queries against Hive and retrieve the results. The current Thrift-RPC-based implementation is an improved version of the original HiveServer and supports multi-client concurrency and authentication.

    Once the HiveServer2 service is started, clients can connect over JDBC, ODBC, or Thrift. Java programs and Beeline both connect via JDBC, while Hue reportedly connects to the Hive service via Thrift.
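
    A quick connectivity check with Beeline might look like the sketch below, using HiveServer2's default port 10000 (substitute your own host and credentials):

[root@node101.yinzhengjie.org.cn ~]# beeline -u "jdbc:hive2://node101.yinzhengjie.org.cn:10000" -n hive -e "SHOW DATABASES;"    #connect over JDBC and run one statement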

5>. Set up Hive's database (the relational database that stores the metastore data)

mysql>  CREATE DATABASE hive  CHARACTER SET = utf8;
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' IDENTIFIED BY 'yinzhengjie' WITH GRANT OPTION;                              
Query OK, 0 rows affected (0.07 sec)

mysql> 
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.02 sec)

mysql> quit
Bye
[root@node101.yinzhengjie.org.cn ~]# 
Preparation in MySQL: creating the hive database and granting privileges to the hive user
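
    It is worth verifying the grant before continuing; a sketch, run from any cluster node with node101.yinzhengjie.org.cn as the MySQL host:

[root@node101.yinzhengjie.org.cn ~]# mysql -h node101.yinzhengjie.org.cn -u hive -pyinzhengjie -e "SHOW DATABASES;"    #the hive database should appear in the output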

6>. Change the location of Hive's data warehouse in HDFS
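
    The warehouse location is governed by the hive.metastore.warehouse.dir property, whose default is /user/hive/warehouse. Once the service is up, you can confirm the effective value from the Hive CLI (the value shown below is the default, used here only as a sketch):

hive> SET hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/user/hive/warehouse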

7>. Wait for the Hive service deployment to finish

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| cdh                |
| hive               |
| mysql              |
| performance_schema |
+--------------------+
5 rows in set (0.00 sec)

mysql> use hive
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> 
mysql> show tables;
+---------------------------+
| Tables_in_hive            |
+---------------------------+
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| COMPACTION_QUEUE          |
| COMPLETED_TXN_COMPONENTS  |
| DATABASE_PARAMS           |
| DBS                       |
| DB_PRIVS                  |
| DELEGATION_TOKENS         |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| HIVE_LOCKS                |
| IDXS                      |
| INDEX_PARAMS              |
| MASTER_KEYS               |
| METASTORE_DB_PROPERTIES   |
| NEXT_COMPACTION_QUEUE_ID  |
| NEXT_LOCK_ID              |
| NEXT_TXN_ID               |
| NOTIFICATION_LOG          |
| NOTIFICATION_SEQUENCE     |
| NUCLEUS_TABLES            |
| PARTITIONS                |
| PARTITION_EVENTS          |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_PRIVS            |
| PART_COL_STATS            |
| PART_PRIVS                |
| ROLES                     |
| ROLE_MAP                  |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
| TXNS                      |
| TXN_COMPONENTS            |
| TYPES                     |
| TYPE_FIELDS               |
| VERSION                   |
+---------------------------+
54 rows in set (0.00 sec)

mysql> 
After the setup completes, we can see that the hive database holds the metadata tables. Frankly, initialization creates quite a few of them: 54 tables show up here, and the handful I spot-checked were all still empty.
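
    Since the metastore schema is plain SQL, you can peek at it directly; for instance, VERSION records the schema version and DBS will list Hive databases once they exist. Treat these tables as read-only, since editing them by hand can corrupt the metastore. A sketch (output omitted, as it varies by environment):

mysql> SELECT SCHEMA_VERSION FROM VERSION;                    #metastore schema version
mysql> SELECT DB_ID, NAME, DB_LOCATION_URI FROM DBS;          #Hive databases and their HDFS locations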

8>. The Hive service has been added successfully

9>. The CM console shows the Hive service running normally


II. Testing Whether the Hive Environment Works

1>. Upload the test data to HDFS

[root@node101.yinzhengjie.org.cn ~]# cat PageViewData.csv
1999/01/11 10:12,us,927,www.yahoo.com/clq,www.yahoo.com/jxq,948.323.252.617
1999/01/12 10:12,de,856,www.google.com/g4,www.google.com/uypu,416.358.537.539
1999/01/12 10:12,se,254,www.google.com/f5,www.yahoo.com/soeos,564.746.582.215
1999/01/12 10:12,de,465,www.google.com/h5,www.yahoo.com/agvne,685.631.592.264
1999/01/12 10:12,cn,856,www.yinzhengjie.org.cn/g4,www.google.com/uypu,416.358.537.539
1999/01/13 10:12,us,927,www.yahoo.com/clq,www.yahoo.com/jxq,948.323.252.617
1999/01/13 10:12,de,856,www.google.com/g4,www.google.com/uypu,416.358.537.539
1999/01/13 10:12,se,254,www.google.com/f5,www.yahoo.com/soeos,564.746.582.215
1999/01/13 10:12,de,465,www.google.com/h5,www.yahoo.com/agvne,685.631.592.264
1999/01/13 10:12,de,856,www.yinzhengjie.org.cn/g4,www.google.com/uypu,416.358.537.539
1999/01/13 10:12,us,927,www.yahoo.com/clq,www.yahoo.com/jxq,948.323.252.617
1999/01/14 10:12,de,856,www.google.com/g4,www.google.com/uypu,416.358.537.539
1999/01/14 10:12,se,254,www.google.com/f5,www.yahoo.com/soeos,564.746.582.215
1999/01/15 10:12,de,465,www.google.com/h5,www.yahoo.com/agvne,685.631.592.264
1999/01/15 10:12,de,856,www.yinzhengjie.org.cn/g4,www.google.com/uypu,416.358.537.539
1999/01/15 10:12,us,927,www.yahoo.com/clq,www.yahoo.com/jxq,948.323.252.617
1999/01/15 10:12,de,856,www.google.com/g4,www.google.com/uypu,416.358.537.539
1999/01/15 10:12,se,254,www.google.com/f5,www.yahoo.com/soeos,564.746.582.215
1999/01/15 10:12,de,465,www.google.com/h5,www.yahoo.com/agvne,685.631.592.264
1999/01/15 10:12,de,856,www.yinzhengjie.org.cn/g4,www.google.com/uypu,416.358.537.539
[root@node101.yinzhengjie.org.cn ~]# 
Above: viewing the local log file. The records were written more or less at random, just for testing.
[root@node101.yinzhengjie.org.cn ~]# hdfs dfs -ls  /tmp/
Found 5 items
d---------   - hdfs   supergroup          0 2019-05-20 10:48 /tmp/.cloudera_health_monitoring_canary_files
drwxr-xr-x   - yarn   supergroup          0 2018-10-19 15:00 /tmp/hadoop-yarn
drwx-wx-wx   - root   supergroup          0 2019-04-29 14:27 /tmp/hive
drwxrwxrwt   - mapred hadoop              0 2019-02-26 16:46 /tmp/logs
drwxr-xr-x   - mapred supergroup          0 2018-10-25 12:11 /tmp/mapred
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# ll
total 4
-rw-r--r-- 1 root root 1584 May 20 10:42 PageViewData.csv
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# hdfs dfs -put PageViewData.csv /tmp/
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# hdfs dfs -ls  /tmp/
Found 6 items
d---------   - hdfs   supergroup          0 2019-05-20 10:48 /tmp/.cloudera_health_monitoring_canary_files
-rw-r--r--   3 root   supergroup       1584 2019-05-20 10:49 /tmp/PageViewData.csv
drwxr-xr-x   - yarn   supergroup          0 2018-10-19 15:00 /tmp/hadoop-yarn
drwx-wx-wx   - root   supergroup          0 2019-04-29 14:27 /tmp/hive
drwxrwxrwt   - mapred hadoop              0 2019-02-26 16:46 /tmp/logs
drwxr-xr-x   - mapred supergroup          0 2018-10-25 12:11 /tmp/mapred
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# 
Above: uploading PageViewData.csv to the /tmp directory in HDFS and confirming it with hdfs dfs -ls.
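
    As an extra sanity check, the first few records can be read straight back out of HDFS; a sketch:

[root@node101.yinzhengjie.org.cn ~]# hdfs dfs -cat /tmp/PageViewData.csv | head -3      #print the first three records from HDFS
1999/01/11 10:12,us,927,www.yahoo.com/clq,www.yahoo.com/jxq,948.323.252.617
1999/01/12 10:12,de,856,www.google.com/g4,www.google.com/uypu,416.358.537.539
1999/01/12 10:12,se,254,www.google.com/f5,www.yahoo.com/soeos,564.746.582.215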

2>. Create the page_view table to hold the structured user-access logs

[root@node101.yinzhengjie.org.cn ~]# hive
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/jars/hive-common-1.1.0-cdh5.15.1.jar!/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> CREATE TABLE page_view(
    >     view_time String,
    >     country String,
    >     userid String,
    >     page_url String,
    >     referrer_url String,
    >     ip String)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED by '\n'
    > STORED AS TEXTFILE;
OK
Time taken: 2.598 seconds
hive> show tables;
OK
page_view
Time taken: 0.166 seconds, Fetched: 1 row(s)
hive> 
When creating a Hive table, the storage format must be specified explicitly. In the example above, TEXTFILE means plain text files, “,” sets the field delimiter between columns, and “\n” sets the record (line) delimiter.
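
    By contrast, an EXTERNAL table leaves the data where it already sits in HDFS and merely overlays a schema on it; dropping such a table removes only the metadata, not the files. A sketch with the same schema (the /tmp/page_view_data directory is hypothetical):

hive> CREATE EXTERNAL TABLE page_view_ext(
    >     view_time String,
    >     country String,
    >     userid String,
    >     page_url String,
    >     referrer_url String,
    >     ip String)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
    > STORED AS TEXTFILE
    > LOCATION '/tmp/page_view_data';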

3>. Use the LOAD statement to load the specified HDFS directory or file into the page_view table

hive> LOAD DATA INPATH "/tmp/PageViewData.csv" INTO TABLE page_view;
Loading data to table default.page_view
Table default.page_view stats: [numFiles=1, totalSize=1584]
OK
Time taken: 0.594 seconds
hive> 
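
    Note that LOAD DATA INPATH moves the source file into the table's directory under the warehouse (it does not copy it), so the file disappears from /tmp afterwards. You can confirm this as follows; the warehouse path below assumes the default hive.metastore.warehouse.dir:

[root@node101.yinzhengjie.org.cn ~]# hdfs dfs -ls /tmp/PageViewData.csv                #the source file is gone after the LOAD
ls: `/tmp/PageViewData.csv': No such file or directory
[root@node101.yinzhengjie.org.cn ~]# hdfs dfs -ls /user/hive/warehouse/page_view/     #it now lives under the table's directory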

4>. Query the data with HQL.

hive> SELECT country,count(userid) FROM page_view WHERE view_time > "1990/01/12 10:12" GROUP BY country;
Query ID = root_20190523125656_e7558dc5-d450-4d17-bf81-209f802605de
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201905221917_0001, Tracking URL = http://node101.yinzhengjie.org.cn:50030/jobdetails.jsp?jobid=job_201905221917_0001
Kill Command = /opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib/hadoop/bin/hadoop job  -kill job_201905221917_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-05-23 12:56:45,895 Stage-1 map = 0%,  reduce = 0%
2019-05-23 12:56:52,970 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.29 sec
2019-05-23 12:56:59,017 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.37 sec
MapReduce Total cumulative CPU time: 5 seconds 370 msec
Ended Job = job_201905221917_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.37 sec   HDFS Read: 10553 HDFS Write: 21 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 370 msec
OK
cn      1
de      11
se      4
us      4
Time taken: 25.063 seconds, Fetched: 4 row(s)
hive> 
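
    Since the startup banner above warns that the Hive CLI is deprecated in favor of Beeline, the same query can also be issued through HiveServer2; a sketch using HS2's default port 10000:

[root@node101.yinzhengjie.org.cn ~]# beeline -u "jdbc:hive2://node101.yinzhengjie.org.cn:10000" -n hive \
    -e 'SELECT country,count(userid) FROM page_view WHERE view_time > "1990/01/12 10:12" GROUP BY country;'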
