Hive Metadata


 Introduction:

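Hive is a data warehouse built on top of Hadoop, generally used to read, write, and manage large data sets. Data stored in Hive actually lives on HDFS as plain files. Those files carry no table structure of their own, so they cannot be queried as tables directly; metadata (the schema) is needed to manage the data on HDFS. Are the metadata tables related to one another? Yes, they are. By default Hive keeps its metadata tables in Derby, but embedded Derby only supports a single session, so we usually switch to MySQL.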

So how do we enable MySQL to manage the metadata?

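<configuration>
  <!-- JDBC URL of the metastore database -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop001:3306/ruoze_d6?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
  </property>
  <!-- JDBC driver class -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- MySQL login user name and password -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>
With the configuration above, MySQL manages the metadata.
The javax.jdo.option.ConnectionURL property sets the MySQL database name to ruoze_d6, while javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword set the MySQL login user name and password. Because the URL carries createDatabaseIfNotExist=true, the ruoze_d6 database does not need to be created in MySQL beforehand.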
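
To double-check which MySQL database the metastore will use, the properties above can be read back programmatically. The following is a minimal sketch using only Python's standard library. The inline XML string stands in for the real configuration file, and the helper name `load_hive_properties` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Inline copy of part of the configuration shown above (assumption: a real
# deployment would read the file from Hive's conf directory instead).
HIVE_SITE = """
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop001:3306/ruoze_d6?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
</configuration>
"""

def load_hive_properties(xml_text):
    """Hypothetical helper: return {name: value} for every <property>."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

props = load_hive_properties(HIVE_SITE)
# Drop the "jdbc:" prefix so urlparse sees an ordinary mysql:// URL.
url = urlparse(props["javax.jdo.option.ConnectionURL"][len("jdbc:"):])
metastore_db = url.path.lstrip("/")
print(url.hostname, url.port, metastore_db)
```

The database name is simply the path component of the JDBC URL, which is why changing that one property is enough to point Hive at a different metastore database.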

mysql> use ruoze_d6;
Database changed
mysql>

 

mysql> show tables;
+---------------------------+
| Tables_in_ruoze_d6        |
+---------------------------+
| bucketing_cols            |
| cds                       |
| columns_v2                |
| database_params           |
| dbs                       |
| func_ru                   |
| funcs                     |
| global_privs              |
| groupinfor                |
| idxs                      |
| index_params              |
| makedata_job              |
| part_col_privs            |
| part_col_stats            |
| part_privs                |
| partition_key_vals        |
| partition_keys            |
| partition_params          |
| partitions                |
| roles                     |
| sd_params                 |
| sds                       |
| sequence_table            |
| serde_params              |
| serdes                    |
| skewed_col_names          |
| skewed_col_value_loc_map  |
| skewed_string_list        |
| skewed_string_list_values |
| skewed_values             |
| sort_cols                 |
| tab_col_stats             |
| table_params              |
| tbl_col_privs             |
| tbl_privs                 |
| tbls                      |
| version                   |
+---------------------------+
37 rows in set (0.00 sec)

There are 37 tables here. Let's go through the most important ones.

  • version (metadata table that stores the Hive version)

mysql> select * from version  ;
+--------+----------------+----------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT                        |
+--------+----------------+----------------------------------------+
|     11 | 1.1.0          | Set by MetaStore hadoop@172.16.202.233 |
+--------+----------------+----------------------------------------+
1 row in set (0.00 sec)
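  • Notes
  1. The first column is the primary-key ID, the second is the Hive version, and the third is a version comment. This table holds exactly one row and must never hold more. If the table is dropped, starting the Hive CLI fails with "Table 'hive.version' doesn't exist".
  2. That error only occurs when a certain parameter is turned off. If the parameter is on, the table is recreated automatically whenever it is dropped or emptied. I have forgotten which parameter it is and will add it here once I remember.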

 

  • DBS (metadata table for Hive databases)

mysql> select * from DBS \G;
*************************** 1. row ***************************
          DB_ID: 1
           DESC: Default Hive database
DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse
           NAME: default
     OWNER_NAME: public
     OWNER_TYPE: ROLE
*************************** 2. row ***************************
          DB_ID: 6
           DESC: NULL
DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse/hadoop_g6.db
           NAME: hadoop_g6
     OWNER_NAME: hadoop
     OWNER_TYPE: USER
*************************** 3. row ***************************
          DB_ID: 11
           DESC: NULL
DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse/ruoze_d6.db
           NAME: ruoze_d6
     OWNER_NAME: hadoop
     OWNER_TYPE: USER
3 rows in set (0.00 sec)
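  • Notes: this table stores basic information about every database in Hive.
Column             Description
DB_ID              Database ID
DESC               Database description
DB_LOCATION_URI    HDFS path of the database
NAME               Database name
OWNER_NAME         User name of the database owner
OWNER_TYPE         Owner type (USER or ROLE in the sample above)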

 

  • database_params (metadata table for Hive databases)

mysql> desc database_params;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| DB_ID       | bigint(20)    | NO   | PRI | NULL    |       |
| PARAM_KEY   | varchar(180)  | NO   | PRI | NULL    |       |
| PARAM_VALUE | varchar(4000) | YES  |     | NULL    |       |
+-------------+---------------+------+-----+---------+-------+
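  • Notes: this table stores database-level parameters, that is, the ones given with WITH DBPROPERTIES (property_name=property_value, …) in CREATE DATABASE.
Column         Description        Example
DB_ID          Database ID        11
PARAM_KEY      Parameter name     createby
PARAM_VALUE    Parameter value    root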

 

 

  • TBLS (metadata table for Hive tables and views)

mysql> select * from TBLS \G;
*************************** 1. row ***************************
            TBL_ID: 37
       CREATE_TIME: 1555494334
             DB_ID: 1
  LAST_ACCESS_TIME: 0
             OWNER: hadoop
         RETENTION: 0
             SD_ID: 37
          TBL_NAME: makedata_job
          TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
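  • Notes: this table stores basic information about Hive tables, views, and index tables.
Column                Description
TBL_ID                Table ID
CREATE_TIME           Creation time
DB_ID                 Database ID (references DB_ID in DBS)
LAST_ACCESS_TIME      Last access time
OWNER                 Owner
RETENTION             Retention (reserved field)
SD_ID                 Storage descriptor ID (references SD_ID in SDS)
TBL_NAME              Table name
TBL_TYPE              Table type
VIEW_EXPANDED_TEXT    Expanded HQL statement of the view
VIEW_ORIGINAL_TEXT    Original HQL statement of the view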
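
Since TBLS.DB_ID points at DBS.DB_ID, a join between the two gives fully qualified table names. The sketch below replays that join on an in-memory SQLite database as a stand-in for the MySQL metastore, loaded with one row from each sample output above. It also converts CREATE_TIME to a readable timestamp, on the assumption that the value is Unix epoch seconds, which the sample value suggests.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory stand-in for the metastore; only the columns used here are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, CREATE_TIME INTEGER,
                   DB_ID INTEGER, TBL_NAME TEXT, TBL_TYPE TEXT);
INSERT INTO DBS VALUES (1, 'default');
INSERT INTO TBLS VALUES (37, 1555494334, 1, 'makedata_job', 'MANAGED_TABLE');
""")

# Resolve each table's database through the shared DB_ID column.
rows = conn.execute("""
    SELECT d.NAME, t.TBL_NAME, t.TBL_TYPE, t.CREATE_TIME
    FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID
""").fetchall()

for db_name, tbl_name, tbl_type, create_time in rows:
    # Assumption: CREATE_TIME is seconds since the Unix epoch.
    created = datetime.fromtimestamp(create_time, tz=timezone.utc)
    print(f"{db_name}.{tbl_name} ({tbl_type}) created {created.isoformat()}")
```

The same SELECT should work unchanged against the real MySQL metastore, since it uses only a plain inner join.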
   

 

 

  • table_params (metadata table for Hive tables and views)

mysql> select * from table_params;
+--------+-----------------------+-------------+
| TBL_ID | PARAM_KEY             | PARAM_VALUE |
+--------+-----------------------+-------------+
|     37 | COLUMN_STATS_ACCURATE | true        |
|     37 | numFiles              | 5           |
|     37 | numRows               | 0           |
|     37 | rawDataSize           | 0           |
|     37 | totalSize             | 2921282     |
|     37 | transient_lastDdlTime | 1555551458  |
|     42 | EXTERNAL              | TRUE        |
|     42 | transient_lastDdlTime | 1555555620  |
|     46 | COLUMN_STATS_ACCURATE | true        |
|     46 | numFiles              | 1           |
|     46 | numRows               | 500000      |
|     46 | rawDataSize           | 72051224    |
|     46 | totalSize             | 30284817    |
|     46 | transient_lastDdlTime | 1555557177  |
|     51 | EXTERNAL              | TRUE        |
|     51 | transient_lastDdlTime | 1555772013  |
|     52 | COLUMN_STATS_ACCURATE | true        |
|     52 | numFiles              | 1           |
|     52 | numRows               | 500000      |
|     52 | rawDataSize           | 67551224    |
|     52 | totalSize             | 75265591    |
|     52 | transient_lastDdlTime | 1555772485  |
|     56 | COLUMN_STATS_ACCURATE | true        |
|     56 | numFiles              | 1           |
|     56 | numRows               | 500000      |
|     56 | rawDataSize           | 64051224    |
|     56 | totalSize             | 64641768    |
|     56 | transient_lastDdlTime | 1555773864  |
|     66 | COLUMN_STATS_ACCURATE | true        |
|     66 | numFiles              | 1           |
|     66 | numRows               | 500000      |
|     66 | rawDataSize           | 359000000   |
|     66 | totalSize             | 17782969    |
|     66 | transient_lastDdlTime | 1555775575  |
|     67 | COLUMN_STATS_ACCURATE | true        |
|     67 | numFiles              | 1           |
|     67 | numRows               | 500000      |
|     67 | orc.compress          | NONE        |
|     67 | rawDataSize           | 359000000   |
|     67 | totalSize             | 53967047    |
|     67 | transient_lastDdlTime | 1555775880  |
|     68 | COLUMN_STATS_ACCURATE | true        |
|     68 | numFiles              | 1           |
|     68 | numRows               | 500000      |
|     68 | rawDataSize           | 4000000     |
|     68 | totalSize             | 61117546    |
|     68 | transient_lastDdlTime | 1555776185  |
|     69 | COLUMN_STATS_ACCURATE | true        |
|     69 | numFiles              | 1           |
|     69 | numRows               | 500000      |
|     69 | rawDataSize           | 4000000     |
|     69 | totalSize             | 16854027    |
|     69 | transient_lastDdlTime | 1555776356  |
|     71 | COLUMN_STATS_ACCURATE | true        |
|     71 | numFiles              | 1           |
|     71 | numRows               | 1           |
|     71 | rawDataSize           | 0           |
|     71 | totalSize             | 1           |
|     71 | transient_lastDdlTime | 1555809751  |
|     76 | transient_lastDdlTime | 1555836141  |
|     77 | COLUMN_STATS_ACCURATE | true        |
|     77 | numFiles              | 1           |
|     77 | numRows               | 0           |
|     77 | rawDataSize           | 0           |
|     77 | totalSize             | 366         |
|     77 | transient_lastDdlTime | 1555837173  |
+--------+-----------------------+-------------+
  • Notes: this table stores the properties of tables and views.
Column         Description
TBL_ID         Table ID (references TBL_ID in TBLS)
PARAM_KEY      Property name
PARAM_VALUE    Property value
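
Because table_params is a key/value table, one table's properties are spread over several rows. A small sketch of how they might be folded into one dictionary per table follows. The rows are a subset copied from the output above, and `is_external` is a hypothetical helper that treats EXTERNAL=TRUE as the marker of an external table.

```python
from collections import defaultdict

# A few (TBL_ID, PARAM_KEY, PARAM_VALUE) rows copied from the output above.
rows = [
    (37, "numFiles", "5"),
    (37, "totalSize", "2921282"),
    (42, "EXTERNAL", "TRUE"),
    (42, "transient_lastDdlTime", "1555555620"),
    (46, "numRows", "500000"),
    (46, "totalSize", "30284817"),
]

# Fold the key/value rows into {TBL_ID: {PARAM_KEY: PARAM_VALUE}}.
params = defaultdict(dict)
for tbl_id, key, value in rows:
    params[tbl_id][key] = value

def is_external(tbl_id):
    """Hypothetical helper: True when the table carries EXTERNAL=TRUE."""
    return params[tbl_id].get("EXTERNAL", "").upper() == "TRUE"

for tbl_id in sorted(params):
    # Values are stored as strings, so numeric properties need converting.
    size = int(params[tbl_id].get("totalSize", 0))
    label = "external" if is_external(tbl_id) else "not external"
    print(tbl_id, label, size)
```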

 

  • TBL_PRIVS: this table stores the privileges granted on tables and views (not covered in detail)

mysql> desc TBL_PRIVS;
+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| TBL_GRANT_ID   | bigint(20)   | NO   | PRI | NULL    |       |
| CREATE_TIME    | int(11)      | NO   |     | NULL    |       |
| GRANT_OPTION   | smallint(6)  | NO   |     | NULL    |       |
| GRANTOR        | varchar(128) | YES  |     | NULL    |       |
| GRANTOR_TYPE   | varchar(128) | YES  |     | NULL    |       |
| PRINCIPAL_NAME | varchar(128) | YES  |     | NULL    |       |
| PRINCIPAL_TYPE | varchar(128) | YES  |     | NULL    |       |
| TBL_PRIV       | varchar(128) | YES  |     | NULL    |       |
| TBL_ID         | bigint(20)   | YES  | MUL | NULL    |       |
+----------------+--------------+------+-----+---------+-------+
9 rows in set (0.01 sec)
TBL_ID references TBL_ID in TBLS.
  • sds (metadata table for Hive file storage information)

mysql> desc sds;
+---------------------------+---------------+------+-----+---------+-------+
| Field                     | Type          | Null | Key | Default | Extra |
+---------------------------+---------------+------+-----+---------+-------+
| SD_ID                     | bigint(20)    | NO   | PRI | NULL    |       |
| CD_ID                     | bigint(20)    | YES  | MUL | NULL    |       |
| INPUT_FORMAT              | varchar(4000) | YES  |     | NULL    |       |
| IS_COMPRESSED             | bit(1)        | NO   |     | NULL    |       |
| IS_STOREDASSUBDIRECTORIES | bit(1)        | NO   |     | NULL    |       |
| LOCATION                  | varchar(4000) | YES  |     | NULL    |       |
| NUM_BUCKETS               | int(11)       | NO   |     | NULL    |       |
| OUTPUT_FORMAT             | varchar(4000) | YES  |     | NULL    |       |
| SERDE_ID                  | bigint(20)    | YES  | MUL | NULL    |       |
+---------------------------+---------------+------+-----+---------+-------+

 

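  • Notes: basic information about file storage.
Column                       Description
SD_ID                        Storage descriptor ID
CD_ID                        Column information ID
INPUT_FORMAT                 File input format
IS_COMPRESSED                Whether the data is compressed
IS_STOREDASSUBDIRECTORIES    Whether the data is stored in subdirectories
LOCATION                     HDFS path
NUM_BUCKETS                  Number of buckets
OUTPUT_FORMAT                File output format
SERDE_ID                     SerDe (serialization class) ID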

 

  • SD_PARAMS (metadata table for Hive file storage information)

mysql> desc SD_PARAMS;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| SD_ID       | bigint(20)    | NO   | PRI | NULL    |       |
| PARAM_KEY   | varchar(256)  | NO   | PRI | NULL    |       |
| PARAM_VALUE | varchar(4000) | YES  |     | NULL    |       |
+-------------+---------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
  • Notes: this table stores the storage properties that are specified at table creation time with

STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (…)].

  • serdes (metadata table for Hive file storage information)

mysql> select * from serdes; 
+----------+------+-------------------------------------------------------------+
| SERDE_ID | NAME | SLIB                                                        |
+----------+------+-------------------------------------------------------------+
|       37 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       42 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       43 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       46 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       51 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       52 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       56 | NULL | org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe        |
|       66 | NULL | org.apache.hadoop.hive.ql.io.orc.OrcSerde                   |
|       67 | NULL | org.apache.hadoop.hive.ql.io.orc.OrcSerde                   |
|       68 | NULL | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe |
|       69 | NULL | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe |
|       71 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       76 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
|       77 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe          |
+----------+------+-------------------------------------------------------------+
14 rows in set (0.00 sec)
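  • Notes: this table stores the classes used for serialization.
Column      Description
SERDE_ID    SerDe configuration ID (references SERDE_ID in SDS)
NAME        Alias of the serialization class
SLIB        Serialization class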
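
The ID columns described so far chain together: TBLS.SD_ID leads to SDS, and SDS.SERDE_ID leads to SERDES. The sketch below walks that chain on an in-memory SQLite stand-in for the metastore. The TBLS and SERDES rows follow the sample output above, but no SDS rows are shown in this article, so the SDS row (including its LOCATION and its SERDE_ID) is purely illustrative.

```python
import sqlite3

# In-memory stand-in for the metastore; only the columns used here are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, SD_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, SERDE_ID INTEGER, LOCATION TEXT);
CREATE TABLE SERDES (SERDE_ID INTEGER PRIMARY KEY, NAME TEXT, SLIB TEXT);

-- TBLS and SERDES rows follow the sample output; the SDS row is illustrative.
INSERT INTO TBLS VALUES (37, 37, 'makedata_job');
INSERT INTO SDS VALUES (37, 37, 'hdfs://example/warehouse/makedata_job');
INSERT INTO SERDES VALUES
    (37, NULL, 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe');
""")

# Follow TBLS -> SDS -> SERDES to find where and how a table is stored.
row = conn.execute("""
    SELECT t.TBL_NAME, s.LOCATION, r.SLIB
    FROM TBLS t
    JOIN SDS s ON t.SD_ID = s.SD_ID
    JOIN SERDES r ON s.SERDE_ID = r.SERDE_ID
    WHERE t.TBL_NAME = ?
""", ("makedata_job",)).fetchone()
print(row)
```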
  • serde_params (metadata table for Hive file storage information)

mysql> select * from serde_params; 
+----------+----------------------+-------------+
| SERDE_ID | PARAM_KEY            | PARAM_VALUE |
+----------+----------------------+-------------+
|       37 | field.delim          |                |
|       37 | serialization.format |                |
|       42 | field.delim          |                |
|       42 | serialization.format |                |
|       43 | field.delim          |                |
|       43 | serialization.format |                |
|       46 | serialization.format | 1           |
|       51 | field.delim          |                |
|       51 | serialization.format |                |
|       52 | serialization.format | 1           |
|       56 | serialization.format | 1           |
|       66 | serialization.format | 1           |
|       67 | serialization.format | 1           |
|       68 | serialization.format | 1           |
|       69 | serialization.format | 1           |
|       71 | serialization.format | 1           |
|       76 | field.delim          |                |
|       76 | serialization.format |                |
|       77 | field.delim          |                |
|       77 | serialization.format |                |
+----------+----------------------+-------------+
20 rows in set (0.00 sec)
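  • Notes: this table stores serialization properties and format details, such as row and column delimiters.
Column         Description
SERDE_ID       SerDe configuration ID (references SERDE_ID in SDS)
PARAM_KEY      Property name
PARAM_VALUE    Property value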

 

  • columns_v2 (metadata table for Hive table columns)

mysql> select * from columns_v2; 
+-------+---------+-------------+--------------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME    | INTEGER_IDX |
+-------+---------+-------------+--------------+-------------+
|    37 | NULL    | ip          | varchar(20)  |           4 |
|    37 | NULL    | levelnm     | varchar(6)   |           2 |
|    37 | NULL    | region      | varchar(6)   |           1 |
|    37 | NULL    | time_random | varchar(20)  |           3 |
|    37 | NULL    | traffic     | varchar(12)  |           7 |
|    37 | NULL    | urlid       | varchar(100) |           6 |
|    37 | NULL    | urlnm       | varchar(6)   |           0 |
|    37 | NULL    | urlym       | varchar(20)  |           5 |
|    42 | NULL    | cdn         | string       |           0 |
|    42 | NULL    | domain      | string       |           5 |
|    42 | NULL    | ip          | string       |           4 |
|    42 | NULL    | level       | string       |           2 |
|    42 | NULL    | region      | string       |           1 |
|    42 | NULL    | time        | string       |           3 |
|    42 | NULL    | traffic     | bigint       |           7 |
|    42 | NULL    | url         | string       |           6 |
+-------+---------+-------------+--------------+-------------+
17 rows in set (0.00 sec)
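  • Notes: column information for tables.
Column         Description
CD_ID          Column information ID (references CD_ID in SDS)
COMMENT        Column comment
COLUMN_NAME    Column name
TYPE_NAME      Column type
INTEGER_IDX    Column position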
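
Note that the query output above is not sorted by column position. INTEGER_IDX is what restores the declared order. As a small sketch, the rows for CD_ID 37 (copied from the output above) can be sorted back into declaration order like this; `column_list` is a hypothetical helper name.

```python
# (CD_ID, COLUMN_NAME, TYPE_NAME, INTEGER_IDX) rows for CD_ID 37, as listed above.
rows = [
    (37, "ip", "varchar(20)", 4),
    (37, "levelnm", "varchar(6)", 2),
    (37, "region", "varchar(6)", 1),
    (37, "time_random", "varchar(20)", 3),
    (37, "traffic", "varchar(12)", 7),
    (37, "urlid", "varchar(100)", 6),
    (37, "urlnm", "varchar(6)", 0),
    (37, "urlym", "varchar(20)", 5),
]

def column_list(rows, cd_id):
    """Hypothetical helper: 'name type' strings in declaration order."""
    cols = sorted((r for r in rows if r[0] == cd_id), key=lambda r: r[3])
    return [f"{name} {type_name}" for _, name, type_name, _ in cols]

ddl_columns = column_list(rows, 37)
print(",\n".join(ddl_columns))
```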

 

 

 

  • partitions (metadata table for Hive table partitions)

mysql> select * from partitions ; 
+---------+-------------+------------------+--------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME    | SD_ID | TBL_ID |
+---------+-------------+------------------+--------------+-------+--------+
|      21 |  1555555926 |                0 | day=20190418 |    43 |     42 |
+---------+-------------+------------------+--------------+-------+--------+
1 row in set (0.00 sec)
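  • Notes: basic information about partitions.
Column              Description
PART_ID             Partition ID
CREATE_TIME         Partition creation time
LAST_ACCESS_TIME    Last access time
PART_NAME           Partition name
SD_ID               Storage descriptor ID of the partition
TBL_ID              Table ID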

 

 

  • partition_keys (metadata table for Hive table partitions)

mysql> select * from partition_keys; 
+--------+--------------+-----------+-----------+-------------+
| TBL_ID | PKEY_COMMENT | PKEY_NAME | PKEY_TYPE | INTEGER_IDX |
+--------+--------------+-----------+-----------+-------------+
|     42 | NULL         | day       | string    |           0 |
+--------+--------------+-----------+-----------+-------------+
1 row in set (0.00 sec)
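  • Notes: information about the partition columns.
Column          Description
TBL_ID          Table ID
PKEY_COMMENT    Partition column comment
PKEY_NAME       Partition column name
PKEY_TYPE       Partition column type
INTEGER_IDX     Partition column position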

 

 

  • partition_key_vals (metadata table for Hive table partitions)

mysql> select * from partition_key_vals; 
+---------+--------------+-------------+
| PART_ID | PART_KEY_VAL | INTEGER_IDX |
+---------+--------------+-------------+
|      21 | 20190418     |           0 |
+---------+--------------+-------------+
1 row in set (0.00 sec)
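  • Notes: this table stores the values of the partition columns.
Column          Description
PART_ID         Partition ID
PART_KEY_VAL    Partition column value
INTEGER_IDX     Position of the partition column value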
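
The three partition tables shown so far fit together: partition_keys names the partition columns of a table, partition_key_vals holds one value per column for each partition, and INTEGER_IDX pairs them up. The sketch below rebuilds PART_NAME from those pieces on an in-memory SQLite stand-in loaded with the sample rows above. It joins levels with "/" and ignores any escaping of special characters, so treat it as an illustration rather than an exact reimplementation.

```python
import sqlite3

# In-memory stand-in for the metastore, loaded with the sample rows above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, PART_NAME TEXT, TBL_ID INTEGER);
CREATE TABLE PARTITION_KEYS (TBL_ID INTEGER, PKEY_NAME TEXT, INTEGER_IDX INTEGER);
CREATE TABLE PARTITION_KEY_VALS (PART_ID INTEGER, PART_KEY_VAL TEXT, INTEGER_IDX INTEGER);
INSERT INTO PARTITIONS VALUES (21, 'day=20190418', 42);
INSERT INTO PARTITION_KEYS VALUES (42, 'day', 0);
INSERT INTO PARTITION_KEY_VALS VALUES (21, '20190418', 0);
""")

# Pair each partition column with its value through the shared INTEGER_IDX.
pairs = conn.execute("""
    SELECT k.PKEY_NAME, v.PART_KEY_VAL
    FROM PARTITIONS p
    JOIN PARTITION_KEYS k ON k.TBL_ID = p.TBL_ID
    JOIN PARTITION_KEY_VALS v
      ON v.PART_ID = p.PART_ID AND v.INTEGER_IDX = k.INTEGER_IDX
    WHERE p.PART_ID = ?
    ORDER BY k.INTEGER_IDX
""", (21,)).fetchall()

rebuilt = "/".join(f"{name}={value}" for name, value in pairs)
stored = conn.execute(
    "SELECT PART_NAME FROM PARTITIONS WHERE PART_ID = 21"
).fetchone()[0]
print(rebuilt, rebuilt == stored)
```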

 

  • partition_params (metadata table for Hive table partitions)

mysql> select * from partition_params; 
+---------+-----------------------+-------------+
| PART_ID | PARAM_KEY             | PARAM_VALUE |
+---------+-----------------------+-------------+
|      21 | COLUMN_STATS_ACCURATE | true        |
|      21 | numFiles              | 1           |
|      21 | totalSize             | 29975501    |
|      21 | transient_lastDdlTime | 1555556171  |
+---------+-----------------------+-------------+
4 rows in set (0.00 sec)
  • Notes: this table stores the properties of partitions.
Column         Description
PART_ID        Partition ID
PARAM_KEY      Partition property name
PARAM_VALUE    Partition property value

 

 

 

  • Other, less commonly used metadata tables


This figure is reproduced from https://mp.weixin.qq.com/s/c2C4SYaj-GUP6hTkPNV_hQ

Reference blog post: https://mp.weixin.qq.com/s/c2C4SYaj-GUP6hTkPNV_hQ

