Basic Hive Operations


1. After logging into Hive with the hive command, run show databases; and you can see that Hive ships with a default database called default.

[root@hadoop hive]# hive

Logging initialized using configuration in file:/usr/local/hive/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
default    # Hive's built-in default database
Time taken: 21.043 seconds, Fetched: 1 row(s)
hive> 

Then log into MySQL. show databases; lists the databases, and one of them is hive; use hive; switches into it; show tables; lists its tables; and select * from DBS; shows the metadata for Hive's built-in default database.

[root@hadoop ~]# mysql -uroot -proot
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 24
Server version: 5.6.40-log MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| mysql              |
| performance_schema |
| test               |
+--------------------+
5 rows in set (0.32 sec)

mysql> use hive
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+---------------------------+
| Tables_in_hive            |
+---------------------------+
| AUX_TABLE                 |
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| COMPACTION_QUEUE          |
| COMPLETED_COMPACTIONS     |
| COMPLETED_TXN_COMPONENTS  |
| DATABASE_PARAMS           |
| DBS                       |
| DB_PRIVS                  |
| DELEGATION_TOKENS         |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| HIVE_LOCKS                |
| IDXS                      |
| INDEX_PARAMS              |
| KEY_CONSTRAINTS           |
| MASTER_KEYS               |
| NEXT_COMPACTION_QUEUE_ID  |
| NEXT_LOCK_ID              |
| NEXT_TXN_ID               |
| NOTIFICATION_LOG          |
| NOTIFICATION_SEQUENCE     |
| NUCLEUS_TABLES            |
| PARTITIONS                |
| PARTITION_EVENTS          |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_PRIVS            |
| PART_COL_STATS            |
| PART_PRIVS                |
| ROLES                     |
| ROLE_MAP                  |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
| TXNS                      |
| TXN_COMPONENTS            |
| TYPES                     |
| TYPE_FIELDS               |
| VERSION                   |
| WRITE_SET                 |
+---------------------------+
57 rows in set (0.00 sec)

mysql> select * from DBS; # metadata for Hive's built-in default database
+-------+-----------------------+----------------------------------------+---------+------------+------------+
| DB_ID | DESC                  | DB_LOCATION_URI                        | NAME    | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+----------------------------------------+---------+------------+------------+
|     1 | Default Hive database | hdfs://hadoop:9000/user/hive/warehouse | default | public     | ROLE       |
+-------+-----------------------+----------------------------------------+---------+------------+------------+
1 row in set (0.00 sec)

mysql> 

 

2. Create a test database in Hive

hive> create database testhive; # create the database
OK
Time taken: 3.45 seconds

hive> show databases; # list the databases
OK
default
testhive
Time taken: 1.123 seconds, Fetched: 2 row(s)

Checking in MySQL, the metadata for the test database now appears as well (including testhive's DB_ID, its storage location on HDFS, and so on).

mysql> select * from DBS;
+-------+-----------------------+----------------------------------------------------+----------+------------+------------+
| DB_ID | DESC                  | DB_LOCATION_URI                                    | NAME     | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+----------------------------------------------------+----------+------------+------------+
|     1 | Default Hive database | hdfs://hadoop:9000/user/hive/warehouse             | default  | public     | ROLE       |
|     6 | NULL                  | hdfs://hadoop:9000/user/hive/warehouse/testhive.db | testhive | root       | USER       |
+-------+-----------------------+----------------------------------------------------+----------+------------+------------+
2 rows in set (0.00 sec)
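
Notice that the DESC column is NULL for testhive: no comment was supplied when the database was created. As a hedged sketch (the database name, comment text, and path below are made up for illustration, not taken from this session), CREATE DATABASE also accepts optional COMMENT and LOCATION clauses, which fill in DESC and DB_LOCATION_URI:

hive> create database if not exists testhive2
    >   comment 'sandbox database for testing'
    >   location '/user/hive/warehouse/testhive2.db';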

Now look in HDFS to see what testhive.db actually is. It is simply a directory, so creating a database really just creates a directory.

The HDFS directory I created was /usr/hive/warehouse/, so why did the database end up under /user/hive/warehouse/? Nothing is actually broken: Hive puts its data wherever the hive.metastore.warehouse.dir property points, and that property defaults to /user/hive/warehouse, so the hand-made /usr/hive/warehouse directory is simply never used unless the property is changed.

[root@hadoop ~]# hdfs dfs -ls /user/hive/warehouse
Found 1 items
drwxr-xr-x   - root supergroup          0 2018-07-27 15:17 /user/hive/warehouse/testhive.db
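
To confirm where new databases will land, the current value of the property can be printed straight from the Hive CLI (a minimal sketch; the value in the comment is the stock default, not output captured from this session). Changing it means setting hive.metastore.warehouse.dir in hive-site.xml, and only databases created afterwards pick up the new path.

hive> set hive.metastore.warehouse.dir; # prints hive.metastore.warehouse.dir=/user/hive/warehouse by default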

 

3. Create a table

hive> use testhive; # switch to the database
OK
Time taken: 0.131 seconds

hive> create table test(id int); # create the table
OK
Time taken: 3.509 seconds

Check the table metadata in MySQL: the test table belongs to the database with DB_ID 6, which is testhive (confirm with select * from DBS;).

mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
|      1 |  1532677542 |     6 |                0 | root  |         0 |     1 | test     | MANAGED_TABLE | NULL               | NULL               |                    |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
1 row in set (0.01 sec)
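
To make the DB_ID mapping explicit, the two metastore tables can be joined directly in MySQL (a small sketch using only the columns visible in the output above):

mysql> select d.NAME as db_name, t.TBL_NAME, t.TBL_TYPE
    -> from TBLS t join DBS d on t.DB_ID = d.DB_ID;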

Looking in HDFS, a new directory has been created for the table.

[root@hadoop ~]# hdfs dfs -ls /user/hive/warehouse/testhive.db
Found 1 items
drwxr-xr-x   - root supergroup          0 2018-07-27 16:03 /user/hive/warehouse/testhive.db/test

 

4. Insert data

4.1 Insert a row into the table with insert into test values (1); and note that Hive runs a MapReduce job to carry it out.

hive> insert into test values (1);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20180727155527_5971c7d8-9b5c-4ef3-98f7-63febe38c79a
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1532671010251_0001, Tracking URL = http://hadoop:8088/proxy/application_1532671010251_0001/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1532671010251_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-07-27 16:02:25,979 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.32 sec
MapReduce Total cumulative CPU time: 3 seconds 320 msec
Ended Job = job_1532671010251_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://hadoop:9000/user/hive/warehouse/testhive.db/test/.hive-staging_hive_2018-07-27_15-55-27_353_3121708441542170724-1/-ext-10000
Loading data to table testhive.test
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 3.32 sec   HDFS Read: 3951 HDFS Write: 71 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 320 msec
OK
Time taken: 453.982 seconds

In HDFS, the inserted row was written out as a new file, 000000_0.

[root@hadoop ~]# hdfs dfs -ls /user/hive/warehouse/testhive.db/test
-rwxr-xr-x   1 root supergroup          2 2018-07-27 16:01 /user/hive/warehouse/testhive.db/test/000000_0
[root@hadoop ~]# hdfs dfs -cat /user/hive/warehouse/testhive.db/test/000000_0
1

4.2 Insert another row with insert into test values (2); again Hive runs a MapReduce job.

hive>  insert into test values (2); 

In HDFS, this row was written out as another file, 000000_0_copy_1.

[root@hadoop ~]# hdfs dfs -ls /user/hive/warehouse/testhive.db/test
Found 2 items
-rwxr-xr-x   1 root supergroup          2 2018-07-27 16:01 /user/hive/warehouse/testhive.db/test/000000_0
-rwxr-xr-x   1 root supergroup          2 2018-07-27 16:22 /user/hive/warehouse/testhive.db/test/000000_0_copy_1
[root@hadoop ~]# hdfs dfs -cat /user/hive/warehouse/testhive.db/test/000000_0_copy_1
2

4.3 Insert one more row with insert into test values (3); once more Hive runs a MapReduce job.

In HDFS, this row was written out as yet another file, 000000_0_copy_2.

[root@hadoop ~]# hdfs dfs -ls /user/hive/warehouse/testhive.db/test
Found 3 items
-rwxr-xr-x   1 root supergroup          2 2018-07-27 16:01 /user/hive/warehouse/testhive.db/test/000000_0
-rwxr-xr-x   1 root supergroup          2 2018-07-27 16:22 /user/hive/warehouse/testhive.db/test/000000_0_copy_1
-rwxr-xr-x   1 root supergroup          2 2018-07-27 16:37 /user/hive/warehouse/testhive.db/test/000000_0_copy_2
[root@hadoop ~]# hdfs dfs -cat /user/hive/warehouse/testhive.db/test/000000_0_copy_2
3

4.4 Query the table in Hive

hive> select * from test;
OK
1
2
3
Time taken: 5.483 seconds, Fetched: 3 row(s)
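
Each single-row insert above launched its own MapReduce job, which is why every statement took minutes. As a sketch (not actually run in this session, so it does not affect the outputs that follow), several rows can be inserted in one statement and therefore one job:

hive> insert into test values (4), (5), (6); # one MapReduce job for all three rows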

 

5. Load data from a local file

First create the file:

[root@hadoop ~]# vi hive.txt  # create the file
4
5
6
7
8
9
0
# save and quit

Then load the data:

hive> load data local inpath '/root/hive.txt' into table testhive.test; # load the data
Loading data to table testhive.test
OK
Time taken: 6.282 seconds

Querying in Hive, the file contents have been mapped onto the table's column.

hive> select * from test;
OK
1
2
3
4
5
6
7
8
9
0
Time taken: 0.534 seconds, Fetched: 10 row(s)

In HDFS, the hive.txt file has been stored under the test table's directory.

[root@hadoop ~]# hdfs dfs -ls /user/hive/warehouse/testhive.db/test
Found 4 items
-rwxr-xr-x   1 root supergroup          2 2018-07-27 16:01 /user/hive/warehouse/testhive.db/test/000000_0
-rwxr-xr-x   1 root supergroup          2 2018-07-27 16:22 /user/hive/warehouse/testhive.db/test/000000_0_copy_1
-rwxr-xr-x   1 root supergroup          2 2018-07-27 16:37 /user/hive/warehouse/testhive.db/test/000000_0_copy_2
-rwxr-xr-x   1 root supergroup         14 2018-07-27 16:48 /user/hive/warehouse/testhive.db/test/hive.txt
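
The load works here because hive.txt has one value per line and the table has a single int column. For a multi-column file, the field delimiter has to be declared when the table is created; a sketch (the table name, columns, and file path below are illustrative, not from this session):

hive> create table people(id int, name string) row format delimited fields terminated by ',';
hive> load data local inpath '/root/people.txt' into table people;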

 

6. Hive also supports sorting: select * from test order by id desc; note that this query also goes through a MapReduce stage.

hive> select * from test order by id desc; 
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20180730093619_c798eb69-b94f-4678-94cc-5ec56865ed5c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1532913019648_0001, Tracking URL = http://hadoop:8088/proxy/application_1532913019648_0001/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1532913019648_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-07-30 09:38:13,904 Stage-1 map = 0%,  reduce = 0%
2018-07-30 09:39:09,656 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 1.66 sec
2018-07-30 09:39:14,311 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.72 sec
2018-07-30 09:39:49,708 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.41 sec
MapReduce Total cumulative CPU time: 5 seconds 930 msec
Ended Job = job_1532913019648_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.93 sec   HDFS Read: 6799 HDFS Write: 227 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 930 msec
OK
9
8
7
6
5
4
3
2
1
0
Time taken: 224.27 seconds, Fetched: 10 row(s)
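
order by produces a totally ordered result by sending everything through a single reducer, which is part of why the query above took over three minutes. As a sketch, sort by orders rows only within each reducer and scales better when a global order is not required, while distribute by controls which reducer each row goes to:

hive> select * from test sort by id desc; # ordered within each reducer only
hive> select * from test distribute by id sort by id; # rows routed to reducers by id, then sorted per reducer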

 

7. Hive also supports desc test;

hive> desc test;
OK
id                      int                                         
Time taken: 6.194 seconds, Fetched: 1 row(s)
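
For more than just the column list, desc formatted also shows the table's HDFS location, owner, creation time, and table parameters:

hive> desc formatted test;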

 

 

Working with a Hive database feels much like working with MySQL. Its weakness is that row-level update and delete statements are not available on ordinary tables (see the sketch below); its strength is that you never have to write MapReduce yourself, since simple SQL statements are enough to express complex relationships.
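
For completeness, UPDATE and DELETE do exist in Hive 2, but only on tables that are bucketed, stored as ORC, and marked transactional, and only when the ACID transaction manager is enabled in the configuration. A minimal sketch of such a table (the table name and statements are illustrative, not from this session):

hive> create table test_acid(id int) clustered by (id) into 2 buckets stored as orc tblproperties('transactional'='true');
hive> update test_acid set id = 10 where id = 1; # allowed because the table is transactional
hive> delete from test_acid where id = 10;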

There are many more Hive operations; I'll write them up as I come to need them.

 

