Hive Advanced Queries, Part 1

Reposted article. 2019-07-23 23:32

hadoop hive advanced queries

SELECT basics

1.0 Basic queries

1) select * from table_name;

2) select * from table_name where name='....' limit 1;

1.1 CTEs and nested queries

1) with t as (select ....) select * from t;

2) select * from (select ....) a; (the alias a is required)
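The two forms above can be sketched side by side. This is only an illustration: the table employee is taken from later in this article, but the columns name and salary are assumptions made for the example.

```sql
-- Assumed for illustration: employee has columns name and salary.

-- 1) CTE form: name the intermediate result, then query it
WITH t AS (
  SELECT name, salary
  FROM employee
  WHERE salary > 5000
)
SELECT name FROM t;

-- 2) Nested-query form: the subquery in FROM must carry an alias (a)
SELECT a.name
FROM (
  SELECT name, salary
  FROM employee
  WHERE salary > 5000
) a;
```

Both queries should return the same rows; the CTE form tends to read better when the intermediate result is used more than once.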

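1.2 Matching columns with regular expressions

Before running the query: SET hive.support.quoted.identifiers = none;

Columns can then be selected by pattern, with the pattern written in backticks: SELECT `^o.*` FROM offers;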
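A short sketch of regex column selection follows. The offers table comes from the line above; the column name ds in the second query is hypothetical and only shows the common "all columns except one" pattern.

```sql
-- Backticked names are treated as regular expressions only in this mode
SET hive.support.quoted.identifiers = none;

-- Every column whose name starts with "o"
SELECT `^o.*` FROM offers;

-- Every column except ds (ds is a hypothetical column name)
SELECT `(ds)?+.+` FROM offers;
```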

1.3 Virtual columns

Input file name: select INPUT__FILE__NAME from emps;

Current global position in the file: select BLOCK__OFFSET__INSIDE__FILE from emps; (the names are fixed and are written with double underscores)
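As a rough illustration, the two virtual columns can be selected next to the ordinary columns, or used to see how rows are spread over the underlying files. Only the emps table from the text is assumed.

```sql
-- Show where each row physically comes from
SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, e.*
FROM emps e;

-- Count rows per underlying data file
SELECT INPUT__FILE__NAME, count(*) AS row_cnt
FROM emps
GROUP BY INPUT__FILE__NAME;
```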

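In a join we put the small table first; the table that comes after it is called the base table.

In a correlated subquery, each row of the outer table is checked against the rows of the inner table. This is what the keywords exists (not exists) do, and it corresponds to in (not in) in MySQL.

select * from userinfos u where u.userid not in (select b.userid from bankcards b where u.userid=b.userid group by b.userid);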
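The same "users without a bank card" question can also be written with NOT EXISTS or as an anti-join. This is a sketch using only the tables and the userid column from the query above; which form is accepted or performs best may depend on the Hive version.

```sql
-- NOT EXISTS form: keep users for which no matching card row exists
SELECT u.*
FROM userinfos u
WHERE NOT EXISTS (
  SELECT b.userid
  FROM bankcards b
  WHERE b.userid = u.userid
);

-- Anti-join form: left join, then keep only the unmatched rows
SELECT u.*
FROM userinfos u
LEFT JOIN bankcards b ON u.userid = b.userid
WHERE b.userid IS NULL;
```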

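Hive joins and MapJoin (inner and outer joins)

To let Hive convert eligible joins into map joins automatically, first enable: set hive.auto.convert.join=true;

join -> same as inner join: only rows that match in both tables

left join -> every row of the left table, with NULLs where the right table has no match

right join -> every row of the right table, with NULLs where the left table has no match

full join -> every row of both tables, with NULLs on whichever side has no match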
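A minimal sketch of the join types, reusing the userinfos and bankcards tables and the userid column from earlier in this article. The alias card_userid is introduced here only to make unmatched rows visible.

```sql
-- Inner join: only users that have at least one card
SELECT u.userid, b.userid AS card_userid
FROM userinfos u
JOIN bankcards b ON u.userid = b.userid;

-- Left join: all users; card_userid is NULL for users without a card
SELECT u.userid, b.userid AS card_userid
FROM userinfos u
LEFT JOIN bankcards b ON u.userid = b.userid;

-- Right join: all cards; u.userid is NULL for cards without a user
SELECT u.userid, b.userid AS card_userid
FROM userinfos u
RIGHT JOIN bankcards b ON u.userid = b.userid;

-- Full join: all users and all cards
SELECT u.userid, b.userid AS card_userid
FROM userinfos u
FULL OUTER JOIN bankcards b ON u.userid = b.userid;
```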

The MAPJOIN operation is not supported:

1) after UNION ALL, LATERAL VIEW, or GROUP BY/JOIN/SORT BY/CLUSTER BY/DISTRIBUTE BY

2) before UNION, JOIN, or another MAPJOIN
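For reference, a map join can be requested in two ways. The sketch below assumes that bankcards is the small table, which the article does not state. Depending on configuration (for example hive.ignore.mapjoin.hint), newer Hive versions may ignore the explicit hint and rely on automatic conversion only.

```sql
-- Automatic conversion: Hive decides based on table size
SET hive.auto.convert.join = true;

SELECT u.userid
FROM userinfos u
JOIN bankcards b ON u.userid = b.userid;

-- Explicit hint: ask Hive to load b (assumed small) into memory
SELECT /*+ MAPJOIN(b) */ u.userid
FROM userinfos u
JOIN bankcards b ON u.userid = b.userid;
```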

Hive set operations (UNION)

1) UNION ALL: keeps duplicate rows after merging

2) UNION: removes duplicate rows after merging
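The difference shows up when both inputs contain the same value. The sketch below reuses userinfos and bankcards; note that UNION without ALL needs a reasonably recent Hive version (older releases only support UNION ALL).

```sql
-- Keeps duplicates: a userid present in both tables appears more than once
SELECT userid FROM userinfos
UNION ALL
SELECT userid FROM bankcards;

-- Removes duplicates: each userid appears once
SELECT userid FROM userinfos
UNION
SELECT userid FROM bankcards;
```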

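Loading data: LOAD moves data

1) load data local inpath '......' overwrite into table .....

2) load data local inpath '.......' overwrite into table .... partition (partition_column=value)

!!! Without LOCAL, the path is an HDFS path, and the file is moved into the table's directory rather than copied

!!! LOCAL means the file is on the local file system; OVERWRITE means existing data is replaced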
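The variants can be sketched as follows. All file paths and the table name employee_partitioned are made up for illustration; only the employee table appears elsewhere in this article.

```sql
-- Local file, replacing whatever the table currently holds
LOAD DATA LOCAL INPATH '/tmp/employee.txt'
OVERWRITE INTO TABLE employee;

-- HDFS file (no LOCAL), appended to the existing data;
-- the source file is moved, so it disappears from its original location
LOAD DATA INPATH '/user/hive/staging/employee.txt'
INTO TABLE employee;

-- Local file loaded into one specific partition
LOAD DATA LOCAL INPATH '/tmp/employee_2018_09.txt'
OVERWRITE INTO TABLE employee_partitioned
PARTITION (year=2018, month=9);
```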

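Loading data: inserting into tables with INSERT

1) Single insert (insert from one table into another)

from ctas_employee

insert overwrite table ..... select '....'

!!! The select must return the same number of columns as the target table, with compatible types; otherwise the data will not be inserted correctly

2) Multiple inserts (each overwrite table names a different target table)

from ctas_employee

insert overwrite table employee select *

insert overwrite table employee_internal select *;

!!! Do not put a ; after the first insert clause; the clauses then run as one statement over a single scan of the source

3) Inserting into partitions

from ctas_patitioned

insert overwrite table employee PARTITION (year, month)

select *,'2018','09';

!!! For a static partition insert, the partition values must be given: PARTITION (year=2018, month=9)

!!! For a dynamic partition insert, no values are given; if the source lacks the partition columns, supply them as literals at the end of the select list.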
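The static and dynamic cases can be put next to each other. This is a sketch under assumptions: employee_partitioned is a hypothetical table partitioned by (year, month) whose regular columns match ctas_employee. The two SET lines are the usual switches that allow inserts where every partition column is dynamic.

```sql
-- Allow dynamic partitions, including inserts with no static partition value
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Static: both partition values are fixed in the PARTITION clause
FROM ctas_employee
INSERT OVERWRITE TABLE employee_partitioned PARTITION (year=2018, month=9)
SELECT *;

-- Mixed: year is static, month is taken from the last select column
FROM ctas_employee
INSERT OVERWRITE TABLE employee_partitioned PARTITION (year=2018, month)
SELECT *, '09';

-- Fully dynamic: both values come from the last two select columns
FROM ctas_employee
INSERT OVERWRITE TABLE employee_partitioned PARTITION (year, month)
SELECT *, '2018', '09';
```

Dynamic partition values are matched by position, not by name, so they have to be the last columns of the select list, in the same order as in the PARTITION clause.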

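Using INSERT to write/export data to files

-- Write to a local directory, an HDFS directory, and a table from the same data source (the key point is the shared source)

from ctas_employee (fixed leading clause; each clause below follows it, and when several are combined into one statement only the last one ends with ;)

Local: insert overwrite local directory '/tmp/out1' select *;

HDFS: insert overwrite directory '/tmp/out1' select *;

Table: insert overwrite table employee_internal select *;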
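Combined into one statement, the three targets look roughly like this. The table names and paths are the ones used above; the ROW FORMAT clause in the second statement is an optional extra, not something the original shows.

```sql
-- One scan of ctas_employee feeding three outputs
FROM ctas_employee
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out1' SELECT *
INSERT OVERWRITE DIRECTORY '/tmp/out1' SELECT *
INSERT OVERWRITE TABLE employee_internal SELECT *;

-- Optional: write comma-separated text instead of the default delimiter
FROM ctas_employee
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT *;
```

Because OVERWRITE replaces the contents of the target directory, it is safer to point it at a dedicated output directory.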

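Hive data exchange: IMPORT/EXPORT

1) Exporting data with EXPORT

export table table_name to 'HDFS path';

export table table_name partition (year=..., month=...) to 'HDFS path'; (the year/month partition must contain data, otherwise nothing is produced)

2) Importing data with IMPORT

import table table_name from 'path of the earlier export';

import table old_table from 'path of the earlier export'; (for an existing table, the partition being imported must not already exist in it)

Dropping a partition: alter table uu drop partition(year=2017,month=12);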
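A possible round trip is sketched below. Every path, and the table names employee_imported and employee_partitioned, are hypothetical; the partition values reuse the ones from the drop example above.

```sql
-- Export a whole table (data plus metadata) to an HDFS directory
EXPORT TABLE employee TO '/tmp/export/employee';

-- Import it under a new name
IMPORT TABLE employee_imported FROM '/tmp/export/employee';

-- Export a single partition
EXPORT TABLE employee_partitioned PARTITION (year=2017, month=12)
TO '/tmp/export/employee_2017_12';

-- Re-importing into the existing table: drop the partition first
ALTER TABLE employee_partitioned DROP IF EXISTS PARTITION (year=2017, month=12);
IMPORT TABLE employee_partitioned PARTITION (year=2017, month=12)
FROM '/tmp/export/employee_2017_12';
```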

Sorting data in Hive: ORDER BY

select * from table_name order by column_name; (global sort; all rows pass through a single reducer)

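Sorting data in Hive: SORT BY / DISTRIBUTE BY

!!! Key point: set the number of reducers: set mapred.reduce.tasks = 15 (the sort result depends on the number of reducers)

1) sort by (sorts the data within each reducer)

Set the number of reducers: set mapred.reduce.tasks = 1

Only when the number of reducers is 1 is the whole result guaranteed to be sorted

With 2 reducers, the data is sorted in two separate segments (the result consists of two independently sorted runs)

2) distribute by (similar to group by)

Rows with the same value in the distribute by column go to the same reducer, much like grouping, but without aggregation; it is usually combined with sort by (optionally desc), and the distribute by clause is written before sort by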
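A small sketch of both clauses, using the employee_hr table and the name and employee_id columns that appear in the CLUSTER BY example of this article. The reducer count of 2 is chosen only to make the effect visible.

```sql
SET mapred.reduce.tasks = 2;

-- SORT BY alone: each reducer sorts its own rows,
-- so the output is two sorted runs, not one global order
SELECT name, employee_id
FROM employee_hr
SORT BY employee_id DESC;

-- DISTRIBUTE BY + SORT BY: all rows with the same name go to the same
-- reducer, and are sorted there by name, then by employee_id descending
SELECT name, employee_id
FROM employee_hr
DISTRIBUTE BY name
SORT BY name, employee_id DESC;
```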

Sorting data in Hive: CLUSTER BY

3) cluster by = distribute by + sort by (on the same column)

SELECT name, employee_id FROM employee_hr CLUSTER BY name;

To make full use of all reducers for a global sort, use CLUSTER BY first and then ORDER BY
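The equivalence, and one way to read "CLUSTER BY first, then ORDER BY", can be sketched as follows. The nested form is an interpretation of that sentence rather than something the article spells out, and the alias t is introduced here.

```sql
-- These two queries distribute and sort the rows in the same way
SELECT name, employee_id FROM employee_hr CLUSTER BY name;

SELECT name, employee_id
FROM employee_hr
DISTRIBUTE BY name
SORT BY name;

-- Pre-sort in parallel with CLUSTER BY, then apply the global ORDER BY
SELECT t.name, t.employee_id
FROM (
  SELECT name, employee_id
  FROM employee_hr
  CLUSTER BY name
) t
ORDER BY t.name;
```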

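Example 1: (handling data skew)

Set the number of reducers: set mapred.reduce.tasks = 15

1. A large table joined with a small table
   1. MapReduce
      1. cacheFile
      2. groupCombinapartition
   2. Handling in Hive
      1. mapjoin: set hive.auto.convert.join=true (small table up to about 25 MB)
2. Part of the data is much smaller than the rest
   1. MapReduce
      1. groupCombinapartition
   2. Hive
      1. partitioned table: partitioned by (year, month)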
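The two Hive-side measures in the outline could look like this. The table employee_by_month and its columns are hypothetical; 25000000 bytes is the usual default of hive.mapjoin.smalltable.filesize and corresponds to the "25M" above.

```sql
-- Case 1: large table joined with a small one -> map join
SET hive.auto.convert.join = true;
SET hive.mapjoin.smalltable.filesize = 25000000;  -- bytes, about 25 MB

-- Case 2: unevenly sized slices of data -> partition the table
CREATE TABLE IF NOT EXISTS employee_by_month (
  name        STRING,
  employee_id INT
)
PARTITIONED BY (year INT, month INT);

-- Queries that filter on the partition columns read only those partitions
SELECT name, employee_id
FROM employee_by_month
WHERE year = 2018 AND month = 9;
```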
