Comparing Hive and MySQL Queries: A Summary

Reposted article, 2017-05-24 16:36

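Because their underlying processing mechanisms are very different, Hive and MySQL differ considerably in how queries are written and executed.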

Single-table SELECT queries

The simplest queries

Query with IN: select field1, field2 from table_name where field [not] in (value1, value2);

Example: select * from t_student where age in (21,23);

select * from t_student where age not in (21,23);

Range query with BETWEEN ... AND (both endpoints are included): select field1, field2 from table_name where field [not] between value1 and value2;

Example: select * from t_student where age between 21 and 29;

select * from t_student where age not between 21 and 29;
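The IN and BETWEEN forms above can be tried out with a minimal sketch. It assumes SQLite (bundled with Python) as a stand-in for MySQL, since both treat these predicates the same way; the table columns and rows are made-up sample data. Note that BETWEEN keeps both endpoints, so age 21 and age 29 both match.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_student (stuName TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?)",
    [("Ann", 20), ("Bob", 21), ("Cid", 23), ("Dee", 29), ("Eve", 30)],
)


def names(where: str) -> list:
    """Return the stuName values matching a WHERE clause, sorted by name."""
    sql = f"SELECT stuName FROM t_student WHERE {where} ORDER BY stuName"
    return [row[0] for row in conn.execute(sql)]


in_rows = names("age IN (21, 23)")
not_in_rows = names("age NOT IN (21, 23)")
# BETWEEN includes both endpoints: 21 <= age <= 29.
between_rows = names("age BETWEEN 21 AND 29")
not_between_rows = names("age NOT BETWEEN 21 AND 29")

print(in_rows)           # ['Bob', 'Cid']
print(not_in_rows)       # ['Ann', 'Dee', 'Eve']
print(between_rows)      # ['Bob', 'Cid', 'Dee']
print(not_between_rows)  # ['Ann', 'Eve']
```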

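Pattern matching with LIKE: select field1, field2... from table_name where field [not] like 'pattern';

"%" matches any sequence of characters, including an empty one;

"_" matches exactly one character.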
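The difference between the two wildcards shows up as soon as the same prefix is matched with each. This sketch again uses SQLite as a stand-in for MySQL with hypothetical names; "Li%" accepts "Li" itself and any longer name, while "Li_" needs exactly one more character.

```python
import sqlite3

# SQLite stands in for MySQL; the names are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_student (stuName TEXT)")
conn.executemany(
    "INSERT INTO t_student VALUES (?)",
    [("Li",), ("Lin",), ("Liang",), ("Wang",)],
)


def like(pattern: str, negate: bool = False) -> list:
    """Return names matching (or, with negate=True, not matching) a LIKE pattern."""
    op = "NOT LIKE" if negate else "LIKE"
    sql = f"SELECT stuName FROM t_student WHERE stuName {op} ? ORDER BY stuName"
    return [row[0] for row in conn.execute(sql, (pattern,))]


print(like("Li%"))               # '%' = zero or more chars: ['Li', 'Liang', 'Lin']
print(like("Li_"))               # '_' = exactly one char:   ['Lin']
print(like("Li%", negate=True))  # ['Wang']
```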

NULL check: select field1, field2... from table_name where field is [not] null; (write IS NULL, not = NULL, which never matches)
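Why IS NULL is required can be seen in a small sketch (SQLite as a stand-in for MySQL, hypothetical rows): a comparison with NULL evaluates to NULL, i.e. "unknown", which is never treated as true, so "= NULL" filters out every row.

```python
import sqlite3

# SQLite stands in for MySQL; two of the three sample rows have no grade.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_student (stuName TEXT, gradeName TEXT)")
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?)",
    [("Ann", "一年級"), ("Bob", None), ("Cid", None)],
)


def count(where: str) -> int:
    """Count the rows that satisfy a WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM t_student WHERE {where}").fetchone()[0]


print(count("gradeName IS NULL"))      # 2
print(count("gradeName IS NOT NULL"))  # 1
# 'gradeName = NULL' evaluates to NULL (unknown), never TRUE, so no row matches.
print(count("gradeName = NULL"))       # 0
```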

Multiple conditions with AND:

select field1, field2... from table_name where condition1 and condition2 [and conditionN]

Example: select * from t_student where gradeName='一年級' and age=23;

Multiple conditions with OR:

select field1, field2... from table_name where condition1 or condition2 [or conditionN]

Example: select * from t_student where gradeName='一年級' or age=23; -- a row is returned if at least one condition holds

Removing duplicates with DISTINCT: select distinct column from table_name

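ORDER BY vs. SORT BY in Hive: ORDER BY produces one total ordering by sending every row through a single reducer, which is slow on large data; SORT BY sorts rows only within each reducer, so each reducer's output is sorted but the combined result is not. (MySQL has only ORDER BY.)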
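The ORDER BY / SORT BY difference can be sketched by simulating reducers in plain Python. This is an illustration of the data flow, not Hive code; as an assumption, rows are spread round-robin across reducers (in Hive the distribution is governed by DISTRIBUTE BY or left unspecified).

```python
from typing import List


def distribute(rows: List[int], num_reducers: int) -> List[List[int]]:
    """Spread rows across reducers (round-robin, an assumption for this sketch)."""
    buckets: List[List[int]] = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        buckets[i % num_reducers].append(row)
    return buckets


def sort_by(rows: List[int], num_reducers: int) -> List[List[int]]:
    """SORT BY: each reducer sorts only the rows it received."""
    return [sorted(bucket) for bucket in distribute(rows, num_reducers)]


def order_by(rows: List[int]) -> List[int]:
    """ORDER BY: every row goes to one reducer, which yields a total order."""
    (single_reducer,) = distribute(rows, 1)
    return sorted(single_reducer)


data = [5, 3, 9, 1, 7, 2]
per_reducer = sort_by(data, 2)
concatenated = [x for part in per_reducer for x in part]
print(per_reducer)     # [[5, 7, 9], [1, 2, 3]]
print(concatenated)    # [5, 7, 9, 1, 2, 3]  sorted per reducer, not overall
print(order_by(data))  # [1, 2, 3, 5, 7, 9]
```

The single reducer is what makes ORDER BY correct and also what makes it a bottleneck on large tables.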

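Grouped queries: group by column [having condition] [with rollup]

Commonly used with grouping: count, group_concat and with rollup (group_concat is MySQL-specific; Hive gets a similar result with concat_ws(',', collect_list(col))).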

1. select gradeName,count(stuName) from t_student group by gradeName;

2. select gradeName,count(stuName) from t_student group by gradeName having count(stuName)>3;

3. select gradeName,group_concat(stuName) from t_student group by gradeName with rollup;
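The three grouping examples above can be approximated in a sketch that uses SQLite as a stand-in for MySQL, with hypothetical student names. SQLite has group_concat but no WITH ROLLUP, so the grand-total row that ROLLUP would append is computed separately here; that part is an emulation, not MySQL's own output.

```python
import sqlite3

# SQLite stands in for MySQL; student names are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_student (stuName TEXT, gradeName TEXT)")
conn.executemany(
    "INSERT INTO t_student VALUES (?, ?)",
    [("A", "一年級"), ("B", "一年級"), ("C", "一年級"), ("D", "一年級"),
     ("E", "二年級"), ("F", "二年級")],
)

# 1. One row per group with its count.
counts = dict(conn.execute(
    "SELECT gradeName, COUNT(stuName) FROM t_student GROUP BY gradeName"))

# 2. HAVING filters whole groups after aggregation (WHERE filters rows before it).
big_groups = dict(conn.execute(
    "SELECT gradeName, COUNT(stuName) FROM t_student "
    "GROUP BY gradeName HAVING COUNT(stuName) > 3"))

# 3. group_concat joins each group's names with ','. The order inside the
#    string is not guaranteed, so split and sort before comparing.
members = {
    grade: sorted(names.split(","))
    for grade, names in conn.execute(
        "SELECT gradeName, group_concat(stuName) FROM t_student GROUP BY gradeName")
}

# SQLite has no WITH ROLLUP; the grand-total row it would add is computed here.
grand_total = sum(counts.values())

print(big_groups)   # {'一年級': 4}
print(grand_total)  # 6
```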

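Subqueries

0. Ordinary nested subqueries

1. Subquery with a comparison operator (the outer query compares a column against the single value the subquery returns)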

select * from t_book where price>=(select price from t_priceLevel where priceLevel=1);

2. Subquery with IN (the outer query's condition checks whether a value appears in the result set of another SELECT)

select * from t_book where bookType in(select id from t_bookType);

select * from t_book where bookType not in(select id from t_bookType);

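3. Subquery with EXISTS (in this uncorrelated form, if the subquery returns at least one row the outer query runs as usual; otherwise the outer query returns nothing)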

select * from t_book where exists(select * from t_booktype);

select * from t_book where not exists(select * from t_booktype);

4. Subquery with ANY (the comparison is true if it holds for at least one value the subquery returns)

select * from t_book where price>= any(select price from t_priceLevel);

5. Subquery with ALL (the comparison is true only if it holds for every value the subquery returns)

select * from t_book where price>= all(select price from t_priceLevel);

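Forms 2, 3, 4 and 5 are fully supported in MySQL. Hive's support is narrower: since Hive 0.13 it accepts IN/NOT IN and EXISTS/NOT EXISTS subqueries in the WHERE clause, with restrictions, and the long-standing Hive idiom for them is LEFT SEMI JOIN.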
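The semantics of forms 3 to 5 can be checked in a sketch that uses SQLite (bundled with Python) as a stand-in for MySQL, with hypothetical prices. SQLite has EXISTS but no ANY/ALL, so the sketch rewrites "price >= ANY(...)" as a comparison with MIN and "price >= ALL(...)" as a comparison with MAX. That rewrite is only valid when the subquery returns at least one row and no NULLs, and the code cross-checks it against the literal definitions.

```python
import sqlite3

# SQLite stands in for MySQL; all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_book (bookName TEXT, price INTEGER);
    CREATE TABLE t_priceLevel (priceLevel INTEGER, price INTEGER);
    CREATE TABLE t_booktype (id INTEGER);  -- left empty on purpose
    INSERT INTO t_book VALUES ('b1', 20), ('b2', 40), ('b3', 80);
    INSERT INTO t_priceLevel VALUES (1, 30), (2, 60);
""")


def books(where: str) -> list:
    """Return book names matching a WHERE clause, sorted by name."""
    sql = f"SELECT bookName FROM t_book WHERE {where} ORDER BY bookName"
    return [row[0] for row in conn.execute(sql)]


# Uncorrelated EXISTS is all-or-nothing: t_booktype is empty here.
exists_rows = books("EXISTS (SELECT * FROM t_booktype)")
not_exists_rows = books("NOT EXISTS (SELECT * FROM t_booktype)")

# ANY / ALL rewritten with MIN / MAX (valid for a non-empty, NULL-free subquery).
any_rows = books("price >= (SELECT MIN(price) FROM t_priceLevel)")
all_rows = books("price >= (SELECT MAX(price) FROM t_priceLevel)")

# Cross-check the rewrite against the literal ANY / ALL definitions.
levels = [p for (p,) in conn.execute("SELECT price FROM t_priceLevel")]
prices = dict(conn.execute("SELECT bookName, price FROM t_book"))
assert any_rows == sorted(b for b, p in prices.items() if any(p >= lv for lv in levels))
assert all_rows == sorted(b for b, p in prices.items() if all(p >= lv for lv in levels))

print(exists_rows, not_exists_rows)  # [] ['b1', 'b2', 'b3']
print(any_rows, all_rows)            # ['b2', 'b3'] ['b3']
```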

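Hive supports partition-based queries. In terms of efficiency this is a pruning step: when the query filters on a partition column, Hive skips the partitions that cannot match and reads less data.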
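Partition pruning can be illustrated with a small simulation. The table layout is hypothetical: a table partitioned by a date column dt, with each partition modeled as one dict entry (in Hive, one directory). Filtering on dt, roughly what WHERE dt = '2017-05-24' does, lets the scan skip whole partitions before any of their rows are read.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[str, int]

# Hypothetical table partitioned by `dt`: one dict entry per partition value.
table: Dict[str, List[Row]] = {
    "2017-05-22": [("u1", 3), ("u2", 5)],
    "2017-05-23": [("u1", 1)],
    "2017-05-24": [("u3", 7), ("u4", 2), ("u5", 4)],
}


def scan(partitions: Dict[str, List[Row]],
         keep: Optional[Callable[[str], bool]] = None) -> Tuple[List[Row], int]:
    """Return (rows read, number of rows read), skipping partitions `keep` rejects."""
    rows: List[Row] = []
    for dt, part in partitions.items():
        if keep is not None and not keep(dt):
            continue  # pruned: this partition's data is never opened
        rows.extend(part)
    return rows, len(rows)


_, full_read = scan(table)  # no partition filter: every partition is read
rows, pruned_read = scan(table, lambda dt: dt == "2017-05-24")
print(full_read, pruned_read)  # 6 3
```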

Multi-table joins

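MySQL supports inner joins and left/right outer joins. Note how an outer join works: when a row has no match, the columns from the other table come back as NULL, and such rows can be filtered with WHERE. When reading a chain of multi-table joins, read it from the middle outward.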
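The NULL-filling behavior of an outer join, and the WHERE filter on it, can be seen in a sketch using SQLite as a stand-in for MySQL (hypothetical rows). Book b3 points at a type that does not exist, so its typeName comes back as NULL; filtering on "t.id IS NULL" then keeps exactly the unmatched books.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_book (id INTEGER, bookName TEXT, bookType INTEGER);
    CREATE TABLE t_bookType (id INTEGER, typeName TEXT);
    INSERT INTO t_book VALUES (1, 'b1', 1), (2, 'b2', 2), (3, 'b3', 9);
    INSERT INTO t_bookType VALUES (1, 'novel'), (2, 'history');
""")

# LEFT JOIN keeps every book; b3 has no matching type, so typeName is NULL (None).
left = conn.execute("""
    SELECT b.bookName, t.typeName
    FROM t_book b LEFT JOIN t_bookType t ON b.bookType = t.id
    ORDER BY b.bookName
""").fetchall()

# Filtering the NULL side with WHERE keeps only the books that had no match.
unmatched = conn.execute("""
    SELECT b.bookName
    FROM t_book b LEFT JOIN t_bookType t ON b.bookType = t.id
    WHERE t.id IS NULL
""").fetchall()

print(left)       # [('b1', 'novel'), ('b2', 'history'), ('b3', None)]
print(unmatched)  # [('b3',)]
```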

Correspondingly, Hive has the inner JOIN, the outer joins LEFT/RIGHT OUTER JOIN, and also FULL OUTER JOIN (which MySQL lacks). LEFT SEMI JOIN is Hive's way of expressing IN/EXISTS subqueries.
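What LEFT SEMI JOIN returns can be sketched in plain Python; this models the semantics, not Hive's implementation. In Hive, the IN subquery "select * from t_book where bookType in (select id from t_bookType)" is commonly written as "SELECT b.* FROM t_book b LEFT SEMI JOIN t_bookType t ON (b.bookType = t.id)". A semi join keeps each left row at most once if any right row matches and returns only left-table columns, so duplicate keys on the right do not multiply rows the way an inner join does. The row values are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def left_semi_join(left: List[Row], right: List[Row], lkey: str, rkey: str) -> List[Row]:
    """Keep each left row at most once if some right row has the same key.
    Only left-table columns are returned, as with Hive's LEFT SEMI JOIN."""
    right_keys = {r[rkey] for r in right}  # only key existence matters
    return [row for row in left if row[lkey] in right_keys]


def inner_join(left: List[Row], right: List[Row], lkey: str, rkey: str) -> List[Tuple[Row, Row]]:
    """Plain inner join for comparison: one output row per matching pair."""
    return [(lrow, rrow) for lrow in left for rrow in right if lrow[lkey] == rrow[rkey]]


books = [{"id": 1, "bookType": 1}, {"id": 2, "bookType": 2}, {"id": 3, "bookType": 9}]
book_types = [{"id": 1}, {"id": 1}, {"id": 2}]  # type 1 appears twice

semi = left_semi_join(books, book_types, "bookType", "id")
inner = inner_join(books, book_types, "bookType", "id")
print([b["id"] for b in semi])  # [1, 2]  same rows as the IN subquery
print(len(inner))               # 3       book 1 is duplicated by the inner join
```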

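Hive joins fall roughly into common joins and map joins. The difference is that a map join is meant for joining a small table with a large one (skewed table sizes): the small table is loaded into memory on each mapper and the join runs in the map phase, with no shuffle or reduce step. For details see http://www.cnblogs.com/1130136248wlxk/articles/5517628.html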
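The map-join data flow can be mimicked in a few lines of Python, as a sketch of the idea rather than Hive's code. In Hive itself a map join is requested with the /*+ MAPJOIN(...) */ hint or chosen automatically when hive.auto.convert.join is enabled. Here the small table becomes an in-memory hash table, and each "mapper" probes it while streaming its own split of the big table; the users/orders data and the split layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def map_join(big_splits: List[List[Row]], small: List[Row], key: str) -> List[Tuple[Row, Row]]:
    # Build: hash the small table once. In Hive it is loaded into memory on
    # every mapper, which is why it must be small.
    lookup: Dict[object, List[Row]] = defaultdict(list)
    for s in small:
        lookup[s[key]].append(s)
    # Probe: each mapper streams its own split of the big table and joins locally,
    # so the big table is never shuffled and no reduce phase is needed.
    pairs: List[Tuple[Row, Row]] = []
    for split in big_splits:
        for b in split:
            for s in lookup.get(b[key], []):
                pairs.append((b, s))
    return pairs


users = [{"uid": "a", "city": "Beijing"}, {"uid": "b", "city": "Shanghai"}]
order_splits = [
    [{"oid": 1, "uid": "a"}, {"oid": 2, "uid": "c"}],  # input split of mapper 1
    [{"oid": 3, "uid": "b"}, {"oid": 4, "uid": "a"}],  # input split of mapper 2
]
result = [(b["oid"], s["city"]) for b, s in map_join(order_splits, users, "uid")]
print(result)  # [(1, 'Beijing'), (3, 'Shanghai'), (4, 'Beijing')]
```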

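In a common join, the map output is keyed by the join key; the records are then distributed to reduce tasks by hashing the key, so all records with the same key reach the same reducer as one list, where the join is performed.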
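The map, shuffle and reduce steps of a common join can be simulated in plain Python. This is a simplified model, not Hive's implementation: rows are tagged with their source table, routed by hash(key) % num_reducers, grouped into one list per key, and joined inside the reducer. Integer keys are used so Python's hash is deterministic; the data is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def common_join(left: List[Row], right: List[Row], key: str, num_reducers: int):
    # Map: tag each row with its source table and emit (join key, tagged row).
    mapped = [(row[key], ("L", row)) for row in left]
    mapped += [(row[key], ("R", row)) for row in right]

    # Shuffle: pick the reducer by hashing the key, so every record with a
    # given key lands on the same reducer, collected into one list per key.
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for k, tagged in mapped:
        reducers[hash(k) % num_reducers][k].append(tagged)

    # Reduce: for each key's list, pair every left row with every right row.
    out: List[Tuple[object, Row, Row]] = []
    for groups in reducers:
        for k, records in groups.items():
            lefts = [row for tag, row in records if tag == "L"]
            rights = [row for tag, row in records if tag == "R"]
            out.extend((k, lrow, rrow) for lrow in lefts for rrow in rights)
    return out, reducers


users = [{"uid": 1, "name": "a"}, {"uid": 2, "name": "b"}]
orders = [{"uid": 1, "oid": 10}, {"uid": 1, "oid": 11}, {"uid": 3, "oid": 12}]
joined, reducers = common_join(users, orders, "uid", num_reducers=2)
rows = sorted((k, lrow["name"], rrow["oid"]) for k, lrow, rrow in joined)
print(rows)  # [(1, 'a', 10), (1, 'a', 11)]
```

Keys that appear on only one side (2 and 3 here) still travel through the shuffle and are simply dropped in the reducer; that network cost is what a map join avoids.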

Combining result sets

1. UNION

With UNION, the database merges the results of all the queries and then removes duplicate rows:

select id from t_book union select id from t_bookType;

2. UNION ALL

With UNION ALL, duplicate rows are kept. (Hive versions before 1.2.0 support only UNION ALL.)

select id from t_book union all select id from t_bookType;
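The two forms can be compared directly in a sketch with SQLite as a stand-in for MySQL (hypothetical ids). UNION removes duplicates across the whole result, including the duplicate inside t_book itself, while UNION ALL simply concatenates.

```python
import sqlite3

# SQLite stands in for MySQL; the ids are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_book (id INTEGER);
    CREATE TABLE t_bookType (id INTEGER);
    INSERT INTO t_book VALUES (1), (2), (2);
    INSERT INTO t_bookType VALUES (2), (3);
""")

union_ids = sorted(row[0] for row in conn.execute(
    "SELECT id FROM t_book UNION SELECT id FROM t_bookType"))
union_all_ids = sorted(row[0] for row in conn.execute(
    "SELECT id FROM t_book UNION ALL SELECT id FROM t_bookType"))

print(union_ids)      # [1, 2, 3]        duplicates removed
print(union_all_ids)  # [1, 2, 2, 2, 3]  every row kept
```

Because UNION has to deduplicate, it does extra work; UNION ALL is the cheaper choice when duplicates are impossible or acceptable.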

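Addendum: directions for Hive performance tuning

Column pruning and partition pruning: these require the relevant parameters to be set.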

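Joins: put the smaller tables first (further left). In the reduce phase, rows from the tables on the left are buffered in memory while the last table is streamed, so this lowers the chance of running out of memory.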

Map join: for joining a small table with a large one (skewed table sizes).
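Why table order matters in a reduce-side join can be shown with a small simulation, assuming the behavior described in the list above: within a reducer, for each key, rows from every table except the last are held in memory, and the last table is streamed past them. The tables are hypothetical; the sketch only counts how many rows must be buffered at once.

```python
from typing import Dict, List

Row = Dict[str, int]


def peak_buffered_rows(tables_in_order: List[List[Row]], key: str) -> int:
    """Simulate one reduce-side join: for each key, rows from every table
    except the last are held in memory, while the last table is streamed.
    Returns the largest number of rows held in memory at once."""
    *buffered, _streamed = tables_in_order
    keys = {row[key] for table in tables_in_order for row in table}
    peak = 0
    for k in keys:
        held = sum(1 for table in buffered for row in table if row[key] == k)
        peak = max(peak, held)
    return peak


small = [{"k": 1}]                     # e.g. a 1-row dimension table
big = [{"k": 1} for _ in range(1000)]  # 1000 rows sharing the same key

print(peak_buffered_rows([small, big], "k"))  # 1     small table first, big one streamed
print(peak_buffered_rows([big, small], "k"))  # 1000  big table ends up buffered
```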
