Lateral View Syntax
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*
Description
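LATERAL VIEW is used together with user-defined table-generating functions (UDTFs) such as explode, which turn one input row into zero or more output rows. (split is an ordinary function that returns an array, and is often used to build the array that explode consumes.) The rows produced this way can then be aggregated. For each row of the base table, LATERAL VIEW first calls the UDTF, then joins the UDTF's output rows back to that input row, producing a virtual table with the supplied table alias.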
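To make the "call the UDTF per row, then join back" description concrete, here is a minimal Python model of the semantics. It is only an illustration, not Hive code: `lateral_view`, `explode_xs` and the sample `rows` are hypothetical names, and rows are modelled as plain dicts.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def lateral_view(
    base: Iterable[Row],
    udtf: Callable[[Row], Iterable[Tuple[Any, ...]]],
    column_aliases: List[str],
) -> List[Row]:
    """Model LATERAL VIEW: call the UDTF on every base row, then join each
    generated row back onto the input row that produced it."""
    virtual_table: List[Row] = []
    for row in base:
        for generated in udtf(row):  # zero or more rows per input row
            joined = dict(row)  # keep the input row's columns
            joined.update(zip(column_aliases, generated))  # add the UDTF's columns
            virtual_table.append(joined)
    return virtual_table


# Hypothetical explode-like UDTF: one output row per element of the "xs" array.
def explode_xs(row: Row) -> Iterable[Tuple[Any, ...]]:
    return ((x,) for x in row["xs"])


rows = [{"id": 1, "xs": [10, 20]}, {"id": 2, "xs": []}]
result = lateral_view(rows, explode_xs, ["x"])
print(result)
```

Because the UDTF may emit zero rows, the input row with an empty array (`id` 2) contributes nothing to the virtual table in this model.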
Example
Suppose we have a table named pageAds with two columns: pageid (string), the name of the page, and adid_list (Array<int>), the array of ad IDs that appear on that page:
string pageid | Array<int> adid_list |
"front_page" | [1, 2, 3] |
"contact_page" | [3, 4, 5] |
We want to count how many times each ad ID appears across all pages.
First, split each ad ID list into one row per ad ID:
SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
The result is:
string pageid | int adid |
"front_page" | 1 |
"front_page" | 2 |
"front_page" | 3 |
"contact_page" | 3 |
"contact_page" | 4 |
"contact_page" | 5 |
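The explode step above can be mirrored in Python as a nested comprehension: each base row is paired with every element of its own adid_list. This is a sketch of the semantics only; `page_ads` and `explode_ads` are hypothetical names. The list order simply follows the table above, since Hive does not guarantee an output order without ORDER BY.

```python
from typing import Any, Dict, List, Tuple

# The pageAds rows from the example, modelled as Python dicts.
page_ads: List[Dict[str, Any]] = [
    {"pageid": "front_page", "adid_list": [1, 2, 3]},
    {"pageid": "contact_page", "adid_list": [3, 4, 5]},
]


def explode_ads(rows: List[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Mimic: SELECT pageid, adid FROM pageAds
    LATERAL VIEW explode(adid_list) adTable AS adid."""
    return [(row["pageid"], adid) for row in rows for adid in row["adid_list"]]


exploded = explode_ads(page_ads)
for pageid, adid in exploded:
    print(pageid, adid)
```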
Next, group by adid and count:
SELECT adid, count(1) FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid GROUP BY adid;
The result is:
int adid | count(1) |
1 | 1 |
2 | 1 |
3 | 2 |
4 | 1 |
5 | 1 |
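The GROUP BY adid / count(1) step corresponds to counting the exploded ad IDs. A minimal Python sketch, using `collections.Counter` in place of Hive's aggregation (the `page_ads` tuples and `ad_counts` name are illustrative assumptions):

```python
from collections import Counter

# (pageid, adid_list) pairs from the pageAds example.
page_ads = [
    ("front_page", [1, 2, 3]),
    ("contact_page", [3, 4, 5]),
]

# Explode every adid_list, then count occurrences of each adid,
# like: SELECT adid, count(1) ... GROUP BY adid
ad_counts = Counter(adid for _pageid, adid_list in page_ads for adid in adid_list)

for adid in sorted(ad_counts):
    print(adid, ad_counts[adid])
```

Ad ID 3 is counted twice because it appears in both rows' arrays, matching the table above.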
Multiple Lateral Views
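A FROM clause can be followed by multiple LATERAL VIEW clauses, and each subsequent LATERAL VIEW can reference columns from any of the tables that appear to its left. Consider the following base table, baseTable: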
Array<int> col1 | Array<string> col2 |
[1, 2] | ["a", "b", "c"] |
[3, 4] | ["d", "e", "f"] |
SELECT myCol1, col2 FROM baseTable LATERAL VIEW explode(col1) myTable1 AS myCol1;
The result is:
int mycol1 | Array<string> col2 |
1 | ["a", "b", "c"] |
2 | ["a", "b", "c"] |
3 | ["d", "e", "f"] |
4 | ["d", "e", "f"] |
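With a single lateral view, every element of col1 is paired with its row's entire col2 array, which is unchanged. A small Python sketch of that query (the `base_table` and `result` names are illustrative, not part of Hive):

```python
# baseTable rows modelled as dicts.
base_table = [
    {"col1": [1, 2], "col2": ["a", "b", "c"]},
    {"col1": [3, 4], "col2": ["d", "e", "f"]},
]

# SELECT myCol1, col2 FROM baseTable LATERAL VIEW explode(col1) myTable1 AS myCol1
result = [(my_col1, row["col2"]) for row in base_table for my_col1 in row["col1"]]

for my_col1, col2 in result:
    print(my_col1, col2)
```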
Adding a second LATERAL VIEW:
SELECT myCol1, myCol2 FROM baseTable LATERAL VIEW explode(col1) myTable1 AS myCol1 LATERAL VIEW explode(col2) myTable2 AS myCol2;
The result is:
int myCol1 | string myCol2 |
1 | "a" |
1 | "b" |
1 | "c" |
2 | "a" |
2 | "b" |
2 | "c" |
3 | "d" |
3 | "e" |
3 | "f" |
4 | "d" |
4 | "e" |
4 | "f" |
Note that in the query above, the LATERAL VIEW clauses are applied in the order in which they appear.
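Applying the views in order means the second explode runs on the virtual table produced by the first, so each myCol1 row is multiplied by the elements of that row's col2. The Python sketch below models this chaining; `explode_into`, `step1` and `step2` are hypothetical helpers for illustration, not Hive APIs.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

base_table: List[Row] = [
    {"col1": [1, 2], "col2": ["a", "b", "c"]},
    {"col1": [3, 4], "col2": ["d", "e", "f"]},
]


def explode_into(rows: List[Row], source: str, alias: str) -> List[Row]:
    """One 'LATERAL VIEW explode(source) ... AS alias' step:
    one output row per array element, joined to its input row."""
    return [{**row, alias: value} for row in rows for value in row[source]]


# Left to right: the second step sees baseTable's columns plus myCol1.
step1 = explode_into(base_table, "col1", "myCol1")
step2 = explode_into(step1, "col2", "myCol2")
result = [(row["myCol1"], row["myCol2"]) for row in step2]

for my_col1, my_col2 in result:
    print(my_col1, my_col2)
```

Each base row yields 2 × 3 = 6 rows, giving the 12 rows shown in the table.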
Adapted from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView#
http://blog.csdn.net/inte_sleeper/article/details/7196114