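Lateral View Syntax
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*
Description
A lateral view is used together with a UDTF (user-defined table-generating function) such as explode(), often applied to an array produced by a function like split(). It turns one row of data into multiple rows, and the expanded rows can then be aggregated. The lateral view first calls the UDTF for every row of the base table; the UDTF generates zero or more output rows for each input row; the lateral view then joins those output rows to the input row, producing a virtual table that can be referenced through the supplied table alias.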
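The per-row behaviour described above can be modelled outside Hive. The following is a minimal Python sketch, not Hive's implementation: `explode` and `lateral_view` are hypothetical helpers written for illustration, rows are modelled as plain dicts, and the sample data is made up.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def explode(values: Iterable[Any]) -> Iterator[Any]:
    """Toy stand-in for a UDTF: emit one output value per array element."""
    yield from values


def lateral_view(
    rows: Iterable[Row],
    udtf: Callable[[Any], Iterable[Any]],
    input_col: str,
    column_alias: str,
) -> List[Row]:
    """Call the UDTF once per input row and join each output to that row."""
    result: List[Row] = []
    for row in rows:
        for value in udtf(row[input_col]):
            joined = dict(row)          # copy the columns of the source row
            joined[column_alias] = value  # add the generated column
            result.append(joined)
    return result


# Hypothetical sample data: the second row has an empty array.
rows = [{"id": "r1", "items": [10, 20]}, {"id": "r2", "items": []}]
virtual_table = lateral_view(rows, explode, "items", "item")
for r in virtual_table:
    print(r["id"], r["item"])
```

In this model, a row for which the UDTF emits nothing (here `r2`, whose array is empty) contributes no rows to the virtual table.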
Example
Suppose we have a table pageAds with two columns: pageid, a string, and adid_list, an array of the ad IDs that appear on that page:
| string pageid | Array<int> adid_list |
| "front_page" | [1, 2, 3] |
| "contact_page" | [3, 4, 5] |
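We want to count how many times each ad ID appears across all pages.
First, expand the ad ID arrays so that each ad ID gets its own row: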
SELECT pageid, adid
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
The result is:
| string pageid | int adid |
| "front_page" | 1 |
| "front_page" | 2 |
| "front_page" | 3 |
| "contact_page" | 3 |
| "contact_page" | 4 |
| "contact_page" | 5 |
Next, aggregate over the expanded rows:
SELECT adid, count(1)
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid
GROUP BY adid;
The result is:
| int adid | count(1) |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 1 |
| 5 | 1 |
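To check the two steps by hand, they can be reproduced in a few lines of Python. This is only a sketch of the same logic on the sample data from the pageAds table; the variable names are chosen for illustration, and a list comprehension and `collections.Counter` stand in for the lateral view and the GROUP BY.

```python
from collections import Counter

# Sample rows of pageAds as (pageid, adid_list) pairs.
page_ads = [
    ("front_page", [1, 2, 3]),
    ("contact_page", [3, 4, 5]),
]

# Step 1, LATERAL VIEW explode(adid_list): one (pageid, adid) pair per array element.
exploded = [(pageid, adid) for pageid, adid_list in page_ads for adid in adid_list]

# Step 2, GROUP BY adid with count(1): count the occurrences of each ad ID.
counts = Counter(adid for _, adid in exploded)

for adid in sorted(counts):
    print(adid, counts[adid])
```

Ad ID 3 is the only one that appears on both pages, which is why it is the only one with a count of 2.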
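Multiple lateral views
A FROM clause can be followed by more than one lateral view. A later lateral view can reference the columns of any table that appears before it, including the virtual tables of earlier lateral views. Take the following table, baseTable, as an example: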
| Array<int> col1 | Array<string> col2 |
| [1, 2] | ["a", "b", "c"] |
| [3, 4] | ["d", "e", "f"] |
SELECT myCol1, col2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1;
The result is:
| int mycol1 | Array<string> col2 |
| 1 | ["a", "b", "c"] |
| 2 | ["a", "b", "c"] |
| 3 | ["d", "e", "f"] |
| 4 | ["d", "e", "f"] |
Adding a second lateral view:
SELECT myCol1, myCol2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2;
The result is:
| int myCol1 | string myCol2 |
| 1 | "a" |
| 1 | "b" |
| 1 | "c" |
| 2 | "a" |
| 2 | "b" |
| 2 | "c" |
| 3 | "d" |
| 3 | "e" |
| 3 | "f" |
| 4 | "d" |
| 4 | "e" |
| 4 | "f" |
Note that in the statement above, the two lateral views are applied in the order in which they appear.
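One way to picture two chained lateral views is as nested loops over each base row: the first lateral view is the outer loop and the second is the inner loop. The Python sketch below models this on the sample baseTable data; it illustrates the semantics only and is not how Hive executes the query.

```python
# Sample rows of baseTable.
base_table = [
    {"col1": [1, 2], "col2": ["a", "b", "c"]},
    {"col1": [3, 4], "col2": ["d", "e", "f"]},
]

result = [
    (my_col1, my_col2)
    for row in base_table
    for my_col1 in row["col1"]  # LATERAL VIEW explode(col1) myTable1 AS myCol1
    for my_col2 in row["col2"]  # LATERAL VIEW explode(col2) myTable2 AS myCol2
]

for my_col1, my_col2 in result:
    print(my_col1, my_col2)
```

Each base row contributes 2 x 3 = 6 combinations, giving the 12 rows shown in the table above. The row order printed here comes from the Python loops; a real query should not rely on output order unless it specifies one.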
Reposted from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView#
http://blog.csdn.net/inte_sleeper/article/details/7196114
