1. concat
concat joins strings, but if any one argument is NULL, the whole result is NULL.
hive> select concat('a','b');
ab
hive> select concat('a','b',null);
NULL
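Because a single NULL wipes out the whole result, one common workaround is to wrap nullable arguments in coalesce. A minimal sketch; the cast on the NULL literal is only there to give the example a concrete type, and in practice the argument would be a nullable column:

```sql
-- Replace a NULL argument with an empty string before concatenating,
-- so concat no longer returns NULL.
select concat('a', 'b', coalesce(cast(null as string), ''));
-- ab
```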
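2. concat_ws
concat_ws skips NULL string arguments instead of propagating them, so the result is not NULL as long as at least one string is non-NULL.
concat_ws takes the separator as its first argument.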
hive> select concat_ws('-','a','b');
a-b
hive> select concat_ws('-','a','b',null);
a-b
hive> select concat_ws('','a','b',null);
ab
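In Hive, concat_ws also accepts an array<string> in place of individual strings, which is what lets it consume the output of an aggregate such as collect_set. A small sketch with a literal array:

```sql
-- The second argument is an array<string>; its elements are joined with '-'.
select concat_ws('-', array('a', 'b', 'c'));
-- a-b-c
```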
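3. STR_TO_MAP
- Syntax
STR_TO_MAP(VARCHAR text, VARCHAR listDelimiter, VARCHAR keyValueDelimiter)
- Function
  - Splits text into key-value pairs using listDelimiter, then splits each pair into key and value using keyValueDelimiter, and returns the result as a MAP.
  - The default listDelimiter is a comma (,). The default keyValueDelimiter depends on the engine: Hive's str_to_map uses a colon (:), while Flink SQL's STR_TO_MAP uses an equals sign (=). Passing both delimiters explicitly avoids the difference.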
- Example
str_to_map('1001=2020-03-10,1002=2020-03-10', ',' , '=')
Output:
{"1001":"2020-03-10","1002":"2020-03-10"}
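Once the string has become a map, individual values can be read with bracket indexing. A short sketch on the same literal; a key that is absent from the map yields NULL:

```sql
-- Look up an existing key.
select str_to_map('1001=2020-03-10,1002=2020-03-10', ',', '=')['1001'];
-- 2020-03-10

-- Look up a key that is not in the map.
select str_to_map('1001=2020-03-10,1002=2020-03-10', ',', '=')['1003'];
-- NULL
```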
4. Worked example
Step 1: join each order status and its operation time into a key=value string.
hive> select order_id, concat(order_status,'=', operate_time) from order_status_log where dt='2020-03-10';
3210    1001=2020-03-10 00:00:00.0
3211    1001=2020-03-10 00:00:00.0
3212    1001=2020-03-10 00:00:00.0
3210    1002=2020-03-10 00:00:00.0
3211    1002=2020-03-10 00:00:00.0
3212    1002=2020-03-10 00:00:00.0
3210    1005=2020-03-10 00:00:00.0
3211    1004=2020-03-10 00:00:00.0
3212    1004=2020-03-10 00:00:00.0
Step 2: group by order_id and gather each order's key=value strings into an array with collect_set.
hive> select order_id, collect_set(concat(order_status,'=',operate_time)) from order_status_log where dt='2020-03-10' group by order_id;
3210    ["1001=2020-03-10 00:00:00.0","1002=2020-03-10 00:00:00.0","1005=2020-03-10 00:00:00.0"]
3211    ["1001=2020-03-10 00:00:00.0","1002=2020-03-10 00:00:00.0","1004=2020-03-10 00:00:00.0"]
3212    ["1001=2020-03-10 00:00:00.0","1002=2020-03-10 00:00:00.0","1004=2020-03-10 00:00:00.0"]
Step 3: join each array into a single comma-separated string with concat_ws.
hive> select order_id, concat_ws(',', collect_set(concat(order_status,'=',operate_time))) from order_status_log where dt='2020-03-10' group by order_id;
3210    1001=2020-03-10 00:00:00.0,1002=2020-03-10 00:00:00.0,1005=2020-03-10 00:00:00.0
3211    1001=2020-03-10 00:00:00.0,1002=2020-03-10 00:00:00.0,1004=2020-03-10 00:00:00.0
3212    1001=2020-03-10 00:00:00.0,1002=2020-03-10 00:00:00.0,1004=2020-03-10 00:00:00.0
Step 4: convert the string into a map named tms with str_to_map.
hive> select order_id, str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))), ',' , '=') tms from order_status_log where dt='2020-03-10' group by order_id;
3210    {"1001":"2020-03-10 00:00:00.0","1002":"2020-03-10 00:00:00.0","1005":"2020-03-10 00:00:00.0"}
3211    {"1001":"2020-03-10 00:00:00.0","1002":"2020-03-10 00:00:00.0","1004":"2020-03-10 00:00:00.0"}
3212    {"1001":"2020-03-10 00:00:00.0","1002":"2020-03-10 00:00:00.0","1004":"2020-03-10 00:00:00.0"}
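Step 5: look up values by status code
- tms['1001']: creation time (unpaid status)
- tms['1002']: payment time (paid status)
- tms['1003']: cancellation time (cancelled status)
- tms['1004']: completion time (completed status)
- tms['1005']: refund time (refund status)
- tms['1006']: refund completion time (refund completed status)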
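Putting the steps together, the lookups can be applied to the map built in step four by wrapping that query in a subquery. This is a sketch rather than the original author's final query: the output column names (create_time, payment_time, and so on) and the subquery alias t are hypothetical, chosen only for illustration. Statuses an order never reached come back as NULL.

```sql
-- Pivot each order's status log into one row with one column per status time.
select
    order_id,
    tms['1001'] as create_time,         -- hypothetical column name
    tms['1002'] as payment_time,        -- hypothetical column name
    tms['1003'] as cancel_time,         -- hypothetical column name
    tms['1004'] as finish_time,         -- hypothetical column name
    tms['1005'] as refund_time,         -- hypothetical column name
    tms['1006'] as refund_finish_time   -- hypothetical column name
from (
    -- Same expression as step four: status=time pairs collapsed into a map.
    select
        order_id,
        str_to_map(
            concat_ws(',', collect_set(concat(order_status, '=', operate_time))),
            ',',
            '='
        ) as tms
    from order_status_log
    where dt = '2020-03-10'
    group by order_id
) t;
```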