Why do we need a user behavior wide table? It aggregates each user's single-day behavior into one multi-column wide row, so that after joining in user dimension information we can run statistical analysis from different angles.
Data source: the relevant business tables in the DWD layer.
Creating the user behavior wide table:
This wide table integrates three kinds of behavior: placing orders, making payments, and leaving comments.
drop table if exists dws_user_action;
create external table dws_user_action
(
    user_id        string        comment 'user id',
    order_count    bigint        comment 'number of orders',
    order_amount   decimal(16,2) comment 'order amount',
    payment_count  bigint        comment 'number of payments',
    payment_amount decimal(16,2) comment 'payment amount',
    comment_count  bigint        comment 'number of comments'
)
comment 'daily user behavior wide table'
partitioned by (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_user_action/'
tblproperties ("parquet.compression"="snappy");
Data import script:
The basic syntax of WITH ... AS is shown below. It defines a temporary result set (a CTE) that can be referenced multiple times in the statement that follows, which improves SQL readability. Note that multiple CTEs are separated by commas, while there is no separator between the last CTE and the main query.
WITH t1 AS
(
    SELECT * FROM carinfo
),
t2 AS
(
    SELECT * FROM car_blacklist
)
SELECT * FROM t1, t2;
#!/bin/bash
# Define variables so they are easy to change
APP=gmall
hive=/opt/module/hive/bin/hive

# If a date is passed as the first argument, use it;
# otherwise default to the day before the current date
if [ -n "$1" ]; then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
with tmp_order as
(
    select
        user_id,
        sum(oi.total_amount) order_amount,
        count(*) order_count
    from "$APP".dwd_order_info oi
    where date_format(oi.create_time,'yyyy-MM-dd')='$do_date'
    group by user_id
),
tmp_payment as
(
    select
        user_id,
        sum(pi.total_amount) payment_amount,
        count(*) payment_count
    from "$APP".dwd_payment_info pi
    where date_format(pi.payment_time,'yyyy-MM-dd')='$do_date'
    group by user_id
),
tmp_comment as
(
    select
        user_id,
        count(*) comment_count
    from "$APP".dwd_comment_log c
    where date_format(c.dt,'yyyy-MM-dd')='$do_date'
    group by user_id
)

insert overwrite table "$APP".dws_user_action partition(dt='$do_date')
select
    user_actions.user_id,
    sum(user_actions.order_count),
    sum(user_actions.order_amount),
    sum(user_actions.payment_count),
    sum(user_actions.payment_amount),
    sum(user_actions.comment_count)
from
(
    select
        user_id,
        order_count,
        order_amount,
        0 payment_count,
        0 payment_amount,
        0 comment_count
    from tmp_order

    union all
    select
        user_id,
        0,
        0,
        payment_count,
        payment_amount,
        0
    from tmp_payment

    union all
    select
        user_id,
        0,
        0,
        0,
        0,
        comment_count
    from tmp_comment
) user_actions
group by user_id;
"

$hive -e "$sql"
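The core trick in the script is to pad each per-behavior aggregate out to the same six columns with zeros, UNION ALL the three result sets into one shape, and then SUM per user, which emulates a full outer join across the three temporary tables. A self-contained sketch of that pattern with Python's sqlite3 and toy data (not the real DWD tables):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tmp_order   (user_id TEXT, order_count INT, order_amount REAL);
    CREATE TABLE tmp_payment (user_id TEXT, payment_count INT, payment_amount REAL);
    CREATE TABLE tmp_comment (user_id TEXT, comment_count INT);
    INSERT INTO tmp_order   VALUES ('u1', 2, 300.0);
    INSERT INTO tmp_payment VALUES ('u1', 1, 150.0);
    INSERT INTO tmp_comment VALUES ('u2', 5);
""")

# Pad each branch to six columns, UNION ALL, then SUM per user.
# 'u2' only commented, yet still gets a row (with zeros elsewhere) --
# exactly what a full outer join of the three tables would produce.
rows = con.execute("""
    SELECT user_id,
           SUM(order_count), SUM(order_amount),
           SUM(payment_count), SUM(payment_amount),
           SUM(comment_count)
    FROM (
        SELECT user_id, order_count, order_amount,
               0 payment_count, 0 payment_amount, 0 comment_count
        FROM tmp_order
        UNION ALL
        SELECT user_id, 0, 0, payment_count, payment_amount, 0
        FROM tmp_payment
        UNION ALL
        SELECT user_id, 0, 0, 0, 0, comment_count
        FROM tmp_comment
    ) user_actions
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
print(rows)
# [('u1', 2, 300.0, 1, 150.0, 0), ('u2', 0, 0, 0, 0, 5)]
```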