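Table Creation Notes
The ADS layer involves no modeling; each table is created according to the specific requirement.
Chapter 1 Visitor Topic
1.1 Visitor Statistics
This requirement is a combined visitor report covering several metrics. Each metric is explained below.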
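| Metric | Description | Field |
| Visitor count | Number of visitors | uv_count |
| Page dwell time | Total duration of all page-view records, in seconds | duration_sec |
| Average page dwell time | Average dwell time per session, in seconds | avg_duration_sec |
| Total page views | Total number of page-view records | page_count |
| Average page views | Average number of pages viewed per session | avg_page_count |
| Session count | Total number of sessions | sv_count |
| Bounce count | Number of sessions that viewed only one page | bounce_count |
| Bounce rate | Share of sessions that viewed only one page | bounce_rate |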
1. Table creation statement
DROP TABLE IF EXISTS ads_visit_stats;
CREATE EXTERNAL TABLE ads_visit_stats
(
    `dt`               STRING COMMENT 'Statistics date',
    `is_new`           STRING COMMENT 'New/returning visitor flag, 1: new, 0: returning',
    `recent_days`      BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `channel`          STRING COMMENT 'Channel',
    `uv_count`         BIGINT COMMENT 'Daily active visitors (number of visitors)',
    `duration_sec`     BIGINT COMMENT 'Total page dwell time',
    `avg_duration_sec` BIGINT COMMENT 'Average page dwell time per session, in seconds',
    `page_count`       BIGINT COMMENT 'Total page views',
    `avg_page_count`   BIGINT COMMENT 'Average page views per session',
    `sv_count`         BIGINT COMMENT 'Number of sessions',
    `bounce_count`     BIGINT COMMENT 'Bounce count',
    `bounce_rate`      DECIMAL(16,2) COMMENT 'Bounce rate'
) COMMENT 'Visitor statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_visit_stats/';
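2. Data loading
Approach: the key to this requirement is splitting page views into sessions. The overall implementation takes three steps:
Step 1: split all page-view records into sessions.
Step 2: compute the viewing duration and the number of pages viewed for each session.
Step 3: compute the metrics listed above.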
insert overwrite table ads_visit_stats
select * from ads_visit_stats
union
select
    '2020-06-14' dt,
    is_new,
    recent_days,
    channel,
    count(distinct(mid_id)) uv_count,
    cast(sum(duration)/1000 as bigint) duration_sec,
    cast(avg(duration)/1000 as bigint) avg_duration_sec,
    sum(page_count) page_count,
    cast(avg(page_count) as bigint) avg_page_count,
    count(*) sv_count,
    sum(if(page_count=1,1,0)) bounce_count,
    cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate
from
(
    select
        session_id,
        mid_id,
        is_new,
        recent_days,
        channel,
        count(*) page_count,
        sum(during_time) duration
    from
    (
        select
            mid_id,
            channel,
            recent_days,
            is_new,
            last_page_id,
            page_id,
            during_time,
            concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by recent_days,mid_id order by ts)) session_id
        from
        (
            select
                mid_id,
                channel,
                last_page_id,
                page_id,
                during_time,
                ts,
                recent_days,
                if(visit_date_first>=date_add('2020-06-14',-recent_days+1),'1','0') is_new
            from
            (
                select
                    t1.mid_id,
                    t1.channel,
                    t1.last_page_id,
                    t1.page_id,
                    t1.during_time,
                    t1.dt,
                    t1.ts,
                    t2.visit_date_first
                from
                (
                    select mid_id, channel, last_page_id, page_id, during_time, dt, ts
                    from dwd_page_log
                    where dt>=date_add('2020-06-14',-30)
                )t1
                left join
                (
                    select mid_id, visit_date_first
                    from dwt_visitor_topic
                    where dt='2020-06-14'
                )t2
                on t1.mid_id=t2.mid_id
            )t3 lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('2020-06-14',-recent_days+1)
        )t4
    )t5
    group by session_id,mid_id,is_new,recent_days,channel
)t6
group by is_new,recent_days,channel;
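The three steps can also be traced outside Hive. The following is a minimal Python sketch, not the Hive job itself: it assumes each page-view record is a dict with the keys mid_id, last_page_id, during_time (milliseconds) and ts, and it ignores the is_new, recent_days and channel grouping for brevity. The function names and SAMPLE_LOGS are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def split_sessions(page_logs):
    """Step 1: attach a session_id to every page-view record.

    A record whose last_page_id is None opens a new session. The session id is
    "<mid_id>-<ts of the opening record>", mirroring the last_value(...) window
    expression in the query.
    """
    session_start = {}  # mid_id -> ts of the most recent session-opening record
    tagged = []
    for rec in sorted(page_logs, key=lambda r: (r["mid_id"], r["ts"])):
        if rec["last_page_id"] is None:
            session_start[rec["mid_id"]] = rec["ts"]
        start = session_start.get(rec["mid_id"])
        session_id = None if start is None else f"{rec['mid_id']}-{start}"
        tagged.append({**rec, "session_id": session_id})
    return tagged


def visit_stats(page_logs):
    """Steps 2 and 3: aggregate per session, then compute the metrics."""
    sessions = defaultdict(lambda: {"page_count": 0, "duration_ms": 0})
    for rec in split_sessions(page_logs):
        s = sessions[(rec["session_id"], rec["mid_id"])]
        s["page_count"] += 1
        s["duration_ms"] += rec["during_time"]

    sv_count = len(sessions)
    if sv_count == 0:
        return None
    total_ms = sum(s["duration_ms"] for s in sessions.values())
    total_pages = sum(s["page_count"] for s in sessions.values())
    bounce_count = sum(1 for s in sessions.values() if s["page_count"] == 1)
    return {
        "uv_count": len({mid for _, mid in sessions}),
        "duration_sec": total_ms // 1000,
        "avg_duration_sec": int(total_ms / sv_count / 1000),
        "page_count": total_pages,
        "avg_page_count": int(total_pages / sv_count),
        "sv_count": sv_count,
        "bounce_count": bounce_count,
        "bounce_rate": round(bounce_count / sv_count * 100, 2),
    }


# Hypothetical sample: visitor m1 has two sessions, visitor m2 has one.
SAMPLE_LOGS = [
    {"mid_id": "m1", "last_page_id": None, "during_time": 1000, "ts": 1},
    {"mid_id": "m1", "last_page_id": "home", "during_time": 2000, "ts": 2},
    {"mid_id": "m1", "last_page_id": None, "during_time": 3000, "ts": 10},
    {"mid_id": "m2", "last_page_id": None, "during_time": 4000, "ts": 5},
]
```

With SAMPLE_LOGS, two of the three sessions contain a single page, so the sketch reports a bounce rate of 66.67.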
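1.2 Path Analysis
User path analysis, as the name suggests, examines the paths users take through an app or website. It is routinely used to measure the effect of site optimization or marketing campaigns and to understand user behavior preferences.
User paths are usually visualized with a Sankey diagram. As shown in the figure below, the diagram faithfully reproduces users' access paths, including page jumps and the order in which pages were visited.
A Sankey diagram needs the count of each kind of page jump. Each jump is described by a source/target pair: source is the page the jump starts from and target is the page it ends on.
1. Table creation statement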
DROP TABLE IF EXISTS ads_page_path;
CREATE EXTERNAL TABLE ads_page_path
(
    `dt`          STRING COMMENT 'Statistics date',
    `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `source`      STRING COMMENT 'Source page ID of the jump',
    `target`      STRING COMMENT 'Target page ID of the jump',
    `path_count`  BIGINT COMMENT 'Number of jumps'
) COMMENT 'Page browsing path'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_page_path/';
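2. Data loading
Approach: the requirement is simply the count of each kind of jump, so in principle grouping by source/target and applying count() is enough. Two points need attention:
Point 1: in a Sankey diagram the source must not be null, but the target may be null.
Point 2: the flow shown in a Sankey diagram must not contain cycles. The statement below therefore prefixes every page with its step number within the session, so the same page reached at different steps becomes a different node.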
insert overwrite table ads_page_path
select * from ads_page_path
union
select
    '2020-06-14',
    recent_days,
    source,
    target,
    count(*)
from
(
    select
        recent_days,
        concat('step-',step,':',source) source,
        concat('step-',step+1,':',target) target
    from
    (
        select
            recent_days,
            page_id source,
            lead(page_id,1,null) over (partition by recent_days,session_id order by ts) target,
            row_number() over (partition by recent_days,session_id order by ts) step
        from
        (
            select
                recent_days,
                last_page_id,
                page_id,
                ts,
                concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id,recent_days order by ts)) session_id
            from dwd_page_log lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('2020-06-14',-30)
            and dt>=date_add('2020-06-14',-recent_days+1)
        )t2
    )t3
)t4
group by recent_days,source,target;
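To see why the step prefix removes cycles, here is a small Python sketch of the same labeling. It assumes the sessions have already been split and that each session is given as a list of page IDs ordered by ts; the function name page_path_counts is hypothetical.

```python
from collections import Counter


def page_path_counts(sessions):
    """Count (source, target) jumps, labeling each page with its step number.

    sessions: list of page_id lists, each ordered by ts within one session.
    The last page of a session has no next page, so its target is None,
    just as lead(page_id, 1, null) yields NULL in the query.
    """
    counts = Counter()
    for pages in sessions:
        for step, source in enumerate(pages, start=1):
            # pages is 0-indexed, so pages[step] is the page after step `step`.
            target = pages[step] if step < len(pages) else None
            source_label = f"step-{step}:{source}"
            target_label = None if target is None else f"step-{step + 1}:{target}"
            counts[(source_label, target_label)] += 1
    return counts
```

For a session home, good_detail, home the edges are step-1:home to step-2:good_detail and step-2:good_detail to step-3:home. The return to home lands on a new node, so no edge points back to an earlier one.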
Chapter 2 User Topic
2.1 User Statistics
This requirement is a combined user report covering several metrics. Each metric is explained below.
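| Metric | Description | Field |
| New users | Number of newly registered users | new_user_count |
| New ordering users | Number of users who placed their first order | new_order_user_count |
| Total order amount | Total amount of all orders | order_final_amount |
| Ordering users | Total number of users who placed an order | order_user_count |
| Non-ordering users | Number of users who were active but placed no order | no_order_user_count |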
1. Table creation statement
DROP TABLE IF EXISTS ads_user_total;
CREATE EXTERNAL TABLE `ads_user_total`
(
    `dt`                   STRING COMMENT 'Statistics date',
    `recent_days`          BIGINT COMMENT 'Recent days, 0: cumulative, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `new_user_count`       BIGINT COMMENT 'Newly registered users',
    `new_order_user_count` BIGINT COMMENT 'New ordering users',
    `order_final_amount`   DECIMAL(16,2) COMMENT 'Total order amount',
    `order_user_count`     BIGINT COMMENT 'Ordering users',
    `no_order_user_count`  BIGINT COMMENT 'Non-ordering users (active users who placed no order)'
) COMMENT 'User statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_user_total/';
2. Data loading
insert overwrite table ads_user_total
select * from ads_user_total
union
select
    '2020-06-14',
    recent_days,
    sum(if(login_date_first>=recent_days_ago,1,0)) new_user_count,
    sum(if(order_date_first>=recent_days_ago,1,0)) new_order_user_count,
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count,
    sum(if(login_date_last>=recent_days_ago and order_final_amount=0,1,0)) no_order_user_count
from
(
    select
        recent_days,
        user_id,
        login_date_first,
        login_date_last,
        order_date_first,
        case
            when recent_days=0 then order_final_amount
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_final_amount,
        if(recent_days=0,'1970-01-01',date_add('2020-06-14',-recent_days+1)) recent_days_ago
    from dwt_user_topic lateral view explode(Array(0,1,7,30)) tmp as recent_days
    where dt='2020-06-14'
)t1
group by recent_days;
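2.2 User Change Statistics
This requirement covers two metrics, the number of churned users and the number of returning users. Both are explained below.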
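| Metric | Description | Field |
| Churned users | Users who were active before but have not been active recently. Here we count users who were active 7 days ago (on that exact day only) and have not been active in the last 7 days. | user_churn_count |
| Returning users | Users who were active before, went inactive for a period (churned), and became active again today. Here we count the total number of returning users. | user_back_count |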
1. Table creation statement
DROP TABLE IF EXISTS ads_user_change;
CREATE EXTERNAL TABLE `ads_user_change`
(
    `dt`               STRING COMMENT 'Statistics date',
    `user_churn_count` BIGINT COMMENT 'Churned users',
    `user_back_count`  BIGINT COMMENT 'Returning users'
) COMMENT 'User change statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_user_change/';
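2. Data loading
Approach:
Churned users: users whose last active date is exactly 7 days ago.
Returning users: users whose last active date is today and whose previous active date was at least 8 days earlier. The previous active date is read from yesterday's snapshot of dwt_user_topic.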
insert overwrite table ads_user_change
select * from ads_user_change
union
select
    churn.dt,
    user_churn_count,
    user_back_count
from
(
    select
        '2020-06-14' dt,
        count(*) user_churn_count
    from dwt_user_topic
    where dt='2020-06-14'
    and login_date_last=date_add('2020-06-14',-7)
)churn
join
(
    select
        '2020-06-14' dt,
        count(*) user_back_count
    from
    (
        select user_id, login_date_last
        from dwt_user_topic
        where dt='2020-06-14'
        and login_date_last='2020-06-14'
    )t1
    join
    (
        select user_id, login_date_last login_date_previous
        from dwt_user_topic
        where dt=date_add('2020-06-14',-1)
    )t2
    on t1.user_id=t2.user_id
    where datediff(login_date_last,login_date_previous)>=8
)back
on churn.dt=back.dt;
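The two definitions can be checked on a handful of users with the Python sketch below. It assumes the two snapshots are given as dicts mapping user_id to login_date_last; the function name churn_and_back is hypothetical.

```python
from datetime import date, timedelta


def churn_and_back(today, last_login_today, last_login_yesterday):
    """Return (user_churn_count, user_back_count) for `today`.

    last_login_today:     user_id -> login_date_last in today's snapshot
    last_login_yesterday: user_id -> login_date_last in yesterday's snapshot
    """
    seven_days_ago = today - timedelta(days=7)
    churn = sum(1 for d in last_login_today.values() if d == seven_days_ago)
    back = sum(
        1
        for user_id, d in last_login_today.items()
        if d == today
        # Inner join: users missing from yesterday's snapshot are new, not returning.
        and user_id in last_login_yesterday
        and (d - last_login_yesterday[user_id]).days >= 8
    )
    return churn, back
```

A user who registers today appears only in today's snapshot, so the inner join keeps that user out of the returning count.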
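2.3 User Behavior Funnel Analysis
Funnel analysis is a data analysis model that shows how users convert at each stage of a business process, from its start to its end. Because every stage is displayed, the stage that has a problem is visible at a glance.
This requirement asks for the number of people at each stage of a complete shopping flow.
1. Table creation statement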
DROP TABLE IF EXISTS ads_user_action;
CREATE EXTERNAL TABLE `ads_user_action`
(
    `dt`                STRING COMMENT 'Statistics date',
    `recent_days`       BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `home_count`        BIGINT COMMENT 'People who viewed the home page',
    `good_detail_count` BIGINT COMMENT 'People who viewed a product detail page',
    `cart_count`        BIGINT COMMENT 'People who added to cart',
    `order_count`       BIGINT COMMENT 'People who placed an order',
    `payment_count`     BIGINT COMMENT 'People who paid'
) COMMENT 'Funnel analysis'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_user_action/';
2. Data loading
with
tmp_page as
(
    select
        '2020-06-14' dt,
        recent_days,
        sum(if(array_contains(pages,'home'),1,0)) home_count,
        sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
    from
    (
        select
            recent_days,
            mid_id,
            collect_set(page_id) pages
        from
        (
            select
                dt,
                mid_id,
                page.page_id
            from dws_visitor_action_daycount lateral view explode(page_stats) tmp as page
            where dt>=date_add('2020-06-14',-29)
            and page.page_id in('home','good_detail')
        )t1 lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt>=date_add('2020-06-14',-recent_days+1)
        group by recent_days,mid_id
    )t2
    group by recent_days
),
tmp_cop as
(
    select
        '2020-06-14' dt,
        recent_days,
        sum(if(cart_count>0,1,0)) cart_count,
        sum(if(order_count>0,1,0)) order_count,
        sum(if(payment_count>0,1,0)) payment_count
    from
    (
        select
            recent_days,
            user_id,
            case
                when recent_days=1 then cart_last_1d_count
                when recent_days=7 then cart_last_7d_count
                when recent_days=30 then cart_last_30d_count
            end cart_count,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then payment_last_1d_count
                when recent_days=7 then payment_last_7d_count
                when recent_days=30 then payment_last_30d_count
            end payment_count
        from dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days
)
insert overwrite table ads_user_action
select * from ads_user_action
union
select
    tmp_page.dt,
    tmp_page.recent_days,
    home_count,
    good_detail_count,
    cart_count,
    order_count,
    payment_count
from tmp_page
join tmp_cop
on tmp_page.recent_days=tmp_cop.recent_days;
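The statement counts the two page stages per device (mid_id) and the cart, order and payment stages per user (user_id). A minimal Python sketch of the same counting for a single time window follows; the input shapes and the name funnel_counts are assumptions made for illustration.

```python
def funnel_counts(visitor_pages, user_counts):
    """Count how many people reached each funnel stage.

    visitor_pages: mid_id -> set of page_ids viewed in the window
    user_counts:   user_id -> dict with cart_count, order_count, payment_count
    """
    def reached(field):
        # A user reached a stage if the corresponding counter is positive.
        return sum(1 for c in user_counts.values() if c[field] > 0)

    return {
        "home_count": sum(1 for p in visitor_pages.values() if "home" in p),
        "good_detail_count": sum(
            1 for p in visitor_pages.values() if "good_detail" in p
        ),
        "cart_count": reached("cart_count"),
        "order_count": reached("order_count"),
        "payment_count": reached("payment_count"),
    }
```

Because the first two stages are keyed by device and the last three by user, the stage counts are not guaranteed to decrease strictly from one stage to the next.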
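2.4 User Retention Rate
Retention analysis usually covers new-user retention and active-user retention.
New-user retention looks at how many of the users added on a given day are active afterwards. Active-user retention looks at how many of the users active on a given day are active afterwards.
Retention is an important indicator of how much value a product delivers to its users.
Here we compute the new-user retention rate, that is, the ratio of retained users to new users. For example, if 100 users were added on 2020-06-14 and 80 of them were active one day later (2020-06-15), then the 1-day retention count for 2020-06-14 is 80 and its 1-day retention rate is 80%.
The requirement is to compute the 1-day through 7-day retention rates for each day, as shown in the figure below.
1. Table creation statement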
DROP TABLE IF EXISTS ads_user_retention;
CREATE EXTERNAL TABLE ads_user_retention
(
    `dt`              STRING COMMENT 'Statistics date',
    `create_date`     STRING COMMENT 'Date the users were added',
    `retention_day`   BIGINT COMMENT 'Retention days as of the current date',
    `retention_count` BIGINT COMMENT 'Number of retained users',
    `new_user_count`  BIGINT COMMENT 'Number of new users',
    `retention_rate`  DECIMAL(16,2) COMMENT 'Retention rate'
) COMMENT 'User retention rate'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_user_retention/';
2. Data loading
insert overwrite table ads_user_retention
select * from ads_user_retention
union
select
    '2020-06-14',
    login_date_first create_date,
    datediff('2020-06-14',login_date_first) retention_day,
    sum(if(login_date_last='2020-06-14',1,0)) retention_count,
    count(*) new_user_count,
    cast(sum(if(login_date_last='2020-06-14',1,0))/count(*)*100 as decimal(16,2)) retention_rate
from dwt_user_topic
where dt='2020-06-14'
and login_date_first>=date_add('2020-06-14',-7)
and login_date_first<'2020-06-14'
group by login_date_first;
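The 100-users example can be reproduced with a short Python sketch of the same grouping. It assumes each user is given as a (login_date_first, login_date_last) pair taken from the snapshot for the statistics date; the name retention_rows is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def retention_rows(today, users):
    """Compute 1-day to 7-day new-user retention as of `today`.

    users: iterable of (login_date_first, login_date_last) pairs.
    Returns create_date -> dict of the columns loaded into ads_user_retention.
    """
    stats = defaultdict(lambda: {"retained": 0, "new": 0})
    for first, last in users:
        # Only cohorts created within the previous 7 days, excluding today.
        if today - timedelta(days=7) <= first < today:
            stats[first]["new"] += 1
            if last == today:
                stats[first]["retained"] += 1
    return {
        create_date: {
            "retention_day": (today - create_date).days,
            "retention_count": s["retained"],
            "new_user_count": s["new"],
            "retention_rate": round(s["retained"] / s["new"] * 100, 2),
        }
        for create_date, s in stats.items()
    }
```

Running it with today set to 2020-06-15 on 100 users first seen on 2020-06-14, 80 of whom were last active on 2020-06-15, gives retention_day 1, retention_count 80 and retention_rate 80.0 for the 2020-06-14 cohort.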
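Chapter 3 Product Topic
3.1 Product Statistics
This is a combined product report containing, for each SPU, the total number of times it was ordered and the total amount ordered.
1. Table creation statement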
DROP TABLE IF EXISTS ads_order_spu_stats;
CREATE EXTERNAL TABLE `ads_order_spu_stats`
(
    `dt`             STRING COMMENT 'Statistics date',
    `recent_days`    BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `spu_id`         STRING COMMENT 'Product ID',
    `spu_name`       STRING COMMENT 'Product name',
    `tm_id`          STRING COMMENT 'Brand ID',
    `tm_name`        STRING COMMENT 'Brand name',
    `category3_id`   STRING COMMENT 'Level-3 category ID',
    `category3_name` STRING COMMENT 'Level-3 category name',
    `category2_id`   STRING COMMENT 'Level-2 category ID',
    `category2_name` STRING COMMENT 'Level-2 category name',
    `category1_id`   STRING COMMENT 'Level-1 category ID',
    `category1_name` STRING COMMENT 'Level-1 category name',
    `order_count`    BIGINT COMMENT 'Number of orders',
    `order_amount`   DECIMAL(16,2) COMMENT 'Order amount'
) COMMENT 'Product sales statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_order_spu_stats/';
2. Data loading
insert overwrite table ads_order_spu_stats
select * from ads_order_spu_stats
union
select
    '2020-06-14' dt,
    recent_days,
    spu_id,
    spu_name,
    tm_id,
    tm_name,
    category3_id,
    category3_name,
    category2_id,
    category2_name,
    category1_id,
    category1_name,
    sum(order_count),
    sum(order_amount)
from
(
    select
        recent_days,
        sku_id,
        case
            when recent_days=1 then order_last_1d_count
            when recent_days=7 then order_last_7d_count
            when recent_days=30 then order_last_30d_count
        end order_count,
        case
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_amount
    from dwt_sku_topic lateral view explode(Array(1,7,30)) tmp as recent_days
    where dt='2020-06-14'
)t1
left join
(
    select
        id,
        spu_id,
        spu_name,
        tm_id,
        tm_name,
        category3_id,
        category3_name,
        category2_id,
        category2_name,
        category1_id,
        category1_name
    from dim_sku_info
    where dt='2020-06-14'
)t2
on t1.sku_id=t2.id
group by recent_days,spu_id,spu_name,tm_id,tm_name,category3_id,category3_name,category2_id,category2_name,category1_id,category1_name;
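3.2 Brand Repeat Purchase Rate
The brand repeat purchase rate is the ratio, over a period of time, of the number of people who bought a brand repeatedly to the number of people who bought that brand at all. "Repeatedly" means a purchase count of at least 2; "bought at all" means a purchase count of at least 1.
Here we compute the repeat purchase rate of each brand for the last 1, 7 and 30 days.
1. Table creation statement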
DROP TABLE IF EXISTS ads_repeat_purchase;
CREATE EXTERNAL TABLE `ads_repeat_purchase`
(
    `dt`                STRING COMMENT 'Statistics date',
    `recent_days`       BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `tm_id`             STRING COMMENT 'Brand ID',
    `tm_name`           STRING COMMENT 'Brand name',
    `order_repeat_rate` DECIMAL(16,2) COMMENT 'Repeat purchase rate'
) COMMENT 'Brand repeat purchase rate'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_repeat_purchase/';
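2. Data loading
Approach: this requirement can be implemented in two steps:
Step 1: count how many times each user bought each brand.
Step 2: count the number of people whose purchase count is at least 1 and the number whose purchase count is at least 2.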
insert overwrite table ads_repeat_purchase
select * from ads_repeat_purchase
union
select
    '2020-06-14' dt,
    recent_days,
    tm_id,
    tm_name,
    cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 as decimal(16,2))
from
(
    select
        recent_days,
        user_id,
        tm_id,
        tm_name,
        sum(order_count) order_count
    from
    (
        select
            recent_days,
            user_id,
            sku_id,
            count(*) order_count
        from dwd_order_detail lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt>=date_add('2020-06-14',-29)
        and dt>=date_add('2020-06-14',-recent_days+1)
        group by recent_days, user_id,sku_id
    )t1
    left join
    (
        select id, tm_id, tm_name
        from dim_sku_info
        where dt='2020-06-14'
    )t2
    on t1.sku_id=t2.id
    group by recent_days,user_id,tm_id,tm_name
)t3
group by recent_days,tm_id,tm_name;
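The two steps reduce to a pair of counters. The Python sketch below works on one time window and assumes the order details have already been mapped from SKU to brand, so each row is a (user_id, tm_id) pair; the name repeat_purchase_rate is hypothetical.

```python
from collections import Counter


def repeat_purchase_rate(order_rows):
    """Return tm_id -> repeat purchase rate in percent.

    order_rows: iterable of (user_id, tm_id) pairs, one per order-detail row.
    """
    # Step 1: purchases per (user, brand).
    per_user_brand = Counter(order_rows)

    # Step 2: per brand, buyers with >= 1 purchase and buyers with >= 2.
    bought = Counter()
    repeated = Counter()
    for (_user_id, tm_id), purchases in per_user_brand.items():
        bought[tm_id] += 1
        if purchases >= 2:
            repeated[tm_id] += 1
    return {tm_id: round(repeated[tm_id] / bought[tm_id] * 100, 2) for tm_id in bought}
```

Every key in per_user_brand has at least one purchase, so the denominator is simply the number of distinct buyers of the brand.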
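Chapter 4 Order Topic
4.1 Order Statistics
This requirement covers the total number of orders, the total order amount and the total number of ordering users.
1. Table creation statement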
DROP TABLE IF EXISTS ads_order_total;
CREATE EXTERNAL TABLE `ads_order_total`
(
    `dt`               STRING COMMENT 'Statistics date',
    `recent_days`      BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `order_count`      BIGINT COMMENT 'Number of orders',
    `order_amount`     DECIMAL(16,2) COMMENT 'Order amount',
    `order_user_count` BIGINT COMMENT 'Number of ordering users'
) COMMENT 'Order statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_order_total/';
2. Data loading
insert overwrite table ads_order_total
select * from ads_order_total
union
select
    '2020-06-14',
    recent_days,
    sum(order_count),
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count
from
(
    select
        recent_days,
        user_id,
        case
            when recent_days=0 then order_count
            when recent_days=1 then order_last_1d_count
            when recent_days=7 then order_last_7d_count
            when recent_days=30 then order_last_30d_count
        end order_count,
        case
            when recent_days=0 then order_final_amount
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_final_amount
    from dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
    where dt='2020-06-14'
)t1
group by recent_days;
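4.2 Order Statistics by Region
This requirement covers the total number of orders and the total order amount for each province.
1. Table creation statement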
DROP TABLE IF EXISTS ads_order_by_province;
CREATE EXTERNAL TABLE `ads_order_by_province`
(
    `dt`              STRING COMMENT 'Statistics date',
    `recent_days`     BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `province_id`     STRING COMMENT 'Province ID',
    `province_name`   STRING COMMENT 'Province name',
    `area_code`       STRING COMMENT 'Area code',
    `iso_code`        STRING COMMENT 'International standard region code',
    `iso_code_3166_2` STRING COMMENT 'International standard region code (ISO 3166-2)',
    `order_count`     BIGINT COMMENT 'Number of orders',
    `order_amount`    DECIMAL(16,2) COMMENT 'Order amount'
) COMMENT 'Order statistics by region'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_order_by_province/';
2. Data loading
insert overwrite table ads_order_by_province
select * from ads_order_by_province
union
select
    dt,
    recent_days,
    province_id,
    province_name,
    area_code,
    iso_code,
    iso_3166_2,
    order_count,
    order_amount
from
(
    select
        '2020-06-14' dt,
        recent_days,
        province_id,
        sum(order_count) order_count,
        sum(order_amount) order_amount
    from
    (
        select
            recent_days,
            province_id,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then order_last_1d_final_amount
                when recent_days=7 then order_last_7d_final_amount
                when recent_days=30 then order_last_30d_final_amount
            end order_amount
        from dwt_area_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days,province_id
)t2
join dim_base_province t3
on t2.province_id=t3.id;
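Chapter 5 Coupon Topic
5.1 Coupon Statistics
This requirement asks for the usage and the subsidy rate of every coupon released in the last 30 days. The subsidy rate is the ratio of the discount amount to the original amount of the orders that used the coupon.
1. Table creation statement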
DROP TABLE IF EXISTS ads_coupon_stats;
CREATE EXTERNAL TABLE ads_coupon_stats
(
    `dt`                    STRING COMMENT 'Statistics date',
    `coupon_id`             STRING COMMENT 'Coupon ID',
    `coupon_name`           STRING COMMENT 'Coupon name',
    `start_date`            STRING COMMENT 'Release date',
    `rule_name`             STRING COMMENT 'Discount rule, e.g. 10 yuan off orders of 100 yuan or more',
    `get_count`             BIGINT COMMENT 'Times claimed',
    `order_count`           BIGINT COMMENT 'Times used (in an order)',
    `expire_count`          BIGINT COMMENT 'Times expired',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders using the coupon',
    `order_final_amount`    DECIMAL(16,2) COMMENT 'Final amount of orders using the coupon',
    `reduce_amount`         DECIMAL(16,2) COMMENT 'Discount amount',
    `reduce_rate`           DECIMAL(16,2) COMMENT 'Subsidy rate'
) COMMENT 'Coupon statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_coupon_stats/';
2. Data loading
insert overwrite table ads_coupon_stats
select * from ads_coupon_stats
union
select
    '2020-06-14' dt,
    t1.id,
    coupon_name,
    start_date,
    rule_name,
    get_count,
    order_count,
    expire_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        id,
        coupon_name,
        date_format(start_time,'yyyy-MM-dd') start_date,
        case
            when coupon_type='3201' then concat('滿',condition_amount,'元減',benefit_amount,'元')
            when coupon_type='3202' then concat('滿',condition_num,'件打', (1-benefit_discount)*10,'折')
            when coupon_type='3203' then concat('減',benefit_amount,'元')
        end rule_name
    from dim_coupon_info
    where dt='2020-06-14'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('2020-06-14',-29)
)t1
left join
(
    select
        coupon_id,
        get_count,
        order_count,
        expire_count,
        order_original_amount,
        order_final_amount,
        order_reduce_amount reduce_amount,
        cast(order_reduce_amount/order_original_amount as decimal(16,2)) reduce_rate
    from dwt_coupon_topic
    where dt='2020-06-14'
)t2
on t1.id=t2.coupon_id;
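Chapter 6 Activity Topic
6.1 Activity Statistics
This requirement asks for the participation and the subsidy rate of every activity released in the last 30 days. The subsidy rate is the ratio of the discount amount to the original amount of the orders that took part in the activity.
1. Table creation statement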
DROP TABLE IF EXISTS ads_activity_stats;
CREATE EXTERNAL TABLE `ads_activity_stats`
(
    `dt`                    STRING COMMENT 'Statistics date',
    `activity_id`           STRING COMMENT 'Activity ID',
    `activity_name`         STRING COMMENT 'Activity name',
    `start_date`            STRING COMMENT 'Activity start date',
    `order_count`           BIGINT COMMENT 'Number of orders in the activity',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders in the activity',
    `order_final_amount`    DECIMAL(16,2) COMMENT 'Final amount of orders in the activity',
    `reduce_amount`         DECIMAL(16,2) COMMENT 'Discount amount',
    `reduce_rate`           DECIMAL(16,2) COMMENT 'Subsidy rate'
) COMMENT 'Activity statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_activity_stats/';
2. Data loading
insert overwrite table ads_activity_stats
select * from ads_activity_stats
union
select
    '2020-06-14' dt,
    t4.activity_id,
    activity_name,
    start_date,
    order_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        activity_id,
        activity_name,
        date_format(start_time,'yyyy-MM-dd') start_date
    from dim_activity_rule_info
    where dt='2020-06-14'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('2020-06-14',-29)
    group by activity_id,activity_name,start_time
)t4
left join
(
    select
        activity_id,
        sum(order_count) order_count,
        sum(order_original_amount) order_original_amount,
        sum(order_final_amount) order_final_amount,
        sum(order_reduce_amount) reduce_amount,
        cast(sum(order_reduce_amount)/sum(order_original_amount)*100 as decimal(16,2)) reduce_rate
    from dwt_activity_topic
    where dt='2020-06-14'
    group by activity_id
)t5
on t4.activity_id=t5.activity_id;
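As a quick check of the subsidy-rate arithmetic in the activity statement, the Python sketch below sums both amounts first and then divides, expressing the result in percent. The input shape and the name reduce_rate_percent are assumptions for illustration; returning None for a zero original amount is a choice made here to stand in for a missing value.

```python
def reduce_rate_percent(orders):
    """Subsidy rate in percent for one activity.

    orders: iterable of (order_original_amount, order_reduce_amount) pairs.
    """
    orders = list(orders)
    total_original = sum(original for original, _ in orders)
    total_reduce = sum(reduce for _, reduce in orders)
    if total_original == 0:
        return None  # no original amount to divide by
    return round(total_reduce / total_original * 100, 2)
```

Two orders with original amounts 100 and 300 and discounts 10 and 50 give 60 / 400, that is 15.0 percent. Averaging the per-order rates (10% and about 16.67%) would give a different figure, which is why the sums are taken first.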
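Chapter 7 ADS Layer Business Data Import Script
1) Write the script
(1) Create the script dwt_to_ads.sh in the /home/atguigu/bin directory
[atguigu@hadoop102 bin]$ vim dwt_to_ads.sh
Put the following content in the script: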
#!/bin/bash

APP=gmall
# Use the date passed as the second argument if there is one; otherwise use the day before the current date
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

ads_activity_stats="
insert overwrite table ${APP}.ads_activity_stats
select * from ${APP}.ads_activity_stats
union
select
    '$do_date' dt,
    t4.activity_id,
    activity_name,
    start_date,
    order_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        activity_id,
        activity_name,
        date_format(start_time,'yyyy-MM-dd') start_date
    from ${APP}.dim_activity_rule_info
    where dt='$do_date'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29)
    group by activity_id,activity_name,start_time
)t4
left join
(
    select
        activity_id,
        sum(order_count) order_count,
        sum(order_original_amount) order_original_amount,
        sum(order_final_amount) order_final_amount,
        sum(order_reduce_amount) reduce_amount,
        cast(sum(order_reduce_amount)/sum(order_original_amount)*100 as decimal(16,2)) reduce_rate
    from ${APP}.dwt_activity_topic
    where dt='$do_date'
    group by activity_id
)t5
on t4.activity_id=t5.activity_id;
"

ads_coupon_stats="
insert overwrite table ${APP}.ads_coupon_stats
select * from ${APP}.ads_coupon_stats
union
select
    '$do_date' dt,
    t1.id,
    coupon_name,
    start_date,
    rule_name,
    get_count,
    order_count,
    expire_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        id,
        coupon_name,
        date_format(start_time,'yyyy-MM-dd') start_date,
        case
            when coupon_type='3201' then concat('滿',condition_amount,'元減',benefit_amount,'元')
            when coupon_type='3202' then concat('滿',condition_num,'件打', (1-benefit_discount)*10,'折')
            when coupon_type='3203' then concat('減',benefit_amount,'元')
        end rule_name
    from ${APP}.dim_coupon_info
    where dt='$do_date'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29)
)t1
left join
(
    select
        coupon_id,
        get_count,
        order_count,
        expire_count,
        order_original_amount,
        order_final_amount,
        order_reduce_amount reduce_amount,
        cast(order_reduce_amount/order_original_amount as decimal(16,2)) reduce_rate
    from ${APP}.dwt_coupon_topic
    where dt='$do_date'
)t2
on t1.id=t2.coupon_id;
"

ads_order_by_province="
insert overwrite table ${APP}.ads_order_by_province
select * from ${APP}.ads_order_by_province
union
select
    dt,
    recent_days,
    province_id,
    province_name,
    area_code,
    iso_code,
    iso_3166_2,
    order_count,
    order_amount
from
(
    select
        '$do_date' dt,
        recent_days,
        province_id,
        sum(order_count) order_count,
        sum(order_amount) order_amount
    from
    (
        select
            recent_days,
            province_id,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then order_last_1d_final_amount
                when recent_days=7 then order_last_7d_final_amount
                when recent_days=30 then order_last_30d_final_amount
            end order_amount
        from ${APP}.dwt_area_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='$do_date'
    )t1
    group by recent_days,province_id
)t2
join ${APP}.dim_base_province t3
on t2.province_id=t3.id;
"

ads_order_spu_stats="
insert overwrite table ${APP}.ads_order_spu_stats
select * from ${APP}.ads_order_spu_stats
union
select
    '$do_date' dt,
    recent_days,
    spu_id,
    spu_name,
    tm_id,
    tm_name,
    category3_id,
    category3_name,
    category2_id,
    category2_name,
    category1_id,
    category1_name,
    sum(order_count),
    sum(order_amount)
from
(
    select
        recent_days,
        sku_id,
        case
            when recent_days=1 then order_last_1d_count
            when recent_days=7 then order_last_7d_count
            when recent_days=30 then order_last_30d_count
        end order_count,
        case
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_amount
    from ${APP}.dwt_sku_topic lateral view explode(Array(1,7,30)) tmp as recent_days
    where dt='$do_date'
)t1
left join
(
    select
        id,
        spu_id,
        spu_name,
        tm_id,
        tm_name,
        category3_id,
        category3_name,
        category2_id,
        category2_name,
        category1_id,
        category1_name
    from ${APP}.dim_sku_info
    where dt='$do_date'
)t2
on t1.sku_id=t2.id
group by recent_days,spu_id,spu_name,tm_id,tm_name,category3_id,category3_name,category2_id,category2_name,category1_id,category1_name;
"

ads_order_total="
insert overwrite table ${APP}.ads_order_total
select * from ${APP}.ads_order_total
union
select
    '$do_date',
    recent_days,
    sum(order_count),
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count
from
(
    select
        recent_days,
        user_id,
        case
            when recent_days=0 then order_count
            when recent_days=1 then order_last_1d_count
            when recent_days=7 then order_last_7d_count
            when recent_days=30 then order_last_30d_count
        end order_count,
        case
            when recent_days=0 then order_final_amount
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_final_amount
    from ${APP}.dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
    where dt='$do_date'
)t1
group by recent_days;
"

ads_page_path="
insert overwrite table ${APP}.ads_page_path
select * from ${APP}.ads_page_path
union
select
    '$do_date',
    recent_days,
    source,
    target,
    count(*)
from
(
    select
        recent_days,
        concat('step-',step,':',source) source,
        concat('step-',step+1,':',target) target
    from
    (
        select
            recent_days,
            page_id source,
            lead(page_id,1,null) over (partition by recent_days,session_id order by ts) target,
            row_number() over (partition by recent_days,session_id order by ts) step
        from
        (
            select
                recent_days,
                last_page_id,
                page_id,
                ts,
                concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id,recent_days order by ts)) session_id
            from ${APP}.dwd_page_log lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('$do_date',-30)
            and dt>=date_add('$do_date',-recent_days+1)
        )t2
    )t3
)t4
group by recent_days,source,target;
"

ads_repeat_purchase="
insert overwrite table ${APP}.ads_repeat_purchase
select * from ${APP}.ads_repeat_purchase
union
select
    '$do_date' dt,
    recent_days,
    tm_id,
    tm_name,
    cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 as decimal(16,2))
from
(
    select
        recent_days,
        user_id,
        tm_id,
        tm_name,
        sum(order_count) order_count
    from
    (
        select
            recent_days,
            user_id,
            sku_id,
            count(*) order_count
        from ${APP}.dwd_order_detail lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt>=date_add('$do_date',-29)
        and dt>=date_add('$do_date',-recent_days+1)
        group by recent_days, user_id,sku_id
    )t1
    left join
    (
        select id, tm_id, tm_name
        from ${APP}.dim_sku_info
        where dt='$do_date'
    )t2
    on t1.sku_id=t2.id
    group by recent_days,user_id,tm_id,tm_name
)t3
group by recent_days,tm_id,tm_name;
"

ads_user_action="
with
tmp_page as
(
    select
        '$do_date' dt,
        recent_days,
        sum(if(array_contains(pages,'home'),1,0)) home_count,
        sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
    from
    (
        select
            recent_days,
            mid_id,
            collect_set(page_id) pages
        from
        (
            select
                dt,
                mid_id,
                page.page_id
            from ${APP}.dws_visitor_action_daycount lateral view explode(page_stats) tmp as page
            where dt>=date_add('$do_date',-29)
            and page.page_id in('home','good_detail')
        )t1 lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt>=date_add('$do_date',-recent_days+1)
        group by recent_days,mid_id
    )t2
    group by recent_days
),
tmp_cop as
(
    select
        '$do_date' dt,
        recent_days,
        sum(if(cart_count>0,1,0)) cart_count,
        sum(if(order_count>0,1,0)) order_count,
        sum(if(payment_count>0,1,0)) payment_count
    from
    (
        select
            recent_days,
            user_id,
            case
                when recent_days=1 then cart_last_1d_count
                when recent_days=7 then cart_last_7d_count
                when recent_days=30 then cart_last_30d_count
            end cart_count,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then payment_last_1d_count
                when recent_days=7 then payment_last_7d_count
                when recent_days=30 then payment_last_30d_count
            end payment_count
        from ${APP}.dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='$do_date'
    )t1
    group by recent_days
)
insert overwrite table ${APP}.ads_user_action
select * from ${APP}.ads_user_action
union
select
    tmp_page.dt,
    tmp_page.recent_days,
    home_count,
    good_detail_count,
    cart_count,
    order_count,
    payment_count
from tmp_page
join tmp_cop
on tmp_page.recent_days=tmp_cop.recent_days;
"

ads_user_change="
insert overwrite table ${APP}.ads_user_change
select * from ${APP}.ads_user_change
union
select
    churn.dt,
    user_churn_count,
    user_back_count
from
(
    select
        '$do_date' dt,
        count(*) user_churn_count
    from ${APP}.dwt_user_topic
    where dt='$do_date'
    and login_date_last=date_add('$do_date',-7)
)churn
join
(
    select
        '$do_date' dt,
        count(*) user_back_count
    from
    (
        select user_id, login_date_last
        from ${APP}.dwt_user_topic
        where dt='$do_date'
        and login_date_last='$do_date'
    )t1
    join
    (
        select user_id, login_date_last login_date_previous
        from ${APP}.dwt_user_topic
        where dt=date_add('$do_date',-1)
    )t2
    on t1.user_id=t2.user_id
    where datediff(login_date_last,login_date_previous)>=8
)back
on churn.dt=back.dt;
"

ads_user_retention="
insert overwrite table ${APP}.ads_user_retention
select * from ${APP}.ads_user_retention
union
select
    '$do_date',
    login_date_first create_date,
    datediff('$do_date',login_date_first) retention_day,
    sum(if(login_date_last='$do_date',1,0)) retention_count,
    count(*) new_user_count,
    cast(sum(if(login_date_last='$do_date',1,0))/count(*)*100 as decimal(16,2)) retention_rate
from ${APP}.dwt_user_topic
where dt='$do_date'
and login_date_first>=date_add('$do_date',-7)
and login_date_first<'$do_date'
group by login_date_first;
"

ads_user_total="
insert overwrite table ${APP}.ads_user_total
select * from ${APP}.ads_user_total
union
select
    '$do_date',
    recent_days,
    sum(if(login_date_first>=recent_days_ago,1,0)) new_user_count,
    sum(if(order_date_first>=recent_days_ago,1,0)) new_order_user_count,
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count,
    sum(if(login_date_last>=recent_days_ago and order_final_amount=0,1,0)) no_order_user_count
from
(
    select
        recent_days,
        user_id,
        login_date_first,
        login_date_last,
        order_date_first,
        case
            when recent_days=0 then order_final_amount
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_final_amount,
        if(recent_days=0,'1970-01-01',date_add('$do_date',-recent_days+1)) recent_days_ago
    from ${APP}.dwt_user_topic lateral view explode(Array(0,1,7,30)) tmp as recent_days
    where dt='$do_date'
)t1
group by recent_days;
"

ads_visit_stats="
insert overwrite table ${APP}.ads_visit_stats
select * from ${APP}.ads_visit_stats
union
select
    '$do_date' dt,
    is_new,
    recent_days,
    channel,
    count(distinct(mid_id)) uv_count,
    cast(sum(duration)/1000 as bigint) duration_sec,
    cast(avg(duration)/1000 as bigint) avg_duration_sec,
    sum(page_count) page_count,
    cast(avg(page_count) as bigint) avg_page_count,
    count(*) sv_count,
    sum(if(page_count=1,1,0)) bounce_count,
    cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate
from
(
    select
        session_id,
        mid_id,
        is_new,
        recent_days,
        channel,
        count(*) page_count,
        sum(during_time) duration
    from
    (
        select
            mid_id,
            channel,
            recent_days,
            is_new,
            last_page_id,
            page_id,
            during_time,
            concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by recent_days,mid_id order by ts)) session_id
        from
        (
            select
                mid_id,
                channel,
                last_page_id,
                page_id,
                during_time,
                ts,
                recent_days,
                if(visit_date_first>=date_add('$do_date',-recent_days+1),'1','0') is_new
            from
            (
                select
                    t1.mid_id,
                    t1.channel,
                    t1.last_page_id,
                    t1.page_id,
                    t1.during_time,
                    t1.dt,
                    t1.ts,
                    t2.visit_date_first
                from
                (
                    select mid_id, channel, last_page_id, page_id, during_time, dt, ts
                    from ${APP}.dwd_page_log
                    where dt>=date_add('$do_date',-30)
                )t1
                left join
                (
                    select mid_id, visit_date_first
                    from ${APP}.dwt_visitor_topic
                    where dt='$do_date'
                )t2
                on t1.mid_id=t2.mid_id
            )t3 lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('$do_date',-recent_days+1)
        )t4
    )t5
    group by session_id,mid_id,is_new,recent_days,channel
)t6
group by is_new,recent_days,channel;
"

case $1 in
    "ads_activity_stats" )
        hive -e "$ads_activity_stats"
    ;;
    "ads_coupon_stats" )
        hive -e "$ads_coupon_stats"
    ;;
    "ads_order_by_province" )
        hive -e "$ads_order_by_province"
    ;;
    "ads_order_spu_stats" )
        hive -e "$ads_order_spu_stats"
    ;;
    "ads_order_total" )
        hive -e "$ads_order_total"
    ;;
    "ads_page_path" )
        hive -e "$ads_page_path"
    ;;
    "ads_repeat_purchase" )
        hive -e "$ads_repeat_purchase"
    ;;
    "ads_user_action" )
        hive -e "$ads_user_action"
    ;;
    "ads_user_change" )
        hive -e "$ads_user_change"
    ;;
    "ads_user_retention" )
        hive -e "$ads_user_retention"
    ;;
    "ads_user_total" )
        hive -e "$ads_user_total"
    ;;
    "ads_visit_stats" )
        hive -e "$ads_visit_stats"
    ;;
    "all" )
        hive -e "$ads_activity_stats$ads_coupon_stats$ads_order_by_province$ads_order_spu_stats$ads_order_total$ads_page_path$ads_repeat_purchase$ads_user_action$ads_user_change$ads_user_retention$ads_user_total$ads_visit_stats"
    ;;
esac
(2) Grant execute permission on the script
[atguigu@hadoop102 bin]$ chmod 777 dwt_to_ads.sh
2) Use the script
(1) Run the script
[atguigu@hadoop102 bin]$ dwt_to_ads.sh all 2020-06-14
(2) Check whether the data has been imported