數據倉庫(十)——ADS 層


建表說明

ADS層不涉及建模,建表根據具體需求而定。

第一章 訪客主題

1.1 訪客統計

該需求為訪客綜合統計,其中包含若干指標,以下為對每個指標的解釋說明。

指標

說明

對應字段

訪客數

統計訪問人數

uv_count

頁面停留時長

統計所有頁面訪問記錄總時長,以秒為單位

duration_sec

平均頁面停留時長

統計每個會話平均停留時長,以秒為單位

avg_duration_sec

頁面瀏覽總數

統計所有頁面訪問記錄總數

page_count

平均頁面瀏覽數

統計每個會話平均瀏覽頁面數

avg_page_count

會話總數

統計會話總數

sv_count

跳出數

統計只瀏覽一個頁面的會話個數

bounce_count

跳出率

只有一個頁面的會話的比例

bounce_rate

1.建表語句

DROP TABLE IF EXISTS ads_visit_stats;
CREATE EXTERNAL TABLE ads_visit_stats (
  `dt` STRING COMMENT '統計日期',
  `is_new` STRING COMMENT '新老標識,1:新,0:老',
  `recent_days` BIGINT COMMENT '最近天數,1:最近1天,7:最近7天,30:最近30天',
  `channel` STRING COMMENT '渠道',
  `uv_count` BIGINT COMMENT '日活(訪問人數)',
  `duration_sec` BIGINT COMMENT '頁面停留總時長',
  `avg_duration_sec` BIGINT COMMENT '一次會話,頁面停留平均時長,單位為描述',
  `page_count` BIGINT COMMENT '頁面總瀏覽數',
  `avg_page_count` BIGINT COMMENT '一次會話,頁面平均瀏覽數',
  `sv_count` BIGINT COMMENT '會話次數',
  `bounce_count` BIGINT COMMENT '跳出數',
  `bounce_rate` DECIMAL(16,2) COMMENT '跳出率'
) COMMENT '訪客統計'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_visit_stats/';

2.數據裝載
思路分析:該需求的關鍵點為會話的划分,總體實現思路可分為以下幾步:
第一步:對所有頁面訪問記錄進行會話的划分。
第二步:統計每個會話的瀏覽時長和瀏覽頁面數。
第三步:統計上述各指標。

insert overwrite table ads_visit_stats
select * from ads_visit_stats
union
select
    '2020-06-14' dt,
    is_new,
    recent_days,
    channel,
    count(distinct(mid_id)) uv_count,
    cast(sum(duration)/1000 as bigint) duration_sec,
    cast(avg(duration)/1000 as bigint) avg_duration_sec,
    sum(page_count) page_count,
    cast(avg(page_count) as bigint) avg_page_count,
    count(*) sv_count,
    sum(if(page_count=1,1,0)) bounce_count,
    cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate
from
(
    select
        session_id,
        mid_id,
        is_new,
        recent_days,
        channel,
        count(*) page_count,
        sum(during_time) duration
    from
    (
        select
            mid_id,
            channel,
            recent_days,
            is_new,
            last_page_id,
            page_id,
            during_time,
            concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by recent_days,mid_id order by ts)) session_id
        from
        (
            select
                mid_id,
                channel,
                last_page_id,
                page_id,
                during_time,
                ts,
                recent_days,
                if(visit_date_first>=date_add('2020-06-14',-recent_days+1),'1','0') is_new
            from
            (
                select
                    t1.mid_id,
                    t1.channel,
                    t1.last_page_id,
                    t1.page_id,
                    t1.during_time,
                    t1.dt,
                    t1.ts,
                    t2.visit_date_first
                from
                (
                    select
                        mid_id,
                        channel,
                        last_page_id,
                        page_id,
                        during_time,
                        dt,
                        ts
                    from dwd_page_log
                    where dt>=date_add('2020-06-14',-30)
                )t1
                left join
                (
                    select
                        mid_id,
                        visit_date_first
                    from dwt_visitor_topic
                    where dt='2020-06-14'
                )t2
                on t1.mid_id=t2.mid_id
            )t3 lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('2020-06-14',-recent_days+1)
        )t4
    )t5
    group by session_id,mid_id,is_new,recent_days,channel
)t6
group by is_new,recent_days,channel;

1.2 路徑分析

用戶路徑分析,顧名思義,就是指用戶在APP或網站中的訪問路徑。為了衡量網站優化的效果或營銷推廣的效果,以及了解用戶行為偏好,時常要對訪問路徑進行分析。

用戶訪問路徑的可視化通常使用桑基圖。如下圖所示,該圖可真實還原用戶的訪問路徑,包括頁面跳轉和頁面訪問次序。

桑基圖需要我們提供每種頁面跳轉的次數,每個跳轉由source/target表示,source指跳轉起始頁面,target表示跳轉終到頁面。

1.建表語句

DROP TABLE IF EXISTS ads_page_path;
CREATE EXTERNAL TABLE ads_page_path
(
    `dt` STRING COMMENT '統計日期',
    `recent_days` BIGINT COMMENT '最近天數,1:最近1天,7:最近7天,30:最近30天',
    `source` STRING COMMENT '跳轉起始頁面ID',
    `target` STRING COMMENT '跳轉終到頁面ID',
    `path_count` BIGINT COMMENT '跳轉次數'
)  COMMENT '頁面瀏覽路徑'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_page_path/';

2.數據裝載
思路分析:該需求要統計的就是每種跳轉的次數,故理論上對source/target進行分組count()即可。統計時需注意以下兩點:
第一點:桑基圖的source不允許為空,但target可為空。
第二點:桑基圖所展示的流程不允許存在環。

insert overwrite table ads_page_path
select * from ads_page_path
union
select
    '2020-06-14',
    recent_days,
    source,
    target,
    count(*)
from
(
    select
        recent_days,
        concat('step-',step,':',source) source,
        concat('step-',step+1,':',target) target
    from
    (
        select
            recent_days,
            page_id source,
            lead(page_id,1,null) over (partition by recent_days,session_id order by ts) target,
            row_number() over (partition by recent_days,session_id order by ts) step
        from
        (
            select
                recent_days,
                last_page_id,
                page_id,
                ts,
                concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id,recent_days order by ts)) session_id
            from dwd_page_log lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('2020-06-14',-30)
            and dt>=date_add('2020-06-14',-recent_days+1)
        )t2
    )t3
)t4
group by recent_days,source,target;

第二章 用戶主題

2.1 用戶統計

該需求為用戶綜合統計,其中包含若干指標,以下為對每個指標的解釋說明。

指標

說明

對應字段

新增用戶數

統計新增注冊用戶人數

new_user_count

新增下單用戶數

統計新增下單用戶人數

new_order_user_count

下單總金額

統計所有訂單總額

order_final_amount

下單用戶數

統計下單用戶總數

order_user_count

未下單用戶數

統計活躍但未下單用戶數

no_order_user_count


1.建表語句

DROP TABLE IF EXISTS ads_user_total;
CREATE EXTERNAL TABLE `ads_user_total` (
  `dt` STRING COMMENT '統計日期',
  `recent_days` BIGINT COMMENT '最近天數,0:累積值,1:最近1天,7:最近7天,30:最近30天',
  `new_user_count` BIGINT COMMENT '新注冊用戶數',
  `new_order_user_count` BIGINT COMMENT '新增下單用戶數',
  `order_final_amount` DECIMAL(16,2) COMMENT '下單總金額',
  `order_user_count` BIGINT COMMENT '下單用戶數',
  `no_order_user_count` BIGINT COMMENT '未下單用戶數(具體指活躍用戶中未下單用戶)'
) COMMENT '用戶統計'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_total/';

2.數據裝載

insert overwrite table ads_user_total
select * from ads_user_total
union
select
    '2020-06-14',
    recent_days,
    sum(if(login_date_first>=recent_days_ago,1,0)) new_user_count,
    sum(if(order_date_first>=recent_days_ago,1,0)) new_order_user_count,
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count,
    sum(if(login_date_last>=recent_days_ago and order_final_amount=0,1,0)) no_order_user_count
from
(
    select
        recent_days,
        user_id,
        login_date_first,
        login_date_last,
        order_date_first,
        case when recent_days=0 then order_final_amount
             when recent_days=1 then order_last_1d_final_amount
             when recent_days=7 then order_last_7d_final_amount
             when recent_days=30 then order_last_30d_final_amount
        end order_final_amount,
        if(recent_days=0,'1970-01-01',date_add('2020-06-14',-recent_days+1)) recent_days_ago
    from dwt_user_topic lateral view explode(Array(0,1,7,30)) tmp as recent_days
    where dt='2020-06-14'
)t1
group by recent_days;

2.2 用戶變動統計

該需求包括兩個指標,分別為流失用戶數和回流用戶數,以下為對兩個指標的解釋說明。

指標

說明

對應字段

流失用戶數

之前活躍過的用戶,最近一段時間未活躍,就稱為流失用戶。此處要求統計7日前(只包含7日前當天)活躍,但最近7日未活躍的用戶總數。

user_churn_count

回流用戶數

之前的活躍用戶,一段時間未活躍(流失),今日又活躍了,就稱為回流用戶。此處要求統計回流用戶總數。

new_order_user_count

1.建表語句

DROP TABLE IF EXISTS ads_user_change;
CREATE EXTERNAL TABLE `ads_user_change` (
  `dt` STRING COMMENT '統計日期',
  `user_churn_count` BIGINT COMMENT '流失用戶數',
  `user_back_count` BIGINT COMMENT '回流用戶數'
) COMMENT '用戶變動統計'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_change/';

2.數據裝載
思路分析:
流失用戶:末次活躍時間為7日前的用戶即為流失用戶。
回流用戶:末次活躍時間為今日,上次活躍時間在8日前的用戶即為回流用戶。

insert overwrite table ads_user_change
select * from ads_user_change
union
select
    churn.dt,
    user_churn_count,
    user_back_count
from
(
    select
        '2020-06-14' dt,
        count(*) user_churn_count
    from dwt_user_topic
    where dt='2020-06-14'
    and login_date_last=date_add('2020-06-14',-7)
)churn
join
(
    select
        '2020-06-14' dt,
        count(*) user_back_count
    from
    (
        select
            user_id,
            login_date_last
        from dwt_user_topic
        where dt='2020-06-14'
        and login_date_last='2020-06-14'
    )t1
    join
    (
        select
            user_id,
            login_date_last login_date_previous
        from dwt_user_topic
        where dt=date_add('2020-06-14',-1)
    )t2
    on t1.user_id=t2.user_id
    where datediff(login_date_last,login_date_previous)>=8
)back
on churn.dt=back.dt;

2.3 用戶行為漏斗分析

漏斗分析是一個數據分析模型,它能夠科學反映一個業務過程從起點到終點各階段用戶轉化情況。由於其能將各階段環節都展示出來,故哪個階段存在問題,就能一目了然。

該需求要求統計一個完整的購物流程各個階段的人數。
1.建表語句

DROP TABLE IF EXISTS ads_user_action;
CREATE EXTERNAL TABLE `ads_user_action` (
  `dt` STRING COMMENT '統計日期',
  `recent_days` BIGINT COMMENT '最近天數,1:最近1天,7:最近7天,30:最近30天',
  `home_count` BIGINT COMMENT '瀏覽首頁人數',
  `good_detail_count` BIGINT COMMENT '瀏覽商品詳情頁人數',
  `cart_count` BIGINT COMMENT '加入購物車人數',
  `order_count` BIGINT COMMENT '下單人數',
  `payment_count` BIGINT COMMENT '支付人數'
) COMMENT '漏斗分析'
ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_action/';

2.數據裝載

with
tmp_page as
(
    select
        '2020-06-14' dt,
        recent_days,
        sum(if(array_contains(pages,'home'),1,0)) home_count,
        sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
    from
    (
        select
            recent_days,
            mid_id,
            collect_set(page_id) pages
        from
        (
            select
                dt,
                mid_id,
                page.page_id
            from dws_visitor_action_daycount lateral view explode(page_stats) tmp as page
            where dt>=date_add('2020-06-14',-29)
            and page.page_id in('home','good_detail')
        )t1 lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt>=date_add('2020-06-14',-recent_days+1)
        group by recent_days,mid_id
    )t2
    group by recent_days
),
tmp_cop as
(
    select
        '2020-06-14' dt,
        recent_days,
        sum(if(cart_count>0,1,0)) cart_count,
        sum(if(order_count>0,1,0)) order_count,
        sum(if(payment_count>0,1,0)) payment_count
    from
    (
        select
            recent_days,
            user_id,
            case
                when recent_days=1 then cart_last_1d_count
                when recent_days=7 then cart_last_7d_count
                when recent_days=30 then cart_last_30d_count
            end cart_count,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then payment_last_1d_count
                when recent_days=7 then payment_last_7d_count
                when recent_days=30 then payment_last_30d_count
            end payment_count
        from dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days
)
insert overwrite table ads_user_action
select * from ads_user_action
union
select
    tmp_page.dt,
    tmp_page.recent_days,
    home_count,
    good_detail_count,
    cart_count,
    order_count,
    payment_count
from tmp_page
join tmp_cop
on tmp_page.recent_days=tmp_cop.recent_days;

2.4 用戶留存率

留存分析一般包含新增留存和活躍留存分析。

新增留存分析是分析某天的新增用戶中,有多少人有后續的活躍行為。活躍留存分析是分析某天的活躍用戶中,有多少人有后續的活躍行為。

留存分析是衡量產品對用戶價值高低的重要指標。

此處要求統計新增留存率,新增留存率具體是指留存用戶數與新增用戶數的比值,例如2020-06-14新增100個用戶,1日之后(2020-06-15)這100人中有80個人活躍了,那2020-06-14的1日留存數則為80,2020-06-14的1日留存率則為80%。

要求統計每天的1至7日留存率,如下圖所示。

1.建表語句

DROP TABLE IF EXISTS ads_user_retention;
CREATE EXTERNAL TABLE ads_user_retention (
  `dt` STRING COMMENT '統計日期',
  `create_date` STRING COMMENT '用戶新增日期',
  `retention_day` BIGINT COMMENT '截至當前日期留存天數',
  `retention_count` BIGINT COMMENT '留存用戶數量',
  `new_user_count` BIGINT COMMENT '新增用戶數量',
  `retention_rate` DECIMAL(16,2) COMMENT '留存率'
) COMMENT '用戶留存率'
ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_retention/';

2.數據裝載

insert overwrite table ads_user_retention
select * from ads_user_retention
union
select
    '2020-06-14',
    login_date_first create_date,
    datediff('2020-06-14',login_date_first) retention_day,
    sum(if(login_date_last='2020-06-14',1,0)) retention_count,
    count(*) new_user_count,
    cast(sum(if(login_date_last='2020-06-14',1,0))/count(*)*100 as decimal(16,2)) retention_rate
from dwt_user_topic
where dt='2020-06-14'
and login_date_first>=date_add('2020-06-14',-7)
and login_date_first<'2020-06-14'
group by login_date_first;

第三章 商品主題

3.1 商品統計

該指標為商品綜合統計,包含每個spu被下單總次數和被下單總金額。

1.建表語句

DROP TABLE IF EXISTS ads_order_spu_stats;
CREATE EXTERNAL TABLE `ads_order_spu_stats` (
    `dt` STRING COMMENT '統計日期',
    `recent_days` BIGINT COMMENT '最近天數,1:最近1天,7:最近7天,30:最近30天',
    `spu_id` STRING COMMENT '商品ID',
    `spu_name` STRING COMMENT '商品名稱',
    `tm_id` STRING COMMENT '品牌ID',
    `tm_name` STRING COMMENT '品牌名稱',
    `category3_id` STRING COMMENT '三級品類ID',
    `category3_name` STRING COMMENT '三級品類名稱',
    `category2_id` STRING COMMENT '二級品類ID',
    `category2_name` STRING COMMENT '二級品類名稱',
    `category1_id` STRING COMMENT '一級品類ID',
    `category1_name` STRING COMMENT '一級品類名稱',
    `order_count` BIGINT COMMENT '訂單數',
    `order_amount` DECIMAL(16,2) COMMENT '訂單金額'
) COMMENT '商品銷售統計'
ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_order_spu_stats/';

2.數據裝載

insert overwrite table ads_order_spu_stats
select * from ads_order_spu_stats
union
select
    '2020-06-14' dt,
    recent_days,
    spu_id,
    spu_name,
    tm_id,
    tm_name,
    category3_id,
    category3_name,
    category2_id,
    category2_name,
    category1_id,
    category1_name,
    sum(order_count),
    sum(order_amount)
from
(
    select
        recent_days,
        sku_id,
        case
            when recent_days=1 then order_last_1d_count
            when recent_days=7 then order_last_7d_count
            when recent_days=30 then order_last_30d_count
        end order_count,
        case
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_amount
    from dwt_sku_topic lateral view explode(Array(1,7,30)) tmp as recent_days
    where dt='2020-06-14'
)t1
left join
(
    select
        id,
        spu_id,
        spu_name,
        tm_id,
        tm_name,
        category3_id,
        category3_name,
        category2_id,
        category2_name,
        category1_id,
        category1_name
    from dim_sku_info
    where dt='2020-06-14'
)t2
on t1.sku_id=t2.id
group by recent_days,spu_id,spu_name,tm_id,tm_name,category3_id,category3_name,category2_id,category2_name,category1_id,category1_name;

3.2 品牌復購率

品牌復購率是指一段時間內重復購買某品牌的人數與購買過該品牌的人數的比值。重復購買即購買次數大於等於2,購買過即購買次數大於1。

此處要求統計最近1,7,30天的各品牌復購率。

1.建表語句

DROP TABLE IF EXISTS ads_repeat_purchase;
CREATE EXTERNAL TABLE `ads_repeat_purchase` (
  `dt` STRING COMMENT '統計日期',
  `recent_days` BIGINT COMMENT '最近天數,1:最近1天,7:最近7天,30:最近30天',
  `tm_id` STRING COMMENT '品牌ID',
  `tm_name` STRING COMMENT '品牌名稱',
  `order_repeat_rate` DECIMAL(16,2) COMMENT '復購率'
) COMMENT '品牌復購率'
ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_repeat_purchase/';

2.數據裝載
思路分析:該需求可分兩步實現:
第一步:統計每個用戶購買每個品牌的次數。
第二步:分別統計購買次數大於1的人數和大於2的人數。

insert overwrite table ads_repeat_purchase
select * from ads_repeat_purchase
union
select
    '2020-06-14' dt,
    recent_days,
    tm_id,
    tm_name,
    cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 as decimal(16,2))
from
(
    select
        recent_days,
        user_id,
        tm_id,
        tm_name,
        sum(order_count) order_count
    from
    (
        select
            recent_days,
            user_id,
            sku_id,
            count(*) order_count
        from dwd_order_detail lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt>=date_add('2020-06-14',-29)
        and dt>=date_add('2020-06-14',-recent_days+1)
        group by recent_days, user_id,sku_id
    )t1
    left join
    (
        select
            id,
            tm_id,
            tm_name
        from dim_sku_info
        where dt='2020-06-14'
    )t2
    on t1.sku_id=t2.id
    group by recent_days,user_id,tm_id,tm_name
)t3
group by recent_days,tm_id,tm_name;

第四章 訂單主題

4.1 訂單統計

該需求包含訂單總數,訂單總金額和下單總人數。
1.建表語句

DROP TABLE IF EXISTS ads_order_total;
CREATE EXTERNAL TABLE `ads_order_total` (
  `dt` STRING COMMENT '統計日期',
  `recent_days` BIGINT COMMENT '最近天數,1:最近1天,7:最近7天,30:最近30天',
  `order_count` BIGINT COMMENT '訂單數',
  `order_amount` DECIMAL(16,2) COMMENT '訂單金額',
  `order_user_count` BIGINT COMMENT '下單人數'
) COMMENT '訂單統計'
ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_order_total/';

2.數據裝載

insert overwrite table ads_order_total
select * from ads_order_total
union
select
    '2020-06-14',
    recent_days,
    sum(order_count),
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count
from
(
    select
        recent_days,
        user_id,
        case when recent_days=0 then order_count
             when recent_days=1 then order_last_1d_count
             when recent_days=7 then order_last_7d_count
             when recent_days=30 then order_last_30d_count
        end order_count,
        case when recent_days=0 then order_final_amount
             when recent_days=1 then order_last_1d_final_amount
             when recent_days=7 then order_last_7d_final_amount
             when recent_days=30 then order_last_30d_final_amount
        end order_final_amount
    from dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
    where dt='2020-06-14'
)t1
group by recent_days;

4.2 各地區訂單統計

該需求包含各省份訂單總數和訂單總金額。
1.建表語句

DROP TABLE IF EXISTS ads_order_by_province;
CREATE EXTERNAL TABLE `ads_order_by_province` (
  `dt` STRING COMMENT '統計日期',
  `recent_days` BIGINT COMMENT '最近天數,1:最近1天,7:最近7天,30:最近30天',
  `province_id` STRING COMMENT '省份ID',
  `province_name` STRING COMMENT '省份名稱',
  `area_code` STRING COMMENT '地區編碼',
  `iso_code` STRING COMMENT '國際標准地區編碼',
  `iso_code_3166_2` STRING COMMENT '國際標准地區編碼',
  `order_count` BIGINT COMMENT '訂單數',
  `order_amount` DECIMAL(16,2) COMMENT '訂單金額'
) COMMENT '各地區訂單統計'
ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_order_by_province/';

2.數據裝載

insert overwrite table ads_order_by_province
select * from ads_order_by_province
union
select
    dt,
    recent_days,
    province_id,
    province_name,
    area_code,
    iso_code,
    iso_3166_2,
    order_count,
    order_amount
from
(
    select
        '2020-06-14' dt,
        recent_days,
        province_id,
        sum(order_count) order_count,
        sum(order_amount) order_amount
    from
    (
        select
            recent_days,
            province_id,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then order_last_1d_final_amount
                when recent_days=7 then order_last_7d_final_amount
                when recent_days=30 then order_last_30d_final_amount
            end order_amount
        from dwt_area_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days,province_id
)t2
join dim_base_province t3
on t2.province_id=t3.id;

第五章 優惠券主題

5.1 優惠券統計

該需求要求統計最近30日發布的所有優惠券的領用情況和補貼率,補貼率是指,優惠金額與使用優惠券的訂單的原價金額的比值。
1.建表語句

DROP TABLE IF EXISTS ads_coupon_stats;
CREATE EXTERNAL TABLE ads_coupon_stats (
  `dt` STRING COMMENT '統計日期',
  `coupon_id` STRING COMMENT '優惠券ID',
  `coupon_name` STRING COMMENT '優惠券名稱',
  `start_date` STRING COMMENT '發布日期',
  `rule_name` STRING COMMENT '優惠規則,例如滿100元減10元',
  `get_count`  BIGINT COMMENT '領取次數',
  `order_count` BIGINT COMMENT '使用(下單)次數',
  `expire_count`  BIGINT COMMENT '過期次數',
  `order_original_amount` DECIMAL(16,2) COMMENT '使用優惠券訂單原始金額',
  `order_final_amount` DECIMAL(16,2) COMMENT '使用優惠券訂單最終金額',
  `reduce_amount` DECIMAL(16,2) COMMENT '優惠金額',
  `reduce_rate` DECIMAL(16,2) COMMENT '補貼率'
) COMMENT '商品銷售統計'
ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_coupon_stats/';

2.數據裝載

insert overwrite table ads_coupon_stats
select * from ads_coupon_stats
union
select
    '2020-06-14' dt,
    t1.id,
    coupon_name,
    start_date,
    rule_name,
    get_count,
    order_count,
    expire_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        id,
        coupon_name,
        date_format(start_time,'yyyy-MM-dd') start_date,
        case
            when coupon_type='3201' then concat('滿',condition_amount,'元減',benefit_amount,'')
            when coupon_type='3202' then concat('滿',condition_num,'件打', (1-benefit_discount)*10,'')
            when coupon_type='3203' then concat('',benefit_amount,'')
        end rule_name
    from dim_coupon_info
    where dt='2020-06-14'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('2020-06-14',-29)
)t1
left join
(
    select
        coupon_id,
        get_count,
        order_count,
        expire_count,
        order_original_amount,
        order_final_amount,
        order_reduce_amount reduce_amount,
        cast(order_reduce_amount/order_original_amount as decimal(16,2)) reduce_rate
    from dwt_coupon_topic
    where dt='2020-06-14'
)t2
on t1.id=t2.coupon_id;

第六章 活動主題

6.1 活動統計

該需求要求統計最近30日發布的所有活動的參與情況和補貼率,補貼率是指,優惠金額與參與活動的訂單原價金額的比值。
1.建表語句

DROP TABLE IF EXISTS ads_activity_stats;
CREATE EXTERNAL TABLE `ads_activity_stats` (
  `dt` STRING COMMENT '統計日期',
  `activity_id` STRING COMMENT '活動ID',
  `activity_name` STRING COMMENT '活動名稱',
  `start_date` STRING COMMENT '活動開始日期',
  `order_count` BIGINT COMMENT '參與活動訂單數',
  `order_original_amount` DECIMAL(16,2) COMMENT '參與活動訂單原始金額',
  `order_final_amount` DECIMAL(16,2) COMMENT '參與活動訂單最終金額',
  `reduce_amount` DECIMAL(16,2) COMMENT '優惠金額',
  `reduce_rate` DECIMAL(16,2) COMMENT '補貼率'
) COMMENT '商品銷售統計'
ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_activity_stats/';

2.數據裝載

insert overwrite table ads_activity_stats
select * from ads_activity_stats
union
select
    '2020-06-14' dt,
    t4.activity_id,
    activity_name,
    start_date,
    order_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        activity_id,
        activity_name,
        date_format(start_time,'yyyy-MM-dd') start_date
    from dim_activity_rule_info
    where dt='2020-06-14'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('2020-06-14',-29)
    group by activity_id,activity_name,start_time
)t4
left join
(
    select
        activity_id,
        sum(order_count) order_count,
        sum(order_original_amount) order_original_amount,
        sum(order_final_amount) order_final_amount,
        sum(order_reduce_amount) reduce_amount,
        cast(sum(order_reduce_amount)/sum(order_original_amount)*100 as decimal(16,2)) reduce_rate
    from dwt_activity_topic
    where dt='2020-06-14'
    group by activity_id
)t5
on t4.activity_id=t5.activity_id;

第七章 ADS層業務數據導入腳本

1)編寫腳本
(1)在/home/atguigu/bin目錄下創建腳本dwt_to_ads.sh

[atguigu@hadoop102 bin]$ vim dwt_to_ads.sh

在腳本中填寫如下內容

#!/bin/bash

APP=gmall

# 如果是輸入的日期按照取輸入日期;如果沒輸入日期取當前時間的前一天
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

ads_activity_stats="
insert overwrite table ${APP}.ads_activity_stats
select * from ${APP}.ads_activity_stats
union
select
    '$do_date' dt,
    t4.activity_id,
    activity_name,
    start_date,
    order_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        activity_id,
        activity_name,
        date_format(start_time,'yyyy-MM-dd') start_date
    from ${APP}.dim_activity_rule_info
    where dt='$do_date'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29)
    group by activity_id,activity_name,start_time
)t4
left join
(
    select
        activity_id,
        sum(order_count) order_count,
        sum(order_original_amount) order_original_amount,
        sum(order_final_amount) order_final_amount,
        sum(order_reduce_amount) reduce_amount,
        cast(sum(order_reduce_amount)/sum(order_original_amount)*100 as decimal(16,2)) reduce_rate
    from ${APP}.dwt_activity_topic
    where dt='$do_date'
    group by activity_id
)t5
on t4.activity_id=t5.activity_id;
"
ads_coupon_stats="
insert overwrite table ${APP}.ads_coupon_stats
select * from ${APP}.ads_coupon_stats
union
select
    '$do_date' dt,
    t1.id,
    coupon_name,
    start_date,
    rule_name,
    get_count,
    order_count,
    expire_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        id,
        coupon_name,
        date_format(start_time,'yyyy-MM-dd') start_date,
        case
            when coupon_type='3201' then concat('滿',condition_amount,'元減',benefit_amount,'')
            when coupon_type='3202' then concat('滿',condition_num,'件打', (1-benefit_discount)*10,'')
            when coupon_type='3203' then concat('',benefit_amount,'')
        end rule_name
    from ${APP}.dim_coupon_info
    where dt='$do_date'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29)
)t1
left join
(
    select
        coupon_id,
        get_count,
        order_count,
        expire_count,
        order_original_amount,
        order_final_amount,
        order_reduce_amount reduce_amount,
        cast(order_reduce_amount/order_original_amount as decimal(16,2)) reduce_rate
    from ${APP}.dwt_coupon_topic
    where dt='$do_date'
)t2
on t1.id=t2.coupon_id;
"

ads_order_by_province="
insert overwrite table ${APP}.ads_order_by_province
select * from ${APP}.ads_order_by_province
union
select
    dt,
    recent_days,
    province_id,
    province_name,
    area_code,
    iso_code,
    iso_3166_2,
    order_count,
    order_amount
from
(
    select
        '$do_date' dt,
        recent_days,
        province_id,
        sum(order_count) order_count,
        sum(order_amount) order_amount
    from
    (
        select
            recent_days,
            province_id,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then order_last_1d_final_amount
                when recent_days=7 then order_last_7d_final_amount
                when recent_days=30 then order_last_30d_final_amount
            end order_amount
        from ${APP}.dwt_area_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='$do_date'
    )t1
    group by recent_days,province_id
)t2
join ${APP}.dim_base_province t3
on t2.province_id=t3.id;
"

ads_order_spu_stats="
insert overwrite table ${APP}.ads_order_spu_stats
select * from ${APP}.ads_order_spu_stats
union
select
    '$do_date' dt,
    recent_days,
    spu_id,
    spu_name,
    tm_id,
    tm_name,
    category3_id,
    category3_name,
    category2_id,
    category2_name,
    category1_id,
    category1_name,
    sum(order_count),
    sum(order_amount)
from
(
    select
        recent_days,
        sku_id,
        case
            when recent_days=1 then order_last_1d_count
            when recent_days=7 then order_last_7d_count
            when recent_days=30 then order_last_30d_count
        end order_count,
        case
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_amount
    from ${APP}.dwt_sku_topic lateral view explode(Array(1,7,30)) tmp as recent_days
    where dt='$do_date'
)t1
left join
(
    select
        id,
        spu_id,
        spu_name,
        tm_id,
        tm_name,
        category3_id,
        category3_name,
        category2_id,
        category2_name,
        category1_id,
        category1_name
    from ${APP}.dim_sku_info
    where dt='$do_date'
)t2
on t1.sku_id=t2.id
group by recent_days,spu_id,spu_name,tm_id,tm_name,category3_id,category3_name,category2_id,category2_name,category1_id,category1_name;
"

ads_order_total="
insert overwrite table ${APP}.ads_order_total
select * from ${APP}.ads_order_total
union
select
    '$do_date',
    recent_days,
    sum(order_count),
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count
from
(
    select
        recent_days,
        user_id,
        case when recent_days=0 then order_count
             when recent_days=1 then order_last_1d_count
             when recent_days=7 then order_last_7d_count
             when recent_days=30 then order_last_30d_count
        end order_count,
        case when recent_days=0 then order_final_amount
             when recent_days=1 then order_last_1d_final_amount
             when recent_days=7 then order_last_7d_final_amount
             when recent_days=30 then order_last_30d_final_amount
        end order_final_amount
    from ${APP}.dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
    where dt='$do_date'
)t1
group by recent_days;
"

ads_page_path="
insert overwrite table ${APP}.ads_page_path
select * from ${APP}.ads_page_path
union
select
    '$do_date',
    recent_days,
    source,
    target,
    count(*)
from
(
    select
        recent_days,
        concat('step-',step,':',source) source,
        concat('step-',step+1,':',target) target
    from
    (
        select
            recent_days,
            page_id source,
            lead(page_id,1,null) over (partition by recent_days,session_id order by ts) target,
            row_number() over (partition by recent_days,session_id order by ts) step
        from
        (
            select
                recent_days,
                last_page_id,
                page_id,
                ts,
                concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id,recent_days order by ts)) session_id
            from ${APP}.dwd_page_log lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('$do_date',-30)
            and dt>=date_add('$do_date',-recent_days+1)
        )t2
    )t3
)t4
group by recent_days,source,target;
"

ads_repeat_purchase="
insert overwrite table ${APP}.ads_repeat_purchase
select * from ${APP}.ads_repeat_purchase
union
select
    '$do_date' dt,
    recent_days,
    tm_id,
    tm_name,
    cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 as decimal(16,2))
from
(
    select
        recent_days,
        user_id,
        tm_id,
        tm_name,
        sum(order_count) order_count
    from
    (
        select
            recent_days,
            user_id,
            sku_id,
            count(*) order_count
        from ${APP}.dwd_order_detail lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt>=date_add('$do_date',-29)
        and dt>=date_add('$do_date',-recent_days+1)
        group by recent_days, user_id,sku_id
    )t1
    left join
    (
        select
            id,
            tm_id,
            tm_name
        from ${APP}.dim_sku_info
        where dt='$do_date'
    )t2
    on t1.sku_id=t2.id
    group by recent_days,user_id,tm_id,tm_name
)t3
group by recent_days,tm_id,tm_name;
"

ads_user_action="
with
tmp_page as
(
    select
        '$do_date' dt,
        recent_days,
        sum(if(array_contains(pages,'home'),1,0)) home_count,
        sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
    from
    (
        select
            recent_days,
            mid_id,
            collect_set(page_id) pages
        from
        (
            select
                dt,
                mid_id,
                page.page_id
            from ${APP}.dws_visitor_action_daycount lateral view explode(page_stats) tmp as page
            where dt>=date_add('$do_date',-29)
            and page.page_id in('home','good_detail')
        )t1 lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt>=date_add('$do_date',-recent_days+1)
        group by recent_days,mid_id
    )t2
    group by recent_days
),
tmp_cop as
(
    select
        '$do_date' dt,
        recent_days,
        sum(if(cart_count>0,1,0)) cart_count,
        sum(if(order_count>0,1,0)) order_count,
        sum(if(payment_count>0,1,0)) payment_count
    from
    (
        select
            recent_days,
            user_id,
            case
                when recent_days=1 then cart_last_1d_count
                when recent_days=7 then cart_last_7d_count
                when recent_days=30 then cart_last_30d_count
            end cart_count,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then payment_last_1d_count
                when recent_days=7 then payment_last_7d_count
                when recent_days=30 then payment_last_30d_count
            end payment_count
        from ${APP}.dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='$do_date'
    )t1
    group by recent_days
)
insert overwrite table ${APP}.ads_user_action
select * from ${APP}.ads_user_action
union
select
    tmp_page.dt,
    tmp_page.recent_days,
    home_count,
    good_detail_count,
    cart_count,
    order_count,
    payment_count
from tmp_page
join tmp_cop
on tmp_page.recent_days=tmp_cop.recent_days;
"

ads_user_change="
insert overwrite table ${APP}.ads_user_change
select * from ${APP}.ads_user_change
union
select
    churn.dt,
    user_churn_count,
    user_back_count
from
(
    select
        '$do_date' dt,
        count(*) user_churn_count
    from ${APP}.dwt_user_topic
    where dt='$do_date'
    and login_date_last=date_add('$do_date',-7)
)churn
join
(
    select
        '$do_date' dt,
        count(*) user_back_count
    from
    (
        select
            user_id,
            login_date_last
        from ${APP}.dwt_user_topic
        where dt='$do_date'
        and login_date_last='$do_date'
    )t1
    join
    (
        select
            user_id,
            login_date_last login_date_previous
        from ${APP}.dwt_user_topic
        where dt=date_add('$do_date',-1)
    )t2
    on t1.user_id=t2.user_id
    where datediff(login_date_last,login_date_previous)>=8
)back
on churn.dt=back.dt;
"

ads_user_retention="
insert overwrite table ${APP}.ads_user_retention
select * from ${APP}.ads_user_retention
union
select
    '$do_date',
    login_date_first create_date,
    datediff('$do_date',login_date_first) retention_day,
    sum(if(login_date_last='$do_date',1,0)) retention_count,
    count(*) new_user_count,
    cast(sum(if(login_date_last='$do_date',1,0))/count(*)*100 as decimal(16,2)) retention_rate
from ${APP}.dwt_user_topic
where dt='$do_date'
and login_date_first>=date_add('$do_date',-7)
and login_date_first<'$do_date'
group by login_date_first;
"

ads_user_total="
insert overwrite table ${APP}.ads_user_total
select * from ${APP}.ads_user_total
union
select
    '$do_date',
    recent_days,
    sum(if(login_date_first>=recent_days_ago,1,0)) new_user_count,
    sum(if(order_date_first>=recent_days_ago,1,0)) new_order_user_count,
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count,
    sum(if(login_date_last>=recent_days_ago and order_final_amount=0,1,0)) no_order_user_count
from
(
    select
        recent_days,
        user_id,
        login_date_first,
        login_date_last,
        order_date_first,
        case when recent_days=0 then order_final_amount
             when recent_days=1 then order_last_1d_final_amount
             when recent_days=7 then order_last_7d_final_amount
             when recent_days=30 then order_last_30d_final_amount
        end order_final_amount,
        if(recent_days=0,'1970-01-01',date_add('$do_date',-recent_days+1)) recent_days_ago
    from ${APP}.dwt_user_topic lateral view explode(Array(0,1,7,30)) tmp as recent_days
    where dt='$do_date'
)t1
group by recent_days;
"

ads_visit_stats="
insert overwrite table ${APP}.ads_visit_stats
select * from ${APP}.ads_visit_stats
union
select
    '$do_date' dt,
    is_new,
    recent_days,
    channel,
    count(distinct(mid_id)) uv_count,
    cast(sum(duration)/1000 as bigint) duration_sec,
    cast(avg(duration)/1000 as bigint) avg_duration_sec,
    sum(page_count) page_count,
    cast(avg(page_count) as bigint) avg_page_count,
    count(*) sv_count,
    sum(if(page_count=1,1,0)) bounce_count,
    cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate
from
(
    select
        session_id,
        mid_id,
        is_new,
        recent_days,
        channel,
        count(*) page_count,
        sum(during_time) duration
    from
    (
        select
            mid_id,
            channel,
            recent_days,
            is_new,
            last_page_id,
            page_id,
            during_time,
            concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by recent_days,mid_id order by ts)) session_id
        from
        (
            select
                mid_id,
                channel,
                last_page_id,
                page_id,
                during_time,
                ts,
                recent_days,
                if(visit_date_first>=date_add('$do_date',-recent_days+1),'1','0') is_new
            from
            (
                select
                    t1.mid_id,
                    t1.channel,
                    t1.last_page_id,
                    t1.page_id,
                    t1.during_time,
                    t1.dt,
                    t1.ts,
                    t2.visit_date_first
                from
                (
                    select
                        mid_id,
                        channel,
                        last_page_id,
                        page_id,
                        during_time,
                        dt,
                        ts
                    from ${APP}.dwd_page_log
                    where dt>=date_add('$do_date',-30)
                )t1
                left join
                (
                    select
                        mid_id,
                        visit_date_first
                    from ${APP}.dwt_visitor_topic
                    where dt='$do_date'
                )t2
                on t1.mid_id=t2.mid_id
            )t3 lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('$do_date',-recent_days+1)
        )t4
    )t5
    group by session_id,mid_id,is_new,recent_days,channel
)t6
group by is_new,recent_days,channel;
"

case $1 in
    "ads_activity_stats" )
        hive -e "$ads_activity_stats"
    ;;
    "ads_coupon_stats" )
        hive -e "$ads_coupon_stats"
    ;;
    "ads_order_by_province" )
        hive -e "$ads_order_by_province"
    ;;
    "ads_order_spu_stats" )
        hive -e "$ads_order_spu_stats"
    ;;
    "ads_order_total" )
        hive -e "$ads_order_total"
    ;;
    "ads_page_path" )
        hive -e "$ads_page_path"
    ;;
    "ads_repeat_purchase" )
        hive -e "$ads_repeat_purchase"
    ;;
    "ads_user_action" )
        hive -e "$ads_user_action"
    ;;
    "ads_user_change" )
        hive -e "$ads_user_change"
    ;;
    "ads_user_retention" )
        hive -e "$ads_user_retention"
    ;;
    "ads_user_total" )
        hive -e "$ads_user_total"
    ;;
    "ads_visit_stats" )
        hive -e "$ads_visit_stats"
    ;;
    "all" )
        hive -e "$ads_activity_stats$ads_coupon_stats$ads_order_by_province$ads_order_spu_stats$ads_order_total$ads_page_path$ads_repeat_purchase$ads_user_action$ads_user_change$ads_user_retention$ads_user_total$ads_visit_stats"
    ;;
esac

(2)增加腳本執行權限

[atguigu@hadoop102 bin]$ chmod 777 dwt_to_ads.sh

2)腳本使用
(1)執行腳本

[atguigu@hadoop102 bin]$ dwt_to_ads.sh all 2020-06-14                                                           

(2)查看數據是否導入


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM