Zipper Tables: Incremental Update, Method 1


Reference: http://lxw1234.com/archives/2015/08/473.htm

I. Table Structures

1. Define the raw orders table in the business (source) database:

drop table chavin.orders;
CREATE TABLE orders (
    orderid INT,
    createtime STRING,
    modifiedtime STRING,
    status STRING
) row format delimited fields terminated by '\t'
stored AS textfile;

-- Test data (tab-separated: orderid, createtime, modifiedtime, status):
1 2015-08-18 2015-08-18 created
2 2015-08-18 2015-08-18 created
3 2015-08-19 2015-08-21 paid
4 2015-08-19 2015-08-21 completed
5 2015-08-19 2015-08-20 paid
6 2015-08-20 2015-08-20 created
7 2015-08-20 2015-08-21 paid
8 2015-08-21 2015-08-21 created

2. Define the ODS orders table, stored in daily partitions:

drop table t_ods_orders_inc;
CREATE TABLE t_ods_orders_inc (
    orderid INT,
    createtime STRING,
    modifiedtime STRING,
    status STRING
) PARTITIONED BY (day STRING)
row format delimited fields terminated by '\t'
stored AS textfile;

3. Create the DW-layer order history table:

drop table t_dw_orders_his;
CREATE TABLE t_dw_orders_his (
    orderid INT,
    createtime STRING,
    modifiedtime STRING,
    status STRING,
    dw_start_date STRING,
    dw_end_date STRING
) row format delimited fields terminated by '\t'
stored AS textfile;
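Conceptually, t_dw_orders_his keeps several versions of the same order, each valid over the closed interval [dw_start_date, dw_end_date]. The Python sketch below models one such row; the class name OrderVersion and the constant OPEN_END_DATE are hypothetical, and it assumes, as the statements in section II do, that dates are 'yyyy-MM-dd' strings and that '9999-12-31' marks the version that is still current.

```python
from dataclasses import dataclass

# Sentinel used by the DW history table for a version that has not been closed yet.
OPEN_END_DATE = "9999-12-31"


@dataclass(frozen=True)
class OrderVersion:
    """One row of t_dw_orders_his: the four source columns plus a validity interval."""
    orderid: int
    createtime: str
    modifiedtime: str
    status: str
    dw_start_date: str  # first day on which this version is valid
    dw_end_date: str    # last day on which it is valid, or OPEN_END_DATE

    @property
    def is_current(self) -> bool:
        # The current version of an order is the one whose interval is still open.
        return self.dw_end_date == OPEN_END_DATE


# Two illustrative versions of order 1: "created" until 2015-08-21, then "paid".
closed = OrderVersion(1, "2015-08-18", "2015-08-18", "created", "2015-08-18", "2015-08-21")
current = OrderVersion(1, "2015-08-18", "2015-08-22", "paid", "2015-08-22", OPEN_END_DATE)
print(closed.is_current, current.is_current)  # False True
```

Because the closed version ends the day before the next one starts, the versions of one order tile the timeline without gaps or overlaps.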

II. Initialize the DW Order History Table

1. Insert the historical data of the source orders table into the ODS orders table (partition 2015-08-20):

INSERT overwrite TABLE t_ods_orders_inc PARTITION (day = '2015-08-20')
SELECT orderid,createtime,modifiedtime,status
FROM chavin.orders
WHERE cast(createtime as date) <= '2015-08-20';

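2. Initialize the DW history table from the ODS data. Each order becomes one open version that starts on its createtime and ends on '9999-12-31':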

INSERT overwrite TABLE t_dw_orders_his
SELECT orderid,createtime,modifiedtime,status,
createtime AS dw_start_date,
'9999-12-31' AS dw_end_date
FROM t_ods_orders_inc
WHERE day = '2015-08-20';
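The two initialization steps can be sketched in Python as plain list transformations. This is an illustrative model, not part of the Hive job: rows are tuples (orderid, createtime, modifiedtime, status), the function names are hypothetical, and ISO date strings are compared as strings, which orders them the same way as the `cast(createtime as date)` comparison in step 1.

```python
# Sentinel end date for versions that are still current.
OPEN_END_DATE = "9999-12-31"

# Rows of chavin.orders as (orderid, createtime, modifiedtime, status);
# these are the test rows listed in section I, with statuses in English.
SOURCE_ORDERS = [
    (1, "2015-08-18", "2015-08-18", "created"),
    (2, "2015-08-18", "2015-08-18", "created"),
    (3, "2015-08-19", "2015-08-21", "paid"),
    (4, "2015-08-19", "2015-08-21", "completed"),
    (5, "2015-08-19", "2015-08-20", "paid"),
    (6, "2015-08-20", "2015-08-20", "created"),
    (7, "2015-08-20", "2015-08-21", "paid"),
    (8, "2015-08-21", "2015-08-21", "created"),
]


def initial_ods_partition(source_rows, cutoff):
    """Step II.1: keep rows WHERE cast(createtime as date) <= cutoff."""
    return [r for r in source_rows if r[1] <= cutoff]


def init_history(ods_rows):
    """Step II.2: each ODS row becomes one open version starting on its createtime."""
    return [
        (orderid, createtime, modifiedtime, status, createtime, OPEN_END_DATE)
        for (orderid, createtime, modifiedtime, status) in ods_rows
    ]


history = init_history(initial_ods_partition(SOURCE_ORDERS, "2015-08-20"))
print(len(history))  # 7: order 8 was created after the cutoff
```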

III. Add Incremental Data

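1. Insert the incremental data of the source orders table, i.e. every order created or modified on the previous day, into that day's ODS partition (2015-08-21):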

INSERT overwrite TABLE t_ods_orders_inc PARTITION (day = '2015-08-21')
SELECT orderid,createtime,modifiedtime,status
FROM orders
WHERE createtime = '2015-08-21' OR modifiedtime = '2015-08-21';
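The incremental filter picks up both new orders (createtime equals the day) and changed orders (modifiedtime equals the day). A minimal Python sketch of the same predicate, run against the section I test rows; the function name extract_increment is hypothetical.

```python
# Test rows of chavin.orders from section I: (orderid, createtime, modifiedtime, status).
SOURCE_ORDERS = [
    (1, "2015-08-18", "2015-08-18", "created"),
    (2, "2015-08-18", "2015-08-18", "created"),
    (3, "2015-08-19", "2015-08-21", "paid"),
    (4, "2015-08-19", "2015-08-21", "completed"),
    (5, "2015-08-19", "2015-08-20", "paid"),
    (6, "2015-08-20", "2015-08-20", "created"),
    (7, "2015-08-20", "2015-08-21", "paid"),
    (8, "2015-08-21", "2015-08-21", "created"),
]


def extract_increment(source_rows, day):
    """Step III.1: WHERE createtime = day OR modifiedtime = day."""
    return [r for r in source_rows if r[1] == day or r[2] == day]


inc = extract_increment(SOURCE_ORDERS, "2015-08-21")
print([r[0] for r in inc])  # [3, 4, 7, 8]
```

Because the source table holds only the latest state of each order, an order that changes several times within one day shows up once, with its end-of-day state; intraday transitions are not captured by this approach.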

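2. Refresh the DW history from the existing DW history plus the ODS increment, using a temporary table. The first branch of the UNION ALL keeps every existing version and closes the open version of each order that appears in the increment by setting its dw_end_date to the previous day (2015-08-20); only open versions have a dw_end_date later than the run day. The second branch adds every incremental row as a new open version that starts on its modifiedtime: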

DROP TABLE IF EXISTS t_dw_orders_his_tmp;
CREATE TABLE t_dw_orders_his_tmp AS
SELECT orderid,
createtime,
modifiedtime,
status,
dw_start_date,
dw_end_date
FROM (
    SELECT a.orderid,
    a.createtime,
    a.modifiedtime,
    a.status,
    a.dw_start_date,
    CASE WHEN b.orderid IS NOT NULL AND a.dw_end_date > '2015-08-21' THEN '2015-08-20' ELSE a.dw_end_date END AS dw_end_date
    FROM t_dw_orders_his a
    left outer join (SELECT * FROM t_ods_orders_inc WHERE day = '2015-08-21') b
    ON (a.orderid = b.orderid)
    UNION ALL
    SELECT orderid,
    createtime,
    modifiedtime,
    status,
    modifiedtime AS dw_start_date,
    '9999-12-31' AS dw_end_date
    FROM t_ods_orders_inc
    WHERE day = '2015-08-21'
) x
ORDER BY orderid,dw_start_date;
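The merge query can be modeled in Python to check the logic on small inputs. This is a sketch under stated assumptions: the function merge_increment and the sample rows are hypothetical, rows are tuples in t_dw_orders_his column order, and each orderid appears at most once in a daily increment (with duplicates, the SQL LEFT OUTER JOIN would also duplicate history rows, which this model does not reproduce).

```python
OPEN_END_DATE = "9999-12-31"


def merge_increment(history, increment, day, close_day):
    """Rebuild the history the way t_dw_orders_his_tmp is built.

    history   -- (orderid, createtime, modifiedtime, status, dw_start_date, dw_end_date)
    increment -- (orderid, createtime, modifiedtime, status) rows of ODS partition `day`
    close_day -- the day before `day`, written into closed versions
    """
    changed = {row[0] for row in increment}

    # First UNION ALL branch: keep every history row, closing the open version
    # (dw_end_date > day) of each order that appears in the increment.
    carried = []
    for orderid, ct, mt, status, start, end in history:
        if orderid in changed and end > day:
            end = close_day
        carried.append((orderid, ct, mt, status, start, end))

    # Second branch: every incremental row opens a new version from its modifiedtime.
    opened = [(orderid, ct, mt, status, mt, OPEN_END_DATE)
              for orderid, ct, mt, status in increment]

    # ORDER BY orderid, dw_start_date
    return sorted(carried + opened, key=lambda r: (r[0], r[4]))


# Illustrative data: order 3 is paid on 2015-08-21 and order 8 is new that day.
history = [
    (3, "2015-08-19", "2015-08-19", "created", "2015-08-19", OPEN_END_DATE),
    (5, "2015-08-19", "2015-08-20", "paid", "2015-08-19", OPEN_END_DATE),
]
increment = [
    (3, "2015-08-19", "2015-08-21", "paid"),
    (8, "2015-08-21", "2015-08-21", "created"),
]
result = merge_increment(history, increment, "2015-08-21", "2015-08-20")
for row in result:
    print(row)
```

Running the same function again on `result` with a later day leaves already closed versions untouched, because their dw_end_date is never later than the run day.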

3. Overwrite the DW order history table from the temporary table:

INSERT overwrite TABLE t_dw_orders_his
SELECT * FROM t_dw_orders_his_tmp;

4. Repeat the steps above to add the data for 2015-08-22:

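-- Load the incremental data into the ODS partition '2015-08-22'; /opt/datas/orders22.txt contains these tab-separated rows:
1 2015-08-18 2015-08-22 paid
2 2015-08-18 2015-08-22 completed
6 2015-08-20 2015-08-22 paid
9 2015-08-22 2015-08-22 created
8 2015-08-21 2015-08-22 paid
10 2015-08-22 2015-08-22 paid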

alter table t_ods_orders_inc add if not exists partition(day='2015-08-22');
load data local inpath '/opt/datas/orders22.txt' into table chavin.t_ods_orders_inc partition(day='2015-08-22');
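LOAD DATA does not parse or validate the file: Hive documents it as a plain copy/move into the partition directory. A file typed with spaces instead of tabs therefore loads without error but would most likely come back with NULL columns at query time. A small pre-check like the hypothetical parse_orders_tsv below, which is deliberately stricter than Hive, can catch that before loading.

```python
def parse_orders_tsv(text):
    r"""Split lines the way `fields terminated by '\t'` expects:
    orderid, createtime, modifiedtime, status separated by single tabs."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # this sketch simply skips blank lines
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 tab-separated fields, got {len(fields)}")
        orderid, createtime, modifiedtime, status = fields
        rows.append((int(orderid), createtime, modifiedtime, status))
    return rows


sample = "1\t2015-08-18\t2015-08-22\tpaid\n9\t2015-08-22\t2015-08-22\tcreated\n"
print(parse_orders_tsv(sample))
```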

-- Update the order history from the existing history and the increment, again via a temporary table:

DROP TABLE IF EXISTS t_dw_orders_his_tmp;
CREATE TABLE t_dw_orders_his_tmp AS
SELECT orderid,
createtime,
modifiedtime,
status,
dw_start_date,
dw_end_date
FROM (
    SELECT a.orderid,
    a.createtime,
    a.modifiedtime,
    a.status,
    a.dw_start_date,
    CASE WHEN b.orderid IS NOT NULL AND a.dw_end_date > '2015-08-22' THEN '2015-08-21' ELSE a.dw_end_date END AS dw_end_date
    FROM t_dw_orders_his a
    left outer join (SELECT * FROM t_ods_orders_inc WHERE day = '2015-08-22') b
    ON (a.orderid = b.orderid)
    UNION ALL
    SELECT orderid,
    createtime,
    modifiedtime,
    status,
    modifiedtime AS dw_start_date,
    '9999-12-31' AS dw_end_date
    FROM t_ods_orders_inc
    WHERE day = '2015-08-22'
) x
ORDER BY orderid,dw_start_date;

-- Overwrite the history table from the temporary table:

INSERT overwrite TABLE t_dw_orders_his
SELECT * FROM t_dw_orders_his_tmp;
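The daily runs differ only in two literals: the partition day and the close date, which is always the previous calendar day. A minimal Python sketch of deriving both from the run day; merge_dates and close_expression are hypothetical helpers that render the CASE expression used in the first UNION ALL branch.

```python
from datetime import date, timedelta


def merge_dates(run_day: str):
    """Return (day, close_day) for the merge of ODS partition `day`;
    close_day is the previous calendar day, written into closed versions."""
    close_day = date.fromisoformat(run_day) - timedelta(days=1)
    return run_day, close_day.isoformat()


def close_expression(run_day: str) -> str:
    """Render the CASE expression of the first UNION ALL branch for one run."""
    day, close_day = merge_dates(run_day)
    return (f"CASE WHEN b.orderid IS NOT NULL AND a.dw_end_date > '{day}' "
            f"THEN '{close_day}' ELSE a.dw_end_date END AS dw_end_date")


print(merge_dates("2015-08-22"))  # ('2015-08-22', '2015-08-21')
print(merge_dates("2015-09-01"))  # ('2015-09-01', '2015-08-31')
```

Inside Hive itself, the same effect should be achievable by passing the day as a variable (for example `hive --hivevar day=2015-08-22 -f merge.hql`, where merge.hql is a hypothetical script referencing `${hivevar:day}`) and computing the close date with the built-in `date_sub('${hivevar:day}', 1)`.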

5. Query the historical snapshots for 2015-08-21 and 2015-08-22:

select * from t_dw_orders_his where dw_start_date <= '2015-08-21' and dw_end_date >= '2015-08-21';
select * from t_dw_orders_his where dw_start_date <= '2015-08-22' and dw_end_date >= '2015-08-22';
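A snapshot query keeps the versions whose validity interval contains the requested day. The Python sketch below applies the same predicate to a few illustrative history rows (the function name snapshot and the rows are hypothetical, chosen to match order 1 changing to "paid" and order 9 being created on 2015-08-22).

```python
OPEN_END_DATE = "9999-12-31"


def snapshot(history, day):
    """Rows valid on `day`: WHERE dw_start_date <= day AND dw_end_date >= day."""
    return [row for row in history if row[4] <= day <= row[5]]


# (orderid, createtime, modifiedtime, status, dw_start_date, dw_end_date)
history = [
    (1, "2015-08-18", "2015-08-18", "created", "2015-08-18", "2015-08-21"),
    (1, "2015-08-18", "2015-08-22", "paid", "2015-08-22", OPEN_END_DATE),
    (9, "2015-08-22", "2015-08-22", "created", "2015-08-22", OPEN_END_DATE),
]
print([(r[0], r[3]) for r in snapshot(history, "2015-08-21")])  # [(1, 'created')]
print([(r[0], r[3]) for r in snapshot(history, "2015-08-22")])  # [(1, 'paid'), (9, 'created')]
```

The latest state of every order is simply the set of open versions, i.e. `WHERE dw_end_date = '9999-12-31'`.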

