Hive: a pitfall when adding a column and rerunning data

Reposted article, dated 2018-10-30 20:02. Category: Hive

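Scenario:

We have created a table and loaded data into it partitioned by date, and afterwards we find that we need to add a column.

Approach:

The first thing that comes to mind is to add the column to the table.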

1) alter table table_name add columns(new_attr string);

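Then rerun the data:

2) insert overwrite table table_name partition(pattr='20181029') select ...;

The result is that the new column new_attr comes back as NULL in the rerun partition. By default (RESTRICT), add columns changes only the table-level metadata; the existing partition keeps its own, older column list, so the data written into it is still read without the new column.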

Solution:

Before the insert, be sure to drop that partition:

1.5) alter table table_name drop partition(pattr='20181029');
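
Put in order, the three steps can be sketched as follows. This is a minimal sketch that reuses the xunying and tb_xunying tables from the example later in this post; the IF EXISTS guard and the uppercase keywords are additions for illustration. Whether dropping the partition also deletes its files depends on whether the table is managed or external, which the original does not say.

```sql
-- Step 1: add the column. Without CASCADE only the table-level metadata changes.
ALTER TABLE xunying ADD COLUMNS (name string);

-- Step 1.5: drop the partition so that it is recreated from the current table schema.
ALTER TABLE xunying DROP IF EXISTS PARTITION (inc_day='1123');

-- Step 2: rerun the partition. The newly created partition takes the table's
-- current column list, so the new column is readable.
INSERT OVERWRITE TABLE xunying PARTITION (inc_day='1123')
SELECT id, amt, '1' AS name
FROM tb_xunying;
```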

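Addendum: I recently found another way to solve the same problem. Redefine the column list with replace columns ... cascade, listing the existing columns followed by the new one: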

alter table table_name replace columns(, , , , ,new_attr string) cascade;
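
On Hive versions that support the CASCADE clause (it was introduced in Hive 1.1.0), add columns accepts CASCADE as well, so the existing columns probably do not need to be retyped. A hedged sketch, again using the xunying table from the example below rather than anything the original author ran:

```sql
-- CASCADE propagates the column change to the metadata of every existing partition.
-- RESTRICT (the default) would change the table-level metadata only.
ALTER TABLE xunying ADD COLUMNS (name string) CASCADE;
```

Either way, only metadata changes. Existing data files are not rewritten, so a partition still returns NULL for the new column until it is rerun.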

Example:

Original data:

hive> select * from xunying where inc_day='1123';
OK
1    12.100000000000000000    1123
1    -12.100000000000000000    1123
2    15.528450000000000000    1123
2    -15.528450000000000000    1123
3    -6.010000000000000000    1123
3    6.010000000000000000    1123
4    2.000000000000000000    1123
4    -1.000000000000000000    1123
5    0.000000000000000000    1123
6    0.000000000000000000    1123
6    0.000000000000000000    1123

If the column is added with add columns, the result is:

hive> alter table xunying add columns(name string);

hive> insert overwrite table xunying partition(inc_day='1123') select id,amt,'1' name from tb_xunying;

hive> select * from xunying where inc_day='1123';
OK
1    12.100000000000000000    NULL    1123
1    -12.100000000000000000    NULL    1123
2    15.528450000000000000    NULL    1123
2    -15.528450000000000000    NULL    1123
3    -6.010000000000000000    NULL    1123
3    6.010000000000000000    NULL    1123
4    2.000000000000000000    NULL    1123
4    -1.000000000000000000    NULL    1123
5    0.000000000000000000    NULL    1123
6    0.000000000000000000    NULL    1123
6    0.000000000000000000    NULL    1123
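
To check that these NULLs come from stale partition metadata rather than from the insert itself, one option is to compare the table's schema with the partition's. The sketch below assumes that DESCRIBE with a PARTITION clause reports the partition's own column list; the exact output layout varies by Hive version.

```sql
-- Table-level schema: expected to list the new column `name`.
DESCRIBE xunying;

-- Partition-level schema: if `name` is missing here, the partition is still read
-- with its old column list and the new column comes back as NULL.
DESCRIBE FORMATTED xunying PARTITION (inc_day='1123');
```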

Solving it with replace columns ... cascade:

hive> alter table xunying replace columns(id string,amt string,name string,name2 string) cascade;

hive> insert overwrite table xunying partition(inc_day='1123') select id,amt,'1' name,'2' name2 from tb_xunying;

hive> select * from xunying where inc_day='1123';
OK
1    12.100000000000000000    1    2    1123
1    -12.100000000000000000    1    2    1123
2    15.528450000000000000    1    2    1123
2    -15.528450000000000000    1    2    1123
3    -6.010000000000000000    1    2    1123
3    6.010000000000000000    1    2    1123
4    2.000000000000000000    1    2    1123
4    -1.000000000000000000    1    2    1123
5    0.000000000000000000    1    2    1123
6    0.000000000000000000    1    2    1123
6    0.000000000000000000    1    2    1123
