Source of the problem
The official documentation puts it this way:
Since Sqoop breaks down export process into multiple transactions, it is possible that a failed export job may result in partial data being committed to the database.
This can further lead to subsequent jobs failing due to insert collisions in some cases, or lead to duplicated data in others.
You can overcome this problem by specifying a staging table via the --staging-table option which acts as an auxiliary table that is used to stage exported data.
The staged data is finally moved to the destination table in a single transaction.
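In short:
Sqoop splits an export into several transactions, so a job that fails partway through can leave some of its rows already committed to the database.
Rerunning the job may then fail on insert collisions (when the target table has a primary or unique key) or write the same rows twice (when it does not).
The --staging-table option avoids this by writing the exported rows to an auxiliary staging table first.
Only after the whole export has succeeded are the staged rows moved to the destination table, in a single transaction.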
Solution
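# Rows are written to the staging table first and moved to the target table
# in a single transaction only after every export task has succeeded.
# --clear-staging-table deletes any rows left in the staging table by an earlier run.
# '\\N' tells Sqoop to read Hive's \N marker in string columns as NULL.
# ${day} is a shell variable and must be set before this command runs.
sqoop export \
  --connect jdbc:mysql://192.168.137.10:3306/user_behavior \
  --username root \
  --password 123456 \
  --table app_cource_study_report \
  --columns watch_video_cnt,complete_video_cnt,dt \
  --fields-terminated-by "\t" \
  --export-dir "/user/hive/warehouse/tmp.db/app_cource_study_analysis_${day}" \
  --staging-table app_cource_study_report_tmp \
  --clear-staging-table \
  --input-null-string '\\N'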
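
Sqoop does not create the staging table itself: per the Sqoop user guide it must exist before the export runs and be structurally identical to the target table. A minimal sketch of one way to prepare it on MySQL is shown here. The variable names and the create_staging_table helper are hypothetical, and the host, user, and database are simply copied from the command above; adjust them to your environment.

```shell
#!/usr/bin/env bash
# Sketch only: prepare the staging table before running sqoop export.
# Table names are taken from the export command; everything else is illustrative.

target_table="app_cource_study_report"
staging_table="${target_table}_tmp"

# In MySQL, CREATE TABLE ... LIKE copies the column definitions and indexes
# of the target table, which keeps the two tables structurally identical.
staging_ddl="CREATE TABLE IF NOT EXISTS ${staging_table} LIKE ${target_table};"

# Hypothetical helper: runs the DDL through the mysql client.
# -p without a value makes the client prompt for the password.
create_staging_table() {
  mysql -h 192.168.137.10 -P 3306 -u root -p user_behavior -e "${staging_ddl}"
}

# Print the statement so it can be reviewed before it is executed.
echo "${staging_ddl}"
```

Calling create_staging_table once before the first export should be enough; on later runs IF NOT EXISTS makes the statement a no-op, and --clear-staging-table takes care of emptying the table.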