Automatic incremental import with sqoop job


I. Test environment

1. MySQL table structure

mysql> show create table autoextend\G
CREATE TABLE `autoextend` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(30) DEFAULT NULL,
  `remark` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=17 DEFAULT CHARSET=latin1

2. Hive table structure

hive> show create table autoextend;
OK
CREATE TABLE `autoextend`(
  `id` string,
  `name` string,
  `remark` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://data0:9000/home/hadoop/hive/data/hdb.db/autoextend'
TBLPROPERTIES (
  'transient_lastDdlTime'='1572594915')

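II. Plain incremental import

# The drawback: before every incremental run, the --last-value value has to be edited by hand.
# If it is left unchanged, every run re-imports all rows above the old value instead of only the new ones, which is inflexible.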
sqoop import --connect jdbc:mysql://172.16.100.173:3306/hdb \
--username root --password oracletest \
--table autoextend \
-m 1 \
--incremental append \
--check-column id \
--last-value 11 \
--fields-terminated-by '\t' \
--hive-import --hive-database hdb --hive-table autoextend
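
The manual bookkeeping described in the comments above can be sketched as a small wrapper that reads the previous high-water mark from a file and passes it as --last-value. This is only an illustration under assumptions: the state file last_value.txt, the DRY_RUN switch and the run helper are names I made up, not part of Sqoop. With the default DRY_RUN=1 the command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical state file that stores the highest id already imported.
STATE_FILE="${STATE_FILE:-./last_value.txt}"
# With DRY_RUN=1 (the default in this sketch) the command is only printed.
DRY_RUN="${DRY_RUN:-1}"

# Read the stored value; fall back to 0 when no state file exists yet.
if [ -f "$STATE_FILE" ]; then
  LAST_VALUE=$(cat "$STATE_FILE")
else
  LAST_VALUE=0
fi

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Same options as the import above; only --last-value comes from the file.
run sqoop import \
  --connect jdbc:mysql://172.16.100.173:3306/hdb \
  --username root --password oracletest \
  --table autoextend \
  -m 1 \
  --incremental append \
  --check-column id \
  --last-value "$LAST_VALUE" \
  --fields-terminated-by '\t' \
  --hive-import --hive-database hdb --hive-table autoextend

# After a successful real run, the new maximum id still has to be written
# to $STATE_FILE by hand.
```

Keeping that file up to date is exactly the chore that a saved sqoop job takes over.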

 

III. Incremental import with sqoop job

1. sqoop job arguments

Job management arguments:
   --create <job-id>            Create a new saved job
   --delete <job-id>            Delete a saved job
   --exec <job-id>              Run a saved job
   --help                       Print usage instructions
   --list                       List saved jobs
   --meta-connect <jdbc-uri>    Specify JDBC connect string for the
                                metastore
   --show <job-id>              Show the parameters for a saved job
   --verbose                    Print more information while working

2. List existing jobs

sqoop job --list

3. Delete a sqoop job

sqoop job --delete mytest1

4. Create a sqoop job

A saved sqoop job keeps track of last-value for us and updates it after each run, which is what makes the incremental import automatic.

sqoop job --create myjobsqoop1 -- import \
--connect jdbc:mysql://172.16.100.173:3306/hdb \
--username root --password oracletest \
--table autoextend \
-m 1 \
--incremental append \
--check-column id \
--last-value 16 \
--fields-terminated-by '\t' \
--hive-import --hive-database hdb --hive-table autoextend

View the job
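
A saved job's definition can be printed with the --show argument listed in step 1. The sketch below assumes the job name used by the --exec commands in step 5; adjust JOB_NAME if the job was created under a different name. In Sqoop 1.4.x the output is a list of stored properties and, as far as I know, includes an incremental.last.value entry, so it is a convenient way to see which value the next run will start from.

```shell
#!/bin/sh
# Job name as used by the --exec commands in step 5 (an assumption; adjust as needed).
JOB_NAME="${JOB_NAME:-myjobsqoop1}"

if command -v sqoop >/dev/null 2>&1; then
  # Prints the options stored for the saved job.
  sqoop job --show "$JOB_NAME"
else
  echo "sqoop is not on PATH; nothing shown" >&2
fi
```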

 

5. Run the job and verify

1) Run with no new data

sqoop job --exec myjobsqoop1

 

2) Run with new data

Insert new rows into MySQL
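
As a rough sketch of this step, a new row can be inserted from the shell with the mysql client. The sample values are made up for illustration; id is AUTO_INCREMENT in the table definition above, so it is left out and MySQL assigns the next value.

```shell
#!/bin/sh
# Hypothetical sample row; id is omitted because the column is AUTO_INCREMENT.
SQL="INSERT INTO autoextend (name, remark) VALUES ('user17', 'inserted after job creation');"

if command -v mysql >/dev/null 2>&1; then
  # -p without a value makes the client prompt for the password.
  mysql -h 172.16.100.173 -P 3306 -u root -p hdb -e "$SQL"
else
  echo "mysql client is not on PATH; statement not executed" >&2
fi
```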

 

Run sqoop job --exec myjobsqoop1

 

 

Check the Hive table
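
One way to check the result, sketched under the assumption that the hive CLI is available on the machine: query only the rows above the last-value the job was created with (16). Because id is a string column in the Hive table shown earlier, the sketch casts it before comparing.

```shell
#!/bin/sh
# id is stored as string in Hive, so cast it for a numeric comparison.
# 16 is the --last-value used when the job was created.
QUERY="SELECT id, name, remark FROM hdb.autoextend WHERE CAST(id AS BIGINT) > 16"

if command -v hive >/dev/null 2>&1; then
  hive -e "$QUERY"
else
  echo "hive is not on PATH; query not run" >&2
fi
```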

 

 

Reposted content:


Create the job

## Note the space between -- and import
bin/sqoop job --create mysql_hive_append -- import --connect jdbc:mysql://hadoop001:3306/learn \
--username root --password 123456 \
--table user \
-m 1 \
--incremental append \
--check-column user_id \
--last-value 0 \
--fields-terminated-by ',' \
--hive-import \
--hive-table zzy.test3
 

sqoop.Sqoop: Got exception running Sqoop:
java.lang.NullPointerException (skip this part if you did not hit this error)

19/09/20 09:57:47 ERROR sqoop.Sqoop: Got exception running Sqoop: 
java.lang.NullPointerException
	at org.json.JSONObject.<init>(JSONObject.java:144)  ##  the missing piece
	at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)
	at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:785)
	at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:399)
	at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.create(HsqldbJobStorage.java:379)
	at org.apache.sqoop.tool.JobTool.createJob(JobTool.java:181)
	at org.apache.sqoop.tool.JobTool.run(JobTool.java:294)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
 

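After a long search, the cause turned out to be a missing jar, java-json.jar. Most of the CSDN downloads for it are paywalled, so some usable download locations are collected below.

Download link (requires a VPN)

Baidu Netdisk

If the same error still appears, the following jars may also be needed
Baidu Netdisk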
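
The text above does not say where the downloaded jar goes. The usual place for extra dependencies is the lib directory of the Sqoop installation, so the sketch below copies it there. The install_jar helper, the default jar location and the /path/to/sqoop-1.4.7 placeholder are assumptions; set SQOOP_HOME to the real installation directory.

```shell
#!/bin/sh
# Copy a jar into a lib directory, refusing to act when either is missing.
install_jar() {
  jar="$1"
  lib_dir="$2"
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  if [ ! -d "$lib_dir" ]; then
    echo "lib directory not found: $lib_dir" >&2
    return 1
  fi
  cp "$jar" "$lib_dir/"
}

# Placeholder paths; SQOOP_HOME must point at the actual Sqoop installation.
install_jar "${JAR:-./java-json.jar}" "${SQOOP_HOME:-/path/to/sqoop-1.4.7}/lib" \
  || echo "nothing was installed" >&2
```

After the copy, creating the job again should show whether the NullPointerException is gone.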

Run the job

bin/sqoop job --exec mysql_hive_append
 

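Even though the password was given when the job was created, Sqoop still asks for the MySQL connection password again on every run. By default Sqoop does not store passwords in its job metastore, which would explain the prompt. I have not resolved this yet; for now, just type the password in.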

[zzy@hadoop001 sqoop-1.4.7]$bin/sqoop job --exec mysql_hive_append
19/09/20 10:20:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/moudle/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/moudle/hive-1.2.1/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Enter password:
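
A possible way around the prompt, sketched here and not tested against this setup, is to create the job with Sqoop's --password-file option instead of --password, so that the saved job only stores a path. The file location, the write_password_file helper and the new job name mysql_hive_append_pw are assumptions for illustration. The file is written without a trailing newline because Sqoop uses the whole file content as the password.

```shell
#!/bin/sh
# Hypothetical local path for the password file.
PW_FILE="${PW_FILE:-$PWD/mysql.pwd}"

# $1 = target path, $2 = password. printf writes no trailing newline.
write_password_file() {
  rm -f "$1"
  ( umask 077; printf '%s' "$2" > "$1" )
  chmod 400 "$1"
}

write_password_file "$PW_FILE" '123456'

if command -v sqoop >/dev/null 2>&1; then
  # file:// makes Sqoop read the local file system instead of the default (HDFS).
  # --last-value 0 only suits a first import; otherwise use the current maximum user_id.
  sqoop job --create mysql_hive_append_pw -- import \
    --connect jdbc:mysql://hadoop001:3306/learn \
    --username root --password-file "file://$PW_FILE" \
    --table user \
    -m 1 \
    --incremental append \
    --check-column user_id \
    --last-value 0 \
    --fields-terminated-by ',' \
    --hive-import \
    --hive-table zzy.test3
else
  echo "sqoop is not on PATH; only the password file was written" >&2
fi
```

The other documented route is to set sqoop.metastore.client.record.password to true in sqoop-site.xml, which lets the metastore keep the password itself.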
 
Parts of this content were compiled from: https://blog.csdn.net/weixin_43326165/article/details/101053116

 

