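The sqoop job tool
The sqoop job tool saves a frequently used command as a named job. A saved job can then be run on a schedule, which makes it a good fit for Sqoop incremental imports of new data.
Syntax: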
$ sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
$ sqoop-job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
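As the examples below show, the part inside [ ] is another Sqoop tool (such as import) together with its arguments, and the two ( ) groups are the generic arguments and the job arguments described in this section.
Arguments of sqoop job: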
Argument | Description
--create <job-id> | Define a new saved job with the specified job-id (name). A second Sqoop command-line, separated by a --, should be specified; this defines the saved job.
--delete <job-id> | Delete a saved job.
--exec <job-id> | Given a job defined with --create, run the saved job. When running the job, the parameters saved at creation time can be overridden by supplying arguments after a --.
--show <job-id> | Show the parameters for a saved job.
--list | List all saved jobs.
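The override behaviour of --exec is easier to see in a command. The following is a minimal sketch, not part of the original setup: `SQOOP_BIN`, `show_job` and `exec_job_as` are hypothetical names introduced for illustration, while `--show`, `--exec`, `--username` and `-P` are the regular Sqoop options.

```shell
#!/bin/sh
# SQOOP_BIN is a hypothetical override; by default the sqoop found on PATH is used.
SQOOP_BIN="${SQOOP_BIN:-sqoop}"

# Print the parameters stored for a saved job.
show_job() {
    "$SQOOP_BIN" job --show "$1"
}

# Run a saved job, but override the stored username and
# prompt for the password (-P) instead of using the saved one.
exec_job_as() {
    "$SQOOP_BIN" job --exec "$1" -- --username "$2" -P
}
```

For example, `exec_job_as testdata_nodes someuser` runs the saved job with a different database user; everything after the -- replaces the corresponding saved argument for that run only.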
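Creating a job for automatic incremental import:
Automatically import the nodes table
Sqoop can import all of the data into Hive, but an append-mode incremental import like the one below only picks up new rows, so UPDATE and DELETE operations on the source data (MySQL) are not synchronized to Hive.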
sqoop job --create testdata_nodes -- import --connect jdbc:mysql://192.168.10.80:33060/testdata --username root --password lovelsl --table nodes --hive-import --hive-table testdata.nodes --null-string '\\N' --null-non-string '\\N' --incremental append --check-column id --last-value 415
[root@localhost ~]# sqoop job --create testdata_nodes -- import --connect jdbc:mysql://192.168.10.80:33060/testdata --username root --password lovelsl --table nodes --hive-import --hive-table testdata.nodes --null-string '\\N' --null-non-string '\\N' --incremental append --check-column id --last-value 415
Warning: /lovelsl/sqoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /lovelsl/sqoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /lovelsl/sqoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/07/25 21:23:05 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/07/25 21:23:07 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/07/25 21:23:07 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/07/25 21:23:07 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
[root@localhost ~]#
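The --last-value 415 given at creation time is only the starting point: a saved incremental job stores the updated value after each successful run, so later runs continue from where the previous one stopped. Below is a rough sketch for checking the stored value. It assumes that `sqoop job --show` prints a line of the form `incremental.last.value = <value>`, which should be verified against your own output; `SQOOP_BIN` and `last_value` are hypothetical names.

```shell
#!/bin/sh
# SQOOP_BIN is a hypothetical override; by default the sqoop found on PATH is used.
SQOOP_BIN="${SQOOP_BIN:-sqoop}"

# Print the last value recorded for a saved incremental job.
# Assumption: --show emits a line like "incremental.last.value = 415".
last_value() {
    "$SQOOP_BIN" job --show "$1" |
        awk -F' = ' '$1 ~ /incremental\.last\.value$/ { print $2 }'
}
```

Calling `last_value testdata_nodes` before and after a run is a quick way to confirm that the job actually advanced.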
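Running the job
Note that by default Sqoop asks for the database password every time the job runs. To avoid the prompt, set sqoop.metastore.client.record.password to true in conf/sqoop-site.xml.
The configuration is: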
<property>
<name>sqoop.metastore.client.record.password</name>
<value>true</value>
</property>
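An alternative that is not used in this walkthrough is Sqoop's --password-file option, which keeps the password out of both the command line and the metastore. The sketch below only prepares such a file; the path in `PASSWORD_FILE` and the helper name `write_password_file` are hypothetical. Sqoop reads the whole file as the password, so the file is written without a trailing newline.

```shell
#!/bin/sh
# Hypothetical location for the password file; not part of the original setup.
PASSWORD_FILE="${PASSWORD_FILE:-$HOME/.sqoop_testdata.password}"

# Write the password without a trailing newline and make the file
# readable by its owner only.
write_password_file() {
    (
        umask 077
        rm -f "$PASSWORD_FILE"
        printf '%s' "$1" > "$PASSWORD_FILE"
        chmod 400 "$PASSWORD_FILE"
    )
}

# The file is then referenced when the job is created, for example:
#   sqoop job --create testdata_nodes -- import ... \
#       --username root --password-file "file://$PASSWORD_FILE" ...
```

The file:// prefix is meant to point Sqoop at the local file system; without a scheme the path is resolved against the default Hadoop file system, which is usually HDFS.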
Run the command:
sqoop job --exec testdata_nodes
[root@localhost ~]# sqoop job --exec testdata_nodes
Warning: /lovelsl/sqoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /lovelsl/sqoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /lovelsl/sqoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/07/26 00:32:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
........
Deleting the job
sqoop job --delete testdata_nodes
Scheduling the job:
Setting up the scheduled task on CentOS 7
[root@localhost shell]# cat cron.sh
#!/bin/sh
#
# Provides the scheduled entry point for the sqoop job
#
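# Add the entry (daily at 12:30) to root's own crontab. /etc/crontab is left
# alone because its entries need an extra user field.
(crontab -l 2>/dev/null; echo "30 12 * * * /lovelsl/dev/shell/sqoop_job.sh") | crontab -
# Start crond at boot and also start it now
systemctl enable crond
systemctl start crond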
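Running cron.sh a second time would add the same entry again. One way to make the step repeatable is to append the line only when it is missing. This is a small sketch under that assumption; `CRON_LINE` repeats the schedule used above (minute 30, hour 12, every day) and `merge_cron_line` is a hypothetical helper name.

```shell
#!/bin/sh
# Same schedule as above: minute 30, hour 12, every day of every month.
CRON_LINE='30 12 * * * /lovelsl/dev/shell/sqoop_job.sh'

# Read a crontab on stdin and print it unchanged, appending the given
# line only if an identical line is not already present.
merge_cron_line() {
    awk -v line="$1" '
        $0 == line { found = 1 }
        { print }
        END { if (!found) print line }
    '
}

# Usage (installs the merged result as the current user's crontab):
#   crontab -l 2>/dev/null | merge_cron_line "$CRON_LINE" | crontab -
```

Because the helper only filters text, it can be checked on sample input before it touches the real crontab.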
The Sqoop incremental import script on CentOS 7
[root@localhost shell]# cat sqoop_job.sh
#!/bin/sh
#
# Lists every sqoop job that needs to be started
#
sqoop job --exec testdata_nodes
[root@localhost shell]#
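When the script runs from cron nobody is watching the terminal, so it helps to keep a log and the exit status of each run. The following is a minimal sketch of such a wrapper, not the original script: `SQOOP_BIN`, `LOG_FILE` and `run_job` are hypothetical names. Since cron starts jobs with a minimal environment, `SQOOP_BIN` may need to be set to the absolute path of the sqoop launcher.

```shell
#!/bin/sh
# Hypothetical settings; adjust the paths to the local installation.
SQOOP_BIN="${SQOOP_BIN:-sqoop}"
LOG_FILE="${LOG_FILE:-/tmp/sqoop_job.log}"

# Run one saved job, append its output to LOG_FILE and
# return the exit status of the sqoop command.
run_job() {
    job="$1"
    echo "$(date '+%Y-%m-%d %H:%M:%S') start $job" >> "$LOG_FILE"
    if "$SQOOP_BIN" job --exec "$job" >> "$LOG_FILE" 2>&1; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') ok $job" >> "$LOG_FILE"
    else
        status=$?
        echo "$(date '+%Y-%m-%d %H:%M:%S') failed $job (exit $status)" >> "$LOG_FILE"
        return "$status"
    fi
}
```

In sqoop_job.sh each job would then be started with a line such as `run_job testdata_nodes`, and a failed import leaves a "failed" line in the log instead of disappearing silently.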