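The sqoop job tool
The sqoop job tool saves a frequently used command as a named job. Combined with a scheduler such as cron, a saved job can be run periodically, for example to import new rows incrementally with Sqoop.
Syntax: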
$ sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
$ sqoop-job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
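As the examples below show, the part in [ ] is another Sqoop tool (such as import) together with its own arguments. Of the two ( ) groups, (generic-args) are the generic Hadoop arguments and (job-args) are the job arguments described in this section.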
Arguments of sqoop job:
Argument | Description
--create <job-id> | Define a new saved job with the specified job-id (name). A second Sqoop command-line, separated by a --, should be specified; this defines the saved job.
--delete <job-id> | Delete a saved job.
--exec <job-id> | Given a job defined with --create, run the saved job. When running the job, arguments supplied after -- override the ones set at creation time.
--show <job-id> | Show the parameters for a saved job.
--list | List all saved jobs.
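The job arguments are typically used in sequence: list the saved jobs, inspect one, then run it. The following is a minimal sketch and not part of the original notes. The run helper, the DRY_RUN variable and the inspect_and_run function are hypothetical names introduced for illustration, and someuser is a placeholder user name. With DRY_RUN=1 the commands are only printed, which makes it easy to check what would be executed.

```shell
#!/bin/sh
# run: hypothetical helper. Prints the command when DRY_RUN=1, otherwise executes it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# inspect_and_run: hypothetical function chaining the job arguments described above.
inspect_and_run() {
    job="$1"
    run sqoop job --list            # list all saved jobs
    run sqoop job --show "$job"     # show the parameters stored for this job
    # Arguments after "--" override the values saved at creation time;
    # here the user name is replaced and -P prompts for the password.
    run sqoop job --exec "$job" -- --username someuser -P
}
```

For example, DRY_RUN=1 followed by inspect_and_run testdata_nodes prints the three sqoop commands without contacting the database.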
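Creating a job for automatic incremental import:
Automatically import the nodes table
Sqoop can import all of the data into Hive, but with an append-mode incremental import, later update and delete operations on the source data (MySQL) are not synchronized to Hive; only newly added rows are imported.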
sqoop job --create testdata_nodes -- import --connect jdbc:mysql://192.168.10.80:33060/testdata --username root --password lovelsl --table nodes --hive-import --hive-table testdata.nodes --null-string '\\N' --null-non-string '\\N' --incremental append --check-column id --last-value 415
[root@localhost ~]# sqoop job --create testdata_nodes -- import --connect jdbc:mysql://192.168.10.80:33060/testdata --username root --password lovelsl --table nodes --hive-import --hive-table testdata.nodes --null-string '\\N' --null-non-string '\\N' --incremental append --check-column id --last-value 415
Warning: /lovelsl/sqoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /lovelsl/sqoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /lovelsl/sqoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/07/25 21:23:05 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/07/25 21:23:07 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/07/25 21:23:07 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/07/25 21:23:07 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
[root@localhost ~]#
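Because the job was created with --incremental append, Sqoop keeps the last imported value of the check column in the saved job and updates it after each successful run, so --last-value 415 is only the starting point. A minimal sketch for reading the stored value follows. It assumes that sqoop job --show prints the value as a line of the form "incremental.last.value = <value>"; job_last_value is a hypothetical helper name.

```shell
#!/bin/sh
# job_last_value: hypothetical helper that extracts the stored last value
# from the output of "sqoop job --show <job-id>".
# Assumption: the value appears as "incremental.last.value = <value>".
job_last_value() {
    sqoop job --show "$1" 2>/dev/null |
        awk -F' = ' '$1 == "incremental.last.value" { print $2 }'
}
```

Calling job_last_value testdata_nodes before and after a run is a quick way to confirm that new rows were picked up. If the password is not recorded in the metastore, sqoop job --show may prompt for it first.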
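Running the job
Note that by default Sqoop prompts for the database password each time a saved job runs, because passwords are not stored in the metastore. To avoid the prompt, set sqoop.metastore.client.record.password to true in conf/sqoop-site.xml, so that the password given at creation time is saved with the job.
The configuration is: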
<property>
<name>sqoop.metastore.client.record.password</name>
<value>true</value>
</property>
Run the command:
sqoop job --exec testdata_nodes
[root@localhost ~]# sqoop job --exec testdata_nodes
Warning: /lovelsl/sqoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /lovelsl/sqoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /lovelsl/sqoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/07/26 00:32:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
........
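The Enter password: prompt in the transcript can also be avoided with Sqoop's --password-file argument instead of recording the password in the metastore. The sketch below only prepares such a file; write_password_file and any path passed to it are hypothetical. Sqoop reads the whole file as the password, including a trailing newline, which is why printf '%s' is used rather than echo.

```shell
#!/bin/sh
# write_password_file: hypothetical helper that stores a password
# in a file suitable for --password-file.
write_password_file() {
    file="$1"
    password="$2"
    # Create the file with owner-only permissions before writing to it.
    ( umask 077 && : > "$file" ) || return 1
    # printf '%s' writes the password without a trailing newline.
    printf '%s' "$password" > "$file" || return 1
    # Make the file read-only for its owner.
    chmod 400 "$file"
}
```

The job would then be created with --password-file pointing at that file in place of --password. Depending on the cluster's default file system, a local path may need a file:// prefix; that detail is an assumption to verify against the installation.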
Deleting the job
sqoop job --delete testdata_nodes
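Deleting is also the usual first step when a job definition has to change, since the job is then created again under the same name. The sketch below deletes a job only if it is present. It assumes that sqoop job --list prints each job name on its own (possibly indented) line; job_exists and delete_job_if_exists are hypothetical helper names.

```shell
#!/bin/sh
# job_exists: hypothetical helper. Succeeds if the job name appears
# as the first word of a line in "sqoop job --list".
job_exists() {
    sqoop job --list 2>/dev/null |
        awk -v job="$1" '$1 == job { found = 1 } END { exit !found }'
}

# delete_job_if_exists: hypothetical helper. Deletes the job only when it is listed.
delete_job_if_exists() {
    if job_exists "$1"; then
        sqoop job --delete "$1"
    fi
}
```

Running delete_job_if_exists testdata_nodes before the sqoop job --create command makes a setup script safe to run repeatedly.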
Scheduling the job:
Scheduled task setup on CentOS 7
[root@localhost shell]# cat cron.sh
#!/bin/sh
#
# Provides the scheduled start-up entry point for the sqoop job
#
# Entries in /etc/crontab need a user field (root here) before the command
echo "30 12 * * * root /lovelsl/dev/shell/sqoop_job.sh" >> /etc/crontab
systemctl enable crond
Sqoop incremental import job on CentOS 7
[root@localhost shell]# cat sqoop_job.sh
#!/bin/sh
#
# Lists all the sqoop jobs that need to be started
#
sqoop job --exec testdata_nodes
[root@localhost shell]#
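When several jobs are scheduled, it helps to log each run and to notice failures, since cron itself gives little feedback. The following is a sketch of a more defensive variant of sqoop_job.sh, not the original script. JOBS, LOG_DIR and run_jobs are hypothetical names, and the log directory is an arbitrary default. cron runs commands with a minimal PATH, so the directory containing the sqoop executable may have to be added explicitly.

```shell
#!/bin/sh
# Sketch of a cron-friendly runner for saved sqoop jobs.
# JOBS and LOG_DIR are hypothetical settings; adjust them to the installation.
JOBS="${JOBS:-testdata_nodes}"
LOG_DIR="${LOG_DIR:-/tmp/sqoop_job_logs}"

# cron uses a minimal PATH; uncomment and adapt if sqoop is not found.
# PATH="/path/to/sqoop/bin:$PATH"; export PATH

# run_jobs: runs every job named in JOBS, one log file per job.
# Returns non-zero if at least one job failed.
run_jobs() {
    mkdir -p "$LOG_DIR" || return 1
    failed=0
    for job in $JOBS; do
        log="$LOG_DIR/$job.log"
        echo "$(date '+%Y-%m-%d %H:%M:%S') starting $job" >> "$log"
        if sqoop job --exec "$job" >> "$log" 2>&1; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') finished $job" >> "$log"
        else
            echo "$(date '+%Y-%m-%d %H:%M:%S') FAILED $job" >> "$log"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

To use it as the scheduled script, add a final line that calls run_jobs. A failing job does not stop the remaining ones, and the FAILED lines in the logs can be searched afterwards.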