Importing Data from MySQL into HDFS with Sqoop Import, and Exporting Data from HDFS to MySQL with Sqoop Export (Part 5)


 

 

 

  Below, working with HDFS, we look at how Sqoop imports data from and exports data to a relational database.


1. Importing MySQL data into HDFS via Sqoop import
  Its job is to import data from a relational database into HDFS; the flow is illustrated in the diagram below.

  Let's walk through the Sqoop import flow. First the user issues a sqoop import command. Sqoop then fetches metadata from the relational database:
what the schema of the target table looks like, which columns it has, and what data type each column is.
  With that information it turns the command into a map-only MapReduce job.
The job runs many map tasks, each reading one slice of the table, so the slices are copied in parallel and the whole dataset lands in HDFS quickly.

 

 

 

 

 

Sqoop Import into HDFS (alongside the official docs)

  As for the concrete steps, try them yourself!

 

 

 

 

   Before that, start the Hadoop cluster with sbin/start-all.sh. I won't go into the details here.

  Also start the MySQL database. Again, no details here.

 

 

  Also, since the later sqoop runs will produce logs and other files, I first create a dedicated directory to hold them. Create it under whichever directory you plan to run the subsequent sqoop commands from. (Because the environment variables are already configured, sqoop can be run from any path.)

[hadoop@djt002 sqoop]$ pwd
/usr/local/sqoop
[hadoop@djt002 sqoop]$ ll
total 4
drwxr-xr-x. 9 hadoop hadoop 4096 Apr 27  2015 sqoop-1.4.6
[hadoop@djt002 sqoop]$ mkdir sqoopRunCreate
[hadoop@djt002 sqoop]$ ll
total 8
drwxr-xr-x. 9 hadoop hadoop 4096 Apr 27  2015 sqoop-1.4.6
drwxrwxr-x. 2 hadoop hadoop 4096 Mar 17 23:33 sqoopRunCreate
[hadoop@djt002 sqoop]$ cd sqoopRunCreate/
[hadoop@djt002 sqoopRunCreate]$ pwd
/usr/local/sqoop/sqoopRunCreate
[hadoop@djt002 sqoopRunCreate]$

 

   From here on I run the sqoop commands from this directory, /usr/local/sqoop/sqoopRunCreate.

 

 

 

 

 

 Sqoop Import scenario — password access

   (1) Plain-text password

[hadoop@djt002 sqoopRunCreate]$ sqoop list-databases \
> --connect jdbc:mysql://192.168.80.200/ \
> --username hive \
> --password hive

 

 

 

   (2) Interactive password prompt

 

[hadoop@djt002 sqoopRunCreate]$ sqoop list-databases \
> --connect jdbc:mysql://192.168.80.200/ \
> --username hive \
> -P

Enter password: (type hive)

 

 

 

  (3) Password-file access

  The official docs put the password file in the home directory and require 400 permissions on it, so:

[hadoop@djt002 ~]$ pwd
/home/hadoop
[hadoop@djt002 ~]$ echo -n "hive" > .password
[hadoop@djt002 ~]$ ls -a
.            .bash_history  .cache   djt        flume    .gnote           .gvfs            .local          .nautilus  .pulse         Videos       .xsession-errors
..           .bash_logout   .config  Documents  .gconf   .gnupg           .hivehistory     .mozilla        .password  .pulse-cookie  .vim         .xsession-errors.old
.abrt        .bash_profile  .dbus    Downloads  .gconfd  .gstreamer-0.10  .ICEauthority    Music           Pictures   .ssh           .viminfo
anagram.jar  .bashrc        Desktop  .esd_auth  .gnome2  .gtk-bookmarks   .imsettings.log  .mysql_history  Public     Templates      .Xauthority
[hadoop@djt002 ~]$ more .password 
hive
[hadoop@djt002 ~]$ 

 

[hadoop@djt002 ~]$ chmod 400 .password 

[hadoop@djt002 sqoopRunCreate]$ sqoop list-databases \
> --connect jdbc:mysql://192.168.80.200/ \
> --username hive \
> --password-file /home/hadoop/.password


java.io.IOException: The provided password file /home/hadoop/.password does not exist!

  The import fails because Sqoop resolves --password-file through the Hadoop FileSystem, so it looks for the path in HDFS rather than on the local disk. Upload the file to HDFS and restrict its permissions there:
[hadoop@djt002 local]$ $HADOOP_HOME/bin/hadoop dfs -put /home/hadoop/.password /user/hadoop



[hadoop@djt002 local]$ $HADOOP_HOME/bin/hadoop dfs -chmod 400 /user/hadoop/.password

 

   

 

  With the password file now stored in HDFS, the local copy can be removed:

[hadoop@djt002 ~]$ rm .password 
rm: remove write-protected regular file `.password'? y

 

 

 

 

[hadoop@djt002 sqoopRunCreate]$ sqoop list-databases \
> --connect jdbc:mysql://192.168.80.200/ \
> --username hive \
> --password-file /user/hadoop/.password

 

 

 

 

 

 

 Sqoop Import scenario — importing a full table

  (1) Without specifying a target directory (the default is /user/hadoop/)

 

 

 

  Here I'd also like you to try another piece of software. (I take this route so that you get used to adapting and teaching yourselves.) (Don't find it tedious!)

Downloading, installing and using SQLyog, a MySQL client

 

   Here we create a new table inside the hive database, named djt-user (see the note below).

 

 

   If SQLyog does not display Chinese data correctly: run SET character_set_results = gb2312 (or gbk) in SQLyog, execute it, then restart SQLyog, and the Chinese rows you inserted should display properly.

If SQLyog cannot insert Chinese (i.e. change the default latin1 encoding to UTF-8)

 

  Note: my data table is djt-user. I renamed it here!
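  The DDL used to create the table was done in SQLyog and isn't reproduced here. As a rough sketch only, with column names and types inferred from the rows imported later (so treat them as assumptions), the table could be created from the shell like this:

# Hypothetical DDL -- columns (id, name, sex, age, profile) inferred from the rows imported below
mysql -h 192.168.80.200 -u hive -p hive <<'SQL'
CREATE TABLE `djt-user` (
  id      INT PRIMARY KEY AUTO_INCREMENT,
  name    VARCHAR(50),
  sex     VARCHAR(10),
  age     INT,
  profile VARCHAR(100)
) DEFAULT CHARSET=utf8;
SQL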

 

[hadoop@djt002 sqoopRunCreate]$ sqoop import \
> --connect jdbc:mysql://192.168.80.200/hive \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt-user

  If the Chinese columns come out garbled in HDFS, set the MySQL character sets to utf8 and verify the change:

SET character_set_database=utf8;
SET character_set_server=utf8;
SHOW VARIABLES LIKE 'character%'; 

  Then remove the previous output directory from HDFS and run the import again:
[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -rmr /user/hadoop/djt-user

 

 

[hadoop@djt002 sqoopRunCreate]$ sqoop import --connect jdbc:mysql://192.168.80.200/hive --username hive --password-file /user/hadoop/.password --table djt-user
Warning: /usr/local/sqoop/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/03/18 04:17:10 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase/hbase-1.2.3/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/03/18 04:17:14 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/03/18 04:17:14 INFO tool.CodeGenTool: Beginning code generation
17/03/18 04:17:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `djt-user` AS t LIMIT 1
17/03/18 04:17:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `djt-user` AS t LIMIT 1
17/03/18 04:17:15 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop/hadoop-2.6.0
Note: /tmp/sqoop-hadoop/compile/38104c9fe28c7f43fdb42c26826dbf91/djt_user.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/03/18 04:17:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/38104c9fe28c7f43fdb42c26826dbf91/djt-user.jar
17/03/18 04:17:21 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/03/18 04:17:21 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/03/18 04:17:21 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/03/18 04:17:21 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/03/18 04:17:21 INFO mapreduce.ImportJobBase: Beginning import of djt-user
17/03/18 04:17:21 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/03/18 04:17:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/03/18 04:17:22 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

17/03/18 04:17:30 INFO db.DBInputFormat: Using read commited transaction isolation
17/03/18 04:17:30 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `djt-user`
17/03/18 04:17:31 INFO mapreduce.JobSubmitter: number of splits:3
17/03/18 04:17:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1489767532299_0002
17/03/18 04:17:33 INFO impl.YarnClientImpl: Submitted application application_1489767532299_0002
17/03/18 04:17:33 INFO mapreduce.Job: The url to track the job: http://djt002:8088/proxy/application_1489767532299_0002/
17/03/18 04:17:33 INFO mapreduce.Job: Running job: job_1489767532299_0002
17/03/18 04:18:03 INFO mapreduce.Job: Job job_1489767532299_0002 running in uber mode : false
17/03/18 04:18:03 INFO mapreduce.Job: map 0% reduce 0%

17/03/18 04:19:09 INFO mapreduce.Job: map 67% reduce 0%
17/03/18 04:19:12 INFO mapreduce.Job: map 100% reduce 0%
17/03/18 04:19:13 INFO mapreduce.Job: Job job_1489767532299_0002 completed successfully
17/03/18 04:19:13 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=370638
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=295
HDFS: Number of bytes written=105
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Job Counters
Launched map tasks=3
Other local map tasks=3
Total time spent by all maps in occupied slots (ms)=174022
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=174022
Total vcore-seconds taken by all map tasks=174022
Total megabyte-seconds taken by all map tasks=178198528
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=295
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=5172
CPU time spent (ms)=9510

Physical memory (bytes) snapshot=362741760
Virtual memory (bytes) snapshot=2535641088
Total committed heap usage (bytes)=181862400
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=105
17/03/18 04:19:13 INFO mapreduce.ImportJobBase: Transferred 105 bytes in 111.9157 seconds (0.9382 bytes/sec)
17/03/18 04:19:13 INFO mapreduce.ImportJobBase: Retrieved 3 records.
[hadoop@djt002 sqoopRunCreate]$


[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -cat /user/hadoop/djt-user/part-m-*
1,王菲,female,36,歌手
2,謝霆鋒,male,30,歌手
3,周傑倫,male,33,導演
[hadoop@djt002 ~]$ 

 

 

 

   Summary

 
         
Without specifying a target directory:

sqoop import \
--connect jdbc:mysql://192.168.128.200/hive \
--username hive \
--password-file /user/hadoop/.password \
--table djt_user

Without specifying a target directory, with the connection encoding set explicitly (recommended):

sqoop import \
--connect 'jdbc:mysql://192.168.128.200/hive?useUnicode=true&characterEncoding=utf-8' \
--username hive \
--password-file /user/hadoop/.password \
--table djt_user

     That is, the MySQL table djt_user is imported into HDFS by Sqoop and lands under /user/hadoop/djt_user.

 

 

 

 

 

 

 

 

 

(2) Specifying a target directory

  Any directory can be specified; a sketch of the command is given below.
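  The exact command run at this step isn't captured above (it was shown in a screenshot). A minimal sketch, assuming the same connection settings as the full-table import and the /sqoop/test/djt_user directory that the listing below reads from:

[hadoop@djt002 sqoopRunCreate]$ sqoop import \
> --connect jdbc:mysql://192.168.80.200/hive \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt_user \
> --target-dir /sqoop/test/djt_user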

 

 

 

[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -cat /sqoop/test/djt_user/part-m-*
1,王菲,female,36,歌手
2,謝霆鋒,male,30,歌手
3,周傑倫,male,33,導演
[hadoop@djt002 ~]$ 

 

 

 

 

 

   From here on, for consistency and standardization, the data table djt_user is used.

 

 

 

 

   (3) The target directory already exists

[hadoop@djt002 sqoopRunCreate]$ sqoop import \
> --connect jdbc:mysql://192.168.80.200/hive \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt_user \
> --target-dir /sqoop/test/djt_user \
> --delete-target-dir

 

 

 

Warning: /usr/local/sqoop/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/03/18 04:43:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase/hbase-1.2.3/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/03/18 04:43:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/03/18 04:43:45 INFO tool.CodeGenTool: Beginning code generation
17/03/18 04:43:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `djt_user` AS t LIMIT 1
17/03/18 04:43:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `djt_user` AS t LIMIT 1
17/03/18 04:43:46 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop/hadoop-2.6.0
Note: /tmp/sqoop-hadoop/compile/1fae17dd362476d95608e216756efa34/djt_user.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/03/18 04:43:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/1fae17dd362476d95608e216756efa34/djt_user.jar
17/03/18 04:43:52 INFO tool.ImportTool: Destination directory /sqoop/test/djt_user deleted.
17/03/18 04:43:52 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/03/18 04:43:52 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/03/18 04:43:52 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

17/03/18 04:43:52 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/03/18 04:43:52 INFO mapreduce.ImportJobBase: Beginning import of djt_user
17/03/18 04:43:52 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/03/18 04:43:52 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/03/18 04:43:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/03/18 04:44:02 INFO db.DBInputFormat: Using read commited transaction isolation
17/03/18 04:44:02 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `djt_user`
17/03/18 04:44:02 INFO mapreduce.JobSubmitter: number of splits:3
17/03/18 04:44:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1489767532299_0005
17/03/18 04:44:03 INFO impl.YarnClientImpl: Submitted application application_1489767532299_0005
17/03/18 04:44:03 INFO mapreduce.Job: The url to track the job: http://djt002:8088/proxy/application_1489767532299_0005/
17/03/18 04:44:03 INFO mapreduce.Job: Running job: job_1489767532299_0005
17/03/18 04:44:23 INFO mapreduce.Job: Job job_1489767532299_0005 running in uber mode : false
17/03/18 04:44:23 INFO mapreduce.Job: map 0% reduce 0%

17/03/18 04:45:21 INFO mapreduce.Job: map 67% reduce 0%
17/03/18 04:45:23 INFO mapreduce.Job: map 100% reduce 0%
17/03/18 04:45:23 INFO mapreduce.Job: Job job_1489767532299_0005 completed successfully
17/03/18 04:45:24 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=370635
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=295
HDFS: Number of bytes written=80
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Job Counters
Launched map tasks=3
Other local map tasks=3
Total time spent by all maps in occupied slots (ms)=163316
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=163316
Total vcore-seconds taken by all map tasks=163316
Total megabyte-seconds taken by all map tasks=167235584
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=295
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=3240
CPU time spent (ms)=8480

Physical memory (bytes) snapshot=356696064
Virtual memory (bytes) snapshot=2535596032
Total committed heap usage (bytes)=181862400
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=80
17/03/18 04:45:24 INFO mapreduce.ImportJobBase: Transferred 80 bytes in 91.6189 seconds (0.8732 bytes/sec)
17/03/18 04:45:24 INFO mapreduce.ImportJobBase: Retrieved 3 records.
[hadoop@djt002 sqoopRunCreate]$

 

 

 

 

 

 

 

 Sqoop Import scenario — controlling parallelism

  (1) Controlling parallelism

    The default is 4 map tasks; since my data set is tiny, specifying 1 is enough.

[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -rm  /sqoop/test/djt_user/part-m-*

 

 

 

 

[hadoop@djt002 sqoopRunCreate]$ sqoop import \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt_user \
> --target-dir /sqoop/test/djt_user \
> --delete-target-dir \
> -m 1

 

 

 

 

 

  

   You may run into the following problem here.

Resolving the Sqoop error ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter

 

[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -cat /sqoop/test/djt_user/part-m-*


Sqoop Import scenario — controlling the field delimiter

    (1) Controlling the field delimiter

   Note: the default field delimiter is a comma; here we define our own.

[hadoop@djt002 sqoopRunCreate]$ sqoop import \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt_user \
> --target-dir /sqoop/test/djt_user \
> --delete-target-dir \
> -m 1 \
> --fields-terminated-by "@"

 

  Here the MySQL table djt_user is imported by Sqoop into HDFS, under /sqoop/test/djt_user.

 

 

 

 

 

[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -cat /sqoop/test/djt_user/part-m-*
1@王菲@female@36@歌手
2@謝霆鋒@male@30@歌手
3@周傑倫@male@33@導演
[hadoop@djt002 ~]$ 


  (2) Manual incremental import

            In MySQL, we add rows with id 4, 5 and 6.

[hadoop@djt002 sqoopRunCreate]$ sqoop import \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt_user \
> --target-dir /sqoop/test/djt_user \
> -m 1 \
> --fields-terminated-by "@" \
> --append \
> --check-column 'id' \
> --incremental append \
> --last-value 3

 

  Here the MySQL table djt_user is imported by Sqoop into HDFS, under /sqoop/test/djt_user.

 

 

[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -cat /sqoop/test/djt_user/part-m-*
1@王菲@female@36@歌手
2@謝霆鋒@male@30@歌手
3@周傑倫@male@33@導演
4@王力宏@male@40@演員
5@張三@male@39@無業游民
6@李四@female@18@學生
[hadoop@djt002 ~]$ 

 

 

 

 

 

 

      (3) Automatic incremental import (as a saved job)

[hadoop@djt002 sqoopRunCreate]$ sqoop job \
> --create job_import_djt_user \
> -- import \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt_user \
> --target-dir /sqoop/test/djt_user \
> -m 1 \
> --fields-terminated-by "@" \
> --append \
> --check-column 'id' \
> --incremental append \
> --last-value 6


[hadoop@djt002 sqoopRunCreate]$ sqoop job --exec job_import_djt_user

   Here the MySQL table djt_user is imported by Sqoop into HDFS, under /sqoop/test/djt_user.

 

 

 

  Delete a job

 [hadoop@djt002 sqoopRunCreate]$ sqoop job --delete job_import_djt_user

 

 

 

  List the currently available jobs

 [hadoop@djt002 sqoopRunCreate]$ sqoop job --list

 

 

 

 

   Show the details of a specific job

 [hadoop@djt002 sqoopRunCreate]$  sqoop job --show job_import_djt_user

[hadoop@djt002 sqoopRunCreate]$ sqoop job --show
Warning: /usr/local/sqoop/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/03/18 06:50:58 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Missing argument for option: show
[hadoop@djt002 sqoopRunCreate]$ clear
[hadoop@djt002 sqoopRunCreate]$ sqoop job --show job_import_djt_user
Warning: /usr/local/sqoop/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/03/18 06:51:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase/hbase-1.2.3/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Job: job_import_djt_user
Tool: import
Options:
----------------------------
verbose = false
incremental.last.value = 10
db.connect.string = jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false

codegen.input.delimiters.field = 0
hbase.create.table = false
hdfs.append.dir = true
db.table = djt_user
codegen.input.delimiters.escape = 0
import.fetch.size = null
accumulo.create.table = false
codegen.input.delimiters.enclose.required = false
db.username = hive
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = id
codegen.input.delimiters.record = 0
db.password.file = /user/hadoop/.password
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false

codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
codegen.output.delimiters.field = 64
export.new.update = UpdateOnly
incremental.mode = AppendRows
hdfs.file.format = TextFile
codegen.compile.dir = /tmp/sqoop-hadoop/compile/d81bf23cb3eb8eb11e7064a16df0b92b
direct.import = false
hdfs.target.dir = /sqoop/test/djt_user
hive.fail.table.exists = false
db.batch = false
[hadoop@djt002 sqoopRunCreate]$

 

 

 

 

 Sqoop Import scenario — enabling compression

   Enabling compression

   The default codec is gzip; see the official docs for the details.

[hadoop@djt002 sqoopRunCreate]$ sqoop import \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> -table djt_user \
> --target-dir /sqoop/test/djt_user \
> --delete-target-dir \
> -m 1 \
> --fields-terminated-by "@" \
> -z

   Here the MySQL table djt_user is imported by Sqoop into HDFS, under /sqoop/test/djt_user.
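   The result was shown in a screenshot that isn't reproduced here. As a hedged check, you could simply list the output directory; with the default gzip codec the part files should carry a .gz extension:

[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -ls /sqoop/test/djt_user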

 

 

 

 

 

[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -cat /sqoop/test/djt_user/part-m-*

 

 

 

 

 

Sqoop Import scenario — handling null values on import

  (1) Handling null values

   First, run the import without any null handling and see what we get.

[hadoop@djt002 sqoopRunCreate]$ sqoop import \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> -table djt_user \
> --target-dir /sqoop/test/djt_user \
> --delete-target-dir \
> -m 1 \
> --fields-terminated-by "@"
> 

   Here the MySQL table djt_user is imported by Sqoop into HDFS, under /sqoop/test/djt_user.

 

 

 

[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -cat /sqoop/test/djt_user/part-m-*

 

 

 

   So in general null needs converting, i.e. the empty values need handling. For the age column, for instance, you might hard-code 18, or use 0, and so on.

[hadoop@djt002 sqoopRunCreate]$ sqoop import \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> -table djt_user \
> --target-dir /sqoop/test/djt_user \
> --delete-target-dir \
> -m 1 \
> --fields-terminated-by "@" \
> --null-non-string "###" \
> --null-string "###"

  Here the MySQL table djt_user is imported by Sqoop into HDFS, under /sqoop/test/djt_user.

 

   Here I convert null values into ###; you can convert them into whatever suits your needs. I won't belabor it; adapt it to your own case.

[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -cat /sqoop/test/djt_user/part-m-*

 

 

 

 

   Now consider the following scenario: what if I don't need every field imported, only some subset of the non-null fields? How is that done?

Sqoop Import scenario — importing part of the data

  (1) Using --columns

  That is, import only a specified field or fields.

   For example, here I import only id and name; you can of course specify more. This is just a reference example to get you started.

[hadoop@djt002 sqoopRunCreate]$ sqoop import \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> -table djt_user \
> --columns id,name \
> --target-dir /sqoop/test/djt_user \
> --delete-target-dir \
> -m 1 \
> --fields-terminated-by "@" \
> --null-non-string "###" \
> --null-string "###"

  Here the MySQL table djt_user is imported by Sqoop into HDFS, under /sqoop/test/djt_user.

 

 

 

 

[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -cat /sqoop/test/djt_user/part-m-*
1@王菲
2@謝霆鋒
3@周傑倫
4@王力宏
5@張三
6@李四
7@王五
8@王六
9@小王
10@小林
[hadoop@djt002 ~]$ 

 

 

  (2) Using --where

   We just imported specific columns; we can also filter rows to reach a similar goal.

  For example, here I only want to import the rows where sex='female'.

 

[hadoop@djt002 sqoopRunCreate]$ sqoop import \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt_user \
> --where "sex='female'" \
> --target-dir /sqoop/test/djt_user \
> --delete-target-dir \
> -m 1 \
> --fields-terminated-by "@" \
> --null-non-string "###" \
> --null-string "###"

   Here the MySQL table djt_user is imported by Sqoop into HDFS, under /sqoop/test/djt_user.

 

 

 

[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -cat /sqoop/test/djt_user/part-m-*
1@王菲@female@36@歌手
6@李四@female@18@學生
9@小王@female@24@hadoop運維
10@小林@female@30@###
[hadoop@djt002 ~]$ 

 

 

 

   (3) Using --query

   This is for more complex and more practical imports; a sketch is given below.

 

 

 

[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -cat /sqoop/test/djt_user/part-m-*
2@謝霆鋒@male@30@歌手
6@李四@female@18@學生
9@小王@female@24@hadoop運維
10@小林@female@30@###
[hadoop@djt002 ~]$ 


Note
  What if the import from MySQL into HDFS is interrupted partway through?

  Answer: much like a failed MapReduce job, there is a fault-tolerance mechanism. We do not need to worry that an interrupted task will cause rows to be inserted twice.

  Either the whole import succeeds, or nothing is imported at all.

      In other words, a Sqoop import into HDFS does not leave "dirty data" behind.

 

 

 

 

 

 

 

Importing MySQL data into HDFS with Sqoop import (extended notes)

  Let's look at how to drive a Sqoop import from the command line; the syntax is shown below.

sqoop import \
--connect jdbc:mysql://192.168.80.128:3306/db_hadoop \
--username sqoop \
--password sqoop \
--table user \
--target-dir /junior/sqoop/ \ //optional; if no directory is given, the data is imported under /user by default
--where "sex='female'" \ //optional
--as-sequencefile \ //optional; if no format is given, the data defaults to the Text format
--num-mappers 10 \ //optional; this value should not be too large
--null-string '\\N' \ //optional
--null-non-string '\\N' //optional

 

--connect: the JDBC URL to connect with.
--username/--password: the MySQL database user name and password.
--table: the database table to read.
--target-dir: the HDFS directory to import the data into; if not specified, the output directory defaults to the table name.
--where: filters which rows are imported from the database.
--as-sequencefile: the file format the data is imported as.
--num-mappers: the number of concurrent map tasks.
--null-string, --null-non-string: used together, they convert null database fields to '\N'; null fields in the database would otherwise take up a lot of space.

 

 

 

 

 

Below are a few special Sqoop import use cases (extended notes)

1. Sqoop does not need to re-import all of the historical data into HDFS on every run; it can import only the newly added rows. Here is how to import incremental data.

sqoop import \
--connect jdbc:mysql://192.168.80.128:3306/db_hadoop \
--username sqoop \
--password sqoop \
--table user \
--incremental append \    //import only the incremental data
--check-column id \     //use the primary key id as the check column
--last-value 999    //import new rows whose id is greater than 999

  Used together, these three options implement incremental data import.

 

 

 

2. Typing the password in clear text during a Sqoop import is a security risk; the following two approaches avoid it.
  1) -P: put -P at the end of the sqoop command line. The user is then prompted for the password, the input is hidden, and the sqoop command only runs after the correct password has been entered.

sqoop import \
--connect jdbc:mysql://192.168.80.128:3306/db_hadoop \
--username sqoop \
--table user \
-P

 

 

 

  2) --password-file: point to a file the password is read from. The file can be made readable only by its owner, which keeps the password from leaking.

sqoop import \
--connect jdbc:mysql://192.168.80.128:3306/db_hadoop \
--username sqoop \
--table user \
--password-file my-sqoop-password


2. Exporting data from HDFS to MySQL via Sqoop Export
  Its job is to export data from HDFS into a relational database table; the flow is illustrated in the diagram below.

  Let's walk through the Sqoop export flow. First the user issues a sqoop export command. Sqoop fetches the schema of the relational database table
and builds a mapping between the Hadoop fields and the table's columns. It then turns the command into a map-only MapReduce job,
  whose many map tasks read data from HDFS in parallel and copy the whole dataset into the database.

 

  Everyone: you really should read the official docs!

 

 

 

 Sqoop Export scenario — straight export

    Straight export

  Please read the post below first; it will help. I won't belabor it.

A comparison of the SQLyog Community and SQLyog Enterprise editions

 


  First create an empty copy of the table structure in MySQL (WHERE 1=2 matches no rows, so only the schema is copied):
CREATE TABLE djt_user_copy SELECT * FROM djt_user WHERE 1=2;


[hadoop@djt002 sqoopRunCreate]$ sqoop export \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt_user_copy \
> --export-dir /sqoop/test/djt_user \
> --input-fields-terminated-by "@"

   Here the data under /sqoop/test/djt_user in HDFS is exported by Sqoop into the MySQL table djt_user_copy.

 

 

 

   Recall that the data previously under /sqoop/test/djt_user looked as follows.

 

 

 

 

 

 

 Sqoop Export scenario — specifying the number of map tasks

  Specifying the number of map tasks

    The default is 4 map tasks.

 

 

[hadoop@djt002 sqoopRunCreate]$ sqoop export \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt_user_copy \
> --export-dir /sqoop/test/djt_user \
> --input-fields-terminated-by "@" \
> -m 1

   Here the data under /sqoop/test/djt_user in HDFS is exported by Sqoop into the MySQL table djt_user_copy.

 

 

 

 

 

 

   

Sqoop Export scenario — insert and update

  Insert and update

 

 

 

 

[hadoop@djt002 sqoopRunCreate]$ sqoop export \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt_user_copy \
> --export-dir /sqoop/test/djt_user \
> --input-fields-terminated-by "@" \
> -m 1 \
> --update-key id \
> --update-mode allowinsert

   Here the data under /sqoop/test/djt_user in HDFS is exported by Sqoop into the MySQL table djt_user_copy.


Sqoop Export scenario — transactional export

  Transactional export

   For example, when exporting from HDFS to MySQL, something unexpected such as an interruption may occur, which would leave duplicated "dirty data" behind.

Sqoop therefore provides this transactional handling.

      That is: HDFS  ->  first export into a staging table (only if that succeeds does the export continue)  ->  MySQL

In my case:     /sqoop/test/djt_user (in HDFS)    ->     djt_user_copy_tmp (in MySQL)  ->    djt_user_copy (in MySQL) 

   Here the data under /sqoop/test/djt_user in HDFS is exported by Sqoop into the MySQL table djt_user_copy.

 

 

  Note that the staging table djt_user_copy_tmp must be created first; a sketch follows.
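  The DDL for the staging table isn't shown (it was created in SQLyog). A minimal sketch, assuming the staging table simply mirrors the target table's structure, which is what Sqoop expects:

# Hypothetical command -- create the staging table with the same structure as the target table
mysql -h 192.168.80.200 -u hive -p hive -e "CREATE TABLE djt_user_copy_tmp LIKE djt_user_copy;"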

 

 

 

 

 

 

[hadoop@djt002 sqoopRunCreate]$ sqoop export \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt_user_copy \
> --staging-table djt_user_copy_tmp \
> --clear-staging-table \
> --export-dir /sqoop/test/djt_user \
> -input-fields-terminated-by "@"

   Here the data under /sqoop/test/djt_user in HDFS is first exported by Sqoop into the MySQL staging table djt_user_copy_tmp, and from there on into the MySQL table djt_user_copy.

 

 

 

  Given what is in HDFS at this point:

 

 

 

 

  

  Let's test again: suppose I now import the MySQL table djt_user into /sqoop/test/djt_user in HDFS.

[hadoop@djt002 sqoopRunCreate]$ sqoop import --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' --username hive --password-file /user/hadoop/.password -table djt_user  --target-dir /sqoop/test/djt_user --delete-target-dir -m 1 --fields-terminated-by "@" --null-non-string "###" --null-string "###"

 

 

[hadoop@djt002 ~]$ $HADOOP_HOME/bin/hadoop fs -cat /sqoop/test/djt_user/part-m-*
1@王菲@female@36@歌手
2@謝霆鋒@male@30@歌手
3@周傑倫@male@33@導演
4@王力宏@male@40@演員
5@張三@male@39@無業游民
6@李四@female@18@學生
7@王五@male@34@Java開發工程師
8@王六@male@45@hadoop工程師
9@小王@female@24@hadoop運維
10@小林@female@30@###
[hadoop@djt002 ~]$ 

 

   Then we export /sqoop/test/djt_user from HDFS into the MySQL table djt_user_copy.

  In plain terms, this just repeats the Sqoop Export transactional-export scenario. (Work through the flow yourself to keep it straight.)

 That is: HDFS  ->  first export into a staging table (only if that succeeds does the export continue)  ->  MySQL

In my case:      /sqoop/test/djt_user (in HDFS)    ->     djt_user_copy_tmp (in MySQL)  ->    djt_user_copy (in MySQL) 

 

 

 

 

[hadoop@djt002 sqoopRunCreate]$ sqoop export \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt_user_copy \
> --staging-table djt_user_copy_tmp \
> --clear-staging-table \
> --export-dir /sqoop/test/djt_user \
> -input-fields-terminated-by "@"

  Here the data under /sqoop/test/djt_user in HDFS is first exported by Sqoop into the MySQL staging table djt_user_copy_tmp, and from there on into the MySQL table djt_user_copy.

 

 

   Given the data in HDFS,

 

 

 

 

   we get:


Sqoop Export from HDFS scenario — mismatched columns

  Mismatched columns

   Just as we could selectively import certain columns with Sqoop import, the same holds for Sqoop export!

[hadoop@djt002 sqoopRunCreate]$ sqoop import --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' --username hive --password-file /user/hadoop/.password -table djt_user --columns name,sex,age,profile --target-dir /sqoop/test/djt_user --delete-target-dir -m 1 --fields-terminated-by "@" --null-non-string "###" --null-string "###"

  

 

 

  For example, HDFS (/sqoop/test/djt_user/) now has 4 columns, while the database table (djt_user_copy) has 5 (the extra one being the auto-increment key). So how do we deal with this tricky problem?

 

 

 

 

  Handle it like this:

  sqoop export likewise accepts --columns name,sex,age,profile, as sketched below.
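  The full export command isn't shown above; a minimal sketch, reusing the connection settings from the earlier exports (whether the auto-increment id column then fills itself in is exactly the question raised below):

[hadoop@djt002 sqoopRunCreate]$ sqoop export \
> --connect 'jdbc:mysql://192.168.80.200/hive?useUnicode=true&characterEncoding=utf-8' \
> --username hive \
> --password-file /user/hadoop/.password \
> --table djt_user_copy \
> --columns name,sex,age,profile \
> --export-dir /sqoop/test/djt_user \
> --input-fields-terminated-by "@" \
> -m 1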

 

 

 

   And in my case, what about the auto-increment key?


Exporting data from HDFS to MySQL via Sqoop Export (extended notes)

    Let's look at how to drive a Sqoop export from the command line; the syntax is shown below.

sqoop export \
--connect jdbc:mysql://192.168.80.128:3306/db_hadoop \
--username sqoop \
--password sqoop \
--table user \
--export-dir user

--connect: the JDBC URL to connect with.
--username/--password: the MySQL database user name and password.
--table: the database table to export into.
--export-dir: the directory in HDFS where the data is stored.

 

 

 

 

Below are a few special Sqoop export use cases (extended notes)

  1. Sqoop export normally inserts rows into the database one at a time, which is very inefficient. We can use Sqoop export's batch options to speed this up; the syntax is as follows.

sqoop export \
-Dsqoop.export.records.per.statement=10 \
--connect jdbc:mysql://192.168.80.128:3306/db_hadoop \
--username sqoop \
--password sqoop \
--table user \
--export-dir user \
--batch

      -Dsqoop.export.records.per.statement: put 10 records into each INSERT statement; --batch: use JDBC batch mode.

 

  2. In practice there is another problem: if a Map Task fails while exporting data,
it is rescheduled on another node and re-run, and the rows it had already written get written again, so the data is exported twice.
Because a Map Task has no rollback strategy, once it fails, the rows already written to the database cannot be undone.
Sqoop export provides a mechanism to guarantee atomicity: use the --staging-table option to name a staging table for the export.


A Sqoop export with a staging table runs in two steps:
  Step one: the data is exported into a staging table in the database; if a Map Task fails during this step, the staging table is cleared and the data exported again.
  Step two: once all Map Tasks have succeeded, the data in the staging table is moved into the table with the specified name.

 

sqoop export \
--connect jdbc:mysql://192.168.80.128:3306/db_hadoop \
--username sqoop \
--password sqoop \
--table user \
--staging-table staging_user

 

 

 

 

 

  3. If we want to update rows that already exist while Sqoop exports data, there are two ways to do it.

        1) Update existing rows with --update-key id.

sqoop export \
--connect jdbc:mysql://192.168.80.128:3306/db_hadoop \
--username sqoop \
--password sqoop \
--table user \
--update-key id

 

   2) With both --update-key id and --update-mode allowinsert, rows that already exist are updated and rows that do not exist yet are inserted as new records.

sqoop export \
--connect jdbc:mysql://192.168.80.128:3306/db_hadoop \
--username sqoop \
--password sqoop \
--table user \
--update-key id \
--update-mode allowinsert

 

 

4. If the data set in HDFS is large and many of its fields are not needed, we can use --columns to insert only selected columns.

sqoop export \
--connect jdbc:mysql://192.168.80.128:3306/db_hadoop \
--username sqoop \
--password sqoop \
--table user \
--columns username,sex

 

 

 

 

5. When the field data being exported does not exist or is null, we handle it with --input-null-string and --input-null-non-string.

sqoop export \
--connect jdbc:mysql://192.168.80.128:3306/db_hadoop \
--username sqoop \
--password sqoop \
--table user \
--input-null-string '\\N' \
--input-null-non-string '\\N'


Recommended reading

http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html (the official Sqoop user guide)

http://blog.csdn.net/aaronhadoop/article/details/26713431

