[sqoop1.99.7] Getting started with Sqoop: download, installation, running, and common commands


 

1. Introduction

 

Apache Sqoop is a tool designed for efficiently transferring data between structured, semi-structured and unstructured data sources. Relational databases are examples of structured data sources with a well-defined schema for the data they store. Cassandra and HBase are examples of semi-structured data sources, and HDFS is an example of an unstructured data source that Sqoop can support.

 

Official website:

http://sqoop.apache.org/

 

Versions:

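Sqoop comes in two lines, Sqoop1 and Sqoop2. At the time of writing, the latest Sqoop1 release is 1.4.6 and the latest Sqoop2 release is 1.99.7. Sqoop1 and Sqoop2 are not compatible with each other. Sqoop2 is not intended for production use and is mainly a development line. In addition, each release has particular requirements on the Hadoop version, for example compatibility with Hadoop 0.x, Hadoop 1 or Hadoop 2.x, so check which Hadoop version a package supports before downloading it. I did not find a specific description of this on the Sqoop website. I could only tell the Hadoop compatibility from the file names.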

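Sqoop relies on Hadoop MapReduce jobs to move data, so Hadoop must be installed in the environment and accessible to Sqoop.
For the download, you can pick the precompiled bin release and use it directly. You can also download the source code and build it locally. In that case, make sure a Java environment is available, because Sqoop is written in Java.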
1) Sqoop1 stable release: sqoop 1.4.6
Documentation: http://sqoop.apache.org/docs/1.4.6/index.html
Download: http://mirror.bit.edu.cn/apache/sqoop/1.4.6/
File names:
sqoop-1.4.6.bin__hadoop-0.23.tar.gz
sqoop-1.4.6.bin__hadoop-1.0.0.tar.gz
sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
Source: sqoop-1.4.6.tar.gz

2) Sqoop2 latest release: sqoop 1.99.7
Documentation: http://sqoop.apache.org/docs/1.99.7/index.html
Download: http://mirror.bit.edu.cn/apache/sqoop/1.99.7/
File name: sqoop-1.99.7-bin-hadoop200.tar.gz
Source: sqoop-1.99.7.tar.gz
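The Hadoop compatibility can only be read from the file names, so the small Python sketch below pulls the Hadoop tag out of the names listed above. The regular expression is my own assumption, inferred only from these five file names. It is not an official naming rule.

```python
import re
from typing import Optional

# Assumed pattern: Sqoop1 binaries use "bin__hadoop-<ver>", Sqoop2 uses "bin-hadoop<ver>".
_BIN_NAME = re.compile(r"bin(?:__|-)hadoop-?(?P<tag>.+)\.tar\.gz$")

def hadoop_tag(filename: str) -> Optional[str]:
    """Return the Hadoop tag embedded in a binary tarball name, or None for a source tarball."""
    match = _BIN_NAME.search(filename)
    return match.group("tag") if match else None

for name in ("sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz",
             "sqoop-1.99.7-bin-hadoop200.tar.gz",
             "sqoop-1.99.7.tar.gz"):
    print(name, "->", hadoop_tag(name))
```

Source tarballs carry no Hadoop tag, so the helper returns None for them.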

 

2. Installation and configuration

Downloaded release:

sqoop-1.99.7-bin-hadoop200.tar.gz

 

Installation: simply extract the archive into any directory.

tar -zxvf sqoop-1.99.7-bin-hadoop200.tar.gz

mv sqoop-1.99.7-bin-hadoop200 sqoop-1.99.7

 

 

Sqoop directory layout

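bin: executable scripts. Sqoop is normally used through the tools in this directory, which are shell or batch scripts.

conf: configuration files. There are currently only two: sqoop_bootstrap.properties and sqoop.properties.

docs: I am not sure exactly what this contains. It is probably documentation, and it is not needed for normal use.

server: contains only a lib directory with many jar files. This is the Sqoop2 server package.

shell: contains only a lib directory with many jar files. This is the Sqoop2 shell package.

tools: contains only a lib directory with many jar files. This is the Sqoop2 tools package.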
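You can confirm that the archive extracted completely by checking for the directories and the two configuration files listed above. The Python sketch below does this. The expected names come from the list above, and everything else is illustrative.

```python
from pathlib import Path
from typing import List

# Top-level directories and config files described above for sqoop-1.99.7.
EXPECTED_DIRS = ("bin", "conf", "docs", "server", "shell", "tools")
EXPECTED_CONF = ("sqoop_bootstrap.properties", "sqoop.properties")

def check_layout(sqoop_home: str) -> List[str]:
    """Return a list of problems; an empty list means the layout looks complete."""
    root = Path(sqoop_home)
    problems = [f"missing directory: {d}" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    problems += [f"missing config: conf/{f}" for f in EXPECTED_CONF
                 if not (root / "conf" / f).is_file()]
    return problems
```

For example, check_layout("/wdcloud/app/sqoop-1.99.7") should return an empty list when nothing is missing.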

 

Configuration

(1) Install the Java JDK

Version

[root@hadoop-allinone-200-123 hadoop-2.7.3]# java -version
java version "1.7.0_67"

 

JAVA_HOME

[root@hadoop-allinone conf]# echo $JAVA_HOME
/wdcloud/app/jdk1u7

 

(2) Hadoop environment

Version
[root@hadoop-allinone-200-123 bin]# ./hadoop version
Hadoop 2.7.3

HADOOP_HOME
[root@hadoop-allinone-200-123 hadoop-2.7.3]# pwd
/wdcloud/app/hadoop-2.7.3

 

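(3) Configure environment variables

Add a system environment variable, HADOOP_HOME. In this example it is set to /wdcloud/app/hadoop-2.7.3.

You can define it in /etc/profile, in a script under /etc/profile.d, or in ~/.bashrc. Any of these works.

Add the Hadoop environment variable to /etc/profile (global environment variables):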
export HADOOP_HOME=/wdcloud/app/hadoop-2.7.3

[root@hadoop-allinone-200-123 hadoop-2.7.3]# source /etc/profile

[root@hadoop-allinone-200-123 hadoop-2.7.3]# echo $HADOOP_HOME
/wdcloud/app/hadoop-2.7.3
Note: this variable mainly lets Sqoop find the jar files and Hadoop configuration files in the following directories:
$HADOOP_HOME/share/hadoop/common
$HADOOP_HOME/share/hadoop/hdfs
$HADOOP_HOME/share/hadoop/mapreduce
$HADOOP_HOME/share/hadoop/yarn

The official documentation explains that the location of each component can also be configured separately, using the following variables:

export HADOOP_COMMON_HOME=/wdcloud/app/hadoop-2.7.3/share/hadoop/common
export HADOOP_HDFS_HOME=/wdcloud/app/hadoop-2.7.3/share/hadoop/hdfs
export HADOOP_MAPRED_HOME=/wdcloud/app/hadoop-2.7.3/share/hadoop/mapreduce
export HADOOP_YARN_HOME=/wdcloud/app/hadoop-2.7.3/share/hadoop/yarn

If $HADOOP_HOME is already set, it is best not to set these per-component variables as well. Doing so may cause obscure errors.
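The Python sketch below resolves the four jar directories from either HADOOP_HOME or the per-component variables, then reports which ones are missing or contain no jars. The precedence rule is my assumption: an explicitly set per-component variable wins, otherwise the value is derived from HADOOP_HOME. I have not confirmed that bin/sqoop.sh behaves exactly this way.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping, Optional

# Each per-component variable and the $HADOOP_HOME sub-directory it defaults to.
COMPONENTS = {
    "HADOOP_COMMON_HOME": "share/hadoop/common",
    "HADOOP_HDFS_HOME": "share/hadoop/hdfs",
    "HADOOP_MAPRED_HOME": "share/hadoop/mapreduce",
    "HADOOP_YARN_HOME": "share/hadoop/yarn",
}

def hadoop_jar_dirs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Path]:
    """Resolve the four jar directories; an explicit per-component variable wins (assumed)."""
    env = os.environ if env is None else env
    home = env.get("HADOOP_HOME", "")
    dirs = {}
    for var, sub in COMPONENTS.items():
        value = env.get(var) or (str(Path(home) / sub) if home else "")
        if not value:
            raise KeyError(f"neither {var} nor HADOOP_HOME is set")
        dirs[var] = Path(value)
    return dirs

def dirs_without_jars(dirs: Dict[str, Path]) -> List[str]:
    """Return the resolved directories that are missing or contain no .jar files."""
    return [str(p) for p in dirs.values() if not (p.is_dir() and any(p.glob("*.jar")))]
```

With HADOOP_HOME=/wdcloud/app/hadoop-2.7.3 exported as above, dirs_without_jars(hadoop_jar_dirs()) should come back empty on a complete Hadoop install.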

 

Configure the Sqoop home directory and the path for third-party jars

[root@hadoop-allinone-200-123 hadoop-2.7.3]# vim /etc/profile

export SQOOP_HOME=/wdcloud/app/sqoop-1.99.7
export SQOOP_SERVER_EXTRA_LIB=/wdcloud/app/sqoop-1.99.7/extra

[root@hadoop-allinone-200-123 sqoop-1.99.7]# mkdir extra

Copy the MySQL JDBC driver jar into this directory.
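A missing driver jar usually only shows up later, when you create a link. The Python sketch below lists the MySQL driver jars found in SQOOP_SERVER_EXTRA_LIB. The file-name pattern mysql-connector*.jar is my assumption based on how MySQL Connector/J jars are commonly named. Adjust it if your jar is named differently.

```python
import os
from pathlib import Path
from typing import List, Optional

def find_mysql_driver(extra_lib: Optional[str] = None) -> List[str]:
    """List MySQL JDBC driver jars in the extra-lib directory (SQOOP_SERVER_EXTRA_LIB by default)."""
    extra_lib = extra_lib or os.environ.get("SQOOP_SERVER_EXTRA_LIB", "")
    # Guard against "" because Path("") means the current directory.
    if not extra_lib or not Path(extra_lib).is_dir():
        return []
    # Assumed naming: Connector/J jars usually start with "mysql-connector".
    return sorted(p.name for p in Path(extra_lib).glob("mysql-connector*.jar"))
```

An empty result means either the variable is unset, the directory does not exist, or no matching jar was copied in.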

 

 

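(4) Configure Hadoop proxy-user access

Sqoop accesses Hadoop MapReduce through a proxy user, so Hadoop must be configured with the proxy users and groups it accepts.
Find Hadoop's core-site.xml configuration file (in this example $HADOOP_HOME/etc/hadoop/core-site.xml) and add: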

<property>
  <name>hadoop.proxyuser.$SERVER_USER.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.$SERVER_USER.groups</name>
  <value>*</value>
</property>
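$SERVER_USER is the system user that runs the Sqoop2 server. In this example I ran the server as the hadoop user, so I replaced it with hadoop.
Note: make sure your user ID is 1000 or higher (check it with the id command). If the server runs as a system user with a lower ID, extra configuration may be needed. See the official documentation for details.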
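Substituting $SERVER_USER by hand is easy to get wrong. The Python sketch below renders the two properties for a given user and checks the user-ID threshold from the note above. The function names are my own. The output is one-line XML to paste inside the existing <configuration> element of core-site.xml. The pwd module makes this Unix-only.

```python
import pwd  # Unix-only standard module
import xml.etree.ElementTree as ET

def proxyuser_properties(server_user: str) -> str:
    """Render the hosts/groups proxy-user <property> elements for core-site.xml."""
    rendered = []
    for suffix in ("hosts", "groups"):
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = f"hadoop.proxyuser.{server_user}.{suffix}"
        ET.SubElement(prop, "value").text = "*"  # accept any host / any group, as above
        rendered.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(rendered)

def uid_at_least(user: str, minimum: int = 1000) -> bool:
    """True if the local account's numeric user ID is >= minimum (KeyError if no such user)."""
    return pwd.getpwnam(user).pw_uid >= minimum

print(proxyuser_properties("hadoop"))
```

After editing core-site.xml, the Hadoop services most likely need a restart (or a refresh of the proxy-user settings) before the change takes effect.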

 

(5) Sqoop core configuration files

sqoop_bootstrap.properties

This sets the configuration provider class. The default value is usually fine:

sqoop.config.provider=org.apache.sqoop.core.PropertiesConfigurationProvider  

 

sqoop.properties

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/wdcloud/app/hadoop-2.7.3/etc/hadoop  
  
org.apache.sqoop.security.authentication.type=SIMPLE  
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler  
org.apache.sqoop.security.authentication.anonymous=true  

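Note: the official documentation only mentions the first setting above, which is the path to the MapReduce configuration files. However, I later got an authentication exception at runtime. The security section of the Sqoop documentation says that Sqoop2 supports Hadoop's two authentication mechanisms, SIMPLE and Kerberos. I configured SIMPLE authentication, and the exception went away.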
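Since a missing authentication setting only fails at runtime, a pre-flight check of sqoop.properties can save a restart. The Python sketch below uses the four values from this post as the expected ones. Its key=value reader is deliberately minimal: it does not handle ':' separators, escapes, or line continuations, so it is not a full Java .properties parser.

```python
from typing import Dict, List

# Values used in this post; the directory is only checked for being non-empty.
REQUIRED = {
    "org.apache.sqoop.submission.engine.mapreduce.configuration.directory": None,
    "org.apache.sqoop.security.authentication.type": "SIMPLE",
    "org.apache.sqoop.security.authentication.handler":
        "org.apache.sqoop.security.authentication.SimpleAuthenticationHandler",
    "org.apache.sqoop.security.authentication.anonymous": "true",
}

def parse_properties(text: str) -> Dict[str, str]:
    """Very small key=value reader: no ':' separators, escapes or line continuations."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()  # also drops the trailing spaces seen in the file above
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def config_problems(props: Dict[str, str]) -> List[str]:
    """Compare parsed properties with REQUIRED and describe each mismatch."""
    problems = []
    for key, expected in REQUIRED.items():
        value = props.get(key)
        if not value:
            problems.append(f"missing: {key}")
        elif expected is not None and value != expected:
            problems.append(f"unexpected: {key}={value} (expected {expected})")
    return problems
```

Feed it the text of conf/sqoop.properties, for example the result of reading /wdcloud/app/sqoop-1.99.7/conf/sqoop.properties. An empty list means all four settings are present.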

 

3. Running

Verify that the configuration is valid

bin/sqoop2-tool verify
[root@hadoop-allinone-200-123 sqoop-1.99.7]# bin/sqoop2-tool verify  
Setting conf dir: /wdcloud/app/sqoop-1.99.7/bin/../conf
Sqoop home directory: /wdcloud/app/sqoop-1.99.7
Sqoop tool executor:
    Version: 1.99.7
    Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
    Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
20   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.

 

Start the server

bin/sqoop2-server start  
[root@hadoop-allinone-200-123 sqoop-1.99.7]# bin/sqoop2-server start  
Setting conf dir: /wdcloud/app/sqoop-1.99.7/bin/../conf
Sqoop home directory: /wdcloud/app/sqoop-1.99.7
Starting the Sqoop2 server...
0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
22   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
Sqoop2 server started.
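The start command only prints "Sqoop2 server started." To check from a script that something is actually accepting connections, a plain TCP probe is enough. The sketch below assumes the server listens on 12000, the default Sqoop2 port. A successful connection only proves that some process is listening on that port, not that it is Sqoop.

```python
import socket

def server_reachable(host: str = "localhost", port: int = 12000, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, unknown host, ...
        return False
```

For example, server_reachable("192.168.200.123") should return True once the server from this post is up.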

 

# Starting the server creates two directories, in whichever directory you run the command from:

[root@hadoop-allinone-200-123 sqoop-1.99.7]# ll | grep @
drwxr-xr-x 3 root root 23 Dec 18 22:19 @BASEDIR@
drwxr-xr-x 2 root root 58 Dec 18 22:23 @LOGDIR@


# View the Sqoop logs:

[root@hadoop-allinone-200-123 sqoop-1.99.7]# ll \@LOGDIR\@/
total 136
-rw-r--r-- 1 root root   165 Dec 18 22:22 audit.log
-rw-r--r-- 1 root root   670 Dec 18 22:21 derbyrepo.log
-rw-r--r-- 1 root root 78957 Dec 18 22:22 sqoop.log
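The two odd directory names suggest that conf/sqoop.properties refers to @LOGDIR@ and @BASEDIR@ literally, so they end up as relative paths under the current directory. The Python sketch below pins them to absolute paths. It assumes those literal tokens really are in the file, so check with grep first. The target directories in any call are your choice. Stop the server first. Do this before creating links or jobs, because @BASEDIR@ presumably holds the Derby repository (note the derbyrepo.log above).

```python
from pathlib import Path

def pin_placeholders(properties_file: str, log_dir: str, base_dir: str) -> int:
    """Replace literal @LOGDIR@/@BASEDIR@ tokens with absolute paths; return how many were replaced.

    A backup copy named <file>.bak is written before the file is changed.
    """
    path = Path(properties_file)
    text = path.read_text(encoding="utf-8")
    count = text.count("@LOGDIR@") + text.count("@BASEDIR@")
    if count:
        path.with_name(path.name + ".bak").write_text(text, encoding="utf-8")
        text = text.replace("@LOGDIR@", log_dir).replace("@BASEDIR@", base_dir)
        path.write_text(text, encoding="utf-8")
    return count
```

A return value of 0 means the file contains neither token and was left untouched.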

 

Stop the server

bin/sqoop2-server stop
[root@hadoop-allinone-200-123 sqoop-1.99.7]# bin/sqoop2-server stop
Setting conf dir: /wdcloud/app/sqoop-1.99.7/bin/../conf
Sqoop home directory: /wdcloud/app/sqoop-1.99.7
Stopping the Sqoop2 server...
Sqoop2 server stopped.

 

 

Start the client

bin/sqoop2-shell
[root@hadoop-allinone-200-123 sqoop-1.99.7]# bin/sqoop2-shell  
Setting conf dir: /wdcloud/app/sqoop-1.99.7/bin/../conf
Sqoop home directory: /wdcloud/app/sqoop-1.99.7
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000> 

If it succeeds, you get the Sqoop shell prompt: sqoop:000>

 

At this point, Sqoop 1.99.7 is configured and started.

 

4. Common Sqoop client commands

 

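Before using Sqoop, make sure that both the Hadoop services and the Sqoop2 server are running. On the Hadoop side, you need HDFS (NameNode, DataNode) and also YARN (NodeManager, ResourceManager). There is usually also a SecondaryNameNode. Despite its name, it is not a standby NameNode. It is a helper process that periodically checkpoints the NameNode's metadata by merging the edit log into the fsimage.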

[root@hadoop-allinone-200-123 sqoop-1.99.7]# jps
4352 ResourceManager
4195 SecondaryNameNode
2835 QuorumPeerMain
21167 HMaster
4451 NodeManager
2986 QuorumPeerMain
2803 QuorumPeerMain
4030 DataNode
21256 HRegionServer
3905 NameNode
5024 SqoopJettyServer
5186 Jps
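Checking a listing like this by eye gets tedious. The Python sketch below reports which required daemons are missing from jps output. The required set is my reading of the paragraph above: HDFS, YARN, and the Sqoop server, which appears as SqoopJettyServer. SecondaryNameNode, ZooKeeper, and HBase are left out as optional. The jps tool from the JDK must be on PATH for run_jps.

```python
import subprocess
from typing import Set

# Daemons the paragraph above says must be running (SecondaryNameNode is optional).
REQUIRED_DAEMONS = {"NameNode", "DataNode", "ResourceManager", "NodeManager", "SqoopJettyServer"}

def missing_daemons(jps_output: str) -> Set[str]:
    """Return required daemons absent from `jps` output, whose lines look like '<pid> <MainClass>'."""
    running = {parts[1] for parts in (line.split() for line in jps_output.splitlines())
               if len(parts) >= 2}
    return REQUIRED_DAEMONS - running

def run_jps() -> str:
    """Run the JDK's jps tool and return its standard output."""
    return subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
```

Typical use is missing_daemons(run_jps()). An empty set means everything listed above is up.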

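The Sqoop2 client provides an interactive command-line interface. The client connects to the Sqoop server and passes the parameters to it. The server then runs MapReduce jobs to import or export the data.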

 

Configure the Sqoop server connection

[root@hadoop-allinone-200-123 sqoop-1.99.7]# bin/sqoop2-shell 
Setting conf dir: /wdcloud/app/sqoop-1.99.7/bin/../conf
Sqoop home directory: /wdcloud/app/sqoop-1.99.7
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000>set server --host 192.168.200.123 --port 12000 --webapp sqoop
Server is set successfully

Note: when --host, --port and --webapp are set, --url can be omitted.
If you use --url instead, the syntax is:
set server --url http://sqoop2.company.net:80/sqoop

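12000 is the default port. According to the official documentation, the last option, --webapp, specifies the web application name of the Sqoop Jetty server (sqoop by default).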
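The --host/--port/--webapp form and the --url form describe the same endpoint. The Python sketch below converts between them. The URL shape http://HOST:PORT/WEBAPP is inferred from the --url example above, and the http scheme default is an assumption.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

def server_url(host: str, port: int = 12000, webapp: str = "sqoop", scheme: str = "http") -> str:
    """URL assumed equivalent to `set server --host HOST --port PORT --webapp WEBAPP`."""
    return f"{scheme}://{host}:{port}/{webapp}"

def split_server_url(url: str) -> Tuple[Optional[str], Optional[int], str]:
    """Go the other way: recover (host, port, webapp) from a `--url` value."""
    parts = urlsplit(url)
    return parts.hostname, parts.port, parts.path.strip("/")
```

Under this assumption, the set server command used above should correspond to set server --url http://192.168.200.123:12000/sqoop.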

After configuring, verify that the client can connect to the server:

sqoop:000> show version --all 
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb 
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb 
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]

If the server version information is displayed, the client is connected correctly.
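In a scripted check, you can parse the show version --all output and compare the client and server builds. The Python sketch below relies only on the line format shown in the transcript above, such as "Sqoop 1.99.7 source revision ...". If no server entry can be parsed, the client most likely did not reach the server.

```python
import re
from typing import Dict, Tuple

def parse_versions(output: str) -> Dict[str, Tuple[str, str]]:
    """Map 'client' / 'server' to (version, source revision) from `show version --all` output."""
    found = {}
    section = None
    for line in output.splitlines():
        heading = re.match(r"\s*(client|server) version:", line)
        if heading:
            section = heading.group(1)
            continue
        m = re.match(r"\s*Sqoop (\S+) source revision (\S+)", line)
        if m and section and section not in found:
            found[section] = (m.group(1), m.group(2))
    return found
```

Matching client and server tuples, as in the transcript above, indicate the shell and server come from the same build.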

 

 

View help (enter help at the sqoop:000> prompt):

Available commands:
  :exit    (:x  ) Exit the shell
  :history (:H  ) Display, manage and recall edit-line history
  help     (\h  ) Display this help message
  set      (\st ) Configure various client options and settings
  show     (\sh ) Display various objects and configuration options
  create   (\cr ) Create new object in Sqoop repository
  delete   (\d  ) Delete existing object in Sqoop repository
  update   (\up ) Update objects in Sqoop repository
  clone    (\cl ) Create new object based on existing one
  start    (\sta) Start job
  stop     (\stp) Stop job
  status   (\stu) Display status of a job
  enable   (\en ) Enable object in Sqoop repository
  disable  (\di ) Disable object in Sqoop repository
  grant    (\g  ) Grant access to roles and assign privileges
  revoke   (\r  ) Revoke access from roles and remove privileges

For help on a specific command type: help command
View the usage of individual commands. Entering a command's shortcut without arguments prints its usage:

sqoop:000> \st
Usage: set [server|option|truststore]
sqoop:000> \sh
Usage: show [server|version|connector|driver|link|job|submission|option|role|principal|privilege]
sqoop:000> \cr
Usage: create [link|job|role]
sqoop:000> \d
Usage: delete [link|job|role]
sqoop:000> \up
Usage: update [link|job]
sqoop:000> \cl
Usage: clone [link|job]
sqoop:000> \sta
Usage: start [job]
sqoop:000> \stp
Usage: stop [job]
sqoop:000> \stu
Usage: status [job]
sqoop:000> \en
Usage: enable [link|job]
sqoop:000> \di
Usage: disable [link|job]
sqoop:000> \g
Usage: grant [role|privilege]
sqoop:000> \r
Usage: revoke [role|privilege]

 

For example, to exit the interactive shell, enter the :x command:

sqoop:000> :x
[root@hadoop-allinone-200-123 sqoop-1.99.7]# 

 

