Installing and Configuring Sqoop


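I recently needed to export data from MySQL to HDFS, which is how I came across Sqoop2. Compared with Sqoop1, the advantage of Sqoop2 is that a client program connects directly to the Sqoop server on the cluster and operates it remotely. The workflow is to first create links, which can be thought of as the objects to operate on: for example, one link for HDFS and one for MySQL. Once the links exist, you create a job that names the two links it moves data between and sets which one is the from side and which is the to side, and then you run the job.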

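Installation:

Installation is a real headache. Problems kept cropping up one after another, and it took me a whole evening to get it barely working. Below are the pitfalls I ran into along the way.

First, the version I installed is 1.99.7, the latest release.

Official documentation: Apache Sqoop2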

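1. Hadoop installation

For the Hadoop installation steps, see section 5 onward of this blog post: https://www.cnblogs.com/bjwu/p/9863634.html

Note ⚠️: when configuring core-site.xml, you also need to add the following two properties (sqoop2 here is the system user that runs the Sqoop server; substitute your own if it differs):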

<property>
  <name>hadoop.proxyuser.sqoop2.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.sqoop2.groups</name>
  <value>*</value>
</property>
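
A quick way to confirm that both proxy-user properties made it into core-site.xml is a small text check. This is only a sketch: `has_property` and `check_proxyuser` are hypothetical helper names, and the match is a plain `grep` on the `<name>` element rather than a real XML parse.

```shell
# Report whether a Hadoop XML config file declares a given property name.
# This is a plain text match on the <name> element, not an XML parse.
has_property() {
  file=$1
  name=$2
  grep -q -F "<name>${name}</name>" "$file"
}

# Check the hosts and groups proxy-user properties for one user.
# Prints one line per property and returns non-zero if any is missing.
check_proxyuser() {
  file=$1
  user=$2
  status=0
  for suffix in hosts groups; do
    prop="hadoop.proxyuser.${user}.${suffix}"
    if has_property "$file" "$prop"; then
      echo "ok: $prop"
    else
      echo "missing: $prop"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_proxyuser "$HADOOP_HOME/etc/hadoop/core-site.xml" sqoop2` should print two `ok:` lines once the snippet above has been added.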

Also, remember to add the following to the container-executor.cfg configuration file:

allowed.system.users=sqoop2
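
Whether this entry is actually needed depends on the UID of the user running the Sqoop server. The sketch below assumes the usual behaviour of YARN's LinuxContainerExecutor, which refuses users whose UID is below `min.user.id` (commonly 1000) unless they are listed in `allowed.system.users`. The function names are hypothetical.

```shell
# Succeed if a user with this UID would need an allowed.system.users entry.
# Assumption: users with a UID below min.user.id are rejected unless
# they are explicitly allowed.
needs_allowed_entry() {
  uid=$1
  min_uid=${2:-1000}
  [ "$uid" -lt "$min_uid" ]
}

# Read min.user.id from a container-executor.cfg, falling back to 1000
# when the file does not set it.
read_min_user_id() {
  cfg=$1
  value=$(sed -n 's/^min\.user\.id=//p' "$cfg" | head -n 1)
  echo "${value:-1000}"
}
```

A possible use, with the path to container-executor.cfg left for you to fill in: `needs_allowed_entry "$(id -u sqoop2)" "$(read_min_user_id /path/to/container-executor.cfg)" && echo "add sqoop2 to allowed.system.users"`.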

2. Third-party jars

As for third-party jars, my project only needs mysql-connector-java. Download it, extract the archive to get the jar file, and run the following commands:

# Create directory for extra jars
mkdir -p /var/lib/sqoop2/

# Copy all your JDBC drivers to this directory
cp mysql-connector-java-*.jar /var/lib/sqoop2/
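
Because a glob that matches nothing makes `cp` fail without copying anything, it can be worth confirming that the directory really contains a driver jar afterwards. A minimal sketch, with `list_driver_jars` as a hypothetical helper name:

```shell
# Print the jar files found directly inside a directory.
# Returns non-zero if the directory contains no jar at all.
list_driver_jars() {
  dir=$1
  found=1
  for jar in "$dir"/*.jar; do
    # When the glob matches nothing, the literal pattern is returned.
    [ -e "$jar" ] || continue
    basename "$jar"
    found=0
  done
  return "$found"
}
```

For example, `list_driver_jars /var/lib/sqoop2 || echo "no JDBC driver jar found"`.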

3. Environment variables

Add the following environment variables to .bash_profile:

export SQOOP_HOME=/usr/lib/sqoop 
export SQOOP_SERVER_EXTRA_LIB=/var/lib/sqoop2/
export PATH=$PATH:$SQOOP_HOME/bin
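
After reloading the profile (for example with `source ~/.bash_profile`), a small check can confirm that the Sqoop bin directory is really on `PATH`. This is a sketch; `path_contains` is a hypothetical helper, and it compares entries literally, so a trailing slash counts as a different entry.

```shell
# Succeed if the given directory appears as an exact entry in PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `path_contains "$SQOOP_HOME/bin" && command -v sqoop2-server` should print the full path of the server script.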

4. Configuring the server

This is where the trouble starts. The official site says:

Second configuration file called sqoop.properties contains remaining configuration properties that can affect Sqoop server. The configuration file is very well documented, so check if all configuration properties fits your environment. Default or very little tweaking should be sufficient in most common cases.

However, the defaults alone really are not enough:

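Open sqoop.properties, change the first line below to point at your own Hadoop configuration directory, and add the other three lines. $HADOOP_HOME here stands for the actual path; it is safest to write it out in full, since a properties file is unlikely to expand shell variables.

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=$HADOOP_HOME/etc/hadoop

org.apache.sqoop.security.authentication.type=SIMPLE
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
org.apache.sqoop.security.authentication.anonymous=true

The official documentation only mentions the first item, the path to the MapReduce configuration directory. With just that set, however, I got an authentication exception at runtime. The security section of the Sqoop documentation explains that Sqoop2 supports two of Hadoop's authentication mechanisms, simple and kerberos, so I configured simple authentication and the exception went away.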
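
If you prefer to script the sqoop.properties changes instead of editing by hand, something like the following could work. It is a sketch under assumptions: `set_prop` is a hypothetical helper, it only replaces uncommented lines that start with the exact key, and it writes values without trailing whitespace because `java.util.Properties` keeps trailing spaces as part of the value.

```shell
# Set key=value in a Java-style properties file.
# Any existing uncommented line for the same key is dropped and the new
# line is appended at the end, so repeated calls do not create duplicates.
set_prop() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp)
  # Keep every line that does not start with "key=".
  awk -v k="$key=" 'index($0, k) != 1' "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `set_prop "$SQOOP_HOME/conf/sqoop.properties" org.apache.sqoop.security.authentication.type SIMPLE`, repeated for each of the other keys.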

Of course, you may run into a number of problems along the way, for example:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

If so, you can try the following:

cp -R $HADOOP_HOME/share/hadoop/common/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/common/lib/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/hdfs/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/hdfs/lib/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/mapreduce/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/mapreduce/lib/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/yarn/* $SQOOP_HOME/server/lib/
cp -R $HADOOP_HOME/share/hadoop/yarn/lib/* $SQOOP_HOME/server/lib/
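
The eight commands follow one pattern, so they can be folded into a loop. Note that this sketch is a variant rather than an exact equivalent: it copies only `*.jar` files and does not recurse, whereas `cp -R` also copies subdirectories and other files. `copy_hadoop_jars` is a hypothetical name.

```shell
# Copy the jars of the four Hadoop components (and their lib/ folders)
# into a target directory. Only *.jar files are copied, without recursion.
copy_hadoop_jars() {
  hadoop_home=$1
  target=$2
  for component in common hdfs mapreduce yarn; do
    for dir in "$hadoop_home/share/hadoop/$component" \
               "$hadoop_home/share/hadoop/$component/lib"; do
      for jar in "$dir"/*.jar; do
        # Skip the literal pattern when a directory has no jars.
        [ -e "$jar" ] || continue
        cp "$jar" "$target/"
      done
    done
  done
}
```

It would be called as `copy_hadoop_jars "$HADOOP_HOME" "$SQOOP_HOME/server/lib"`.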

5. Startup

Once configuration is done, you need to run the initialization step before starting the server for the first time:

$ sqoop2-tool upgrade
Setting conf dir: /usr/lib/sqoop/bin/../conf
Sqoop home directory: /usr/lib/sqoop
Sqoop tool executor:
	Version: 1.99.7
	Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
	Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.UpgradeTool
2019-01-10 22:31:06,509 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.

Sweet! After that, you can check that everything is configured correctly:

$ sqoop2-tool verify 
Setting conf dir: /usr/lib/sqoop/bin/../conf
Sqoop home directory: /usr/lib/sqoop
Sqoop tool executor:
	Version: 1.99.7
	Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
	Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
2019-01-10 22:31:42,317 INFO  [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2019-01-10 22:31:42,326 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
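
To use the verification result in a script, one option is to look for the success line shown in the output above. This is a sketch: `verification_ok` is a hypothetical helper, and matching on the message text is an assumption that may not hold for other versions.

```shell
# Read tool output on stdin and succeed only if it contains the
# success message printed by "sqoop2-tool verify".
verification_ok() {
  grep -F 'Verification was successful.' > /dev/null
}
```

For example, `sqoop2-tool verify 2>&1 | verification_ok && echo "configuration looks fine"`.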

Start the server:

$ sqoop2-server start  
Setting conf dir: /usr/lib/sqoop/bin/../conf
Sqoop home directory: /usr/lib/sqoop
Starting the Sqoop2 server...
2019-01-10 22:37:22,806 INFO  [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2019-01-10 22:37:22,816 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
Sqoop2 server started.
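
Once the server reports that it has started, its REST interface can be probed from the command line. The sketch below only builds the URL. It assumes the default port 12000 and the `sqoop` web application name, both of which may differ in your sqoop.properties. `sqoop_version_url` is a hypothetical helper.

```shell
# Build the URL of the Sqoop2 REST version resource.
# Assumptions: default port 12000 and the "sqoop" web application name.
sqoop_version_url() {
  host=${1:-localhost}
  port=${2:-12000}
  echo "http://${host}:${port}/sqoop/version"
}
```

With `curl` installed, `curl -s "$(sqoop_version_url)"` should return a short JSON document describing the server version if everything is running.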

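6. A change of plan

Well, after all that, I ended up switching to Sqoop1 anyway, because operating Sqoop2 and getting it to run truly bug-free is rather complicated, and the learning cost is fairly high.

There are plenty of Sqoop1 installation tutorials online, so I will mention just one thing: running a Sqoop1 program from code requires quite a few Maven dependencies.

In my case, chasing down various exceptions left me with all of the following libraries 😢: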

<dependency>
	<groupId>org.apache.sqoop</groupId>
	<artifactId>sqoop</artifactId>
	<version>1.4.7</version>
	<scope>system</scope>
	<systemPath>${basedir}/lib/sqoop-1.4.7.jar</systemPath>
</dependency>
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-common</artifactId>
	<version>2.6.5</version>
</dependency>
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-mapreduce-client-core</artifactId>
	<version>2.6.5</version>
</dependency>
<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
	<version>2.6.5</version>
</dependency>
<dependency>
	<groupId>org.apache.avro</groupId>
	<artifactId>avro</artifactId>
	<version>1.8.2</version>
</dependency>
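
The dependencies above are for driving Sqoop1 from Java. For a first test of the MySQL-to-HDFS transfer, the Sqoop1 command line alone may be enough. The sketch below only prints the command (a dry run) so that it can be reviewed before running. `print_import_cmd` is a hypothetical helper, every argument value is a placeholder supplied by the caller, and `-m 1` (a single mapper) is just a conservative choice for a trial run.

```shell
# Print a Sqoop1 import command for one MySQL table. Nothing is executed.
# -P makes sqoop prompt for the password instead of taking it as an argument.
print_import_cmd() {
  jdbc_url=$1
  user=$2
  table=$3
  target_dir=$4
  printf 'sqoop import --connect %s --username %s -P --table %s --target-dir %s -m 1\n' \
    "$jdbc_url" "$user" "$table" "$target_dir"
}
```

For example, `print_import_cmd jdbc:mysql://localhost:3306/mydb myuser mytable /user/me/mytable` prints a command that can be pasted into a shell on a machine where Sqoop1 is installed.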

Reference:

  1. https://stackoverflow.com/questions/41405072/sqoop-integration-with-hadoop-throw-classnotfoundexception
  2. https://sqoop.apache.org/docs/1.99.7/admin/Installation.html
  3. http://brianoneill.blogspot.com/2014/10/sqoop-1993-w-hadoop-2-installation.html
  4. https://www.yiibai.com/sqoop/sqoop_installation.html

