Migrating data with Kettle on Linux


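This follows the previous post, where I ran the Kettle migration on my local Windows machine. One of the tables grows by about 2 million rows a day, so migrating from the local machine was too slow. On top of that, the VPN connection to the server was unstable and dropped often, and Kettle does not support resuming an interrupted transfer. I therefore decided to take the Kettle configuration from Windows and run it in a Linux environment.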

I: Install the JDK on Linux

Reference: https://www.cnblogs.com/nothingonyou/p/11936850.html
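
Once the JDK is installed, it can help to confirm which major version is on the `PATH` before unpacking Kettle. A small sketch follows; `java_major` is a hypothetical helper, and the `1.8.0_191` example matches the JDK path used later in this article.

```shell
# Print the major Java version for a version string such as "1.8.0_191" or "11.0.2".
# java_major is a hypothetical helper for illustration.
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;   # legacy scheme: 1.8.x means Java 8
        *)   echo "$1" | cut -d. -f1 ;;   # newer scheme: 11.x means Java 11
    esac
}

# Only query the JVM when java is actually on PATH.
if command -v java >/dev/null 2>&1; then
    version=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    echo "Java major version: $(java_major "$version")"
fi
```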

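II: Deploy Kettle on Linux

Kettle is deployed directly on the MySQL server involved in the conversion, which avoids the network overhead of an intermediate hop. Compress the files that were already extracted on Windows, upload the archive to the relevant directory on Linux, and proceed as follows:

1. Create the kettle directory and extract the files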

[root@localhost opt]# mkdir kettle
[root@localhost opt]# mv data-integration.zip kettle/
[root@localhost opt]# cd kettle
[root@localhost kettle]# unzip data-integration.zip
[root@localhost kettle]# rm -rf data-integration.zip 
[root@localhost kettle]# cd data-integration/
[root@localhost data-integration]# chmod +x *.sh
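
Archives created on Windows typically do not carry Unix execute permissions, which is why the `chmod +x *.sh` step is needed. As a rough sketch, the check below lists any `.sh` file in a directory that is still not executable; the function name `list_non_executable` is made up for illustration.

```shell
# List .sh files in the given directory that lack the execute bit.
# list_non_executable is a hypothetical helper, not part of Kettle.
list_non_executable() {
    for script in "$1"/*.sh; do
        [ -e "$script" ] || continue      # directory has no .sh files at all
        [ -x "$script" ] || echo "$script"
    done
}

# Example: an empty result means every launcher script can be run.
kettle_home=/opt/kettle/data-integration
if [ -d "$kettle_home" ]; then
    list_non_executable "$kettle_home"
fi
```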

2. Test whether the installation succeeded

[root@localhost /]# cd /opt/kettle/data-integration/
[root@localhost data-integration]# ./kitchen.sh 

If the following output appears, the installation succeeded:

[root@localhost data-integration]# ./kitchen.sh 
#######################################################################
WARNING:  no libwebkitgtk-1.0 detected, some features will be unavailable
    Consider installing the package with apt-get or yum.
    e.g. 'sudo apt-get install libwebkitgtk-1.0-0'
#######################################################################
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=51200m; support was removed in 8.0
Options:
  -rep            = Repository name
  -user           = Repository username
  -pass           = Repository password
  -job            = The name of the job to launch
  -dir            = The directory (dont forget the leading /)
  -file           = The filename (Job XML) to launch
  -level          = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Minimal, Nothing)
  -logfile        = The logging file to write to
  -listdir        = List the directories in the repository
  -listjobs       = List the jobs in the specified directory
  -listrep        = List the available repositories
  -norep          = Do not log into the repository
  -version        = show the version, revision and build date
  -param          = Set a named parameter <NAME>=<VALUE>. For example -param:FILE=customers.csv
  -listparam      = List information concerning the defined parameters in the specified job.
  -export         = Exports all linked resources of the specified job. The argument is the name of a ZIP file.
  -custom         = Set a custom plugin specific option as a String value in the job using <NAME>=<Value>, for example: -custom:COLOR=Red
  -maxloglines    = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
  -maxlogtimeout  = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)

Notes:

kitchen.sh: runs jobs
pan.sh: runs ktr transformations
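
Since `kitchen.sh` runs jobs and `pan.sh` runs transformations, a wrapper can pick the launcher from the file extension. The sketch below assumes jobs are saved as `.kjb` and transformations as `.ktr`; `pick_launcher` and the `example.kjb` file name are hypothetical.

```shell
# Choose the Kettle launcher script from the file extension.
# pick_launcher is a hypothetical helper for illustration.
pick_launcher() {
    case "$1" in
        *.kjb) echo "kitchen.sh" ;;   # job
        *.ktr) echo "pan.sh" ;;       # transformation
        *)     echo "unknown file type: $1" >&2; return 1 ;;
    esac
}

pick_launcher /opt/kettle/kettle_file/transition/o2m.ktr   # prints pan.sh
pick_launcher /opt/kettle/kettle_file/job/example.kjb      # prints kitchen.sh
```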

III: Call Kettle from a script

1. Create the Kettle working directories

[root@localhost opt]# mkdir -p /opt/kettle/kettle_file/job
[root@localhost opt]# mkdir -p /opt/kettle/kettle_file/transition
[root@localhost opt]# mkdir -p /opt/kettle/kettle_sh
[root@localhost opt]# mkdir -p /opt/kettle/kettle_log

2. In the /opt/kettle/kettle_sh directory, create the executable script with vim o2m.sh

#!/bin/sh
cd /opt/kettle/data-integration/
export JAVA_HOME=/opt/apps/java/jdk1.8.0_191
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
./pan.sh -file=/opt/kettle/kettle_file/transition/o2m.ktr >>/opt/kettle/kettle_log/o2m_$(date +%Y%m%d).log &
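
The script above appends only standard output to the log, so anything written to standard error is lost, and because `pan.sh` is sent to the background its exit status is never checked. A possible variant is sketched here, under the assumption that capturing both streams and reporting failures is wanted; `run_transformation` and its arguments are illustrative names, and the paths are the ones used in this article.

```shell
#!/bin/sh
# Hypothetical variant of o2m.sh: logs stdout and stderr, reports failures.
# Usage: run_transformation <kettle_dir> <ktr_file> <log_dir>
run_transformation() {
    kettle_dir=$1
    ktr_file=$2
    log_file=$3/o2m_$(date +%Y%m%d).log

    # Run in a subshell so the caller's working directory is unchanged.
    if (cd "$kettle_dir" && ./pan.sh -file="$ktr_file") >>"$log_file" 2>&1; then
        echo "transformation finished, log: $log_file"
    else
        status=$?
        echo "pan.sh failed with status $status, log: $log_file" >&2
        return "$status"
    fi
}

# Only run when Kettle is actually installed at the path used in this article.
if [ -x /opt/kettle/data-integration/pan.sh ]; then
    run_transformation /opt/kettle/data-integration \
        /opt/kettle/kettle_file/transition/o2m.ktr /opt/kettle/kettle_log
fi
```

This variant runs in the foreground, so it pairs naturally with launching the script itself in the background.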

3. Make the script executable

chmod +x o2m.sh

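4. Configure the ktr file on Windows and, once it tests successfully, upload it to the corresponding transition directory on Linux

5. On Linux, the Oracle and MySQL driver directory differs from the one on Windows; the drivers go in: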

/opt/kettle/data-integration/libswt/linux/x86_64
[root@localhost x86_64]# pwd
/opt/kettle/data-integration/libswt/linux/x86_64
[root@localhost x86_64]# ll
total 7444
-rw-r--r--. 1 root root  992808 Nov 26 09:03 mysql-connector-java-5.1.41-bin.jar
-rw-r--r--. 1 root root 2001778 Nov 26 09:03 mysql-connector-java-6.0.6.jar
-rw-r--r--. 1 root root 2739670 Nov 26 09:04 ojdbc6.jar
-rw-r--r--. 1 root root 1880133 May 16  2017 swt.jar
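
The listing shows two MySQL connector versions side by side. As a cautious sketch: if a connection misbehaves, it may be worth checking how many driver jars of the same family sit in this directory, since which one gets loaded may not be obvious. `count_jars` is a hypothetical helper; the directory is the one given above.

```shell
# Count jar files in a directory whose names start with a given prefix.
# count_jars is a hypothetical helper for illustration.
count_jars() {
    dir=$1
    prefix=$2
    count=0
    for jar in "$dir/$prefix"*.jar; do
        if [ -e "$jar" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

driver_dir=/opt/kettle/data-integration/libswt/linux/x86_64
if [ -d "$driver_dir" ]; then
    echo "MySQL connector jars: $(count_jars "$driver_dir" mysql-connector-java)"
    echo "Oracle ojdbc jars:    $(count_jars "$driver_dir" ojdbc)"
fi
```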

6. Run the sh script

./o2m.sh &
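
Because the original motivation was an unstable VPN connection, it may be worth detaching the run from the terminal so that a dropped SSH session does not take the process with it. One common approach, shown here as a sketch and not as the author's method, is `nohup`; `start_detached`, `DETACHED_PID`, and the `o2m_nohup.log` name are illustrative.

```shell
# Start a command detached from the terminal, appending all output to a log.
# start_detached and DETACHED_PID are hypothetical names for illustration.
start_detached() {
    log_file=$1
    shift
    nohup "$@" >>"$log_file" 2>&1 </dev/null &
    DETACHED_PID=$!
}

# Only launch when the script from this article is present.
if [ -x /opt/kettle/kettle_sh/o2m.sh ]; then
    start_detached /opt/kettle/kettle_log/o2m_nohup.log /opt/kettle/kettle_sh/o2m.sh
    echo "started with PID $DETACHED_PID"
fi
```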

7. Check the log to follow the transformation:

[root@localhost kettle_log]# tail -20f 1127_20191128.log
2019/11/28 14:55:26 - Table output.4 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.0 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.5 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.1 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.2 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.3 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.7 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Pan - Finished!
2019/11/28 14:55:26 - Pan - Start=2019/11/28 14:52:41.492, Stop=2019/11/28 14:55:26.329
2019/11/28 14:55:26 - Pan - Processing ended after 2 minutes and 44 seconds (164 seconds total).
2019/11/28 14:55:26 - 2 -  
2019/11/28 14:55:26 - 2 - Step Table input.0 ended successfully, processed 4084464 lines. ( 24905 lines/s)
2019/11/28 14:55:26 - 2 - Step Table output.0 ended successfully, processed 510558 lines. ( 3113 lines/s)
2019/11/28 14:55:26 - 2 - Step Table output.1 ended successfully, processed 510558 lines. ( 3113 lines/s)
2019/11/28 14:55:26 - 2 - Step Table output.2 ended successfully, processed 510558 lines. ( 3113 lines/s)
2019/11/28 14:55:26 - 2 - Step Table output.3 ended successfully, processed 510558 lines. ( 3113 lines/s)
2019/11/28 14:55:26 - 2 - Step Table output.4 ended successfully, processed 510558 lines. ( 3113 lines/s)
2019/11/28 14:55:26 - 2 - Step Table output.5 ended successfully, processed 510558 lines. ( 3113 lines/s)
2019/11/28 14:55:26 - 2 - Step Table output.6 ended successfully, processed 510558 lines. ( 3113 lines/s)
2019/11/28 14:55:26 - 2 - Step Table output.7 ended successfully, processed 510558 lines. ( 3113 lines/s)

As the log shows, more than 4 million rows (4,084,464) were converted in under 3 minutes (164 seconds), a large improvement in speed.
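
The figures in the log can be cross-checked with a little arithmetic: 8 output copies of 510,558 rows each add up to the 4,084,464 rows read, and 4,084,464 rows over 164 seconds is about 24,905 rows per second. The sketch below does that arithmetic and also sums the `E=` (error) counters from the step summary lines; `count_step_errors` is a hypothetical helper, and the log name is the one tailed above.

```shell
# Cross-check the totals reported in the log (integer arithmetic).
echo "rows written: $((510558 * 8))"        # prints rows written: 4084464
echo "rows per second: $((4084464 / 164))"  # prints rows per second: 24905

# Sum the E= counters from lines like "(I=0, O=1, R=1, W=1, U=0, E=0)".
# count_step_errors is a hypothetical helper for illustration.
count_step_errors() {
    sed -n 's/.*(I=[0-9]*, O=[0-9]*, R=[0-9]*, W=[0-9]*, U=[0-9]*, E=\([0-9]*\)).*/\1/p' "$1" |
        awk '{ total += $1 } END { print total + 0 }'
}

log_file=/opt/kettle/kettle_log/1127_20191128.log
if [ -f "$log_file" ]; then
    echo "step errors: $(count_step_errors "$log_file")"
fi
```

A non-zero error total would be a reason to read the full log before trusting the row counts.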

