Following the previous post, where the Kettle migration was run locally on Windows: one of the tables grows by roughly 2 million rows per day, the local migration was too slow, and the VPN connection to the server was unstable and kept dropping. Since Kettle does not support resuming an interrupted run, I decided to move the Kettle setup from Windows onto the Linux environment.
一:linux安裝jdk
參考:https://www.cnblogs.com/nothingonyou/p/11936850.html
二:liunx部署kettle
kettle是直接部署在了要轉換的mysql服務器上,避免了中間的傳輸網絡消耗,把windows下已解壓的文件,壓縮后上傳至linux相關目錄,如圖:
1.建立kettle目錄,並解壓文件
[root@localhost opt]# mkdir kettle
[root@localhost opt]# mv data-integration.zip kettle/
[root@localhost opt]# cd kettle
[root@localhost kettle]# unzip data-integration.zip
[root@localhost kettle]# rm -rf data-integration.zip
[root@localhost kettle]# cd data-integration/
[root@localhost data-integration]# chmod +x *.sh
2. Verify the installation
[root@localhost /]# cd /opt/kettle/data-integration/
[root@localhost data-integration]# ./kitchen.sh
The installation is successful if the following output appears:
[root@localhost data-integration]# ./kitchen.sh
#######################################################################
WARNING: no libwebkitgtk-1.0 detected, some features will be unavailable
Consider installing the package with apt-get or yum.
e.g. 'sudo apt-get install libwebkitgtk-1.0-0'
#######################################################################
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=51200m; support was removed in 8.0
Options:
-rep = Repository name
-user = Repository username
-pass = Repository password
-job = The name of the job to launch
-dir = The directory (dont forget the leading /)
-file = The filename (Job XML) to launch
-level = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Minimal, Nothing)
-logfile = The logging file to write to
-listdir = List the directories in the repository
-listjobs = List the jobs in the specified directory
-listrep = List the available repositories
-norep = Do not log into the repository
-version = show the version, revision and build date
-param = Set a named parameter <NAME>=<VALUE>. For example -param:FILE=customers.csv
-listparam = List information concerning the defined parameters in the specified job.
-export = Exports all linked resources of the specified job. The argument is the name of a ZIP file.
-custom = Set a custom plugin specific option as a String value in the job using <NAME>=<Value>, for example: -custom:COLOR=Red
-maxloglines = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
-maxlogtimeout = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)
Notes:
kitchen.sh: runs Kettle jobs (.kjb files)
pan.sh: runs Kettle transformations (.ktr files)
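For reference, a job is launched the same way through kitchen.sh. A minimal sketch, assuming a job file named o2m.kjb is later placed in the job directory created below; the job and log file names are made-up examples, only the flags come from the option list above:
# o2m.kjb and o2m_job.log are hypothetical names used only to illustrate the flags
./kitchen.sh -file=/opt/kettle/kettle_file/job/o2m.kjb -level=Basic -logfile=/opt/kettle/kettle_log/o2m_job.log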
Part 3: Calling Kettle from a script
1. Create the Kettle working directories
[root@localhost opt]# mkdir -p /opt/kettle/kettle_file/job
[root@localhost opt]# mkdir -p /opt/kettle/kettle_file/transition
[root@localhost opt]# mkdir -p /opt/kettle/kettle_sh
[root@localhost opt]# mkdir -p /opt/kettle/kettle_log
2. In the /opt/kettle/kettle_sh directory, create the launch script with vim o2m.sh:
#!/bin/sh
# switch to the Kettle installation directory
cd /opt/kettle/data-integration/
# point Kettle at the local JDK
export JAVA_HOME=/opt/apps/java/jdk1.8.0_191
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# run the o2m transformation in the background and append its output to a dated log file
./pan.sh -file=/opt/kettle/kettle_file/transition/o2m.ktr >>/opt/kettle/kettle_log/o2m_$(date +%Y%m%d).log &
3. Make the script executable
chmod +x o2m.sh
4. Configure the .ktr file on Windows; once it has been tested successfully there, upload it to the corresponding transition directory on Linux, for example:
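A minimal upload sketch, run from the Windows side with scp or any SFTP client, assuming the Linux host is reachable at the placeholder address 192.168.1.100 and the transformation file is named o2m.ktr as in the launch script:
# 192.168.1.100 is a placeholder address for the MySQL server
scp o2m.ktr root@192.168.1.100:/opt/kettle/kettle_file/transition/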
5. On Linux, the Oracle and MySQL JDBC drivers live in a different directory than on Windows; they go in:
/opt/kettle/data-integration/libswt/linux/x86_64
[root@localhost x86_64]# pwd
/opt/kettle/data-integration/libswt/linux/x86_64
[root@localhost x86_64]# ll
total 7444
-rw-r--r--. 1 root root 992808 Nov 26 09:03 mysql-connector-java-5.1.41-bin.jar
-rw-r--r--. 1 root root 2001778 Nov 26 09:03 mysql-connector-java-6.0.6.jar
-rw-r--r--. 1 root root 2739670 Nov 26 09:04 ojdbc6.jar
-rw-r--r--. 1 root root 1880133 May 16 2017 swt.jar
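If the drivers are not there yet, they only need to be copied into that directory. A minimal sketch, assuming the jar files were first uploaded to /opt (the staging path is an assumption):
# /opt is only an assumed staging directory for the uploaded jars
cp /opt/mysql-connector-java-5.1.41-bin.jar /opt/ojdbc6.jar /opt/kettle/data-integration/libswt/linux/x86_64/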
6. Run the shell script
./o2m.sh &
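Because pan.sh runs in the background, a quick sanity check (not part of the original steps) is to confirm the process is still alive before looking at the log:
# optional: verify the background pan.sh process is running
ps -ef | grep pan.sh | grep -v grep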
7. Check the log to monitor the transformation:
[root@localhost kettle_log]# tail -20f 1127_20191128.log
2019/11/28 14:55:26 - Table output.4 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.0 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.5 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.1 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.2 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.3 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Table output.7 - Finished processing (I=0, O=510558, R=510558, W=510558, U=0, E=0)
2019/11/28 14:55:26 - Pan - Finished!
2019/11/28 14:55:26 - Pan - Start=2019/11/28 14:52:41.492, Stop=2019/11/28 14:55:26.329
2019/11/28 14:55:26 - Pan - Processing ended after 2 minutes and 44 seconds (164 seconds total).
2019/11/28 14:55:26 - 2 -
2019/11/28 14:55:26 - 2 - Step Table input.0 finished successfully, processed 4084464 rows. ( 24905 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.0 finished successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.1 finished successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.2 finished successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.3 finished successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.4 finished successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.5 finished successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.6 finished successfully, processed 510558 rows. ( 3113 rows/s)
2019/11/28 14:55:26 - 2 - Step Table output.7 finished successfully, processed 510558 rows. ( 3113 rows/s)
As the log shows, more than 4 million rows were migrated in under three minutes, a big improvement in speed.
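Since the source table grows by roughly 2 million rows a day, the same script can also be scheduled for the daily incremental run. A minimal cron sketch, where the 01:00 schedule and the cron_o2m.log file name are assumptions rather than part of the original setup:
# added via crontab -e; runs the Oracle-to-MySQL transformation every day at 01:00
0 1 * * * /opt/kettle/kettle_sh/o2m.sh >> /opt/kettle/kettle_log/cron_o2m.log 2>&1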