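Kettle: shell commands
A job or transformation developed in Kettle can exist as a standalone file or be stored in a repository. Either way, it can be invoked from a shell script. Notes for reference: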
Job stored in a repository:
./kitchen.sh -rep ZYFS_REP -user admin -pass admin -param:file_name=/home/etluser/etl_data/test/etl_test.csv -dir /test -job JB_ETL_TEST
Job in a standalone file:
./kitchen.sh -file /home/rdb/JB_QFPD.kjb
Transformation in a standalone file:
./pan.sh -file /home/rdb/TR_QFPD.ktr
kitchen.sh options
Options:
  -rep = Repository name
  -user = Repository username
  -pass = Repository password
  -job = The name of the job to launch
  -dir = The directory (don't forget the leading /)
  -file = The filename (Job XML) to launch
  -level = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Minimal, Nothing)
  -logfile = The logging file to write to
  -listdir = List the directories in the repository
  -listjobs = List the jobs in the specified directory
  -listrep = List the available repositories
  -norep = Do not log into the repository
  -version = Show the version, revision and build date
  -param = Set a named parameter <NAME>=<VALUE>. For example -param:FILE=customers.csv
  -listparam = List information concerning the defined parameters in the specified job.
  -export = Exports all linked resources of the specified job. The argument is the name of a ZIP file.
  -custom = Set a custom plugin specific option as a String value in the job using <NAME>=<Value>, for example: -custom:COLOR=Red
  -maxloglines = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
  -maxlogtimeout = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)
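When a job is launched from cron or from another script, it helps to capture the log and check the exit status. The following is a minimal sketch, not a definitive wrapper: it uses only the -file, -level and -logfile options listed above, and it assumes that kitchen.sh returns a non-zero exit status when the job fails. The function name run_job and the KITCHEN variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around kitchen.sh for file-based jobs.
# KITCHEN is an illustrative variable; point it at your own kitchen.sh.
KITCHEN="${KITCHEN:-./kitchen.sh}"

# run_job JOB_FILE LOG_FILE
# Runs the job with Basic logging and writes the log to LOG_FILE.
# Returns whatever exit status kitchen.sh returned.
run_job() {
    job_file="$1"
    log_file="$2"

    "$KITCHEN" -file "$job_file" -level Basic -logfile "$log_file"
    status=$?

    # Assumption: a non-zero status means the job did not finish cleanly.
    if [ "$status" -ne 0 ]; then
        echo "job $job_file failed with exit status $status, see $log_file" >&2
    fi
    return "$status"
}
```

A call would look like `run_job /home/rdb/JB_QFPD.kjb /tmp/JB_QFPD.log`, where the log path is only an example.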
pan.sh options
Options:
  -rep = Repository name
  -user = Repository username
  -pass = Repository password
  -trans = The name of the transformation to launch
  -dir = The directory (don't forget the leading /)
  -file = The filename (Transformation XML) to launch
  -level = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Nothing)
  -logfile = The logging file to write to
  -listdir = List the directories in the repository
  -listtrans = List the transformations in the specified directory
  -listrep = List the available repositories
  -exprep = Export all repository objects to one XML file
  -norep = Do not log into the repository
  -safemode = Run in safe mode: with extra checking enabled
  -version = Show the version, revision and build date
  -param = Set a named parameter <NAME>=<VALUE>. For example -param:FOO=bar
  -listparam = List information concerning the defined named parameters in the specified transformation.
  -metrics = Gather metrics during execution
  -maxloglines = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
  -maxlogtimeout = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)
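When a transformation takes several named parameters, typing each -param:NAME=VALUE by hand is error-prone. The sketch below turns plain NAME=VALUE arguments into -param options and passes them to pan.sh along with -file. The function name run_trans and the PAN variable are hypothetical names introduced for illustration, and the parameter names in the example call are made up.

```shell
#!/bin/sh
# Hypothetical wrapper around pan.sh for file-based transformations.
# PAN is an illustrative variable; point it at your own pan.sh.
PAN="${PAN:-./pan.sh}"

# run_trans TRANS_FILE [NAME=VALUE ...]
# Each NAME=VALUE argument is passed on as -param:NAME=VALUE.
run_trans() {
    trans_file="$1"
    shift

    # Rebuild the positional parameters: append the converted option,
    # then drop the original pair from the front. The for loop iterates
    # over the list as it was before the loop started.
    for pair in "$@"; do
        set -- "$@" "-param:$pair"
        shift
    done

    # Quoting "$@" keeps each option as a single argument,
    # even when the value contains spaces.
    "$PAN" -file "$trans_file" "$@"
}
```

A call would look like `run_trans /home/rdb/TR_QFPD.ktr FOO=bar file_name=/home/etluser/etl_data/test/etl_test.csv`.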
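Database repository vs. file repository:
A database repository is easier to share and to use across platforms, but its version control is weaker than that of a file repository. It is also reached over the network, so a network connection failure can cause a job to fail.
The main drawback of a file repository is cross-platform use. It is usually combined with a version control tool such as svn.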