Oozie Notes (compiled from several references; kept for personal study)


Oozie overview:

  Oozie is a Hadoop-based workflow engine, also referred to as a scheduler. Scheduling flows are written in XML and can drive MapReduce, Pig, Hive, shell, jar, Spark jobs, and so on. It is very handy when a chain of operations has to be applied to data: instead of writing custom glue code, you define the individual actions and string them together into a workflow that runs automatically. This makes it very useful for big-data analysis work. (The description below is based on Oozie 4.1.0.)

 Oozie has a few core concepts:

  workflow: an ordered sequence of flow nodes; supports fork (split into multiple parallel branches) and join (merge multiple branches back into one).

  coordinator: groups multiple workflows; the output of earlier workflows can be used as the input of later ones, and trigger conditions (including time-based triggers) can be defined for a workflow.

  bundle: an abstraction over a set of coordinators; it can bind several coordinators together.

  job.properties: defines the environment variables / parameters for a job.

Oozie installation: omitted.

Lifecycle:

In Oozie, a workflow job can be in one of the following states:

State        Meaning

PREP         A workflow job is in PREP when it is first created: the job is defined but not yet running.

RUNNING      A created workflow job enters RUNNING when it starts executing; it stays there until it reaches an end state, ends because of an error, or is suspended.

SUSPENDED    A RUNNING workflow job can be moved to SUSPENDED; it remains in that state until it is resumed or killed.

SUCCEEDED    When a RUNNING workflow job reaches the end node, it moves to the final state SUCCEEDED.

KILLED       When a workflow job in PREP, RUNNING, or SUSPENDED state is killed, it moves to KILLED.

FAILED       When a RUNNING workflow job terminates because of an unexpected error, it moves to FAILED.

These states have corresponding transitions (a workflow may jump from one state to another in response to certain events); the legal transitions are listed below:

From state       Possible next states

(not started)    PREP

PREP             RUNNING, KILLED

RUNNING          SUSPENDED, SUCCEEDED, KILLED, FAILED

SUSPENDED        RUNNING, KILLED

With this state-transition space in mind, you can control the execution of a workflow job more flexibly as needed.
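
Most of these transitions are driven through the Oozie CLI. A minimal, hedged sketch (it assumes an Oozie server at http://localhost:11000/oozie; the job id is a placeholder for the id returned when the job was submitted):

# suspend a RUNNING workflow job (RUNNING -> SUSPENDED)
oozie job -oozie http://localhost:11000/oozie -suspend 0000001-170905104500000-oozie-oozi-W

# resume a SUSPENDED workflow job (SUSPENDED -> RUNNING)
oozie job -oozie http://localhost:11000/oozie -resume 0000001-170905104500000-oozie-oozi-W

# kill a PREP/RUNNING/SUSPENDED workflow job (-> KILLED)
oozie job -oozie http://localhost:11000/oozie -kill 0000001-170905104500000-oozie-oozi-W

# check the current state
oozie job -oozie http://localhost:11000/oozie -info 0000001-170905104500000-oozie-oozi-W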

 

Oozie file formats:

1. workflow:

 Oozie defines an XML-based language, hPDL (Hadoop Process Definition Language), to describe a workflow's DAG. A workflow consists of control flow nodes (Control Flow Nodes) and action nodes (Action Nodes).

Control flow nodes define the start and end of a flow (start, end) and control the execution path, e.g. decision, fork, and join (a small decision-node sketch is given after the generic syntax below); action nodes cover Hadoop jobs, SSH, HTTP, email, Oozie sub-workflows, and so on.

An Action Node defines a basic unit of work.

Syntax:

 

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">

  ...

    <start to="[NODE-NAME]"/>

   <action name="[NODE-NAME]">

          ....
     <ok to="[NODE-NAME]"/>

        <error to="[NODE-NAME]"/>

    </action> 
   <kill name="[NODE-NAME]"> <message>[MESSAGE-TO-LOG]</message> </kill>  

   <end name="[NODE-NAME]"/>

</workflow-app>
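
The generic syntax above shows only the start, action, kill, and end nodes. For the decision control node mentioned earlier, a minimal hedged sketch (node names and the predicate expression are placeholders) looks like this:

<decision name="[NODE-NAME]">
    <switch>
        <case to="[NODE-NAME]">[PREDICATE]</case>
        ...
        <default to="[NODE-NAME]"/>
    </switch>
</decision>

The first case whose predicate (an EL expression such as ${fs:exists(...)}) evaluates to true decides the next node; otherwise the default transition is taken.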

 

1.1 Map-Reduce Action

A map-reduce action starts a MapReduce job from within the workflow job, and that MapReduce job can be configured in detail. Additional features can be configured through child elements of map-reduce, such as streaming, pipes, file, and archive.

Syntax:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">

    ...

    <action name="[NODE-NAME]">

        <map-reduce>

            <job-tracker>[JOB-TRACKER]</job-tracker>

            <name-node>[NAME-NODE]</name-node>

            <prepare>

                <delete path="[PATH]"/>

                ...

                <mkdir path="[PATH]"/>

                ...

            </prepare>

            <streaming>

                <mapper>[MAPPER-PROCESS]</mapper>

                <reducer>[REDUCER-PROCESS]</reducer>

                <record-reader>[RECORD-READER-CLASS]</record-reader>

                <record-reader-mapping>[NAME=VALUE]</record-reader-mapping>

                ...

                <env>[NAME=VALUE]</env>

                ...

            </streaming>

            <!-- Either streaming or pipes can be specified for an action, not both -->

            <pipes>

                <map>[MAPPER]</map>

                <reduce>[REDUCER]</reduce>

                <inputformat>[INPUTFORMAT]</inputformat>

                <partitioner>[PARTITIONER]</partitioner>

                <writer>[OUTPUTFORMAT]</writer>

                <program>[EXECUTABLE]</program>

            </pipes>

            <job-xml>[JOB-XML-FILE]</job-xml>

            <configuration>

                <property>

                    <name>[PROPERTY-NAME]</name>

                    <value>[PROPERTY-VALUE]</value>

                </property>

                ...

            </configuration>

            <file>[FILE-PATH]</file>

            ...

            <archive>[FILE-PATH]</archive>

            ...

        </map-reduce>
        <ok to="[NODE-NAME]"/>

        <error to="[NODE-NAME]"/>

    </action>

    ...

</workflow-app>

Example from the official documentation:

<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">

    ...

    <action name="myfirstHadoopJob">

        <map-reduce>

            <job-tracker>foo:8021</job-tracker>

            <name-node>bar:8020</name-node>

            <prepare>

                <delete path="hdfs://foo:8020/usr/tucu/output-data"/>

            </prepare>

            <job-xml>/myfirstjob.xml</job-xml>

            <configuration>

                <property>

                    <name>mapred.input.dir</name>

                    <value>/usr/tucu/input-data</value>

                </property>

                <property>

                    <name>mapred.output.dir</name>

                    <value>/usr/tucu/output-data</value>

                </property>

                <property>

                    <name>mapred.reduce.tasks</name>

                    <value>${firstJobReducers}</value>

                </property>

                <property>

                    <name>oozie.action.external.stats.write</name>

                    <value>true</value>

                </property>

            </configuration>

        </map-reduce>

        <ok to="myNextAction"/>

        <error to="errorCleanup"/>

    </action>

    ...

</workflow-app>

 

1.2 SSH Action

This action logs in to a host over SSH and runs a set of shell commands there.

Note: SSH actions are used in Oozie schema 0.1 and were removed in Oozie schema 0.2.

       The ssh action starts a shell command on a remote host as a background remote secure shell. The workflow job waits for the remote shell command to complete before moving on to the next action. The shell command must exist on the remote machine and must be executable via the command path.

Syntax:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">

    ...

    <action name="[NODE-NAME]">

        <ssh>

            <host>[USER]@[HOST]</host>

            <command>[SHELL]</command>

            <args>[ARGUMENTS]</args>

            ...

            <capture-output/>

        </ssh>

        <ok to="[NODE-NAME]"/>

        <error to="[NODE-NAME]"/>

    </action>

    ...

</workflow-app>

 

Example from the official documentation:

<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">

    ...

    <action name="myssjob">

        <ssh>

            <host>foo@bar.com</host>

            <command>uploaddata</command>

            <args>jdbc:derby://bar.com:1527/myDB</args>

            <args>hdfs://foobar.com:8020/usr/tucu/myData</args>

        </ssh>

        <ok to="myotherjob"/>

        <error to="errorcleanup"/>

    </action>

    ...

</workflow-app>

 

 

1.3 Java Action

Oozie supports Java actions. A Java action runs the public static void main(String[] args) method of the Java class specified in the workflow; it is executed on the Hadoop cluster as a map-reduce job with a single mapper task.

The workflow job waits for the Java program to finish before continuing to the next action, which means several actions can be written to invoke several classes in sequence. When the Java class exits normally, the ok transition is taken; when it throws an exception, the error transition is taken.

Syntax:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">

    ...

    <action name="[NODE-NAME]">

        <java>

            <job-tracker>[JOB-TRACKER]</job-tracker>

            <name-node>[NAME-NODE]</name-node>

            <prepare>

               <delete path="[PATH]"/>

               ...

               <mkdir path="[PATH]"/>

               ...

            </prepare>

            <job-xml>[JOB-XML]</job-xml>

            <configuration>

                <property>

                    <name>[PROPERTY-NAME]</name>

                    <value>[PROPERTY-VALUE]</value>

                </property>

                ...

            </configuration>

            <main-class>[MAIN-CLASS]</main-class>

            <java-opts>[JAVA-STARTUP-OPTS]</java-opts>

            <arg>ARGUMENT</arg>

            ...

            <file>[FILE-PATH]</file>

            ...

            <archive>[FILE-PATH]</archive>

            ...

            <capture-output />

        </java>

        <ok to="[NODE-NAME]"/>

        <error to="[NODE-NAME]"/>

    </action>

    ...

</workflow-app>

Example from the official documentation:

<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">

    ...

    <action name="myfirstjavajob">

        <java>

            <job-tracker>foo:8021</job-tracker>

            <name-node>bar:8020</name-node>

            <prepare>

                <delete path="${jobOutput}"/>

            </prepare>

            <configuration>

                <property>

                    <name>mapred.queue.name</name>

                    <value>default</value>

                </property>

            </configuration>

            <main-class>org.apache.oozie.MyFirstMainClass</main-class>

            <java-opts>-Dblah</java-opts>

            <arg>argument1</arg>

            <arg>argument2</arg>

        </java>

        <ok to="myotherjob"/>

        <error to="errorcleanup"/>

    </action>

    ...

</workflow-app>

1.4 Shell Action

A shell action runs a shell command; the arguments the command needs are supplied through configuration. A hedged example is given after the syntax below.

Syntax:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.4">

 

 ...

 

 <action name="[NODE-NAME]">

 

     <shell xmlns="uri:oozie:shell-action:0.2">

 

         <job-tracker>[JOB-TRACKER]</job-tracker>

 

         <name-node>[NAME-NODE]</name-node>

 

         <prepare>

 

             <delete path="[PATH]" />

 

             ...

 

             <mkdir path="[PATH]" />

 

             ...

 

         </prepare>

 

         <configuration>

 

             <property>

 

                 <name>[PROPERTY-NAME]</name>

 

                 <value>[PROPERTY-VALUE]</value>

 

             </property>

 

             ...

 

         </configuration>

 

         <exec>[SHELL-COMMAND]</exec>

 

         <argument>[ARGUMENT-VALUE]</argument>

 

         <capture-output />

 

     </shell>

 

     <ok to="[NODE-NAME]" />

 

     <error to="[NODE-NAME]" />

 

</action>

 ...

 

</workflow-app>
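
No official example is reproduced here, so the following is a minimal hedged sketch of a shell action that simply runs echo, assuming workflow schema 0.4 and shell-action schema 0.2 as in the syntax above, with ${jobTracker}, ${nameNode}, and ${queueName} supplied from job.properties:

<workflow-app name="shell-example-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>echo</exec>
            <argument>hello oozie</argument>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

With capture-output, lines written to stdout in key=value form can be read by later actions via ${wf:actionData('shell-node')['key']}.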

1.5 Spark Action

    Oozie supports Spark actions, although the support is not particularly polished. When submitting a Spark job, the spark-assembly jar needs to be loaded.

Syntax:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.3">

    ...

    <action name="[NODE-NAME]">

        <spark xmlns="uri:oozie:spark-action:0.1">

            <job-tracker>[JOB-TRACKER]</job-tracker>

            <name-node>[NAME-NODE]</name-node>

            <prepare>

               <delete path="[PATH]"/>

               ...

               <mkdir path="[PATH]"/>

               ...

            </prepare>

            <job-xml>[SPARK SETTINGS FILE]</job-xml>

            <configuration>

                <property>

                    <name>[PROPERTY-NAME]</name>

                    <value>[PROPERTY-VALUE]</value>

                </property>

                ...

            </configuration>

            <master>[SPARK MASTER URL]</master>

            <mode>[SPARK MODE]</mode>

            <name>[SPARK JOB NAME]</name>

            <class>[SPARK MAIN CLASS]</class>

            <jar>[SPARK DEPENDENCIES JAR / PYTHON FILE]</jar>

            <spark-opts>[SPARK-OPTIONS]</spark-opts>

            <arg>[ARG-VALUE]</arg>

                ...

            <arg>[ARG-VALUE]</arg>

            ...

        </spark>

        <ok to="[NODE-NAME]"/>

        <error to="[NODE-NAME]"/>

    </action>

    ...

</workflow-app>

Example from the official documentation:

<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">

    ...

    <action name="myfirstsparkjob">

        <spark xmlns="uri:oozie:spark-action:0.1">

            <job-tracker>foo:8021</job-tracker>

            <name-node>bar:8020</name-node>

            <prepare>

                <delete path="${jobOutput}"/>

            </prepare>

            <configuration>

                <property>

                    <name>mapred.compress.map.output</name>

                    <value>true</value>

                </property>

            </configuration>

            <master>local[*]</master>

            <mode>client</mode>

            <name>Spark Example</name>

            <class>org.apache.spark.examples.mllib.JavaALS</class>

            <jar>/lib/spark-examples_2.10-1.1.0.jar</jar>

            <spark-opts>--executor-memory 20G --num-executors 50

             --conf spark.executor.extraJavaOptions="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"</spark-opts>

            <arg>inputpath=hdfs://localhost/input/file.txt</arg>

            <arg>value=2</arg>

        </spark>

        <ok to="myotherjob"/>

        <error to="errorcleanup"/>

    </action>

    ...

</workflow-app>

2. coordinator.xml

Syntax:

 

<coordinator-app name="[NAME]" frequency="[FREQUENCY]"

                    start="[DATETIME]" end="[DATETIME]" timezone="[TIMEZONE]"

                    xmlns="uri:oozie:coordinator:0.1">   

# frequency: how often the coordinator runs (values below five minutes require changing the Oozie configuration). start/end: start and end times; to use Beijing time, the configuration must also be changed and the time format adjusted accordingly.

 

      <controls>

        <timeout>[TIME_PERIOD]</timeout>

        <concurrency>[CONCURRENCY]</concurrency>

        <execution>[EXECUTION_STRATEGY]</execution>

      </controls>


      <datasets>    

        <include>[SHARED_DATASETS]</include>

        ...

        <!-- Synchronous datasets -->    # directory where the data instances are generated

        <dataset name="[NAME]" frequency="[FREQUENCY]"

                 initial-instance="[DATETIME]" timezone="[TIMEZONE]">

          <uri-template>[URI_TEMPLATE]</uri-template>

        </dataset>

        ...

      </datasets>

      <input-events>    # defines the data-availability trigger conditions

        <data-in name="[NAME]" dataset="[DATASET]">

          <instance>[INSTANCE]</instance>

          ...

        </data-in>

        ...

        <data-in name="[NAME]" dataset="[DATASET]">

          <start-instance>[INSTANCE]</start-instance>

          <end-instance>[INSTANCE]</end-instance>

        </data-in>

        ...

      </input-events>

      <output-events>

         <data-out name="[NAME]" dataset="[DATASET]">

           <instance>[INSTANCE]</instance>

         </data-out>

         ...

      </output-events>

      <action>

        <workflow>

          <app-path>[WF-APPLICATION-PATH]</app-path>    # HDFS directory containing workflow.xml

          <configuration>

            <property>    # defines parameters passed to the workflow

              <name>[PROPERTY-NAME]</name>

              <value>[PROPERTY-VALUE]</value>

            </property>

            ...

         </configuration>

       </workflow>

      </action>

   </coordinator-app>

 

Example from the official documentation:

 

<coordinator-app name="hello-coord" frequency="${coord:days(1)}"

                    start="2009-01-02T08:00Z" end="2009-01-02T08:00Z"

                    timezone="America/Los_Angeles"

                    xmlns="uri:oozie:coordinator:0.1">

      <datasets>

        <dataset name="logs" frequency="${coord:days(1)}"

                 initial-instance="2009-01-02T08:00Z" timezone="America/Los_Angeles">

          <uri-template>hdfs://bar:8020/app/logs/${YEAR}${MONTH}/${DAY}/data</uri-template>

        </dataset>

        <dataset name="siteAccessStats" frequency="${coord:days(1)}"

                 initial-instance="2009-01-02T08:00Z" timezone="America/Los_Angeles">

          <uri-template>hdfs://bar:8020/app/stats/${YEAR}/${MONTH}/${DAY}/data</uri-template>

        </dataset>

      </datasets>

      <input-events>    

        <data-in name="input" dataset="logs">

          <instance>2009-01-02T08:00Z</instance>

        </data-in>

      </input-events>

      <output-events>

         <data-out name="output" dataset="siteAccessStats">

           <instance>2009-01-02T08:00Z</instance>

         </data-out>

      </output-events>

      <action>

        <workflow>

          <app-path>hdfs://bar:8020/usr/joe/logsprocessor-wf</app-path>   

          <configuration>

            <property>   

              <name>wfInput</name>

              <value>${coord:dataIn('input')}</value>

            </property>

            <property>

              <name>wfOutput</name>

              <value>${coord:dataOut('output')}</value>

            </property>

         </configuration>

       </workflow>

      </action>

   </coordinator-app>

 

 

3. bundle.xml

 

Syntax:

 

<bundle-app name=[NAME]  xmlns='uri:oozie:bundle:0.1'>

  <controls>

       <kick-off-time>[DATETIME]</kick-off-time>    # time at which the bundle starts running

  </controls>

   <coordinator name=[NAME] >

       <app-path>[COORD-APPLICATION-PATH]</app-path> # directory containing coordinator.xml

          <configuration>                 # parameters passed to the coordinator application

            <property>

              <name>[PROPERTY-NAME]</name>  

              <value>[PROPERTY-VALUE]</value>

            </property>

            ...

         </configuration>

   </coordinator>

   ...

</bundle-app> 

 

Example from the official documentation (binding two coordinators):

 

<bundle-app name='APPNAME' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='uri:oozie:bundle:0.1'>

  <controls>

       <kick-off-time>${kickOffTime}</kick-off-time>

  </controls>

   <coordinator name='coordJobFromBundle1' >

       <app-path>${appPath}</app-path>

       <configuration>

         <property>

              <name>startTime1</name>

              <value>${START_TIME}</value>

          </property>

         <property>

              <name>endTime1</name>

              <value>${END_TIME}</value>

          </property>

      </configuration>

   </coordinator>

   <coordinator name='coordJobFromBundle2' >

       <app-path>${appPath2}</app-path>

       <configuration>

         <property>

              <name>startTime2</name>

              <value>${START_TIME2}</value>

          </property>

         <property>

              <name>endTime2</name>

              <value>${END_TIME2}</value>

          </property>

      </configuration>

   </coordinator>

</bundle-app>

 

4. job.properties:

 

nameNode                      hdfs://xxx:8020                         HDFS (NameNode) address

jobTracker                    xxx5:8034                               JobTracker / ResourceManager address

queueName                     default                                 Oozie queue

examplesRoot                  examples                                global root directory

oozie.use.system.libpath      true                                    whether to load the system lib path

oozie.libpath                 share/lib/user                          user lib path

oozie.wf.application.path     ${nameNode}/user/${user.name}/...       HDFS path of the Oozie application

 

The application path property name depends on what is being submitted:

workflow: oozie.wf.application.path

coordinator: oozie.coord.application.path

bundle: oozie.bundle.application.path

Using Oozie:

To write an Oozie job, two things are always required: job.properties and workflow.xml (plus coordinator.xml and/or bundle.xml when needed).

If you want the job to run automatically on a schedule, you also need to write a coordinator.xml.

If you want to bind several coordinator.xml files together, you also need to write a bundle.xml.

Oozie example:

A (simplified) example from our own work, this time using a Spark action:

bundle.xml:

 

<bundle-app name='APPNAME' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'

xmlns='uri:oozie:bundle:0.2'>

    <coordinator name='coordJobFromBundle1' >

       <app-path>${appPath}</app-path>  

   </coordinator>

   <coordinator name='coordJobFromBundle2' >

       <app-path>${appPath2}</app-path>

   </coordinator>

 

</bundle-app>

 

coordinator.xml:

 

<coordinator-app name="cron-coord" frequency="${coord:minutes(6)}" start="${start}"

end="${end}" timezone="Asia/Shanghai" xmlns="uri:oozie:coordinator:0.2">

    <action>

        <workflow>

            <app-path>${workflowAppUri}</app-path>

            <configuration>

                <property>

                    <name>jobTracker</name>

                    <value>${jobTracker}</value>

                </property>

                <property>

                    <name>nameNode</name>

                    <value>${nameNode}</value>

                </property>

                <property>

                    <name>queueName</name>

                    <value>${queueName}</value>

                </property>

                <property>

                    <name>mainClass</name>

                    <value>com.ocn.itv.rinse.ErrorCollectRinse</value>

                </property>

                <property>

                    <name>mainClass2</name>

                    <value>com.ocn.itv.rinse.UserCollectRinse</value>

                </property>

                <property>

                    <name>jarName</name>

                    <value>ocn-itv-spark-3.0.3-rc1.jar</value>

                </property>

            </configuration>

        </workflow>

    </action>

</coordinator-app>

 

workflow.xml:

 

<workflow-app  name="spark-example1" xmlns="uri:oozie:workflow:0.5"> 

    <start to="forking"/>

    <fork name="forking">

        <path start="firstparalleljob"/>

        <path start="secondparalleljob"/>

    </fork>   

    <action name="firstparalleljob">

        <spark xmlns="uri:oozie:spark-action:0.2"> 

            <job-tracker>${jobTracker}</job-tracker> 

            <name-node>${nameNode}</name-node>

            <configuration> 

                <property> 

                    <name>mapred.job.queue.name</name> 

                    <value>${queueName}</value> 

                </property>                 

            </configuration>           

            <master>yarn-cluster</master>

            <mode>cluster</mode>

            <name>Spark Example</name>

            <class>${mainClass}</class>           

            <jar>${jarName}</jar>

            <spark-opts>${sparkopts}</spark-opts>

            <arg>${input}</arg>           

        </spark >  

        <ok to="joining"/>

        <error to="fail"/>   

    </action>

    <action name="secondparalleljob">

         <spark xmlns="uri:oozie:spark-action:0.2"> 

            <job-tracker>${jobTracker}</job-tracker> 

            <name-node>${nameNode}</name-node>

            <configuration> 

                <property> 

                    <name>mapred.job.queue.name</name> 

                    <value>${queueName}</value> 

                </property>                 

            </configuration>           

            <master>yarn-cluster</master>

            <mode>cluster</mode>

            <name>Spark Example2</name>

            <class>${mainClass2}</class>           

            <jar>${jarName}</jar>

            <spark-opts>${sparkopts}</spark-opts>

            <arg>${input}</arg>           

        </spark > 

        <ok to="joining"/>

        <error to="fail"/>   

    </action>  

    <join name="joining" to="end"/>

      <kill name="fail"> 

       <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> 

    </kill> 

   <end name="end"/> 

</workflow-app>

 

job.properties

 

nameNode=hdfs://hgdp-001:8020     # HDFS (NameNode) address

jobTracker=hgdp-001:8032        # ResourceManager address/port

queueName=default            # Oozie queue

input=2017-05-09             # input parameter

hdfspath=user/root           # custom directory

examplesRoot=ocn-itv-oozie      # custom global root directory

oozie.use.system.libpath=True    # whether to load the system lib path

sparkopts=--executor-memory 1G    # Spark options

start=2017-09-04T00:05+0800    # coordinator start time

end=2017-09-04T00:36+0800      # coordinator end time

start2=2017-09-01T00:06+0800

end2=2017-09-04T00:36+0800

oozie.libpath=${nameNode}/${hdfspath}/${examplesRoot}/lib/          # user-defined lib path (where the jars are stored)

workflowAppUri=${nameNode}/${hdfspath}/${examplesRoot}/wf/spark/fork/

workflowAppUri2=${nameNode}/${hdfspath}/${examplesRoot}/wf/spark/single/  # directories of the workflow.xml files scheduled by the coordinators

appPath=${nameNode}/${hdfspath}/${examplesRoot}/cd/single/

appPath2=${nameNode}/${hdfspath}/${examplesRoot}/cd/single1/        # directories of the coordinator.xml files invoked by the bundle

oozie.bundle.application.path=${nameNode}/${hdfspath}/${examplesRoot}/bd/bd1/    # directory of bundle.xml

# one bundle invoking multiple coordinators

(Note: the trailing # comments above are annotations for the reader; in an actual .properties file a mid-line # becomes part of the value, so comments should go on their own lines.)

 

 

 

Finally, run it:

 

  Start the job: oozie job -config job.properties -run -oozie http://192.168.2.11:11000/oozie   (replace 192.168.2.11 with your Oozie server address)
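
After submission the job can be inspected with the same CLI. A hedged sketch (the job id is the placeholder id printed by the -run command):

# list recently submitted workflow jobs
oozie jobs -oozie http://192.168.2.11:11000/oozie

# show the job status and the state of each action
oozie job -oozie http://192.168.2.11:11000/oozie -info <job-id>

# fetch the job log
oozie job -oozie http://192.168.2.11:11000/oozie -log <job-id>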

 

 

 Things to watch out for:

I. Timezone configuration in the coordinator

Cloudera Oozie defaults to the UTC timezone, so when developing Oozie jobs the intended execution time has to be shifted back by 8 hours, which is inconvenient. The timezone configuration can be changed as follows.

1. Add the following property to the Oozie configuration file:

<property>

 <name>oozie.processing.timezone</name>

 <value>GMT+0800</value>

</property>

2. If you use Hue, open the Oozie web UI, choose Settings, and select CST (Asia/Shanghai) under Timezone.

3. Set the coordinator's timezone to timezone="Asia/Shanghai".

4. Adjust the time format accordingly, e.g. 2017-09-05T15:16+0800.

II. oozie.xx.application.path

Only one oozie.xx.application.path property may be set in job.properties at a time (see the sketch after the list below):

workflow:oozie.wf.application.path

coordinator:oozie.coord.application.path

bundle:oozie.bundle.application.path
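
A hedged job.properties fragment illustrating this, reusing the paths from the example above (only one of the three properties may be active at a time):

# submitting the coordinator directly: only this application path is set
oozie.coord.application.path=${nameNode}/${hdfspath}/${examplesRoot}/cd/single/
# oozie.wf.application.path=...      (must not be set at the same time)
# oozie.bundle.application.path=...  (must not be set at the same time)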

III. File naming and location

The names workflow.xml, coordinator.xml, and bundle.xml must not be changed, and these files must be placed in an HDFS directory; job.properties may be renamed and is kept locally.
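
In practice this means the XML definitions are uploaded to HDFS while job.properties stays on the submitting machine. A hedged sketch using the example paths above (the local file names are assumptions):

# upload the workflow / coordinator / bundle definitions to HDFS
hdfs dfs -mkdir -p /user/root/ocn-itv-oozie/wf/spark/fork/
hdfs dfs -put workflow.xml /user/root/ocn-itv-oozie/wf/spark/fork/
hdfs dfs -put coordinator.xml /user/root/ocn-itv-oozie/cd/single/
hdfs dfs -put bundle.xml /user/root/ocn-itv-oozie/bd/bd1/

# job.properties stays local and is passed with -config at submit time
oozie job -config job.properties -run -oozie http://192.168.2.11:11000/oozie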

IV. Actions in workflow.xml

Several actions can be defined and executed one after another, as in the following example:

 

 

<workflow-app  name="java-example1" xmlns="uri:oozie:workflow:0.5"> 

    <start to="java-Action"/> 

    <action name="java-Action">

     ....

        <ok to="java-Action2"/>

        <error to="fail"/>   

    </action>

    <action name="java-Action2">

       ....

        <ok to="end"/>

        <error to="fail"/>   

    </action>  

      <kill name="fail"> 

       <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> 

    </kill> 

   <end name="end"/> 

</workflow-app>

 

 

Multiple tasks can also run concurrently by adding fork and join nodes: a fork node splits execution into several parallel branches, and a join node merges them back together. fork and join must appear in pairs, and a join may only merge branches that were started by the same fork. Example:

 

<workflow-app  name="java-example1" xmlns="uri:oozie:workflow:0.5"> 

    <start to="forking"/>

    <fork name="forking">

        <path start="firstparalleljob"/>

        <path start="secondparalleljob"/>

    </fork>   

    <action name="firstparalleljob">

            .....

        <ok to="joining"/>

        <error to="fail"/>    

    </action>

    <action name="secondparalleljob">

            ....

        <ok to="joining"/>

        <error to="fail"/>   

    </action>  

    <join name="joining" to="end"/>

      <kill name="fail"> 

       <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> 

    </kill> 

   <end name="end"/> 

</workflow-app>

 

 

