Oozie Workflows for Distributed Jobs: Email


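In today's big-data world, Spark and Hadoop frameworks keep multiplying, and the range of high-end computing frameworks and distributed jobs can be dazzling. Does this sound familiar? You have plenty of distributed jobs that must run at a fixed time every day, but the scheduler you wrote yourself is unstable and gives no reliable notifications.

If you want to learn the basics of Oozie first, you can refer to this introduction.

If so, what you are looking for is Oozie.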

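Oozie is an open-source framework for scheduling distributed jobs. It supports many job types, such as MapReduce, Spark, Sqoop, Pig and even shell scripts. You can schedule them in various ways and assemble them into a workflow, whose nodes can run serially or in parallel.

Once you have defined a series of jobs, you can launch the workflow and set up a coordinator to run it on a schedule.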
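
As a rough illustration of scheduled runs, a coordinator definition might look like the sketch below. The application name, the start and end times, and the workflowAppUri property are assumptions made for this example, not values from this post.

```xml
<!-- coordinator.xml: a minimal sketch; name, dates and ${workflowAppUri} are hypothetical -->
<coordinator-app name="nightly-coord" frequency="${coord:days(1)}"
                 start="2025-01-01T01:00Z" end="2025-12-31T01:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- HDFS directory that contains workflow.xml, assumed to be set in job.properties -->
            <app-path>${workflowAppUri}</app-path>
        </workflow>
    </action>
</coordinator-app>
```

With a frequency of one day, the coordinator would start the workflow once per day between the start and end times.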

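With all that in place, one important piece remains: email notification. A notification can be sent whether a job succeeds or fails, so once the nightly success message arrives you can sleep soundly.

This post therefore shows how to use email in Oozie.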

Email service

Email Action

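In Oozie, each step of a workflow is modeled as an action, and email is one of them.

The email action sends a message from within Oozie. It must specify the recipient address, the subject and the body. The recipient field accepts several addresses separated by commas.

The email action runs synchronously: it only completes once the mail has been sent, and only then can the next action start.

All parameters of the email action can use EL expressions.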
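
For instance, EL expressions can fill in the recipients and the message text at run time. In the sketch below, notifyTo is a hypothetical job property and the node names are placeholders; wf:name(), wf:id() and wf:user() are standard workflow EL functions.

```xml
<action name="notify">
    <email xmlns="uri:oozie:email-action:0.2">
        <!-- ${notifyTo} is an assumed property, e.g. defined in job.properties -->
        <to>${notifyTo}</to>
        <subject>Workflow ${wf:name()} notification</subject>
        <body>Job ${wf:id()} submitted by ${wf:user()} reached the notification step.</body>
    </email>
    <!-- "end" and "fail" are placeholder node names -->
    <ok to="end"/>
    <error to="fail"/>
</action>
```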

Syntax

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="[NODE-NAME]">
        <email xmlns="uri:oozie:email-action:0.2">
            <to>[COMMA-SEPARATED-TO-ADDRESSES]</to>
            <cc>[COMMA-SEPARATED-CC-ADDRESSES]</cc> <!-- cc is optional -->
            <subject>[SUBJECT]</subject>
            <body>[BODY]</body>
            <content_type>[CONTENT-TYPE]</content_type> <!-- content_type is optional -->
            <attachment>[COMMA-SEPARATED-HDFS-FILE-PATHS]</attachment> <!-- attachment is optional -->
        </email>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>

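The to and cc elements specify who receives the mail. Several addresses can be given, separated by commas. to is required; cc is optional.

The subject and body elements set the title and text of the mail. From email-action:0.2 onward the body can also be sent as text/html; the default is plain text, "text/plain".

The attachment element attaches an HDFS file to the mail; several attachments can be given, separated by commas. A path that is not fully qualified is still treated as a file in HDFS. Local files cannot be attached.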
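
As a sketch of the optional elements, the action below sends an HTML body with two attachments. The node names, the namenode host and both file paths are hypothetical, and wrapping the body in CDATA is one common way to keep the HTML markup out of the workflow XML rather than something this post prescribes.

```xml
<action name="send-report">
    <email xmlns="uri:oozie:email-action:0.2">
        <to>bob@initech.com</to>
        <subject>Daily report for ${wf:id()}</subject>
        <!-- CDATA keeps the HTML tags from being parsed as workflow XML -->
        <body><![CDATA[<h3>Daily report</h3><p>The report files are attached.</p>]]></body>
        <content_type>text/html</content_type>
        <!-- one fully qualified HDFS path and one path without a scheme; both are made up -->
        <attachment>hdfs://namenode:8020/reports/summary.csv,/reports/details.csv</attachment>
    </email>
    <ok to="end"/>
    <error to="fail"/>
</action>
```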

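Configuration

The email action needs SMTP server settings in oozie-site.xml. The values to configure are:

oozie.email.smtp.host

The address of the SMTP server. The default is localhost.

oozie.email.smtp.port

The port of the SMTP server. The default is 25.

oozie.email.from.address

The address the mail is sent from. The default is oozie@localhost.

oozie.email.smtp.auth

Whether authentication is enabled. It is disabled by default.

oozie.email.smtp.username

The login user name when authentication is enabled. The default is empty.

oozie.email.smtp.password

The password for that user when authentication is enabled. The default is empty.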
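
Put together, the settings above might appear in oozie-site.xml roughly as follows. The host name, mail account and password are placeholders chosen for this sketch; substitute the values of your own SMTP server.

```xml
<!-- oozie-site.xml fragment; host, account and password are placeholders -->
<configuration>
    <property>
        <name>oozie.email.smtp.host</name>
        <value>smtp.example.com</value>
    </property>
    <property>
        <name>oozie.email.smtp.port</name>
        <value>25</value>
    </property>
    <property>
        <name>oozie.email.from.address</name>
        <value>oozie@example.com</value>
    </property>
    <property>
        <name>oozie.email.smtp.auth</name>
        <value>true</value>
    </property>
    <property>
        <name>oozie.email.smtp.username</name>
        <value>oozie@example.com</value>
    </property>
    <property>
        <name>oozie.email.smtp.password</name>
        <value>CHANGE_ME</value>
    </property>
</configuration>
```

Here the from address is set to the same account as the login user name, which is often what an authenticated mail server expects.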

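P.S. On Linux, you can locate the file under the current directory with find . -name oozie-site.xml. In our CDH version it is at ./etc/oozie/conf.dist/oozie-site.xml

Problems encountered

Many people find that mail cannot be sent. First make sure the SMTP service is running; you can check with telnet localhost 25

Also, if you use a corporate mailbox, pay attention to the sender format, which must match the settings of the corporate mail system. Recipients may also be restricted to addresses on the same corporate mail system.

The configuration in Cloudera Manager is shown in the figure below:

[figure]

Example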

<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="an-email">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>bob@initech.com,the.other.bob@initech.com</to>
            <cc>will@initech.com</cc>
            <subject>Email notifications for ${wf:id()}</subject>
            <body>The wf ${wf:id()} successfully completed.</body>
        </email>
        <ok to="myotherjob"/>
        <error to="errorcleanup"/>
    </action>
    ...
</workflow-app>

In the example above, the mail is sent to bob and the.other.bob and copied to will. The subject and body are both specified, and each includes the workflow id.
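
To send a notification on both success and failure, one possible layout routes the ok and error transitions of a job to two different email actions. The sketch below is an assumption-laden illustration: the workflow name, the node names, the nameNode property and the fs action standing in for the real job are all hypothetical.

```xml
<workflow-app name="notify-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="my-job"/>

    <!-- stand-in for the real work; any action type could go here -->
    <action name="my-job">
        <fs>
            <!-- ${nameNode} is an assumed job property such as hdfs://HOST:PORT -->
            <mkdir path="${nameNode}/tmp/${wf:id()}"/>
        </fs>
        <ok to="email-success"/>
        <error to="email-failure"/>
    </action>

    <action name="email-success">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>bob@initech.com</to>
            <subject>SUCCESS: ${wf:id()}</subject>
            <body>The wf ${wf:id()} successfully completed.</body>
        </email>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <action name="email-failure">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>bob@initech.com</to>
            <subject>FAILED: ${wf:id()}</subject>
            <body>Node ${wf:lastErrorNode()} failed: ${wf:errorMessage(wf:lastErrorNode())}</body>
        </email>
        <!-- after reporting the failure, still end the workflow as killed -->
        <ok to="fail"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed, error message: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Because the email action is synchronous, the workflow only moves on to end or fail after the corresponding mail has been handed to the SMTP server.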

Appendix

To help you learn more about Oozie, the important Oozie configuration files are listed here.

oozie-site.xml

<?xml version="1.0"?>
<configuration>
    <!-- oozie-default.xml holds the default configuration -->
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
        <value>*</value>
    </property>
</configuration>

oozie-default.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
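  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,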
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<configuration>

    <!-- ************************** VERY IMPORTANT  ************************** -->
    <!-- This file is in the Oozie configuration directory only for reference. -->
    <!-- It is not loaded by Oozie, Oozie uses its own private copy.           -->
    <!-- ************************** VERY IMPORTANT  ************************** -->

    <property>
        <name>oozie.output.compression.codec</name>
        <value>gz</value>
        <description>
            The name of the compression codec to use.
            where codec class implements the interface org.apache.oozie.compression.CompressionCodec.
            If oozie.compression.codecs is not specified, gz codec implementation is used by default.
        </description>
    </property>

    <property>
        <name>oozie.action.mapreduce.uber.jar.enable</name>
        <value>false</value>
        <description>
            ... which specify the oozie.mapreduce.uber.jar configuration property will fail.
        </description>
    </property>

    <property>
        <name>oozie.processing.timezone</name>
        <value>UTC</value>
        <description>
            ... is changed, note that GMT(+/-)#### timezones do not observe DST changes.
        </description>
    </property>

    <!-- Base Oozie URL: <SCHEME>://<HOST>:<PORT>/<CONTEXT> -->

    <property>
        <name>oozie.base.url</name>
        <value>http://localhost:8080/oozie</value>
        <description>
             Base Oozie URL.
        </description>
    </property>

    <!-- Services -->

    <property>
        <name>oozie.system.id</name>
        <value>oozie-${user.name}</value>
        <description>
            The Oozie system ID.
        </description>
    </property>

    <property>
        <name>oozie.systemmode</name>
        <value>NORMAL</value>
        <description>
            System mode for  Oozie at startup.
        </description>
    </property>

    <property>
        <name>oozie.delete.runtime.dir.on.shutdown</name>
        <value>true</value>
        <description>
            If the runtime directory should be kept after Oozie shutdowns down.
        </description>
    </property>

    <property>
        <name>oozie.services</name>
        <value>
            org.apache.oozie.service.SchedulerService,
            org.apache.oozie.service.InstrumentationService,
            org.apache.oozie.service.MemoryLocksService,
            org.apache.oozie.service.UUIDService,
            org.apache.oozie.service.ELService,
            org.apache.oozie.service.AuthorizationService,
            org.apache.oozie.service.UserGroupInformationService,
            org.apache.oozie.service.HadoopAccessorService,
            ...
        </value>
    </property>

    ...

    <property>
        ...
        <description>
            ...
            IMPORTANT: if the StoreServicePasswordService is active, it will reset this value with the value given in
            the console.
        </description>
    </property>

    <property>
        <name>oozie.service.JPAService.pool.max.active.conn</name>
        <value>10</value>
        <description>
             Max number of connections.
        </description>
    </property>

   <!-- SchemaService -->

    <property>
        <name>oozie.service.SchemaService.wf.schemas</name>
        <value>
            oozie-workflow-0.1.xsd,oozie-workflow-0.2.xsd,oozie-workflow-0.2.5.xsd,oozie-workflow-0.3.xsd,oozie-workflow-0.4.xsd,
            oozie-workflow-0.4.5.xsd,oozie-workflow-0.5.xsd,
            shell-action-0.1.xsd,shell-action-0.2.xsd,shell-action-0.3.xsd,
            email-action-0.1.xsd,email-action-0.2.xsd,
            hive-action-0.2.xsd,hive-action-0.3.xsd,hive-action-0.4.xsd,hive-action-0.5.xsd,hive-action-0.6.xsd,
            sqoop-action-0.2.xsd,sqoop-action-0.3.xsd,sqoop-action-0.4.xsd,
            ssh-action-0.1.xsd,ssh-action-0.2.xsd,
            distcp-action-0.1.xsd,distcp-action-0.2.xsd,
            oozie-sla-0.1.xsd,oozie-sla-0.2.xsd,
            hive2-action-0.1.xsd, hive2-action-0.2.xsd,
            spark-action-0.1.xsd,spark-action-0.2.xsd
        </value>
        <description>
            List of schemas for workflows (separated by commas).
        </description>
    </property>

    <property>
        <name>oozie.service.SchemaService.wf.ext.schemas</name>
        <value> </value>
        <description>
            List of additional schemas for workflows (separated by commas).
        </description>
    </property>

    <property>
        <name>oozie.service.SchemaService.coord.schemas</name>
        ...
    </property>

    ...

    <property>
        ...
        <description>
             Base console URL for a workflow job.
        </description>
    </property>


    <!-- ActionService -->

    <property>
        <name>oozie.service.ActionService.executor.classes</name>
        <value>
            org.apache.oozie.action.decision.DecisionActionExecutor,
            org.apache.oozie.action.hadoop.JavaActionExecutor,
            org.apache.oozie.action.hadoop.FsActionExecutor,
            org.apache.oozie.action.hadoop.MapReduceActionExecutor,
            org.apache.oozie.action.hadoop.PigActionExecutor,
            org.apache.oozie.action.hadoop.HiveActionExecutor,
            org.apache.oozie.action.hadoop.ShellActionExecutor,
            org.apache.oozie.action.hadoop.SqoopActionExecutor,
            org.apache.oozie.action.hadoop.DistcpActionExecutor,
            org.apache.oozie.action.hadoop.Hive2ActionExecutor,
            org.apache.oozie.action.ssh.SshActionExecutor,
            org.apache.oozie.action.oozie.SubWorkflowActionExecutor,
            org.apache.oozie.action.email.EmailActionExecutor,
            org.apache.oozie.action.hadoop.SparkActionExecutor
        </value>
        <description>
            List of ActionExecutors classes (separated by commas).
            Only action types with associated executors can be used in workflows.
        </description>
    </property>

    <property>
        <name>oozie.service.ActionService.executor.ext.classes</name>
        <value> </value>
        <description>
            List of ActionExecutors extension classes (separated by commas). Only action types with associated
            executors can be used in workflows. This property is a convenience property to add extensions to the built
            in executors without having to include all the built in ones.
        </description>
    </property>

    <!-- ActionCheckerService -->

    <property>
        <name>oozie.service.ActionCheckerService.action.check.interval</name>
        ...
    </property>

    ...

    <property>
        ...
        <description>
            Comma separated AUTHORITY=SPARK_CONF_DIR, where AUTHORITY is the HOST:PORT of
            the ResourceManager of a YARN cluster. The wildcard '*' configuration is
            used when there is no exact match for an authority. The SPARK_CONF_DIR contains
            the relevant spark-defaults.conf properties file. If the path is relative is looked within
            the Oozie configuration directory; though the path can be absolute.  This is only used
            when the Spark master is set to either "yarn-client" or "yarn-cluster".
        </description>
    </property>

    <property>
        <name>oozie.service.SparkConfigurationService.spark.configurations.ignore.spark.yarn.jar</name>
        <value>true</value>
        <description>
            If true, Oozie will ignore the "spark.yarn.jar" property from any Spark configurations specified in
            oozie.service.SparkConfigurationService.spark.configurations.  If false, Oozie will not ignore it.  It is recommended
            to leave this as true because it can interfere with the jars in the Spark sharelib.
        </description>
    </property>

    <property>
        <name>oozie.email.attachment.enabled</name>
        <value>true</value>
        <description>
            This value determines whether to support email attachment of a file on HDFS.
            Set it false if there is any security concern.
        </description>
    </property>

    <property>
        <name>oozie.actions.default.name-node</name>
        <value> </value>
        <description>
            The default value to use for the &lt;name-node&gt; element in applicable action types.  This value will be used when
            neither the action itself nor the global section specifies a &lt;name-node&gt;.  As expected, it should be of the form
            "hdfs://HOST:PORT".
        </description>
    </property>

    <property>
        <name>oozie.actions.default.job-tracker</name>
        <value> </value>
        <description>
            The default value to use for the &lt;job-tracker&gt; element in applicable action types.  This value will be used when
            neither the action itself nor the global section specifies a &lt;job-tracker&gt;.  As expected, it should be of the form
            "HOST:PORT".
        </description>
    </property>

</configuration>

