DataStage Error Collection (continuously updated)


DataStage series

DataStage 1: Installation
DataStage 2: Starting and stopping InfoSphere Information Server processes
DataStage 3: Configuring ODBC

1 Error when running the dsadmin command

$ dsadmin
exec(): 0509-036 Cannot load program dsadmin because of the following errors:
        0509-022 Cannot load module /opt/IBM/InformationServer/Server/DSEngine/lib/libvmdsapi.so.
        0509-150   Dependent module libACS_client_cpp.a(shr.so) could not be loaded.
        0509-022 Cannot load module libACS_client_cpp.a(shr.so).
        0509-026 System error: A file or directory in the path name does not exist.
        0509-022 Cannot load module dsadmin.
        0509-150   Dependent module /opt/IBM/InformationServer/Server/DSEngine/lib/libvmdsapi.so could not be loaded.
        0509-022 Cannot load module .

1.1 Error description

Running the dsadmin command on the AIX 6.0 command line failed to load the associated .so files, even though the DataStage environment variables had already been set:


#DataStage
export DSHOME=/opt/IBM/InformationServer/Server/DSEngine
#parallel engine
export APT_ORCHHOME=/opt/IBM/InformationServer/Server/PXEngine
#parallel engine
export APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/default.apt
export PATH=$PATH:$DSHOME/bin:$APT_ORCHHOME/bin
#AIX LIBPATH,linux LD_LIBRARY_PATH
export LIBPATH=$LIBPATH:$DSHOME/lib:$APT_ORCHHOME/lib
#ASBHome
export ASBHOME=/opt/IBM/InformationServer/ASBNode
#environment
$DSHOME/dsenv

Checking with ldd reported the following errors:

$ ldd /opt/IBM/InformationServer/Server/DSEngine/lib/libvmdsapi.so
/opt/IBM/InformationServer/Server/DSEngine/lib/libvmdsapi.so needs:
         /lib/libc.a(shr_64.o)
         /lib/libpthread.a(shr_xpg5_64.o)
Cannot find libACS_client_cpp.a(shr.so) 
Cannot find libACS_common_cpp.a(shr.so) 
Cannot find libinvocation_cpp.a(shr.so) 
Cannot find libxmogrt-xlC6.a 
Cannot find libIISCrypto.so 
         /lib/libC.a(shr_64.o)
         /lib/libC.a(ansi_64.o)
         /unix
         /lib/libcrypt.a(shr_64.o)
         /lib/libC.a(ansicore_64.o)
         /lib/libC.a(shrcore_64.o)
         /lib/libC.a(shr3_64.o)
         /lib/libC.a(shr2_64.o)

The libraries could not be found, yet the files do exist under a subdirectory:

$ ls -l /opt/IBM/InformationServer/ASBNode/lib/cpp/         
-rwxr-xr-x    1 root     system      4117562 Nov 09 2013  libACS_client_cpp.a
-rwxr-xr-x    1 root     system     54572316 Nov 09 2013  libACS_common_cpp.a
-rwxr-xr-x    1 root     system      2010742 Nov 09 2013  libASB_agent_config_client_cpp.a
-rwxr-xr-x    1 root     system     64048316 Nov 09 2013  libinvocation_cpp.a

Also, echoing some of the environment variables defined in the dsenv file from the command line produced no output.

1.2 Solution

Based on the errors above, this is an environment configuration problem. The documentation describes $DSHOME/dsenv as a very important file that must be referenced from the profile. I had referenced it, but it was not taking effect because it was not referenced correctly: rechecking the profile showed the leading "." (the source operator) was missing before the dsenv reference:

#environment
$DSHOME/dsenv

Change it to:

#environment
. $DSHOME/dsenv

How do you know whether the environment variables have taken effect? A simple check is whether the UDTHOME and UDTBIN variables are set in the current environment; both are set in dsenv in versions 8.5, 8.7, and 9.1:

#if [ -z "$UDTHOME" ]
#then
UDTHOME=/opt/IBM/InformationServer/Server/DSEngine/ud41 ; export UDTHOME
UDTBIN=/opt/IBM/InformationServer/Server/DSEngine/ud41/bin ; export UDTBIN
#fi
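The effect of the missing dot can be reproduced in any POSIX shell: executing dsenv runs it in a child process whose exports are discarded, while sourcing it with "." modifies the current shell. A minimal sketch, using a throwaway stand-in for dsenv under /tmp (the path and file below are demo assumptions, not the real install):

```shell
# Throwaway stand-in for $DSHOME/dsenv, for demonstration only
DSHOME=/tmp/dsenv_demo
mkdir -p "$DSHOME"
cat > "$DSHOME/dsenv" <<'EOF'
UDTHOME=/opt/IBM/InformationServer/Server/DSEngine/ud41 ; export UDTHOME
UDTBIN=/opt/IBM/InformationServer/Server/DSEngine/ud41/bin ; export UDTBIN
EOF
chmod +x "$DSHOME/dsenv"

unset UDTHOME UDTBIN
"$DSHOME/dsenv"                        # executed: runs in a child shell, exports are lost
echo "executed: UDTHOME='${UDTHOME}'"  # → executed: UDTHOME=''

. "$DSHOME/dsenv"                      # sourced: exports persist in the current shell
echo "sourced:  UDTHOME='${UDTHOME}'"
```

The same check works against the real file: after `. $DSHOME/dsenv`, `echo $UDTHOME` should print the ud41 path.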

2 Error when stopping WAS

/opt/IBM/InformationServer/ASBServer/bin/MetadataServer.sh  stop
ADMU0116I: Tool information is being logged in file
           /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/logs/server1/stopServer.log
ADMU0128I: Starting tool with the InfoSphere profile
ADMU3100I: Reading configuration for server: server1

ADMU0509I: The server "server1" cannot be reached. It appears to be stopped.
ADMU0211I: Error details may be seen in the file:
           /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/logs/server1/stopServer.log

2.1 Error description

When stopping WAS, the application server could not be shut down. Examining the log file:

FFDC Incident emitted on /opt/IBM/WebSphere/AppServer/bin/./client_ffdc/ffdc.4012701407048567577.txt com.ibm.websphere.
management.AdminClientFactory.createAdminClient 275
[1/21/15 10:09:16:236 GMT+08:00] 00000001 WsServerStop  E   ADMU3002E: Exception attempting to process server server1
[1/21/15 10:09:16:236 GMT+08:00] 00000001 WsServerStop  E   ADMU3007E: Exception com.ibm.websphere.management.exception.Conne
ctorException: com.ibm.websphere.management.exception.ConnectorException: ADMC0016E: The system cannot create a SOAP connecto
r to connect to host nhdbtest07 at port 8881.
[1/21/15 10:09:16:237 GMT+08:00] 00000001 WsServerStop  A   ADMU3007E: Exception com.ibm.websphere.management.exception.Conne
ctorException: com.ibm.websphere.management.exception.ConnectorException: ADMC0016E: The system cannot create a SOAP connecto
r to connect to host nhdbtest07 at port 8881.
        at com.ibm.ws.management.connector.ConnectorHelper.createConnector(ConnectorHelper.java:606)
        at com.ibm.ws.management.tools.WsServerStop.runTool(WsServerStop.java:372)
        at com.ibm.ws.management.tools.AdminTool.executeUtility(AdminTool.java:269)
        at com.ibm.ws.management.tools.WsServerStop.main(WsServerStop.java:112)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at com.ibm.wsspi.bootstrap.WSLauncher.launchMain(WSLauncher.java:234)
        at com.ibm.wsspi.bootstrap.WSLauncher.main(WSLauncher.java:95)
        at com.ibm.wsspi.bootstrap.WSLauncher.run(WSLauncher.java:76)

        java.security.cert.CertificateExpiredException: NotAfter: Wed Sep 09 10:51:29 GMT+08:00 2015]
        at com.ibm.ws.management.connector.soap.SOAPConnectorClient.reconnect(SOAPConnectorClient.java:422)
        at com.ibm.ws.management.connector.soap.SOAPConnectorClient.<init>(SOAPConnectorClient.java:222)
        ... 40 more
Caused by: [SOAPException: faultCode=SOAP-ENV:Client; msg=Error opening socket: javax.net.ssl.SSLHandshakeException: com.ibm.
jsse2.util.h: PKIX path validation failed: java.security.cert.CertPathValidatorException: The certificate expired at Wed Sep
09 10:51:29 GMT+08:00 2015; internal cause is:
        java.security.cert.CertificateExpiredException: NotAfter: Wed Sep 09 10:51:29 GMT+08:00 2015; targetException=java.la
ng.IllegalArgumentException: Error opening socket: javax.net.ssl.SSLHandshakeException: com.ibm.jsse2.util.h: PKIX path valid
ation failed: java.security.cert.CertPathValidatorException: The certificate expired at Wed Sep 09 10:51:29 GMT+08:00 2015; i
nternal cause is:
        java.security.cert.CertificateExpiredException: NotAfter: Wed Sep 09 10:51:29 GMT+08:00 2015]
        at org.apache.soap.transport.http.SOAPHTTPConnection.send(SOAPHTTPConnection.java:475)
        at org.apache.soap.rpc.Call.WASinvoke(Call.java:451)
        at com.ibm.ws.management.connector.soap.SOAPConnectorClient$4.run(SOAPConnectorClient.java:372)
        at com.ibm.ws.security.util.AccessController.doPrivileged(AccessController.java:118)
        at com.ibm.ws.management.connector.soap.SOAPConnectorClient.reconnect(SOAPConnectorClient.java:365)
        ... 41 more

[10/9/15 20:09:02:685 GMT+08:00] 00000001 AdminTool     A   ADMU0509I: The server "server1" cannot be reached. It appears to
be stopped.
[10/9/15 20:09:02:685 GMT+08:00] 00000001 AdminTool     A   ADMU0211I: Error details may be seen in the file: /opt/IBM/W
ebSphere/AppServer/profiles/InfoSphere/logs/server1/stopServer.log

2.2 Solution

Go to the server profile directory, typically /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/bin, and then check logs such as SystemErr.log and SystemOut.log under /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/logs/server1. Note that the paths may differ on your system; the commands print the log path in their output. In the log above, the stop failed because the WAS SSL certificate had expired (CertificateExpiredException). Errors encountered here have also included:

java.sql.SQLException: [IBM][Oracle JDBC Driver][Oracle]ORA-28001:
 the password has expired
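The two failure signatures seen in this section (the expired certificate and the expired Oracle password) can be pulled out of the logs with a single grep. A sketch against a captured fragment; in practice point LOG at SystemOut.log or SystemErr.log under the profile's logs/server1 directory (the mktemp file below is only for the demo):

```shell
# Demo log fragment standing in for the real WAS log file
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
[10/9/15 20:09:02] ... java.security.cert.CertificateExpiredException: NotAfter: Wed Sep 09 10:51:29 GMT+08:00 2015
[10/9/15 20:10:11] ... java.sql.SQLException: [IBM][Oracle JDBC Driver][Oracle]ORA-28001: the password has expired
EOF
# Count lines matching either failure signature
matches=$(grep -cE 'CertificateExpiredException|ORA-[0-9]+' "$LOG")
echo "matching lines: $matches"   # → matching lines: 2
rm -f "$LOG"
```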

3 Error when importing table definitions

 An unexpected exception occurred accessing the repository: 
 <JavaException>
  <Type>com/ascential/asb/cas/shared/ConnectorServiceException</Type>
  <Message><![CDATA[An unexpected exception occurred accessing the repository: ]]></Message>
  <StackTrace><![CDATA[com.ascential.asb.cas.shared.ConnectorServiceException: An unexpected exception occurred accessing the repository: at com.ascential.asb.cas.shared.ConnectorAccessServiceBeanSupport.persist(ConnectorAccessServiceBeanSupport.java:5345) at com.ascential.asb.cas.shared.ConnectorAccessServiceBeanSupport.discoverSchema(ConnectorAccessServiceBeanSupport.java:3549) at com.ascential.asb.cas.service.impl.ConnectorAccessServiceBean.discoverSchema(ConnectorAccessServiceBean.java:3177) at com.ascential.asb.cas.service.EJSRemoteStatelessConnectorAccess_6ccddb18.discoverSchema(Unknown Source) at com.ascential.asb.cas.service._EJSRemoteStatelessConnectorAccess_6ccddb18_Tie.discoverSchema__com_ascential_asb_cas_shared_util_ConnectionHandle__com_ascential_xmeta_emf_util_EObjectMemento__CORBA_WStringValue__boolean__boolean__boolean__boolean__CORBA_WStringValue(_EJSRemoteStatelessConnectorAccess_6ccddb18_Tie.java:820) at com.ascential.asb.cas.service._EJSRemoteStatelessConnectorAccess_6ccddb18_Tie._invoke(_EJSRemoteStatelessConnectorAccess_6ccddb18_Tie.java:355) at com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler(ServerDelegate.java:669) at com.ibm.CORBA.iiop.ServerDelegate.dispatch(ServerDelegate.java:523) at com.ibm.rmi.iiop.ORB.process(ORB.java:523) at com.ibm.CORBA.iiop.ORB.process(ORB.java:1575) at com.ibm.rmi.iiop.Connection.doRequestWork(Connection.java:2992) at com.ibm.rmi.iiop.Connection.doWork(Connection.java:2875) at com.ibm.rmi.iiop.WorkUnitImpl.doWork(WorkUnitImpl.java:64) at com.ibm.ejs.oa.pool.PooledThread.run(ThreadPool.java:118) at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1783) 

3.1 Error description

Importing Oracle table definition information into a DataStage project failed (note: this was not an ODBC import; importing via ODBC still worked at this point). The DataStage logs showed no errors. For the final ConnectorServiceException, the official documentation explains that the user attempting to import the table definitions does not have sufficient privileges for the operation.

3.2 Solution

The error is caused by the current user lacking sufficient privileges. Log in to the console (http://hostname:9080/ibm/iis/console), click the Administrator option, choose Users and Groups => Users, select the user in the center pane, click Open User on the right, grant the user the Common Metadata Administrator and Common Metadata Importer roles, save, and then log in again from the client.

4 Job compilation errors

4.1 Compilation error 1: missing compiler component

Output from transformer compilation follows:
##I IIS-DSEE-TFCN-00001 16:58:21(000) <main_program> 
IBM InfoSphere DataStage Enterprise Edition 9.1.0.6791 
Copyright (c) 2001, 2005-2012 IBM Corporation. All rights reserved

##I IIS-DSEE-TFCN-00006 16:58:21(001) <main_program> conductor uname: -s=AIX; -r=1; -v=6; -n=nhdbtest07; -m=00F725214C00
##I IIS-DSEE-TOSH-00002 16:58:21(002) <main_program> orchgeneral: loaded
##I IIS-DSEE-TOSH-00002 16:58:21(003) <main_program> orchsort: loaded
##I IIS-DSEE-TOSH-00002 16:58:21(004) <main_program> orchstats: loaded
##W IIS-DSEE-TOSH-00049 16:58:21(007) <main_program> Parameter specified but not used in flow: DSPXWorkingDir
##E IIS-DSEE-TBLD-00076 16:58:21(009) <main_program> Error when checking composite operator: Subprocess command failed with exit status 32,256.
##E IIS-DSEE-TFSR-00019 16:58:21(010) <main_program> Could not check all operators because of previous error(s)
##W IIS-DSEE-TFTM-00012 16:58:21(011) <transform> Error when checking composite operator: The number of reject datasets "0" is less than the number of input datasets "1".
##W IIS-DSEE-TBLD-00000 16:58:21(012) <main_program> Error when checking composite operator: Output from subprocess: sh: /usr/vacpp/bin/xlC_r:  not found.

##I IIS-DSEE-TBLD-00079 16:58:21(013) <transform> Error when checking composite operator: /usr/vacpp/bin/xlC_r   -O   -I/opt/IBM/InformationServer/Server/PXEngine/include -O -q64 -qtbtable=full -c /opt/IBM/dsprojects/dstest/RT_BP7.O/V0S9_JoinDataFromTabToTable_Tran_Joined.C -o /opt/IBM/dsprojects/dstest/RT_BP7.O/V0S9_JoinDataFromTabToTable_Tran_Joined.tmp.o.
##E IIS-DSEE-TCOS-00029 16:58:21(014) <main_program> Creation of a step finished with status = FAILED. (JoinDataFromTabToTable.Tran_Joined)

*** Internal Generated Transformer Code follows:
0001: //
0002: // Generated file to implement the V0S9_JoinDataFromTabToTable_Tran_Joined transform operator.
0003: //
0004: 
0005: // define our input/output link names
0006: inputname 0 DSLink15;
0007: outputname 0 Select_tran;
0008: 
0009: initialize {
0010:   // define our control variables
0011:   int8 RowRejected0;
0012:   int8 NullSetVar0;
0013: 
0014: }
0015: 
0016: mainloop {
0017: 
0018:   // initialise the rejected row variable
0019:   RowRejected0 = 1;
0020: 
0021:   // evaluate columns (no constraints) for link: Select_tran
0022:   Select_tran.OBJECT_ID = DSLink15.DATA_OBJECT_ID;
0023:   writerecord 0;
0024:   RowRejected0 = 0;
0025: }
0026: 
0027: finish {
0028: }
0029: 
*** End of Internal Generated Transformer Code

4.1.1 Error description

Compiling a parallel job that contains a Transformer stage on DataStage on AIX 6.0 failed at the Transformer stage. The official explanation is that the XL C/C++ compiler is not installed on the machine. Checking the installed filesets at the time produced:

$lslpp -l |grep -i xlC
  xlC.aix61.rte             11.1.0.1  COMMITTED  XL C/C++ Runtime for AIX 6.1 
  xlC.cpp                    9.0.0.0  COMMITTED  C for AIX Preprocessor
  xlC.msg.en_US.cpp          9.0.0.0  COMMITTED  C for AIX Preprocessor
  xlC.msg.en_US.rte         11.1.0.1  COMMITTED  XL C/C++ Runtime
  xlC.rte                   11.1.0.1  COMMITTED  XL C/C++ Runtime 
  xlC.sup.aix50.rte          9.0.0.1  COMMITTED  XL C/C++ Runtime for AIX 5.2
$lslpp -l ipfx.rte
lslpp: 0504-132  Fileset ipfx.rte not installed.
$lslpp -ch|grep vac

There was also no executable at /usr/vacpp/bin/xlC_r. This file is configured by default as the DataStage compiler, as can be seen in a newly created project's environment:

APT_COMPILEOPT:-O -q64 -qtbtable=full -c    
APT_COMPILER:/usr/vacpp/bin/xlC_r
APT_LINKER:/usr/vacpp/bin/xlC_r
APT_LINKOPT:-G -q64
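Given those settings, a quick sanity check before compiling jobs is to verify that the configured compiler actually exists; a missing /usr/vacpp/bin/xlC_r produces exactly the "not found" error shown above. A minimal sketch:

```shell
# Path taken from the project's APT_COMPILER setting above
APT_COMPILER=/usr/vacpp/bin/xlC_r
if [ -x "$APT_COMPILER" ]; then
    echo "compiler present: $APT_COMPILER"
else
    echo "compiler missing: $APT_COMPILER (install XL C/C++ or adjust APT_COMPILER)"
fi
```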

4.1.2 Solution

Download the XL_C_C_plus_plus_for_AIX_V11.1 package, extract it, change into the XL_C_C_plus_plus_for_AIX_V11.1/usr/sys/inst.images directory, and run smitty installp to install.

4.1.3 Normal output after installation

$lslpp -l |grep -i xlC
  xlC.adt.include           11.1.0.0  COMMITTED  C Set ++ Application
  xlC.aix61.rte             11.1.0.1  COMMITTED  XL C/C++ Runtime for AIX 6.1 
  xlC.cpp                    9.0.0.0  COMMITTED  C for AIX Preprocessor
  xlC.msg.en_US.cpp          9.0.0.0  COMMITTED  C for AIX Preprocessor
  xlC.msg.en_US.rte         11.1.0.1  COMMITTED  XL C/C++ Runtime
  xlC.rte                   11.1.0.1  COMMITTED  XL C/C++ Runtime 
  xlC.sup.aix50.rte          9.0.0.1  COMMITTED  XL C/C++ Runtime for AIX 5.2
$lslpp -l ipfx.rte
lslpp: 0504-132  Fileset ipfx.rte not installed.
[nhsjjhetl01:root]lslpp -ch|grep vac
/usr/lib/objrepos:vac.Bnd:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;59
/usr/lib/objrepos:vac.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;10
/usr/lib/objrepos:vac.aix50.lib:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;59
/usr/lib/objrepos:vac.aix52.lib:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;59
/usr/lib/objrepos:vac.aix53.lib:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;58
/usr/lib/objrepos:vac.html.common.search:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;10
/usr/lib/objrepos:vac.html.en_US.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;09
/usr/lib/objrepos:vac.html.ja_JP.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;09
/usr/lib/objrepos:vac.html.zh_CN.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;08
/usr/lib/objrepos:vac.include:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;30
/usr/lib/objrepos:vac.lib:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;28
/usr/lib/objrepos:vac.lic:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;29
/usr/lib/objrepos:vac.licAgreement:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;08
/usr/lib/objrepos:vac.man.EN_US:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;04
/usr/lib/objrepos:vac.man.ZH_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;00
/usr/lib/objrepos:vac.man.Zh_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;01
/usr/lib/objrepos:vac.man.en_US:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;05
/usr/lib/objrepos:vac.man.zh_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;02
/usr/lib/objrepos:vac.msg.en_US.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;18
/usr/lib/objrepos:vac.ndi:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;07
/usr/lib/objrepos:vac.pdf.en_US.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;52
/usr/lib/objrepos:vac.pdf.zh_CN.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;51
/usr/lib/objrepos:vacpp.Bnd:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;50
/usr/lib/objrepos:vacpp.cmp.aix50.lib:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;39
/usr/lib/objrepos:vacpp.cmp.aix50.tools:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;39
/usr/lib/objrepos:vacpp.cmp.aix52.lib:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;39
/usr/lib/objrepos:vacpp.cmp.aix52.tools:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;39
/usr/lib/objrepos:vacpp.cmp.aix53.lib:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;38
/usr/lib/objrepos:vacpp.cmp.aix53.tools:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;38
/usr/lib/objrepos:vacpp.cmp.core:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;22
/usr/lib/objrepos:vacpp.cmp.include:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;40
/usr/lib/objrepos:vacpp.cmp.lib:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;40
/usr/lib/objrepos:vacpp.cmp.rte:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;40
/usr/lib/objrepos:vacpp.cmp.tools:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;40
/usr/lib/objrepos:vacpp.html.common:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;38
/usr/lib/objrepos:vacpp.html.en_US:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;37
/usr/lib/objrepos:vacpp.html.ja_JP:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;36
/usr/lib/objrepos:vacpp.html.zh_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;36
/usr/lib/objrepos:vacpp.lic:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;35
/usr/lib/objrepos:vacpp.licAgreement:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;34
/usr/lib/objrepos:vacpp.man.EN_US:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;32
/usr/lib/objrepos:vacpp.man.ZH_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;30
/usr/lib/objrepos:vacpp.man.Zh_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;28
/usr/lib/objrepos:vacpp.man.en_US:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;26
/usr/lib/objrepos:vacpp.man.zh_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;24
/usr/lib/objrepos:vacpp.memdbg.aix50.lib:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;23
/usr/lib/objrepos:vacpp.memdbg.aix50.rte:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;23
/usr/lib/objrepos:vacpp.memdbg.aix52.lib:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;23
/usr/lib/objrepos:vacpp.memdbg.aix52.rte:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;23
/usr/lib/objrepos:vacpp.memdbg.aix53.lib:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;23
/usr/lib/objrepos:vacpp.memdbg.aix53.rte:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;23
/usr/lib/objrepos:vacpp.memdbg.lib:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;21
/usr/lib/objrepos:vacpp.memdbg.rte:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;21
/usr/lib/objrepos:vacpp.msg.en_US.cmp.core:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;44
/usr/lib/objrepos:vacpp.msg.en_US.cmp.tools:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;20
/usr/lib/objrepos:vacpp.ndi:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;22
/usr/lib/objrepos:vacpp.pdf.en_US:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;20
/usr/lib/objrepos:vacpp.pdf.zh_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;20
/usr/lib/objrepos:vacpp.samples.ansicl:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;31
/etc/objrepos:vac.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;17
/etc/objrepos:vacpp.cmp.core:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;26

4.2 Compilation error 2: abnormal job status

4.2.1 Error description

Recompiling a job that had stopped abnormally failed. Before the compile, the job had hung and accepted no operations, and it could not be stopped normally or through Director, so its process was killed from the back end and the DataStage runtime files under &PH& were deleted. Afterwards the job status showed as "Crashed".

4.2.2 Solution

Change to the DSHOME directory, start the uvsh command line, and type LIST.READU EVERY to list the current DataStage locks:

# ./dsenv
# uvsh
DataStage Command Language 9.1 Licensed Materials - Property of IBM
(c) Copyright IBM Corp. 1997, 2012 All Rights Reserved.
DSEngine logged on: Thursday, October 29, 2015 15:02

>LIST.READU EVERY

Active Group Locks:                                    Record Group Group Group
Device.... Inode....  Netnode Userno  Lmode G-Address.  Locks ...RD ...SH ...EX
      2064   1208748        0  56802  14 IN       8000      1     0     0     0 
      2064   1198252        0     11  23 IN        800      1     0     0     0 
      2064   1228935        0  36929  44 IN          1      1     0     0     0 

Active Record Locks:
Device.... Inode....  Netnode Userno  Lmode        Pid Login Id Item-ID.............
      2064   1208748        0  36929  14 RL      28607 dsadm    RT_CONFIG11  
      2064   1198252        0  36929  23 RL      28607 dsadm    dstage1&!DS.ADMIN!&  
      2064   1198252        0  64317  23 RL       1219 dsadm    dstage1&!DS.ADMIN!&  
      2064   1198252        0  56802  23 RL       8734 dsadm    dstage1&!DS.ADMIN!&  
      2064   1228935        0  36929  44 RU      28607 dsadm    ClusterMergeDataFromTabToSeqFile.fifo  

Locate the relevant lock entries, type LOGTO UV, and then release the lock with UNLOCK INODE #Inode USER #User ALL, where #Inode is the value from the Inode.... column in the listing above and #User is the value from the Userno column:

>LOGTO UV
>UNLOCK INODE 1228935 USER 36929 ALL
Clearing Record locks.
Clearing GROUP locks.
Clearing FILE locks.
>LIST.READU EVERY

Active Group Locks:                                    Record Group Group Group
Device.... Inode....  Netnode Userno  Lmode G-Address.  Locks ...RD ...SH ...EX
      2064   1208748        0  56802  14 IN       8000      1     0     0     0 
      2064   1198252        0     11  23 IN        800      1     0     0     0 

Active Record Locks:
Device.... Inode....  Netnode Userno  Lmode        Pid Login Id Item-ID.............
      2064   1208748        0  36929  14 RL      28607 dsadm    RT_CONFIG11  
      2064   1198252        0  36929  23 RL      28607 dsadm    dstage1&!DS.ADMIN!&  
      2064   1198252        0  64317  23 RL       1219 dsadm    dstage1&!DS.ADMIN!&  
      2064   1198252        0  56802  23 RL       8734 dsadm    dstage1&!DS.ADMIN!&        

After releasing a lock, you can run LIST.READU EVERY again to review the current locks. Note: if your job contains many stages performing complex operations, DataStage may create many additional locks whose Item-ID is not the job name but looks like the dstage1&!DS.ADMIN!& entries above. In that case, releasing only the lock that carries the job name will not solve the problem; you must release the other locks as well, so be careful.
Then try recompiling the job. If it still fails:

# uv -admin -info

Details for DataStage Engine release 9.1.0.0 instance "ade"
===============================================================================
Install history   : Installed by root (admin:dsadm) on 2015-10-26T15:17:42.766
Instance tag      : ade
Engine status     : Running w/active nls
Engine location   : /disk2/IBM/EngineTier/Server/DSEngine
Binary location   : /disk2/IBM/EngineTier/Server/DSEngine/bin
Impersonation     : Enabled
Administrator     : dsadm
Autostart mode    : enabled
Autostart link(s) : /etc/rc.d/init.d/ds.rc
                  : /etc/rc.d/rc2.d/S999ds.rc
                  : /etc/rc.d/rc3.d/S999ds.rc
                  : /etc/rc.d/rc4.d/S999ds.rc
                  : /etc/rc.d/rc5.d/S999ds.rc
Startup script    : /disk2/IBM/EngineTier/Server/DSEngine/sample/ds.rc
Cache Segments    :0 active
User Segments     :3 active

3 phantom printer segments!
 DSnum Uid       Pid   Ppid  C Stime Tty      Time     Command
 52053 dsadm    13483 13482  0 Oct29 ?        00:00:04 dsapi_slave 7 6 0 4
 52169 dsadm    13367 13123  0 Oct29 ?        00:00:00 phantom DSD.RUN ClusterMer
 52413 dsadm    13123 13122  0 Oct29 ?        00:02:13 dsapi_slave 7 6 0 4
# kill -9 13367

Kill the DSD.RUN process and its dsapi_slave process. Doing this usually terminates the processes abnormally and leaves the job status as Crashed:

# dsjob -jobinfo dstage1 ClusterMergeDataFromTabToSeqFile
Job Status      : CRASHED (96)
Job Controller  : not available
Job Start Time  : Thu Oct 29 15:22:49 2015
Job Wave Number : 1
User Status     : not available
Job Control     : 1
Interim Status  : NOT RUNNING (99)
Invocation ID   : not available
Last Run Time   : Fri Oct 30 09:08:37 2015
Job Process ID  : 0
Invocation List : ClusterMergeDataFromTabToSeqFile
Job Restartable : 0

Status code = 0 

CRASHED can mean many things in DataStage: the job may have terminated abnormally, compilation may have failed, or an internal error may have occurred. Resetting the job returns it to its initial state:

#  dsjob -run -mode RESET dstage1 ClusterMergeDataFromTabToSeqFile    

Status code = 0 

# dsjob -jobinfo dstage1 ClusterMergeDataFromTabToSeqFile
Job Status      : RESET (21)
Job Controller  : not available
Job Start Time  : Fri Oct 30 09:37:53 2015
Job Wave Number : 0
User Status     : not available
Job Control     : 0
Interim Status  : NOT RUNNING (99)
Invocation ID   : not available
Last Run Time   : Fri Oct 30 09:37:53 2015
Job Process ID  : 0
Invocation List : ClusterMergeDataFromTabToSeqFile
Job Restartable : 0

Status code = 0 

After the RESET completes, you can try compiling or a VALIDATE run. If the problem still is not resolved, restart the engine.

# dsjob -run -mode VALIDATE dstage1 ClusterMergeDataFromTabToSeqFile     

Status code = 0 
# dsjob -jobinfo dstage1 ClusterMergeDataFromTabToSeqFile
Job Status      : RUNNING (0)
Job Controller  : not available
Job Start Time  : Fri Oct 30 09:42:24 2015
Job Wave Number : 1
User Status     : not available
Job Control     : 0
Interim Status  : NOT RUNNING (99)
Invocation ID   : not available
Last Run Time   : Thu Jan  1 08:00:00 1970
Job Process ID  : 25353
Invocation List : ClusterMergeDataFromTabToSeqFile
Job Restartable : 0

Status code = 0 

5 Agent attach error

5.1 Error description

Importing Oracle table definition information through a Connector import failed with "31531 not available". Checking the port on AIX 6.0:

$netstat -Ana|grep 31531
f1000e00088eb3b8 tcp        0      0  *.31531               *.*                   LISTEN
f1000e0001048bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33436    CLOSE_WAIT
f1000e0000e1cbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33438    CLOSE_WAIT
f1000e0000b75bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33440    CLOSE_WAIT
f1000e000114ebb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33442    CLOSE_WAIT
f1000e0000b813b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33444    CLOSE_WAIT
f1000e0000b61bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33446    CLOSE_WAIT
f1000e0000ad9bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33449    CLOSE_WAIT
f1000e0000d583b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33452    CLOSE_WAIT
f1000e0000c09bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33454    CLOSE_WAIT
f1000e0000af23b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33456    CLOSE_WAIT
f1000e0000c1ebb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33458    CLOSE_WAIT
f1000e00010813b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33460    CLOSE_WAIT
f1000e0000e493b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33462    CLOSE_WAIT
f1000e0000f553b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33464    CLOSE_WAIT
f1000e0000f87bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33468    CLOSE_WAIT
f1000e0000ad0bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33470    CLOSE_WAIT
f1000e0000cd6bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33472    CLOSE_WAIT
f1000e0000d9abb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33474    CLOSE_WAIT
f1000e0000a793b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33477    CLOSE_WAIT
f1000e0000e5f3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33479    CLOSE_WAIT
f1000e0000f173b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33482    CLOSE_WAIT
f1000e0000b45bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33484    CLOSE_WAIT
f1000e0000dd23b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33486    CLOSE_WAIT
f1000e0000095bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33488    CLOSE_WAIT
f1000e0000ac03b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33490    CLOSE_WAIT
f1000e000011c3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33492    CLOSE_WAIT
f1000e0000b24bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33495    CLOSE_WAIT
f1000e0000c18bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33497    CLOSE_WAIT
f1000e0000d0c3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33499    CLOSE_WAIT
f1000e0000a7e3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33501    CLOSE_WAIT
f1000e00000c8bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33503    CLOSE_WAIT
f1000e0000b013b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33505    CLOSE_WAIT
f1000e0000a93bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33507    CLOSE_WAIT
f1000e0001094bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33509    CLOSE_WAIT
f1000e0000b313b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33511    CLOSE_WAIT
f1000e0000c16bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33513    CLOSE_WAIT
f1000e0000cd23b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33515    CLOSE_WAIT
f1000e0000ae6bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33517    CLOSE_WAIT
f1000e00001023b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33519    CLOSE_WAIT
f1000e0000b9c3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33521    CLOSE_WAIT
f1000e00011d13b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33523    CLOSE_WAIT
f1000e0000d0f3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33525    CLOSE_WAIT
f1000e0000c84bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33528    CLOSE_WAIT
f1000e0000fdebb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33530    CLOSE_WAIT
f1000e0000fc2bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33532    CLOSE_WAIT
f1000e00000c93b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33534    CLOSE_WAIT
f1000e0000ae43b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33536    CLOSE_WAIT
f1000e0000fd73b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33538    CLOSE_WAIT
f1000e00000bbbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33540    CLOSE_WAIT
f1000e0000c103b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33542    CLOSE_WAIT
f1000e000119dbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33544    CLOSE_WAIT
f1000e0000cca3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33546    CLOSE_WAIT
f1000e00000aabb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33548    CLOSE_WAIT
f1000e0000d8abb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33550    CLOSE_WAIT
f1000e0001040bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33552    CLOSE_WAIT
f1000e0000e983b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33555    CLOSE_WAIT
f1000e0000a7dbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33557    CLOSE_WAIT
f1000e0000c43bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33559    CLOSE_WAIT
f1000e0000b8c3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33561    CLOSE_WAIT
f1000e0000a64bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33563    CLOSE_WAIT
f1000e0000b4f3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33565    CLOSE_WAIT
f1000e0000d5fbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33567    CLOSE_WAIT
f1000e0000199bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33571    CLOSE_WAIT
f1000e0000f56bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33573    CLOSE_WAIT
f1000e00091bfbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33575    CLOSE_WAIT
f1000e0000b17bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33577    CLOSE_WAIT
f1000e0001204bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33579    CLOSE_WAIT
f1000e0000ec4bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33581    CLOSE_WAIT
f1000e0000f143b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33584    CLOSE_WAIT
f1000e0001096bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33586    CLOSE_WAIT
f1000e0000ab4bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33588    CLOSE_WAIT
f1000e0000f9ebb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33590    CLOSE_WAIT
f1000e0000134bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33592    CLOSE_WAIT
f1000e00010dcbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33594    CLOSE_WAIT
f1000e0000fd3bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33597    CLOSE_WAIT
f1000e00000b7bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33601    CLOSE_WAIT
f1000e00010d7bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33603    CLOSE_WAIT
f1000e0008fba3b8 tcp        0      0  192.168.1.12.35035    192.168.1.12.31531    SYN_SENT
f1000e0000b64bb8 tcp        0      0  192.168.1.12.35038    192.168.1.12.31531    SYN_SENT
f1000e0008ee1bb8 tcp        0      0  192.168.1.12.35041    192.168.1.12.31531    SYN_SENT
f1000e000913c3b8 tcp        0      0  192.168.1.12.35044    192.168.1.12.31531    SYN_SENT
f1000e0000fde3b8 tcp        0      0  192.168.1.12.35047    192.168.1.12.31531    SYN_SENT
f1000e0000e17bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.35050    CLOSE_WAIT
f1000e0000bcf3b8 tcp        0      0  192.168.1.12.35050    192.168.1.12.31531    FIN_WAIT_2
f1000e00091f43b8 tcp        0      0  192.168.1.12.35053    192.168.1.12.31531    SYN_SENT
f1000e00090113b8 tcp        0      0  192.168.1.12.35056    192.168.1.12.31531    SYN_SENT

There are many connections in the CLOSE_WAIT state. Checking them with rmsock shows that some of the processes that held these CLOSE_WAIT sockets no longer exist; those connections have already been closed:

$rmsock f1000e0000f973b8 tcpcb
socket 0xf97008 is removed.

Something, however, caused the CLOSE_WAIT state in the first place, for example a client crashing and exiting abnormally, or the network connection between client and server being dropped unexpectedly.

5.2 Solution

On AIX, rmsock ("Removes a socket that does not have a file descriptor") can remove sockets whose owning process is already gone. If the socket is still held by a live process, rmsock reports the process ID; you can then look up that process and kill it.

rmsock f1000e00088eb3b8 tcpcb
The socket 0xf1000e00088eb008 is being held by proccess 13041736 (RunAgent).
$ps -ef|grep 13041736
    root 13041736 15270118  62 15:13:02  pts/1 60:33 /opt/IBM/InformationServer/ASBNode/bin/RunAgent -Xbootclasspath/a:conf -Djava.ext.dirs=apps/jre/lib/ext:lib/java:eclipse/plugins:eclipse/plugins/com.ibm.isf.client -Djava.class.path=conf -Djava.security.auth.login.config=/opt/IBM/InformationServer/ASBNode/eclipse/plugins/com.ibm.isf.client/auth.conf -Dcom.ibm.CORBA.ConfigURL=file:/opt/IBM/InformationServer/ASBNode/eclipse/plugins/com.ibm.isf.client/sas.client.props -Dcom.ibm.SSL.ConfigURL=file:/opt/IBM/InformationServer/ASBNode/eclipse/plugins/com.ibm.isf.client/ssl.client.props -Dcom.ibm.CORBA.enableClientCallbacks=true -Dcom.ibm.CORBA.FragmentSize=128000 -class com/ascential/asb/agent/impl/AgentImpl run
[nhdbtest07:root]kill -9 13041736

After the processes holding CLOSE_WAIT sockets were cleaned up, connections returned to normal:

netstat -Ana|grep 31531
f1000e0008f1fbb8 tcp        0      0  *.31531               *.*                   LISTEN
f1000e00011d33b8 tcp        0      0  192.168.1.12.35538    192.168.1.12.31531    TIME_WAIT
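The inspect-and-remove steps above can be sketched as a small script. This is a hedged sketch: the sample netstat lines are hard-coded for illustration (on the real host pipe in `netstat -Ana | grep 31531` instead), and rmsock itself must of course run on the AIX host.

```shell
# List the kernel socket addresses of CLOSE_WAIT connections on the agent
# port, ready to feed to `rmsock <addr> tcpcb` one at a time.
# The sample text stands in for the output of `netstat -Ana | grep 31531`.
sample='f1000e0000b8c3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33561    CLOSE_WAIT
f1000e0008fba3b8 tcp        0      0  192.168.1.12.35035    192.168.1.12.31531    SYN_SENT'

# awk: keep rows whose last field is CLOSE_WAIT, print the socket address.
close_wait_socks=$(printf '%s\n' "$sample" | awk '$NF == "CLOSE_WAIT" {print $1}')
echo "$close_wait_socks"    # prints: f1000e0000b8c3b8

# On the AIX host, each address would then be removed (or traced to a PID):
#   for s in $close_wait_socks; do rmsock "$s" tcpcb; done
```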

6 Job cannot find shared libraries at run time

At run time the job cannot find libccora11g.so and libclntsh.so.11.1:

Error loading connector library libccora11g.so. libclntsh.so.11.1: cannot open shared object file: No such file or directory 

6.1 Error description

The error occurs when running a job that contains an Oracle Connector (or any other stage that accesses an Oracle database), even though testing the connection from within the stage may succeed.

6.2 Solution

1) First, as the Oracle user, confirm that you can connect to and query the database normally.
2) Confirm that the Oracle environment variables in dsenv are set correctly.
3) Confirm that libccora11g.so (or a link to it) exists under $ORACLE_HOME/lib.
4) If all of the above check out, locate libccora11g.so under the engine installation directory, typically at EngineTier/Server/StagingArea/Installed/OracleConnector/Server/linux/libccora11g.so, and create a symbolic link to it in $ORACLE_HOME/lib:

# find /disk2/IBM/EngineTier -name "libccora11g.so"

# ln -s /disk2/IBM/EngineTier/Server/StagingArea/Installed/OracleConnector/Server/linux/libccora11g.so $ORACLE_HOME/lib
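For steps 2) and 3), a quick hedged check that the Oracle lib directory is actually on the loader search path that dsenv should export (LIBPATH on AIX, LD_LIBRARY_PATH on Linux). The ORACLE_HOME value below is a placeholder, not taken from the original environment.

```shell
# Hedged helper: report whether a given Oracle home's lib directory appears
# anywhere on the colon-separated loader path passed as the second argument.
on_lib_path() {
  case ":$2:" in
    *":$1/lib:"*) echo yes ;;
    *)            echo no  ;;
  esac
}

# Placeholder ORACLE_HOME; substitute the real value from dsenv.
ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
on_lib_path "$ORACLE_HOME" "${LIBPATH:-}:${LD_LIBRARY_PATH:-}"
```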

Next, locate the install.liborchoracle file, edit it, and find the following content:

install_driver() {
  case $version in
     9 ) VER='9i';;
    10 ) VER='10g';;
     0 ) return;;
  esac

If the database you are using is 11g, change it to:

install_driver() {
  case $version in
     9 ) VER='9i';;
    10|11) VER='10g';;
     0 ) return;;
  esac

Then save the file, exit, and execute it.
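The manual edit above can also be scripted. A hedged sketch: the stand-in file contents below are only for illustration (on the engine host operate on the real install.liborchoracle), and since AIX sed has no -i option, the edit goes through a backup copy.

```shell
# Create a stand-in copy of the relevant case statement, for illustration
# only; on the real host, skip this and use the actual install.liborchoracle.
printf '%s\n' \
  'install_driver() {' \
  '  case $version in' \
  "     9 ) VER='9i';;" \
  "    10 ) VER='10g';;" \
  "     0 ) return;;" \
  '  esac' > install.liborchoracle

# Back up, then map version 11 onto the 10g driver branch.
cp install.liborchoracle install.liborchoracle.orig
sed "s/10 ) VER='10g';;/10|11) VER='10g';;/" install.liborchoracle.orig \
    > install.liborchoracle

grep "VER='10g'" install.liborchoracle   # shows the patched 10|11 branch
```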

7 Job runtime exception

main_program: Fatal Error: The set of available nodes for op2 (parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc, nulls={value=first}}}}(0)).
is empty.  This set is influenced by calls to addNodeConstraint(),
addResourceConstraint() and setAvailableNodes().  If none of these
functions have been called on this operator, then the default node
pool must be empty.
This step has 5 datasets:
ds0: {op0[] (sequential Select_department)
      eOther(APT_HashPartitioner { key={ value=DEPTNO, 
        subArgs={ desc }
      }
})>eCollectAny
      op3[] (parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc}}}(1))}
ds1: {op1[] (parallel Select_employee)
      eOther(APT_HashPartitioner { key={ value=DEPTNO, 
        subArgs={ desc }
      }
})>eCollectAny
      op2[] (parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc, nulls={value=first}}}}(0))}
ds2: {op2[] (parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc, nulls={value=first}}}}(0))
      [pp] eSame>eCollectAny
      op4[] (parallel Merge_2)}
ds3: {op3[] (parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc}}}(1))
      [pp] eSame>eCollectAny
      op4[] (parallel Merge_2)}
ds4: {op4[] (parallel Merge_2)
      >eCollectAny
      op5[] (sequential APT_RealFileExportOperator1 in Sequential_File_3)}
It has 6 operators:
op0[] {(sequential Select_department)
    }
op1[] {(parallel Select_employee)
    }
op2[] {(parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc, nulls={value=first}}}}(0))
    }
op3[] {(parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc}}}(1))
    }
op4[] {(parallel Merge_2)
    }
op5[] {(sequential APT_RealFileExportOperator1 in Sequential_File_3)
    }

7.1 Error description

I created a job like this:
Oracle_Connector1 --> Merge --> Sequential_File
Oracle_Connector2 --^
and ran it in a two-node cluster environment. The same job runs successfully in a non-clustered environment.

7.2 Error analysis

A close look at the error output shows that after op0 (Oracle_Connector1) and op1 (Oracle_Connector2), DataStage automatically inserted the op2 and op3 operations (parallel inserted tsort operator); in other words, DataStage sorts the data automatically at run time for the downstream Merge. In the cluster, however, the inserted sort found no node it was allowed to run on, which produced the error. Checking the APT_CONFIG_FILE configuration file revealed that one node's pools was set to "io":

{
  node "node1"
  {
    fastname "dsconductor01"
    pools "conductor"
    resource disk "/tmp/ds/resource" {pools ""}
    resource scratchdisk "/tmp/ds/scratch" {pools ""}
  }

  node "node2"
  {
    fastname "dscompute01"
    pools "io"
    resource disk "/tmp/ds/resource" {pools ""}
    resource scratchdisk "/tmp/ds/scratch" {pools ""}
  }
}

Setting pools to "io" marks the node as having good I/O capability, but it also takes the node out of the default node pool. With node1 in the "conductor" pool and node2 in the "io" pool, the default pool is empty, which is exactly what the error message reports.

7.3 Solution

There are two ways to solve this problem:
1) Change the "io" node in the APT_CONFIG_FILE configuration back to the default pool (pools ""). The downside is that this explicitly gives up resource tagging that is genuinely useful and can noticeably improve cluster performance.
2) Set the job parameter APT_NO_SORT_INSERTION to True. This carries some risk: we do not always know when DataStage would have sorted the data, and setting this parameter explicitly tells it that the data is already sorted and no automatic sort is needed. Some stages, such as Join and Merge, depend on that automatically inserted sort, so if the input is not actually sorted the results will be incorrect and other errors may follow.
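With the first approach, node2 in the APT_CONFIG_FILE shown above would simply rejoin the default pool, e.g.:

```
  node "node2"
  {
    fastname "dscompute01"
    pools ""
    resource disk "/tmp/ds/resource" {pools ""}
    resource scratchdisk "/tmp/ds/scratch" {pools ""}
  }
```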

 

