Hadoop(1)---Errors when running the wordcount example bundled with Hadoop


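    On Hadoop 2.9.0, with HA configured for both the NameNode and YARN, running the bundled wordcount example on one of the NameNode hosts failed intermittently (sometimes it succeeded, sometimes it failed). The error was: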

18/08/16 17:02:42 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/08/16 17:02:42 INFO input.FileInputFormat: Total input files to process : 1
18/08/16 17:02:42 INFO mapreduce.JobSubmitter: number of splits:1
18/08/16 17:02:42 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
18/08/16 17:02:42 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/08/16 17:02:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1534406793739_0005
18/08/16 17:02:42 INFO impl.YarnClientImpl: Submitted application application_1534406793739_0005
18/08/16 17:02:43 INFO mapreduce.Job: The url to track the job: http://HLJRslog2:8088/proxy/application_1534406793739_0005/
18/08/16 17:02:43 INFO mapreduce.Job: Running job: job_1534406793739_0005
18/08/16 17:02:54 INFO mapreduce.Job: Job job_1534406793739_0005 running in uber mode : false
18/08/16 17:02:54 INFO mapreduce.Job: map 0% reduce 0%
18/08/16 17:02:54 INFO mapreduce.Job: Job job_1534406793739_0005 failed with state FAILED due to: Application application_1534406793739_0005 failed 2 times due to AM Container for appattempt_1534406793739_0005_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2018-08-16 17:02:48.561]Exception from container-launch.
Container id: container_e27_1534406793739_0005_02_000001
Exit code: 1
[2018-08-16 17:02:48.562]
[2018-08-16 17:02:48.574]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

[2018-08-16 17:02:48.575]
[2018-08-16 17:02:48.575]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Analysis and solution:

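Most fixes suggested online for similar problems come down to adding the correct classpath. I tried all of them and none helped, which shows that the problem above is not caused by the classpath; when the error occurred I also checked the classpath, and every entry had the expected value. For reference, here is how to add the classpath anyway.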

1. # yarn classpath    (note: prints the current classpath value)

/data1/hadoop/hadoop/etc/hadoop:/data1/hadoop/hadoop/etc/hadoop:/data1/hadoop/hadoop/etc/hadoop:/data1/hadoop/hadoop/share/hadoop/common/lib/*:/data1/hadoop/hadoop/share/hadoop/common/*:/data1/hadoop/hadoop/share/hadoop/hdfs:/data1/hadoop/hadoop/share/hadoop/hdfs/lib/*:/data1/hadoop/hadoop/share/hadoop/hdfs/*:/data1/hadoop/hadoop/share/hadoop/yarn:/data1/hadoop/hadoop/share/hadoop/yarn/lib/*:/data1/hadoop/hadoop/share/hadoop/yarn/*:/data1/hadoop/hadoop/share/hadoop/mapreduce/lib/*:/data1/hadoop/hadoop/share/hadoop/mapreduce/*:/data1/hadoop/hadoop/contrib/capacity-scheduler/*.jar:/data1/hadoop/hadoop/share/hadoop/yarn/*:/data1/hadoop/hadoop/share/hadoop/yarn/lib/*
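A colon-separated line this long is hard to read by eye. The minimal Python sketch below splits such a string and reports whether it is empty (the case the next step addresses), whether any MapReduce directories are on it, and which entries repeat (for example, the etc/hadoop directory appears three times in the output above). The helper name `inspect_classpath` is hypothetical; it only inspects the string you pass in. On a cluster node you could feed it the command's output with `subprocess.run(["yarn", "classpath"], capture_output=True, text=True).stdout`.

```python
def inspect_classpath(cp: str) -> dict:
    """Summarize a colon-separated classpath such as the output of `yarn classpath`.

    Hypothetical helper: reports whether the classpath is empty, whether any
    MapReduce directories are on it, and which entries appear more than once.
    """
    # Drop surrounding whitespace/newline and ignore empty segments ("a::b").
    entries = [e for e in cp.strip().split(":") if e]
    seen = set()
    duplicates = []
    for entry in entries:
        if entry in seen and entry not in duplicates:
            duplicates.append(entry)  # record each repeated entry once, in order
        seen.add(entry)
    return {
        "empty": not entries,
        "has_mapreduce": any("/share/hadoop/mapreduce" in e for e in entries),
        "duplicates": duplicates,
    }
```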

If the classpath printed above is empty, you can add it with the following steps.

2. Edit mapred-site.xml

Add:

<property> 
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
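The property only takes effect if it actually ends up in the mapred-site.xml the job reads, so it can help to parse the file and list the entries. This is a sketch using Python's standard `xml.etree.ElementTree`; `load_hadoop_conf` and `classpath_entries` are hypothetical helper names. The `$HADOOP_MAPRED_HOME` references are kept as literal text, since they are meant to be resolved on the node that launches the container rather than by this script.

```python
import xml.etree.ElementTree as ET


def load_hadoop_conf(xml_text: str) -> dict:
    """Parse a Hadoop *-site.xml <configuration> document into {name: value}.

    XML comments are discarded by the parser, so commented-out properties are ignored.
    """
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def classpath_entries(conf: dict, key: str = "mapreduce.application.classpath") -> list:
    """Split a comma-separated classpath property; returns [] if the property is absent."""
    return [e.strip() for e in conf.get(key, "").split(",") if e.strip()]


if __name__ == "__main__":
    # The same property as above, wrapped in the <configuration> root a real file has.
    sample = """<configuration>
<property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>"""
    for entry in classpath_entries(load_hadoop_conf(sample)):
        print(entry)
```

The same two helpers work for the yarn-site.xml property in the next step: call `classpath_entries(conf, "yarn.application.classpath")`.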

 

3. Edit yarn-site.xml

Add:

<property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>

 

4. Modify the environment variables

#vim ~/.bashrc

Append the following environment variables to the end of the file:

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

 5. source ~/.bashrc
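After sourcing ~/.bashrc, a short script run in the same shell can confirm that the variables from step 4 are set and all derive from HADOOP_HOME. This is a sketch assuming exactly the layout above (etc/hadoop and lib/native under HADOOP_HOME) and assuming HADOOP_HOME itself is already exported, which the exports rely on but do not show. `check_hadoop_env` is a hypothetical name, and HADOOP_OPTS is not checked.

```python
import os
from typing import Mapping, Optional

# Expected value of each variable relative to HADOOP_HOME, mirroring the exports above.
EXPECTED_SUFFIX = {
    "HADOOP_CONF_DIR": "/etc/hadoop",
    "HADOOP_COMMON_HOME": "",
    "HADOOP_HDFS_HOME": "",
    "HADOOP_MAPRED_HOME": "",
    "HADOOP_YARN_HOME": "",
    "HADOOP_COMMON_LIB_NATIVE_DIR": "/lib/native",
}


def check_hadoop_env(env: Optional[Mapping[str, str]] = None) -> list:
    """Return a list of problems; an empty list means the variables match the layout above."""
    env = os.environ if env is None else env
    home = (env.get("HADOOP_HOME") or "").rstrip("/")
    if not home:
        return ["HADOOP_HOME is not set"]
    problems = []
    for name, suffix in EXPECTED_SUFFIX.items():
        value = env.get(name)
        expected = home + suffix
        if value is None:
            problems.append(f"{name} is not set")
        elif value.rstrip("/") != expected:
            problems.append(f"{name}={value!r}, expected {expected!r}")
    return problems


if __name__ == "__main__":
    # Checks the current process environment, i.e. the shell that ran `source ~/.bashrc`.
    for problem in check_hadoop_env() or ["all Hadoop variables look consistent"]:
        print(problem)
```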

 

Fixing the error:

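The log shows that the container running the AM exited before it obtained any resources from the RM for the job's tasks (the job never got past map 0% reduce 0%), so I suspected a communication problem between the AM and the RM. One RM is the standby and the other is active. Inside YARN, when the MR job asks the active RM for resources there is no problem, but when it goes to the standby RM this error appears.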

Edit yarn-site.xml (vim yarn-site.xml):

<property>
<!-- Address clients use to submit application operations to the RM -->
<name>yarn.resourcemanager.address.rm1</name>
<value>master:8032</value>
</property>
<property>
<!-- Address the ResourceManager exposes to ApplicationMasters; an AM uses it to request and release resources from the RM. -->
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>master:8030</value>
</property>
<property>
<!-- RM HTTP address for viewing cluster information -->
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>master:8088</value>
</property>
<property>
<!-- Address NodeManagers use to exchange information with the RM -->
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>master:8031</value>
</property>
<property>
<!-- Address administrators use to send admin commands to the RM -->
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm1</name>
<value>master:23142</value>
</property>
<!-- rm2 (slave1) entries, added on rm1 (master) -->
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>slave1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>slave1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>slave1:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>slave1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>slave1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm2</name>
<value>slave1:23142</value>
</property>
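The fix comes down to one rule: every RM id must have its full set of addresses in yarn-site.xml. A small Python check can enforce that before the file is copied around. This sketch assumes the file also sets `yarn.resourcemanager.ha.rm-ids` (for example `rm1,rm2`), which YARN HA setups use but the fragment above does not show; `load_conf` and `missing_rm_addresses` are hypothetical names. Because `xml.etree.ElementTree` discards XML comments, properties wrapped in `<!-- -->` are reported as missing, which matches the fact that a commented-out block has no effect on Hadoop either.

```python
import xml.etree.ElementTree as ET

# Per-RM address properties that the fragment above sets for each RM id.
PER_RM_KEYS = [
    "yarn.resourcemanager.address",
    "yarn.resourcemanager.scheduler.address",
    "yarn.resourcemanager.webapp.address",
    "yarn.resourcemanager.resource-tracker.address",
    "yarn.resourcemanager.admin.address",
    "yarn.resourcemanager.ha.admin.address",
]
RM_IDS_KEY = "yarn.resourcemanager.ha.rm-ids"


def load_conf(xml_text: str) -> dict:
    """Parse a *-site.xml document into {name: value}; XML comments are discarded."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def missing_rm_addresses(xml_text: str) -> list:
    """Return the per-RM properties that are absent or empty, ordered by RM id then key."""
    conf = load_conf(xml_text)
    rm_ids = [r.strip() for r in conf.get(RM_IDS_KEY, "").split(",") if r.strip()]
    if not rm_ids:
        # Without the list of RM ids there is nothing to check against.
        return [RM_IDS_KEY]
    return [
        f"{key}.{rm_id}"
        for rm_id in rm_ids
        for key in PER_RM_KEYS
        if not conf.get(f"{key}.{rm_id}")
    ]
```

For example, `missing_rm_addresses(open("yarn-site.xml", encoding="utf-8").read())` returns an empty list once every RM id has all six addresses.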

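Note: the scheduler address (yarn.resourcemanager.scheduler.address.*, port 8030) is the RPC port the AM uses to request resources from the RM, and that is where the error originated.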

 

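       The .rm2 properties in the block above are the ones I added to yarn-site.xml on the rm1 machine (master); if you are editing the file on slave1 instead, the lines to add are the ones ending in .rm1. Put simply, yarn-site.xml must list the hosts and ports of every ResourceManager. Then copy the file to the other machines and restart YARN. After that, wordcount and other jobs no longer failed. The root cause was communication between MR and the RM, so when configuring yarn-site.xml it is best to include the addresses of both the active and the standby RM to avoid this error.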
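Since the same RM address blocks must appear on every machine, generating them from one host map keeps the copies identical. The sketch below reuses the hostnames and ports from the fragment above (master/slave1; ports 8032, 8030, 8088, 8031, 8033, 23142). `rm_address_properties` is a hypothetical helper; you would paste its output inside the `<configuration>` element of yarn-site.xml before copying the file to the other nodes and restarting YARN.

```python
from xml.sax.saxutils import escape

# Ports taken from the yarn-site.xml fragment above; adjust if your cluster differs.
PORTS = {
    "yarn.resourcemanager.address": 8032,
    "yarn.resourcemanager.scheduler.address": 8030,
    "yarn.resourcemanager.webapp.address": 8088,
    "yarn.resourcemanager.resource-tracker.address": 8031,
    "yarn.resourcemanager.admin.address": 8033,
    "yarn.resourcemanager.ha.admin.address": 23142,
}


def rm_address_properties(rm_hosts: dict) -> str:
    """Render <property> blocks for every RM id so one yarn-site.xml fits all RM hosts."""
    blocks = []
    for rm_id, host in rm_hosts.items():
        for key, port in PORTS.items():
            blocks.append(
                "<property>\n"
                f"  <name>{escape(key)}.{escape(rm_id)}</name>\n"
                f"  <value>{escape(host)}:{port}</value>\n"
                "</property>"
            )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(rm_address_properties({"rm1": "master", "rm2": "slave1"}))
```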

