I. The Problem
After setting up a fully distributed Hadoop cluster, I ran the wordcount example to test it. Every run stalled at the Running job line and then the program simply appeared to hang.
The wordcount command: [hadoop@master hadoop-2.7.2]$ /opt/module/hadoop-2.7.2/bin/hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /wc/mytemp/123 /wc/mytemp/output
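For reference, the general form is hadoop jar <examples-jar> wordcount <input> <output>, where both paths are on HDFS and the output directory must not already exist. A minimal sketch of the usual prep, assuming a local input file named 123 (the staging steps here are an assumption, not taken from the original run):
[hadoop@master hadoop-2.7.2]$ bin/hdfs dfs -mkdir -p /wc/mytemp
[hadoop@master hadoop-2.7.2]$ bin/hdfs dfs -put 123 /wc/mytemp/123
[hadoop@master hadoop-2.7.2]$ bin/hdfs dfs -rm -r -f /wc/mytemp/output   # only when re-running; MapReduce refuses to overwrite an existing output dir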
Screenshot of the symptom, stuck at the line marked in red:
II. The Fix
1. Being new to this, I combed through a lot of tutorials online; the common suggestions boiled down to:
(1) Some said the firewall or SELinux hadn't been disabled. I checked each in turn; all were already off.
(2) Some said the extra IP entries such as 127.0.0.1 in /etc/hosts had not been deleted or commented out.
(3) Some said to check the logs (what? which log would a beginner even know to look at?), and that went nowhere.
2. The solution:
A beginner always burns a lot of time on problems like this, so half a day vanished just like that (sorry, company payroll). Here are the steps that solved it, one by one.
(1) Step one: the problem shows up at Running job, which in Hadoop points to MapReduce, and in the Hadoop 2.x series MapReduce is managed by YARN. So we check the yarn-hadoop-nodemanager-slave01.log file, found under ${HADOOP_HOME}/logs on the slave node.
View it from a terminal, for example: [hadoop@slave01 hadoop-2.7.2]$ tail -n 50 /opt/module/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-slave01.log
The log shows entries like the following:
2016-07-27 03:30:51,041 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:52,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:53,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:54,047 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:55,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:56,050 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:31:27,053 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
(2) Roughly what this means
The client keeps failing to connect to 0.0.0.0/0.0.0.0:8031, which points to a configuration problem: the ResourceManager addresses default to ${yarn.resourcemanager.hostname}:<port>, and yarn.resourcemanager.hostname itself defaults to 0.0.0.0, so an unconfigured NodeManager keeps dialing 0.0.0.0:8031. I searched for the message, tried a few suggestions, and finally the fix from this post (Solving "Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is...") worked: add the following to yarn-site.xml. (Note: I changed the file on master, slave01, and slave02. Whether master alone would be enough, I'm not sure; since it is the NodeManagers on the slaves that read these addresses, my guess is all of them need it.)
<property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
</property>
After the insertion, yarn-site.xml looks like this (screenshot):
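Note that the YARN daemons read yarn-site.xml only at startup, so restart YARN after editing it. A minimal sketch, assuming the scripts are run from /opt/module/hadoop-2.7.2 on master; the netstat line is an optional check that the ResourceManager ports are no longer bound to 0.0.0.0:
[hadoop@master hadoop-2.7.2]$ sbin/stop-yarn.sh
[hadoop@master hadoop-2.7.2]$ sbin/start-yarn.sh
[hadoop@master hadoop-2.7.2]$ netstat -tnlp | grep -E ':(8030|8031|8032)'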
(3) Problem solved
Running wordcount again now succeeds:
[hadoop@master hadoop-2.7.2]$ /opt/module/hadoop-2.7.2/bin/hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /wc/mytemp/123 /wc/mytemp/output
16/07/27 03:33:29 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.95.100:8032
16/07/27 03:33:31 INFO input.FileInputFormat: Total input paths to process : 1
16/07/27 03:33:31 INFO mapreduce.JobSubmitter: number of splits:1
16/07/27 03:33:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469604761767_0001
16/07/27 03:33:32 INFO impl.YarnClientImpl: Submitted application application_1469604761767_0001
16/07/27 03:33:32 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1469604761767_0001/
16/07/27 03:33:32 INFO mapreduce.Job: Running job: job_1469604761767_0001
16/07/27 03:33:47 INFO mapreduce.Job: Job job_1469604761767_0001 running in uber mode : false
16/07/27 03:33:47 INFO mapreduce.Job: map 0% reduce 0%
16/07/27 03:33:55 INFO mapreduce.Job: map 100% reduce 0%
16/07/27 03:34:08 INFO mapreduce.Job: map 100% reduce 100%
16/07/27 03:34:08 INFO mapreduce.Job: Job job_1469604761767_0001 completed successfully
16/07/27 03:34:08 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=1291
        FILE: Number of bytes written=237185
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1498
        HDFS: Number of bytes written=1035
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=6738
        Total time spent by all reduces in occupied slots (ms)=9139
        Total time spent by all map tasks (ms)=6738
        Total time spent by all reduce tasks (ms)=9139
        Total vcore-milliseconds taken by all map tasks=6738
The word-count results can be viewed with the following command:
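A sketch, assuming the default output file name for a single-reducer job (use /wc/mytemp/output/* to avoid that assumption):
[hadoop@master hadoop-2.7.2]$ bin/hdfs dfs -cat /wc/mytemp/output/part-r-00000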