Hadoop 2.7.x wordcount hangs at INFO mapreduce.Job: Running job: job_1469603958907_0002


I. The Problem

  After configuring a fully distributed Hadoop cluster, I ran the wordcount example as a smoke test. Every run stalled at the "Running job" line and the program simply appeared to hang.

  The wordcount command: [hadoop@master hadoop-2.7.2]$ /opt/module/hadoop-2.7.2/bin/hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar  wordcount  /wc/mytemp/123 /wc/mytemp/output

  The run hangs at the "Running job:" line of the console output (screenshot omitted).

II. The Fix

  1. Being new to Hadoop, I dug through a lot of tutorials first; the common suggestions were:

    (1) The firewall or SELinux is still enabled. I checked both; they were already off.

    (2) Extra loopback entries such as 127.0.0.1 in /etc/hosts have not been removed or commented out.

    (3) "Check the logs" (which log, exactly? a beginner has no idea), which led nowhere.
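The /etc/hosts suggestion is easy to check mechanically. A minimal sketch, assuming the cluster hostnames contain "master" or "slave" (adjust the pattern to your own hostnames):

```shell
# Warn if a cluster hostname is mapped to a loopback address in /etc/hosts.
# A NodeManager that resolves "master" to 127.0.0.1 can never reach the
# real ResourceManager, which also produces a hang at "Running job".
check_hosts() {
  hosts_file="$1"
  if grep -E '^(127\.0\.0\.1|::1)[[:space:]].*(master|slave)' "$hosts_file"; then
    echo "WARNING: cluster hostname mapped to loopback"
  else
    echo "hosts file looks OK"
  fi
}

# The other checks from the tutorials, run on each node:
#   systemctl status firewalld   # firewall should be stopped/disabled
#   getenforce                   # SELinux should report Permissive or Disabled
```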

  2. What actually fixed it:

  Being new, I burned half a day on this (sorry, employer), so here are the steps in full.

  (1) Step one: the job hangs at Running job, so suspect MapReduce. In Hadoop 2.x, MapReduce jobs are managed by YARN, so look at the NodeManager log yarn-hadoop-nodemanager-slave01.log, found under ${HADOOP_HOME}/logs on the slave node.

View it from a terminal, for example: tail -n 100 ${HADOOP_HOME}/logs/yarn-hadoop-nodemanager-slave01.log

The relevant log entries:

2016-07-27 03:30:51,041 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:52,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:53,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:54,047 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:55,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:56,050 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:31:27,053 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

(2) What the log means

  Every attempt by the client to connect to 0.0.0.0:8031 fails. That points to a configuration problem: the slave's NodeManager does not know the ResourceManager's real address, so it falls back to the 0.0.0.0 default. Searching for the message, I eventually found a blog post on the same error ("Resolving Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is...") whose fix worked here: add the configuration below to yarn-site.xml. (Note: I edited the file on master, slave01 and slave02. Whether editing only master would suffice I don't know, but my guess is every node needs it.) Restart YARN afterwards (sbin/stop-yarn.sh, then sbin/start-yarn.sh on master) so the new addresses take effect.

  

<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
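As a sanity check, a short script can confirm that a node's yarn-site.xml actually defines all three addresses; any missing one makes YARN fall back to the 0.0.0.0 default seen in the log. This is a hypothetical helper, not part of Hadoop:

```python
# Report which ResourceManager address properties are missing from a
# yarn-site.xml; a missing one makes the NodeManager fall back to 0.0.0.0.
import xml.etree.ElementTree as ET

REQUIRED = (
    "yarn.resourcemanager.address",
    "yarn.resourcemanager.scheduler.address",
    "yarn.resourcemanager.resource-tracker.address",
)

def missing_rm_addresses(xml_text):
    """Return the required property names absent from the given config XML."""
    root = ET.fromstring(xml_text)
    present = {prop.findtext("name") for prop in root.iter("property")}
    return [name for name in REQUIRED if name not in present]
```

Run it against ${HADOOP_HOME}/etc/hadoop/yarn-site.xml on every node; an empty result means all three properties are present.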

 

(Figure omitted: yarn-site.xml after adding the three properties above.)

(3) Problem solved

Rerunning wordcount now succeeds:

[hadoop@master hadoop-2.7.2]$ /opt/module/hadoop-2.7.2/bin/hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar  wordcount  /wc/mytemp/123 /wc/mytemp/output
16/07/27 03:33:29 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.95.100:8032
16/07/27 03:33:31 INFO input.FileInputFormat: Total input paths to process : 1
16/07/27 03:33:31 INFO mapreduce.JobSubmitter: number of splits:1
16/07/27 03:33:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469604761767_0001
16/07/27 03:33:32 INFO impl.YarnClientImpl: Submitted application application_1469604761767_0001
16/07/27 03:33:32 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1469604761767_0001/
16/07/27 03:33:32 INFO mapreduce.Job: Running job: job_1469604761767_0001
16/07/27 03:33:47 INFO mapreduce.Job: Job job_1469604761767_0001 running in uber mode : false
16/07/27 03:33:47 INFO mapreduce.Job:  map 0% reduce 0%
16/07/27 03:33:55 INFO mapreduce.Job:  map 100% reduce 0%
16/07/27 03:34:08 INFO mapreduce.Job:  map 100% reduce 100%
16/07/27 03:34:08 INFO mapreduce.Job: Job job_1469604761767_0001 completed successfully
16/07/27 03:34:08 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=1291
                FILE: Number of bytes written=237185
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1498
                HDFS: Number of bytes written=1035
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=6738
                Total time spent by all reduces in occupied slots (ms)=9139
                Total time spent by all map tasks (ms)=6738
                Total time spent by all reduce tasks (ms)=9139
                Total vcore-milliseconds taken by all map tasks=6738

The aggregated word counts themselves are written to the HDFS output directory given on the command line (/wc/mytemp/output in the run above).
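They can be listed and printed with the standard HDFS shell; this assumes the paths from the run above, and part-r-00000 is the default name for the single reducer's output file:

```shell
# List the job output, then print the word counts.
/opt/module/hadoop-2.7.2/bin/hdfs dfs -ls /wc/mytemp/output
/opt/module/hadoop-2.7.2/bin/hdfs dfs -cat /wc/mytemp/output/part-r-00000
```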
