I. The Problem
After setting up a fully distributed Hadoop cluster, I ran the wordcount example to test it. Every run stalled at the Running job line and then the program simply appeared to hang.
The wordcount command: [hadoop@master hadoop-2.7.2]$ /opt/module/hadoop-2.7.2/bin/hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /wc/mytemp/123 /wc/mytemp/output
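For reference, the general form is hadoop jar <examples-jar> wordcount <input> <output>, where both paths are on HDFS and the output directory must not already exist. A minimal sketch of the usual prep, assuming a local input file named 123 (the staging steps here are an assumption, not taken from the original run):
[hadoop@master hadoop-2.7.2]$ bin/hdfs dfs -mkdir -p /wc/mytemp
[hadoop@master hadoop-2.7.2]$ bin/hdfs dfs -put 123 /wc/mytemp/123
[hadoop@master hadoop-2.7.2]$ bin/hdfs dfs -rm -r -f /wc/mytemp/output   # only when re-running; MapReduce refuses to overwrite an existing output dir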
Screenshot of the symptom, stuck at the line marked in red:
II. The Fix
1. Being new to this, I combed through a lot of tutorials online; the common suggestions boiled down to:
(1) Some said the firewall or SELinux hadn't been disabled. I checked each in turn; all were already off.
(2) Some said the extra IP entries such as 127.0.0.1 in /etc/hosts had not been deleted or commented out.
(3) Some said to check the logs (what? which log would a beginner even know to look at?), and that went nowhere.
2. The solution:
A beginner always burns a lot of time on problems like this, so half a day vanished just like that (sorry, company payroll). Here are the steps that solved it, one by one.
(1) Step one: the problem shows up at Running job, which in Hadoop points to MapReduce, and in the Hadoop 2.x series MapReduce is managed by YARN. So we check the yarn-hadoop-nodemanager-slave01.log file, found under ${HADOOP_HOME}/logs on the slave node.
View it from a terminal, for example: [hadoop@slave01 hadoop-2.7.2]$ tail -n 50 /opt/module/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-slave01.log
The log shows entries like the following:
2016-07-27 03:30:51,041 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:52,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:53,046 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:54,047 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:55,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:30:56,050 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-07-27 03:31:27,053 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
(2) Roughly what this means
The client keeps failing to connect to 0.0.0.0/0.0.0.0:8031, which points to a configuration problem: the ResourceManager addresses default to ${yarn.resourcemanager.hostname}:<port>, and yarn.resourcemanager.hostname itself defaults to 0.0.0.0, so an unconfigured NodeManager keeps dialing 0.0.0.0:8031. I searched for the message, tried a few suggestions, and finally the fix from this post (Solving "Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is...") worked: add the following to yarn-site.xml. (Note: I changed the file on master, slave01, and slave02. Whether master alone would be enough, I'm not sure; since it is the NodeManagers on the slaves that read these addresses, my guess is all of them need it.)
<property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
</property>
After the insertion, yarn-site.xml looks like this (screenshot):
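Note that the YARN daemons read yarn-site.xml only at startup, so restart YARN after editing it. A minimal sketch, assuming the scripts are run from /opt/module/hadoop-2.7.2 on master; the netstat line is an optional check that the ResourceManager ports are no longer bound to 0.0.0.0:
[hadoop@master hadoop-2.7.2]$ sbin/stop-yarn.sh
[hadoop@master hadoop-2.7.2]$ sbin/start-yarn.sh
[hadoop@master hadoop-2.7.2]$ netstat -tnlp | grep -E ':(8030|8031|8032)'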
(3) Problem solved
Running wordcount again now succeeds:
[hadoop@master hadoop-2.7.2]$ /opt/module/hadoop-2.7.2/bin/hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /wc/mytemp/123 /wc/mytemp/output
16/07/27 03:33:29 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.95.100:8032
16/07/27 03:33:31 INFO input.FileInputFormat: Total input paths to process : 1
16/07/27 03:33:31 INFO mapreduce.JobSubmitter: number of splits:1
16/07/27 03:33:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469604761767_0001
16/07/27 03:33:32 INFO impl.YarnClientImpl: Submitted application application_1469604761767_0001
16/07/27 03:33:32 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1469604761767_0001/
16/07/27 03:33:32 INFO mapreduce.Job: Running job: job_1469604761767_0001
16/07/27 03:33:47 INFO mapreduce.Job: Job job_1469604761767_0001 running in uber mode : false
16/07/27 03:33:47 INFO mapreduce.Job: map 0% reduce 0%
16/07/27 03:33:55 INFO mapreduce.Job: map 100% reduce 0%
16/07/27 03:34:08 INFO mapreduce.Job: map 100% reduce 100%
16/07/27 03:34:08 INFO mapreduce.Job: Job job_1469604761767_0001 completed successfully
16/07/27 03:34:08 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=1291
        FILE: Number of bytes written=237185
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1498
        HDFS: Number of bytes written=1035
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=6738
        Total time spent by all reduces in occupied slots (ms)=9139
        Total time spent by all map tasks (ms)=6738
        Total time spent by all reduce tasks (ms)=9139
        Total vcore-milliseconds taken by all map tasks=6738
The word-count results can be viewed with the following command:
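A sketch, assuming the default output file name for a single-reducer job (use /wc/mytemp/output/* to avoid that assumption):
[hadoop@master hadoop-2.7.2]$ bin/hdfs dfs -cat /wc/mytemp/output/part-r-00000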