1:Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
Answer:
程序里面需要打開多個文件,進行分析,系統一般默認數量是1024,(用ulimit -a可以看到)對於正常使用是夠了,但是對於程序來講,就太少了。
修改辦法:
修改2個文件。
/etc/security/limits.conf
vi /etc/security/limits.conf
加上:
* soft nofile 102400
* hard nofile 409600
$cd /etc/pam.d/
$sudo vi login
添加 session required /lib/security/pam_limits.so
針對第一個問題我糾正下答案:
這是reduce預處理階段shuffle時獲取已完成的map的輸出失敗次數超過上限造成的,上限默認為5。引起此問題的方式可能會有很多種,比如網絡連接不正常,連接超時,帶寬較差以及端口阻塞等。。。通常框架內網絡情況較好是不會出現此錯誤的。
2:Too many fetch-failures
Answer:
出現這個問題主要是結點間的連通不夠全面。
1) 檢查 、/etc/hosts
要求本機ip 對應 服務器名
要求要包含所有的服務器ip + 服務器名
2) 檢查 .ssh/authorized_keys
要求包含所有服務器(包括其自身)的public key
3:處理速度特別的慢 出現map很快 但是reduce很慢 而且反復出現 reduce=0%
Answer:
結合第二點,然后
修改 conf/hadoop-env.sh 中的export HADOOP_HEAPSIZE=4000
4:能夠啟動datanode,但無法訪問,也無法結束的錯誤
在重新格式化一個新的分布式文件時,需要將你NameNode上所配置的dfs.name.dir這一namenode用來存放NameNode 持久存儲名字空間及事務日志的本地文件系統路徑刪除,同時將各DataNode上的dfs.data.dir的路徑 DataNode 存放塊數據的本地文件系統路徑的目錄也刪除。如本此配置就是在NameNode上刪除/home/hadoop/NameData,在DataNode上刪除/home/hadoop/DataNode1和/home/hadoop/DataNode2。這是因為Hadoop在格式化一個新的分布式文件系統時,每個存儲的名字空間都對應了建立時間的那個版本(可以查看/home/hadoop /NameData/current目錄下的VERSION文件,上面記錄了版本信息),在重新格式化新的分布式系統文件時,最好先刪除NameData 目錄。必須刪除各DataNode的dfs.data.dir。這樣才可以使namedode和datanode記錄的信息版本對應。
注意:刪除是個很危險的動作,不能確認的情況下不能刪除!!做好刪除的文件等通通備份!!
5:java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20090724_log/src_20090724_log
出現這種情況大多是結點斷了,沒有連接上。
6:java.lang.OutOfMemoryError: Java heap space
出現這種異常,明顯是jvm內存不夠得原因,要修改所有的datanode的jvm內存大小。
Java -Xms1024m -Xmx4096m
一般jvm的最大內存使用應該為總內存大小的一半,我們使用的8G內存,所以設置為4096m,這一值可能依舊不是最優的值。
7: Namenode in safe mode
解決方法
bin/hadoop dfsadmin -safemode leave
8:java.net.NoRouteToHostException: No route to host
j解決方法:
sudo /etc/init.d/iptables stop
9:更改namenode后,在hive中運行select 依舊指向之前的namenode地址
這是因為:When youcreate a table, hive actually stores the location of the table (e.g.
hdfs://ip:port/user/root/...) in the SDS and DBS tables in the metastore . So when I bring up a new cluster the master has a new IP, but hive's metastore is still pointing to the locations within the old
cluster. I could modify the metastore to update with the new IP everytime I bring up a cluster. But the easier and simpler solution was to just use an elastic IP for the master
所以要將metastore中的之前出現的namenode地址全部更換為現有的namenode地址
10:Your DataNode is started and you can create directories with bin/hadoop dfs -mkdir, but you get an error message when you try to put files into the HDFS (e.g., when you run a command like bin/hadoop dfs -put).
解決方法:
Go to the HDFS info web page (open your web browser and go to http://namenode:dfs_info_port where namenode is the hostname of your NameNode and dfs_info_port is the port you chose dfs.info.port; if followed the QuickStart on your personal computer then this URL will be http://localhost:50070). Once at that page click on the number where it tells you how many DataNodes you have to look at a list of the DataNodes in your cluster.
If it says you have used 100% of your space, then you need to free up room on local disk(s) of the DataNode(s).
If you are on Windows then this number will not be accurate (there is some kind of bug either in Cygwin's df.exe or in Windows). Just free up some more space and you should be okay. On one Windows machine we tried the disk had 1GB free but Hadoop reported that it was 100% full. Then we freed up another 1GB and then it said that the disk was 99.15% full and started writing data into the HDFS again. We encountered this bug on Windows XP SP2.
11:Your DataNodes won't start, and you see something like this in logs/*datanode*:
Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data
原因:
Your Hadoop namespaceID became corrupted. Unfortunately the easiest thing to do reformat the HDFS.
解決方法:
You need to do something like this:
bin/stop-all.sh
rm -Rf /tmp/hadoop-your-username/*
bin/hadoop namenode -format
12:You can run Hadoop jobs written in Java (like the grep example), but your HadoopStreaming jobs (such as the Python example that fetches web page titles) won't work.
原因:
You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster.
解決方法:
Use absolute paths like this from the tutorial:
bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar /
-mapper $HOME/proj/hadoop/multifetch.py /
-reducer $HOME/proj/hadoop/reducer.py /
-input urls/* /
-output titles
9:更改namenode后,在hive中運行select 依舊指向之前的namenode地址
這是因為:When youcreate a table, hive actually stores the location of the table (e.g.
hdfs://ip:port/user/root/...) in the SDS and DBS tables in the metastore . So when I bring up a new cluster the master has a new IP, but hive's metastore is still pointing to the locations within the old
cluster. I could modify the metastore to update with the new IP everytime I bring up a cluster. But the easier and simpler solution was to just use an elastic IP for the master
所以要將metastore中的之前出現的namenode地址全部更換為現有的namenode地址
13: 2009-01-08 10:02:40,709 ERROR metadata.Hive (Hive.java:getPartitions(499)) - javax.jdo.JDODataStoreException: Required table missing : ""PARTITIONS"" in Catalog "" Schema "". JPOX requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "org.jpox.autoCreateTables"
原因:就是因為在 hive-default.xml 里把 org.jpox.fixedDatastore 設置成 true 了
14:09/08/31 18:25:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:Bad connect ack with firstBadLink 192.168.1.11:50010
> 09/08/31 18:25:45 INFO hdfs.DFSClient: Abandoning block blk_-8575812198227241296_1001
> 09/08/31 18:25:51 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.16:50010
> 09/08/31 18:25:51 INFO hdfs.DFSClient: Abandoning block blk_-2932256218448902464_1001
> 09/08/31 18:25:57 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.11:50010
> 09/08/31 18:25:57 INFO hdfs.DFSClient: Abandoning block blk_-1014449966480421244_1001
> 09/08/31 18:26:03 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.16:50010
> 09/08/31 18:26:03 INFO hdfs.DFSClient: Abandoning block blk_7193173823538206978_1001
> 09/08/31 18:26:09 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable
to create new block.
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2731)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
>
> 09/08/31 18:26:09 WARN hdfs.DFSClient: Error Recovery for block blk_7193173823538206978_1001
bad datanode[2] nodes == null
> 09/08/31 18:26:09 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/umer/8GB_input"
- Aborting...
> put: Bad connect ack with firstBadLink 192.168.1.16:50010
解決方法:
I have resolved the issue:
What i did:
1) '/etc/init.d/iptables stop' -->stopped firewall
2) SELINUX=disabled in '/etc/selinux/config' file.-->disabled selinux
I worked for me after these two changes
15、把IP換成主機名,datanode 掛不上 解決方法: 把temp文件刪除,重啟hadoop集群就行了 是因為多次部署,造成temp文件與namenode不一致的原因 (注:來源於 群聊天記錄 感謝 beyi 童鞋) |
解決jline.ConsoleReader.readLine在Windows上不生效問題方法
在CliDriver.java的main()函數中,有一條語句reader.readLine,用來讀取標准輸入,但在Windows平台上該語句總是返回null,這個reader是一個實例jline.ConsoleReader實例,給Windows Eclipse調試帶來不便。
我們可以通過使用java.util.Scanner.Scanner來替代它,將原來的
while ((line=reader.readLine(curPrompt+"> ")) != null)
復制代碼
替換為:
Scanner sc = new Scanner(System.in);
while ((line=sc.nextLine()) != null)
復制代碼
重新編譯發布,即可正常從標准輸入讀取輸入的SQL語句了。
Windows eclispe調試hive報does not have a scheme錯誤可能原因
1、Hive配置文件中的“hive.metastore.local”配置項值為false,需要將它修改為true,因為是單機版
2、沒有設置HIVE_HOME環境變量,或設置錯誤
3、“does not have a scheme”很可能是因為找不到“hive-default.xml”。使用Eclipse調試Hive時,遇到找不到hive-default.xml的解決方法:http://bbs.hadoopor.com/thread-292-1-1.html
16、bin/hadoop jps后報如下異常:
Exception in thread "main" java.lang.NullPointerException
at sun.jvmstat.perfdata.monitor.protocol.local.LocalVmManager.activeVms(LocalVmManager.java:127)
at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.activeVms(MonitoredHostProvider.java:133)
at sun.tools.jps.Jps.main(Jps.java:45)
原因為:
系統根目錄/tmp文件夾被刪除了。重新建立/tmp文件夾即可。
bin/hive
中出現 unable to create log directory /tmp/...也可能是這個原因
17、Problem: Storage directory not exist
2010-02-09 21:37:49,890 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = yijian/192.168.0.13
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.1
STARTUP_MSG: build = http://svn.apache.org/repos/asf/ ... /release-0.20.1-rc1 -r 810220; compiled by 'oom' on Tue Sep 1 20:55:56 UTC 2009
************************************************************/
2010-02-09 21:37:52,093 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8888
2010-02-09 21:37:52,125 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: 127.0.0.1/127.0.0.1:8888
2010-02-09 21:37:52,140 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-02-09 21:37:52,156 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-02-09 21:37:53,000 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=jian,None,root,Administrators,Users
2010-02-09 21:37:53,000 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-02-09 21:37:53,000 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2010-02-09 21:37:53,031 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-02-09 21:37:53,046 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2010-02-09 21:37:53,203 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directoryD:/hadoop/run/dfs_name_dir does not exist.
2010-02-09 21:37:53,203 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory D:/hadoop/run/dfs_name_dir is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
2010-02-09 21:37:53,234 INFO org.apache.hadoop.ipc.Server: Stopping server on 8888
2010-02-09 21:37:53,234 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory D:/hadoop/run/dfs_name_dir is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
2010-02-09 21:37:53,250 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at yijian/192.168.0.13
************************************************************/
solution: 是因為存儲目錄D:/hadoop/run/dfs_name_dir不存在,所以只需要手動創建好這個目錄即可。
18、Problem: NameNode is not formatted
2010-02-09 21:52:49,343 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = yijian/192.168.0.13
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.1
STARTUP_MSG: build = http://svn.apache.org/repos/asf/ ... /release-0.20.1-rc1 -r 810220; compiled by 'oom' on Tue Sep 1 20:55:56 UTC 2009
************************************************************/
2010-02-09 21:52:49,531 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8888
2010-02-09 21:52:49,531 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: 127.0.0.1/127.0.0.1:8888
2010-02-09 21:52:49,546 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-02-09 21:52:49,546 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-02-09 21:52:50,250 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=jian,None,root,Administrators,Users
2010-02-09 21:52:50,250 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-02-09 21:52:50,250 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2010-02-09 21:52:50,265 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-02-09 21:52:50,265 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2010-02-09 21:52:50,359 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:317)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
2010-02-09 21:52:50,359 INFO org.apache.hadoop.ipc.Server: Stopping server on 8888
2010-02-09 21:52:50,359 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:317)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
2010-02-09 21:52:50,359 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at yijian/192.168.0.13
************************************************************/
solution: 是因為HDFS還沒有格式化,只需要運行hadoop namenode -format一下,然后再啟動即可
19、中文問題
從url中解析出中文,但hadoop中打印出來仍是亂碼?我們曾經以為hadoop是不支持中文的,后來經過查看源代碼,發現hadoop僅僅是不支持以gbk格式輸出中文而己。
這是TextOutputFormat.class中的代碼,hadoop默認的輸出都是繼承自FileOutputFormat來的,FileOutputFormat的兩個子類一個是基於二進制流的輸出,一個就是基於文本的輸出TextOutputFormat。
public class TextOutputFormat<K, V> extends FileOutputFormat<K, V> {
protected static class LineRecordWriter<K, V>
implements RecordWriter<K, V> {
private static final String utf8 = “UTF-8″;//這里被寫死成了utf-8
private static final byte[] newline;
static {
try {
newline = “/n”.getBytes(utf8);
} catch (UnsupportedEncodingException uee) {
throw new IllegalArgumentException(”can’t find ” + utf8 + ” encoding”);
}
}
…
public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
this.out = out;
try {
this.keyValueSeparator = keyValueSeparator.getBytes(utf8);
} catch (UnsupportedEncodingException uee) {
throw new IllegalArgumentException(”can’t find ” + utf8 + ” encoding”);
}
}
…
private void writeObject(Object o) throws IOException {
if (o instanceof Text) {
Text to = (Text) o;
out.write(to.getBytes(), 0, to.getLength());//這里也需要修改
} else {
out.write(o.toString().getBytes(utf8));
}
}
…
}
可以看出hadoop默認的輸出寫死為utf-8,因此如果decode中文正確,那么將Linux客戶端的character設為utf-8是可以看到中文的。因為hadoop用utf-8的格式輸出了中文。
因為大多數數據庫是用gbk來定義字段的,如果想讓hadoop用gbk格式輸出中文以兼容數據庫怎么辦?
我們可以定義一個新的類:
public class GbkOutputFormat<K, V> extends FileOutputFormat<K, V> {
protected static class LineRecordWriter<K, V>
implements RecordWriter<K, V> {
//寫成gbk即可
private static final String gbk = “gbk”;
private static final byte[] newline;
static {
try {
newline = “/n”.getBytes(gbk);
} catch (UnsupportedEncodingException uee) {
throw new IllegalArgumentException(”can’t find ” + gbk + ” encoding”);
}
}
…
public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
this.out = out;
try {
this.keyValueSeparator = keyValueSeparator.getBytes(gbk);
} catch (UnsupportedEncodingException uee) {
throw new IllegalArgumentException(”can’t find ” + gbk + ” encoding”);
}
}
…
private void writeObject(Object o) throws IOException {
if (o instanceof Text) {
// Text to = (Text) o;
// out.write(to.getBytes(), 0, to.getLength());
// } else {
out.write(o.toString().getBytes(gbk));
}
}
…
}
然后在mapreduce代碼中加入conf1.setOutputFormat(GbkOutputFormat.class)
即可以gbk格式輸出中文。
20、某次正常運行mapreduce實例時,拋出錯誤
java.io.IOException: All datanodes xxx.xxx.xxx.xxx:xxx are bad. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
java.io.IOException: Could not get block locations. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
經查明,問題原因是linux機器打開了過多的文件導致。用命令ulimit -n可以發現linux默認的文件打開數目為1024,修改/ect/security/limit.conf,增加hadoop soft 65535
再重新運行程序(最好所有的datanode都修改),問題解決
21、運行一段時間后hadoop不能stop-all.sh的問題,顯示報錯
no tasktracker to stop ,no datanode to stop
問題的原因是hadoop在stop的時候依據的是datanode上的mapred和dfs進程號。而默認的進程號保存在/tmp下,linux默認會每隔一段時間(一般是一個月或者7天左右)去刪除這個目錄下的文件。因此刪掉hadoop-hadoop-jobtracker.pid和hadoop-hadoop-namenode.pid兩個文件后,namenode自然就找不到datanode上的這兩個進程了。
在配置文件中的export HADOOP_PID_DIR可以解決這個問題
22、問題:
Incompatible namespaceIDs in /usr/local/hadoop/dfs/data: namenode namespaceID = 405233244966; datanode namespaceID = 33333244
原因:
在每次執行hadoop namenode -format時,都會為NameNode生成namespaceID,,但是在hadoop.tmp.dir目錄下的DataNode還是保留上次的namespaceID,因為namespaceID的不一致,而導致DataNode無法啟動,所以只要在每次執行hadoop namenode -format之前,先刪除hadoop.tmp.dir目錄就可以啟動成功。請注意是刪除hadoop.tmp.dir對應的本地目錄,而不是HDFS目錄。