異常信息:
java.net.UnknownHostException: unknown host: xxx-host
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:244)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1234)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:49)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:234)
症狀:
1. 在提交Job機器上ping xxx-host 正常。
2. 在提交Job機器上hadoop fs -ls hdfs://xxx-host:9000/xxx-path 正常
3. Job能正確讀取輸入數據:“INFO input.FileInputFormat: Total input paths to process : 80”
4. 然后立馬就開始報上述的異常信息。
解決方法:
這類問題集中在hosts文件上,包括集群中所有機器的hosts文件
1. 集群中有的機器沒有配置xxx-host (我的問題就是這么解決的)
2. 可能由於編碼問題,導致某個xxx-host失效