This post describes how I tracked down and fixed Hadoop's "input path does not exist" error while working in IntelliJ IDEA.
Environment: Mac OS X 10.9.5, IntelliJ IDEA 13.1.4, Hadoop 1.2.1
Hadoop runs inside a virtual machine; the host machine connects to it over SSH, and both the IDE and the data files live on the host.
This was my third day of teaching myself Hadoop. I had done a little .NET development before, but Mac, IntelliJ IDEA, Hadoop, and CentOS were all fairly new to me, and my very first piece of Hadoop code ran into trouble.
The code below is the first listing in Chapter 4 of Hadoop in Action.
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {

    public static class MapClass extends MapReduceBase
            implements Mapper<Text, Text, Text, Text> {
        @Override
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            output.collect(value, key);
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String csv = "";
            while (values.hasNext()) {
                if (csv.length() > 0) {
                    csv += ", ";
                }
                csv += values.next().toString();
            }
            output.collect(key, new Text(csv));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration configuration = getConf();

        JobConf job = new JobConf(configuration, MyJob.class);

        Path in = new Path(args[0]);
        Path out = new Path(args[1]);

        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.set("key.value.separator.in.input.line", ",");

        JobClient.runJob(job);

        return 0;
    }

    public static void main(String[] args) {
        try {
            int res = ToolRunner.run(new Configuration(), new MyJob(), args);
            System.exit(res);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Apart from the try/catch wrapped around the call in main, the code matches the book.
I ran it directly from IDEA. My data file sits in a different directory than the book's, so the program arguments differ slightly from the original:
/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt output
The IDEA run configuration is shown below.

The data file path is shown below.

None of the configuration above is misspelled. So I happily hit 'Run MyJob.main()', ready to wait for the result and keep following along with the book.
No such luck: IDEA reported input path does not exist. The input path it printed was /Users/michael/IdeaProjects/Hadoop/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt, which is simply the working directory with my first argument appended. What was going on?
The only place in the whole program that uses Path is the run method, so the problem had to be there.
I added System.out.println(FileInputFormat.getInputPaths(job)[0].toUri()); right after FileOutputFormat.setOutputPath(job, out); and it confirmed that the input path really was being resolved under the working directory. No wonder the job complained. (Someone on StackOverflow claimed this error means the data file has not been submitted to Hadoop; that was not the case here.)
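In code form, the diagnostic is just one extra line after the two path calls in run(), shown here as a short excerpt for context:

FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
// print the first input path the job will actually try to read
System.out.println(FileInputFormat.getInputPaths(job)[0].toUri());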
That narrows the problem down to FileInputFormat.setInputPaths(job, in);. Time to look at its source and see how it works.
/**
 * Set the array of {@link Path}s as the list of inputs
 * for the map-reduce job.
 *
 * @param conf Configuration of the job.
 * @param inputPaths the {@link Path}s of the input directories/files
 * for the map-reduce job.
 */
public static void setInputPaths(JobConf conf, Path... inputPaths) {
  Path path = new Path(conf.getWorkingDirectory(), inputPaths[0]);
  StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString()));
  for (int i = 1; i < inputPaths.length; i++) {
    str.append(StringUtils.COMMA_STR);
    path = new Path(conf.getWorkingDirectory(), inputPaths[i]);
    str.append(StringUtils.escapeString(path.toString()));
  }
  conf.set("mapred.input.dir", str.toString());
}
Sure enough, the very first statement of the method merges the working directory from conf with each input path. Since the working directory is what gets prepended, the fix is simply to take it out of the picture.
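To make the merge concrete, here is a small illustrative sketch, not part of the job itself, that mirrors that first statement with this run's values (the variable names workingDir and merged are just for illustration; the paths in the comments are the ones observed above):

// Mirrors the first statement of setInputPaths() with this run's values.
Path workingDir = job.getWorkingDirectory();   // here: /Users/michael/IdeaProjects/Hadoop
Path in = new Path(args[0]);                   // /Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt
Path merged = new Path(workingDir, in);
// The merged path reported in this run was
// /Users/michael/IdeaProjects/Hadoop/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt,
// exactly the non-existent path from the error message.
System.out.println(merged.toUri());

Because setInputPaths reads the working directory only at the moment it is called, and then just stores the resolved string into mapred.input.dir, temporarily pointing the working directory at the root and restoring it afterwards affects nothing else in the job.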
Before the call to FileInputFormat.setInputPaths(job, in);, save the working directory as it was before the merge:
Path workingDirectoryBak = job.getWorkingDirectory();
Then point it at the root directory:
job.setWorkingDirectory(new Path("/"));
After the call, set it back:
job.setWorkingDirectory(workingDirectoryBak);
Finally, print the result to confirm it worked:
System.out.println(FileInputFormat.getInputPaths(job)[0].toUri());
The new code is below. The input method on the Mac is a pain to use, so I wrote the comments directly in my broken English.
public int run(String[] args) throws Exception {
    Configuration configuration = getConf();

    JobConf job = new JobConf(configuration, MyJob.class);

    Path in = new Path(args[0]);
    Path out = new Path(args[1]);

    // back up the current working directory, namely /Users/michael/IdeaProjects/Hadoop, where the source is located
    Path workingDirectoryBak = job.getWorkingDirectory();
    // set it to the root dir
    job.setWorkingDirectory(new Path("/"));
    // let setInputPaths combine the root dir and the input path
    FileInputFormat.setInputPaths(job, in);
    // set it back
    job.setWorkingDirectory(workingDirectoryBak);
    // print to confirm
    System.out.println(FileInputFormat.getInputPaths(job)[0].toUri());

    FileOutputFormat.setOutputPath(job, out);

    job.setJobName("MyJob");
    job.setMapperClass(MapClass.class);
    job.setReducerClass(Reduce.class);

    job.setInputFormat(KeyValueTextInputFormat.class);
    job.setOutputFormat(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.set("key.value.separator.in.input.line", ",");

    JobClient.runJob(job);

    return 0;
}
Ran it again: everything worked, and the job took close to a minute to finish. That is what you get with a low-end setup.

