Hadoop on Mac with IntelliJ IDEA - 1: Fixing the "input path does not exist" problem


This post walks through diagnosing and fixing Hadoop's "input path does not exist" error when running a job from IntelliJ IDEA.

Environment: Mac OS X 10.9.5, IntelliJ IDEA 13.1.4, Hadoop 1.2.1

Hadoop runs in a virtual machine; the host connects to it over SSH, and both the IDE and the data files live on the host.

This is my third day teaching myself Hadoop. I have done a little .NET development before, so Mac, IntelliJ IDEA, Hadoop, and CentOS are all fairly new to me, and my very first piece of Hadoop code ran into a problem.

The following code is the first listing in chapter 4 of Hadoop in Action.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {
    public static class MapClass extends MapReduceBase
            implements Mapper<Text, Text, Text, Text> {
        @Override
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            output.collect(value, key);
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            String csv = "";
            while (values.hasNext()) {
                if (csv.length() > 0) {
                    csv += ", ";
                }
                csv += values.next().toString();
            }
            output.collect(key, new Text(csv));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration configuration = getConf();

        JobConf job = new JobConf(configuration, MyJob.class);

        Path in = new Path(args[0]);
        Path out = new Path(args[1]);

        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.set("key.value.separator.in.input.line", ",");

        JobClient.runJob(job);

        return 0;
    }

    public static void main(String[] args) {
        try {
            int res = ToolRunner.run(new Configuration(), new MyJob(), args);
            System.exit(res);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Apart from the exception handling in main, the code matches the book.
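Conceptually, the job inverts each (citing, cited) pair from the patent-citation file and, for every cited patent, joins all of its citing patents into one comma-separated list. Here is a minimal plain-Java sketch (no Hadoop involved) of that logic; the sample rows are made up for illustration:

```java
import java.util.*;

// Plain-Java sketch of what MyJob computes: the mapper swaps each
// "citing,cited" pair, and the reducer joins all citing patents for one
// cited patent into a comma-separated string.
public class InvertSketch {
    static Map<String, String> invert(List<String[]> pairs) {
        // Group by the second field (the mapper's output key), like the shuffle phase.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] p : pairs) {
            grouped.computeIfAbsent(p[1], k -> new ArrayList<>()).add(p[0]);
        }
        // Join each group with ", ", like the Reduce class does.
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.put(e.getKey(), String.join(", ", e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"3858241", "956203"},
                new String[]{"3858242", "956203"},
                new String[]{"3858241", "1324234"});
        System.out.println(invert(rows)); // {1324234=3858241, 956203=3858241, 3858242}
    }
}
```

This is only a mental model of the MapReduce flow; the real job does the grouping in the shuffle phase across the cluster rather than in a single in-memory map.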

I ran the code directly in IDEA. My data files live in a different directory than the book's, so the program arguments differ slightly from the original:

/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt output

The IDEA run configuration is shown in the figure.

The data file path is shown in the figure.

There are no typos in any of the above. So I happily hit 'Run MyJob.main()', expecting results so I could keep following the book.

No such luck: IDEA printed input path does not exist. The input path it reported was /Users/michael/IdeaProjects/Hadoop/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt, which is just the working directory with my first argument glued on. What was going on?

The only place the whole program uses Path is in the run method, so the problem had to be there.

After FileOutputFormat.setOutputPath(job, out); I added System.out.println(FileInputFormat.getInputPaths(job)[0].toUri()); and confirmed that the input path really was being merged under the working directory. No wonder it failed. (Some answers on Stack Overflow claim this error means the data file was never uploaded to Hadoop; that was not the case here.)

So the culprit is FileInputFormat.setInputPaths(job, in);. Let's look at the source to see how it works.

  /**
   * Set the array of {@link Path}s as the list of inputs
   * for the map-reduce job.
   * 
   * @param conf Configuration of the job. 
   * @param inputPaths the {@link Path}s of the input directories/files 
   * for the map-reduce job.
   */ 
  public static void setInputPaths(JobConf conf, Path... inputPaths) {
    Path path = new Path(conf.getWorkingDirectory(), inputPaths[0]);
    StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString()));
    for(int i = 1; i < inputPaths.length;i++) {
      str.append(StringUtils.COMMA_STR);
      path = new Path(conf.getWorkingDirectory(), inputPaths[i]);
      str.append(StringUtils.escapeString(path.toString()));
    }
    conf.set("mapred.input.dir", str.toString());
  }

As you can see, the very first statement merges conf's working directory with each input path. Since the working directory is what gets prepended, the fix is simply to take it out of the picture.

Before calling FileInputFormat.setInputPaths(job, in);, save the current working directory:

  Path workingDirectoryBak = job.getWorkingDirectory();

Then set it to the root directory:

  job.setWorkingDirectory(new Path("/"));

然后在它后面設置回來

  job.setWorkingDirectory(workingDirectoryBak);

Finally, print the result to confirm:

  System.out.println(FileInputFormat.getInputPaths(job)[0].toUri());

The new run method is below. Since switching input methods on the Mac is a pain, the comments are in my rough English.

public int run(String[] args) throws Exception {
    Configuration configuration = getConf();

    JobConf job = new JobConf(configuration, MyJob.class);

    Path in = new Path(args[0]);
    Path out = new Path(args[1]);

    // backup the current directory, namely /Users/michael/IdeaProjects/Hadoop, where the source is located
    Path workingDirectoryBak = job.getWorkingDirectory();
    // set it to the root dir
    job.setWorkingDirectory(new Path("/"));
    // let setInputPaths combine the root dir and the input path
    FileInputFormat.setInputPaths(job, in);
    // set it back
    job.setWorkingDirectory(workingDirectoryBak);
    // print to confirm
    System.out.println(FileInputFormat.getInputPaths(job)[0].toUri());

    FileOutputFormat.setOutputPath(job, out);

    job.setJobName("MyJob");
    job.setMapperClass(MapClass.class);
    job.setReducerClass(Reduce.class);

    job.setInputFormat(KeyValueTextInputFormat.class);
    job.setOutputFormat(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.set("key.value.separator.in.input.line", ",");

    JobClient.runJob(job);

    return 0;
}

Ran it again and everything worked. The job took nearly a minute to finish, which is what you get with a weak setup.
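As an aside, another workaround that should avoid touching the working directory at all (my own assumption, not something from the book): pass the input as a fully qualified file: URI. A Path that already carries a scheme is not re-resolved against the working directory, so the program argument would look like this:

```
file:///Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt output
```

I have not tested this variant on this exact setup, so treat it as a sketch rather than a verified fix.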

