When I wrote a custom Mapper, the code compiled without errors, but running it on the cluster failed with the following error:
11/16/05 22:53:16 INFO mapred.JobClient: Task Id : attempt_201111301626_0015_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: actiondemo.MyJob$MapClass
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:866)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException: actiondemo.MyJob$MapClass
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:819)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:864)
    ... 8 more
My development workflow was: write the program in Eclipse on Windows, export it as a JAR, then copy the JAR to the cluster and run it there.
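Note that the JDK version used by Eclipse must not be higher than the JDK version configured for Hadoop, otherwise the job fails with an error. (I no longer remember exactly which type of error it was.)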
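One way to check for this kind of JDK mismatch is to read the version number stored in a compiled class file and compare it with what the JVM on the cluster supports. The sketch below is only an illustration using the plain JDK; the class name ClassVersionCheck and the method majorVersion are hypothetical names chosen for this example, and it assumes Java 7 or later for the try-with-resources syntax.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassVersionCheck {

    /** Returns the class-file major version of cls (50 = Java 6, 51 = Java 7, 52 = Java 8). */
    public static int majorVersion(Class<?> cls) throws IOException {
        // The .class file sits next to the class in its package; nested classes
        // keep the '$' in the file name, for example MyJob$MapClass.class.
        String name = cls.getName();
        String resource = name.substring(name.lastIndexOf('.') + 1) + ".class";
        try (InputStream raw = cls.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IOException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            if (in.readInt() != 0xCAFEBABE) {      // every class file starts with this magic number
                throw new IOException("not a class file: " + resource);
            }
            in.readUnsignedShort();                // minor version, not needed here
            return in.readUnsignedShort();         // major version
        }
    }

    public static void main(String[] args) throws IOException {
        // Version the class was compiled for (decided by the compiler settings in the IDE).
        System.out.println("compiled for major version: " + majorVersion(ClassVersionCheck.class));
        // Highest class-file version the running JVM can load, as a string such as "50.0".
        System.out.println("this JVM supports up to:    " + System.getProperty("java.class.version"));
    }
}
```

Running the same check on the development machine and on a cluster node shows whether the classes in the exported JAR are newer than what the cluster's JVM can load.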
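The cause of the problem and its fix are as follows:
Because I am using a Hadoop version of 0.20 or later, a job that uses a custom mapper packaged in a jar must call setJarByClass so that Hadoop knows which jar to ship with the job. Without it, no job jar is set, and the task JVMs on the cluster cannot load actiondemo.MyJob$MapClass. The call looks like this: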
job.setJarByClass(MyJob.class);
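To see why passing a class is enough to identify a jar, here is a rough sketch of how a jar can be located from a class using only the JDK. It is not Hadoop's actual code, just an illustration of the idea; JarLocator, classResource, findContainingJar and the nested MapClass are hypothetical names made up for this example.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLDecoder;
import java.util.Enumeration;

public class JarLocator {

    /** Maps a class to the resource path of its .class file, e.g. actiondemo/MyJob$MapClass.class. */
    public static String classResource(Class<?> cls) {
        return cls.getName().replace('.', '/') + ".class";
    }

    /** Returns the path of the jar containing cls, or null if cls was not loaded from a jar. */
    public static String findContainingJar(Class<?> cls) throws IOException {
        ClassLoader loader = cls.getClassLoader();
        if (loader == null) {
            // Classes loaded by the bootstrap loader report null here.
            loader = ClassLoader.getSystemClassLoader();
        }
        Enumeration<URL> urls = loader.getResources(classResource(cls));
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            if (!"jar".equals(url.getProtocol())) {
                continue; // e.g. file: URLs when running from a classes directory
            }
            // A jar URL looks like jar:file:/path/to/job.jar!/actiondemo/MyJob.class
            String path = url.getPath();
            if (path.startsWith("file:")) {
                path = path.substring("file:".length());
            }
            // Keep literal '+' characters, which URLDecoder would turn into spaces.
            path = URLDecoder.decode(path.replace("+", "%2B"), "UTF-8");
            int bang = path.indexOf('!');
            return bang >= 0 ? path.substring(0, bang) : path;
        }
        return null;
    }

    /** Stand-in for a nested Mapper class such as MyJob.MapClass. */
    public static class MapClass {}

    public static void main(String[] args) throws IOException {
        // A nested class compiles to its own file with '$' in the name.
        System.out.println(classResource(MapClass.class));
        // Prints a jar path when run from a jar, or null when run from a directory of classes.
        System.out.println("jar: " + findContainingJar(JarLocator.class));
    }
}
```

The nested mapper is compiled to its own MyJob$MapClass.class file inside the exported JAR, so if that JAR is never attached to the job, the class is simply not on the task's classpath.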
In fact, the job's log output already contains a hint about this:
11/16/05 22:53:03 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).