2015-3-27
References:
http://www.cnblogs.com/baixl/p/4154429.html
http://blog.csdn.net/u010911997/article/details/44099165
============================================
Hadoop runs on virtual machines (a remote connection works the same way; you only need to know the master's IP and the core-site.xml settings).
A Hadoop distributed cluster was set up on VMware:
192.168.47.133 master
192.168.47.134 slave1
192.168.47.135 slave2
core-site.xml configuration:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
  <description>The name of the default file system.</description>
</property>
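The DFS Master host/port that the Eclipse plugin asks for later (step 3) is exactly what the fs.defaultFS URI above encodes. A minimal plain-Java sketch of that split; the DefaultFs class and hostAndPort method names are made up for illustration:

```java
import java.net.URI;

public class DefaultFs {
    // Splits an fs.defaultFS value like "hdfs://master:9000" into the
    // host/port pair the Eclipse "DFS Master" dialog expects.
    static String[] hostAndPort(String defaultFs) {
        URI uri = URI.create(defaultFs);
        return new String[] { uri.getHost(), String.valueOf(uri.getPort()) };
    }

    public static void main(String[] args) {
        String[] hp = hostAndPort("hdfs://master:9000");
        System.out.println(hp[0] + ":" + hp[1]); // prints master:9000
    }
}
```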
1 Download the plugin
hadoop-eclipse-plugin-2.6.0.jar
If you download the source from GitHub you must compile it yourself; here a pre-built plugin is used.
2 Install the plugin
Copy the plugin into the ..\eclipse\plugins directory, restart Eclipse, and configure the Hadoop installation directory.
If the plugin installed successfully, open Window—Preferences: a Hadoop Map/Reduce entry appears on the left side of the window. Click it and set the Hadoop installation path on the right. (On Windows it is enough to extract hadoop-2.5.1.tar.gz to a directory of your choice.)
3 Configure Map/Reduce Locations
Open Window—Open Perspective—Other, select Map/Reduce, and click OK; a Map/Reduce Locations view appears in the console area:
Right-click in the view, choose New Hadoop location, and enter:
Location Name: any name will do.
Configure Map/Reduce Master and DFS Master; Host and Port must match the settings in core-site.xml.
Click the "Finish" button to close the dialog.
In the left pane expand DFSLocations—>master (the location name configured in the previous step); if you can see the user directory, the setup succeeded.
4 WordCount example
File—>New—>Project, choose Map/Reduce Project, and enter a project name such as WordCount. In the WordCount project create a new class named WordCount with the following code:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("hdfs://192.168.11.134:9000/in/test*.txt")); // path 1
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.11.134:9000/output"));     // path 2 (output path)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
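Stripped of the Hadoop types, the mapper/reducer pair above is a tokenize-then-sum pattern. A plain-Java sketch of the same logic with no Hadoop dependency (the class and method names here are illustrative only):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountSketch {
    // Local equivalent of the map + reduce phases:
    // tokenize each line (map), then sum the per-word counts (reduce).
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count(new String[] { "hello world", "hello hadoop" });
        System.out.println(c.get("hello")); // prints 2
    }
}
```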
Path 1 and path 2 above are hard-coded, so they do not need to be supplied in the run configuration. If instead they are written as:
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
then the run-time paths must be configured: right-click the class and choose Run As—>Run Configurations.
The part highlighted in red is the HDFS file path being configured.
Click Run, or Run on Hadoop; the results are shown under DFS Locations. If files are updated while running, right-click DFS Locations and choose Disconnect to refresh.
Run result:
5 Problems and solutions
5.1 NullPointerException:
1 Put winutils.exe in Hadoop's bin directory.
2 Set the HADOOP_HOME environment variable.
3 Copy hadoop.dll to C:\Windows\System32.
Download links:
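A quick plain-Java sanity check for the steps above; it only inspects the environment, assuming Hadoop looks for %HADOOP_HOME%\bin\winutils.exe (the class name here is hypothetical):

```java
import java.io.File;

public class WinutilsCheck {
    // Path Hadoop's Windows shell support expects: %HADOOP_HOME%\bin\winutils.exe
    static String expectedWinutils(String hadoopHome) {
        return new File(new File(hadoopHome, "bin"), "winutils.exe").getPath();
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            System.out.println("HADOOP_HOME not set -- this is what triggers the NullPointerException");
        } else {
            File exe = new File(expectedWinutils(home));
            System.out.println(exe.exists() ? "winutils.exe found" : "missing: " + exe);
        }
    }
}
```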
http://mail-archives.apache.org/mod_mbox/incubator-slider-commits/201411.mbox/%3Ce263738846864bfda0dd6c17a7457988@git.apache.org%3E
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/29483696/bin/windows/hadoop-2.6.0-SNAPSHOT/bin/winutils.exe
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/29483696/bin/windows/hadoop-2.6.0-SNAPSHOT/bin/hadoop.dll
Problem 1: cannot operate on multiple files from the DFS Locations view:
On every Hadoop node, edit conf/mapred-site.xml (note that dfs.permissions is actually an HDFS property, so depending on the version it may belong in conf/hdfs-site.xml)
and add:
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
This disables permission checking.
Problem 2
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Create a file named log4j.properties under the src folder
with the following content:
log4j.rootLogger=WARN, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d%p [%c] - %m%n
Problem 3
java.io.IOException: Could not locate executable null/bin/winutils.exe in the Hadoop binaries.
winutils.exe is missing; download it and add it to Hadoop's bin directory.
Download: http://download.csdn.net/detail/u010911997/8478049
Problem 4
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V
This is a hadoop.dll version problem: builds before 2.4 and builds after it need different DLLs.
Choose the correct version and replace the copies in Hadoop/bin and C:\Windows\System32.
Problem 5
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557)
No proper fix has been found so far; the only workaround is to modify the source code.
Source download: http://pan.baidu.com/s/1jGJzVSy
Put the source file into the project's src directory under the same package name, then modify it.
Source before the change:
public static boolean access(String path, AccessRight desiredAccess)
    throws IOException {
  return access0(path, desiredAccess.accessRight());
}
Source after the change:
public static boolean access(String path, AccessRight desiredAccess)
    throws IOException {
  return true;
  // return access0(path, desiredAccess.accessRight());
}
After this change the build succeeds, but runtime feedback from the program is no longer visible.