Following on from the previous post: with Hadoop installed and running, it is time to run some examples, and the simplest and most direct one is the HelloWorld-style WordCount example.
This run follows the steps in this blog post: http://xiejianglei163.blog.163.com/blog/static/1247276201443152533684/
First, create a local folder containing two files (the location is up to you), with the following structure:
data_input
--file1.txt
--file2.txt
The contents of the files can be anything; I copied a passage of English text from a news article. Then run the following commands:
hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -mkdir /data   # create the /data folder to hold the input data; note this is a folder in HDFS, not a directory under the Linux root filesystem
hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -put -f ./data_input/* /data   # copy the two files created above into /data
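Incidentally, the same two steps can be done programmatically through Hadoop's FileSystem Java API. The following is only a minimal sketch under my own assumptions: the class name PutFiles and the local file names are illustrative, and the Hadoop configuration files are assumed to be on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutFiles {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the classpath to locate HDFS.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Equivalent of "hadoop fs -mkdir /data".
        fs.mkdirs(new Path("/data"));
        // Equivalent of "hadoop fs -put": copy the local input files into /data.
        fs.copyFromLocalFile(new Path("data_input/file1.txt"), new Path("/data/file1.txt"));
        fs.copyFromLocalFile(new Path("data_input/file2.txt"), new Path("/data/file2.txt"));
        fs.close();
    }
}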
Run the WordCount job, then check the result:
hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop jar ./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.4.1-sources.jar org.apache.hadoop.examples.WordCount /data /output
14/07/22 22:34:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/22 22:34:27 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/07/22 22:34:29 INFO input.FileInputFormat: Total input paths to process : 2
14/07/22 22:34:29 INFO mapreduce.JobSubmitter: number of splits:2
14/07/22 22:34:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406038146260_0001
14/07/22 22:34:32 INFO impl.YarnClientImpl: Submitted application application_1406038146260_0001
14/07/22 22:34:32 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1406038146260_0001/
14/07/22 22:34:32 INFO mapreduce.Job: Running job: job_1406038146260_0001
14/07/22 22:34:58 INFO mapreduce.Job: Job job_1406038146260_0001 running in uber mode : false
14/07/22 22:34:58 INFO mapreduce.Job:  map 0% reduce 0%
14/07/22 22:35:34 INFO mapreduce.Job:  map 100% reduce 0%
14/07/22 22:35:52 INFO mapreduce.Job:  map 100% reduce 100%
14/07/22 22:35:52 INFO mapreduce.Job: Job job_1406038146260_0001 completed successfully
14/07/22 22:35:53 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=2521
        FILE: Number of bytes written=283699
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2280
        HDFS: Number of bytes written=1710
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=71182
        Total time spent by all reduces in occupied slots (ms)=13937
        Total time spent by all map tasks (ms)=71182
        Total time spent by all reduce tasks (ms)=13937
        Total vcore-seconds taken by all map tasks=71182
        Total vcore-seconds taken by all reduce tasks=13937
        Total megabyte-seconds taken by all map tasks=72890368
        Total megabyte-seconds taken by all reduce tasks=14271488
    Map-Reduce Framework
        Map input records=29
        Map output records=274
        Map output bytes=2814
        Map output materialized bytes=2527
        Input split bytes=202
        Combine input records=274
        Combine output records=195
        Reduce input groups=190
        Reduce shuffle bytes=2527
        Reduce input records=195
        Reduce output records=190
        Spilled Records=390
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=847
        CPU time spent (ms)=6410
        Physical memory (bytes) snapshot=426119168
        Virtual memory (bytes) snapshot=1953292288
        Total committed heap usage (bytes)=256843776
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=2078
    File Output Format Counters
        Bytes Written=1710
hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$
The log above shows the details of the WordCount run: two map tasks were launched (one per input file) along with a single reducer, and the combiner collapsed the 274 map output records into 195 reduce input records before the shuffle. (A sketch of the WordCount source itself is given at the end of this post.) Next, view the statistics with the cat command:
hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -cat /output/part-r-00000
14/07/22 22:38:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
"as	1
"atrocious,"	1
-	1
10-day	1
13	1
18	1
20,	1
2006.	1
3,000	1
432	1
65	1
7.4.52	1
:help	2
:help<Enter>	1
:q<Enter>	1
<F1>	1
Already,	1
Ban	1
Benjamin	1
Many more word/count lines follow, which I have omitted here. Each line of part-r-00000 is a word and its count separated by a tab (one output file per reducer, hence part-r-00000 for this job's single reduce task). That completes the WordCount run.
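For reference, the org.apache.hadoop.examples.WordCount program invoked above is the classic mapper/combiner/reducer pattern. The condensed sketch below follows the word count example shipped with Hadoop 2.x; it is reconstructed from memory, so small details may differ from the exact 2.4.1 source.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split each input line on whitespace and emit a (word, 1) pair per token.
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the 1s (or the combiner's partial sums) for the same word.
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // Reusing the reducer as a combiner is what produces the
        // "Combine input/output records" counters seen in the log.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Running it against /data with two input files is what produced the two map tasks in the log above, one per file split.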