Following on from the previous post: with Hadoop installed and running, it is time to run some examples, and the simplest and most direct one is the HelloWorld-style WordCount example.
This run follows the steps in this blog post: http://xiejianglei163.blog.163.com/blog/static/1247276201443152533684/
First, create a local folder containing two files (the location is up to you), with the following structure:
data_input
--file1.txt
--file2.txt
The contents of the files can be anything; I copied a passage of English text from a news article. Then run the following commands:
hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -mkdir /data   # create the /data folder to hold the input data; note this is a folder in HDFS, not a directory under the Linux root filesystem
hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -put -f ./data_input/* /data   # copy the two files created above into /data
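Incidentally, the same two steps can be done programmatically through Hadoop's FileSystem Java API. The following is only a minimal sketch under my own assumptions: the class name PutFiles and the local file names are illustrative, and the Hadoop configuration files are assumed to be on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutFiles {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the classpath to locate HDFS.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Equivalent of "hadoop fs -mkdir /data".
        fs.mkdirs(new Path("/data"));
        // Equivalent of "hadoop fs -put": copy the local input files into /data.
        fs.copyFromLocalFile(new Path("data_input/file1.txt"), new Path("/data/file1.txt"));
        fs.copyFromLocalFile(new Path("data_input/file2.txt"), new Path("/data/file2.txt"));
        fs.close();
    }
}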
Run the WordCount job, then check the result:
hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop jar ./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.4.1-sources.jar org.apache.hadoop.examples.WordCount /data /output
14/07/22 22:34:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/22 22:34:27 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/07/22 22:34:29 INFO input.FileInputFormat: Total input paths to process : 2
14/07/22 22:34:29 INFO mapreduce.JobSubmitter: number of splits:2
14/07/22 22:34:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406038146260_0001
14/07/22 22:34:32 INFO impl.YarnClientImpl: Submitted application application_1406038146260_0001
14/07/22 22:34:32 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1406038146260_0001/
14/07/22 22:34:32 INFO mapreduce.Job: Running job: job_1406038146260_0001
14/07/22 22:34:58 INFO mapreduce.Job: Job job_1406038146260_0001 running in uber mode : false
14/07/22 22:34:58 INFO mapreduce.Job:  map 0% reduce 0%
14/07/22 22:35:34 INFO mapreduce.Job:  map 100% reduce 0%
14/07/22 22:35:52 INFO mapreduce.Job:  map 100% reduce 100%
14/07/22 22:35:52 INFO mapreduce.Job: Job job_1406038146260_0001 completed successfully
14/07/22 22:35:53 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=2521
        FILE: Number of bytes written=283699
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2280
        HDFS: Number of bytes written=1710
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=71182
        Total time spent by all reduces in occupied slots (ms)=13937
        Total time spent by all map tasks (ms)=71182
        Total time spent by all reduce tasks (ms)=13937
        Total vcore-seconds taken by all map tasks=71182
        Total vcore-seconds taken by all reduce tasks=13937
        Total megabyte-seconds taken by all map tasks=72890368
        Total megabyte-seconds taken by all reduce tasks=14271488
    Map-Reduce Framework
        Map input records=29
        Map output records=274
        Map output bytes=2814
        Map output materialized bytes=2527
        Input split bytes=202
        Combine input records=274
        Combine output records=195
        Reduce input groups=190
        Reduce shuffle bytes=2527
        Reduce input records=195
        Reduce output records=190
        Spilled Records=390
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=847
        CPU time spent (ms)=6410
        Physical memory (bytes) snapshot=426119168
        Virtual memory (bytes) snapshot=1953292288
        Total committed heap usage (bytes)=256843776
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=2078
    File Output Format Counters
        Bytes Written=1710
hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$
The log above shows the details of the WordCount run: two map tasks were launched (one per input file) along with a single reducer, and the combiner collapsed the 274 map output records into 195 reduce input records before the shuffle. (A sketch of the WordCount source itself is given at the end of this post.) Next, view the statistics with the cat command:
hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -cat /output/part-r-00000
14/07/22 22:38:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
"as	1
"atrocious,"	1
-	1
10-day	1
13	1
18	1
20,	1
2006.	1
3,000	1
432	1
65	1
7.4.52	1
:help	2
:help<Enter>	1
:q<Enter>	1
<F1>	1
Already,	1
Ban	1
Benjamin	1
Many more word/count lines follow, which I have omitted here. Each line of part-r-00000 is a word and its count separated by a tab (one output file per reducer, hence part-r-00000 for this job's single reduce task). That completes the WordCount run.
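For reference, the org.apache.hadoop.examples.WordCount program invoked above is the classic mapper/combiner/reducer pattern. The condensed sketch below follows the word count example shipped with Hadoop 2.x; it is reconstructed from memory, so small details may differ from the exact 2.4.1 source.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split each input line on whitespace and emit a (word, 1) pair per token.
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the 1s (or the combiner's partial sums) for the same word.
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // Reusing the reducer as a combiner is what produces the
        // "Combine input/output records" counters seen in the log.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Running it against /data with two input files is what produced the two map tasks in the log above, one per file split.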