Mapreduce實例--求平均值

本文轉載自查看原文 2019-11-17 13:46 476

求平均數是MapReduce比較常見的算法，求平均數的算法也比較簡單，一種思路是Map端讀取數據，在數據輸入到Reduce之前先經過shuffle，將map函數輸出的key值相同的所有的value值形成一個集合value-list，然后將輸入到Reduce端，Reduce端匯總並且統計記錄數，然后作商即可。具體原理如下圖所示：

操作環境：

Centos 7

jdk 1.8

hadoop-3.2.0

IDEA2019

實現內容：

將自定義的電商關於商品點擊情況的數據文件，包含兩個字段(商品分類，商品點擊次數)，用"\t"分割，類似如下：

商品分類 商品點擊次數
52127    5
52120    93
52092    93
52132    38
52006    462
52109    28
52109    43
52132    0
52132    34
52132    9
52132    30
52132    45
52132    24
52009    2615
52132    25
52090    13
52132    6
52136    0
52090    10
52024    347

使用mapreduce統計出每類商品的平均點擊次數

商品分類 商品平均點擊次數
52006    462
52009    2615
52024    347
52090    11
52092    93
52109    35
52120    93
52127    5
52132    23
52136    0

一、啟動Hadoop集群，上傳數據集文件到hdfs

hadoop fs -mkdir -p /mymapreduce4/in
hadoop fs -put /data/mapreduce4/goods_click /mymapreduce4/in

二、在IDEA中創建java項目，導入Jar包，如果清楚自己使用的集群的jar包，可使用maven導入指定的jar包。

　　為了避免Jar包沖突，使用hadoop/share目錄下的所有jar包。

三、編寫java代碼程序

package mapreduce;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class MyAverage{
    public static class Map extends Mapper<Object , Text , Text , IntWritable>{
        private static Text newKey=new Text();
        public void map(Object key,Text value,Context context) throws IOException, InterruptedException{
            //將輸入的純文本文件數據轉化成string
            String line=value.toString();
            System.out.println(line);
            //將值通過split()方法截取出來
            String arr[]=line.split("\t");
            newKey.set(arr[0]);
            int click=Integer.parseInt(arr[1]);
            //將數據和值輸入到reduce處理
            context.write(newKey, new IntWritable(click));
        }
    }
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>{
        public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException, InterruptedException{
            int num=0;
            int count=0;
            for(IntWritable val:values){
                //每個元素求和num
                num+=val.get();
                //統計元素的次數count
                count++;
            }
            //統計次數
            int avg=num/count;
            context.write(key,new IntWritable(avg));
        }
    }
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException{
        Configuration conf=new Configuration();
        System.out.println("start");
        Job job =new Job(conf,"MyAverage");
        job.setJarByClass(MyAverage.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path in=new Path("hdfs://172.18.74.137:9000/mymapreduce4/in/goods_click");
        Path out=new Path("hdfs://172.18.74.137:9000/mymapreduce4/out");
        FileInputFormat.addInputPath(job,in);
        FileOutputFormat.setOutputPath(job,out);
        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }
}

四、運行查看結果

hadoop fs -ls /mymapreduce4/out
hadoop fs -cat /mymapreduce4/out/part-r-00000

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 numpy模塊求平均值 Flink實現求平均值 JavaScript怎么求多個數的平均值 Spark實現求平均值一維數組求平均值 plsql求平均值的幾種寫法 Matlab求平均值函數mean es6求平均值 -- reduce() mysql 大數據量求平均值 python 錄入姓名和成績, 並且求平均值