Connecting to Hadoop and Spark clusters from IDEA on Windows

Reposted article. 2019-04-24 10:57, category: Big Data

###Connecting to a Hadoop cluster from Windows

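1. Assume a Hadoop cluster is already set up on Linux machines.

2. On Windows, extract the Hadoop archive into a directory whose path contains no spaces, for example the root of drive D.

3. Configure the environment variables
HADOOP_HOME=D:\hadoop-2.7.7
Add %HADOOP_HOME%\bin to Path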

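4. Download the following files, built for the same Hadoop version or the closest one available
hadoop.dll # place it in C:\Windows\System32
winutils.exe # place it in %HADOOP_HOME%\bin

# Download location:
https://github.com/steveloughran/winutils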

5. Run WordCount
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * WordCount job launched from IDEA on Windows, reading from and writing to the cluster's HDFS.
 *
 * @author LUGH1
 * @date 2019-4-8
 */
public class WordCount {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Address of the NameNode on the Linux cluster
        conf.set("fs.defaultFS", "hdfs://192.168.88.130:9000");
        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);

        job.setMapperClass(WdMapper.class);
        job.setReducerClass(WdReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input file and output directory on HDFS; the output directory must not exist yet
        FileInputFormat.setInputPaths(job, new Path("/test/word.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/test/output"));

        boolean result = job.waitForCompletion(true);
        // Print before exiting: nothing placed after System.exit() is ever executed
        System.out.println(result ? "good job" : "job failed");
        System.exit(result ? 0 : 1);
    }
}

// Mapper: emits (word, 1) for every space-separated token of each input line
class WdMapper extends Mapper<Object, Text, Text, IntWritable> {
    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] split = line.split(" ");
        for (String word : split) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}

// Reducer: sums the counts emitted for each word
class WdReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable i : values) {
            count += i.get();
        }
        context.write(key, new IntWritable(count));
    }
}
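
Before launching the job from IDEA, steps 2 to 4 above can be sanity-checked with a few lines of plain Java. The following is only a minimal sketch: the class name `HadoopWinCheck` and its `check` method are hypothetical helpers, not part of Hadoop. It only looks at the directory layout described in this article (no space in the path, `winutils.exe` under `bin`); it does not verify that the binaries match the Hadoop version.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class HadoopWinCheck {

    /** Returns the problems found for the given Hadoop home; an empty list means the layout looks fine. */
    static List<String> check(String hadoopHome) {
        List<String> problems = new ArrayList<>();
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            problems.add("HADOOP_HOME is not set");
            return problems;
        }
        // Step 2: the install directory should not contain spaces
        if (hadoopHome.contains(" ")) {
            problems.add("path contains a space: " + hadoopHome);
        }
        // Step 4: winutils.exe is expected under %HADOOP_HOME%\bin
        Path winutils = Paths.get(hadoopHome, "bin", "winutils.exe");
        if (!Files.isRegularFile(winutils)) {
            problems.add("missing " + winutils);
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = check(System.getenv("HADOOP_HOME"));
        if (problems.isEmpty()) {
            System.out.println("HADOOP_HOME layout looks OK");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

If the environment variable was added while IDEA was already open, IDEA usually has to be restarted before `System.getenv("HADOOP_HOME")` sees the new value. Hadoop also reads the JVM system property `hadoop.home.dir`, so setting it at the start of `main` is a possible alternative to the environment variable.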

###Connecting to a Spark cluster from Windows and running a job
Main settings:
1. Set the master address: conf.setMaster("spark://192.168.88.130:7077")
2. Set the location of the jar: conf.setJars(List("hdfs://192.168.88.130:9000/test/sparkT-1.0-SNAPSHOT.jar"))
The sparkT-1.0-SNAPSHOT.jar above was packaged in IDEA and then uploaded to HDFS with hadoop fs -put
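
Both settings point at ports on the cluster, so a quick connectivity check from the Windows machine can rule out firewall or network problems before the job is submitted. The sketch below is an illustration only: `PortCheck` and `reachable` are hypothetical names, and the host and ports (7077 for the Spark master, 9000 for HDFS) are simply the ones used in this article.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port can be opened within the timeout. */
    static boolean reachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or host not resolvable
            return false;
        }
    }

    public static void main(String[] args) {
        String host = "192.168.88.130"; // cluster address used in this article
        System.out.println("Spark master 7077: " + reachable(host, 7077, 2000));
        System.out.println("HDFS NameNode 9000: " + reachable(host, 9000, 2000));
    }
}
```

A successful connection only shows that the port is open. The executors also need to connect back to the driver running on Windows, which is presumably what the commented-out `spark.driver.host` setting in the code is for.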

# Code
import org.apache.spark.{SparkConf, SparkContext}

object sparkTest {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName("test").setMaster("spark://192.168.88.130:7077")
    // conf.set("spark.driver.host","192.168.88.1")
    // Jar containing the application classes, stored on HDFS
    conf.setJars(List("hdfs://192.168.88.130:9000/test/sparkT-1.0-SNAPSHOT.jar"))
    val sc = new SparkContext(conf)
    // val path = "E:\\java_product\\test.txt"
    val rdd = sc.textFile("hdfs://192.168.88.130:9000/test/word.txt")
    // val rdd = sc.textFile("E:\\java_product\\test.txt")
    // Split each line into words, pair every word with 1, then sum the counts per word
    val count = rdd.flatMap(line => line.split(" ")).map(x => (x, 1)).reduceByKey(_ + _)
    count.collect().foreach(println) //.saveAsTextFile("hdfs://192.168.88.130:9000/test/wordoupt1")
    // Release the cluster resources held by this application
    sc.stop()
  }
}
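
To see what the flatMap, map and reduceByKey pipeline (and the MapReduce WordCount earlier) computes without any cluster, the same logic can be mimicked locally with Java streams. This is a sketch under assumptions: `LocalWordCount` is a hypothetical class, the input lines are made up, and a `TreeMap` is used only to get a stable print order. Spark and Hadoop are not involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class LocalWordCount {

    /** Counts space-separated tokens across all lines, like flatMap + map + reduceByKey. */
    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                // flatMap: one stream of words out of many lines
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // map each word to 1 and merge the values per key with a sum
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        // Made-up sample input
        System.out.println(count(Arrays.asList("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

Because the split is on a single space, two consecutive spaces in a line produce an empty token that is counted as a word, both here and in the jobs above. Splitting on `"\\s+"` would be one way to avoid that, at the cost of changing the original behavior.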
