1. Create a Maven project in IntelliJ
Click File -> New -> Project, select Maven in the dialog that pops up, choose the JDK version you have installed, and click Next.
2. Fill in the Maven GroupId and ArtifactId
You can fill these in however you like for your own project; click Next.
This creates an empty project.
Name the program WordCount. It is the common example found all over the web, which counts how many times each word appears in a file.
3. Set the program's compile version
Open IntelliJ's Preferences, navigate to Build, Execution, Deployment -> Compiler -> Java Compiler,
and change the Target bytecode version for WordCount to your JDK version (mine is 1.8).
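Alternatively, you can pin the language level in pom.xml itself, so the setting survives a Maven re-import; a minimal sketch using the standard Maven compiler properties (the 1.8 value matches my JDK, adjust it to yours):

```xml
<!-- Inside the <project> element of pom.xml.
     Pins the source and target level for the Maven compiler plugin;
     1.8 is an assumption matching the JDK used in this tutorial. -->
<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>
```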
4. Configure dependencies
Edit pom.xml to configure them.
1) Add the Apache repository
Add the following at the end of the project element:
<repositories>
    <repository>
        <id>apache</id>
        <url>http://maven.apache.org</url>
    </repository>
</repositories>
2) Add the Hadoop dependencies
Here only the base dependencies hadoop-core and hadoop-common are needed. If you need to read and write HDFS,
also add hadoop-hdfs and hadoop-client; if you need to read and write HBase, also add hbase-client.
Add the following at the end of the project element:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.2</version>
    </dependency>
</dependencies>
After you finish editing pom.xml, IntelliJ shows a "Maven projects need to be imported" prompt in the top-right corner. Click Import Changes to update the dependencies, or click Enable Auto-Import.
For reference, my complete pom.xml is:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.fun</groupId>
    <artifactId>hadoop</artifactId>
    <version>1.0-SNAPSHOT</version>

    <repositories>
        <repository>
            <id>apache</id>
            <url>http://maven.apache.org</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.2</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-dependency-plugin</artifactId>
                <configuration>
                    <excludeTransitive>false</excludeTransitive>
                    <stripVersion>true</stripVersion>
                    <outputDirectory>./lib</outputDirectory>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
5. Write the main program
WordCount.java:
/**
 * Created by jinshilin on 16/12/7.
 */
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
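To see what the map and reduce steps above compute without a Hadoop runtime, the same word-count logic can be sketched in plain Java (a stand-alone simulation for illustration only, not part of the project; the class name WordCountSketch is mine):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {

    // Mimics TokenizerMapper + IntSumReducer in one pass:
    // split the text on whitespace, then sum a count of 1 per occurrence.
    // A TreeMap keeps keys sorted, like the output of MapReduce's shuffle.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two lines of the sample input from step 6
        System.out.println(count("dfdfadgdgag aadads dfdfadgdgag"));
    }
}
```

In the real job the tokenizing and the summing run on different machines, with Hadoop grouping all pairs that share a key in between; this sketch only shows the arithmetic.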
6. Set up the input and output folders
1) Add an input folder at the same level as the src directory
Place one or more input source files in the input folder.
My input file is as follows:
test.segmented:
dfdfadgdgag
aadads
fudflcl
cckcer
fadf
dfdfadgdgag
fudflcl
fuck
fuck
fuckfuck
haha
aaa
2) Configure the run parameters
In the IntelliJ menu bar choose Run -> Edit Configurations. In the dialog that pops up, click + and create a new Application configuration. Set Main class to WordCount (you can click the ... button on the right to pick it),
and set Program arguments to input/ output/, i.e. the input path is the input folder just created and the output goes to output.

Because Hadoop refuses to write into an existing output directory, be sure to delete the output folder before the next run!
Now run the program; the result is:
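If deleting output/ by hand gets tedious, the driver could remove it before submitting the job. A minimal sketch using only java.nio (the directory name output matches the run configuration above; the class and method names are mine, and on a real cluster you would use Hadoop's FileSystem API instead of local file operations):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class CleanOutput {

    // Recursively deletes the given directory if it exists, so that
    // FileOutputFormat does not fail on a pre-existing output path.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order deletes children before their parent directories
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        deleteRecursively(Paths.get("output"));
    }
}
```

Calling deleteRecursively(Paths.get("output")) at the top of WordCount's main method would make the job rerunnable without manual cleanup.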
aaa 1
aadads 1
cckcer 1
dfdfadgdgag 2
fadf 1
fuck 2
fuckfuck 1
fudflcl 2
haha 1
And with that, a simple Hadoop program is complete!
