A first look at reading and writing Elasticsearch from Hadoop


1. References:

http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/configuration.html

http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/mapreduce.html#_emphasis_old_emphasis_literal_org_apache_hadoop_mapred_literal_api

 

2. MapReduce configuration

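// Imports needed by the driver code in this section
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;
import org.elasticsearch.hadoop.mr.EsInputFormat;
import org.elasticsearch.hadoop.mr.EsOutputFormat;
import org.elasticsearch.hadoop.mr.LinkedMapWritable;

// The ES settings below are read by the es-hadoop Format classes
Configuration conf = new Configuration();
conf.set(ConfigurationOptions.ES_NODES, "127.0.0.1");
conf.set(ConfigurationOptions.ES_PORT, "9200");
conf.set(ConfigurationOptions.ES_INDEX_AUTO_CREATE, "yes");

// Set the index/type resource used for reading and writing
conf.set(ConfigurationOptions.ES_RESOURCE, "helloes/demo"); // target index/type to read

// To retrieve only part of the data, set ES_QUERY
// conf.set(ConfigurationOptions.ES_QUERY, "?q=me*");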

// Configure the job with the InputFormat that es-hadoop provides for Hadoop
Job job = Job.getInstance(conf, ElasticsearchIndexMapper.class.getSimpleName());
job.setJarByClass(ElasticsearchIndexBuilder.class);
job.setSpeculativeExecution(false); // disable speculative execution (avoids duplicate writes when indexing into ES)
job.setInputFormatClass(EsInputFormat.class);

// If the output goes to HDFS, write it as text: the map output value is Text
job.setOutputFormatClass(TextOutputFormat.class);
job.setMapOutputValueClass(Text.class);
job.setMapOutputKeyClass(NullWritable.class);

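// If the output goes to ES instead, use this block in place of the HDFS block
// above, not together with it: a later setOutputFormatClass call overrides the
// earlier one. It is commented out here because the Mapper in section 3 emits
// NullWritable/Text for the HDFS export.
// job.setOutputFormatClass(EsOutputFormat.class);       // write to ES
// job.setMapOutputValueClass(LinkedMapWritable.class);  // map output value class
// job.setMapOutputKeyClass(Text.class);                 // map output key class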

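job.setMapperClass(ElasticsearchIndexMapper.class);

// EsInputFormat reads from the ES resource configured above, so this HDFS
// input path only matters when a file-based InputFormat is used instead
FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/es_input"));
FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/es_output"));
job.setNumReduceTasks(0); // map-only job, no reducers
job.waitForCompletion(true);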
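
The ConfigurationOptions constants used in the job setup are names for plain string property keys, as listed in the es-hadoop configuration reference (es.nodes, es.port, es.index.auto.create, es.resource, es.query). The following is a minimal sketch that collects the same settings in an ordinary map so the key/value pairs are easy to see. The class EsSettingsSketch and its build method are hypothetical and not part of es-hadoop; the map only stands in for a Hadoop Configuration for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of es-hadoop or Hadoop.
public class EsSettingsSketch {

    // Builds the same settings as the job setup, keyed by property name.
    // A null or empty query means "read the whole index/type".
    public static Map<String, String> build(String nodes, int port,
                                            String resource, String query) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("es.nodes", nodes);
        settings.put("es.port", Integer.toString(port));
        settings.put("es.index.auto.create", "yes");
        settings.put("es.resource", resource);
        if (query != null && !query.isEmpty()) {
            settings.put("es.query", query);
        }
        return settings;
    }

    public static void main(String[] args) {
        Map<String, String> settings =
                build("127.0.0.1", 9200, "helloes/demo", "?q=me*");
        // Print one key=value pair per line, in insertion order.
        for (Map.Entry<String, String> e : settings.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Each entry corresponds to one conf.set(key, value) call in the job setup.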

 

3. The corresponding Mapper class: ElasticsearchIndexMapper

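import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.elasticsearch.hadoop.mr.LinkedMapWritable;

public class ElasticsearchIndexMapper extends Mapper<Object, Object, NullWritable, Text> {

    @Override
    protected void map(Object key, Object value, Context context)
            throws IOException, InterruptedException {
        // This example only exports the ES documents to HDFS.
        // The value is cast to LinkedMapWritable, the map type this
        // example expects from EsInputFormat.
        LinkedMapWritable doc = (LinkedMapWritable) value;
        Text docVal = new Text();
        docVal.set(doc.toString());
        context.write(NullWritable.get(), docVal);
    }
}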
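
The mapper writes doc.toString(), so the layout of each output line depends on the string form of the map class, which may be awkward to parse downstream. One alternative is to emit selected fields as a single tab-separated line. The sketch below shows only that formatting step on a plain java.util.Map, assuming the document has already been copied into such a map. The class DocLineFormatter, the method toTsv, and the field names user and message are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: illustrates the formatting step only.
public class DocLineFormatter {

    // Joins the requested fields with tabs; missing fields become empty strings.
    // Tabs and line breaks inside values are replaced by spaces so that each
    // document stays on one line with a fixed number of columns.
    public static String toTsv(Map<String, ?> doc, List<String> fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append('\t');
            }
            Object value = doc.get(fields.get(i));
            if (value != null) {
                line.append(value.toString().replaceAll("[\\t\\r\\n]", " "));
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("user", "me");
        doc.put("message", "hello\tes");
        // Three columns: user, message (tab replaced by a space), empty "missing".
        System.out.println(toTsv(doc, Arrays.asList("user", "message", "missing")));
    }
}
```

Inside the mapper, the returned string would take the place of doc.toString() as the argument to docVal.set(...).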

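4. Summary

The key to reading and writing ES from Hadoop is the configuration (Configuration) passed to EsInputFormat and EsOutputFormat.

Other data sources (MySQL, etc.) work in a similar way: find the corresponding InputFormat and OutputFormat, then set the environment-specific parameters.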

