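Full-Text Search for Community Posts in Practice (with ElasticSearch)

We need to provide full-text search over the posts in a community app, and this article evaluates ElasticSearch (ES) for the job.

Installing ES itself is not covered here.

  • Integrate a Chinese analyzer into ES (pick the plugin version that matches your ES version)

  Download the source: https://github.com/medcl/elasticsearch-analysis-ik
  Build it with Maven to get: elasticsearch-analysis-ik-1.9.5.zip

  Create an ik directory under the ES plugins directory and unzip elasticsearch-analysis-ik-1.9.5.zip into it. Restart ES afterwards so that the plugin is loaded.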

  • Create the index (settings, mapping)

  Configuration, saved as post.json:

{
    "settings":{
        "number_of_shards":5,
        "number_of_replicas":1
    },
    "mappings":{
        "post":{
            "dynamic":"strict",
            "properties":{
                "id":{"type":"integer","store":"yes"},
                "title":{"type":"string","store":"yes","index":"analyzed","analyzer": "ik_max_word","search_analyzer": "ik_max_word"},
                "content":{"type":"string","store":"yes","index":"analyzed","analyzer": "ik_max_word","search_analyzer": "ik_max_word"},
                "author":{"type":"string","store":"yes","index":"no"},
                "time":{"type":"date","store":"yes","index":"no"}
            }
        }
    }
}

  Run the following command to create the index:

  curl -XPOST 'spark2:9200/community' -d @post.json
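Because the mapping sets "dynamic" to "strict", ES rejects any document that contains a field not declared under "properties". A client-side pre-check can catch such documents before they are sent. The following is only a minimal sketch: the class and method names are hypothetical and not part of the original project, and the field list is copied by hand from the mapping above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the original project.
final class StrictMappingCheck {

  // Field names declared under "properties" in the mapping (copied by hand).
  static final Set<String> MAPPED_FIELDS =
      new HashSet<>(Arrays.asList("id", "title", "content", "author", "time"));

  // Returns the document fields that the strict mapping does not declare,
  // sorted so that the result is stable.
  static Set<String> unmappedFields(Map<String, Object> document) {
    Set<String> unknown = new TreeSet<>(document.keySet());
    unknown.removeAll(MAPPED_FIELDS);
    return unknown;
  }

  public static void main(String[] args) {
    Map<String, Object> doc = new LinkedHashMap<>();
    doc.put("id", 1);
    doc.put("title", "sample title");
    doc.put("tags", "fashion"); // not declared in the mapping
    System.out.println(unmappedFields(doc)); // prints [tags]
  }
}
```

An empty result means the document only uses declared fields. A non-empty result points at the fields that would need to be added to the mapping or removed from the document.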

  • Insert data

  JAR dependencies of the project

pom.xml
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch</artifactId>
  <version>2.3.3</version>
</dependency>
<dependency>
  <groupId>com.alibaba</groupId>
  <artifactId>fastjson</artifactId>
  <version>1.2.7</version>
</dependency>

ES client utility class

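import java.net.InetAddress;
import java.net.UnknownHostException;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class EsClient {

  // A single shared client, created once when the class is loaded.
  private static TransportClient transportClient;

  static {
    // cluster.name must match the cluster name configured on the ES nodes.
    Settings settings = Settings.builder().put("cluster.name", "es_cluster").build();
    try {
      // 9300 is the default transport port (9200 is the HTTP port used by curl).
      transportClient = new TransportClient.Builder().settings(settings)
          .build()
          .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("spark2"), 9300))
          .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("spark3"), 9300));
    } catch (UnknownHostException e) {
      throw new RuntimeException(e);
    }
  }

  public static TransportClient getInstance() {
    return transportClient;
  }
}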
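The insertion step constructs Post objects, but the article does not show that class. The following is a hypothetical reconstruction, assuming the fields mirror the index mapping (id, author, title, content, time) and that the constructor takes its arguments in the order used by the insertion code. fastjson serializes through the public getters, so under the strict mapping the getter names need to correspond to mapped fields.

```java
import java.util.Date;

// Hypothetical reconstruction of the Post class; the original is not shown.
// In a real project it would usually be declared public in its own file.
class Post {
  private final String id;
  private final String author;
  private final String title;
  private final String content;
  private final Date time;

  // Argument order assumed from the call: new Post(id, author, title, content, time).
  Post(String id, String author, String title, String content, Date time) {
    this.id = id;
    this.author = author;
    this.title = title;
    this.content = content;
    this.time = time;
  }

  // One getter per mapped field; a getter for an unmapped property would
  // add a field that the strict mapping does not declare.
  public String getId() { return id; }
  public String getAuthor() { return author; }
  public String getTitle() { return title; }
  public String getContent() { return content; }
  public Date getTime() { return time; }
}
```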

Insert the data

TransportClient client = EsClient.getInstance();

// Index 10,000 sample posts, one request per post; the post id doubles as the document id.
for (int i = 0; i < 10000; i++) {
  Post post = new Post(i + "", "hll", "百度百科", "ES即etamsports ,全名上海英模特制衣有限公司,是法國Etam集團在中國的分支企業,創立於1994年底。ES的服裝適合出游、朋友聚會、晚間娛樂、校園生活等各種輕松", new Date());
  client.prepareIndex("community", "post", post.getId())
      .setSource(JSON.toJSONString(post))
      .execute()
      .actionGet();
}
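The loop above sends one request per document. For larger imports, ES also accepts many documents in a single bulk request: the REST _bulk endpoint takes a newline-delimited body in which each document is preceded by an action line. The sketch below only builds such a body and sends nothing. The class name is hypothetical, and it assumes each source is already single-line JSON (for example from JSON.toJSONString) and that the index and type names need no JSON escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the newline-delimited body for the REST _bulk endpoint.
final class BulkBodyBuilder {

  // Each document contributes an action line followed by its JSON source line.
  // Document ids are assigned sequentially, starting at firstId.
  static String build(String index, String type, int firstId, List<String> jsonSources) {
    StringBuilder body = new StringBuilder();
    int id = firstId;
    for (String source : jsonSources) {
      body.append("{\"index\":{\"_index\":\"").append(index)
          .append("\",\"_type\":\"").append(type)
          .append("\",\"_id\":\"").append(id++).append("\"}}\n");
      body.append(source).append('\n'); // every line, including the last, ends with a newline
    }
    return body.toString();
  }

  public static void main(String[] args) {
    List<String> docs = Arrays.asList(
        "{\"id\":\"0\",\"title\":\"a\"}",
        "{\"id\":\"1\",\"title\":\"b\"}");
    System.out.print(build("community", "post", 0, docs));
  }
}
```

The resulting string could be posted to spark2:9200/_bulk. The TransportClient also offers prepareBulk() for the same purpose, which avoids building the body by hand.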

  • Query with highlighting

    TransportClient client = EsClient.getInstance();
    SearchResponse response = client.prepareSearch("community")
        .setTypes("post")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders.multiMatchQuery("上海", "title", "content")) 
        .setFrom(0).setSize(10)
        .addHighlightedField("content")
        .setHighlighterPreTags("<red>")
        .setHighlighterPostTags("</red>")
        .execute()
        .actionGet();

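    SearchHits hits = response.getHits();
    for (SearchHit hit : hits) {
      Map<String, Object> source = hit.getSource();
      // A hit that matched only on title has no highlight entry for content.
      HighlightField highlight = hit.getHighlightFields().get("content");
      if (highlight != null) {
        // Fragments cover only the text around the matches, so long content
        // may come back shortened.
        StringBuilder highlighted = new StringBuilder();
        for (Text fragment : highlight.getFragments()) {
          highlighted.append(fragment.string());
        }
        // Replace the stored content with its highlighted fragments.
        source.put("content", highlighted.toString());
      }
      System.out.println(source);
    }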
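The query above always requests the first page through setFrom(0).setSize(10). To serve later pages, the app has to turn a page number into a from offset. The sketch below shows one way to do that. The class name is hypothetical, and the 10,000 limit is an assumption based on the default of index.max_result_window in ES 2.1 and later, so check the settings of your own index.

```java
// Hypothetical helper: turns a 1-based page number into setFrom/setSize arguments.
final class PageWindow {

  // Assumed default of index.max_result_window; ES rejects from + size above it.
  static final int MAX_RESULT_WINDOW = 10_000;

  final int from;
  final int size;

  PageWindow(int page, int size) {
    if (page < 1 || size < 1) {
      throw new IllegalArgumentException("page and size must be >= 1");
    }
    long offset = (long) (page - 1) * size; // long arithmetic avoids int overflow
    if (offset + size > MAX_RESULT_WINDOW) {
      throw new IllegalArgumentException("page is beyond the result window");
    }
    this.from = (int) offset;
    this.size = size;
  }

  public static void main(String[] args) {
    PageWindow window = new PageWindow(3, 10);
    System.out.println(window.from + " " + window.size); // prints 20 10
  }
}
```

The two fields would then be passed as .setFrom(window.from).setSize(window.size) in the search request.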

Query result

{author=hll, id=782, time=1490165237878, title=百度百科, content=ES即etamsports ,全名<red>上海</red>英模特制衣有限公司,是法國Etam集團在中國的分支企業,創立於1994年底。ES的服裝適合出游、朋友聚會、晚間娛樂、校園生活等各種輕松}
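In the result above the match is wrapped in <red> and </red>. These are custom markers rather than tags a UI toolkit understands, so the app has to turn them into styled text itself. One possible approach, sketched here with hypothetical class names, is to split the string into plain and highlighted segments that the client can then render however it likes. The sample string in main is made up and uses ASCII only for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits highlighted text into plain and highlighted segments.
final class HighlightSegments {

  static final String PRE = "<red>";   // same value as setHighlighterPreTags
  static final String POST = "</red>"; // same value as setHighlighterPostTags

  static final class Segment {
    final String text;
    final boolean highlighted;

    Segment(String text, boolean highlighted) {
      this.text = text;
      this.highlighted = highlighted;
    }
  }

  static List<Segment> parse(String s) {
    List<Segment> segments = new ArrayList<>();
    int pos = 0;
    while (pos < s.length()) {
      int open = s.indexOf(PRE, pos);
      int close = open < 0 ? -1 : s.indexOf(POST, open + PRE.length());
      if (open < 0 || close < 0) {
        // No further complete marker pair: keep the rest as plain text.
        segments.add(new Segment(s.substring(pos), false));
        break;
      }
      if (open > pos) {
        segments.add(new Segment(s.substring(pos, open), false));
      }
      segments.add(new Segment(s.substring(open + PRE.length(), close), true));
      pos = close + POST.length();
    }
    return segments;
  }

  public static void main(String[] args) {
    for (Segment seg : parse("full name <red>Shanghai</red> Ltd")) {
      System.out.println((seg.highlighted ? "HIT   " : "plain ") + seg.text);
    }
  }
}
```

This keeps the marker format in one place. If the pre and post tags in the search request change, only the two constants need to follow.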

 

