Using Elasticsearch in Spring Boot with RestHighLevelClient

Reposted article. 2020-10-02 20:50. Category: distributed systems.

Original article: https://www.jianshu.com/p/de838a665eec

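1. Integrating via the Spring Boot template (not recommended)

My first plan was to integrate directly through the Spring Boot template, that is, with the dependency below, which is also the approach most online tutorials describe.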

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>

However, the official Java API documentation turned out to say:

Deprecated in 7.0.0.

The TransportClient is deprecated in favour of the Java High Level REST Client and will be removed in Elasticsearch 8.0. The migration guide describes all the steps needed to migrate.

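Now look at the source code that the template approach pulls in:

[Image: source code pulled in by the Spring Boot template approach]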

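The Java API that the template approach relies on (TransportClient) is losing official support: it is deprecated in favour of the Java High Level REST Client and is to be removed in Elasticsearch 8.0. Rather than face a migration at some later upgrade, it seemed better to follow the recommended approach from the start, and that is the approach used in the rest of this article.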

2. Integrating via the Java High Level REST Client

To search Elasticsearch with the Java High Level REST Client, the first step is to add these dependencies:

org.elasticsearch.client:elasticsearch-rest-high-level-client
org.elasticsearch:elasticsearch

2.1 Adding the dependencies

In a Spring Boot project, add them to pom.xml:

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>6.3.2</version>
        </dependency>
        <!-- Java High Level REST Client -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>6.3.2</version>
        </dependency>

2.2 Configuring the cluster address

Once the dependencies are in place, the client can be initialized:

RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("localhost", 9200, "http")));

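The client maintains a thread pool internally, so you can call client.close() to release its resources once the work is done. Whether to do so depends on your needs: if you query frequently, make the client a singleton, because repeatedly creating and tearing down the thread pool hurts application performance. In Spring Boot, making it a singleton is even simpler.
Add the cluster address to the application.yml configuration file. I have only one address; if you have several, separate them with commas and parse them yourself.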

elasticsearch:
  ip: localhost:9200

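import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticsearchRestClient {

    /**
     * Elasticsearch address, in the form ip:port
     */
    @Value("${elasticsearch.ip}")
    String ipPort;

    @Bean
    public RestClientBuilder restClientBuilder() {
        return RestClient.builder(makeHttpHost(ipPort));
    }

    // Spring beans are singletons by default, so the client is created once.
    @Bean(name = "highLevelClient")
    public RestHighLevelClient highLevelClient(@Autowired RestClientBuilder restClientBuilder) {
        restClientBuilder.setMaxRetryTimeoutMillis(60000);
        return new RestHighLevelClient(restClientBuilder);
    }

    // Splits "ip:port" into an HttpHost that uses plain http.
    private HttpHost makeHttpHost(String s) {
        String[] address = s.split(":");
        String ip = address[0];
        int port = Integer.parseInt(address[1]);
        return new HttpHost(ip, port, "http");
    }
}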

We have only one address here; if you have several, handle the parsing yourself.
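As a minimal sketch of that multi-address parsing, assuming the property holds a comma-separated list such as "es1:9200,es2:9200": the class name HostListParser and its parse method are hypothetical and not part of the original configuration. To stay free of external dependencies the sketch returns java.net.InetSocketAddress pairs; in the configuration class each pair would be turned into new HttpHost(host, port, "http") and the resulting array passed to RestClient.builder, which accepts several hosts.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original configuration class.
class HostListParser {

    // Splits "host1:9200,host2:9200" into unresolved host/port pairs.
    static List<InetSocketAddress> parse(String ipPorts) {
        List<InetSocketAddress> hosts = new ArrayList<>();
        for (String entry : ipPorts.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a stray comma or extra whitespace
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            // createUnresolved avoids a DNS lookup while parsing configuration.
            hosts.add(InetSocketAddress.createUnresolved(host, port));
        }
        return hosts;
    }
}
```

Each returned entry exposes getHostString() and getPort(), which map directly onto the HttpHost constructor arguments used in makeHttpHost above.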

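3. Searching and querying Elasticsearch

With the previous step done, the client can be used in the project for actual searches and queries. A few concepts are worth getting clear first.

3.1 Elasticsearch data structures

In our use case, Elasticsearch stores logs from the various clients. Each log entry is a Document. A log entry carries many pieces of information, such as upload time, browser and IP; each of these is a Field. Different logs may be of different kinds, for example server logs and user-behavior logs; that is the Type. Each day's logs are stored separately, and each such store is an Index. The structure can be compared to a relational database such as MySQL.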

Relational database	Elasticsearch
Databases	Indices
Tables	Types
Rows	Documents
Columns	Fields

Elasticsearch holds multiple indices (databases); each index can contain multiple types (tables); each type contains multiple documents (rows); and each document contains multiple fields (columns).

For example, manually add one log entry with index customer, type _doc and document id 1.

PUT localhost:9200/customer/_doc/1?pretty

{
  "city": "北京",
  "useragent": "Mobile Safari",
  "sys_version": "Linux armv8l",
  "province": "北京",
  "event_id": "",
  "log_time": 1559191912,
  "session": "343730"
}

Then query the log entry that was just added.

GET localhost:9200/customer/_doc/1?pretty

{
  "_index": "customer",
  "_type": "_doc",
  "_id": "1",
  "_version": 3,
  "_seq_no": 2,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "city": "北京",
    "useragent": "Mobile Safari",
    "sys_version": "Linux armv8l",
    "province": "北京",
    "event_id": "",
    "log_time": 1559191912,
    "session": "343730"
  }
}

3.2 Elasticsearch conditional queries

The first step is to initialize a SearchRequest and set its indices and types, using the log entry added above as the example.

        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("customer");
        searchRequest.types("_doc");

Next, build the query conditions. The ones covered here are =, !=, > and <; for anything more complex, see the official documentation.

// Equality conditions
MatchQueryBuilder matchQuery = QueryBuilders.matchQuery("city", "北京");
TermQueryBuilder termQuery = QueryBuilders.termQuery("province", "福建");
// Range query
RangeQueryBuilder timeFilter = QueryBuilders.rangeQuery("log_time").gt(12345).lt(343750);

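Once the individual conditions are built, they have to be combined. The != condition is expressed at this stage, in the combined query, which uses BoolQueryBuilder. BoolQueryBuilder offers four methods:

must: equivalent to an & (AND) condition.
mustNot: equivalent to a ~ (NOT) condition.
should: equivalent to a | (OR) condition.
filter: like must, except that it does not take part in scoring, so it is more efficient when scores are not needed.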

QueryBuilder totalFilter = QueryBuilders.boolQuery()
        .filter(matchQuery)
        .filter(timeFilter)
        .mustNot(termQuery);
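To make the combination logic concrete, here is a plain-Java analogy of the query above, evaluated against in-memory maps rather than Elasticsearch. Everything in it (the class BoolFilterSketch, the predicate names, the doc helper) is hypothetical, and the match query is simplified to exact equality, whereas a real match query analyzes the text. It only illustrates that filter(a).filter(b).mustNot(c) keeps documents satisfying a AND b AND NOT c.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java analogy only: nothing here calls Elasticsearch.
class BoolFilterSketch {

    // Escaped forms of the two Chinese place names used as sample values in the article.
    static final String BEIJING = "\u5317\u4eac";
    static final String FUJIAN = "\u798f\u5efa";

    // Stand-in for the match query on "city", simplified to exact equality.
    static final Predicate<Map<String, Object>> CITY_MATCHES =
            doc -> BEIJING.equals(doc.get("city"));

    // Stand-in for the term query on "province".
    static final Predicate<Map<String, Object>> PROVINCE_IS_FUJIAN =
            doc -> FUJIAN.equals(doc.get("province"));

    // Stand-in for the range query on "log_time": gt and lt are both exclusive.
    static final Predicate<Map<String, Object>> TIME_IN_RANGE = doc -> {
        Object value = doc.get("log_time");
        if (!(value instanceof Number)) {
            return false;
        }
        long t = ((Number) value).longValue();
        return t > 12345L && t < 343750L;
    };

    // Both filter clauses must hold and the mustNot clause must not.
    static boolean matches(Map<String, Object> doc) {
        return CITY_MATCHES.and(TIME_IN_RANGE).and(PROVINCE_IS_FUJIAN.negate()).test(doc);
    }

    // Builds a minimal log document with only the three queried fields.
    static Map<String, Object> doc(String city, String province, long logTime) {
        Map<String, Object> d = new HashMap<>();
        d.put("city", city);
        d.put("province", province);
        d.put("log_time", logTime);
        return d;
    }
}
```

Under this reading, the sample document from section 3.1 would be rejected by the range clause, because its log_time of 1559191912 lies above the upper bound of 343750.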

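3.3 Elasticsearch paginated queries

You can set how many documents each query returns. If you do not, only 10 hits are returned by default. The number can be set manually: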

sourceBuilder.query(totalFilter).size(100);

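Setting the number of returned hits alone is not enough, because we cannot know the result count in advance. We therefore have to implement paging ourselves, with the help of the from() method.

The complete example code is as follows: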

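import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class TestService {

    @Autowired
    RestHighLevelClient highLevelClient;

    private void search(RestHighLevelClient highLevelClient) throws IOException {
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("customer");
        searchRequest.types("_doc");

        // Equality conditions
        MatchQueryBuilder matchQuery = QueryBuilders.matchQuery("city", "北京");
        TermQueryBuilder termQuery = QueryBuilders.termQuery("province", "福建");
        // Range query
        RangeQueryBuilder timeFilter = QueryBuilders.rangeQuery("log_time").gt(12345).lt(343750);

        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        QueryBuilder totalFilter = QueryBuilders.boolQuery()
                .filter(matchQuery)
                .filter(timeFilter)
                .mustNot(termQuery);

        int size = 200;
        int from = 0;
        long total = 0;
        do {
            try {
                sourceBuilder.query(totalFilter).from(from).size(size);
                sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
                searchRequest.source(sourceBuilder);

                SearchResponse response = highLevelClient.search(searchRequest);
                SearchHit[] hits = response.getHits().getHits();
                for (SearchHit hit : hits) {
                    System.out.println(hit.getSourceAsString());
                }
                total = response.getHits().totalHits;
                System.out.println("Test: [" + total + "][" + from + "-" + (from + hits.length) + ")");
                from += hits.length;

                // from + size must be less than or equal to: [10000]
                if (from >= 10000) {
                    System.out.println("Test: more than 10000 hits, stopping");
                    break;
                }
            } catch (Exception e) {
                e.printStackTrace();
                // Stop paging on failure: from would not advance, so the loop would never end.
                break;
            }
        } while (from < total);
    }
}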

3.4 Pagination exception

During paging a problem appeared: once the query reached past 10000 documents, an exception was thrown:

from + size must be less than or equal to: [10000]

The quickest way around this is to enlarge the result window:

curl -XPUT http://127.0.0.1:9200/customer/_settings -H 'Content-Type: application/json' -d '{ "index" : { "max_result_window" : 500000}}'

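Enlarging the window, however, costs more memory and CPU on the server. In our use case that trade-off is not worth it, because our goal is to search for specific target data, not to traverse data at scale. We therefore simply give up on queries that go beyond this limit, which is what this part of the code above does: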

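// from + size must be less than or equal to: [10000]
if (from >= 10000) {
    System.out.println("Test: more than 10000 hits, stopping");
    break;
}

There is still a lot about Elasticsearch that I am unfamiliar with. Anyone interested is welcome to discuss and point out mistakes; otherwise I can only deepen my understanding through continued use.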
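The check above works because the page size of 200 divides 10000 evenly; with a page size that does not, the last request before the cut-off would still push from + size past the limit. A minimal sketch of one way to plan the windows up front, assuming the index keeps the default max_result_window of 10000: the class PageWindows and its plan method are hypothetical helpers, not part of the original code, and they only compute the from/size pairs without issuing any search.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it only computes paging windows.
class PageWindows {

    // Returns {from, size} pairs covering as many of totalHits as the result
    // window allows, never letting from + size exceed maxResultWindow.
    static List<int[]> plan(long totalHits, int pageSize, int maxResultWindow) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<int[]> windows = new ArrayList<>();
        int from = 0;
        while (from < totalHits && from < maxResultWindow) {
            // Shrink the last page so the request stays inside the window.
            int size = Math.min(pageSize, maxResultWindow - from);
            windows.add(new int[] {from, size});
            from += size;
        }
        return windows;
    }
}
```

Each pair would feed sourceBuilder.from(...) and sourceBuilder.size(...) in the paging loop. Hits beyond the window are deliberately left unread, which matches the decision above to abandon queries that go past the limit.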
