接上篇繼續,本篇主要研究如何查詢
一、sql方式查詢
習慣於數據庫開發的同學,自然最喜歡這種方式。為了方便講解,先寫一段代碼,生成一堆記錄
package com.cnblogs.yjmyzz; import java.io.IOException; import java.net.URI; import java.net.URISyntaxException; import java.net.http.HttpClient; import java.net.http.HttpRequest; import java.net.http.HttpResponse; public class Test { public static void main(String[] args) throws IOException, URISyntaxException, InterruptedException { HttpClient httpClient = HttpClient.newBuilder().build(); for (int i = 1000000; i < 2000000; i++) { HttpRequest httpRequest = HttpRequest.newBuilder() .header("Content-Type", "application/json") .version(HttpClient.Version.HTTP_1_1) .uri(new URI("http://localhost:9200/cnblogs/_doc/" + i)) .POST(HttpRequest.BodyPublishers.ofString("{\n" + " \"blog_id\":" + i + ",\n" + " \"blog_title\":\"java並發編程(" + i + ")\",\n" + " \"blog_content\":\"java並發編程學習筆記" + i + "-by 菩提樹下的楊過\",\n" + " \"blog_category\":\"java\"\n" + "}")).build(); HttpResponse<String> response = httpClient.send(httpRequest, HttpResponse.BodyHandlers.ofString()); System.out.println(response.toString() + "\t" + i); } } }
這里沒借助任何第3方類庫,僅用jdk 11自帶的HttpClient向ES添加100w條記錄,插入后數據大致長這樣
如果想用sql取前10條,可以這樣:
POST http://localhost:9200/_sql?format=txt
{ "query": "SELECT * FROM cnblogs where blog_category='java' and blog_id between 1000000 and 1005000 order by blog_id desc limit 10" }
只要象查mysql一樣,寫sql就行了,非常方便。執行效果:
另外,es還提供了一個SQL的CLI,命令終端輸入 ./elasticsearch-sql-cli 即可
更多SQL搜索的細節,可參考 https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-sql.html
二、URI簡單搜索
2.1 根據內部_id精確搜索
GET http://localhost:9200/cnblogs/_doc/1001818
如果存在_id=1001818的數據,將返回
{ "_index": "cnblogs", "_type": "_doc", "_id": "1001818", "_version": 1, "_seq_no": 954, "_primary_term": 1, "found": true, "_source": { "blog_id": 1001818, "blog_title": "java並發編程(1001818)", "blog_content": "java並發編程學習筆記1001818-by 菩提樹下的楊過", "blog_category": "java" } }
如果數據不存在,將返回404的http狀態碼。
tips: 如果不希望返回_xxx這一堆元數據,可以URI后面加上/_source,即:http://localhost:9200/cnblogs/_doc/1001818/_source,將返回
{ "blog_id": 1001818, "blog_title": "java並發編程(1001818)", "blog_content": "java並發編程學習筆記1001818-by 菩提樹下的楊過", "blog_category": "java" }
另外有些大文本的字段,每次返回也比較消耗性能,如果只需要返回指定字段,可以這么做:
http://localhost:9200/cnblogs/_doc/1001818/_source/?_source=blog_id,blog_title
將只返回blog_id,blog_title這2列
2.2 利用_search?q搜索
GET http://localhost:9200/cnblogs/_search?q=blog_id:1001818
這表示搜索blog_id為1001818的記錄
更多搜索細節,可參考https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
三、DSL搜索
_search也支持POST復雜方式搜索,稱為Query DSL,比如:取出第5條數據
POST http://localhost:9200/cnblogs/_search
{ "size": 5, "from": 0 }
這跟mysql中的limit x,y 分頁是類似效果,但是要注意的事,這種分頁方式遇到偏移量大時,性能極低下,ES7.x默認會判斷,如果超過10000,就直接返回錯誤了
比如:
{ "size": 5, "from": 10000 }
會返回:
{ "error": { "root_cause": [ { "type": "illegal_argument_exception", "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10005]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting." } ], "type": "search_phase_execution_exception", "reason": "all shards failed", "phase": "query", "grouped": true, "failed_shards": [ { "shard": 0, "index": "cnblogs", "node": "TZ_qYEMOSZ63E1HMl4lFfA", "reason": { "type": "illegal_argument_exception", "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10005]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting." } } ], "caused_by": { "type": "illegal_argument_exception", "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10005]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.", "caused_by": { "type": "illegal_argument_exception", "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10005]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting." } } }, "status": 400 }
利用DSL可以構造很復雜的查詢,
比如:
POST http://localhost:9200/cnblogs/_search
{ "query": { "bool": { "must": [ { "range": { "blog_id": { "gte": 1001818, "lte": 1001830 } } }, { "match": { "blog_category": "java" } } ] } }, "size": 10, "from": 0 }
翻譯成sql的話,等價於 blog_id between 1001818 and 10001830 and blog_category='java' limit 0,10
DSL不建議死記,可以通過Elasticsearch Tools以可視化方式生成
另外還可以通過highlight來讓匹配的結果,相應的關鍵字高亮顯示
{ "query": { "bool": { "must": [ { "match": { "blog_title": "並發 ES" } } ] } }, "highlight": { "fields": { "blog_title": {} } }, "size": "1", "from": 0 }
返回結果:
{ "took": 63, "timed_out": false, "_shards": { "total": 2, "successful": 2, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 10000, "relation": "gte" }, "max_score": 9.87141, "hits": [ { "_index": "cnblogs", "_type": "_doc", "_id": "1", "_score": 9.87141, "_source": { "blog_id": 10000001, "blog_title": "ES 7.8速成筆記(新標題)", "blog_content": "這是一篇關於ES的測試內容by 菩提樹下的楊過", "blog_category": "ES" }, "highlight": { "blog_title": [ "<em>ES</em> 7.8速成筆記(新標題)" ] } } ] } }
多出的highlight中,匹配成功的關鍵字,會有em標識。
指定排序(sort)
{ "query": { "bool": { "must": [ { "match": { "blog_title": "並發 ES" } } ] } }, "highlight": { "fields": { "blog_title": {} } }, "sort": [ { "blog_id": { "order": "desc" } } ], "size": "1", "from": 0 }
注意sort部分,默認為asc升序。
聚合(group by)
{ "aggs": { "all_interests": { "terms": { "field": "blog_category" } } }, "size": 0, "from": 0 }
上述查詢,類似sql中的 select count(0) from cnblogs group by blog_category 返回結果如下:
{ "took": 1783, "timed_out": false, "_shards": { "total": 2, "successful": 2, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 10000, "relation": "gte" }, "max_score": null, "hits": [] }, "aggregations": { "all_interests": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "java", "doc_count": 514666 }, { "key": "ES", "doc_count": 1 }, { "key": "sql", "doc_count": 1 } ] } } }
更多Query DSL細節,可參考文檔https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
四、使用Client SDK查詢
ES提供了2種客戶端:elasticsearch-rest-client、elasticsearch-rest-high-level-client
4.1 elasticsearch-rest-client
pom依賴:
<dependency> <groupId>com.google.code.gson</groupId> <artifactId>gson</artifactId> <version>2.8.6</version> </dependency> <dependency> <groupId>org.elasticsearch.client</groupId> <artifactId>elasticsearch-rest-client</artifactId> <version>7.8.0</version> </dependency>
示例代碼:
package com.cnblogs.yjmyzz; import com.google.gson.Gson; import com.google.gson.GsonBuilder; import org.apache.http.HttpHost; import org.apache.http.util.EntityUtils; import org.elasticsearch.client.*; import java.io.IOException; import java.util.HashMap; import java.util.Map; public class EsClientTest { private static Gson gson = new GsonBuilder() .setPrettyPrinting() .setDateFormat("yyyy-MM-dd HH:mm:ss.SSS") .create(); public static void main(String[] args) throws IOException { RestClientBuilder builder = RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")); builder.setFailureListener(new RestClient.FailureListener() { @Override public void onFailure(Node node) { System.out.println("fail:" + node); return; } }); RestClient client = builder.build(); //簡單的get查詢示例 Request request = new Request("GET", "/cnblogs/_doc/1001818/_source/?_source=blog_id,blog_title"); request.addParameter("pretty", "true"); Response response = client.performRequest(request); System.out.println(response.getRequestLine()); System.out.println(response.getStatusLine()); System.out.println(EntityUtils.toString(response.getEntity())); System.out.println("----------------"); //post查詢示例 request = new Request("POST", "/cnblogs/_search/?_source=blog_id,blog_title"); request.addParameter("pretty", "true"); Map<String, Integer> map = new HashMap<>(); map.put("size", 2); map.put("from", 0); request.setJsonEntity(gson.toJson(map)); response = client.performRequest(request); System.out.println(response.getRequestLine()); System.out.println(response.getStatusLine()); System.out.println(EntityUtils.toString(response.getEntity())); } }
4.2 elasticsearch-rest-high-level-client
pom依賴:
<dependency> <groupId>org.elasticsearch.client</groupId> <artifactId>elasticsearch-rest-high-level-client</artifactId> <version>7.8.0</version> </dependency>
示例代碼:
package com.cnblogs.yjmyzz; import com.google.gson.Gson; import com.google.gson.GsonBuilder; import org.apache.http.HttpHost; import org.elasticsearch.action.get.GetRequest; import org.elasticsearch.action.get.GetResponse; import org.elasticsearch.action.search.SearchRequest; import org.elasticsearch.action.search.SearchResponse; import org.elasticsearch.client.*; import org.elasticsearch.index.query.QueryBuilder; import org.elasticsearch.index.query.QueryBuilders; import org.elasticsearch.search.SearchHit; import org.elasticsearch.search.builder.SearchSourceBuilder; import java.io.IOException; public class EsClientHighLevelTest { public static void main(String[] args) throws IOException { RestClientBuilder builder = RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")); builder.setFailureListener(new RestClient.FailureListener() { @Override public void onFailure(Node node) { System.out.println("fail:" + node); return; } }); RestHighLevelClient client = new RestHighLevelClient(builder); //簡單的get查詢示例 GetRequest request = new GetRequest("cnblogs", "1001818"); GetResponse response = client.get(request, RequestOptions.DEFAULT); System.out.println(response.getSourceAsString()); //search示例 SearchRequest searchRequest = new SearchRequest("cnblogs"); SearchSourceBuilder sourceBuilder = new SearchSourceBuilder(); sourceBuilder.query(QueryBuilders.matchQuery("blog_title", "並發 筆記")); sourceBuilder.from(0); sourceBuilder.size(5); searchRequest.source(sourceBuilder); SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT); for (SearchHit hit : searchResponse.getHits()) { System.out.println(hit.getSourceAsString()); } client.close(); } }