Reference: https://www.wenjiangs.com/doc/iwlst1pcp
1. A brief introduction to the DSL
The official documentation introduces it as follows:
Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries. Think of the Query DSL as an AST (Abstract Syntax Tree) of queries, consisting of two types of clauses:
Leaf query clauses
Leaf query clauses look for a particular value in a particular field, such as the match, term or range queries. These queries can be used by themselves.
Compound query clauses
Compound query clauses wrap other leaf or compound queries and are used to combine multiple queries in a logical fashion (such as the bool or dis_max query), or to alter their behaviour (such as the constant_score query).
Query clauses behave differently depending on whether they are used in query context or filter context.
2. Preparing the data
1. Create the indices
1. Create an accounts index with the following fields:
PUT /accounts
{
  "mappings": {
    "properties": {
      "userid": { "type": "long" },
      "username": { "type": "keyword" },
      "fullname": { "type": "text" },
      "sex": { "type": "double" },
      "birth": { "type": "date" }
    }
  }
}
2. Create an orders index
PUT /orders
{
  "mappings": {
    "properties": {
      "orderid": { "type": "long" },
      "ordernum": { "type": "keyword" },
      "username": { "type": "keyword" },
      "description": { "type": "text" },
      "createTime": { "type": "date" },
      "amount": { "type": "double" }
    }
  }
}
2. View the index mappings
$ curl -X GET http://localhost:9200/accounts/_mapping?pretty=true
{
  "accounts" : {
    "mappings" : {
      "properties" : {
        "birth" : { "type" : "date" },
        "fullname" : { "type" : "text" },
        "sex" : { "type" : "double" },
        "userid" : { "type" : "long" },
        "username" : { "type" : "keyword" }
      }
    }
  }
}
$ curl -X GET http://localhost:9200/orders/_mapping?pretty=true
{
  "orders" : {
    "mappings" : {
      "properties" : {
        "amount" : { "type" : "double" },
        "createTime" : { "type" : "date" },
        "description" : { "type" : "text" },
        "orderid" : { "type" : "long" },
        "ordernum" : { "type" : "keyword" },
        "username" : { "type" : "keyword" }
      }
    }
  }
}
3. Create ten documents
1. Create the account data
private static void createDocument() throws UnknownHostException, IOException, InterruptedException {
    // on startup
    Settings settings = Settings.builder().put("cluster.name", "my-application").build();
    TransportClient client = new PreBuiltTransportClient(settings)
            .addTransportAddress(new TransportAddress(InetAddress.getByName("127.0.0.1"), 9300));

    for (int i = 0; i < 10; i++) {
        XContentBuilder builder = XContentFactory.jsonBuilder().startObject()
                .field("username", "zhangsan" + i)
                .field("fullname", "張三" + i)
                .field("sex", i % 2 == 0 ? 1 : 2)
                .field("userid", (i + 1))
                .field("birth", new Date())
                .endObject();
        // Store the document in the accounts index under the _doc type
        IndexResponse response = client.prepareIndex("accounts", "_doc").setSource(builder).get();
        // Print the id generated for the stored document
        String _id = response.getId();
        System.out.println("_id " + _id);
        Thread.sleep(1 * 1000);
    }

    // on shutdown
    client.close();
}
Result:
_id BpeN0nMBntNcepW152XL
_id B5eN0nMBntNcepW17WVO
_id CJeN0nMBntNcepW18mWF
_id CZeN0nMBntNcepW192XD
_id CpeN0nMBntNcepW1_GXZ
_id C5eO0nMBntNcepW1AWWe
_id DJeO0nMBntNcepW1BmUf
_id DZeO0nMBntNcepW1CmXE
_id DpeO0nMBntNcepW1D2Xh
_id D5eO0nMBntNcepW1FGVL
In Kibana, the data can be searched in Discover:
2. Create the order data
private static void createDocument() throws UnknownHostException, IOException, InterruptedException {
    // on startup
    Settings settings = Settings.builder().put("cluster.name", "my-application").build();
    TransportClient client = new PreBuiltTransportClient(settings)
            .addTransportAddress(new TransportAddress(InetAddress.getByName("127.0.0.1"), 9300));

    for (int i = 0; i < 10; i++) {
        XContentBuilder builder = XContentFactory.jsonBuilder().startObject()
                .field("amount", i)
                .field("createTime", new Date())
                .field("description", "訂單描述" + i)
                .field("orderid", (i + 1))
                .field("ordernum", "order" + i)
                .field("username", "zhangsan" + (i % 5))
                .endObject();
        // Store the document in the orders index under the _doc type
        IndexResponse response = client.prepareIndex("orders", "_doc").setSource(builder).get();
        // Print the id generated for the stored document
        String _id = response.getId();
        System.out.println("_id " + _id);
        Thread.sleep(1 * 1000);
    }

    // on shutdown
    client.close();
}
Result:
_id EJfo0nMBntNcepW15mUP
_id EZfo0nMBntNcepW16mW3
_id Epfo0nMBntNcepW172VR
_id E5fo0nMBntNcepW182Xr
_id FJfo0nMBntNcepW1-WXO
_id FZfo0nMBntNcepW1_mU5
_id Fpfp0nMBntNcepW1AmV2
_id F5fp0nMBntNcepW1BmXi
_id GJfp0nMBntNcepW1C2VR
_id GZfp0nMBntNcepW1D2WO
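Viewing the data in Kibana:
(1) In Kibana, go to Management -> Index patterns -> Create index pattern
(2) View the data in Discover
4. Create nine news documents
(1) The field mapping is as follows; the title and content fields are tokenized with the ik analyzer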
{properties={creator={type=text, fields={keyword={ignore_above=256, type=keyword}}}, createTime={type=date}, description={type=double}, id={type=long}, title={search_analyzer=ik_smart, analyzer=ik_max_word, type=text}, type={type=text, fields={keyword={ignore_above=256, type=keyword}}}, content={search_analyzer=ik_smart, analyzer=ik_max_word, type=text}}}
(2) The data is as follows:
{"creator":"creator1","createTime":"2020-08-27T02:52:24.491Z","type":"java","title":"java記錄","content":"這里是java記錄"}
{"creator":"creator2","createTime":"2020-08-27T02:52:31.677Z","type":"vue","title":"vue記錄","content":"這里是vue記錄"}
{"creator":"creator3","createTime":"2020-08-27T02:52:31.915Z","type":"js","title":"js記錄","content":"這里是js記錄"}
{"creator":"creator4","createTime":"2020-08-27T02:52:32.067Z","type":"es","title":"js記錄","content":"這里是js記錄"}
{"creator":"creator7","createTime":"2020-08-27T02:52:33.733Z","type":"vue","title":"vue記錄","content":"這里是vue記錄"}
{"creator":"creator6","createTime":"2020-08-27T02:52:32.395Z","type":"java","title":"java記錄","content":"這里是java記錄"}
{"creator":"creator0","createTime":"2020-08-27T02:52:14.353Z","type":"雜文","title":"雜文記錄","content":"這里是雜文記錄"}
{"creator":"creator5","createTime":"2020-08-27T02:52:32.202Z","type":"雜文","title":"雜文記錄","content":"這里是雜文記錄"}
{"creator":"creator8","createTime":"2020-08-27T02:52:34.030Z","type":"js","title":"js記錄","content":"JS是真的強"}
3. DSL queries in Kibana
1. Query and filter
The query below matches documents that meet all of these conditions:
The fullname field contains the word 張三
The username field contains the word "zhangsan2"
The sex field contains the exact value 1
The birth field contains a date from 1 Jan 2015 onwards
GET /_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "fullname": "張三" }},
        { "match": { "username": "zhangsan2" }}
      ],
      "filter": [
        { "term": { "sex": 1 }},
        { "range": { "birth": { "gte": "2015-01-01" }}}
      ]
    }
  }
}
Result:
{
  "took" : 20,
  "timed_out" : false,
  "_shards" : { "total" : 6, "successful" : 6, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 2.0854702,
    "hits" : [
      {
        "_index" : "accounts",
        "_type" : "_doc",
        "_id" : "CJeN0nMBntNcepW18mWF",
        "_score" : 2.0854702,
        "_source" : {
          "username" : "zhangsan2",
          "fullname" : "張三2",
          "sex" : 1,
          "userid" : 3,
          "birth" : "2020-08-09T09:29:44.832Z"
        }
      }
    ]
  }
}
...
4. DSL queries in Java
===== The queries below all run against the orders and news indices =====
1. matchAllQuery: matches every document; each document gets a score of 1.0F
private static void matchAllQuery() throws UnknownHostException {
    // on startup
    Settings settings = Settings.builder().put("cluster.name", "my-application").build();
    TransportClient client = new PreBuiltTransportClient(settings)
            .addTransportAddress(new TransportAddress(InetAddress.getByName("127.0.0.1"), 9300));

    // 1. Build and run the query
    MatchAllQueryBuilder matchAllQuery = QueryBuilders.matchAllQuery();
    SearchResponse searchResponse = client.prepareSearch("orders").setTypes("_doc").setQuery(matchAllQuery).get();

    // 2. Print the query results
    SearchHits hits = searchResponse.getHits();
    // Total number of hits, i.e. how many documents matched
    System.out.println("查詢結果有:" + hits.getTotalHits() + "條");
    Iterator<SearchHit> iterator = hits.iterator();
    while (iterator.hasNext()) {
        SearchHit searchHit = iterator.next(); // each matching document
        System.out.println(searchHit.getSourceAsString()); // print the source as a string
    }

    // on shutdown
    client.close();
}
Result:
查詢結果有:10 hits條
{"amount":0,"createTime":"2020-08-09T11:09:05.259Z","description":"訂單描述0","orderid":1,"ordernum":"order0","username":"zhangsan0"}
{"amount":1,"createTime":"2020-08-09T11:09:06.611Z","description":"訂單描述1","orderid":2,"ordernum":"order1","username":"zhangsan1"}
{"amount":2,"createTime":"2020-08-09T11:09:07.789Z","description":"訂單描述2","orderid":3,"ordernum":"order2","username":"zhangsan2"}
{"amount":3,"createTime":"2020-08-09T11:09:08.966Z","description":"訂單描述3","orderid":4,"ordernum":"order3","username":"zhangsan3"}
{"amount":4,"createTime":"2020-08-09T11:09:10.468Z","description":"訂單描述4","orderid":5,"ordernum":"order4","username":"zhangsan4"}
{"amount":5,"createTime":"2020-08-09T11:09:11.605Z","description":"訂單描述5","orderid":6,"ordernum":"order5","username":"zhangsan0"}
{"amount":6,"createTime":"2020-08-09T11:09:12.692Z","description":"訂單描述6","orderid":7,"ordernum":"order6","username":"zhangsan1"}
{"amount":7,"createTime":"2020-08-09T11:09:13.823Z","description":"訂單描述7","orderid":8,"ordernum":"order7","username":"zhangsan2"}
{"amount":8,"createTime":"2020-08-09T11:09:14.958Z","description":"訂單描述8","orderid":9,"ordernum":"order8","username":"zhangsan3"}
{"amount":9,"createTime":"2020-08-09T11:09:16.043Z","description":"訂單描述9","orderid":10,"ordernum":"order9","username":"zhangsan4"}
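2. Full text queries: the query text is analyzed (these mainly target text fields; the query string is tokenized and analyzed before searching)
High-level full text queries are usually used to run full text searches on full text fields (for example, the body of an email). They understand how the queried field is analyzed and apply each field's analyzer (or search analyzer) to the query string before executing.
1. match query
The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.
The behavior of the match query is controlled by two parameters:
(1) operator: controls how the terms produced by analyzing the query string are combined. The default is or; the alternative is and. For example: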
GET /_search
{
  "query": {
    "match" : {
      "message" : "this is a test"
    }
  }
}
With the default or, the behavior can be read as this pseudocode:
if (doc.message contains "this" or doc.message contains "is" or doc.message contains "a" or doc.message contains "test") return doc
With and, the pseudocode becomes:
if (doc.message contains "this" and doc.message contains "is" and doc.message contains "a" and doc.message contains "test") return doc
(2) minimum_should_match: the minimum number of query terms that must match; it can be thought of as a similarity threshold
For example:
private static void matchQuery() throws UnknownHostException {
    // on startup
    Settings settings = Settings.builder().put("cluster.name", "my-application").build();
    TransportClient client = new PreBuiltTransportClient(settings)
            .addTransportAddress(new TransportAddress(InetAddress.getByName("127.0.0.1"), 9300));

    QueryBuilder qb = QueryBuilders.matchQuery(
            "content",   // field
            "java有點強"  // text
    );
    SearchResponse searchResponse = client.prepareSearch("news").setTypes("_doc").setQuery(qb).get();

    // 2. Print the query results
    SearchHits hits = searchResponse.getHits();
    // Total number of hits, i.e. how many documents matched
    System.out.println("查詢結果有:" + hits.getTotalHits() + "條");
    Iterator<SearchHit> iterator = hits.iterator();
    while (iterator.hasNext()) {
        SearchHit searchHit = iterator.next(); // each matching document
        System.out.println(searchHit.getSourceAsString()); // print the source as a string
    }

    // on shutdown
    client.close();
}
Result:
查詢結果有:3 hits條
{"creator":"creator1","createTime":"2020-08-27T02:52:24.491Z","type":"java","title":"java記錄","content":"這里是java記錄"}
{"creator":"creator8","createTime":"2020-08-27T02:52:34.030Z","type":"js","title":"js記錄","content":"JS是真的強"}
{"creator":"creator6","createTime":"2020-08-27T02:52:32.395Z","type":"java","title":"java記錄","content":"這里是java記錄"}
Set the operator to and and specify a minimum match threshold:
QueryBuilder qb = QueryBuilders.matchQuery("content", "這里是js").operator(Operator.AND).minimumShouldMatch("50%");
Result:
{"creator":"creator3","createTime":"2020-08-27T02:52:31.915Z","type":"js","title":"js記錄","content":"這里是js記錄"}
{"creator":"creator4","createTime":"2020-08-27T02:52:32.067Z","type":"es","title":"js記錄","content":"這里是js記錄"}
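The or/and pseudocode above can be written out as a small, library-free sketch. It works on token lists that have already been analyzed; the class and method names (MatchOperatorSketch, matchesOr, matchesAnd) are hypothetical and are not part of the Elasticsearch API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: mimics the or/and pseudocode, not Elasticsearch itself.
class MatchOperatorSketch {

    // Counts how many of the query terms occur among the document's terms.
    static int matchedTerms(List<String> queryTerms, Set<String> docTerms) {
        int matched = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return matched;
    }

    // operator = or: a single matching term is enough.
    static boolean matchesOr(List<String> queryTerms, Set<String> docTerms) {
        return matchedTerms(queryTerms, docTerms) >= 1;
    }

    // operator = and: every query term has to be present.
    static boolean matchesAnd(List<String> queryTerms, Set<String> docTerms) {
        return matchedTerms(queryTerms, docTerms) == queryTerms.size();
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("this", "is", "a", "test"));
        List<String> query = Arrays.asList("this", "is", "a", "cat");
        System.out.println("or:  " + matchesOr(query, doc));
        System.out.println("and: " + matchesAnd(query, doc));
    }
}
```

With and, all terms are required, so a minimum match threshold has nothing left to relax in this simplified model.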
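2. matchPhraseQuery: matches terms that appear next to each other
The match_phrase query first analyzes the query string into a list of terms and then searches for all of them, but keeps only the documents that contain every term in the same relative positions as in the query.
QueryBuilder qb = QueryBuilders.matchPhraseQuery("content", "這里記錄");
This query returns no data.
A slop parameter can be added; with slop set to 3 as below, terms that are up to 3 positions apart still count as adjacent.
QueryBuilder qb = QueryBuilders.matchPhraseQuery("content", "這里記錄").slop(3);
Result: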
{"creator":"creator0","createTime":"2020-08-27T02:52:14.353Z","type":"雜文","title":"雜文記錄","content":"這里是雜文記錄"}
{"creator":"creator5","createTime":"2020-08-27T02:52:32.202Z","type":"雜文","title":"雜文記錄","content":"這里是雜文記錄"}
{"creator":"creator6","createTime":"2020-08-27T02:52:32.395Z","type":"java","title":"java記錄","content":"這里是java記錄"}
{"creator":"creator1","createTime":"2020-08-27T02:52:24.491Z","type":"java","title":"java記錄","content":"這里是java記錄"}
{"creator":"creator2","createTime":"2020-08-27T02:52:31.677Z","type":"vue","title":"vue記錄","content":"這里是vue記錄"}
{"creator":"creator3","createTime":"2020-08-27T02:52:31.915Z","type":"js","title":"js記錄","content":"這里是js記錄"}
{"creator":"creator4","createTime":"2020-08-27T02:52:32.067Z","type":"es","title":"js記錄","content":"這里是js記錄"}
{"creator":"creator7","createTime":"2020-08-27T02:52:33.733Z","type":"vue","title":"vue記錄","content":"這里是vue記錄"}
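To see why slop changes the outcome, here is a deliberately simplified sketch for a two-term phrase. It only handles terms that appear in query order, whereas Lucene's real sloppy-phrase matching also allows reordering at extra cost. The positions 1 (這里) and 4 (記錄) come from the ik_max_word analysis of "這里是java記錄" shown in the term query section of this article; that the query "這里記錄" analyzes to exactly those two adjacent terms is an assumption. PhraseSlopSketch and matchesInOrder are hypothetical names.

```java
// Hypothetical, simplified model of a two-term phrase match with slop.
class PhraseSlopSketch {

    // Returns true if some occurrence of the second term follows some occurrence
    // of the first term with at most `slop` extra positions in between.
    static boolean matchesInOrder(int[] firstTermPositions, int[] secondTermPositions, int slop) {
        for (int p0 : firstTermPositions) {
            for (int p1 : secondTermPositions) {
                int extraGap = p1 - p0 - 1; // 0 when the second term directly follows the first
                if (extraGap >= 0 && extraGap <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] zheli = {1}; // position of 這里 in the analyzed document
        int[] jilu = {4};  // position of 記錄 in the analyzed document
        System.out.println("slop 0: " + matchesInOrder(zheli, jilu, 0));
        System.out.println("slop 3: " + matchesInOrder(zheli, jilu, 3));
    }
}
```

In this model the two terms are two extra positions apart, so slop 0 fails and slop 3 succeeds, which is consistent with the results above.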
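3. Multi-field query (multi_match query)
The multi-field version of the match query: it runs a match query across several fields.
// The first argument is the text; the remaining varargs are the fields
QueryBuilder qb = QueryBuilders.multiMatchQuery("java和JS真的強", "content", "title");
4. Query string query (query_string query)
A query that integrates more closely with the Lucene query syntax. It lets you use special operators (such as AND, OR, NOT) inside a single query string to search across multiple fields.
// + means the term must be present, - means it must be absent
QueryBuilder qb = QueryBuilders.queryStringQuery("+js -強").field("content");
Result: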
{"creator":"creator3","createTime":"2020-08-27T02:52:31.915Z","type":"js","title":"js記錄","content":"這里是js記錄"}
{"creator":"creator4","createTime":"2020-08-27T02:52:32.067Z","type":"es","title":"js記錄","content":"這里是js記錄"}
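3. Term level queries: exact lookups; the query text is not analyzed
These are typically used for structured data such as numbers, dates and enums rather than full text fields. Alternatively, they let you craft low-level queries that bypass the analysis process.
1. term query
Find documents which contain the exact term specified in the field specified.
TermQueryBuilder termQuery = QueryBuilders.termQuery("orderid", 1);
Result: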
查詢結果有:1 hits條
{"amount":0,"createTime":"2020-08-09T11:09:05.259Z","description":"訂單描述0","orderid":1,"ordernum":"order0","username":"zhangsan0"}
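Note: termQuery can also be used on a text field, but the value is looked up as a single term; the query text is not analyzed again.
For example, searching with "這里是":
QueryBuilder qb = QueryBuilders.termQuery("content", "這里是");
Result: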
{"creator":"creator0","createTime":"2020-08-27T02:52:14.353Z","type":"雜文","title":"雜文記錄","content":"這里是雜文記錄"}
{"creator":"creator5","createTime":"2020-08-27T02:52:32.202Z","type":"雜文","title":"雜文記錄","content":"這里是雜文記錄"}
{"creator":"creator6","createTime":"2020-08-27T02:52:32.395Z","type":"java","title":"java記錄","content":"這里是java記錄"}
{"creator":"creator1","createTime":"2020-08-27T02:52:24.491Z","type":"java","title":"java記錄","content":"這里是java記錄"}
{"creator":"creator2","createTime":"2020-08-27T02:52:31.677Z","type":"vue","title":"vue記錄","content":"這里是vue記錄"}
{"creator":"creator3","createTime":"2020-08-27T02:52:31.915Z","type":"js","title":"js記錄","content":"這里是js記錄"}
{"creator":"creator4","createTime":"2020-08-27T02:52:32.067Z","type":"es","title":"js記錄","content":"這里是js記錄"}
{"creator":"creator7","createTime":"2020-08-27T02:52:33.733Z","type":"vue","title":"vue記錄","content":"這里是vue記錄"}
As the result shows, only documents containing the term "這里是" are returned. Analyzing "這里是java記錄" shows how the text is tokenized:
POST /_analyze
{
  "analyzer": "ik_max_word",
  "text": "這里是java記錄"
}
The analysis output:
{
  "tokens" : [
    { "token" : "這里是", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 0 },
    { "token" : "這里", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 1 },
    { "token" : "是", "start_offset" : 2, "end_offset" : 3, "type" : "CN_CHAR", "position" : 2 },
    { "token" : "java", "start_offset" : 3, "end_offset" : 7, "type" : "ENGLISH", "position" : 3 },
    { "token" : "記錄", "start_offset" : 7, "end_offset" : 9, "type" : "CN_WORD", "position" : 4 }
  ]
}
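The difference between a term lookup and an analyzed query can be sketched without any Elasticsearch dependency: a term query asks whether the exact value is one of the indexed tokens. The token list below is copied from the _analyze output above; TermLookupSketch and termMatches are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: a term query is an exact lookup among the indexed tokens.
class TermLookupSketch {

    // True only if the value is itself one of the tokens; no analysis is applied to it.
    static boolean termMatches(List<String> indexedTokens, String term) {
        return indexedTokens.contains(term);
    }

    public static void main(String[] args) {
        // Tokens produced by ik_max_word for "這里是java記錄" (from the _analyze output).
        List<String> tokens = Arrays.asList("這里是", "這里", "是", "java", "記錄");
        System.out.println(termMatches(tokens, "這里是"));     // an indexed token
        System.out.println(termMatches(tokens, "這里是java")); // not a single token
    }
}
```

This is why a term query for a whole sentence on a text field usually finds nothing: the sentence is never stored as one token.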
2. terms query: every matching document gets a score of 1.0F
Find documents which contain any of the exact terms specified in the field specified.
TermsQueryBuilder termsQuery = QueryBuilders.termsQuery("orderid", "1", "2");
Result:
查詢結果有:2 hits條
{"amount":0,"createTime":"2020-08-09T11:09:05.259Z","description":"訂單描述0","orderid":1,"ordernum":"order0","username":"zhangsan0"}
{"amount":1,"createTime":"2020-08-09T11:09:06.611Z","description":"訂單描述1","orderid":2,"ordernum":"order1","username":"zhangsan1"}
3. range query
Find documents where the field specified contains values (dates, numbers, or strings) in the range specified.
RangeQueryBuilder includeUpper = QueryBuilders.rangeQuery("amount").from(5).to(10).includeLower(true) .includeUpper(false);
Parameter notes:
include lower value means that from is gt when false or gte when true
include upper value means that to is lt when false or lte when true
Result:
查詢結果有:5 hits條
{"amount":5,"createTime":"2020-08-09T11:09:11.605Z","description":"訂單描述5","orderid":6,"ordernum":"order5","username":"zhangsan0"}
{"amount":6,"createTime":"2020-08-09T11:09:12.692Z","description":"訂單描述6","orderid":7,"ordernum":"order6","username":"zhangsan1"}
{"amount":7,"createTime":"2020-08-09T11:09:13.823Z","description":"訂單描述7","orderid":8,"ordernum":"order7","username":"zhangsan2"}
{"amount":8,"createTime":"2020-08-09T11:09:14.958Z","description":"訂單描述8","orderid":9,"ordernum":"order8","username":"zhangsan3"}
{"amount":9,"createTime":"2020-08-09T11:09:16.043Z","description":"訂單描述9","orderid":10,"ordernum":"order9","username":"zhangsan4"}
The above is equivalent to:
RangeQueryBuilder includeUpper = QueryBuilders.rangeQuery("amount").gte("5").lt("10");
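The effect of includeLower and includeUpper can be checked with a small stand-alone predicate. This is only a sketch of the boundary logic described above, applied to the amounts 0 to 9 stored in the orders index; RangeSketch, inRange and countInRange are hypothetical names.

```java
// Hypothetical illustration of the from/to boundary flags of a range query.
class RangeSketch {

    // includeLower=true means gte, false means gt; includeUpper=true means lte, false means lt.
    static boolean inRange(double value, double from, double to,
                           boolean includeLower, boolean includeUpper) {
        boolean aboveLower = includeLower ? value >= from : value > from;
        boolean belowUpper = includeUpper ? value <= to : value < to;
        return aboveLower && belowUpper;
    }

    // Counts how many values fall inside the range.
    static int countInRange(double[] values, double from, double to,
                            boolean includeLower, boolean includeUpper) {
        int count = 0;
        for (double v : values) {
            if (inRange(v, from, to, includeLower, includeUpper)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[] amounts = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        // from(5).to(10).includeLower(true).includeUpper(false), i.e. gte 5 and lt 10
        System.out.println(countInRange(amounts, 5, 10, true, false));
    }
}
```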
4. exists query
Find documents where the field specified contains any non-null value.
ExistsQueryBuilder existsQuery = QueryBuilders.existsQuery("createTime");
5. prefix query
Find documents where the field specified contains terms which begin with the exact prefix specified.
PrefixQueryBuilder prefixQuery = QueryBuilders.prefixQuery("description", "描述");
6. wildcard query
Find documents where the field specified contains terms which match the pattern specified, where the pattern supports single character wildcards (?) and multi-character wildcards (*)
// description terms that start with 描述
WildcardQueryBuilder wildcardQuery = QueryBuilders.wildcardQuery("description", "描述*");
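The two wildcard characters can be illustrated by translating a pattern into a java.util.regex expression and matching it against a whole term. This is a sketch of the pattern semantics only, not of how Elasticsearch executes the query; WildcardSketch, toRegex and matches are hypothetical names, and the ordernum values come from the orders data above.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of ? (exactly one character) and * (any number of characters).
class WildcardSketch {

    // Translates a wildcard pattern into an equivalent regular expression.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // The pattern has to cover the whole term, not just a part of it.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("order*", "order4"));
        System.out.println(matches("order?", "order10"));
        System.out.println(matches("rder*", "order4"));
    }
}
```

Because the pattern is compared with individual terms, the outcome on an analyzed text field depends on how the analyzer split the text.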
7. regexp query
Find documents where the field specified contains terms which match the regular expression specified.
RegexpQueryBuilder regexpQuery = QueryBuilders.regexpQuery("ordernum", "order.*");
8. fuzzy query
Find documents where the field specified contains terms which are fuzzily similar to the specified term. Fuzziness is measured as a Levenshtein edit distance of 1 or 2.
FuzzyQueryBuilder fuzzyQuery = QueryBuilders.fuzzyQuery("ordernum", "order");
9. ids query
Find documents with the specified type and IDs.
IdsQueryBuilder addIds = QueryBuilders.idsQuery().addIds("EJfo0nMBntNcepW15mUP", "EZfo0nMBntNcepW16mW3");
4. Compound queries
1. constant_score query: changes the score of the results
A query which wraps another query, but executes it in filter context. All matching documents are given the same “constant” _score.
ConstantScoreQueryBuilder boost = QueryBuilders
        .constantScoreQuery(QueryBuilders.termQuery("ordernum", "order4")).boost(2F);
Result:
查詢結果有:1 hits條
2.0 {"amount":4,"createTime":"2020-08-27T13:42:41.559Z","description":"訂單描述4","orderid":5,"ordernum":"order4","username":"zhangsan4"}
2. bool query
The default query for combining multiple leaf or compound query clauses, as must, should, must_not, or filter clauses. The must and should clauses have their scores combined — the more matching clauses, the better — while the must_not and filter clauses are executed in filter context.
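must: every clause must match, equivalent to AND in MySQL, and the clauses contribute to the score.
must_not: no clause may match, equivalent to NOT in MySQL. It does not affect the score; it only excludes unwanted documents.
filter: returned documents must satisfy the filter clauses, but unlike must these clauses do not contribute to the score. If a bool query has only filter clauses, every returned document has a _score of 0.
should: equivalent to OR in MySQL. In a bool query with no must or filter clause and one or more should clauses, a document is returned as soon as it matches one of them; once must or filter is defined, matching should clauses only add weight and raise the score.
Whether a should clause is required when filter and should are combined depends on the version: since Elasticsearch 7.0, minimum_should_match defaults to 0 whenever a must or filter clause is present, so set it explicitly if at least one should clause has to match. Older releases also offered a disable_coord option to switch off coordination scoring; the coord factor was removed in Elasticsearch 6.0. The score generally depends on all query clauses: the bool query follows a more-matches-is-better approach, so the scores of the matching must and should clauses are added together.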
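For example, on the news index: use filter to match all documents (which leaves their score at 0), then use should to raise the weight of documents whose type is java
BoolQueryBuilder filter = QueryBuilders.boolQuery();
filter.filter(QueryBuilders.matchAllQuery()); // match everything; the score is 0
filter.should(QueryBuilders.termQuery("type", "java")); // documents of type java get a higher score
For example, on the orders index:
(1) Use must; documents matched by termsQuery all get a score of 1.0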
BoolQueryBuilder filter = QueryBuilders.boolQuery()
.must(QueryBuilders.termsQuery("ordernum", "order4", "order5", "order6"));
Result:
查詢結果有:3 hits條
1.0 {"amount":4,"createTime":"2020-08-27T13:42:41.559Z","description":"訂單描述4","orderid":5,"ordernum":"order4","username":"zhangsan4"}
1.0 {"amount":5,"createTime":"2020-08-27T13:42:42.662Z","description":"訂單描述5","orderid":6,"ordernum":"order5","username":"zhangsan0"}
1.0 {"amount":6,"createTime":"2020-08-27T13:42:43.932Z","description":"訂單描述6","orderid":7,"ordernum":"order6","username":"zhangsan1"}
(2) Use should to give username zhangsan0 a boost of 1F and username zhangsan1 a boost of 0.1F, which produces the ordering zhangsan0, zhangsan1, zhangsan4
BoolQueryBuilder filter = QueryBuilders.boolQuery()
        .must(QueryBuilders.termsQuery("ordernum", "order4", "order5", "order6"))
        .should(QueryBuilders.termQuery("username", "zhangsan0")) // boost defaults to 1F
        .should(QueryBuilders.termQuery("username", "zhangsan1").boost(0.1F));
Result:
查詢結果有:3 hits條
2.4816046{"amount":5,"createTime":"2020-08-27T13:42:42.662Z","description":"訂單描述5","orderid":6,"ordernum":"order5","username":"zhangsan0"}
1.1481605{"amount":6,"createTime":"2020-08-27T13:42:43.932Z","description":"訂單描述6","orderid":7,"ordernum":"order6","username":"zhangsan1"}
1.0{"amount":4,"createTime":"2020-08-27T13:42:41.559Z","description":"訂單描述4","orderid":5,"ordernum":"order4","username":"zhangsan4"}
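Note: score calculation
A bool query computes a relevance _score for each document by adding up the _score of every matching must and should clause. Releases before Elasticsearch 6.0 additionally applied a coordination factor (matching clauses divided by total clauses); that factor has since been removed, which is why the scores in the result above are plain sums (for example 1.0 + 1.4816046 = 2.4816046).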
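The scores in the result above are consistent with a plain sum of the matching must and should clause scores. The sketch below reproduces that arithmetic; the per-clause scores (1.0 for the termsQuery in must, 1.4816046 for the zhangsan0 term, and a tenth of that for the boosted zhangsan1 term) are read off the printed results and are assumptions about how the totals break down. BoolScoreSketch and boolScore are hypothetical names.

```java
// Hypothetical illustration: bool score = sum of the scores of the matching must and should clauses.
class BoolScoreSketch {

    static double boolScore(double[] matchingMustScores, double[] matchingShouldScores) {
        double total = 0.0;
        for (double s : matchingMustScores) {
            total += s;
        }
        for (double s : matchingShouldScores) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        double must = 1.0;             // termsQuery on ordernum
        double zhangsan0 = 1.4816046;  // should clause with the default boost of 1
        double zhangsan1 = 0.14816046; // same clause score scaled by boost 0.1

        System.out.println(boolScore(new double[]{must}, new double[]{zhangsan0})); // order5
        System.out.println(boolScore(new double[]{must}, new double[]{zhangsan1})); // order6
        System.out.println(boolScore(new double[]{must}, new double[]{}));          // order4
    }
}
```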
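Note: controlling precision
All must clauses have to match and all must_not clauses must not match, but how many should clauses need to match? By default none of them is required, with one exception: when there is no must (or filter) clause, at least one should clause has to match.
The minimum_should_match parameter controls how many should clauses must match. It can be an absolute number or a percentage.
For example:
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "fox" }},
        { "match": { "title": "dog" }}
      ],
      "minimum_should_match": 2
    }
  }
}
minimum_should_match can also be written as a percentage, for example "70%"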
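How an absolute number or a percentage turns into a required clause count can be sketched as follows. The sketch covers only non-negative integers and non-negative percentages and rounds percentages down; negative values and combined expressions, which Elasticsearch also accepts, are left out. Clamping the result to the number of clauses is a choice made for this sketch. MinimumShouldMatchSketch and required are hypothetical names.

```java
// Hypothetical illustration of resolving minimum_should_match for non-negative values only.
class MinimumShouldMatchSketch {

    // spec is either a non-negative integer such as "2" or a percentage such as "70%".
    static int required(String spec, int optionalClauses) {
        String s = spec.trim();
        int required;
        if (s.endsWith("%")) {
            int percent = Integer.parseInt(s.substring(0, s.length() - 1));
            required = (optionalClauses * percent) / 100; // integer division rounds down
        } else {
            required = Integer.parseInt(s);
        }
        // Sketch-level choice: keep the result between 0 and the number of clauses.
        return Math.max(0, Math.min(required, optionalClauses));
    }

    public static void main(String[] args) {
        System.out.println(required("2", 3));   // absolute number
        System.out.println(required("70%", 3)); // 2.1 rounded down
    }
}
```

For the three should clauses in the example above, both "2" and "70%" require two matching clauses in this model.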
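3. dis_max query
A query that accepts multiple queries and returns any document that matches any of the query clauses. Unlike the bool query, which combines the scores of all matching queries, the dis_max query uses the score of the single best-matching query clause.
DisMaxQueryBuilder tieBreaker = QueryBuilders.disMaxQuery().add(QueryBuilders.termQuery("ordernum", "order0"))
        .add(QueryBuilders.termQuery("ordernum", "order1")).boost(1.2f).tieBreaker(0.7f);
Result: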
{"amount":0,"createTime":"2020-08-09T11:09:05.259Z","description":"訂單描述0","orderid":1,"ordernum":"order0","username":"zhangsan0"}
{"amount":1,"createTime":"2020-08-09T11:09:06.611Z","description":"訂單描述1","orderid":2,"ordernum":"order1","username":"zhangsan1"}
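The role of tieBreaker and boost in the query above can be sketched as: take the best matching clause score, add tieBreaker times the scores of the other matching clauses, then multiply by the boost. This is a simplified model for illustration; the clause scores used in main are made up. DisMaxSketch and disMaxScore are hypothetical names.

```java
// Hypothetical illustration of dis_max scoring: best clause plus tie_breaker times the rest.
class DisMaxSketch {

    static double disMaxScore(double[] matchingClauseScores, double tieBreaker, double boost) {
        if (matchingClauseScores.length == 0) {
            return 0.0; // no clause matched
        }
        double max = matchingClauseScores[0];
        double sum = 0.0;
        for (double s : matchingClauseScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return boost * (max + tieBreaker * (sum - max));
    }

    public static void main(String[] args) {
        // Two matching clauses with made-up scores: the weaker one counts at 70%.
        System.out.println(disMaxScore(new double[]{1.0, 0.5}, 0.7, 1.0));
        // A single matching clause: tie_breaker has nothing to add, only the boost applies.
        System.out.println(disMaxScore(new double[]{1.0}, 0.7, 1.2));
    }
}
```

In the orders example each document can match only one of the two ordernum clauses, so tieBreaker does not change the ranking there.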
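4. boosting query: adjusts weights
Return documents which match a positive query, but reduce the score of documents which also match a negative query.
Use it when results containing certain content should still appear, just ranked lower. The first argument of boostingQuery is the positive query that documents must match; the second is the negative query, whose matches have their score multiplied by negativeBoost.
BoostingQueryBuilder negativeBoost = QueryBuilders
        .boostingQuery(QueryBuilders.termsQuery("orderid", "1", "2"), QueryBuilders.termQuery("orderid", "1"))
        .negativeBoost(0.2f);
Result: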
查詢結果有:2 hits條
{"amount":1,"createTime":"2020-08-09T11:09:06.611Z","description":"訂單描述1","orderid":2,"ordernum":"order1","username":"zhangsan1"}
{"amount":0,"createTime":"2020-08-09T11:09:05.259Z","description":"訂單描述0","orderid":1,"ordernum":"order0","username":"zhangsan0"}
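The ordering in the result above follows from a simple rule: a document keeps the score of the positive query unless it also matches the negative query, in which case that score is multiplied by negativeBoost. The sketch below assumes a positive score of 1.0 for both documents, since termsQuery gives its matches 1.0; BoostingSketch and boostingScore are hypothetical names.

```java
// Hypothetical illustration of how negative_boost demotes documents matching the negative query.
class BoostingSketch {

    static double boostingScore(double positiveScore, boolean matchesNegative, double negativeBoost) {
        return matchesNegative ? positiveScore * negativeBoost : positiveScore;
    }

    public static void main(String[] args) {
        double positive = 1.0; // assumed score from termsQuery("orderid", "1", "2")
        // orderid 1 also matches the negative termQuery, orderid 2 does not.
        System.out.println("orderid 1: " + boostingScore(positive, true, 0.2));
        System.out.println("orderid 2: " + boostingScore(positive, false, 0.2));
    }
}
```

Under these assumptions orderid 2 scores higher than orderid 1, which matches the order in which the two documents are returned.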
5. Joining queries
1. nested query
Documents may contain fields of type nested. These fields are used to index arrays of objects, where each object can be queried (with the nested query) as an independent document.
2. has_child and has_parent queries (parent-child queries)
A parent-child relationship can exist between two document types within a single index. The has_child query returns parent documents whose child documents match the specified query, while the has_parent query returns child documents whose parent document matches the specified query.
6. Specialized queries
1. more_like_this query: similarity search, useful for scenarios such as recommending related documents
This query finds documents which are similar to the specified text, document, or collection of documents.
(1) Similarity search based on fields and text
String[] fields = { "content" };        // fields
String[] texts = { "這里是java記錄" };  // the text to analyze
MoreLikeThisQueryBuilder qb = QueryBuilders.moreLikeThisQuery(fields, texts, null).minTermFreq(1)
        .maxQueryTerms(12).minimumShouldMatch("70%");
(2) The second way matches similarity against documents that already exist in ES
// Similarity query based on a document in ES; the first argument is the index, the second the document ID
Item item = new Item("news", "CyLULXQBRGkNEPJ3ya9x");
Item[] items = { item };
MoreLikeThisQueryBuilder qb = QueryBuilders.moreLikeThisQuery(items).minTermFreq(1).maxQueryTerms(12)
        .minimumShouldMatch("70%");
Important parameters:
1) The three parameters of the factory method:
/**
 * A more like this query that finds documents that are "like" the provided texts or documents
 * which is checked against the fields the query is constructed with.
 *
 * @param fields the field names that will be used when generating the 'More Like This' query.
 * @param likeTexts the text to use when generating the 'More Like This' query.
 * @param likeItems the documents to use when generating the 'More Like This' query.
 */
public static MoreLikeThisQueryBuilder moreLikeThisQuery(String[] fields, String[] likeTexts, Item[] likeItems) {
    return new MoreLikeThisQueryBuilder(fields, likeTexts, likeItems);
}
fields: the fields to match against; defaults to all fields
likeTexts: the text to match
likeItems: documents in ES; when this is passed, similarity is computed from those documents
2) The matching parameters are:
max_query_terms:The maximum number of query terms that will be selected. Increasing this value gives greater accuracy at the expense of query execution speed. Defaults to 25.
min_term_freq:The minimum term frequency below which the terms will be ignored from the input document. Defaults to 2.
min_doc_freq:The minimum document frequency below which the terms will be ignored from the input document. Defaults to 5.
max_doc_freq:The maximum document frequency above which the terms will be ignored from the input document. This could be useful in order to ignore highly frequent words such as stop words. Defaults to unbounded (0).
min_word_length:The minimum word length below which the terms will be ignored. The old name min_word_len is deprecated. Defaults to 0.
max_word_length:The maximum word length above which the terms will be ignored. The old name max_word_len is deprecated. Defaults to unbounded (0).
stop_words:An array of stop words. Any word in this set is considered "uninteresting" and ignored. If the analyzer allows for stop words, you might want to tell MLT to explicitly ignore them, as for the purposes of document similarity it seems reasonable to assume that "a stop word is never interesting".
analyzer:The analyzer that is used to analyze the free form text. Defaults to the analyzer associated with the first field in fields.
minimum_should_match:After the disjunctive query has been formed, this parameter controls the number of terms that must match. The syntax is the same as the minimum should match. (Defaults to "30%").
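Note: ES (through its Lucene dependency) provides a LevenshteinDistance class that can be used as a string metric for the difference between two character sequences. The Levenshtein distance is the minimum number of single-character edits (insertions, deletions or substitutions) needed to turn one word into the other. As the output below shows, getDistance returns a normalized similarity rather than the raw edit count: 1.0 means the two strings are identical.
LevenshteinDistance ld = new LevenshteinDistance();
float distance = ld.getDistance("這里是java記錄", "這里是java記錄");
System.out.println(distance);
float distance2 = ld.getDistance("這里是java記錄", "這里是js記錄");
System.out.println(distance2);
Result: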
1.0
0.6666666
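The numbers above can be reproduced with a plain dynamic-programming edit distance. The sketch below is a stand-alone illustration, not the library's source: it assumes the similarity is computed as 1 minus the edit distance divided by the length of the longer string, which is consistent with the printed values (turning "java" into "js" takes 3 edits, and the longer string has 9 characters). LevenshteinSketch, editDistance and similarity are hypothetical names.

```java
// Hypothetical illustration: edit distance and a similarity normalized by the longer length.
class LevenshteinSketch {

    // Classic two-row dynamic programming over insertions, deletions and substitutions.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1 - distance / max(length); 1.0 means identical strings.
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0f;
        }
        return 1.0f - (float) editDistance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(similarity("這里是java記錄", "這里是java記錄"));
        System.out.println(similarity("這里是java記錄", "這里是js記錄"));
    }
}
```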
2. percolate query
This query finds percolator queries based on documents.
3. wrapper query
A query that accepts other queries as json or yaml string.
Note: documents returned by matchAllQuery and termsQuery all have a score of 1.