I. Creating an index in Elasticsearch
1. Create the index:
The earlier post on installing and using ES plugins covered creating an index with a custom analyzer and creating a type as two separate steps. In fact, you can create the type and assign its analyzer in the same request that creates the index.
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik_smart_pinyin": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["my_pinyin", "word_delimiter"]
        },
        "ik_max_word_pinyin": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["my_pinyin", "word_delimiter"]
        }
      },
      "filter": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_separate_first_letter": true,
          "keep_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "remove_duplicated_term": true
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "id": { "type": "integer" },
        "name": { "type": "text", "analyzer": "ik_max_word_pinyin" },
        "age": { "type": "integer" }
      }
    }
  }
}
2. Add data
POST /my_index/my_type/_bulk
{ "index": { "_id":1}}
{ "id":1,"name": "張三","age":20}
{ "index": { "_id": 2}}
{ "id":2,"name": "張四","age":22}
{ "index": { "_id": 3}}
{ "id":3,"name": "張三李四王五","age":20}
3. Check the field types
GET /my_index/my_type/_mapping
Result:
{
  "my_index": {
    "mappings": {
      "my_type": {
        "properties": {
          "age": { "type": "integer" },
          "id": { "type": "integer" },
          "name": { "type": "text", "analyzer": "ik_max_word_pinyin" }
        }
      }
    }
  }
}
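II. Integrating with Java (Elasticsearch must already be configured in the project; plenty of examples are available online)
1. Create the ES entity class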
package com.example.es_query_list.entity.es;
import lombok.Getter;
import lombok.Setter;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
@Setter
@Getter
@Document(indexName = "my_index",type = "my_type")
public class User {
@Id
private Integer id;
private String name;
private Integer age;
}
2. Create the DAO layer
package com.example.es_query_list.repository.es;

import com.example.es_query_list.entity.es.User;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// Spring Data repository for User documents; the document ID type is Integer
public interface EsUserRepository extends ElasticsearchRepository<User, Integer> {
}
III. With the groundwork done, start querying
1. Exact-value queries
Querying non-text data
GET /my_index/my_type/_search
{
  "query": {
    "term": {
      "age": { "value": "20" }
    }
  }
}
Result:
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "name": "張三", "age": 20 } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "name": "李四", "age": 20 } }
    ]
  }
}
2. Querying text data
A term query on the name field for the full value 張三李四王五 returns:
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
The result is empty. Why can't an exact match find the exact value I typed? As mentioned earlier, the field was given an analyzer when the type was created, so a value that the analyzer never produced as a token cannot be found. Let's prove it.
First, look at how the queried text is split into tokens:
GET my_index/_analyze
{
  "text": "張三李四王五",
  "analyzer": "ik_max_word"
}
Result:
{
  "tokens": [
    { "token": "張三李四", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 },
    { "token": "張三", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1 },
    { "token": "三", "start_offset": 1, "end_offset": 2, "type": "TYPE_CNUM", "position": 2 },
    { "token": "李四", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 3 },
    { "token": "四", "start_offset": 3, "end_offset": 4, "type": "TYPE_CNUM", "position": 4 },
    { "token": "王", "start_offset": 4, "end_offset": 5, "type": "CN_CHAR", "position": 5 },
    { "token": "五", "start_offset": 5, "end_offset": 6, "type": "TYPE_CNUM", "position": 6 }
  ]
}
The output shows that 張三李四王五 is never emitted as the single token 張三李四王五, which is why the query returns nothing.
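To make the mechanism concrete, here is a minimal plain-Java sketch of a term lookup against an inverted index. It is a toy model, not how Elasticsearch is implemented: the class `TermLookupSketch` and its methods are made up for illustration, and the token list is simply copied from the `_analyze` output above.

```java
import java.util.*;

class TermLookupSketch {
    // token -> ids of the documents whose analyzed field produced that token
    static final Map<String, Set<Integer>> INDEX = new HashMap<>();

    static void index(int docId, List<String> tokens) {
        for (String token : tokens) {
            INDEX.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // A term query does not analyze its input: the whole value is looked up as one token.
    static Set<Integer> term(String value) {
        return INDEX.getOrDefault(value, Collections.emptySet());
    }

    public static void main(String[] args) {
        // Tokens taken from the _analyze output for document 3.
        index(3, Arrays.asList("張三李四", "張三", "三", "李四", "四", "王", "五"));
        System.out.println(term("張三李四王五")); // empty: no such token was ever indexed
        System.out.println(term("張三"));         // contains document 3
    }
}
```

The full string misses because it was never stored as a token, while any individual token from the list hits.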
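Fix: update the field's mapping and add a sub-field of type keyword. A keyword field is not run through an analyzer, so it holds the value exactly as it was sent; it doesn't get more exact than that.
POST /my_index/_mapping/my_type
{
  "properties": {
    "name": {
      "type": "text",
      "analyzer": "ik_max_word_pinyin",
      "fields": {
        "keyword": { // custom sub-field name
          "type": "keyword"
        }
      }
    }
  }
}
Once the mapping is updated, delete the existing documents and index them again; the same query then finds the document.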
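The sketch below illustrates, under simplifying assumptions, why the data has to be indexed again: a sub-field only receives values for documents indexed after it was added to the mapping. `KeywordSubFieldSketch`, its `keywordMapped` flag, and its methods are hypothetical names for a toy model, not Elasticsearch or Spring Data APIs.

```java
import java.util.*;

class KeywordSubFieldSketch {
    // field name -> (term -> document ids)
    final Map<String, Map<String, Set<Integer>>> fields = new HashMap<>();
    // Becomes true once the keyword sub-field has been added to the mapping.
    boolean keywordMapped = false;

    void index(int docId, String name, List<String> analyzedTokens) {
        for (String token : analyzedTokens) {
            put("name", token, docId);        // analyzed text field
        }
        if (keywordMapped) {
            put("name.keyword", name, docId); // whole value, never analyzed
        }
    }

    private void put(String field, String term, int docId) {
        fields.computeIfAbsent(field, f -> new HashMap<>())
              .computeIfAbsent(term, t -> new TreeSet<>())
              .add(docId);
    }

    Set<Integer> term(String field, String value) {
        return fields.getOrDefault(field, Collections.emptyMap())
                     .getOrDefault(value, Collections.emptySet());
    }
}
```

Indexing document 3 before `keywordMapped` is set leaves `term("name.keyword", "張三李四王五")` empty; indexing it again afterwards makes the lookup return document 3.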
public List<User> termQuery() {
    // Exact match on a numeric field
    QueryBuilder queryBuilder = QueryBuilders.termQuery("age", 20);
    // Exact match on the un-analyzed keyword sub-field
    // QueryBuilder queryBuilder = QueryBuilders.termQuery("name.keyword", "張三李四王五");
    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withIndices("my_index")
            .withTypes("my_type")
            .withQuery(queryBuilder)
            .build();
    List<User> list = template.queryForList(searchQuery, User.class);
    return list;
}
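IV. Combining filters
Bool filter
Note: the official documentation is slightly off here. Since 5.x, filtered has been replaced by bool: "The filtered query is replaced by the bool query."
A bool filter is made up of three parts:
{
  "bool": {
    "must": [],
    "should": [],
    "must_not": []
  }
}
must
All of these clauses must match. Equivalent to AND.
must_not
None of these clauses may match. Equivalent to NOT.
should
At least one of these clauses must match. Equivalent to OR. Note that when the same bool also contains a must or filter clause, the should clauses become optional unless minimum_should_match is set.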
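As a rough illustration of how the three parts combine, here is a small sketch using `java.util.function.Predicate`. `BoolSketch` is a hypothetical toy class, not part of any Elasticsearch client. It assumes the documented default for `minimum_should_match`: one `should` clause is required only when there is no `must` clause (the sketch has no `filter` part).

```java
import java.util.*;
import java.util.function.Predicate;

class BoolSketch<T> {
    final List<Predicate<T>> must = new ArrayList<>();
    final List<Predicate<T>> should = new ArrayList<>();
    final List<Predicate<T>> mustNot = new ArrayList<>();

    boolean matches(T doc) {
        for (Predicate<T> clause : must) {
            if (!clause.test(doc)) return false;   // AND: every must clause has to match
        }
        for (Predicate<T> clause : mustNot) {
            if (clause.test(doc)) return false;    // NOT: no must_not clause may match
        }
        if (must.isEmpty() && !should.isEmpty()) {
            for (Predicate<T> clause : should) {
                if (clause.test(doc)) return true; // OR: one should clause is enough
            }
            return false;
        }
        return true; // with a must clause present, should clauses are optional
    }
}
```

With only `should` clauses for ages 20 and 30, a document with age 22 is rejected; add any matching `must` clause and the same document is accepted, because the `should` clauses no longer decide the match.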
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "age": 20 } },
        { "term": { "age": 30 } }
      ],
      "must": {
        "term": { "name.keyword": "張三" }
      }
    }
  }
}
public List<User> boolQuery() {
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.should(QueryBuilders.termQuery("age", 20));
    boolQueryBuilder.should(QueryBuilders.termQuery("age", 30));
    boolQueryBuilder.must(QueryBuilders.termQuery("name.keyword", "張三"));
    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withIndices("my_index")
            .withTypes("my_type")
            .withQuery(boolQueryBuilder)
            .build();
    List<User> list = template.queryForList(searchQuery, User.class);
    return list;
}
Nested bool filters
Although bool is a compound filter that accepts several child filters, it is still just a filter itself. That means a bool filter can be placed inside another bool filter, which lets us express boolean logic of arbitrary complexity.
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "age": 20 } },
        { "bool": {
            "must": [
              { "term": { "name.keyword": { "value": "李四" } } }
            ]
        } }
      ]
    }
  }
}
Result:
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "id": 1, "name": "張三", "age": 20 } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "id": 3, "name": "張三李四王五", "age": 20 } }
    ]
  }
}
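The term clause and the inner bool are siblings inside the outer should, so a hit only has to match one of them.
Clauses that sit side by side inside a must, by contrast, all have to match. Here the inner must holds a single term on name.keyword, and no document has the exact name 李四, so both hits above come from the age clause.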
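The nested query above can be mimicked with composed predicates. This is only a sketch over the three sample documents from this post; `NestedBoolSketch`, `Doc`, and `search` are illustrative names, and `name.equals(...)` stands in for a term query on the keyword sub-field.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class NestedBoolSketch {
    static final class Doc {
        final int id;
        final String name;
        final int age;
        Doc(int id, String name, int age) { this.id = id; this.name = name; this.age = age; }
    }

    static final List<Doc> DOCS = Arrays.asList(
            new Doc(1, "張三", 20),
            new Doc(2, "張四", 22),
            new Doc(3, "張三李四王五", 20));

    // Returns the ids of all sample documents that satisfy the query.
    static List<Integer> search(Predicate<Doc> query) {
        return DOCS.stream().filter(query).map(d -> d.id).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Predicate<Doc> ageIs20 = d -> d.age == 20;
        // Inner bool: a single must clause on the exact name.
        Predicate<Doc> innerBool = d -> d.name.equals("李四");
        // Outer should: the two siblings are OR-ed together.
        System.out.println(search(ageIs20.or(innerBool))); // [1, 3]
    }
}
```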
V. Finding multiple exact values
GET my_index/my_type/_search
{
  "query": {
    "terms": {
      "age": [20, 22]
    }
  }
}
Result:
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "2", "_score": 1, "_source": { "id": 2, "name": "張四", "age": 22 } },
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "id": 1, "name": "張三", "age": 20 } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "id": 3, "name": "張三李四王五", "age": 20 } }
    ]
  }
}
It is important to understand that term and terms are contains operations, not equals checks.
TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("age",list);
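The difference between "contains" and "equals" shows up on multi-valued fields. The sketch below is plain Java with made-up helper names (`TermsContainsSketch`, `terms`, `equalsExactly`); it models a field as the collection of values indexed for it.

```java
import java.util.*;

class TermsContainsSketch {
    // term/terms semantics: match when the field CONTAINS at least one wanted value.
    static boolean terms(Collection<String> fieldValues, Collection<String> wanted) {
        return !Collections.disjoint(fieldValues, wanted);
    }

    // Strict equality would require the field to hold exactly the wanted values.
    static boolean equalsExactly(Collection<String> fieldValues, Collection<String> wanted) {
        return new HashSet<>(fieldValues).equals(new HashSet<>(wanted));
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("search", "open_source");
        List<String> wanted = Collections.singletonList("search");
        System.out.println(terms(tags, wanted));         // true
        System.out.println(equalsExactly(tags, wanted)); // false
    }
}
```

A document tagged with both values still matches a term query for `search` alone, which is exactly what "contains" means here.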
VI. Range queries
1. Numeric range queries
GET my_index/my_type/_search
{
  "query": {
    "range": {
      "age": { "gte": 10, "lte": 20 }
    }
  }
}
Result:
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "id": 1, "name": "張三", "age": 20 } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "id": 3, "name": "張三李四王五", "age": 20 } }
    ]
  }
}
Note: gt (greater than), gte (greater than or equal to), lt (less than), lte (less than or equal to)
RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("age").gte(10).lte(20);
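For readers who want to see the four operators side by side, here is a small stand-alone sketch. `RangeSketch` is a hypothetical class that only mirrors the shape of the builder call above; it is not the Elasticsearch `RangeQueryBuilder`.

```java
class RangeSketch {
    // A null bound means "not set".
    private Integer above, atLeast, below, atMost;

    RangeSketch gt(int v)  { above = v;   return this; }
    RangeSketch gte(int v) { atLeast = v; return this; }
    RangeSketch lt(int v)  { below = v;   return this; }
    RangeSketch lte(int v) { atMost = v;  return this; }

    boolean matches(int value) {
        if (above != null && value <= above)    return false; // gt: strictly greater
        if (atLeast != null && value < atLeast) return false; // gte: greater or equal
        if (below != null && value >= below)    return false; // lt: strictly less
        if (atMost != null && value > atMost)   return false; // lte: less or equal
        return true;
    }

    public static void main(String[] args) {
        System.out.println(new RangeSketch().gte(10).lte(20).matches(20)); // true
        System.out.println(new RangeSketch().gt(10).lt(20).matches(20));   // false
    }
}
```

The only difference between the two calls in `main` is whether the upper bound is inclusive, which decides whether age 20 is a hit.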
2. Date range queries
Update the type to add a date field
POST /my_index/_mapping/my_type
{
"properties": {
"date":{
"type":"date",
"format":"yyyy-MM-dd"
}
}
}
Add data:
POST /my_index/my_type/_bulk
{ "index": { "_id": 4}}
{ "id":4,"name": "趙六","age":20,"date":"2018-10-1"}
{ "index": { "_id": 5}}
{ "id":5,"name": "對七","age":22,"date":"2018-11-20"}
{ "index": { "_id": 6}}
{ "id":6,"name": "王八","age":20,"date":"2018-7-28"}
Query:
GET my_index/my_type/_search
{
  "query": {
    "range": {
      "date": { "gte": "2018-10-20", "lte": "2018-11-29" }
    }
  }
}
Result:
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "5", "_score": 1, "_source": { "id": 5, "name": "對七", "age": 22, "date": "2018-11-20" } }
    ]
  }
}
RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("date").gte("2018-10-20").lte("2018-11-29");
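A date range is compared on parsed dates, not on the raw strings. The sketch below reproduces the result above with `java.time`; `DateRangeSketch` is an illustrative name, and the pattern `yyyy-M-d` is my own choice so that single-digit months and days such as `2018-10-1` parse in Java. It is not the format string from the mapping.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.*;

class DateRangeSketch {
    // Accepts one- or two-digit months and days, e.g. "2018-7-28" and "2018-11-20".
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-M-d");

    // Inclusive on both ends, like gte / lte.
    static boolean inRange(String date, String gte, String lte) {
        LocalDate d = LocalDate.parse(date, FMT);
        return !d.isBefore(LocalDate.parse(gte, FMT)) && !d.isAfter(LocalDate.parse(lte, FMT));
    }

    public static void main(String[] args) {
        Map<Integer, String> dates = new LinkedHashMap<>();
        dates.put(4, "2018-10-1");
        dates.put(5, "2018-11-20");
        dates.put(6, "2018-7-28");
        dates.forEach((id, date) -> {
            if (inRange(date, "2018-10-20", "2018-11-29")) {
                System.out.println(id); // prints 5
            }
        });
    }
}
```

Comparing the raw strings would go wrong here: as text, "2018-7-28" sorts after "2018-10-20", even though July comes before October.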
VII. Handling null values
1. Add data
POST /my_index/posts/_bulk
{ "index": { "_id": "1" }}
{ "tags" : ["search"] }
{ "index": { "_id": "2" }}
{ "tags" : ["search", "open_source"] }
{ "index": { "_id": "3" }}
{ "other_field" : "some data" }
{ "index": { "_id": "4" }}
{ "tags" : null }
{ "index": { "_id": "5" }}
{ "tags" : ["search", null] }
2. Find documents in which a given field exists
GET /my_index/posts/_search
{
  "query": {
    "constant_score": { // skips scoring; every hit gets a score of 1
      "filter": {
        "exists": { "field": "tags" }
      }
    }
  }
}
Result:
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "posts", "_id": "5", "_score": 1, "_source": { "tags": ["search", null] } },
      { "_index": "my_index", "_type": "posts", "_id": "2", "_score": 1, "_source": { "tags": ["search", "open_source"] } },
      { "_index": "my_index", "_type": "posts", "_id": "1", "_score": 1, "_source": { "tags": ["search"] } }
    ]
  }
}
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.filter(QueryBuilders.constantScoreQuery(QueryBuilders.existsQuery("tags")));
3. Find documents in which a given field is missing
Note: the missing query was removed in ES 5; negate an exists query instead.
GET /my_index/posts/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "constant_score": {
            "filter": {
              "exists": { "field": "tags" }
            }
        } }
      ]
    }
  }
}
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "posts", "_id": "4", "_score": 1, "_source": { "tags": null } },
      { "_index": "my_index", "_type": "posts", "_id": "3", "_score": 1, "_source": { "other_field": "some data" } }
    ]
  }
}
Note on null handling: a field whose content is empty is treated as null, that is, as if the field were missing.
boolQueryBuilder.mustNot(QueryBuilders.boolQuery().filter(QueryBuilders.constantScoreQuery(QueryBuilders.existsQuery("tags"))));
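The two result sets above follow one rule: a field counts as existing when it holds at least one non-null value. The sketch below applies that rule to the five sample documents in plain Java; `ExistsSketch`, `doc`, and `exists` are illustrative helpers, not client APIs.

```java
import java.util.*;

class ExistsSketch {
    // Builds a one-field document; a HashMap is used because the value may be null.
    static Map<String, Object> doc(String field, Object value) {
        Map<String, Object> d = new HashMap<>();
        d.put(field, value);
        return d;
    }

    // True when the field holds at least one non-null value.
    static boolean exists(Map<String, Object> doc, String field) {
        Object value = doc.get(field);
        if (value == null) return false;   // field absent, or explicitly null
        if (value instanceof Collection) {
            for (Object v : (Collection<?>) value) {
                if (v != null) return true;
            }
            return false;                  // empty array, or only nulls
        }
        return true;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = Arrays.asList(
                doc("tags", Arrays.asList("search")),
                doc("tags", Arrays.asList("search", "open_source")),
                doc("other_field", "some data"),
                doc("tags", null),
                doc("tags", Arrays.asList("search", null)));
        List<Integer> withTags = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (exists(docs.get(i), "tags")) withTags.add(i + 1);
        }
        System.out.println(withTags); // [1, 2, 5]
    }
}
```

Documents 3 and 4 are the ones left over, which matches the must_not result.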
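VIII. About caching
1. The core idea
At the heart of it is a bitset that records which documents match a filter. Elasticsearch caches these bitsets aggressively for later use. Once cached, a bitset can be reused wherever the same filter appears, without evaluating the whole filter again.
These cached bitsets are "smart": they are updated incrementally. When new documents are indexed, only those new documents need to be added to the existing bitset; the cached filter is not recomputed from scratch over and over. Like the rest of the system, filters are real-time, so there is no need to worry about cache expiry.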
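As a rough mental model of the incremental update, the sketch below keeps one `java.util.BitSet` per filter and evaluates the filter only for documents it has not seen yet. It is a toy under simplifying assumptions (documents are numbered 0, 1, 2, ... and never change); `BitsetFilterSketch` is a made-up class and says nothing about how Elasticsearch organizes its caches internally.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

class BitsetFilterSketch {
    final IntPredicate filter;        // decides whether document number docId matches
    final BitSet cached = new BitSet();
    int evaluatedUpTo = 0;            // documents [0, evaluatedUpTo) are already reflected in the bitset
    int evaluations = 0;              // how many times the filter itself was run

    BitsetFilterSketch(IntPredicate filter) {
        this.filter = filter;
    }

    // Returns the cached bitset, running the filter only for documents added since the last call.
    BitSet matches(int docCount) {
        for (int doc = evaluatedUpTo; doc < docCount; doc++) {
            evaluations++;
            if (filter.test(doc)) cached.set(doc);
        }
        evaluatedUpTo = Math.max(evaluatedUpTo, docCount);
        return cached;
    }
}
```

After `matches(3)` the filter has run three times; a later `matches(4)` runs it once more for the new document only, and repeating `matches(4)` runs it zero times.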
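2. Independent filter caching
The bitsets that belong to a query component are independent of the rest of the search request it appears in. This means that, once cached, a query can be reused across many search requests. The bitset does not depend on the surrounding query context. Caching can therefore speed up the parts of a query that are used most often, without wasting effort on the less frequent, more volatile parts.
Likewise, if a single request reuses the same non-scoring query, its cached bitset can be reused by every instance inside that one search.
Let's look at the query below, which finds emails that satisfy either of the following conditions:
Conditions (example): (1) in the inbox and not yet read; (2) not in the inbox but marked as important
GET /inbox/emails/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            { "bool": { // 1
                "must": [
                  { "term": { "folder": "inbox" } },
                  { "term": { "read": false } }
                ]
            } },
            { "bool": { // 2
                "must_not": {
                  "term": { "folder": "inbox" }
                },
                "must": {
                  "term": { "important": true }
                }
            } }
          ]
        }
      }
    }
  }
}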
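Clauses 1 and 2 share the same filter, so they use the same bitset.
Although one inbox condition sits in a must clause and the other in a must_not clause, the two filters themselves are identical. The bitset is therefore computed once, when the first clause runs, and then cached for the other to use. When the query is run again, the inbox filter is already cached, so both clauses use the cached bitset.
This fits nicely with the composability of the query DSL. A filter is easy to move anywhere in the expression, or to reuse in several places within the same query. That is not only convenient for developers; it also has a direct performance benefit.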
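The inbox example can be played through in a few lines of Java. This is a sketch with invented sample data and names (`SharedFilterCacheSketch`, the `folder:inbox` style cache keys, the four toy emails); it only shows that one cached bitset can serve both the must and the must_not position.

```java
import java.util.*;
import java.util.function.IntPredicate;

class SharedFilterCacheSketch {
    // Four toy emails; the array index is the document number.
    static final String[] FOLDER = {"inbox", "inbox", "archive", "archive"};
    static final boolean[] READ = {false, true, false, true};
    static final boolean[] IMPORTANT = {false, false, true, false};

    final Map<String, BitSet> cache = new HashMap<>();
    int computations = 0; // how many times a filter was actually evaluated

    BitSet cached(String key, IntPredicate filter) {
        return cache.computeIfAbsent(key, k -> {
            computations++;
            BitSet bits = new BitSet(FOLDER.length);
            for (int doc = 0; doc < FOLDER.length; doc++) {
                if (filter.test(doc)) bits.set(doc);
            }
            return bits;
        });
    }

    BitSet search() {
        IntPredicate inInbox = doc -> FOLDER[doc].equals("inbox");
        // Clause 1: in the inbox AND unread (clone so the cached bitset is not modified).
        BitSet first = (BitSet) cached("folder:inbox", inInbox).clone();
        first.and(cached("read:false", doc -> !READ[doc]));
        // Clause 2: important AND NOT in the inbox; reuses the cached inbox bitset.
        BitSet second = (BitSet) cached("important:true", doc -> IMPORTANT[doc]).clone();
        second.andNot(cached("folder:inbox", inInbox));
        first.or(second);
        return first;
    }
}
```

One `search()` call evaluates three filters even though the inbox filter appears twice, and a second `search()` evaluates none.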
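3. Automatic caching behavior
In earlier versions of Elasticsearch, the default was to cache everything that could be cached. In practice this often meant the system cached bitsets too aggressively and paid for it in performance because of cache thrashing. On top of that, many filters are very cheap to evaluate, yet caching them (and reusing them from the cache) is considerably slower. Caching such filters makes little sense, since it is simpler to just run the filter again.
Inspecting the inverted index is very fast, and most query components are used only rarely. Take a term filter on a "user_id" field: with millions of users, any particular user ID shows up rarely. Caching a bitset for this filter does not pay off, because the cached result will most likely be evicted before it is ever reused.
This kind of cache churn can seriously hurt performance. Worse, it makes it hard for developers to tell which caches behave well and which are useless.
To address this, Elasticsearch caches queries automatically based on how often they are used. If a non-scoring query has been used several times (how many depends on the query type) within the last 256 queries, it becomes a candidate for caching. Even then, not every segment is guaranteed to cache the bitset. Only segments that hold more than 10,000 documents (or 3% of the total number of documents, whichever is larger) cache it. Small segments are quick to search and are merged away quickly, so caching them is of little value.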
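The rule of thumb in this paragraph can be written down as a small policy object. Treat the sketch as an illustration with assumptions, not as the real caching policy: `CachePolicySketch` and `minUses` are hypothetical (the text only says the count depends on the query type), and the segment threshold is read as "the larger of 10,000 documents and 3% of all documents".

```java
import java.util.*;

class CachePolicySketch {
    static final int HISTORY = 256;                   // how many recent queries are remembered
    final Deque<String> recent = new ArrayDeque<>();
    final int minUses;                                // hypothetical: uses needed to become a candidate

    CachePolicySketch(int minUses) {
        this.minUses = minUses;
    }

    // Remember that a query was used, dropping the oldest entry once the history is full.
    void record(String queryKey) {
        if (recent.size() == HISTORY) recent.removeFirst();
        recent.addLast(queryKey);
    }

    // A query is a caching candidate when it was used often enough in the recent history.
    boolean isCandidate(String queryKey) {
        return Collections.frequency(recent, queryKey) >= minUses;
    }

    // Assumed reading: the segment must exceed max(10,000 docs, 3% of all docs).
    static boolean segmentQualifies(int segmentDocs, int totalDocs) {
        return segmentDocs > Math.max(10_000, 0.03 * totalDocs);
    }
}
```

With `minUses` set to 2, a query recorded twice is a candidate until 256 other queries push it out of the history; a 20,000-document segment qualifies in an index of 100,000 documents but not in one of 1,000,000.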
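Once cached, a non-scoring bitset stays in the cache until it is evicted. Eviction follows an LRU policy: when the cache is full, the least recently used filter is evicted.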
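LRU eviction itself is easy to try out in Java, because `LinkedHashMap` supports access ordering out of the box. The class name `LruFilterCache` and the tiny capacity are assumptions for the demo; this is a generic LRU map, not Elasticsearch's cache implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruFilterCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruFilterCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, least recently used first
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after every insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruFilterCache<String, String> cache = new LruFilterCache<>(2);
        cache.put("folder:inbox", "bitset A");
        cache.put("read:false", "bitset B");
        cache.get("folder:inbox");             // touch A, so B is now least recently used
        cache.put("important:true", "bitset C");
        System.out.println(cache.keySet());    // [folder:inbox, important:true]
    }
}
```

Reading an entry counts as use, which is why the untouched `read:false` entry is the one that gets evicted.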