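1. Search API
1.1. Routing
When a search is executed, it is broadcast to all shards of the target index or indices (round-robin between replicas). The routing parameter controls which shards are searched. For example, when indexing into book, the routing value can be the document's name: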
POST book/english?routing=test
{
"name":"test",
"age":"1",
"book":"zhegnsh1正式"
}
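The effect of routing=test can be pictured with a small model. In its basic form, Elasticsearch selects a shard as hash(_routing) % number_of_primary_shards, with _routing defaulting to the document _id. The Python sketch below implements only that rule. It assumes zlib.crc32 as a stand-in for the Murmur3-based hash Elasticsearch really uses, so its shard numbers will not match a real cluster; what it shows is that indexing and searching with the same routing value resolve to the same single shard. The function names are hypothetical.

```python
import zlib
from typing import List, Optional


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Toy shard selection: hash(routing) % number of primary shards.

    Assumption: zlib.crc32 stands in for Elasticsearch's Murmur3-based hash,
    so the concrete shard numbers differ from a real cluster.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def shards_to_search(num_primary_shards: int, routing: Optional[str] = None) -> List[int]:
    """Without routing, every shard is queried; with routing, only the routed shard is."""
    if routing is None:
        return list(range(num_primary_shards))
    return [shard_for(routing, num_primary_shards)]


# POST book/english?routing=test stores the document on this shard ...
stored_on = shard_for("test", 5)
# ... and POST book/_search?routing=test only queries that same shard.
print(stored_on, shards_to_search(5, routing="test"))
print(shards_to_search(5))  # [0, 1, 2, 3, 4]
```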
Searching with the same routing value:
POST book/_search?routing=test
{
"query": { "bool" : { "must" : { "query_string" : { "query" : "good" } }, "filter" : { "term" : { "name" : "test" } } } } }
"query" : "good" 全文搜索模糊匹配,返回任何字段包含"good”的記錄。
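1.2. Adaptive replica selection
As an alternative to sending requests to copies of the data in round-robin fashion, you can enable adaptive replica selection. This lets the coordinating node send the request to the copy deemed "best" based on several criteria:
- Response time of past requests between the coordinating node and the node containing the copy of the data
- Time past search requests took to execute on the node containing the data
- Queue size of the search thread pool on the node containing the data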
Enable it by changing the dynamic cluster setting cluster.routing.use_adaptive_replica_selection from false to true:
PUT /_cluster/settings { "transient": { "cluster.routing.use_adaptive_replica_selection": true } }
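To make the three criteria concrete, here is a toy Python ranking. The scoring function is a hypothetical illustration, not Elasticsearch's formula (which is reportedly adapted from the C3 replica-selection algorithm); it only shows why a replica with a long search queue can lose to one whose past response time is slightly worse.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaStats:
    node: str
    response_time_ms: float  # past response time, coordinating node -> this node
    service_time_ms: float   # past time to execute a search on this node
    queue_size: int          # search thread pool queue on this node


def rank(stats: ReplicaStats) -> float:
    """Hypothetical score, lower is better: queued work multiplies the service time."""
    return stats.response_time_ms + stats.service_time_ms * (1 + stats.queue_size)


def pick_replica(replicas: List[ReplicaStats]) -> str:
    return min(replicas, key=rank).node


replicas = [
    ReplicaStats("node-1", response_time_ms=5, service_time_ms=20, queue_size=10),
    ReplicaStats("node-2", response_time_ms=8, service_time_ms=25, queue_size=0),
]
print(pick_replica(replicas))  # node-2
```

Round-robin would alternate between the two nodes; this ranking keeps sending requests to node-2 while node-1's queue stays long.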
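1.3. Stats groups
A search can be associated with stats groups, which maintain a statistics aggregation per group. The statistics can later be retrieved using the indices stats API. For example, the following search request body associates the request with two different groups: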
POST /_search
{
"query" : { "match_all" : {} }, "stats" : ["group1", "group2"] }
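1.4. Global search timeout
Individual searches can have a timeout as part of the request body search. Since search requests can originate from many sources, Elasticsearch has a dynamic cluster-level setting for a global search timeout that applies to all search requests that do not set a timeout in the request body. By default there is no global timeout. The setting key is search.default_search_timeout, and it can be set using the cluster update settings endpoint. Setting it to -1 resets the global search timeout to no timeout.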
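A minimal Python sketch of changing this setting over HTTP with only the standard library. It builds the cluster update settings request without sending it; the cluster address http://localhost:9200, the 30s value, and the transient scope are assumptions for illustration, and cluster_settings_request is a hypothetical helper name.

```python
import json
import urllib.request


def cluster_settings_request(base_url: str, settings: dict, persistent: bool = False) -> urllib.request.Request:
    """Build (but do not send) a PUT /_cluster/settings request."""
    scope = "persistent" if persistent else "transient"
    return urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps({scope: settings}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


BASE = "http://localhost:9200"  # assumption: a local cluster
# Give every search that sets no timeout of its own a 30s limit (example value) ...
set_timeout = cluster_settings_request(BASE, {"search.default_search_timeout": "30s"})
# ... and later go back to "no global timeout".
reset_timeout = cluster_settings_request(BASE, {"search.default_search_timeout": -1})

print(set_timeout.get_method(), set_timeout.full_url)  # PUT http://localhost:9200/_cluster/settings
print(reset_timeout.data.decode())  # {"transient": {"search.default_search_timeout": -1}}
# urllib.request.urlopen(set_timeout) would actually send it.
```

The same hypothetical helper can change the other dynamic settings in this section, for example {"search.low_level_cancellation": True}.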
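1.5. Search cancellation
Searches can be cancelled using the standard task cancellation mechanism. By default, a running search only checks whether it has been cancelled at segment boundaries, so cancellation can be delayed by large segments. Setting the dynamic cluster-level setting search.low_level_cancellation to true makes cancellation more responsive, at the cost of more frequent cancellation checks, which can be noticeable on large, fast-running search queries. Changing this setting only affects searches that start after the change.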
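Why a large segment delays cancellation can be seen in a toy Python model. It is not Elasticsearch code: checking before every document is a simplification of the more frequent low-level checks, and the segment sizes and the docs_scanned name are made up for illustration.

```python
from typing import List


def docs_scanned(segment_sizes: List[int], cancel_at: int, low_level: bool) -> int:
    """Toy model: documents scanned before a cancelled search actually stops.

    cancel_at: cancellation is requested once this many documents have been scanned.
    low_level=False: the cancelled flag is only checked between segments.
    low_level=True: the flag is also checked before every document (more overhead).
    """
    scanned = 0
    for size in segment_sizes:
        if scanned >= cancel_at:  # the segment-boundary check every search performs
            break
        if not low_level:
            scanned += size  # the whole segment is scanned without looking at the flag
            continue
        for _ in range(size):
            if scanned >= cancel_at:  # extra per-document check
                return scanned
            scanned += 1
    return scanned


segments = [1_000, 1_000_000, 1_000]  # one large segment in the middle
print(docs_scanned(segments, cancel_at=2_000, low_level=False))  # 1001000
print(docs_scanned(segments, cancel_at=2_000, low_level=True))   # 2000
```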
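1.6. Search concurrency and parallelism
By default, Elasticsearch does not reject search requests based on the number of shards they hit. Although Elasticsearch optimizes search execution on the coordinating node, a large number of shards can have a significant CPU and memory impact, so it is usually better to organize data into fewer, larger shards. To configure a soft limit, update the action.search.shard_count.limit cluster setting so that search requests hitting too many shards are rejected.
The request parameter max_concurrent_shard_requests controls the maximum number of concurrent shard requests the search API executes for a request. Use it to protect a single request from overloading the cluster (for example, a default request hits all indices in the cluster, which can cause shard request rejections if the number of shards per node is high). The default is based on the number of data nodes in the cluster, with a maximum of 256.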
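The effect of max_concurrent_shard_requests resembles a semaphore around per-shard work. The asyncio sketch below is a toy model under that assumption (it is not how the coordinating node is implemented): 20 simulated shard requests run with at most 4 in flight, and the function names are hypothetical.

```python
import asyncio
from typing import Dict, List, Tuple


async def search_shard(shard: int, sem: asyncio.Semaphore, state: Dict[str, int]) -> str:
    async with sem:  # waits while max_concurrent_shard_requests shards are already running
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)  # stand-in for the per-shard query phase
        state["running"] -= 1
        return f"shard-{shard}"


async def fan_out(num_shards: int, max_concurrent_shard_requests: int) -> Tuple[List[str], int]:
    sem = asyncio.Semaphore(max_concurrent_shard_requests)
    state = {"running": 0, "peak": 0}
    results = await asyncio.gather(*(search_shard(s, sem, state) for s in range(num_shards)))
    return list(results), state["peak"]


results, peak = asyncio.run(fan_out(num_shards=20, max_concurrent_shard_requests=4))
print(len(results), peak)  # 20 4
```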
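2. Multi-index, multi-type search
2.1. All types of a single index
GET /book/_search?q=name:bb
Returns all documents whose name field contains the term bb (an analyzed full-text match, not an exact value match).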
response:

{ "took": 1, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 4.181759, "hits": [ { "_index": "book", "_type": "english", "_id": "nwmH_mQBbhSmAk-T97Mf", "_score": 4.181759, "_source": { "name": "bb傳交換機發個沙發覆蓋否", "age": 12, "class": "dsfdsf", "addr": "中國" } } ] } }
2.2. Searching specific types
GET /twitter/tweet,user/_search?q=user:kimchy
2.3. Searching multiple indices
GET book1,book/_search?q=name:bb
2.4. Searching all indices
GET _all/_search?q=name:bb
or
GET _search?q=name:bb
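Building these URI-search paths programmatically is mostly string joining plus URL-encoding the q parameter. A small Python sketch with a hypothetical helper name; treating an empty index list as _all follows section 2.4.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def uri_search_path(q: str, indices: Optional[Sequence[str]] = None) -> str:
    """Comma-join the target indices, or target every index via _all when none are given."""
    target = ",".join(indices) if indices else "_all"
    return f"/{target}/_search?{urlencode({'q': q})}"


print(uri_search_path("name:bb", ["book1", "book"]))  # /book1,book/_search?q=name%3Abb
print(uri_search_path("name:bb"))                     # /_all/_search?q=name%3Abb
```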
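The other URI search parameters are:
| Parameter | Description |
|---|---|
| q | The query string (maps to a query_string query). |
| df | The default field to use when no field prefix is defined in the query. |
| analyzer | The name of the analyzer to use when analyzing the query string. |
| analyze_wildcard | Whether wildcard and prefix queries should be analyzed. Defaults to false. |
| batched_reduce_size | The number of shard results to reduce at once on the coordinating node. Use it as a protection mechanism to reduce the memory overhead per search request when the request can hit a large number of shards. |
| default_operator | The default operator to use, AND or OR. Defaults to OR. |
| lenient | If set to true, format-based failures (such as providing text to a numeric field) are ignored. Defaults to false. |
| explain | For each hit, include an explanation of how its score was computed. |
| _source | Set to false to disable retrieval of the _source field. |
| stored_fields | The selective stored fields of the document to return for each hit, comma-separated. Not specifying any value causes no fields to be returned. |
| sort | The sorting to perform, in the form fieldName, fieldName:asc, or fieldName:desc. |
| track_scores | When sorting, set to true to still track scores and return them with each hit. |
| track_total_hits | Set to false to disable tracking of the total number of hits that match the query. |
| timeout | A search timeout that bounds the search request to the specified time value; when it expires, the hits accumulated so far are returned. Defaults to no timeout. |
| terminate_after | The maximum number of documents to collect for each shard; on reaching it, query execution terminates early. If set, the response contains a boolean field terminated_early. |
| from | The starting offset of the hits to return. Defaults to 0. |
| size | The number of hits to return. Defaults to 10. |
| search_type | The type of search operation to perform: dfs_query_then_fetch or query_then_fetch. Defaults to query_then_fetch. |
Official documentation reference: Search API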
3. Request body search: searching with a query in the request body
3.1. term query: searching for a whole, unanalyzed term
POST book1/_search { "query" : { "term" : { "name" : "test goog money" } } }
response

{ "took": 4, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 0, "max_score": null, "hits": [] } }
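No hits: at index time the default (standard) analyzer split "test goog my money" into the four terms test, goog, my, and money. A term query is not analyzed, so it looks for the single term "test goog money", which does not exist in the index.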
3.2. match query: analyze the input, then search
POST book1/_search { "query": { "match":{ "name":"test goog money" } } }
response

{ "took": 17, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 3, "max_score": 11.610666, "hits": [ { "_index": "book1", "_type": "english", "_id": "5oVDQ2UBRzBxBrDgtIl0", "_score": 11.610666, "_source": { "name": "test goog my money", "age": 12, "class": "dsfdsf", "addr": "中國" } }, { "_index": "book1", "_type": "english", "_id": "32", "_score": 1.8562036, "_source": { "name": "test", "age": "1" } }, { "_index": "book1", "_type": "english", "_id": "33", "_score": 1.8562036, "_source": { "name": "test", "age": "1" } } ] } }
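The difference between 3.1 and 3.2 comes down to whether the query text is analyzed. The toy Python model below assumes an ASCII-only tokenizer (lowercase, split on non-alphanumerics) as a rough stand-in for the standard analyzer, which really handles full Unicode; document 99 and the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def analyze(text: str) -> List[str]:
    """Rough ASCII-only stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


docs = {"5oVDQ2UBRzBxBrDgtIl0": "test goog my money", "32": "test", "99": "hello world"}

# Inverted index built from analyzed text, as at index time: term -> doc ids.
index: Dict[str, Set[str]] = defaultdict(set)
for doc_id, text in docs.items():
    for term in analyze(text):
        index[term].add(doc_id)


def term_query(value: str) -> Set[str]:
    # term: the input is NOT analyzed; it must equal one indexed term exactly.
    return set(index.get(value, set()))


def match_query(value: str) -> Set[str]:
    # match: the input is analyzed the same way; with the default OR operator any term may match.
    hits: Set[str] = set()
    for term in analyze(value):
        hits |= index.get(term, set())
    return hits


print(term_query("test goog money"))           # set()
print(sorted(match_query("test goog money")))  # ['32', '5oVDQ2UBRzBxBrDgtIl0']
```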
3.3. query_string: analyze the input, then match against any field
POST book1/_search { "query": { "query_string":{ "query":"test goog my money 國" } } }
response

{ "took": 7, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 22, "max_score": 11.610666, "hits": [ { "_index": "book1", "_type": "english", "_id": "5oVDQ2UBRzBxBrDgtIl0", "_score": 11.610666, "_source": { "name": "test goog my money", "age": 12, "class": "dsfdsf", "addr": "中國" } }, { "_index": "book1", "_type": "english", "_id": "lAmG_mQBbhSmAk-T-bN1", "_score": 2.016771, "_source": { "name": "國1里", "age": 12, "class": "dsfdsf", "addr": "中國" } }, { "_index": "book1", "_type": "english", "_id": "32", "_score": 1.8562036, "_source": { "name": "test", "age": "1" } }, { "_index": "book1", "_type": "english", "_id": "33", "_score": 1.8562036, "_source": { "name": "test", "age": "1" } }, { "_index": "book1", "_type": "english", "_id": "jgmG_mQBbhSmAk-TnrMw", "_score": 1.5432179, "_source": { "name": "國國", "age": 12, "class": "dsfdsf", "addr": "中國" } }, { "_index": "book1", "_type": "english", "_id": "kgmG_mQBbhSmAk-T6bOW", "_score": 1.5067708, "_source": { "name": "國1里國", "age": 12, "class": "dsfdsf", "addr": "中國1" } }, { "_index": "book1", "_type": "english", "_id": "kwmG_mQBbhSmAk-T8bN7", "_score": 1.5067708, "_source": { "name": "國1里國", "age": 12, "class": "dsfdsf", "addr": "中國" } }, { "_index": "book1", "_type": "english", "_id": "mgmH_mQBbhSmAk-TbbMX", "_score": 0.18232156, "_source": { "name": "里個覆蓋否", "age": 12, "class": "dsfdsf", "addr": "中國" } }, { "_index": "book1", "_type": "english", "_id": "mwmH_mQBbhSmAk-TerNv", "_score": 0.18232156, "_source": { "name": "里個蓋否", "age": 12, "class": "dsfdsf", "addr": "中國" } }, { "_index": "book1", "_type": "english", "_id": "ngmH_mQBbhSmAk-T6LPZ", "_score": 0.13353139, "_source": { "name": "cvh交換機發個沙發覆蓋否", "age": 12, "class": "dsfdsf", "addr": "中國" } } ] } }
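Parameters:
| Parameter | Description |
|---|---|
| timeout | A search timeout that bounds the search request to the specified time value; when it expires, the hits accumulated so far are returned. Defaults to no timeout. |
| from | Retrieve hits starting from a certain offset. Defaults to 0. |
| size | The number of hits to return. Defaults to 10. |
| search_type | The type of search operation to perform: dfs_query_then_fetch or query_then_fetch. Defaults to query_then_fetch. |
| request_cache | Set to true or false to enable or disable caching of search results for requests where size is 0 (for example, aggregations). |
| allow_partial_search_results | Set to false to return an overall failure if the request would produce partial results. Defaults to true, which allows partial results on timeouts or partial failures. |
| terminate_after | The maximum number of documents to collect for each shard; on reaching it, query execution terminates early. If set, the response contains a boolean field terminated_early. |
| batched_reduce_size | The number of shard results to reduce at once on the coordinating node. Use it as a protection mechanism to reduce the memory overhead per search request when the request can hit a large number of shards. |
Of the parameters above, search_type, request_cache, and allow_partial_search_results must be passed as query-string parameters. The rest of the search request goes in the body. The body content can also be passed as a REST parameter named source.
Both HTTP GET and HTTP POST can be used to execute a search with a body. Since not all clients support GET with a body, POST is allowed as well.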
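A minimal Python sketch of the rule above, splitting options between the query string and the JSON body. build_search is a hypothetical helper, and the option values are examples only.

```python
import json
from typing import Any, Tuple
from urllib.parse import urlencode

# From the text above: these three options must travel in the query string.
QUERY_STRING_ONLY = {"search_type", "request_cache", "allow_partial_search_results"}


def build_search(index: str, **options: Any) -> Tuple[str, str]:
    """Hypothetical helper: split search options into URL parameters and a JSON body."""
    params = {}
    body = {}
    for key, value in options.items():
        if key in QUERY_STRING_ONLY:
            # Render booleans as JSON-style true/false rather than Python's True/False.
            params[key] = json.dumps(value) if isinstance(value, bool) else value
        else:
            body[key] = value
    path = f"/{index}/_search"
    if params:
        path += "?" + urlencode(params)
    return path, json.dumps(body)


path, body = build_search(
    "book1",
    search_type="dfs_query_then_fetch",
    request_cache=False,
    size=0,
    timeout="2s",
    query={"match": {"name": "test"}},
)
print(path)  # /book1/_search?search_type=dfs_query_then_fetch&request_cache=false
print(body)  # {"size": 0, "timeout": "2s", "query": {"match": {"name": "test"}}}
```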
3.4. From / size: pagination
POST _search { "from" : 0, "size" : 2, "query": { "query_string":{ "query":"test goog my money國" } } }
from is the offset of the first hit to return and size is the number of hits. Defaults: "from" : 0, "size" : 10.
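Since from is the number of hits to skip, page-based navigation reduces to simple arithmetic. A Python sketch with a hypothetical helper; it also guards against deep pagination, because Elasticsearch rejects requests where from + size exceeds index.max_result_window (10,000 by default).

```python
from typing import Dict


def page_to_from_size(page: int, page_size: int = 10, max_result_window: int = 10_000) -> Dict[str, int]:
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size  # from = number of hits to skip
    # Elasticsearch rejects requests where from + size exceeds index.max_result_window
    # (10,000 by default), so fail early instead of sending them.
    if start + page_size > max_result_window:
        raise ValueError("from + size exceeds max_result_window")
    return {"from": start, "size": page_size}


print(page_to_from_size(1, page_size=2))  # {'from': 0, 'size': 2}
print(page_to_from_size(3, page_size=2))  # {'from': 4, 'size': 2}
```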
4. Sort
POST book1/_search { "sort" : [ { "name" : "desc" }, "_score" ], "query": { "term":{ "name":"國" } } }
If the request fails with:
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [class] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
Enable fielddata on the field by setting fielddata=true:
PUT /book1/_mapping/english/?pretty {"english":{"properties":{"name":{"type":"text","fielddata":true}}}}
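When sorting on _score, the order defaults to desc; when sorting on anything else, it defaults to asc.
You can also sort on multiple fields. In the example below, { "name" : {"order" : "desc"} } is the long form of the shorthand used for age, { "age" : "desc" }:
POST book1/_search
{
  "sort" : [
    { "name" : {"order" : "desc"} },
    { "age" : "desc" },
    "_score"
  ],
  "query": { "term":{ "name":"國" } }
}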
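The default-order rule and multi-field sorting can be mimicked in memory with repeated stable sorts. This toy Python model (hypothetical hits and helper name) treats each field as a single comparable value, so it ignores how Elasticsearch derives sort values from an analyzed text field.

```python
from typing import Any, Dict, List, Union

SortEntry = Union[str, Dict[str, Any]]


def multi_sort(hits: List[Dict[str, Any]], sort_spec: List[SortEntry]) -> List[Dict[str, Any]]:
    """Apply an Elasticsearch-style sort list to in-memory hits (toy model).

    An entry is a bare field name (order defaults to desc for _score, asc otherwise),
    {field: "asc"|"desc"}, or the long form {field: {"order": "asc"|"desc"}}.
    """
    keys = []
    for entry in sort_spec:
        if isinstance(entry, str):
            field, order = entry, ("desc" if entry == "_score" else "asc")
        else:
            (field, spec), = entry.items()
            order = spec["order"] if isinstance(spec, dict) else spec
        keys.append((field, order == "desc"))
    result = list(hits)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for field, descending in reversed(keys):
        result.sort(key=lambda hit: hit[field], reverse=descending)
    return result


hits = [
    {"_id": "a", "name": "國", "age": 12, "_score": 1.0},
    {"_id": "b", "name": "國", "age": 30, "_score": 0.5},
    {"_id": "c", "name": "里", "age": 12, "_score": 2.0},
]
ordered = multi_sort(hits, [{"name": {"order": "desc"}}, {"age": "desc"}, "_score"])
print([h["_id"] for h in ordered])  # ['c', 'b', 'a']
```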
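4.1. Sorting on array fields
Elasticsearch supports sorting by array or multi-valued fields. The mode option controls which array value is picked for sorting the document it belongs to. The mode option can have the following values:
| Mode | Description |
|---|---|
| min | Pick the lowest value. |
| max | Pick the highest value. |
| sum | Use the sum of all values as the sort value. Only applicable to number-based array fields. |
| avg | Use the average of all values as the sort value. Only applicable to number-based array fields. |
| median | Use the median of all values as the sort value. Only applicable to number-based array fields. |
Example: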
POST book1/_search { "sort" : [ {"age" : {"order" : "asc", "mode" : "avg"}} ], "query": { "term":{ "name":"test" } } }
response:

{ "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 5, "max_score": null, "hits": [ { "_index": "book1", "_type": "english", "_id": "32", "_score": null, "_source": { "name": "test", "age": "1" }, "sort": [ 1 ] }, { "_index": "book1", "_type": "english", "_id": "33", "_score": null, "_source": { "name": "test", "age": "1" }, "sort": [ 1 ] }, { "_index": "book1", "_type": "english", "_id": "5oVDQ2UBRzBxBrDgtIl0", "_score": null, "_source": { "name": "test goog my money", "age": 12, "class": "dsfdsf", "addr": "中國" }, "sort": [ 12 ] }, { "_index": "book1", "_type": "english", "_id": "54UiUmUBRzBxBrDgfIl9", "_score": null, "_source": { "name": "test goog my money", "age": [ 11, 13, 14 ], "class": "dsfdsf", "addr": "中國" }, "sort": [ 13 ] }, { "_index": "book1", "_type": "english", "_id": "6IUkUmUBRzBxBrDgFok2", "_score": null, "_source": { "name": "test goog my money", "age": [ 14, 54, 45, 34 ], "class": "dsfdsf", "addr": "中國" }, "sort": [ 37 ] } ] } }
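The mode values can be reproduced with a few reducers. In the response above, the avg of [11, 13, 14] (38/3) came back as 13 and the avg of [14, 54, 45, 34] (36.75) as 37, which suggests the result is rounded to a whole number for this integer field. The Python sketch below assumes round-half-up in that case and reproduces the order of the response; sort_value is a hypothetical name.

```python
import math
import statistics
from typing import Union

Number = Union[int, float]

REDUCERS = {
    "min": min,
    "max": max,
    "sum": sum,                   # numeric arrays only
    "avg": statistics.mean,       # numeric arrays only
    "median": statistics.median,  # numeric arrays only
}


def sort_value(value, mode: str, whole_number: bool = False) -> Number:
    """Reduce a single value or an array to one sort key, as the `mode` option does."""
    values = value if isinstance(value, list) else [value]
    result = REDUCERS[mode](values)
    if whole_number and mode in ("avg", "median"):
        # Assumption inferred from the response above: round half up to a whole number.
        result = math.floor(result + 0.5)
    return result


ages = {
    "32": 1,
    "33": 1,
    "5oVDQ2UBRzBxBrDgtIl0": 12,
    "54UiUmUBRzBxBrDgfIl9": [11, 13, 14],
    "6IUkUmUBRzBxBrDgFok2": [14, 54, 45, 34],
}
ranked = sorted(ages, key=lambda doc_id: sort_value(ages[doc_id], "avg", whole_number=True))
print([sort_value(ages[d], "avg", whole_number=True) for d in ranked])  # [1, 1, 12, 13, 37]
```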
4.2. Sorting on nested fields
The sort field must be inside a nested object (here, offer must be mapped as type nested):

POST /_search { "query" : { "term" : { "product" : "chocolate" } }, "sort" : [ { "offer.price" : { "mode" : "avg", "order" : "asc", "nested": { "path": "offer", "filter": { "term" : { "offer.color" : "blue" } } } } } ] }
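A toy Python model of the nested sort above, using hypothetical chocolate documents: only offers passing the nested filter (color blue) contribute, their prices are averaged (mode avg), and products are ordered ascending. Treating products without a matching offer as missing and sorting them last is an assumption of the sketch.

```python
import math
from statistics import mean

# Hypothetical documents whose "offer" field is mapped as nested.
products = [
    {"_id": "p1", "product": "chocolate",
     "offer": [{"color": "blue", "price": 4.0}, {"color": "red", "price": 1.0},
               {"color": "blue", "price": 6.0}]},
    {"_id": "p2", "product": "chocolate",
     "offer": [{"color": "blue", "price": 3.0}, {"color": "green", "price": 9.0}]},
    {"_id": "p3", "product": "chocolate", "offer": [{"color": "red", "price": 2.0}]},
]


def nested_sort_key(doc, color="blue"):
    # nested.filter: only blue offers contribute to the sort value.
    prices = [offer["price"] for offer in doc["offer"] if offer["color"] == color]
    if not prices:
        return math.inf  # no matching nested object: treated as missing, sorted last here
    return mean(prices)  # "mode": "avg"


ordered = sorted(products, key=nested_sort_key)  # "order": "asc"
print([p["_id"] for p in ordered])  # ['p2', 'p1', 'p3']
```

Note that p1's red offer (price 1.0) is ignored, which is exactly what the nested filter is for.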
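4.3. Missing values
The missing parameter specifies how documents that are missing the sort field should be treated: the missing value can be set to _last, _first, or a custom value (used as the sort value for those documents). The default is _last.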
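A toy Python model of the three kinds of missing value for one numeric field, with hypothetical documents and a hypothetical function name. It assumes, as the names suggest, that _first and _last place documents without the field at the start or end regardless of asc or desc.

```python
from typing import Any, Dict, List, Union


def sort_with_missing(docs: List[Dict[str, Any]], field: str, order: str = "asc",
                      missing: Union[str, float] = "_last") -> List[Dict[str, Any]]:
    """Toy model of the sort `missing` option for one numeric field."""
    descending = order == "desc"
    if missing not in ("_first", "_last"):
        # Custom value: docs without the field sort as if they had this value.
        return sorted(docs, key=lambda d: d.get(field, missing), reverse=descending)
    present = sorted((d for d in docs if field in d), key=lambda d: d[field], reverse=descending)
    absent = [d for d in docs if field not in d]
    # _first / _last place missing docs at the start / end regardless of the order.
    return absent + present if missing == "_first" else present + absent


docs = [{"_id": "a", "age": 12}, {"_id": "b"}, {"_id": "c", "age": 1}]
for option in ("_last", "_first", 5):
    print(option, [d["_id"] for d in sort_with_missing(docs, "age", missing=option)])
# _last ['c', 'a', 'b']
# _first ['b', 'c', 'a']
# 5 ['c', 'b', 'a']
```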
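4.4. Ignoring unmapped fields
By default, the search request fails if there is no mapping associated with a field. The unmapped_type option lets you ignore fields that have no mapping and not sort by them. The value of this parameter determines what sort values to emit. Here is an example of how it can be used: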
GET /_search { "sort" : [ { "price" : {"unmapped_type" : "long"} } ], "query" : { "term" : { "product" : "chocolate" } } }
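4.5. Geo distance sorting
Sort by _geo_distance. Here is an example, assuming pin.location is a field of type geo_point (in the array form [-70, 40], the coordinates are in [lon, lat] order):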
GET /_search { "sort" : [ { "_geo_distance" : { "pin.location" : [-70, 40], "order" : "asc", "unit" : "km", "mode" : "min", "distance_type" : "arc" } } ], "query" : { "term" : { "user" : "kimchy" } } }
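For intuition, a Python sketch of sorting by distance from the request's origin with mode min. It assumes the arc distance is computed with the haversine formula on a sphere of radius 6371 km, which approximates but does not exactly reproduce Elasticsearch's numbers; the documents are hypothetical, and note the [lon, lat] order of the coordinates.

```python
import math
from typing import Dict

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius; Elasticsearch's constant differs slightly


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


ORIGIN = (-70, 40)  # the request's [-70, 40], i.e. [lon, lat]


def distance_sort_key(doc: Dict) -> float:
    # "mode": "min": a document with several locations is ranked by its closest one.
    origin_lon, origin_lat = ORIGIN
    return min(haversine_km(origin_lat, origin_lon, lat, lon) for lon, lat in doc["pin.location"])


docs = [  # hypothetical documents, locations stored as [lon, lat]
    {"_id": "far", "pin.location": [(-70, 42)]},
    {"_id": "near", "pin.location": [(-71, 50), (-70, 41)]},
]
print([d["_id"] for d in sorted(docs, key=distance_sort_key)])  # ['near', 'far']
```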
4.6. _source: controlling which fields are returned
GET /_search { "_source": { "includes": [ "obj1.*", "obj2.*" ], "excludes": [ "*.description" ] }, "query" : { "term" : { "user" : "kimchy" } } }
This returns the fields that match includes, minus any that match excludes.
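A toy Python model of includes/excludes filtering on hypothetical data. It matches patterns against dotted leaf paths with fnmatchcase, which also supports [...] character classes that Elasticsearch patterns do not, and it assumes no field name itself contains a dot; the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterator, Sequence, Tuple


def flatten(obj: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted.path, leaf value) pairs for a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def filter_source(source: Dict[str, Any], includes: Sequence[str] = (),
                  excludes: Sequence[str] = ()) -> Dict[str, Any]:
    """Keep leaves matching any include (all if none given), then drop those matching any exclude."""
    result: Dict[str, Any] = {}
    for path, value in flatten(source):
        if includes and not any(fnmatchcase(path, p) for p in includes):
            continue
        if any(fnmatchcase(path, p) for p in excludes):
            continue
        node = result
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


source = {  # hypothetical document
    "obj1": {"name": "a", "description": "long text"},
    "obj2": {"name": "b"},
    "obj3": {"name": "c"},
}
print(filter_source(source, includes=["obj1.*", "obj2.*"], excludes=["*.description"]))
# {'obj1': {'name': 'a'}, 'obj2': {'name': 'b'}}
```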
4.7. Script fields: computed fields in the response
POST book1/_search
{ "script_fields" : { "test1" : { "script" : { "lang": "painless", "source": "doc['age'].value * 2" } }, "test2" : { "script" : { "lang": "painless", "source": "doc['age'].value * params.factor", "params" : { "factor" : 2.0 } } } }, "query": { "term":{ "name":"test" } } }
response:

{ "took": 135, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 5, "max_score": 2.9026666, "hits": [ { "_index": "book1", "_type": "english", "_id": "5oVDQ2UBRzBxBrDgtIl0", "_score": 2.9026666, "fields": { "test1": [ 24 ], "test2": [ 24 ] } }, { "_index": "book1", "_type": "english", "_id": "6IUkUmUBRzBxBrDgFok2", "_score": 2.1818507, "fields": { "test1": [ 28 ], "test2": [ 28 ] } }, { "_index": "book1", "_type": "english", "_id": "32", "_score": 1.5205609, "fields": { "test1": [ 2 ], "test2": [ 2 ] } }, { "_index": "book1", "_type": "english", "_id": "33", "_score": 1.5205609, "fields": { "test1": [ 2 ], "test2": [ 2 ] } }, { "_index": "book1", "_type": "english", "_id": "54UiUmUBRzBxBrDgfIl9", "_score": 1.0615592, "fields": { "test1": [ 22 ], "test2": [ 22 ] } } ] } }
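The response above shows that for the array-valued documents doc['age'].value used the smallest element (14 for [14, 54, 45, 34], 11 for [11, 13, 14]), consistent with doc values being stored in sorted order. A toy Python model of the two script fields under that assumption; it treats ages as integers although two documents store "1" as a string in _source, and the function names are hypothetical.

```python
from typing import Dict, List, Union

Age = Union[int, List[int]]


def doc_value(values: Age) -> int:
    """Toy doc['age'].value: doc values are kept sorted, and .value returns the first (smallest)."""
    values = values if isinstance(values, list) else [values]
    return sorted(values)[0]


def script_fields(age: Age, factor: float = 2.0) -> Dict[str, float]:
    return {
        "test1": doc_value(age) * 2,       # "doc['age'].value * 2"
        "test2": doc_value(age) * factor,  # "doc['age'].value * params.factor"
    }


for age in (12, [14, 54, 45, 34], 1, [11, 13, 14]):
    print(age, script_fields(age))
# 12 {'test1': 24, 'test2': 24.0}
# [14, 54, 45, 34] {'test1': 28, 'test2': 28.0}
# 1 {'test1': 2, 'test2': 2.0}
# [11, 13, 14] {'test1': 22, 'test2': 22.0}
```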
4.7.1. Returning fields from _source
POST book1/_search { "script_fields" : { "test1" : { "script" : "params['_source']['addr']" } }, "query": { "term":{ "name":"test" } } }
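Note the _source keyword here, which lets the script navigate the JSON-like document.
It is important to understand the difference between doc['my_field'].value and params['_source']['my_field']. The first, using the doc keyword, causes the terms for that field to be loaded into memory (cached), which makes execution faster but uses more memory. Also, the doc[...] notation only allows simple-valued fields (you cannot return a JSON object from it) and only makes sense for non-analyzed or single-term fields. Still, doc remains the recommended way to access values from the document whenever possible, because _source must be loaded and parsed every time it is used, which is very slow.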
4.8. Doc value fields: returning each matching document's indexed terms
POST /book1/_search { "query": { "bool":{ "filter":[ {"term":{"name":"test"}}, { "term":{"addr":"中"}} ] } }, "docvalue_fields" : ["name", "addr"] }
response:

{ "took": 7, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 3, "max_score": 0, "hits": [ { "_index": "book1", "_type": "english", "_id": "6IUkUmUBRzBxBrDgFok2", "_score": 0, "_source": { "name": "test goog my money", "age": [ 14, 54, 45, 34 ], "class": "dsfdsf", "addr": "中國" }, "fields": { "name": [ "goog", "money", "my", "test" ], "addr": [ "中", "國" ] } }, { "_index": "book1", "_type": "english", "_id": "54UiUmUBRzBxBrDgfIl9", "_score": 0, "_source": { "name": "test goog my money", "age": [ 11, 13, 14 ], "class": "dsfdsf", "addr": "中國" }, "fields": { "name": [ "goog", "money", "my", "test" ], "addr": [ "中", "國" ] } }, { "_index": "book1", "_type": "english", "_id": "5oVDQ2UBRzBxBrDgtIl0", "_score": 0, "_source": { "name": "test goog my money", "age": 12, "class": "dsfdsf", "addr": "中國" }, "fields": { "name": [ "goog", "money", "my", "test" ], "addr": [ "中", "國" ] } } ] } }
For more on doc values and fielddata, see: ES: Forward Index, Doc Values and Field Data
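The name values in the response above (goog, money, my, test) are the analyzed terms in sorted order, not the original string. A toy Python model of turning an inverted index into such per-document term lists ("uninverting", the wording used by the fielddata error message in section 4) follows. It assumes an ASCII-only tokenizer as a rough stand-in for the standard analyzer.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def analyze(text: str) -> List[str]:
    """Rough ASCII-only stand-in for the standard analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


docs = {"5oVDQ2UBRzBxBrDgtIl0": "test goog my money", "32": "test"}

# Inverted index (what search uses): term -> ids of the documents containing it.
inverted: Dict[str, Set[str]] = defaultdict(set)
for doc_id, text in docs.items():
    for term in analyze(text):
        inverted[term].add(doc_id)

# Uninverting: walk the terms in sorted order and append each one to its documents.
# The result is a per-document, sorted, de-duplicated term list (a forward view).
forward: Dict[str, List[str]] = defaultdict(list)
for term in sorted(inverted):
    for doc_id in inverted[term]:
        forward[doc_id].append(term)

print(forward["5oVDQ2UBRzBxBrDgtIl0"])  # ['goog', 'money', 'my', 'test']
print(forward["32"])                    # ['test']
```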
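4.9. Post filter: filtering hits after aggregation
The post_filter is applied to the search hits at the very end of a search request, after aggregations have already been calculated. In the example below, the aggregations see all Gucci shirts, while only the red ones are returned as hits: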
POST /shirts/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": { "brand": "gucci" } // the query selects all Gucci shirts, whatever their color
      }
    }
  },
  "aggs": {
    "colors": {
      "terms": { "field": "color" } // the most popular colors (those in the most documents) among all Gucci shirts
    },
    "color_red": {
      "filter": {
        "term": { "color": "red" } // restricts this sub-aggregation to red Gucci shirts
      },
      "aggs": {
        "models": {
          "terms": { "field": "model" } // the most popular models among red Gucci shirts
        }
      }
    }
  },
  "post_filter": {
    "term": { "color": "red" } // removes shirts of any other color from the hits only
  }
}
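The order of operations above can be simulated in a few lines of Python on hypothetical shirts: the query narrows the set, the aggregations are computed on that set, and post_filter trims only the hits.

```python
from collections import Counter

shirts = [  # hypothetical sample data
    {"brand": "gucci", "color": "red", "model": "slim"},
    {"brand": "gucci", "color": "red", "model": "casual"},
    {"brand": "gucci", "color": "blue", "model": "slim"},
    {"brand": "gucci", "color": "green", "model": "slim"},
    {"brand": "prada", "color": "red", "model": "slim"},
]

# 1. query: everything below works on Gucci shirts only.
matched = [s for s in shirts if s["brand"] == "gucci"]

# 2. aggregations run on the query results, before post_filter.
colors = Counter(s["color"] for s in matched)
red_models = Counter(s["model"] for s in matched if s["color"] == "red")

# 3. post_filter narrows only the returned hits.
hits = [s for s in matched if s["color"] == "red"]

print(dict(colors))      # {'red': 2, 'blue': 1, 'green': 1}
print(dict(red_models))  # {'slim': 1, 'casual': 1}
print(len(hits))         # 2
```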
5. Highlighting: highlighting search results
POST book1/_search { "query":{ "bool": { "filter": [ { "term": { "name": "里" }}, { "term": { "age": "12" }} ] } }, "highlight" : { "fields" : { "name" : {"type" : "plain"} } } }
response:


{ "took": 77, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 11, "max_score": 0, "hits": [ { "_index": "book1", "_type": "english", "_id": "mgmH_mQBbhSmAk-TbbMX", "_score": 0, "_source": { "name": "里個覆蓋否", "age": 12, "class": "dsfdsf", "addr": "中國" }, "highlight": { "name": [ "<em>里</em>個覆蓋否" ] } }, { "_index": "book1", "_type": "english", "_id": "mwmH_mQBbhSmAk-TerNv", "_score": 0, "_source": { "name": "里個蓋否", "age": 12, "class": "dsfdsf", "addr": "中國" }, "highlight": { "name": [ "<em>里</em>個蓋否" ] } }, { "_index": "book1", "_type": "english", "_id": "lAmG_mQBbhSmAk-T-bN1", "_score": 0, "_source": { "name": "國1里", "age": 12, "class": "dsfdsf", "addr": "中國" }, "highlight": { "name": [ "國1<em>里</em>" ] } }, { "_index": "book1", "_type": "english", "_id": "lQmH_mQBbhSmAk-TDrNt", "_score": 0, "_source": { "name": "里", "age": 12, "class": "dsfdsf", "addr": "中國" }, "highlight": { "name": [ "<em>里</em>" ] } }, { "_index": "book1", "_type": "english", "_id": "lwmH_mQBbhSmAk-TObNH", "_score": 0, "_source": { "name": "里fgsaf覆蓋否", "age": 12, "class": "dsfdsf", "addr": "中國" }, "highlight": { "name": [ "<em>里</em>fgsaf覆蓋否" ] } }, { "_index": "book1", "_type": "english", "_id": "mAmH_mQBbhSmAk-TTbO2", "_score": 0, "_source": { "name": "里jhj發個沙發覆蓋否", "age": 12, "class": "dsfdsf", "addr": "中國" }, "highlight": { "name": [ "<em>里</em>jhj發個沙發覆蓋否" ] } }, { "_index": "book1", "_type": "english", "_id": "kgmG_mQBbhSmAk-T6bOW", "_score": 0, "_source": { "name": "國1里國", "age": 12, "class": "dsfdsf", "addr": "中國1" }, "highlight": { "name": [ "國1<em>里</em>國" ] } }, { "_index": "book1", "_type": "english", "_id": "kwmG_mQBbhSmAk-T8bN7", "_score": 0, "_source": { "name": "國1里國", "age": 12, "class": "dsfdsf", "addr": "中國" }, "highlight": { "name": [ "國1<em>里</em>國" ] } }, { "_index": "book1", "_type": "english", "_id": "nAmH_mQBbhSmAk-Tg7OW", "_score": 0, "_source": { "name": "里蓋否", "age": 12, "class": "dsfdsf", "addr": "中國" }, 
"highlight": { "name": [ "<em>里</em>蓋否" ] } }, { "_index": "book1", "_type": "english", "_id": "nQmH_mQBbhSmAk-TkbP4", "_score": 0, "_source": { "name": "里否", "age": 12, "class": "dsfdsf", "addr": "中國" }, "highlight": { "name": [ "<em>里</em>否" ] } } ] } }
By default, matches are wrapped in <em></em> tags; you can also define custom tags, for example <span></span>.
5.1. Custom tags
POST book1/_search { "query":{ "bool": { "filter": [ { "term": { "name": "里" }}, { "term": { "age": "12" }} ] } }, "highlight" : { "pre_tags" : ["<span>"], "post_tags" : ["</span>"], "fields" : { "name" : {"type" : "plain"} } } }
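A toy Python highlighter showing how pre_tags and post_tags wrap the matched text. It does plain substring matching on the raw string, whereas real highlighters work on analyzed tokens and their offsets; highlight is a hypothetical function name.

```python
import re
from typing import Iterable


def highlight(text: str, terms: Iterable[str], pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    """Toy highlighter: wrap every occurrence of any query term in pre/post tags."""
    # Longest terms first so a longer term wins over a shorter one it contains.
    ordered = sorted(set(terms), key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)


print(highlight("國1里國", ["里"]))                        # 國1<em>里</em>國
print(highlight("里個蓋否", ["里"], "<span>", "</span>"))  # <span>里</span>個蓋否
```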
5.2. Controlling the highlighted fragments and how many are returned
POST book1/_search { "query":{ "match": { "name":"the 里" } }, "highlight" : { "pre_tags" : ["<tag1>"], "post_tags" : ["</tag1>"], "type": "plain", "fragment_size" : 20, "number_of_fragments" : 5, "fields" : { "name":{} } } }
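force_source: Highlight based on the source even if the field is stored separately. Defaults to false.
fragmenter: Specifies how text should be broken up in highlight snippets: simple or span. Only valid for the plain highlighter. Defaults to span.
- simple: breaks up text into same-sized fragments.
- span: breaks up text into same-sized fragments, but tries to avoid breaking up text between highlighted terms. This is the default.
fragment_offset: Controls the margin from which you want to start highlighting. Only valid when using the fvh highlighter.
fragment_size: The size of the highlighted fragment in characters. Defaults to 100.
matched_fields: Combine matches on multiple fields to highlight a single field. This is most intuitive for multi-fields that analyze the same string in different ways. All matched_fields must have term_vector set to with_positions_offsets, but only the field to which the matches are combined is loaded, so only that field benefits from having store set to yes. Only valid for the fvh highlighter.
no_match_size: The amount of text to return from the beginning of the field if there are no matching fragments to highlight. Defaults to 0 (nothing is returned).
number_of_fragments: The maximum number of fragments to return. If set to 0, no fragments are returned; instead, the entire field content is highlighted and returned. This is handy for short texts such as a title or an address, where fragmentation is not required. If number_of_fragments is 0, fragment_size is ignored. Defaults to 5.
order: Set to score to sort highlighted fragments by score. By default, fragments are output in the order they appear in the field (order: none). Setting this option to score outputs the most relevant fragments first. Each highlighter applies its own logic to compute relevancy scores; see "How highlighters work internally" in the official documentation for how the different highlighters find the best fragments.
phrase_limit: Controls the number of matching phrases in a document that are considered. This prevents the fvh highlighter from analyzing too many phrases and consuming too much memory. When using matched_fields, phrase_limit phrases per matched field are considered. Raising the limit increases query time and consumes more memory. Only supported by the fvh highlighter. Defaults to 256.
require_field_match: By default, only fields that contain a query match are highlighted. Set require_field_match to false to highlight all fields. Defaults to true.
tags_schema: Set to styled to use the built-in tag schema. The styled schema defines its own set of pre_tags and defines post_tags as </em>.
type: The highlighter to use: unified, plain, or fvh. Defaults to unified.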
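A toy Python model of fragment_size, number_of_fragments, and no_match_size, roughly in the spirit of the plain highlighter with the simple fragmenter. It cuts the text into equal character chunks and matches a single term by substring, both simplifications; simple_fragments is a hypothetical name, and its defaults mirror the documented ones (100, 5, 0).

```python
from typing import List


def simple_fragments(text: str, term: str, fragment_size: int = 100, number_of_fragments: int = 5,
                     no_match_size: int = 0, pre_tag: str = "<em>", post_tag: str = "</em>") -> List[str]:
    """Toy plain highlighter with the `simple` fragmenter (plain substring matching)."""

    def mark(s: str) -> str:
        return s.replace(term, f"{pre_tag}{term}{post_tag}")

    if number_of_fragments == 0:
        # Whole field highlighted as one piece; fragment_size is ignored.
        matching = [mark(text)] if term in text else []
    else:
        # Equal-sized chunks; a term cut by a chunk boundary is missed in this toy.
        chunks = [text[i:i + fragment_size] for i in range(0, len(text), fragment_size)]
        matching = [mark(c) for c in chunks if term in c][:number_of_fragments]
    if not matching and no_match_size > 0:
        return [text[:no_match_size]]  # no match: return the start of the field instead
    return matching


text = "red shirt. blue shirt. red hat."
print(simple_fragments(text, "red", fragment_size=10))
# ['<em>red</em> shirt.', 't. <em>red</em> hat']
print(simple_fragments(text, "red", fragment_size=10, number_of_fragments=1))
# ['<em>red</em> shirt.']
print(simple_fragments(text, "red", number_of_fragments=0))
# ['<em>red</em> shirt. blue shirt. <em>red</em> hat.']
print(simple_fragments(text, "green", fragment_size=10, no_match_size=9))
# ['red shirt']
```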
5.3. Highlighter types
Elasticsearch supports three highlighters: unified, plain, and fvh (fast vector highlighter). You can specify the highlighter type to use for each field.
unified
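The unified highlighter uses the Lucene Unified Highlighter. It breaks the text into sentences and uses the BM25 algorithm to score individual sentences as if they were documents in a corpus. It also supports accurate phrase and multi-term (fuzzy, prefix, regex) highlighting. This is the default highlighter.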
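To illustrate "scoring sentences with BM25 as if they were documents", here is a toy Python ranking. It uses the classic BM25 formula with Lucene's default k1 = 1.2 and b = 0.75 and a naive lowercase tokenizer; it is not the Lucene Unified Highlighter itself, and rank_sentences is a hypothetical name.

```python
import math
import re
from typing import List, Sequence, Tuple


def rank_sentences(text: str, query_terms: Sequence[str], k1: float = 1.2,
                   b: float = 0.75) -> List[Tuple[float, str]]:
    """Toy version of the idea: treat each sentence as a document and score it with BM25.

    query_terms are expected in lowercase, matching the naive tokenizer below.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    n = len(sentences)
    avgdl = sum(len(t) for t in tokenized) / n
    ranked = []
    for sentence, tokens in zip(sentences, tokenized):
        score = 0.0
        for term in query_terms:
            df = sum(1 for t in tokenized if term in t)  # sentences containing the term
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            tf = tokens.count(term)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avgdl))
        ranked.append((score, sentence))
    return sorted(ranked, key=lambda pair: pair[0], reverse=True)


text = ("Elasticsearch stores documents. The unified highlighter scores sentences with BM25. "
        "Cats sleep.")
best_score, best_sentence = rank_sentences(text, ["highlighter", "sentences"])[0]
print(best_sentence)  # The unified highlighter scores sentences with BM25.
```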
plain
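The plain highlighter uses the standard Lucene highlighter. It attempts to reflect the query matching logic in terms of understanding word importance and any word-positioning criteria in phrase queries.
The plain highlighter works best for highlighting simple query matches in a single field. To accurately reflect query logic, it creates a tiny in-memory index and re-runs the original query criteria through Lucene's query execution planner to get access to low-level match information for the current document. This is repeated for every field and every document that needs to be highlighted. If you want to highlight a lot of fields in a lot of documents with complex queries, we recommend using the unified highlighter on postings or term_vector fields.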
fvh
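The fvh highlighter uses the Lucene Fast Vector highlighter. It can be used on fields with term_vector set to with_positions_offsets in the mapping. The fvh highlighter:
- requires setting term_vector to with_positions_offsets, which increases the size of the index
- can combine matches from multiple fields into one result (see matched_fields)
- can assign different weights to matches at different positions, so that, for example, phrase matches are sorted above term matches when highlighting a boosting query that boosts phrase matches over term matches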