Elasticsearch mapping: document similarity algorithms


Elasticsearch allows you to configure a scoring algorithm, or similarity, per field. The similarity setting provides a simple way of choosing a similarity algorithm other than the default BM25, such as TF/IDF.

Similarities are mostly useful for text fields, but can also apply to other field types.

Custom similarities can be configured by tuning the parameters of the built-in similarities. For more details about these expert options, see the similarity module.
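As a rough illustration of tuning a built-in similarity, the sketch below builds an index-settings body that registers a BM25 variant with non-default k1 and b. The helper custom_bm25_settings and the similarity name my_bm25 are hypothetical names chosen for this example; the code only constructs the JSON body and does not contact a cluster.

```python
import json


def custom_bm25_settings(name, k1=1.2, b=0.75):
    """Build an index-settings body that registers a tuned BM25 similarity.

    `name` is an arbitrary label (hypothetical here) that field mappings
    can later reference through their "similarity" parameter.
    """
    return {
        "settings": {
            "index": {
                "similarity": {
                    name: {"type": "BM25", "k1": k1, "b": b}
                }
            }
        }
    }


# Example: a BM25 variant with stronger term-frequency weight
# and weaker length normalization than the defaults.
body = custom_bm25_settings("my_bm25", k1=1.5, b=0.6)
print(json.dumps(body, indent=2))
```

A field could then opt in to this similarity by setting "similarity": "my_bm25" in its mapping.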

The only similarities which can be used out of the box, without any further configuration, are:

BM25
The Okapi BM25 algorithm, used by default in Elasticsearch and Lucene. See Pluggable Similarity Algorithms for more information.
classic
The TF/IDF algorithm, which used to be the default in Elasticsearch and Lucene. See Lucene’s Practical Scoring Function for more information.
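To make the difference between the two similarities more concrete, here is a simplified, textbook-style sketch of how each might score a single term in a single document. This is not Lucene's actual implementation (which also handles norm encoding, query normalization, and other factors); the function names are hypothetical, and the formulas are the commonly cited forms with the default BM25 parameters k1 = 1.2 and b = 0.75.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """Commonly cited BM25 inverse document frequency."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Simplified BM25: term frequency saturates as freq grows."""
    length_factor = k1 * (1 - b + b * doc_len / avg_doc_len)
    return bm25_idf(doc_count, doc_freq) * freq * (k1 + 1) / (freq + length_factor)


def classic_term_score(freq, doc_len, doc_count, doc_freq):
    """Simplified TF/IDF: sqrt(tf) * idf * 1/sqrt(field length)."""
    tf = math.sqrt(freq)
    idf = 1 + math.log((doc_count + 1) / (doc_freq + 1))
    length_norm = 1 / math.sqrt(doc_len)
    return tf * idf * length_norm


# Compare how each score grows with term frequency (illustrative numbers).
for freq in (1, 10, 100):
    print(freq,
          round(bm25_term_score(freq, 100, 100, 1000, 10), 3),
          round(classic_term_score(freq, 100, 1000, 10), 3))
```

In this sketch the BM25 score approaches an upper bound of idf * (k1 + 1) as the term frequency grows, whereas the classic score keeps growing with the square root of the frequency.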

The similarity can be set on the field level when a field is first created, as follows:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "default_field": { "type": "text" },
        "classic_field": { "type": "text", "similarity": "classic" }
      }
    }
  }
}

The default_field uses the BM25 similarity.

The classic_field uses the classic similarity (i.e. TF/IDF).
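The same mapping body can also be generated programmatically. The sketch below is one possible way to do it; build_mapping is a hypothetical helper, not part of any Elasticsearch client, and it only builds the request body rather than sending it. A field mapped to None gets no similarity parameter and therefore falls back to the default.

```python
import json


def build_mapping(type_name, field_similarities):
    """Build a mapping body for text fields with optional per-field similarity.

    `field_similarities` maps field name -> similarity name, or None to
    leave the parameter out so the field uses the default similarity.
    """
    properties = {}
    for field, similarity in field_similarities.items():
        field_def = {"type": "text"}
        if similarity is not None:
            field_def["similarity"] = similarity
        properties[field] = field_def
    return {"mappings": {type_name: {"properties": properties}}}


# Reproduces the body of the PUT my_index request shown above.
body = build_mapping(
    "my_type",
    {"default_field": None, "classic_field": "classic"},
)
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of the PUT my_index request when the index is first created.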

 

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/similarity.html

