Elasticsearch mapping: document similarity algorithms


Elasticsearch allows you to configure a scoring algorithm, or similarity, per field. The similarity setting provides a simple way of choosing a similarity algorithm other than the default BM25, such as TF/IDF.

Similarities are mostly useful for text fields, but can also apply to other field types.

Custom similarities can be configured by tuning the parameters of the built-in similarities. For more details about these expert options, see the similarity module.
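
As a rough illustration of what such tuning can look like, the request below defines a custom similarity in the index settings and assigns it to a field. The names my_custom_index, my_bm25 and title are made up for this sketch, the b value of 0.5 is an arbitrary choice (the BM25 default is 0.75), and the my_type mapping level assumes the same pre-7.0 request format as the other example in this article.

```
PUT my_custom_index
{
  "settings": {
    "index": {
      "similarity": {
        "my_bm25": {
          "type": "BM25",
          "k1": 1.2,
          "b": 0.5
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "text",
          "similarity": "my_bm25"
        }
      }
    }
  }
}
```

Here k1 controls how quickly the contribution of term frequency saturates, and b controls how strongly field length normalizes the score, so a lower b penalizes long fields less.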

The only similarities which can be used out of the box, without any further configuration, are:

BM25
The Okapi BM25 algorithm, used by default in Elasticsearch and Lucene. See Pluggable Similarity Algorithms for more information.
classic
The TF/IDF algorithm, which used to be the default in Elasticsearch and Lucene. See Lucene’s Practical Scoring Function for more information.

The similarity can be set on the field level when a field is first created, as follows:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "default_field": {
          "type": "text"
        },
        "classic_field": {
          "type": "text",
          "similarity": "classic"
        }
      }
    }
  }
}

The default_field uses the BM25 similarity.

The classic_field uses the classic similarity (i.e. TF/IDF).
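
One way to check which similarity actually scores a field is to run a query with explain enabled and read the _explanation section of each hit. The request below is a minimal sketch, assuming my_index already holds some documents; the query text is a made-up placeholder.

```
GET my_index/_search
{
  "explain": true,
  "query": {
    "match": {
      "classic_field": "quick brown fox"
    }
  }
}
```

The exact wording of the explanation output differs between versions, so it is better treated as a diagnostic aid than as a stable format.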

 

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/similarity.html

