Elasticsearch Mapping: Document Similarity Algorithms

Reposted from the original article. 2017-02-27 11:00, tagged elasticsearch

Elasticsearch allows you to configure a scoring algorithm, or similarity, per field. The `similarity` setting provides a simple way of choosing a similarity algorithm other than the default BM25, such as TF/IDF.

Similarities are mostly useful for text fields, but can also apply to other field types.

Custom similarities can be configured by tuning the parameters of the built-in similarities. For more details about these expert options, see the similarity module.
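As a rough illustration of tuning a built-in similarity, the sketch below builds an index-creation body that defines a custom BM25 variant under `settings.index.similarity` and assigns it to one field. The names `build_index_body`, `my_bm25`, and `tuned_field`, and the values `k1=1.5` and `b=0.5`, are hypothetical choices made for this example; the `my_type` mapping level follows the style of the example in this article and may not apply to newer Elasticsearch versions.

```python
import json


def build_index_body(similarity_name, k1, b, field_name):
    """Return an index-creation body that defines a tuned BM25 similarity
    and assigns it to a single text field (all names are illustrative)."""
    return {
        "settings": {
            "index": {
                "similarity": {
                    # A custom similarity derived from the built-in BM25 type.
                    similarity_name: {"type": "BM25", "k1": k1, "b": b}
                }
            }
        },
        "mappings": {
            "my_type": {
                "properties": {
                    # The field refers to the custom similarity by name.
                    field_name: {"type": "text", "similarity": similarity_name}
                }
            }
        },
    }


body = build_index_body("my_bm25", k1=1.5, b=0.5, field_name="tuned_field")
print(json.dumps(body, indent=2))
```

The printed JSON could be used as the body of a `PUT my_index` request; whether these parameter values suit a given corpus is something to measure rather than assume.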

The only similarities that can be used out of the box, without any further configuration, are:

BM25: The Okapi BM25 algorithm. The algorithm used by default in Elasticsearch and Lucene. See Pluggable Similarity Algorithms for more information.
classic: The TF/IDF algorithm which used to be the default in Elasticsearch and Lucene. See Lucene’s Practical Scoring Function for more information.
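To make the difference between the two concrete, here is a simplified sketch of how each could score a single term in a single document. It uses the textbook BM25 formula with the commonly cited defaults `k1=1.2` and `b=0.75`, and a reduced form of the classic TF/IDF score that ignores query normalization, coordination, and boosts. The function names and the sample statistics are assumptions made for this example, and real Lucene scores will differ in detail.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer-than-average documents are penalized.
    length_part = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * freq * (k1 + 1) / (freq + length_part)


def classic_term_score(freq, doc_len, doc_count, doc_freq):
    """Simplified TF/IDF score: sqrt(tf) * idf^2 * 1/sqrt(length)."""
    tf = math.sqrt(freq)
    idf = 1 + math.log((doc_count + 1) / (doc_freq + 1))
    length_norm = 1 / math.sqrt(doc_len)
    return tf * idf * idf * length_norm


# Hypothetical corpus statistics, chosen only for illustration.
stats = dict(doc_len=100, doc_count=1000, doc_freq=50)

for freq in (1, 2, 4, 8, 16):
    bm25 = bm25_term_score(freq, avg_doc_len=100, **stats)
    classic = classic_term_score(freq, **stats)
    print(f"freq={freq:2d}  bm25={bm25:.3f}  classic={classic:.3f}")
```

In this sketch the BM25 score levels off as the term frequency grows, while the classic score keeps rising with the square root of the frequency. That saturation is one of the main practical differences between the two.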

The similarity can be set on the field level when a field is first created, as follows:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "default_field": {
          "type": "text"
        },
        "classic_field": {
          "type": "text",
          "similarity": "classic"
        }
      }
    }
  }
}

The `default_field` uses the `BM25` similarity.
The `classic_field` uses the `classic` similarity (i.e. TF/IDF).
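The fallback described in these two notes can be sketched as a small lookup over the request body: a field with no `similarity` key is treated as using the default. The helper `effective_similarity` and the constant `DEFAULT_SIMILARITY` are hypothetical names for this example. The code only inspects a local dictionary and does not query a cluster.

```python
# Default assumed from the text: fields without a "similarity" key use BM25.
DEFAULT_SIMILARITY = "BM25"


def effective_similarity(index_body, type_name, field_name):
    """Return the similarity a field would use according to the mapping body."""
    properties = index_body["mappings"][type_name]["properties"]
    return properties[field_name].get("similarity", DEFAULT_SIMILARITY)


# Same mapping as the PUT my_index request in this article.
index_body = {
    "mappings": {
        "my_type": {
            "properties": {
                "default_field": {"type": "text"},
                "classic_field": {"type": "text", "similarity": "classic"},
            }
        }
    }
}

for name in ("default_field", "classic_field"):
    print(name, "->", effective_similarity(index_body, "my_type", name))
```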

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/similarity.html
