Elasticsearch mapping: document similarity algorithms

Reposted article, 2017-02-27 11:00. Tag: elasticsearch

Elasticsearch allows you to configure a scoring algorithm, or similarity, per field. The `similarity` setting provides a simple way of choosing a similarity algorithm other than the default BM25, such as TF/IDF.

Similarities are mostly useful for text fields, but can also apply to other field types.

Custom similarities can be configured by tuning the parameters of the built-in similarities. For more details about these expert options, see the similarity module.
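As a rough illustration of tuning a built-in similarity, the sketch below builds an index-settings body that registers a BM25 variant with its own `k1` and `b` values. The helper `custom_bm25_settings` and the similarity name `my_bm25` are hypothetical, chosen only for this example; the `index.similarity` settings layout and the parameter names are taken from the similarity module as I understand it, so check them against your Elasticsearch version.

```python
import json


def custom_bm25_settings(name, k1=1.2, b=0.75):
    """Build a settings body that registers a tuned BM25 similarity.

    `name` is a hypothetical label; fields would refer to it through
    their `similarity` mapping parameter.
    """
    return {
        "settings": {
            "index": {
                "similarity": {
                    name: {"type": "BM25", "k1": k1, "b": b}
                }
            }
        }
    }


# Assumed values, picked only to show the shape of the body.
body = custom_bm25_settings("my_bm25", k1=1.5, b=0.6)
print(json.dumps(body, indent=2))
```

A field would then opt in with `"similarity": "my_bm25"` in its mapping, in the same way a field opts in to a built-in similarity.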

The only similarities which can be used out of the box, without any further configuration, are:

BM25: The Okapi BM25 algorithm. The algorithm used by default in Elasticsearch and Lucene. See Pluggable Similarity Algorithms for more information.
classic: The TF/IDF algorithm which used to be the default in Elasticsearch and Lucene. See Lucene’s Practical Scoring Function for more information.
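To give a feel for how the two similarities differ, here is a simplified, textbook-style sketch of a single-term score under each. It is not Lucene's exact implementation (which adds query normalization, boosts, and encoded field norms); the function names and sample statistics are assumptions made for illustration.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Simplified BM25 score for one term in one document."""
    # Rare terms (small doc_freq) get a larger inverse document frequency.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates: the fraction approaches (k1 + 1) as freq grows.
    return idf * freq * (k1 + 1) / (freq + norm)


def classic_term_score(freq, doc_len, doc_count, doc_freq):
    """Simplified TF/IDF score for one term in one document."""
    tf = math.sqrt(freq)                         # keeps growing with freq
    idf = 1 + math.log(doc_count / (doc_freq + 1))
    length_norm = 1 / math.sqrt(doc_len)         # shorter fields score higher
    return tf * idf * length_norm


# Hypothetical corpus: 1000 documents, the term appears in 10 of them,
# and the scored document has exactly the average length.
for freq in (1, 10, 100):
    print(freq,
          round(bm25_term_score(freq, 100, 100, 1000, 10), 3),
          round(classic_term_score(freq, 100, 1000, 10), 3))
```

In this sketch the BM25 score levels off as the term frequency rises, while the TF/IDF score keeps growing with the square root of the frequency. That saturation is one of the practical differences between the two.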

The similarity can be set on the field level when a field is first created, as follows:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "default_field": {
          "type": "text"
        },
        "classic_field": {
          "type": "text",
          "similarity": "classic"
        }
      }
    }
  }
}


	The `default_field` uses the `BM25` similarity, because no `similarity` is set on it.
	The `classic_field` uses the `classic` similarity (i.e. TF/IDF).
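The request body above can also be assembled programmatically. The following is a minimal sketch that only builds and prints the JSON; it does not talk to a cluster. The helpers `text_field` and `build_mapping` are hypothetical names, and the layout assumes the same typed mapping (`my_type`) used in the example.

```python
import json


def text_field(similarity=None):
    """Return a text field definition, optionally with a similarity."""
    field = {"type": "text"}
    if similarity is not None:
        # Omitting the key leaves the field on the default similarity.
        field["similarity"] = similarity
    return field


def build_mapping(type_name, fields):
    """Build a mapping body from {field_name: similarity_or_None}."""
    properties = {name: text_field(sim) for name, sim in fields.items()}
    return {"mappings": {type_name: {"properties": properties}}}


body = build_mapping("my_type", {
    "default_field": None,        # falls back to the default similarity
    "classic_field": "classic",   # explicitly uses TF/IDF
})
print(json.dumps(body, indent=2))
```

The printed JSON matches the body of the `PUT my_index` request and could be sent with any HTTP client.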

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/similarity.html
