Elasticsearch: The Difference Between the edge_ngram and ngram Tokenizers

This article is a repost. 2020-11-16 20:18. Category: big data.

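edge_ngram and ngram are two tokenizers that ship with Elasticsearch and are commonly used when defining an index's settings and mappings. Once the gram lengths (min_gram and max_gram) are set, the tokenizer can be assigned directly to the tokenizer field of an analyzer.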
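As a rough illustration of that wiring, the sketch below builds an index-creation request body in Python. The names demo_tokenizer, demo_analyzer, the title field and the helper ngram_index_settings are hypothetical and chosen only for this example. The sketch also assumes a recent Elasticsearch version (7.x or later), where the ngram tokenizer requires index.max_ngram_diff to be raised whenever max_gram minus min_gram exceeds 1.

```python
import json


def ngram_index_settings(tokenizer_type, min_gram=1, max_gram=3):
    """Build an index-creation body that plugs an n-gram tokenizer into a custom analyzer.

    All names (demo_tokenizer, demo_analyzer, title) are hypothetical.
    """
    if tokenizer_type not in ("edge_ngram", "ngram"):
        raise ValueError("tokenizer_type must be 'edge_ngram' or 'ngram'")

    settings = {
        "analysis": {
            "tokenizer": {
                "demo_tokenizer": {
                    "type": tokenizer_type,
                    "min_gram": min_gram,
                    "max_gram": max_gram,
                }
            },
            "analyzer": {
                "demo_analyzer": {
                    "type": "custom",
                    # The analyzer's tokenizer field points at the tokenizer above.
                    "tokenizer": "demo_tokenizer",
                }
            },
        }
    }

    # Assumption: Elasticsearch 7.x or later, where the ngram tokenizer is
    # limited by index.max_ngram_diff (default 1).
    if tokenizer_type == "ngram" and max_gram - min_gram > 1:
        settings["index"] = {"max_ngram_diff": max_gram - min_gram}

    mappings = {
        "properties": {
            "title": {"type": "text", "analyzer": "demo_analyzer"}
        }
    }
    return {"settings": settings, "mappings": mappings}


if __name__ == "__main__":
    print(json.dumps(ngram_index_settings("ngram"), ensure_ascii=False, indent=2))
```

The printed JSON could be sent as the body of an index-creation request. The gram lengths of 1 and 3 are an assumption that matches the sample output in this article, not a recommended default.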
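Here we use the same input for every tokenization example, the three-character Chinese word 字符串 ("string"):
字符串

With the edge_ngram tokenizer (min_gram 1 and max_gram 3, as the output implies), the result is: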
{
  "tokens": [
    {
      "token": "字",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "字符",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "字符串",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 2
    }
  ]
}
With the ngram tokenizer and the same gram lengths, the result is:
{
  "tokens": [
    {
      "token": "字",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "字符",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "字符串",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 2
    },
    {
      "token": "符",
      "start_offset": 1,
      "end_offset": 2,
      "type": "word",
      "position": 3
    },
    {
      "token": "符串",
      "start_offset": 1,
      "end_offset": 3,
      "type": "word",
      "position": 4
    },
    {
      "token": "串",
      "start_offset": 2,
      "end_offset": 3,
      "type": "word",
      "position": 5
    }
  ]
}
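The two outputs make the difference clear. Put simply, the edge_ngram tokenizer anchors every token at the first character and grows it one character at a time, from min_gram up to max_gram or the end of the text. The ngram tokenizer does the same thing from every character position in turn, not only the first.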
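The rule can be sketched in a few lines of Python. This is a simplified model, not the Elasticsearch implementation: the function names edge_ngrams and ngrams are made up for this example, and the whole input is treated as a single token (which is how both tokenizers behave when token_chars is left at its default).

```python
def edge_ngrams(text, min_gram=1, max_gram=3):
    """Return the prefixes of text with lengths from min_gram to max_gram."""
    longest = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, longest + 1)]


def ngrams(text, min_gram=1, max_gram=3):
    """Return every substring with a length from min_gram to max_gram,
    ordered by start position and then by length."""
    grams = []
    for start in range(len(text)):
        for n in range(min_gram, max_gram + 1):
            if start + n > len(text):
                break  # the gram would run past the end of the text
            grams.append(text[start:start + n])
    return grams


if __name__ == "__main__":
    print(edge_ngrams("字符串"))  # ['字', '字符', '字符串']
    print(ngrams("字符串"))       # ['字', '字符', '字符串', '符', '符串', '串']
```

Both lists come out in the same order as the token arrays shown earlier: three tokens for edge_ngram and six for ngram.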
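Which one should you use? If a match must start at the first character (prefix matching), edge_ngram is the natural choice. If a match may start at any character in the text, ngram is the better fit.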
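A toy model of that choice, assuming the query string is looked up as one whole term against the indexed grams (in Elasticsearch this would typically mean using a different analyzer at search time than at index time). The helpers indexed_grams and matches are hypothetical names for this sketch.

```python
def indexed_grams(text, min_gram, max_gram, edge_only):
    """Collect the grams that would be indexed for text.

    edge_only=True models edge_ngram (grams start at position 0 only);
    edge_only=False models ngram (grams start at every position).
    """
    starts = [0] if edge_only else range(len(text))
    return {
        text[s:s + n]
        for s in starts
        for n in range(min_gram, max_gram + 1)
        if s + n <= len(text)
    }


def matches(query, text, edge_only, min_gram=1, max_gram=3):
    """True if query, taken as a single term, equals one of the indexed grams."""
    return query in indexed_grams(text, min_gram, max_gram, edge_only)


if __name__ == "__main__":
    print(matches("字符", "字符串", edge_only=True))   # prefix of the text
    print(matches("符串", "字符串", edge_only=True))   # starts mid-text
    print(matches("符串", "字符串", edge_only=False))  # starts mid-text
```

In this model the prefix query matches under both tokenizers, while the query that starts in the middle of the text matches only under ngram. A query longer than max_gram matches neither, because no gram of that length is ever indexed.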
