Elasticsearch: Smart Chinese Analysis plugin

Reposted article. 2019-12-24 10:09, ELK Stack

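The Smart Chinese Analysis plugin integrates Lucene's Smart Chinese analysis module into Elasticsearch for analyzing Chinese or mixed Chinese-English text. The analyzer it provides uses probabilistic knowledge, derived from a large training corpus with a hidden Markov model, to find the optimal word segmentation of Simplified Chinese text. Its strategy is to first break the input text into sentences and then segment each sentence into words. The plugin provides an analyzer named smartcn and a tokenizer named smartcn_tokenizer. Note that neither of them can be configured with any parameters.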
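To make the two-stage strategy concrete, here is a minimal Python sketch: it first splits the text into sentences, then picks the most probable segmentation of each sentence by dynamic programming. This is not the plugin's implementation. It swaps in a much simpler per-word probability model for the hidden Markov model, and the `WORD_PROB` table, the `UNKNOWN_CHAR_PROB` fallback, and all function names are hypothetical values chosen for illustration, using words from this article's example.

```python
import math
import re

# Hypothetical toy probabilities for illustration only; the real analyzer
# relies on statistics learned from a large training corpus.
WORD_PROB = {
    "如何": 0.01, "做": 0.01, "好": 0.01,
    "情緒": 0.005, "管理": 0.01, "股市": 0.005, "投資": 0.01,
}
UNKNOWN_CHAR_PROB = 1e-8  # fallback so any single character can be emitted


def split_sentences(text):
    """Stage 1: break the input into sentences at punctuation or whitespace."""
    return [s for s in re.split(r"[，。！？；,.!?;\s]+", text) if s]


def segment_sentence(sentence, max_word_len=4):
    """Stage 2: choose the most probable split of one sentence into words."""
    n = len(sentence)
    # best[i] = (best log-probability of sentence[:i], start of its last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = sentence[start:end]
            prob = WORD_PROB.get(word)
            if prob is None:
                if len(word) > 1:
                    continue  # unknown multi-character strings are not words
                prob = UNKNOWN_CHAR_PROB
            score = best[start][0] + math.log(prob)
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end to recover the words in order.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(sentence[start:end])
        end = start
    return words[::-1]


def segment(text):
    """Run both stages and return one word list per sentence."""
    return [segment_sentence(s) for s in split_sentences(text)]


print(segment("如何做好情緒管理。股市投資"))
```

Because every single character is allowed as a low-probability fallback, the sketch always produces some segmentation, and known dictionary words win whenever they cover the same characters.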

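To install the smartcn analysis plugin, including inside an Elasticsearch Docker container, use the following command. Afterwards, restart Elasticsearch (or the container) so that the plugin takes effect: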

    ./bin/elasticsearch-plugin install analysis-smartcn

Run the command above from the Elasticsearch installation directory. The output looks like this:

    $ ./bin/elasticsearch-plugin install analysis-smartcn
    -> Downloading analysis-smartcn from elastic
    [=================================================] 100%   
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by org.bouncycastle.jcajce.provider.drbg.DRBG (file:/Users/liuxg/elastic/elasticsearch-7.3.0/lib/tools/plugin-cli/bcprov-jdk15on-1.61.jar) to constructor sun.security.provider.Sun()
    WARNING: Please consider reporting this to the maintainers of org.bouncycastle.jcajce.provider.drbg.DRBG
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    -> Installed analysis-smartcn
    (base) localhost:elasticsearch-7.3.0 liuxg$ ./bin/elasticsearch-plugin list
    analysis-icu
    analysis-ik
    analysis-smartcn
    pinyin

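The output above shows that analysis-smartcn was installed successfully. For a Docker deployment, we can enter the container with the following command and then run the installation there: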

    $ docker exec -it es01 /bin/bash
    [root@ec4d19f59a7d elasticsearch]# ls
    LICENSE.txt  README.textile  config  jdk  logs     plugins
    NOTICE.txt   bin             data    lib  modules
    [root@ec4d19f59a7d elasticsearch]#

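Here es01 is the Elasticsearch instance running in Docker. For setup details, see my article “Elastic: Deploying the Elastic Stack with Docker” (Elastic：用Docker部署Elastic棧).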

Note: after installing the smartcn analyzer, we must restart Elasticsearch for it to take effect.
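After the restart, it can be worth confirming that each node actually loaded the plugin. The sketch below uses the `_cat/plugins` API with `format=json` for that. It assumes an unsecured cluster reachable at `http://localhost:9200`. The function names are our own, and the sample rows (including the node name `es01`) are hypothetical data in the shape that API returns, so the filtering logic can be tried without a running cluster.

```python
import json
import urllib.request


def fetch_plugins(base_url="http://localhost:9200"):
    """Return the rows of GET _cat/plugins as a list of dicts.

    Assumes the cluster is reachable without authentication.
    """
    url = base_url + "/_cat/plugins?format=json&h=name,component,version"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def nodes_with_plugin(rows, plugin="analysis-smartcn"):
    """Names of the nodes that report the given plugin as installed."""
    return sorted({row["name"] for row in rows if row.get("component") == plugin})


# Hypothetical offline sample in the shape returned by _cat/plugins?format=json.
sample_rows = [
    {"name": "es01", "component": "analysis-icu", "version": "7.3.0"},
    {"name": "es01", "component": "analysis-smartcn", "version": "7.3.0"},
]
print(nodes_with_plugin(sample_rows))
```

Against a live cluster, `nodes_with_plugin(fetch_plugins())` should list every node. A node missing from the result either lacks the plugin or has not been restarted since the installation.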

Example

Below, we use an example in Kibana to show how the analyzer is used:

    POST _analyze 
    {
      "text": "股市，投資，穩，賺，不，賠，必修課，如何，做，好，倉，位，管理，和，情緒，管理",
      "analyzer": "smartcn"
    }

The result:

    {
      "tokens" : [
        {
          "token" : "股市",
          "start_offset" : 0,
          "end_offset" : 2,
          "type" : "word",
          "position" : 0
        },
        {
          "token" : "投資",
          "start_offset" : 3,
          "end_offset" : 5,
          "type" : "word",
          "position" : 2
        },
        {
          "token" : "穩",
          "start_offset" : 6,
          "end_offset" : 7,
          "type" : "word",
          "position" : 4
        },
        {
          "token" : "賺",
          "start_offset" : 8,
          "end_offset" : 9,
          "type" : "word",
          "position" : 6
        },
        {
          "token" : "不",
          "start_offset" : 10,
          "end_offset" : 11,
          "type" : "word",
          "position" : 8
        },
        {
          "token" : "賠",
          "start_offset" : 12,
          "end_offset" : 13,
          "type" : "word",
          "position" : 10
        },
        {
          "token" : "必修課",
          "start_offset" : 14,
          "end_offset" : 17,
          "type" : "word",
          "position" : 12
        },
        {
          "token" : "如何",
          "start_offset" : 18,
          "end_offset" : 20,
          "type" : "word",
          "position" : 14
        },
        {
          "token" : "做",
          "start_offset" : 21,
          "end_offset" : 22,
          "type" : "word",
          "position" : 16
        },
        {
          "token" : "好",
          "start_offset" : 23,
          "end_offset" : 24,
          "type" : "word",
          "position" : 18
        },
        {
          "token" : "倉",
          "start_offset" : 25,
          "end_offset" : 26,
          "type" : "word",
          "position" : 20
        },
        {
          "token" : "位",
          "start_offset" : 27,
          "end_offset" : 28,
          "type" : "word",
          "position" : 22
        },
        {
          "token" : "管理",
          "start_offset" : 29,
          "end_offset" : 31,
          "type" : "word",
          "position" : 24
        },
        {
          "token" : "和",
          "start_offset" : 32,
          "end_offset" : 33,
          "type" : "word",
          "position" : 26
        },
        {
          "token" : "情緒",
          "start_offset" : 34,
          "end_offset" : 36,
          "type" : "word",
          "position" : 28
        },
        {
          "token" : "管理",
          "start_offset" : 37,
          "end_offset" : 39,
          "type" : "word",
          "position" : 30
        }
      ]
    }
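The same request can also be sent from a script instead of Kibana. The following is a minimal Python sketch, assuming an unsecured Elasticsearch at `http://localhost:9200`. The helper names `analyze` and `summarize` are our own, and `sample` simply repeats the first two tokens of the response shown above so that the parsing step can be run without a cluster.

```python
import json
import urllib.request


def analyze(text, analyzer="smartcn", base_url="http://localhost:9200"):
    """POST to the _analyze API and return the parsed JSON response.

    Assumes the cluster is reachable without authentication.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize(response):
    """Return (token, start_offset, end_offset, position) tuples."""
    return [
        (t["token"], t["start_offset"], t["end_offset"], t["position"])
        for t in response["tokens"]
    ]


# First two tokens copied from the response shown above.
sample = {
    "tokens": [
        {"token": "股市", "start_offset": 0, "end_offset": 2,
         "type": "word", "position": 0},
        {"token": "投資", "start_offset": 3, "end_offset": 5,
         "type": "word", "position": 2},
    ]
}
for row in summarize(sample):
    print(row)
```

Note that the positions in the response advance by two between neighboring tokens. This is presumably because the commas between the words are analyzed and then dropped, which leaves a gap at each of their positions.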
