A Q&A System Based on Spring Boot, Part 1: An elasticsearch 7.2 Hello World

Reposted article. 2019-11-01 09:21. Tags: Java / Q&A system / es

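I haven't written code in a while, so I recently decided to build a small practice system on spring boot + vue + elasticsearch + NLP (semantic relevance). Later on it could serve as a prototype for a chatbot, a customer-service system, and so on.

So here is the first article: a hello world introduction to elasticsearch.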

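I. Install es

Goal: install a single-node es locally to play with

1. Download es

At the time of writing, the latest download on the official site is: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.2.0-linux-x86_64.tar.gz
After downloading, unpack it into a directory, for example your development directory: your_path/elasticsearch

2. Edit the configuration file

a. Configuration file path: config/elasticsearch.yml
b. Change the following settings to your own values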

# Use a descriptive name for your cluster:
# Cluster name
cluster.name: my-ces
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Node name
node.name: ces-node-1
# Path to directory where to store the data (separate multiple locations by comma):
# Where es stores its data
# (YAML does not expand ~ the way a shell does, so an absolute path is the safer choice)
path.data: ~/es/data
#
# Path to log files:
# Where es writes its runtime logs
path.logs: ~/es/logs
# Set the bind address to a specific IP (IPv4 or IPv6):
# Bind to the local machine only
network.host: _local_
#
# Set a custom port for HTTP:
# Listening port
http.port: 9200

3. Run and test

a. Run bin/elasticsearch.
b. Open a browser and go to localhost:9200. If it shows the following, the node is up.

{
  "name" : "ces-node-1",//the node name you configured
  "cluster_name" : "my-ces",//the cluster name you configured
  "cluster_uuid" : "6XOfx0eQReG3iMKek9hdTA",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
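The same check can be done from a script instead of a browser. The sketch below uses only the standard library and assumes the node listens on http://localhost:9200 as configured above; the function names fetch_node_info and node_matches are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_node_info(base_url="http://localhost:9200"):
    """GET the root endpoint of the node and decode its JSON body."""
    with urlopen(base_url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def node_matches(info, node_name, cluster_name):
    """True when the response carries the names set in elasticsearch.yml."""
    return (info.get("name") == node_name
            and info.get("cluster_name") == cluster_name)
```

With the node running, node_matches(fetch_node_info(), "ces-node-1", "my-ces") should come back True if the configuration file was picked up.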

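4. Install the ik plugin and test it

What is ik

ik is an analysis (word segmentation) plugin. To search Chinese data with es, you need to install it.

Installation

Follow the instructions at https://github.com/medcl/elasticsearch-analysis-ik to install and test it.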
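One quick way to confirm the plugin is loaded is to send a sentence through the _analyze API and look at the tokens. This is a minimal sketch, assuming the 7.x Python client (installed in the next section) and an already constructed client object passed in as es; ik_tokens is a hypothetical helper name.

```python
def ik_tokens(es, text, analyzer="ik_max_word"):
    """Run `text` through an analyzer via the _analyze API and return the tokens."""
    resp = es.indices.analyze(body={"analyzer": analyzer, "text": text})
    # Each entry also carries offsets and a type; only the token text is kept here.
    return [t["token"] for t in resp["tokens"]]
```

If the plugin is missing, the request is expected to fail with an error about an unknown analyzer rather than return tokens.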

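II. Create the index

First, an index is roughly like a table in a MySQL database. To store data in es, you first need to create an index in es.

Many online tutorials teach index creation through the raw HTTP API, which often leaves readers unfamiliar with es or HTTP confused. Today I'll show how to do it with the es Python package.

Install the Python elasticsearch package

pip install elasticsearch
(The code below uses the 7.x client API. With a 7.2 server, pinning the client to the same major version, for example pip install "elasticsearch>=7,<8", keeps the two compatible.)

Define mapping.json

Its purpose is to define what the index looks like: which fields are searchable and which are not. Suppose we have this question-and-answer pair:

question: 世界上最高的山峰是什么 (What is the highest mountain in the world?)
answer: 當然是珠峰了 (Mount Everest, of course)

We want to search it with es and build a Q&A bot, so we define the following index structure: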

{
    "settings":{
        "number_of_shards":2,  //can be ignored for now
        "number_of_replicas":1
    },
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "question": {//needs to be indexed (searchable)
                "type": "text",
                "analyzer": "ik_max_word",//ik analyzer used at index time
                "search_analyzer": "ik_smart",//ik analyzer used at search time
                "index": true,
                "boost": 8
            },
            "answer": {
                "type": "text",
                "index": false
            }
        }
    }
}

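Save it as es_index_mapping.json. Remove the // comments first: they are only annotations for this article, and json.load rejects them.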
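Since a JSON file cannot carry // comments, one option is to keep the annotated mapping as a Python dict and generate the file from it. The sketch below mirrors the structure above; INDEX_MAPPING, mapping_as_json, and write_mapping are illustrative names, not part of any library.

```python
import json

# Same structure as the mapping above, with the notes as Python comments.
INDEX_MAPPING = {
    "settings": {"number_of_shards": 2, "number_of_replicas": 1},
    "mappings": {
        "dynamic": "strict",  # reject documents with fields not listed here
        "properties": {
            "question": {  # searchable field
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart",
                "index": True,
                "boost": 8,
            },
            "answer": {"type": "text", "index": False},  # stored, not searchable
        },
    },
}


def mapping_as_json():
    """Serialize the mapping to a comment-free JSON string."""
    return json.dumps(INDEX_MAPPING, indent=4)


def write_mapping(file_path="es_index_mapping.json"):
    """Write the mapping to disk so that json.load can read it back later."""
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(mapping_as_json())
    return file_path
```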

Create the index

With the Python es client this is simple. Here is the code:

from elasticsearch import Elasticsearch
from elasticsearch import helpers
from common.conf import ServiceConfig  # project-local config module (not part of the es package)
import os.path as path
import json


class EsDriver:
    def __init__(self):
        self.service_conf = ServiceConfig()
        # hosts is effectively: [{"host": "localhost", "port": 9200}]
        self.es = Elasticsearch(hosts=self.service_conf.get_es_hosts())

    def create_index(self, index_name):
        # Project root: the parent of the directory that contains this file
        dir_root = path.normpath("%s/.." % path.dirname(path.abspath(__file__)))
        with open(dir_root + "/data/es_index_mapping.json", 'r', encoding='utf-8') as json_file:
            index_mapping_json = json.load(json_file)
        # Call indices.create with an index name of your choosing; the index is then created
        return self.es.indices.create(index=index_name, body=index_mapping_json)
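Calling indices.create a second time for the same name fails because the index already exists. A small guard avoids that on repeated runs. This is a sketch assuming the 7.x client; ensure_index is a hypothetical helper that takes the client and the loaded mapping as arguments.

```python
def ensure_index(es, index_name, index_mapping):
    """Create the index only when it is missing; return True if it was created."""
    if es.indices.exists(index=index_name):
        return False
    es.indices.create(index=index_name, body=index_mapping)
    return True
```

Inside EsDriver this could be called as ensure_index(self.es, index_name, index_mapping_json) in place of the direct create call.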

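III. Bulk-import data

With the index created, the next step is to import data. The Python es package supports bulk import, and it only takes a few lines of code.

Suppose you have a file qa.processed.txt in this format:
query\t['answer1','answer2'], for example
你開心嗎\t["很開心"]
(question: "Are you happy?", answer: "Very happy")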

class EsDriver:

    ...

    def bulk_insert(self, index_name, bulk_size=500):
        doc_list = []

        with open('/data/qa.processed.txt', 'r', encoding='utf-8') as qa_file:
            for line in qa_file:
                ls = line.strip().split('\t')
                # Skip lines that are not exactly "question<TAB>answers"
                if len(ls) != 2:
                    continue

                doc_list.append({
                    "_index": index_name,  # which index to insert into
                    "_type": "_doc",
                    "_source": {
                        "question": ls[0],  # query
                        "answer": ls[1]  # answer (kept as the raw string from the file)
                    }
                })
                if len(doc_list) >= bulk_size:
                    # Flush a full batch into the index with the es bulk helper
                    helpers.bulk(self.es, doc_list, stats_only=True)
                    del doc_list[:]
        # Flush whatever is left over after the last full batch
        if len(doc_list) != 0:
            helpers.bulk(self.es, doc_list)

        print("bulk insert done")

After running the above, the data pours into es.
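As a variation on the method above, the batching can be left to helpers.bulk itself by feeding it a generator and setting chunk_size. The sketch below also parses the answers column into a real list with ast.literal_eval, which is an assumption on my part: the original code stores the raw string. parse_qa_line, iter_actions, and bulk_insert_streaming are illustrative names.

```python
import ast


def parse_qa_line(line):
    """Split one tab-separated line into (question, answers), or None if malformed."""
    parts = line.strip().split("\t")
    if len(parts) != 2:
        return None
    try:
        answers = ast.literal_eval(parts[1])  # accepts ['a'] and ["a"] alike
    except (ValueError, SyntaxError, TypeError):
        return None
    if not isinstance(answers, list):
        return None
    return parts[0], [str(a) for a in answers]


def iter_actions(lines, index_name):
    """Lazily yield one bulk action per well-formed line."""
    for line in lines:
        parsed = parse_qa_line(line)
        if parsed is None:
            continue
        question, answers = parsed
        yield {"_index": index_name,
               "_source": {"question": question, "answer": answers}}


def bulk_insert_streaming(es, file_path, index_name, bulk_size=500):
    """Stream the file into es; returns (success_count, error_count)."""
    # Lazy import: the parsing helpers above work without the package installed.
    from elasticsearch import helpers

    with open(file_path, "r", encoding="utf-8") as qa_file:
        return helpers.bulk(es, iter_actions(qa_file, index_name),
                            chunk_size=bulk_size, stats_only=True)
```

Because the actions are generated lazily, the whole file never has to sit in memory, and malformed lines are skipped the same way the original loop skips them.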

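IV. Search

Once the data is imported, we want to search it. The search function in the es package handles that as well. Say you want to search for: 你好 (hello)

So how is the code written?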

class EsDriver:

    ...

    def search(self, query, index_name):
        return self.es.search(index=index_name, body={
            "query": {
                "match": {
                    "question": query
                }
            }
        })

Print the returned result to see what the response looks like.
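For orientation, the matching documents sit under hits.hits in the response, each with a relevance _score and the stored _source. A minimal sketch for pulling out the best matches follows; top_answers is a hypothetical helper and the field names come from the mapping defined earlier.

```python
def top_answers(response, limit=3):
    """Return (score, question, answer) tuples for the best-scoring hits."""
    hits = response.get("hits", {}).get("hits", [])
    return [
        (hit["_score"], hit["_source"]["question"], hit["_source"]["answer"])
        for hit in hits[:limit]
    ]
```

es returns hits sorted by score in descending order by default, so the first tuple is the closest match to the query.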

Appendix: a few common status operations

Index status

curl -X GET "localhost:9200/_cat/indices?v&pretty"

Cluster health

curl -X GET "localhost:9200/_cat/health?v&pretty"

Index mapping & settings

curl -X GET "localhost:9200/customer?pretty"
Here customer is the index name

Fetch a document in an index by id

curl -X GET "localhost:9200/customer/_doc/1?pretty"
Here customer is the index name
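The four curl calls above have counterparts in the Python client. The sketch below gathers them in one place, assuming the 7.x client and an existing client object passed in as es; status_report is a made-up name.

```python
def status_report(es, index_name, doc_id):
    """Collect the four status lookups into a single dict."""
    return {
        "indices": es.cat.indices(v=True),          # GET _cat/indices?v
        "health": es.cat.health(v=True),            # GET _cat/health?v
        "index": es.indices.get(index=index_name),  # GET <index>
        "doc": es.get(index=index_name, id=doc_id), # GET <index>/_doc/<id>
    }
```

The two cat calls return plain-text tables meant for reading, while the other two return parsed JSON as Python dicts.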

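Upcoming articles: offline dataset processing (building features, loading into es) and setting up the Java project
If you're interested, feel free to add the author on WeChat (vx) to chat: crazy042438, and build it together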
