Elasticsearch (Full-Text Search)

Reposted article. 2020-05-26 05:26. Category: Operations

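Preface

Once a large volume of log data has been collected, where should it be stored so that its contents can be searched? MySQL?

Suppose MySQL holds 10 million such records, and the details field of each record contains 128 characters.

A user wants to find the records whose details field contains the keyword "ajax".

MySQL would run: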

select * from logtable where details like "%ajax%";

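Because the pattern starts with a wildcard, an ordinary index cannot help. Every execution of this statement has to visit each record in logtable one by one, and, worse, it then has to scan the full text of the details field of each record it visits

to decide whether that field contains "ajax". In the worst case this is on the order of 10 million × 128 character comparisons.

If the user misspells "ajax" as "ajxa", this SQL statement cannot find what the user wants, because it makes no attempt to split the misspelled input "ajxa" into 'a', 'j', 'x', 'a' and match as much relevant content as possible.

So if the Text content of the details field must be searchable, storing massive volumes of log data in MySQL is not a good fit.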
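To make the cost concrete, here is a minimal Go sketch of what a leading-wildcard LIKE amounts to: a substring check on every row. The sample rows and the function name scanLike are hypothetical and only stand in for logtable.details. It also shows that a misspelled keyword simply matches nothing.

```go
package main

import (
	"fmt"
	"strings"
)

// scanLike mimics `details LIKE '%keyword%'` without a usable index:
// it visits every row and searches inside the row's text.
func scanLike(rows []string, keyword string) []int {
	var matched []int
	for i, details := range rows {
		if strings.Contains(details, keyword) {
			matched = append(matched, i)
		}
	}
	return matched
}

func main() {
	// Hypothetical sample rows standing in for logtable.details.
	rows := []string{
		"GET /api/ajax/list 200",
		"POST /login 302",
		"ajax request timed out",
	}
	fmt.Println(scanLike(rows, "ajax")) // rows 0 and 2 match
	fmt.Println(scanLike(rows, "ajxa")) // the typo matches no row
}
```

The work grows with the number of rows times the length of each row, which is exactly the problem described above.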

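Introduction to Elasticsearch

1. Inverted index

An inverted index is an index data structure. The text is tokenized and its distinct words are extracted; each word gets an ID that identifies it, plus a list of the documents in which the word occurs. Together this information forms the index.

The inverted index also records the positions at which the word occurs in each document and how often it occurs (count/TF). This is used to locate documents quickly and to rank search results.

Each entry has the form (document ID, <positions>, frequency). For example, (1,<11>,1) means the word occurs in document 1, at position 11, once:
(1,<11>,1),(2,<7>,1),(3,<3,9>,2)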
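The structure described above can be sketched in a few lines of Go. This is only an illustration under simple assumptions: tokens are lowercased, whitespace-separated words, positions are word offsets starting at 0, and the names Posting and BuildIndex are made up for this example. Lucene's real on-disk format is far more elaborate.

```go
package main

import (
	"fmt"
	"strings"
)

// Posting records where one word occurs inside one document.
type Posting struct {
	DocID     int
	Positions []int // word offsets inside the document, starting at 0
	Freq      int   // term frequency (TF), equal to len(Positions)
}

// BuildIndex maps every distinct word to its posting list.
// Document IDs are 1-based: docs[0] is document 1.
func BuildIndex(docs []string) map[string][]Posting {
	index := make(map[string][]Posting)
	for i, doc := range docs {
		docID := i + 1
		for pos, word := range strings.Fields(strings.ToLower(doc)) {
			list := index[word]
			if n := len(list); n > 0 && list[n-1].DocID == docID {
				// The word was already seen in this document.
				list[n-1].Positions = append(list[n-1].Positions, pos)
				list[n-1].Freq++
			} else {
				list = append(list, Posting{DocID: docID, Positions: []int{pos}, Freq: 1})
			}
			index[word] = list
		}
	}
	return index
}

func main() {
	docs := []string{
		"ajax request failed",
		"retry the ajax call",
		"ajax error after ajax timeout",
	}
	for _, p := range BuildIndex(docs)["ajax"] {
		fmt.Printf("(%d,%v,%d)\n", p.DocID, p.Positions, p.Freq)
	}
}
```

Looking up "ajax" is now a single map access that returns the documents, positions and frequencies directly, with no scan over the documents themselves.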

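2. Full-text search

Full-text search: the keywords entered by the user are tokenized as well, and the inverted index is used to quickly pin down which documents contain them.

Put simply, it looks up the key by the value (given a keyword from a document's content, find the documents that contain it) rather than looking up the value by the key.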
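A rough Go sketch of that lookup direction follows. It assumes whitespace tokenization and a deliberately crude ranking (documents matching more query tokens come first); the names tokenize, buildIndex and Search are hypothetical, and real engines score with TF/IDF-style formulas instead.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tokenize lowercases the text and splits it on whitespace.
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// buildIndex maps each word to the set of document IDs that contain it.
func buildIndex(docs []string) map[string]map[int]bool {
	index := make(map[string]map[int]bool)
	for id, doc := range docs {
		for _, w := range tokenize(doc) {
			if index[w] == nil {
				index[w] = make(map[int]bool)
			}
			index[w][id] = true
		}
	}
	return index
}

// Search tokenizes the query the same way as the documents and returns the
// IDs of documents containing at least one query token. Documents matching
// more tokens come first; ties are broken by document ID.
func Search(index map[string]map[int]bool, query string) []int {
	hits := make(map[int]int) // docID -> number of query tokens matched
	for _, w := range tokenize(query) {
		for id := range index[w] {
			hits[id]++
		}
	}
	ids := make([]int, 0, len(hits))
	for id := range hits {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if hits[ids[a]] != hits[ids[b]] {
			return hits[ids[a]] > hits[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	docs := []string{"nginx access log", "ajax request timeout", "ajax access denied"}
	index := buildIndex(docs)
	fmt.Println(Search(index, "Ajax access")) // [2 0 1]
}
```

Document 2 contains both query tokens, so it ranks first; documents 0 and 1 each contain one.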

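3. Lucene

Lucene is an open-source full-text search engine library from the Apache Software Foundation, distributed as a Java JAR (it began as a subproject of the Jakarta project). It implements the requirements described above.

With Lucene providing the inverted index, how is massive data stored in a distributed way? How is high availability achieved? How are the nodes of a cluster managed? These are the features that Elasticsearch adds.

The commonly cited ELK is an acronym formed from the initials of three open-source frameworks: Elasticsearch (full-text search) + Logstash (content collection) + Kibana (content display).

This article gives a brief introduction to Elasticsearch. Elasticsearch is a distributed, high-performance, scalable search and analytics system built on Lucene, and it exposes a RESTful web API.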

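Installing Elasticsearch

Download from the official site

The Elasticsearch and Kibana versions must match. Downloading from the official site is fairly slow, but fortunately mirrors provided by others are available.

System configuration

vim /etc/sysctl.conf

vm.max_map_count = 655360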


Permissions

 chown -R elsearch:elsearch /data/elastic-search

Security

/etc/security/limits.conf

[root@zhanggen config]# java -version
openjdk version "1.8.0_102"
OpenJDK Runtime Environment (build 1.8.0_102-b14)
OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)

[root@zhanggen config]# cat /etc/security/limits.conf
* soft nofile 65536
* hard nofile 131072
* soft nproc 4096
* hard nproc 4096

Elasticsearch configuration file

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /data/elastic-search/data/
#
# Path to log files:
#
path.logs: /data/elastic-search/log/
#
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: false

#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["node-1"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

/elasticsearch-7.3.2/config/elasticsearch.yml

Start

[elsearch@zhanggen /]$ ./elasticsearch-7.3.2/bin/elasticsearch

Access

Using Elasticsearch

All use of Elasticsearch goes through its RESTful-style API.

1. Check cluster health

http://192.168.56.135:9200/_cat/health?v

epoch      timestamp cluster        status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1590655194 08:39:54  my-application green
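The health output is plain columnar text, so reading it from a program mostly means pairing the header row with the value row. Here is a small Go sketch, assuming the text has already been fetched (for example with http.Get against the URL above); the helper name parseCatHealth is hypothetical, and the sample is the truncated output shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCatHealth pairs the header row of `_cat/health?v` output with the
// value row below it. Header columns without a value are skipped.
func parseCatHealth(out string) map[string]string {
	fields := make(map[string]string)
	lines := strings.Split(strings.TrimSpace(out), "\n")
	if len(lines) < 2 {
		return fields
	}
	header := strings.Fields(lines[0])
	values := strings.Fields(lines[1])
	for i, name := range header {
		if i < len(values) {
			fields[name] = values[i]
		}
	}
	return fields
}

func main() {
	// Sample copied from the (truncated) output above.
	sample := "epoch      timestamp cluster        status node.total node.data\n" +
		"1590655194 08:39:54  my-application green\n"
	health := parseCatHealth(sample)
	fmt.Println(health["cluster"], health["status"]) // my-application green
}
```

A status of green means all primary and replica shards are allocated; yellow and red indicate unallocated replicas and unallocated primaries respectively.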

2. Create an index

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "web"
}

3. Delete an index

{
    "acknowledged": true
}
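The requests behind the two responses above are not shown. In the Elasticsearch REST API an index is created with PUT /<index> and removed with DELETE /<index>, so they presumably looked like the following Go sketch. The index name web comes from the create response; the host is the one used elsewhere in this article; the helper name newIndexRequest is hypothetical. The sketch only builds the request and does not send it.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newIndexRequest builds (but does not send) the HTTP request for an index
// operation: PUT /<index> creates the index, DELETE /<index> removes it.
func newIndexRequest(method, baseURL, index string) (*http.Request, error) {
	url := strings.TrimRight(baseURL, "/") + "/" + index
	return http.NewRequest(method, url, nil)
}

func main() {
	base := "http://192.168.56.135:9200/"
	for _, method := range []string{http.MethodPut, http.MethodDelete} {
		req, err := newIndexRequest(method, base, "web")
		if err != nil {
			panic(err)
		}
		fmt.Println(req.Method, req.URL.String())
		// To actually send it: resp, err := http.DefaultClient.Do(req)
	}
}
```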

4. Insert a document

request body

{
    "name":"張根",
    "age":22,
    "marrid":"false"
}

response body

{
    "_index": "students",
    "_type": "go",
    "_id": "cuWEWnIBWnQK6MVivzvO",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}

Note: the PUT method can also be used, but the document ID must then be supplied.

request body

{
    "name":"李淵",
    "age":1402,
    "marrid":"true"
}

response body

{
    "_index": "students",
    "_type": "go",
    "_id": "2",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 1,
    "_primary_term": 1
}
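The difference between the two variants comes down to the method and the path. A hedged Go sketch: judging by the _index and _type values in the responses, the requests were presumably POST /students/go (Elasticsearch generates the _id) and PUT /students/go/2 (the caller chooses the _id). The Student type and the helper newDocRequest are made up for this example, and the requests are only built, not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Student mirrors the request bodies above, including the source's
// "marrid" field name and its string-typed value.
type Student struct {
	Name   string `json:"name"`
	Age    int    `json:"age"`
	Marrid string `json:"marrid"`
}

// newDocRequest builds (but does not send) an indexing request.
// With an empty id it is POST /<index>/<type>; otherwise it is
// PUT /<index>/<type>/<id>.
func newDocRequest(baseURL, index, typ, id string, doc interface{}) (*http.Request, error) {
	body, err := json.Marshal(doc)
	if err != nil {
		return nil, err
	}
	method, url := http.MethodPost, baseURL+"/"+index+"/"+typ
	if id != "" {
		method, url = http.MethodPut, url+"/"+id
	}
	req, err := http.NewRequest(method, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	base := "http://192.168.56.135:9200"
	auto, err := newDocRequest(base, "students", "go", "", Student{Name: "張根", Age: 22, Marrid: "false"})
	if err != nil {
		panic(err)
	}
	fixed, err := newDocRequest(base, "students", "go", "2", Student{Name: "李淵", Age: 1402, Marrid: "true"})
	if err != nil {
		panic(err)
	}
	fmt.Println(auto.Method, auto.URL.Path)   // POST /students/go
	fmt.Println(fixed.Method, fixed.URL.Path) // PUT /students/go/2
}
```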

5. Query all documents

[root@zhanggen zhanggen]#  curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query": { "match_all":{}}}'

{
  "took" : 211,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "cuWEWnIBWnQK6MVivzvO",
        "_score" : 1.0,
        "_source" : {
          "name" : "張根",
          "age" : 22,
          "marrid" : "false"
        }
      },
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "c-WPWnIBWnQK6MViOzt1",
        "_score" : 1.0,
        "_source" : {
          "name" : "張百忍",
          "age" : 3200,
          "marrid" : "true"
        }
      },
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "dOWPWnIBWnQK6MVimTsg",
        "_score" : 1.0,
        "_source" : {
          "name" : "李淵",
          "age" : 1402,
          "marrid" : "true"
        }
      },
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "deWQWnIBWnQK6MViazuu",
        "_score" : 1.0,
        "_source" : {
          "name" : "姜尚",
          "age" : 5903,
          "marrid" : "fale"
        }
      },
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "duWSWnIBWnQK6MViXDtD",
        "_score" : 1.0,
        "_source" : {
          "name" : "孛兒只斤.鐵木真",
          "age" : 814,
          "marrid" : "true"
        }
      },
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "query" : {
            "match" : {
              "name" : "張根"
            }
          }
        }
      }
    ]
  }
}
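When such a response is consumed from Go, only the fields of interest need to be declared. Here is a minimal decoding sketch using encoding/json; the type names SearchResponse and Student and the helper decodeSearch are hypothetical, and the sample is a shortened copy of the response above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Student mirrors the _source documents above (field names as stored).
type Student struct {
	Name   string `json:"name"`
	Age    int    `json:"age"`
	Marrid string `json:"marrid"`
}

// SearchResponse declares only the parts of the _search response used here;
// encoding/json ignores every other field.
type SearchResponse struct {
	Hits struct {
		Total struct {
			Value int `json:"value"`
		} `json:"total"`
		Hits []struct {
			ID     string  `json:"_id"`
			Source Student `json:"_source"`
		} `json:"hits"`
	} `json:"hits"`
}

// decodeSearch parses a raw _search response body.
func decodeSearch(raw []byte) (SearchResponse, error) {
	var resp SearchResponse
	err := json.Unmarshal(raw, &resp)
	return resp, err
}

func main() {
	// Shortened copy of the response above: one hit out of six.
	sample := []byte(`{
	  "took": 211,
	  "hits": {
	    "total": {"value": 6, "relation": "eq"},
	    "hits": [
	      {"_id": "cuWEWnIBWnQK6MVivzvO", "_score": 1.0,
	       "_source": {"name": "張根", "age": 22, "marrid": "false"}}
	    ]
	  }
	}`)
	resp, err := decodeSearch(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("total:", resp.Hits.Total.Value)
	for _, hit := range resp.Hits.Hits {
		fmt.Println(hit.ID, hit.Source.Name, hit.Source.Age)
	}
}
```

Note that hits.total.value counts all matching documents, while hits.hits holds only the page that was returned.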

6. Paginated query (from, size)

from is the offset and defaults to 0; size is the number of results to return and defaults to 10.

[root@zhanggen zhanggen]# curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{
"query": { "match_all": {} },
"from":1,
"size":2
}'

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "c-WPWnIBWnQK6MViOzt1",
        "_score" : 1.0,
        "_source" : {
          "name" : "張百忍",
          "age" : 3200,
          "marrid" : "true"
        }
      },
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "dOWPWnIBWnQK6MVimTsg",
        "_score" : 1.0,
        "_source" : {
          "name" : "李淵",
          "age" : 1402,
          "marrid" : "true"
        }
      }
    ]
  }
}
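Applications usually think in page numbers rather than offsets. Converting one to the other is a single multiplication, sketched below in Go; the helper pageQuery and the 1-based page convention are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pageQuery builds a match_all search body for a 1-based page number.
// from = (page-1)*size is the number of hits to skip.
func pageQuery(page, size int) ([]byte, error) {
	if page < 1 {
		page = 1
	}
	body := map[string]interface{}{
		"query": map[string]interface{}{"match_all": map[string]interface{}{}},
		"from":  (page - 1) * size,
		"size":  size,
	}
	return json.Marshal(body)
}

func main() {
	body, err := pageQuery(2, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // {"from":2,"query":{"match_all":{}},"size":2}
}
```

The curl example above uses from=1 with size=2, which skips only the first hit, so it returns the second and third documents rather than a full second page.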

7. Keyword (term) query: find documents whose field contains a given term

[root@zhanggen zhanggen]# curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query": {"term": {"name":"張"}}}'

{
  "took" : 155,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.8161564,
    "hits" : [
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "cuWEWnIBWnQK6MVivzvO",
        "_score" : 0.8161564,
        "_source" : {
          "name" : "張根",
          "age" : 22,
          "marrid" : "false"
        }
      },
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "c-WPWnIBWnQK6MViOzt1",
        "_score" : 0.7083998,
        "_source" : {
          "name" : "張百忍",
          "age" : 3200,
          "marrid" : "true"
        }
      }
    ]
  }
}
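Why does the single character 張 match both names? A term query is not a fuzzy match: it looks up the exact term in the inverted index without analyzing it. With the default standard analyzer, Chinese text is indexed one character per token, so 張 is one of the indexed tokens. The Go sketch below imitates that behavior in a much simplified form; analyze is only a stand-in for the real analyzer, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// analyze is a simplified stand-in for the standard analyzer: every Han
// character becomes its own token, and runs of other letters or digits
// become lowercased word tokens.
func analyze(text string) []string {
	var tokens []string
	var word strings.Builder
	flush := func() {
		if word.Len() > 0 {
			tokens = append(tokens, strings.ToLower(word.String()))
			word.Reset()
		}
	}
	for _, r := range text {
		switch {
		case unicode.Is(unicode.Han, r):
			flush()
			tokens = append(tokens, string(r))
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			word.WriteRune(r)
		default:
			flush()
		}
	}
	flush()
	return tokens
}

// termMatches reports whether term equals one of the field's indexed tokens.
// The term itself is not analyzed, which is how a term query behaves.
func termMatches(fieldValue, term string) bool {
	for _, tok := range analyze(fieldValue) {
		if tok == term {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(analyze("張百忍"))           // [張 百 忍]
	fmt.Println(termMatches("張百忍", "張"))  // true
	fmt.Println(termMatches("張根", "張根")) // false: no single token equals "張根"
}
```

Under the same assumption, a term query for the full name 張根 on this field would find nothing; a match query, which analyzes its input first, is the usual choice for that case.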

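8. Range query

A range query accepts the following parameters:

gte: greater than or equal to
gt: greater than
lte: less than or equal to
lt: less than
boost: sets the boost value of the query; defaults to 1.0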

curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query":{"range":{"age":{"gt":"18"}}}}'

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "cuWEWnIBWnQK6MVivzvO",
        "_score" : 1.0,
        "_source" : {
          "name" : "張根",
          "age" : 22,
          "marrid" : "false"
        }
      },
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "c-WPWnIBWnQK6MViOzt1",
        "_score" : 1.0,
        "_source" : {
          "name" : "張百忍",
          "age" : 3200,
          "marrid" : "true"
        }
      },
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "dOWPWnIBWnQK6MVimTsg",
        "_score" : 1.0,
        "_source" : {
          "name" : "李淵",
          "age" : 1402,
          "marrid" : "true"
        }
      },
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "deWQWnIBWnQK6MViazuu",
        "_score" : 1.0,
        "_source" : {
          "name" : "姜尚",
          "age" : 5903,
          "marrid" : "fale"
        }
      },
      {
        "_index" : "students",
        "_type" : "go",
        "_id" : "duWSWnIBWnQK6MViXDtD",
        "_score" : 1.0,
        "_source" : {
          "name" : "孛兒只斤.鐵木真",
          "age" : 814,
          "marrid" : "true"
        }
      }
    ]
  }
}
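Building the same query body from Go is easiest with a small struct whose unset bounds are omitted. A minimal sketch, assuming numeric bounds; the names Range, rangeQuery and num are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Range holds the bounds of a range query; nil fields are left out of the JSON.
type Range struct {
	Gte   *float64 `json:"gte,omitempty"`
	Gt    *float64 `json:"gt,omitempty"`
	Lte   *float64 `json:"lte,omitempty"`
	Lt    *float64 `json:"lt,omitempty"`
	Boost *float64 `json:"boost,omitempty"`
}

// num returns a pointer to v so that a bound can be distinguished from "unset".
func num(v float64) *float64 { return &v }

// rangeQuery builds the search body for a range query on one field.
func rangeQuery(field string, r Range) ([]byte, error) {
	body := map[string]interface{}{
		"query": map[string]interface{}{
			"range": map[string]interface{}{field: r},
		},
	}
	return json.Marshal(body)
}

func main() {
	body, err := rangeQuery("age", Range{Gt: num(18)})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // {"query":{"range":{"age":{"gt":18}}}}
}
```

The curl example above passes the bound as the string "18"; this sketch sends a JSON number instead, which matches the numeric age values stored in the documents.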

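Installing Kibana

Kibana is a tool for working with Elasticsearch and visualizing its data, and it supports Chinese. When installing, make sure the Kibana and Elasticsearch versions match.

Configuration file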

[root@zhanggen config]# cat ./kibana.yml|grep -Ev '^$|#'
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://localhost:9200"]
elasticsearch.username: "kibana"
elasticsearch.password: "xxxxxxx"
i18n.locale: "zh-CN"
[root@zhanggen config]#

Start

[root@zhanggen bin]# ./kibana  --allow-root

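Using Kibana

Management -> Kibana Index Patterns -> Create index pattern

Note:

Elasticsearch really does full-text search: the search keywords entered by the user are tokenized. When I entered "訪問日志為例" ("access logs as an example"), documents containing just "問" were returned as well,

rather than only documents containing the whole phrase "訪問日志為例".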

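Using Elasticsearch from Go

We use the third-party library https://github.com/olivere/elastic to connect to ES and operate on it.

Be sure to download the client that matches your ES version. For example, the ES used here is a 7.x release (7.3.2 in the installation above), so the client to download is the matching major version, github.com/olivere/elastic/v7.

Use go.mod to manage the dependency and pin the third-party library to a specific version: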

module go相關模塊/elasticsearch

go 1.13

require github.com/olivere/elastic/v7 v7.0.4

Code

package main

import (
	"context"
	"fmt"

	"github.com/olivere/elastic/v7"
)

// Elasticsearch demo: index one Person document into the students index.

type Person struct {
	Name    string `json:"name"`
	Age     int    `json:"age"`
	Married bool   `json:"married"`
}

func main() {
	client, err := elastic.NewClient(elastic.SetURL("http://192.168.56.135:9200/"))
	if err != nil {
		// Handle error
		panic(err)
	}

	fmt.Println("connect to es success")
	p1 := Person{Name: "曹操", Age: 155, Married: true}
	put1, err := client.Index().
		Index("students").Type("go").
		BodyJson(p1).
		Do(context.Background())
	if err != nil {
		// Handle error
		panic(err)
	}
	fmt.Printf("Indexed user %s to index %s, type %s\n", put1.Id, put1.Index, put1.Type)
}

Kafka messages ----> Elasticsearch, which makes the messages searchable
