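Preface
After collecting a large volume of logs, where should they be stored so that their contents can be searched? MySQL?
Suppose MySQL holds 10 million such records, and the details field of each record contains 128 characters.
A user wants to find the records whose details field contains the keyword "ajax".
MySQL would run:
select * from logtable where details like "%ajax%";
Because the pattern starts with a wildcard, an ordinary index cannot help: every execution of this statement has to visit each record in logtable one by one and, worse, scan the full text of the details field of every record it visits
to decide whether that field contains "ajax". In the worst case that is on the order of 10 million × 128 character comparisons.
If the user misspells "ajax" as "ajxa", this SQL statement cannot find what the user wants, because LIKE has no way to break the misspelled input "ajxa" into 'a', 'j', 'x', 'a' and match as many relevant records as possible.
So if the text in the details field has to be searchable, storing massive amounts of log data in MySQL is not a reasonable choice.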
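To make the cost concrete, here is a minimal Go sketch of what a leading-wildcard LIKE amounts to: a linear scan over every record. The sample records and the function name scanContains are made up for illustration; this is not how MySQL is implemented internally, only the shape of the work.

```go
package main

import (
	"fmt"
	"strings"
)

// scanContains mimics what LIKE '%keyword%' forces on the database:
// visit every record and search its text from start to end.
func scanContains(details []string, keyword string) []int {
	var matches []int
	for i, text := range details {
		if strings.Contains(text, keyword) {
			matches = append(matches, i)
		}
	}
	return matches
}

func main() {
	// Hypothetical log records standing in for the details column.
	details := []string{
		"GET /api/ajax/list 200",
		"POST /login 302",
		"ajax request timed out",
	}
	fmt.Println(scanContains(details, "ajax")) // every record is inspected
	fmt.Println(scanContains(details, "ajxa")) // a misspelled keyword matches nothing
}
```

The work grows with the number of records times the length of each text, and a typo simply returns nothing.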
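Introduction to Elasticsearch
1. Inverted index
An inverted index is an index data structure: the text is tokenized into a set of distinct terms, each term is given an ID, and each term maps to the list of documents in which it appears. Together these make up the index.
The inverted index also records the positions at which a term occurs in each document and its frequency (count, TF), which are used to locate documents quickly and to rank search results.
For example, in the posting list (1,<11>,1),(2,<7>,1),(3,<3,9>,2), the entry (1,<11>,1) means the term appears in document 1 at position 11 with a frequency of 1, and (3,<3,9>,2) means it appears in document 3 at positions 3 and 9, twice in total.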
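The structure described above can be sketched in a few lines of Go. This is a toy version under simplifying assumptions (whitespace tokenization, lower-casing, document IDs starting at 1, word offsets starting at 0); the names Posting and buildIndex are illustrative and have nothing to do with Lucene's real on-disk format.

```go
package main

import (
	"fmt"
	"strings"
)

// Posting records where one term occurs inside one document.
type Posting struct {
	DocID     int
	Positions []int // word offsets within the document, starting at 0
}

// Frequency is the term frequency (TF) of the term in this document.
func (p Posting) Frequency() int { return len(p.Positions) }

// buildIndex maps every distinct lower-cased word to its posting list.
// Documents are numbered from 1 in the order given.
func buildIndex(docs []string) map[string][]Posting {
	index := make(map[string][]Posting)
	for i, text := range docs {
		docID := i + 1
		for pos, word := range strings.Fields(strings.ToLower(text)) {
			list := index[word]
			if n := len(list); n > 0 && list[n-1].DocID == docID {
				// Same document again: record one more position.
				list[n-1].Positions = append(list[n-1].Positions, pos)
			} else {
				list = append(list, Posting{DocID: docID, Positions: []int{pos}})
			}
			index[word] = list
		}
	}
	return index
}

func main() {
	docs := []string{
		"ajax request failed",
		"retry ajax request",
		"ajax call after ajax timeout",
	}
	for _, p := range buildIndex(docs)["ajax"] {
		fmt.Printf("(doc %d, positions %v, frequency %d)\n", p.DocID, p.Positions, p.Frequency())
	}
}
```

Looking up a term is now a single map access instead of a scan over every document.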
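2. Full-text search
Full-text search: the keywords entered by the user are tokenized as well, and the inverted index is then used to quickly find which documents those terms appear in.
Put plainly, it looks up the key by the value (given a keyword from the content, find the documents that contain it) rather than looking up the value by the key.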
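A rough Go sketch of that lookup: the query goes through the same tokenizer as the documents, and each resulting term is looked up in the index. The whitespace tokenizer, the "any term matches" rule, and the function names are assumptions made for illustration; real engines also score and rank the hits.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tokenize is shared by indexing and searching so both sides produce the same terms.
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// buildIndex maps each term to the set of document IDs that contain it.
func buildIndex(docs map[int]string) map[string]map[int]bool {
	index := make(map[string]map[int]bool)
	for id, text := range docs {
		for _, term := range tokenize(text) {
			if index[term] == nil {
				index[term] = make(map[int]bool)
			}
			index[term][id] = true
		}
	}
	return index
}

// search returns the sorted IDs of documents containing at least one query term.
func search(index map[string]map[int]bool, query string) []int {
	seen := make(map[int]bool)
	for _, term := range tokenize(query) {
		for id := range index[term] {
			seen[id] = true
		}
	}
	ids := make([]int, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	index := buildIndex(map[int]string{
		1: "access log example",
		2: "error log",
		3: "ajax call",
	})
	fmt.Println(search(index, "Access Error"))
}
```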
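3. Lucene
Lucene is an open-source full-text search engine library, distributed as a JAR. It began as a Java subproject of the Apache Software Foundation's Jakarta project, and it implements the requirements described above for us.
Lucene implements the inverted index, but how is a massive amount of data stored in a distributed way? How is high availability achieved? How are the cluster nodes managed? These are the features Elasticsearch adds.
ELK, as it is commonly called, is an acronym formed from the initials of three open-source frameworks: Elasticsearch (full-text search) + Logstash (log collection) + Kibana (visualization).
This article gives a brief introduction to Elasticsearch, a distributed, high-performance, scalable search and analytics system built on Lucene that exposes a RESTful web API.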
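Installing Elasticsearch
Download from the official site
The Elasticsearch and Kibana versions must match. Downloading from the official site is slow; fortunately, some kind people have made the packages available elsewhere.
System configuration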

vm.max_map_count = 655360
Permissions
chown -R elsearch:elsearch /data/elastic-search
Security limits
/etc/security/limits.conf
[root@zhanggen config]# java -version
openjdk version "1.8.0_102"
OpenJDK Runtime Environment (build 1.8.0_102-b14)
OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
[root@zhanggen config]# cat /etc/security/limits.conf
* soft nofile 65536
* hard nofile 131072
* soft nproc 4096
* hard nproc 4096
Elasticsearch configuration file

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /data/elastic-search/data/
#
# Path to log files:
#
path.logs: /data/elastic-search/log/
#
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: false
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["node-1"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
Start
[elsearch@zhanggen /]$ ./elasticsearch-7.3.2/bin/elasticsearch
Access
Using Elasticsearch
All Elasticsearch operations go through its RESTful API.
1. Check cluster health
http://192.168.56.135:9200/_cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1590655194 08:39:54 my-application green
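If you want to read that table from a program, pairing the header row with the value row is enough. Below is a small Go sketch that does so for text already fetched from `_cat/health?v`; the function name parseCatHealth is made up here, and columns that have no value in the capture are simply left out.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCatHealth pairs the header row of `_cat/health?v` with the value row.
// Columns without a value (for example in a truncated capture) are omitted.
func parseCatHealth(output string) map[string]string {
	result := make(map[string]string)
	lines := strings.Split(strings.TrimSpace(output), "\n")
	if len(lines) < 2 {
		return result
	}
	headers := strings.Fields(lines[0])
	values := strings.Fields(lines[1])
	for i, header := range headers {
		if i < len(values) {
			result[header] = values[i]
		}
	}
	return result
}

func main() {
	// The capture shown above; the value row stops after the status column.
	output := "epoch timestamp cluster status node.total node.data\n" +
		"1590655194 08:39:54 my-application green"
	health := parseCatHealth(output)
	fmt.Println("cluster:", health["cluster"])
	fmt.Println("status:", health["status"])
}
```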
2. Create an index
{ "acknowledged": true, "shards_acknowledged": true, "index": "web" }
3. Delete an index
{ "acknowledged": true }
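The requests behind these two responses are not shown. In the Elasticsearch REST API an index is created with PUT /<index> and removed with DELETE /<index>, so a Go sketch of building them could look like the following. The helper name indexRequest is invented, the base URL is the host used earlier in this post, and using "web" for the delete call is an assumption (the delete response above does not name the index). The requests are only constructed here, not sent.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// indexRequest builds (but does not send) the request that creates (PUT)
// or deletes (DELETE) an index.
func indexRequest(method, baseURL, index string) (*http.Request, error) {
	url := strings.TrimRight(baseURL, "/") + "/" + index
	return http.NewRequest(method, url, nil)
}

func main() {
	base := "http://192.168.56.135:9200"

	create, err := indexRequest(http.MethodPut, base, "web")
	if err != nil {
		panic(err)
	}
	remove, err := indexRequest(http.MethodDelete, base, "web")
	if err != nil {
		panic(err)
	}
	fmt.Println(create.Method, create.URL)
	fmt.Println(remove.Method, remove.URL)
	// To actually send one of them: resp, err := http.DefaultClient.Do(create)
}
```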
4. Insert data
request body
{ "name":"張根", "age":22, "marrid":"false" }
response body
{ "_index": "students", "_type": "go", "_id": "cuWEWnIBWnQK6MVivzvO", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 0, "_primary_term": 1 }
PS: the PUT method also works, but then the document ID must be supplied.
request body
{ "name":"李淵", "age":1402, "marrid":"true" }
response body
{ "_index": "students", "_type": "go", "_id": "2", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 1, "_primary_term": 1 }
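The difference between the two calls can be sketched in Go as follows: POST to /students/go lets Elasticsearch generate the ID, while PUT to /students/go/<id> stores the document under the ID you pass. The index and type names are taken from the responses above; the helper docRequest and the Student struct are illustrative, and the request is only built, not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Student mirrors the sample documents; the field name "marrid" and its
// string type follow the data shown above.
type Student struct {
	Name   string `json:"name"`
	Age    int    `json:"age"`
	Marrid string `json:"marrid"`
}

// docRequest builds the indexing request. With an empty id it uses POST so
// that Elasticsearch generates the ID; otherwise it uses PUT with that id.
func docRequest(baseURL, index, typ, id string, doc interface{}) (*http.Request, error) {
	body, err := json.Marshal(doc)
	if err != nil {
		return nil, err
	}
	method, url := http.MethodPost, baseURL+"/"+index+"/"+typ
	if id != "" {
		method, url = http.MethodPut, url+"/"+id
	}
	req, err := http.NewRequest(method, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	base := "http://192.168.56.135:9200"
	auto, err := docRequest(base, "students", "go", "", Student{Name: "張根", Age: 22, Marrid: "false"})
	if err != nil {
		panic(err)
	}
	withID, err := docRequest(base, "students", "go", "2", Student{Name: "李淵", Age: 1402, Marrid: "true"})
	if err != nil {
		panic(err)
	}
	fmt.Println(auto.Method, auto.URL)
	fmt.Println(withID.Method, withID.URL)
}
```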
5. Query all documents
[root@zhanggen zhanggen]# curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query": { "match_all":{}}}'

{
  "took" : 211,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 6, "relation" : "eq" },
    "max_score" : 1.0,
    "hits" : [
      { "_index" : "students", "_type" : "go", "_id" : "cuWEWnIBWnQK6MVivzvO", "_score" : 1.0, "_source" : { "name" : "張根", "age" : 22, "marrid" : "false" } },
      { "_index" : "students", "_type" : "go", "_id" : "c-WPWnIBWnQK6MViOzt1", "_score" : 1.0, "_source" : { "name" : "張百忍", "age" : 3200, "marrid" : "true" } },
      { "_index" : "students", "_type" : "go", "_id" : "dOWPWnIBWnQK6MVimTsg", "_score" : 1.0, "_source" : { "name" : "李淵", "age" : 1402, "marrid" : "true" } },
      { "_index" : "students", "_type" : "go", "_id" : "deWQWnIBWnQK6MViazuu", "_score" : 1.0, "_source" : { "name" : "姜尚", "age" : 5903, "marrid" : "fale" } },
      { "_index" : "students", "_type" : "go", "_id" : "duWSWnIBWnQK6MViXDtD", "_score" : 1.0, "_source" : { "name" : "孛兒只斤.鐵木真", "age" : 814, "marrid" : "true" } },
      { "_index" : "students", "_type" : "go", "_id" : "2", "_score" : 1.0, "_source" : { "query" : { "match" : { "name" : "張根" } } } }
    ]
  }
}
6. Paged queries (from, size)
from is the offset and defaults to 0; size is the number of results to return and defaults to 10.
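A response shaped like this can be decoded in Go with encoding/json alone. The sketch below keeps only the fields it needs (total count, _id, _score, _source); the struct names and the helper decodeStudents are invented for this example, and the sample body in main is shortened.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchResponse keeps only the parts of the search response used below.
type searchResponse struct {
	Hits struct {
		Total struct {
			Value int `json:"value"`
		} `json:"total"`
		Hits []struct {
			ID     string          `json:"_id"`
			Score  float64         `json:"_score"`
			Source json.RawMessage `json:"_source"`
		} `json:"hits"`
	} `json:"hits"`
}

// Student follows the sample documents, including the "marrid" spelling.
type Student struct {
	Name   string `json:"name"`
	Age    int    `json:"age"`
	Marrid string `json:"marrid"`
}

// decodeStudents returns the total hit count and the decoded _source documents.
func decodeStudents(body []byte) (int, []Student, error) {
	var resp searchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, nil, err
	}
	students := make([]Student, 0, len(resp.Hits.Hits))
	for _, hit := range resp.Hits.Hits {
		var s Student
		if err := json.Unmarshal(hit.Source, &s); err != nil {
			return 0, nil, err
		}
		students = append(students, s)
	}
	return resp.Hits.Total.Value, students, nil
}

func main() {
	body := []byte(`{"hits":{"total":{"value":1,"relation":"eq"},"max_score":1.0,"hits":[
		{"_index":"students","_type":"go","_id":"cuWEWnIBWnQK6MVivzvO","_score":1.0,
		 "_source":{"name":"張根","age":22,"marrid":"false"}}]}}`)
	total, students, err := decodeStudents(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(total, students)
}
```

A document whose _source has none of these fields decodes to an empty Student, since unknown JSON keys are ignored by default.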
[root@zhanggen zhanggen]# curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{ "query": { "match_all": {} }, "from":1, "size":2 }'
Response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 6, "relation" : "eq" },
    "max_score" : 1.0,
    "hits" : [
      { "_index" : "students", "_type" : "go", "_id" : "c-WPWnIBWnQK6MViOzt1", "_score" : 1.0, "_source" : { "name" : "張百忍", "age" : 3200, "marrid" : "true" } },
      { "_index" : "students", "_type" : "go", "_id" : "dOWPWnIBWnQK6MVimTsg", "_score" : 1.0, "_source" : { "name" : "李淵", "age" : 1402, "marrid" : "true" } }
    ]
  }
}
7. Finding documents whose field contains a keyword (term query)
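When paging from code, the request body is usually derived from a page number. A small Go sketch, assuming 1-based pages and the common convention from = (page - 1) * size (that convention and the name pageQuery are my own choices, not something Elasticsearch prescribes):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pageQuery builds a match_all request body for a 1-based page number.
func pageQuery(page, size int) ([]byte, error) {
	if page < 1 {
		page = 1
	}
	body := map[string]interface{}{
		"query": map[string]interface{}{"match_all": map[string]interface{}{}},
		"from":  (page - 1) * size,
		"size":  size,
	}
	return json.Marshal(body)
}

func main() {
	body, err := pageQuery(2, 2)
	if err != nil {
		panic(err)
	}
	// The result can be passed to curl -d or used as an HTTP request body.
	fmt.Println(string(body))
}
```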
[root@zhanggen zhanggen]# curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query": {"term": {"name":"張"}}}'
Response

{
  "took" : 155,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 2, "relation" : "eq" },
    "max_score" : 0.8161564,
    "hits" : [
      { "_index" : "students", "_type" : "go", "_id" : "cuWEWnIBWnQK6MVivzvO", "_score" : 0.8161564, "_source" : { "name" : "張根", "age" : 22, "marrid" : "false" } },
      { "_index" : "students", "_type" : "go", "_id" : "c-WPWnIBWnQK6MViOzt1", "_score" : 0.7083998, "_source" : { "name" : "張百忍", "age" : 3200, "marrid" : "true" } }
    ]
  }
}
8. Range queries
A range query accepts the following parameters:
- gte: greater than or equal to
- gt: greater than
- lte: less than or equal to
- lt: less than
- boost: the boost value of the query, 1.0 by default
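Why does a term query for the single character "張" find both names? A term query compares its input against the indexed tokens without analyzing it, and with the default standard analyzer Chinese text is indexed one character per token. The Go sketch below imitates that behaviour; it is an approximation for illustration only (the real analyzer also drops punctuation and lower-cases Latin text), and the function names are invented.

```go
package main

import "fmt"

// tokenize emits one token per character, which approximates how the
// default standard analyzer indexes Chinese text.
func tokenize(text string) []string {
	var tokens []string
	for _, r := range text {
		tokens = append(tokens, string(r))
	}
	return tokens
}

// termMatches reports whether the exact, unanalyzed term equals one of the
// tokens produced for the field value.
func termMatches(fieldValue, term string) bool {
	for _, token := range tokenize(fieldValue) {
		if token == term {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(termMatches("張根", "張"))   // the single character is an indexed token
	fmt.Println(termMatches("張百忍", "張"))  // same for the second document
	fmt.Println(termMatches("張根", "張根")) // the two-character term is not a single token
}
```

Under this model a term query for the full name would find nothing, which is the usual reason to prefer a match query (which analyzes its input) for text fields.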
curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query":{"range":{"age":{"gt":"18"}}}}'

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 5, "relation" : "eq" },
    "max_score" : 1.0,
    "hits" : [
      { "_index" : "students", "_type" : "go", "_id" : "cuWEWnIBWnQK6MVivzvO", "_score" : 1.0, "_source" : { "name" : "張根", "age" : 22, "marrid" : "false" } },
      { "_index" : "students", "_type" : "go", "_id" : "c-WPWnIBWnQK6MViOzt1", "_score" : 1.0, "_source" : { "name" : "張百忍", "age" : 3200, "marrid" : "true" } },
      { "_index" : "students", "_type" : "go", "_id" : "dOWPWnIBWnQK6MVimTsg", "_score" : 1.0, "_source" : { "name" : "李淵", "age" : 1402, "marrid" : "true" } },
      { "_index" : "students", "_type" : "go", "_id" : "deWQWnIBWnQK6MViazuu", "_score" : 1.0, "_source" : { "name" : "姜尚", "age" : 5903, "marrid" : "fale" } },
      { "_index" : "students", "_type" : "go", "_id" : "duWSWnIBWnQK6MViXDtD", "_score" : 1.0, "_source" : { "name" : "孛兒只斤.鐵木真", "age" : 814, "marrid" : "true" } }
    ]
  }
}
Installing Kibana
Kibana is a tool for operating on Elasticsearch and visualizing its data, and it supports a Chinese interface. When installing, make sure the Kibana and Elasticsearch versions match.
Configuration file
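The range parameters listed in this section map directly onto a JSON body, so a tiny Go builder can assemble and sanity-check it. This is a sketch with an invented helper name (rangeQuery); it only rejects parameter names outside the five listed in this section and does not validate the values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// allowed lists the range parameters described in this section.
var allowed = map[string]bool{"gt": true, "gte": true, "lt": true, "lte": true, "boost": true}

// rangeQuery builds {"query":{"range":{field: params}}} and rejects
// parameter names that are not in the allowed set.
func rangeQuery(field string, params map[string]interface{}) ([]byte, error) {
	for key := range params {
		if !allowed[key] {
			return nil, fmt.Errorf("unsupported range parameter %q", key)
		}
	}
	body := map[string]interface{}{
		"query": map[string]interface{}{
			"range": map[string]interface{}{field: params},
		},
	}
	return json.Marshal(body)
}

func main() {
	body, err := rangeQuery("age", map[string]interface{}{"gt": 18})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```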
[root@zhanggen config]# cat ./kibana.yml|grep -Ev '^$|#'
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://localhost:9200"]
elasticsearch.username: "kibana"
elasticsearch.password: "xxxxxxx"
i18n.locale: "zh-CN"
[root@zhanggen config]#
Start
[root@zhanggen bin]# ./kibana --allow-root
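Using Kibana
Management -> Kibana Index Patterns -> Create index pattern
PS:
Elasticsearch really is a full-text search engine. It does tokenize the keywords the user enters: when I searched for "訪問日志為例" ("access log as an example"), documents containing just "問" were returned as well,
rather than only the documents containing the whole phrase "訪問日志為例".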
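Working with Elasticsearch from Go
We use the third-party library https://github.com/olivere/elastic to connect to Elasticsearch and operate on it.
Make sure to download the client whose major version matches your Elasticsearch. For example, the Elasticsearch used here is a 7.x release (7.3.2 in the startup command above), so the matching client is github.com/olivere/elastic/v7.
Use go.mod to manage dependencies and pin the version of the third-party library: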
module go相關模塊/elasticsearch

go 1.13

require github.com/olivere/elastic/v7 v7.0.4
Code
package main

import (
	"context"
	"fmt"

	"github.com/olivere/elastic/v7"
)

// Elasticsearch demo

// Person is the document stored in the students index.
type Person struct {
	Name    string `json:"name"`
	Age     int    `json:"age"`
	Married bool   `json:"married"`
}

func main() {
	// Connect to the Elasticsearch node started earlier.
	client, err := elastic.NewClient(elastic.SetURL("http://192.168.56.135:9200/"))
	if err != nil {
		// Handle error
		panic(err)
	}
	fmt.Println("connect to es success")

	// Index one document; Elasticsearch generates the ID.
	p1 := Person{Name: "曹操", Age: 155, Married: true}
	put1, err := client.Index().
		Index("students").Type("go").
		BodyJson(p1).
		Do(context.Background())
	if err != nil {
		// Handle error
		panic(err)
	}
	fmt.Printf("Indexed user %s to index %s, type %s\n", put1.Id, put1.Index, put1.Type)
}
Kafka messages ----> Elasticsearch, so that the messages can be searched