Preface
In a large distributed cluster, logs are scattered across many servers. During troubleshooting you have to log in to each server and grep | tail the log files, which is very inconvenient, so a unified log-management platform is needed to aggregate and collect the logs, for example Alibaba Cloud's SLS (a paid product)…
In practice there are usually four environments: dev (development), test (testing), pre (pre-production verification), and prod (production). If your production environment runs entirely on cloud services and cost is no object, you can simply use the cloud vendor's logging product. But if you want to save money, is there a free and usable logging stack? Yes: ELK.
ELK is currently the most popular log-search combination, short for Elasticsearch + Logstash + Kibana. So what is FileBeat then? Read on.
Environment used in this article:
Windows 10 + (Elasticsearch + Logstash + Kibana + FileBeat) 7.8.0
Why Windows? Building an ELK environment means writing and constantly tweaking a lot of configuration files, and those files are identical on Linux and Windows, so for convenience (and to write this article more smoothly) I simply used my local machine. This article is hands-on; no useless frills. OK, before we start building, here is a brief look at what these four components actually do.
1. Elasticsearch
Elasticsearch is (1) a distributed full-text search engine built on Lucene, (2) exposed through a REST interface, and (3) written in Java with open source code. From these three points: it scales easily and was designed for distribution; the REST interface means any client language can talk to it; and being open source and free, a team with enough engineering strength can customize it. ES is a natural fit for storing and searching large volumes of data.
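As a quick taste of that REST interface, the two commands below index a document and then search for it with full-text matching (you can run them once ES is up later in this article; demo-index is just an arbitrary name used for illustration):

curl -X PUT "http://localhost:9200/demo-index/_doc/1" -H "Content-Type: application/json" -d "{\"msg\":\"hello elasticsearch\"}"
curl "http://localhost:9200/demo-index/_search?q=msg:hello"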
2. Logstash
Logstash is an open-source server-side data processing pipeline that can ingest data from multiple sources at the same time, transform it, and then send it to your favorite "stash".
In plain terms, it is a data pipeline dedicated to collecting logs, processing and formatting them, and then distributing them to the destination of your choice (MySQL, MQ, NoSQL, and so on). It runs on the JVM, but it is quite memory-hungry.
3. Kibana
An open-source web platform for analytics and visualization, used mainly together with ES.
4. FileBeat
A lightweight log shipper by the same author as Logstash. Because Logstash is heavyweight and eats memory, the author built this new component: a single FileBeat process can collect all of the specified log files on a server, something Logstash cannot do. Logstash, on the other hand, has far stronger data processing and routing capabilities than FileBeat.
Below is a simple diagram of how these four components fit together.
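The original figure is an image; the relationship it shows boils down to roughly this pipeline:

application *.log files (on each server)
    -> FileBeat       (lightweight shipper running on each server)
    -> Logstash       (parse / filter / enrich, listening on port 5044)
    -> Elasticsearch  (storage + full-text search)
    -> Kibana         (query + visualization)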
Before building anything, download the four components from https://www.elastic.co/cn/downloads/
1. Start Elasticsearch
If you are on Linux, please head over here instead.
Modify the ES configuration file.
Here is my configuration:
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Node name, customize as you like
node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
# Data storage location
path.data: E:\elk\elasticsearch-7.8.0-windows-x86_64\elasticsearch-7.8.0\data
#
# Path to log files:
# Log storage location
path.logs: E:\elk\elasticsearch-7.8.0-windows-x86_64\elasticsearch-7.8.0\logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
# Host IP to bind to; 0.0.0.0 means no restriction
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
# HTTP port to bind to; 9200 is the default
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
# Initial master-eligible nodes of the cluster; this list must contain node.name, otherwise startup fails
cluster.initial_master_nodes: ["node-1"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
#
# Settings required by elasticsearch-head
http.cors.enabled: true
http.cors.allow-origin: "*"
node.master: true
node.data: true
Once the changes are done, simply double-click the startup script in the bin directory (elasticsearch.bat).
If the browser shows the following, the startup succeeded.
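If you prefer the command line to a browser, hitting the root endpoint works just as well; it returns a small JSON document with the node name, cluster name, and version (the values will of course differ on your machine):

curl http://localhost:9200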
2. Start Kibana
Modify the Kibana configuration file.
# Kibana is served by a back end server. This setting specifies the port to use.
# Port to serve Kibana on; 5601 is the default
server.port: 5601

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "0.0.0.0"

# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""

# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false

# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576

# The Kibana server's name. This is used for display purposes.
#server.name: "your-hostname"

# The URLs of the Elasticsearch instances to use for all your queries.
elasticsearch.hosts: ["http://localhost:9200"]
Double-click the startup script in the bin directory (kibana.bat) and you are done.
The page you should see after a successful start:
Once Kibana is up, you can use it to check which indices currently exist in ES.
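For example, in Kibana's Dev Tools console you can list all indices and their basic stats with:

GET _cat/indices?v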
3. Start Logstash
Write a logstash_test.conf configuration file to get a feel for Logstash:
input {
stdin {}
}
output {
stdout{ }
}
Start command:
.\bin\logstash.bat -f .\config\logstash_test.conf
Type test logstash directly into the console, and the console echoes back what we typed.
Now switch the output format to json or rubydebug:
stdout { codec => rubydebug }
stdout { codec => json }
Try it yourself and compare the output.
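For reference, the rubydebug codec prints each event as a pretty-printed hash, roughly like this for a line typed on stdin (the timestamp and host values below are placeholders and will differ on your machine):

{
       "message" => "test logstash",
      "@version" => "1",
    "@timestamp" => 2020-08-16T09:00:00.000Z,
          "host" => "WIN-IJE5R5BU096"
}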
Hooking FileBeat into Logstash
Change the configuration slightly: an input plugin is added that listens on port 5044, the port FileBeat ships to over the network.
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.
input {
  beats {
    port => 5044
  }
}
output {
  stdout { codec => rubydebug }
}
4. Start FileBeat
Create a new FileBeat configuration file, filebeat_test.yml:
# ============================== Filebeat inputs ===============================
filebeat.inputs:
- type: log
  # Enable this input
  enabled: true
  # Paths of the log files to read
  paths:
    - D:\data\logs\demo\*.log
  # Extra fields attached to every event
  fields:
    app: demo
    review: 1
  # Multiline matching: group lines by a timestamp regex (yyyy-MM-dd HH:mm:ss.SSS)
  multiline.pattern: ^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}
  multiline.negate: true
  # Non-matching lines are appended after the line that starts with the date
  multiline.match: after
  # tail_files = true: do not collect existing (historical) log content
  tail_files: true
  # Extra tags field
  tags: ["demo"]

# ============================== Filebeat modules ==============================
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  # Set to true to enable config reloading
  reload.enabled: false
  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================
# ES index template settings
setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false

# ------------------------------ Logstash Output -------------------------------
# Ship to Logstash
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
Start FileBeat:
.\filebeat.exe -e -c .\filebeat_test.yml
Hit the demo application so it writes some log lines for FileBeat to pick up.
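The demo project itself is not shown in this article; any application works as long as its log layout matches the grok patterns used later. As a rough sketch, a Logback appender along these lines would produce the format shown below (the traceId MDC key and the appender definition are assumptions made for illustration only):

<appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>D:/data/logs/demo/demo-info.log</file>
    <encoder>
        <!-- timestamp traceId [thread] LEVEL logger[line] - message -->
        <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} %X{traceId} [%thread] %-5level %logger{50}[%line] - %msg%n</pattern>
    </encoder>
</appender>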
In the Logstash console the events are printed out, here in JSON form.
Let's pick one entry and pretty-print it to see what a FileBeat-collected log looks like once Logstash has printed it out.
{ "@version":"1", "message":"2020-08-16 17:29:13.138 6a83f82acbf8000 [http-nio-8080-exec-9] DEBUG com.cd.demo.controllr.DemoController[28] - ======================debug", "fields":{ "app":"demo", "review":1 }, "tags":[ "demo", "beats_input_codec_plain_applied" ], "@timestamp":"2020-08-16T09:29:15.075Z", "ecs":{ "version":"1.5.0" }, "input":{ "type":"log" }, "host":{ "name":"WIN-IJE5R5BU096", "architecture":"x86_64", "id":"0ea80d06-30e3-4b1f-9e15-cf0006381169", "ip":[ "fe80::445f:b46c:2007:b14b", "192.168.1.83", "fe80::3c10:2e3f:fc46:9d28", "169.254.157.40", "fe80::9842:fa00:1199:8207", "169.254.130.7", "fe80::ac1b:b018:58f6:f20e", "169.254.242.14", "172.24.36.1", "fe80::2d69:dab9:f82c:33ce", "169.254.51.206" ], "hostname":"WIN-IJE5R5BU096", "mac":[ "f8:b4:6a:20:7f:f4", "c0:b5:d7:28:44:85", "c2:b5:d7:28:44:85", "e2:b5:d7:28:44:85", "00:ff:3f:88:6e:59" ], "os":{ "name":"Windows 10 Home China", "version":"10.0", "family":"windows", "build":"18363.1016", "platform":"windows", "kernel":"10.0.18362.1016 (WinBuild.160101.0800)" } }, "log":{ "file":{ "path":"D:\data\logs\demo\demo-info.log" }, "offset":2020 }, "agent":{ "name":"WIN-IJE5R5BU096", "version":"7.8.0", "ephemeral_id":"f073c7ec-d88b-4ddb-9870-bd6128d5497a", "type":"filebeat", "id":"e0b6d369-cc12-4132-bf81-5c2f85ce2b2a", "hostname":"WIN-IJE5R5BU096" } }
The above is what FileBeat collects, printed by Logstash to the console without any filtering. FileBeat's output is quite comprehensive (software, hardware, network), but we do not need all of it; we only want to store the fields we actually care about, and that means structuring the data. Structuring is done with Logstash's filter plugins: https://www.elastic.co/guide/en/logstash/current/filter-plugins.html
The grok plugin is the one we want; it parses unstructured event data into named fields, which makes storage and aggregation much easier.
The field we really need to parse is message, which holds the actual business log line. We write grok regular expressions to extract fields from message; to test them you can use Kibana's built-in Grok Debugger or http://grokdebug.herokuapp.com/?# (grokdebug is not great and is often unreachable). The figure below shows message being split into individual fields.
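The screenshot is not reproduced here; applied to the sample log line shown earlier, the first pattern extracts roughly these fields:

logdate   => 2020-08-16 17:29:13.138
traceId   => 6a83f82acbf8000
thread    => http-nio-8080-exec-9
level     => DEBUG
className => com.cd.demo.controllr.DemoController
classLine => 28
msg       => ======================debug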
The complete Logstash configuration:
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.
input {
  beats {
    port => 5044
  }
}

filter {
  # Parse the message field (the business log line) into individual fields using regular expressions.
  # Why two patterns for message? If your log formats are not uniform you need several patterns,
  # but try to avoid that: multiple patterns reduce Logstash throughput when the log volume is large.
  grok {
    match => [
      "message", "(?m)%{TIMESTAMP_ISO8601:logdate}\s+%{BASE16NUM:traceId}+\s\[%{DATA:thread}\]\s+%{LOGLEVEL:level}\s+%{PROG:className}\[%{INT:classLine}\]\s+-+\s+%{GREEDYDATA:msg}",
      "message", "(?m)%{TIMESTAMP_ISO8601:logdate}\s\s+\[%{DATA:thread}\]\s+%{LOGLEVEL:level}\s\s++%{PROG:className}\[%{INT:classLine}\]\s+-+\s+%{GREEDYDATA:msg}"
    ]
  }
  # Replace Logstash's @timestamp with the timestamp of the business log so the two stay in sync
  date {
    match => ["logdate", "yyyy-MM-dd HH:mm:ss.SSS"]
    target => "@timestamp"
    remove_field => ["logdate"]
  }
  # Remove fields we do not need. Be sure to keep @timestamp, otherwise indices cannot be built per day;
  # message is kept so the raw line is still available.
  # Promote the extra FileBeat "fields" data to a top-level field so we know which application the log
  # came from (tags could be used for the same purpose).
  mutate {
    remove_field => ["@version", "@metadata", "input", "agent", "ecs", "fields"]
    add_field => { "appName" => "%{[fields][app]}" }
  }
}

# Events that passed the filter are still printed to the console
output {
  stdout { codec => json }
}
Restart Logstash and watch the console output.
{ "tags":[ "demo", "beats_input_codec_plain_applied" ], "log":{ "file":{ "path":"D:\data\logs\demo\demo-info.log" }, "offset":2020 }, "host":{ "mac":[ "f8:b4:6a:20:7f:f4", "c0:b5:d7:28:44:85", "c2:b5:d7:28:44:85", "e2:b5:d7:28:44:85", "00:ff:3f:88:6e:59" ], "os":{ "build":"18363.1016", "platform":"windows", "name":"Windows 10 Home China", "kernel":"10.0.18362.1016 (WinBuild.160101.0800)", "family":"windows", "version":"10.0" }, "id":"0ea80d06-30e3-4b1f-9e15-cf0006381169", "name":"WIN-IJE5R5BU096", "ip":[ "fe80::445f:b46c:2007:b14b", "192.168.1.83", "fe80::3c10:2e3f:fc46:9d28", "169.254.157.40", "fe80::9842:fa00:1199:8207", "169.254.130.7", "fe80::ac1b:b018:58f6:f20e", "169.254.242.14", "172.24.36.1", "fe80::2d69:dab9:f82c:33ce", "169.254.51.206" ], "hostname":"WIN-IJE5R5BU096", "architecture":"x86_64" }, "level":"DEBUG", "traceId":"6a91634313f7000", "className":"com.cd.demo.controllr.DemoController", "msg":"======================debug", "appName":"demo", "classLine":"28", "message":"2020-08-17 09:07:13.761 6a91634313f7000 [http-nio-8080-exec-1] DEBUG com.cd.demo.controllr.DemoController[28] - ======================debug", "thread":"http-nio-8080-exec-1", "@timestamp":"2020-08-17T01:07:13.761Z" }
This is now essentially the log we want: the log file path, the application name, the business log line, host information, and the other core fields.
Next we send the result on to ES. Documentation: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-index
In recent Logstash versions indices are created with the {now/d}-000001 rollover convention by default, for example logstash-2020.02.10-000001; if you want to name the index yourself, set ilm_enabled => false.
The complete Logstash configuration with output to ES, with detailed comments:
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.
input {
  beats {
    port => 5044
  }
}

filter {
  # Parse the message field (the business log line) into individual fields using regular expressions.
  # Why two patterns for message? If your log formats are not uniform you need several patterns,
  # but try to avoid that: multiple patterns reduce Logstash throughput when the log volume is large.
  grok {
    match => [
      "message", "(?m)%{TIMESTAMP_ISO8601:logdate}\s+%{BASE16NUM:traceId}+\s\[%{DATA:thread}\]\s+%{LOGLEVEL:level}\s+%{PROG:className}\[%{INT:classLine}\]\s+-+\s+%{GREEDYDATA:msg}",
      "message", "(?m)%{TIMESTAMP_ISO8601:logdate}\s\s+\[%{DATA:thread}\]\s+%{LOGLEVEL:level}\s\s++%{PROG:className}\[%{INT:classLine}\]\s+-+\s+%{GREEDYDATA:msg}"
    ]
  }
  # Replace Logstash's @timestamp with the timestamp of the business log so the two stay in sync
  date {
    match => ["logdate", "yyyy-MM-dd HH:mm:ss.SSS"]
    target => "@timestamp"
    remove_field => ["logdate"]
  }
  # Remove fields we do not need. Be sure to keep @timestamp, otherwise indices cannot be built per day;
  # message is kept so the raw line is still available.
  # Promote the extra FileBeat "fields" data to a top-level field so we know which application the log
  # came from (tags could be used for the same purpose).
  mutate {
    remove_field => ["@version", "@metadata", "input", "agent", "ecs", "fields"]
    add_field => { "appName" => "%{[fields][app]}" }
  }
}

# Events that passed the filter are printed to the console and sent to ES
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    # One log index per day
    index => "logstash-demo-%{+yyyy.MM.dd}"
    # Disable Logstash ILM, otherwise indices are created with the {now/d}-000001 naming
    #ilm_enabled => false
  }
}
Displaying the log data in Kibana; Kibana reads the index straight from ES.
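Before Discover shows anything, an index pattern (logstash-demo-* in this setup) has to be created under Management > Index Patterns with @timestamp as the time field. To confirm that documents really arrived in ES, a quick Dev Tools query helps (a sketch; the index name matches the output configuration above):

GET logstash-demo-*/_search
{
  "size": 1,
  "sort": [ { "@timestamp": { "order": "desc" } } ]
}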
Kibana -> Discover showing the data
The full message
The fields parsed out of message
Building log-analysis charts
A pie chart, broken down by application
A bar chart, broken down by log level
The time dimension: distribution of the most recent 30 log buckets
Creating a dashboard
Note: to rename a visualization, just click Save and save it again under the new name; renaming a dashboard works the same way.
That is the complete ELK + FileBeat setup. If the log volume is very large you can optimize Logstash further, for example by running a Logstash cluster or by avoiding regex-based parsing. Although this article builds ELK on Windows, the configuration works unchanged on Linux.
Starting in the background on Linux:
nohup filebeat -e -c filebeat_pre.yml &
nohup ./bin/logstash -f config/logstash_pre.conf &
nohup ./bin/kibana --allow-root &
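The list above leaves out Elasticsearch; on Linux it has its own daemon flag, so nohup is not needed for it (a sketch, where pid is simply the name of the PID file to write):

./bin/elasticsearch -d -p pid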
References:
Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
https://www.elastic.co/guide/en/kibana/current/index.html
https://www.elastic.co/guide/en/logstash/current/index.html
https://www.elastic.co/guide/en/beats/libbeat/current/index.html
Kibana documentation (Chinese):
https://www.elastic.co/guide/cn/kibana/current/index.html