ES Series: Sending Logs from FileBeat to Logstash and ES, and Filtering with Multiple Outputs

This article is a repost (see the original). Posted 2021-01-21 10:25 · java

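Overview

Recently I have been looking into log collection with ELK, and this article focuses on collecting logs with Filebeat. There are many log shippers, such as Fluentd, Flume, Logstash and Beats. So why use Filebeat? Logstash runs on the JVM and is resource-hungry: a single Logstash instance needs around 500 MB of memory, while Filebeat needs only about 10 MB. In a typical ELK setup, Filebeat on every node ships log lines to a Kafka message queue; a Logstash cluster consumes the queue and filters the events according to its configuration, then sends the filtered events to Elasticsearch, where they are visualized in Kibana.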

Introduction to Filebeat

Filebeat consists of two main components: prospectors and harvesters. They work together to read files and send event data to the output you specify.

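What is a harvester?
A harvester is responsible for reading the content of a single file. It reads the file line by line and sends the content to the output. One harvester is started per file. The harvester opens and closes the file, which means the file stays open while the harvester is running. If a file is deleted or renamed while it is being harvested, Filebeat keeps reading it, so the disk space it occupies is not released until the harvester is closed. By default, Filebeat keeps the file open until close_inactive is reached, and only then closes the harvester.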
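The read loop described above can be sketched in Java. This is a hypothetical illustration, not Filebeat's code (Filebeat is written in Go), and `HarvesterSketch` and its methods are made-up names. It assumes a plain newline-separated file, resumes from a saved byte offset, emits only complete lines, and returns the offset that would be recorded for the next read.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a harvester's read step; not Filebeat's implementation. */
final class HarvesterSketch {

    /** Lines found in one read plus the byte offset to resume from next time. */
    static final class ReadResult {
        final List<String> lines;
        final long nextOffset;

        ReadResult(List<String> lines, long nextOffset) {
            this.lines = lines;
            this.nextOffset = nextOffset;
        }
    }

    /** Splits bytes that were read starting at startOffset into complete '\n'-terminated lines. */
    static ReadResult splitCompleteLines(byte[] data, long startOffset) {
        List<String> lines = new ArrayList<>();
        int lineStart = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                lines.add(new String(data, lineStart, i - lineStart, StandardCharsets.UTF_8));
                lineStart = i + 1;
            }
        }
        // Bytes after the last '\n' are an unfinished line; they are re-read next time.
        return new ReadResult(lines, startOffset + lineStart);
    }

    /** Reads everything after `offset` (assumes the unread tail fits in memory). */
    static ReadResult harvest(Path file, long offset) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (offset >= length) {
                // Nothing new. (Filebeat would detect a truncated file and start over; omitted.)
                return new ReadResult(List.of(), offset);
            }
            byte[] buf = new byte[(int) (length - offset)];
            raf.seek(offset);
            raf.readFully(buf);
            return splitCompleteLines(buf, offset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Holding back a trailing partial line is a design choice of this sketch: a line that is still being written is not shipped half-finished.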

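Effects of closing a harvester:
The file handler is closed. If the file was deleted or renamed before the harvester was closed, the disk space it occupied is now released.
Harvesting of the file starts again only after scan_frequency has elapsed.
If the file is moved or deleted while the harvester is closed, harvesting of that file does not continue.
Use the close_* options to control when a harvester is closed.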
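A minimal sketch of the two decisions above, under hypothetical names (`HarvesterLifecycleSketch`, `shouldClose`, `shouldRestart`). In Filebeat, close_inactive defaults to 5m and scan_frequency to 10s; the sketch simply takes the values as parameters and is not Filebeat's logic verbatim.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of when a harvester is closed and when harvesting resumes. */
final class HarvesterLifecycleSketch {

    /** close_inactive: close once no new line has been read for at least this long. */
    static boolean shouldClose(Instant lastLineReadAt, Instant now, Duration closeInactive) {
        return Duration.between(lastLineReadAt, now).compareTo(closeInactive) >= 0;
    }

    /**
     * Checked on each scan_frequency tick for a closed file: harvesting resumes only if the
     * file can still be found under the configured paths and its size changed since closing.
     */
    static boolean shouldRestart(boolean stillFoundUnderPaths, long sizeWhenClosed, long currentSize) {
        return stillFoundUnderPaths && currentSize != sizeWhenClosed;
    }
}
```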

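What is a prospector?

A prospector manages the harvesters and finds all sources to read from. If the input type is log, the prospector finds every file matching the configured paths and starts a harvester for each one. Each prospector runs in its own goroutine.

Filebeat currently supports two prospector types: log and stdin. Each type can be defined multiple times in the configuration file. The log prospector checks each file to see whether a harvester needs to be started, whether one is already running, or whether the file can be ignored (see ignore_older). New lines are picked up only if the file size has changed since the harvester was closed. (From Filebeat 6.3, the version used below, prospectors are called inputs and are configured under filebeat.inputs; more input types than log and stdin are available.)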
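How a log input (prospector) might choose files can be sketched with the JDK's glob `PathMatcher`. Assumptions: the file list and modification times are passed in instead of being read from disk, `InputScanSketch.filesToHarvest` is a hypothetical name, and ignore_older is modelled as "skip files not modified within this duration". Filebeat's own glob handling may differ in details.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a log input's scan: which files get a harvester. */
final class InputScanSketch {

    /**
     * Returns paths matching the glob whose last modification is within ignoreOlder
     * (null disables the age check). Each returned file would get its own harvester.
     */
    static List<Path> filesToHarvest(Map<Path, Instant> modifiedTimes, String glob,
                                     Duration ignoreOlder, Instant now) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<Path> selected = new ArrayList<>();
        // TreeMap only gives a stable, sorted iteration order.
        for (Map.Entry<Path, Instant> e : new TreeMap<>(modifiedTimes).entrySet()) {
            boolean matches = matcher.matches(e.getKey());
            boolean tooOld = ignoreOlder != null
                    && Duration.between(e.getValue(), now).compareTo(ignoreOlder) > 0;
            if (matches && !tooOld) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }
}
```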

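How Filebeat works

Filebeat keeps the state of each file and frequently flushes that state to disk in the registry file. The state records the offset up to which the harvester last read, which ensures that all log lines are read and sent to the output. If the output, such as Elasticsearch or Logstash, becomes unreachable, Filebeat remembers the last offset it sent and resumes reading the files as soon as the output is available again. While Filebeat is running, the state of each prospector is also kept in memory. When Filebeat restarts, it rebuilds the state from the registry file and continues each file from its last known position.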

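The prospector keeps a state for every file it finds. Because files can be renamed or moved, the file name and path are not enough to identify a file. Instead, Filebeat stores a unique identifier for each file (on Linux, the inode and device ID) to detect whether the file has been harvested before.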

If your use case creates a large number of new files every day, the registry file can grow very large. See "Registry file is too large?" in the Filebeat documentation for ways to deal with this.
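The idea behind the registry, offsets keyed by file identity rather than by name, can be sketched as follows. `RegistrySketch` is hypothetical. On Unix, the JDK's `BasicFileAttributes.fileKey()` wraps the device ID and inode, and it may return null on other platforms. Writing the map to disk, which Filebeat does with its registry file, is omitted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry: last acknowledged offset per file identity, not per file name. */
final class RegistrySketch {
    private final Map<Object, Long> offsets = new HashMap<>();

    /** Identity that survives renames on Unix (device + inode); may be null elsewhere. */
    static Object identityOf(Path file) {
        try {
            return Files.readAttributes(file, BasicFileAttributes.class).fileKey();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called after the output acknowledged everything up to this byte offset. */
    void ack(Object fileId, long offset) {
        offsets.put(fileId, offset);
    }

    /** Where a new harvester starts: the saved offset, or 0 for a file never seen before. */
    long resumeOffset(Object fileId) {
        return offsets.getOrDefault(fileId, 0L);
    }
}
```

On a typical Linux filesystem, renaming app.log to app.log.1 keeps the same fileKey(), so resumeOffset(identityOf(renamedPath)) still returns the saved offset.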

II. Download the FileBeat package

wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.3.1-linux-x86_64.tar.gz

III. Sending logs from FileBeat to ES

1. Extract the archive

tar -zxvf filebeat-6.3.1-linux-x86_64.tar.gz

2. Edit filebeat.yml

vim filebeat.yml

Modify the input and output sections as follows:

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /home/log/*.log
    #- c:\programdata\elasticsearch\logs\*

# ...

#============================= Filebeat modules ===============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: true

# ...

output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]

3. Start Filebeat

./filebeat -e -c filebeat.yml -d "Publish"

4. Verify

Copy a log file into the configured directory (/home/log).

Log contents:

{"@timestamp":"2018-09-20T01:21:02.363+08:00","@version":1,"message":"測試日志修改索引看看","logger_name":"com.example.demo.DemoApplicationTests","thread_name":"main","level":"INFO","level_value":20000,"appName":"test-name","appname":"test-name"}
{"@timestamp":"2018-09-20T01:21:02.364+08:00","@version":1,"message":"查詢所有學生，pageNo1,pageSize1","logger_name":"com.example.service.StudentService","thread_name":"main","level":"INFO","level_value":20000,"appName":"test-name","appname":"test-name"}
{"@timestamp":"2018-09-20T01:21:02.622+08:00","@version":1,"message":"Student(id=1, name=小明, classname=112, age=21, telphone=2147483647, nickName=null)","logger_name":"com.example.demo.DemoApplicationTests","thread_name":"main","level":"INFO","level_value":20000,"appName":"test-name","appname":"test-name"}

5. View the logs in Kibana

IV. Sending logs from FileBeat to Logstash, which forwards them to ES

1. FileBeat configuration

vim  /home/filebeat-6.3.1-linux-x86_64/filebeat.yml

Change only the output section; everything else is the same as the configuration above:

#output.elasticsearch:    # ES output disabled
  # Array of hosts to connect to.
  #hosts: ["localhost:9200"]

output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

2. Configure Logstash

vim /home/logstash-6.3.1/config/conf.d/logstash-es.conf

Add the following configuration:

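input {
  beats {
    port => 5044
    ssl  => false
    codec => json  # parse each line as JSON; otherwise %{appname} below has no value
  }
}

output {
  elasticsearch {
    #action => "index"
    hosts => ["localhost:9200"]
    index => "%{appname}-%{+YYYY.MM.dd}"  # create indices dynamically per application name
    template => "/home/elasticsearch-6.3.1/config/templates/logstash.json"  # path to the index template
    # Note: with manage_template => false, Logstash installs no template,
    # so template and template_overwrite have no effect.
    manage_template => false  # disable Logstash's template management
    template_name => "crawl"  # name of the mapping template
    template_overwrite => true  # if true, a new template replaces an existing one with the same name
  }
}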
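To check which index an event lands in, the index pattern above can be mimicked in Java. `IndexNameSketch` is hypothetical. Assumptions: the event's @timestamp is the one from the JSON log line, and Logstash renders %{+YYYY.MM.dd} from @timestamp in UTC. In java.time the calendar-year letter is `yyyy` (`YYYY` means week-based year), unlike the Joda-style pattern Logstash uses.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical re-creation of "%{appname}-%{+YYYY.MM.dd}" for a single event. */
final class IndexNameSketch {
    // java.time equivalent of Logstash's YYYY.MM.dd (calendar year, month, day).
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    static String indexName(String appname, String isoTimestamp) {
        // The date part is taken from @timestamp converted to UTC, not the log's local offset.
        String day = OffsetDateTime.parse(isoTimestamp)
                .withOffsetSameInstant(ZoneOffset.UTC)
                .format(DAY);
        return appname + "-" + day;
    }
}
```

With the sample line from section III (2018-09-20T01:21:02.363+08:00), the UTC date is still 2018-09-19, so the event goes to test-name-2018.09.19 rather than test-name-2018.09.20.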

3. Start Logstash and Filebeat

/home/logstash-6.3.1/bin/logstash --path.settings /home/logstash-6.3.1/config/ -f /home/logstash-6.3.1/config/conf.d/logstash-es.conf &
/home/filebeat-6.3.1-linux-x86_64/filebeat -e -c filebeat.yml -d "Publish" &

4. Verify

Copy the log file ELK-2018-09-20.log into the /home/log directory.

Its contents:

{"@timestamp":"2018-09-20T01:56:55.293+08:00","@version":1,"message":"今天是中秋節放假111，pageNo1,pageSize1","logger_name":"com.example.service.StudentService","thread_name":"main","level":"INFO","level_value":20000,"appName":"test-name","appname":"test-name", "host": "192.168.1.100"}

5. Open Kibana

V. Configuring multiple Logstash outputs

1. Modify the configuration file:

input {
    tcp {
        port => 10514
        codec => "json"
    }
}

input {
  beats {
    port => 5044
    ssl  => false
    codec => json
  }
}

output {

  elasticsearch {
    #action => "index"
    hosts => ["localhost:9200"]
    index => "%{appname}-%{+YYYY.MM.dd}"
    template => "/home/elasticsearch-6.3.1/config/templates/logstash.json"
    manage_template => false  # disable Logstash's automatic template management
    template_name => "crawl"  # name of the mapping template
    template_overwrite => true
  }

  # ERROR events are additionally written to a separate index
  if [level] == "ERROR" {
    elasticsearch {
      #action => "index"
      hosts => ["localhost:9200"]
      index => "%{appname}-error-%{+YYYY.MM.dd}"
      template => "/home/elasticsearch-6.3.1/config/templates/logstash.json"
      manage_template => false  # disable Logstash's automatic template management
      template_name => "crawl"  # name of the mapping template
      template_overwrite => true
    }
  }
}

output {
    stdout {
        codec => rubydebug
    }
}
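The effect of the two elasticsearch outputs can be modelled as a function from an event to the indices it is written to. `OutputRoutingSketch` and `targetIndices` are hypothetical names, the date part is passed in as a string, and the stdout output (which also receives every event) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the output section above: which indices receive an event. */
final class OutputRoutingSketch {
    static List<String> targetIndices(Map<String, String> event, String day) {
        String app = event.get("appname");
        List<String> indices = new ArrayList<>();
        indices.add(app + "-" + day);               // unconditional output: every event
        if ("ERROR".equals(event.get("level"))) {   // if [level] == "ERROR"
            indices.add(app + "-error-" + day);     // ERROR events are indexed a second time
        }
        return indices;
    }
}
```

So the main index still contains ERROR events; the -error- index is an extra copy filtered by level.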

Open Kibana: the additional -error- index contains only ERROR logs.

VI. Garbled Chinese characters in ELK logs generated by logback

Use a custom JSON factory decorator in the encoder:

    <!-- Write to the ELK log file -->
    <appender name="elkLog"
              class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOGPATH}${file.separator}ELK-${TIMESTAMP}.log</file>
        <append>true</append>
        <encoder charset="UTF-8" class="net.logstash.logback.encoder.LogstashEncoder" >
            <jsonFactoryDecorator class="com.example.logback.MyJsonFactoryDecorator" />
            <customFields>{"appname":"${appName}"}</customFields>
        </encoder>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <fileNamePattern>${LOGPATH}${file.separator}all${file.separator}%d{yyyy-MM-dd}.log</fileNamePattern>
            <maxHistory>30</maxHistory>
        </rollingPolicy>
        <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
            <MaxFileSize>10MB</MaxFileSize>
        </triggeringPolicy>
    </appender>

Java class:

package com.example.logback;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.MappingJsonFactory;
import net.logstash.logback.decorate.JsonFactoryDecorator;

public class MyJsonFactoryDecorator implements JsonFactoryDecorator {
    @Override
    public MappingJsonFactory decorate(MappingJsonFactory factory) {
        // Disable escaping of non-ASCII characters, so Chinese text is written as-is
        factory.disable(JsonGenerator.Feature.ESCAPE_NON_ASCII);
        return factory;
    }
}
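To see what the decorator changes, the sketch below reproduces the escaped form: every non-ASCII UTF-16 unit becomes a backslash-u escape sequence. It only illustrates the output format and is not Jackson's implementation; it assumes the encoder version in use escapes non-ASCII characters by default, which is why the decorator disables ESCAPE_NON_ASCII. `NonAsciiEscapeSketch` is a hypothetical name.

```java
/** Hypothetical illustration of the escaped output that ESCAPE_NON_ASCII produces. */
final class NonAsciiEscapeSketch {
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                // Non-ASCII UTF-16 unit: write backslash, 'u' and four uppercase hex digits.
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With escaping on, a name such as 小明 appears in the log file as two escape sequences instead of readable text; with the decorator, the characters are written directly as UTF-8.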

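VII. Configuring index templates for Logstash + Elasticsearch

When collecting logs with Logstash we usually rely on Logstash's built-in dynamic index template. It needs no customization and still gets our log data into the Elasticsearch cluster, but at query time we find that the default template analyzes (tokenizes) fields we did not want analyzed, which makes important aggregations inaccurate.

For example, with Logstash's default template a host name is split on "-", so counting which machine produced the most logs gives wrong results. In such cases we need a custom index template.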
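A rough illustration of the problem: term counts on an analyzed host field versus a keyword field. `analyze` is a crude, hypothetical stand-in for Elasticsearch's standard analyzer (lowercase, split on anything that is not a letter or digit), enough to show how "web-01" is split on the hyphen. In Logstash 5.x and later, the default template also adds a `.keyword` sub-field (for example `host.keyword`) that can be aggregated on.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical comparison of terms-aggregation buckets: analyzed vs keyword field. */
final class AnalyzedFieldSketch {

    /** Crude approximation of the standard analyzer: lowercase, split on non-alphanumerics. */
    static String[] analyze(String value) {
        return value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
    }

    /** Counts the terms that an aggregation would bucket on, for each indexed host value. */
    static Map<String, Integer> termCounts(String[] hosts, boolean analyzed) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String host : hosts) {
            String[] terms = analyzed ? analyze(host) : new String[] {host};
            for (String t : terms) {
                if (!t.isEmpty()) {
                    counts.merge(t, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

For the hosts web-01, web-02 and db-01, the analyzed field yields buckets such as "web" and "01" that mix different machines, while the keyword field yields one bucket per host.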

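When integrating Logstash with Elasticsearch, there are the following ways to use templates:

(1) Use the default built-in template. Most fields are analyzed; suitable for development and quick validation.
(2) Define the template on the Logstash (collector) side. Because the templates are scattered across the collector machines, they are harder to maintain.
(3) Define the template on the Elasticsearch server side. Elasticsearch holds the template, which can be changed dynamically, takes effect globally and is easy to maintain.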

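How to use each:

The first is the simplest and needs no configuration.
The second suits log collection on small clusters: set the template option of the Logstash output plugin to the path of a template JSON file on that machine, for example template => "/tmp/logstash.json".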

The third suits log collection on large clusters. It is configured mainly through two parameters of the Logstash output plugin:

manage_template => false  # disable Logstash's automatic template management
template_name => "crawl"  # name of the mapping template
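With the third approach the template has to exist in Elasticsearch before Logstash creates the first matching index. A minimal sketch that registers it through the `_template` API of Elasticsearch 6.x using Java 11's `java.net.http.HttpClient`; `TemplateUploadSketch` and its methods are hypothetical, and the URL and template name are the ones used in this article.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper: PUT an index template into Elasticsearch via _template. */
final class TemplateUploadSketch {

    /** Builds PUT {esUrl}/_template/{templateName}; ES 6.x requires the JSON Content-Type. */
    static HttpRequest putTemplateRequest(String esUrl, String templateName, String templateJson) {
        return HttpRequest.newBuilder(URI.create(esUrl + "/_template/" + templateName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(templateJson))
                .build();
    }

    /** Sends the request and returns the HTTP status code (200 when accepted). */
    static int upload(String esUrl, String templateName, String templateJson) throws Exception {
        HttpResponse<String> resp = HttpClient.newHttpClient().send(
                putTemplateRequest(esUrl, templateName, templateJson),
                HttpResponse.BodyHandlers.ofString());
        return resp.statusCode();
    }
}
```

Usage, assuming the template file from this article: upload("http://localhost:9200", "crawl", Files.readString(Path.of("/home/elasticsearch-6.3.1/config/templates/logstash.json"))).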

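With the third approach, the template JSON has to be registered in the Elasticsearch cluster. Older Elasticsearch releases could load template files from config/templates, but that was removed in Elasticsearch 2.0; on 6.3.1, templates are registered through the _template API instead. Index templates in Elasticsearch fall into two kinds: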

1. Static templates

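Suitable when the set of index fields is fixed. Once configured, no extra fields can be added, otherwise an error is raised (strictly, Elasticsearch rejects unknown fields only when the mapping sets "dynamic": "strict"; otherwise new fields are mapped automatically).
Pros: the schema is known and the use case is well defined, so you are unlikely to hit a mapping explosion, where arbitrarily mapped fields bloat the metadata until ES runs out of memory and the whole cluster goes down.
Cons: tedious to configure when there are many fields.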

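An example static index template, in Elasticsearch 6.x mapping syntax (keyword and text replace the pre-5.0 string type with not_analyzed/analyzed, and doc is the document type Logstash 6.x writes). The index pattern must match the index names Logstash creates; with index => "%{appname}-%{+YYYY.MM.dd}" as configured earlier, "crawl-*" only applies when appname is crawl: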

{
  "index_patterns": ["crawl-*"],
  "settings": {
    "index.number_of_shards": 3,
    "index.number_of_replicas": 0
  },
  "mappings": {
    "doc": {
      "properties": {
        "@timestamp": { "type": "date", "format": "date_optional_time" },
        "@version":   { "type": "keyword" },
        "cid":        { "type": "keyword" },
        "crow":       { "type": "keyword" },
        "erow":       { "type": "keyword" },
        "host":       { "type": "keyword" },
        "httpcode":   { "type": "keyword" },
        "message":    { "type": "text" },
        "path":       { "type": "text" },
        "pcode":      { "type": "keyword" },
        "pro":        { "type": "keyword" },
        "ptype":      { "type": "keyword" },
        "save":       { "type": "keyword" },
        "t1":         { "type": "keyword" },
        "t2":         { "type": "keyword" },
        "t3":         { "type": "keyword" },
        "url":        { "type": "text" }
      }
    }
  }
}

2. Dynamic templates

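Suitable when the number of fields is unknown and many fields share the same type; adding fields does not cause errors.

Pros: any field can be added dynamically without changing the schema.
Cons: if a very large number of fields is added, the ES cluster may go down.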

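The following dynamic Logstash index template analyzes only the message field and maps all other string fields as unanalyzed keyword fields. It uses Elasticsearch 6.x syntax: the older string/analyzed/not_analyzed, omit_norms and enabled _all settings are no longer accepted for 6.x indices, and the _default_ mapping is deprecated.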

{
  "index_patterns": ["crawl-*"],
  "settings": {
    "index.number_of_shards": 5,
    "index.number_of_replicas": 0
  },
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "message_field": {
            "match": "message",
            "match_mapping_type": "string",
            "mapping": { "type": "text", "norms": false }
          }
        },
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": { "type": "keyword" }
          }
        }
      ],
      "properties": {
        "@timestamp": { "type": "date" },
        "@version": { "type": "keyword" },
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": { "type": "ip" },
            "location": { "type": "geo_point" },
            "latitude": { "type": "float" },
            "longitude": { "type": "float" }
          }
        }
      }
    }
  }
}
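The order of dynamic_templates matters: Elasticsearch applies the first template that matches a new field. Below is a hypothetical Java model of the two templates above. Only the `match` wildcard for string fields is modelled; fields listed under properties are mapped explicitly and never reach the dynamic templates.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical model of dynamic_templates evaluation for new string fields. */
final class DynamicTemplateSketch {
    /** Same order as the "dynamic_templates" array: the first match wins. */
    private static final Map<String, String> TEMPLATES = new LinkedHashMap<>();
    static {
        TEMPLATES.put("message", "text"); // message_field
        TEMPLATES.put("*", "keyword");    // string_fields
    }

    /** "match" semantics reduced to '*' wildcards against the field name. */
    static boolean matches(String pattern, String field) {
        StringBuilder regex = new StringBuilder();
        String[] parts = pattern.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return field.matches(regex.toString());
    }

    /** Mapping type chosen for a new string field. */
    static String mappingFor(String stringField) {
        for (Map.Entry<String, String> t : TEMPLATES.entrySet()) {
            if (matches(t.getKey(), stringField)) {
                return t.getValue();
            }
        }
        // No template matched: ES 6.x default for strings is text with a .keyword sub-field.
        return "text";
    }
}
```

Because "*" matches everything, putting string_fields first would also turn message into a keyword field; that is why message_field comes first.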
