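Using filter plugins in the ELK stack


Kibana ships with a built-in Grok Debugger (under Dev Tools)

When parsing logs, the approach is: first work out what format the log lines are in, then decide which filter plugin, or combination of filter plugins, is needed to handle that format.

Official documentation

https://www.elastic.co/guide/en/logstash/7.12/filter-plugins.html

Online grok pattern tester

https://grokdebug.herokuapp.com/

Location of the grok patterns file in Logstash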

[root@es-web1 ~]# ll /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.3.1/patterns/ecs-v1/grok-patterns

-rw-r--r-- 1 root root 5514 Apr 21 03:50 /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.3.1/patterns/ecs-v1/grok-patterns

Example: a log that is not in JSON format

192.168.7.10 - - [24/May/2021:15:50:47 +0800] "GET /shijiange HTTP/1.1" 404 571 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"

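Extracting the fields with a grok pattern

%{IP:clientip} - - \[(?<requesttime>[^ ]+ \+\d+)\] "(?<requesttype>\w+) (?<requesturl>[^ ]+) HTTP/\d\.\d" (?<status>\d+) (?<size>\d+) "[^"]+" "(?<ua>[^"]+)"

Result: the pattern captures clientip, requesttime, requesttype, requesturl, status, size, and ua from the sample line; the referer is matched but not captured.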
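
To show where such a pattern goes, here is a minimal sketch of a grok filter that applies it to the message field. It assumes the raw log line arrives in "message" and that the raw line can be dropped after a successful parse; neither is stated in the original.

```
filter {
  grok {
    # Single quotes around the pattern, so the double quotes inside it
    # need no escaping.
    match => {
      "message" => '%{IP:clientip} - - \[(?<requesttime>[^ ]+ \+\d+)\] "(?<requesttype>\w+) (?<requesturl>[^ ]+) HTTP/\d\.\d" (?<status>\d+) (?<size>\d+) "[^"]+" "(?<ua>[^"]+)"'
    }
    # Assumption: the raw line is not needed once it has been parsed.
    # remove_field is only applied when the match succeeds.
    remove_field => ["message"]
  }
}
```

Events that do not match keep their message field and are tagged _grokparsefailure, which makes failed lines easy to find.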

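Common grok patterns, with descriptions and examples

Most Linux users have used regular expressions to find files or to search file contents. Grok likewise uses regular expressions to pick out the relevant pieces of a log line.
  There are two ways to write the match:

  Write the regular expression directly
  Use a grok expression, which maps a name to a regular expression
  In my view, rewriting a regular expression every time is painful, so why not define it once as an expression and reuse it?
  Note: a grok expression works much like a macro definition in C.
  To study the default grok expressions, first find the file that defines them:
# On Windows: [your Logstash install path]\vendor\bundle\jruby\x.x\gems\logstash-patterns-core-x.x.x\patterns\grok-patterns
  The commonly used expressions are described below.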
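
As a sketch of the second approach, the access-log line from the non-JSON example could be matched using named expressions only. The field names httpversion and referrer are choices made for this illustration, and the event is again assumed to carry the raw line in "message".

```
filter {
  grok {
    match => {
      # Each %{NAME:field} expands to the regular expression defined for NAME
      # and stores the matched text in "field".
      "message" => '%{IP:clientip} - - \[%{HTTPDATE:requesttime}\] "%{WORD:requesttype} %{NOTSPACE:requesturl} HTTP/%{NUMBER:httpversion}" %{INT:status} %{INT:size} %{QS:referrer} %{QS:ua}'
    }
  }
}
```

Note that QS matches the quotation marks as well, so referrer and ua keep their surrounding quotes in the captured value.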

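Common patterns

  USERNAME or USER
  A user name: a string of digits, upper- and lower-case letters, and the special characters (._-)
  Examples: 1234, Bob, Alex.Wong

  EMAILLOCALPART
  The local part of an email address: the first character is a letter, and the remaining characters are digits, letters, and the special characters (_.+-=:). Note that all-digit addresses, such as the QQ mail accounts common in China, do not match; the regular expression has to be modified for them
  Examples: stone, Gary_Lu, abc-123

  EMAILADDRESS
  An email address
  Examples: stone@abc.com, Gary_Lu@gmail.com, abc-123@163.com

  HTTPDUSER
  An Apache server user; either an EMAILADDRESS or a USERNAME

  INT
  An integer: zero, positive, or negative
  Examples: 0, -123, 43987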

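  BASE10NUM or NUMBER
  A decimal number, integer or fractional
  Examples: 0, 18, 5.23

  BASE16NUM
  A hexadecimal integer
  Examples: 0x0045fa2d, -0x3F8709

  BASE16FLOAT
  A hexadecimal number, integer or fractional

  WORD
  A string of letters, digits, and underscores
  Examples: String, 3529345, ILoveYou

  NOTSPACE
  A string that contains no whitespace

  SPACE
  A run of whitespace characters (possibly empty)

  QUOTEDSTRING or QS
  A quoted string
  Examples: "This is an apple", 'What is your name?'

  UUID
  A standard UUID
  Example: 550E8400-E29B-11D4-A716-446655440000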

  MAC
  A MAC address, in the Cisco form, the common form, or the Windows form

  IP
  An IP address, IPv4 or IPv6
  Examples: 127.0.0.1, FE80:0000:0000:0000:AAAA:0000:00C2:0002

  HOSTNAME
  A host name

  IPORHOST
  An IP address or a host name

  HOSTPORT
  A host name (or IP address) plus a port
  Examples: 127.0.0.1:3306, api.stozen.NET:8000

  PATH
  A file path in Unix or Windows format
  Examples: /usr/local/nginx/sbin/nginx, c:\windows\system32\clr.exe

  URIPROTO
  The URI scheme
  Examples: http, ftp

  URIHOST
  The URI host
  Examples: www.stozen.Net, 10.0.0.1:22

  URIPATH
  The URI path
  Examples: //www.stozen.net/abc/, /api.PHP

  URIPARAM
  The GET (query string) parameters of a URI
  Example: ?a=1&b=2&c=3

  URIPATHPARAM
  The URI path plus the GET parameters
  Example: //www.stozen.net/abc/api.php?a=1&b=2&c=3

  URI
  A complete URI
  Example: http://www.stozen.net/abc/api.php?a=1&b=2&c=3

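Date and time patterns

  MONTH
  A month name
  Examples: Jan, January

  MONTHNUM
  A month number
  Examples: 03, 9, 12

  MONTHDAY
  A day of the month
  Examples: 03, 9, 31

  DAY
  A day-of-week name
  Examples: Mon, Monday

  YEAR
  A year number

  HOUR
  An hour number

  MINUTE
  A minute number

  SECOND
  A seconds number

  TIME
  A time of day
  Example: 00:01:23

  DATE_US
  A date in US format
  Examples: 10-15-1982, 10/15/1982

  DATE_EU
  A date in European format
  Examples: 15-10-1982, 15/10/1982, 15.10.1982

  ISO8601_TIMEZONE
  An ISO 8601 time zone offset
  Examples: +10:23, -1023

  TIMESTAMP_ISO8601
  An ISO 8601 timestamp
  Example: 2016-07-03T00:34:06+08:00

  DATE
  A date, either a US date %{DATE_US} or a European date %{DATE_EU}

  DATESTAMP
  A full date plus a time
  Example: 07-03-2016 00:34:06

  HTTPDATE
  The default HTTP log date format
  Example: 03/Jul/2016:00:36:53 +0800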
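
A grok pattern only extracts the timestamp as text. To use it as the event time, a date filter has to parse it. The following is a minimal sketch for an HTTPDATE value, assuming it was captured into a field named requesttime (a name chosen for this illustration).

```
filter {
  date {
    # "requesttime" is assumed to hold a value such as 03/Jul/2016:00:36:53 +0800
    match => ["requesttime", "dd/MMM/yyyy:HH:mm:ss Z"]
    # The month abbreviation is English, so parse it with an English locale.
    locale => "en"
    # Store the parsed time as the event timestamp.
    target => "@timestamp"
  }
}
```

If parsing fails, the event keeps its original @timestamp and is tagged _dateparsefailure.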

Log patterns

  LOGLEVEL
  A log level
  Examples: Alert, alert, ALERT, Error

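Creating your own grok patterns
  In real business systems, more and more log formats will show up, and the default grok expressions cannot cover all of them (national ID numbers or mobile phone numbers, for example), so we need to add expressions of our own.

  Pattern        Regular expression                         Description
  DATE_CHS       %{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}    Date format commonly used in China
  ZIPCODE_CHS    [1-9]\d{5}                                 Chinese postal code
  GAME_ACCOUNT   [a-zA-Z][a-zA-Z0-9_]{4,15}                 Game account: a leading letter followed by 4 to 15 letters, digits, or underscores

  There are many more; apply them flexibly as your business requires.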
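
One way to make such custom expressions available is the grok filter's pattern_definitions option, which defines patterns inline for that one filter. The sketch below assumes a hypothetical log line of the form "2021-05-24 518000 player_01"; the line format and the field names log_date, zipcode, and account are invented for this illustration.

```
filter {
  grok {
    # Custom patterns, visible only to this grok filter.
    pattern_definitions => {
      "DATE_CHS" => '%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}'
      "ZIPCODE_CHS" => '[1-9]\d{5}'
      "GAME_ACCOUNT" => '[a-zA-Z][a-zA-Z0-9_]{4,15}'
    }
    # Hypothetical line format: date, postal code, account name.
    match => {
      "message" => '%{DATE_CHS:log_date} %{ZIPCODE_CHS:zipcode} %{GAME_ACCOUNT:account}'
    }
  }
}
```

For patterns shared by several pipelines, the same definitions can instead be kept in a file, one "NAME regex" pair per line, and loaded with the patterns_dir option.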

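Built-in grok pattern definitions

The listing below appears to come from an older release of the patterns file. A few definitions (USERNAME without the dot, IP limited to IPv4, DATE_EU with the year first) do not agree with the descriptions above, so check the grok-patterns file of your own installation.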

USERNAME [a-zA-Z0-9_-]+
USER %{USERNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b

POSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
#QUOTEDSTRING (?:(?<!\\)(?:"(?:\\.|[^\\"])*"|(?:'(?:\\.|[^\\'])*')|(?:`(?:\\.|[^\\`])*`)))
QUOTEDSTRING (?:(?<!\\)(?:"(?:\\.|[^\\"]+)*"|(?:'(?:\\.|[^\\']+)*')|(?:`(?:\\.|[^\\`]+)*`)))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}

# Networking
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
IP (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
HOST %{HOSTNAME}
IPORHOST (?:%{HOSTNAME}|%{IP})
HOSTPORT (?:%{IPORHOST=~/\./}:%{POSINT})

# paths
PATH (?:%{UNIXPATH}|%{WINPATH})
UNIXPATH (?:/(?:[\w_%!$@:.,-]+|\\.)*)+
LINUXTTY (?:/dev/pts/%{NONNEGINT})
BSDTTY (?:/dev/tty[pq][a-z0-9])
TTY (?:%{BSDTTY}|%{LINUXTTY})
WINPATH (?:[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
URIPROTO [A-Za-z]+(\+[A-Za-z+]+)?
URIHOST %{IPORHOST}(?::%{POSINT:port})?
# uripath comes loosely from RFC1738, but mostly from what Firefox
# doesn't turn into %XX
URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=#%_-]*)+
#URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)?
URIPARAM \?[A-Za-z0-9$.+!*'|(){},~#%&/=:;_-]*
URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?

# Months: January, Feb, 3, 03, 12, December
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])

# Days: Monday, Tue, Thu, etc...
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)

# Years?
YEAR [0-9]+
# Time: HH:MM:SS
#TIME \d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?
# I'm still on the fence about using grok to perform the time match,
# since it's probably slower.
# TIME %{POSINT<24}:%{POSINT<60}(?::%{POSINT<60}(?:\.%{POSINT})?)?
HOUR (?:2[0123]|[01][0-9])
MINUTE (?:[0-5][0-9])
# '60' is a leap second in most time standards and thus is valid.
SECOND (?:(?:[0-5][0-9]|60)(?:[.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{YEAR}[/-]%{MONTHNUM}[/-]%{MONTHDAY}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND (?:%{SECOND}|60)
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[PMCE][SD]T)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}

# Syslog Dates: Month Day HH:MM:SS
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
PROG (?:[\w._/%-]+)
SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
SYSLOGHOST %{IPORHOST}
SYSLOGFACILITY <%{POSINT:facility}.%{POSINT:priority}>
HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT:ZONE}

# Shortcuts
QS %{QUOTEDSTRING}

# Log formats
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
COMBINEDAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{URIPATHPARAM:request}(?: HTTP/%{NUMBER:httpversion})?|-)" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "(?:%{URI:referrer}|-)" %{QS:agent}

# Log Levels
LOGLEVEL ([D|d]ebug|DEBUG|[N|n]otice|NOTICE|[I|i]nfo|INFO|[W|w]arn?(?:ing)?|WARN?(?:ING)?|[E|e]rr?(?:or)?|ERR?(?:OR)?|[C|c]rit?(?:ical)?|CRIT?(?:ICAL)?|[F|f]atal|FATAL)
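
The ready-made log formats in this listing can be used on their own. The access-log line from the non-JSON example is in the Apache/nginx combined format, so a sketch like the following should parse it without a hand-written pattern. It assumes the raw line is in "message"; the resulting field names (clientip, verb, request, response, bytes, agent, and so on) are those shown in the COMBINEDAPACHELOG definition above and may differ in releases that run with ECS compatibility enabled.

```
filter {
  grok {
    # One named expression covers the whole combined-format line.
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
```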

Example: a log in JSON format

{"@timestamp":"2021-08-28T21:17:31+08:00","host":"172.31.2.107","clientip":"172.31.0.1","size":0,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"172.31.2.107","url":"/web/index.html","domain":"172.31.2.107","xff":"-","referer":"-","status":"304"}

Processing it with the json filter

input {
  redis {
    data_type => "list"
    key => "qq-m44-nginx-log"
    host => "172.31.2.106"
    port => 6379
    db => 3
    password => "123456"
    codec => json
  }
}

# Filters
filter {
  # Parse the nginx JSON line carried in "message" into top-level fields,
  # then drop the raw line and metadata fields that are not needed.
  json {
    source => "message"
    remove_field => ["message","@version","path","beat","input","log","offset","prospector","source","tags"]
  }
  # Only takes effect on events that have a "timestamp" field in HTTPDATE form.
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}

output {
  # Route each event to an index according to its fields.app value.
  if [fields][app] == "nginx-errorlog" {
    elasticsearch {
      hosts => ["172.31.2.101:9200"]
      index => "qq-123test-filebeat-nginx-errorlog-%{+YYYY.MM.dd}"
    }
  }

  if [fields][app] == "nginx-accesslog" {
    elasticsearch {
      hosts => ["172.31.2.101:9200"]
      index => "qq-123test-filebeat-nginx-accesslog-%{+YYYY.MM.dd}"
    }
  }
}

Terminal output after requesting a page from nginx

{
           "agent" => {
                "name" => "es-web1.example.local",
                "type" => "filebeat",
        "ephemeral_id" => "2a8806fd-48de-46e0-bdde-502aa74b4c83",
             "version" => "7.12.1",
            "hostname" => "es-web1.example.local",
                  "id" => "51f9df27-4170-4844-ba12-c719de1f4410"
    },
          "domain" => "172.31.2.107",
          "status" => "304",
    "upstreamtime" => "-",
            "size" => 0,
             "xff" => "-",
             "ecs" => {
        "version" => "1.8.0"
    },
      "@timestamp" => 2021-08-29T05:31:29.000Z,
        "clientip" => "172.31.0.1",
         "referer" => "-",
    "responsetime" => 0.0,
    "upstreamhost" => "-",
       "http_host" => "172.31.2.107",
             "url" => "/web/index.html",
            "host" => "172.31.2.107",
          "fields" => {
        "group" => "n125",
          "app" => "nginx-accesslog"
    }
}
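
Output in this form is what the stdout output plugin prints with the rubydebug codec. The pipeline shown earlier writes only to Elasticsearch, so a block like the following sketch is presumably what was added while testing; treat it as an assumption rather than part of the original configuration.

```
output {
  # Print every event to the terminal in a readable form, for debugging.
  stdout {
    codec => rubydebug
  }
}
```

It can sit alongside the elasticsearch outputs and be removed once the filters behave as expected.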

