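Kibana ships with a built-in Grok Debugger (under Dev Tools).
Approach to processing logs: first work out what format the log lines are in, then decide which filter plugin (or combination of plugins) should handle them.
Official documentation (filter plugins):
https://www.elastic.co/guide/en/logstash/7.12/filter-plugins.html
Online grok regex tester:
https://grokdebug.herokuapp.com/
Location of the grok pattern file shipped with Logstash: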
[root@es-web1 ~]# ll /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.3.1/patterns/ecs-v1/grok-patterns
-rw-r--r-- 1 root root 5514 Apr 21 03:50 /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.3.1/patterns/ecs-v1/grok-patterns
Example: non-JSON log
192.168.7.10 - - [24/May/2021:15:50:47 +0800] "GET /shijiange HTTP/1.1" 404 571 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
Extracting the fields with a grok pattern:
%{IP:clientip} - - \[(?<requesttime>[^ ]+ [+-]\d+)\] "(?<requesttype>\w+) (?<requesturl>[^ ]+) HTTP/\d\.\d" (?<status>\d+) (?<size>\d+) "[^"]+" "(?<ua>[^"]+)"
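To check the pattern outside Kibana, here is a rough Python equivalent. Python's `re` has no `%{IP}` and writes named groups as `(?P<name>...)` instead of grok's `(?<name>...)`. The IP part is therefore replaced by a simplified IPv4 regex, which is an assumption for illustration only. The rest follows the grok expression above, with the time zone accepting both `+` and `-` offsets.

```python
import re

# Simplified stand-in for grok's %{IP}: IPv4 only (an assumption, not the grok definition).
IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"

# Same structure as the grok expression, rewritten with Python's (?P<name>...) groups.
NGINX_LINE = re.compile(
    rf"(?P<clientip>{IPV4}) - - "
    r"\[(?P<requesttime>[^ ]+ [+-]\d+)\] "
    r'"(?P<requesttype>\w+) (?P<requesturl>[^ ]+) HTTP/\d\.\d" '
    r"(?P<status>\d+) (?P<size>\d+) "
    r'"[^"]+" "(?P<ua>[^"]+)"'
)

line = ('192.168.7.10 - - [24/May/2021:15:50:47 +0800] "GET /shijiange HTTP/1.1" '
        '404 571 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"')

m = NGINX_LINE.match(line)
fields = m.groupdict() if m else {}
for key, value in fields.items():
    print(f"{key} => {value}")
```

All captured values are strings. In Logstash you would convert `status` and `size` to numbers separately if needed.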
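Common Grok patterns, with explanations and examples
Most Linux users have used regular expressions to search for files on a machine or for content inside files. Grok also uses regular expressions to identify the pieces of data in a log line.
There are two ways to use regular expressions in Grok:
write the regex directly;
use a Grok expression that maps to a regex.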
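In my view, rewriting a regex every time is painful, so why not define it once as an expression and reuse it?
Tip: Grok expressions are a lot like macro definitions in C. A name such as %{IP} is replaced by the regular expression it stands for, and definitions can refer to other definitions.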
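The macro analogy can be made concrete with a rough Python sketch of the expansion step. The small `PATTERNS` table is modeled on the definitions in the listing below. `USERNAME` uses `[a-zA-Z0-9._-]+`, as in current logstash-patterns-core. `expand` is a hypothetical helper, not Logstash's implementation. Python also needs `(?P<name>...)` where grok emits `(?<name>...)`.

```python
import re

# A few definitions modeled on the grok-patterns file.
PATTERNS = {
    "USERNAME": r"[a-zA-Z0-9._-]+",
    "USER": r"%{USERNAME}",
    "INT": r"(?:[+-]?(?:[0-9]+))",
}

# A reference is %{NAME} or %{NAME:field}.
REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def expand(pattern: str) -> str:
    """Recursively replace grok references with their regex bodies, like macro expansion."""
    def substitute(m):
        body = expand(PATTERNS[m.group(1)])  # definitions may use other definitions
        field = m.group(2)
        # %{NAME:field} becomes a named capture group; plain %{NAME} a non-capturing group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REF.sub(substitute, pattern)

regex = expand(r"%{USER:user} logged in %{INT:count} times")
print(regex)
m = re.fullmatch(regex, "Alex.Wong logged in 3 times")
print(m.groupdict())  # {'user': 'Alex.Wong', 'count': '3'}
```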
To learn Grok's default expressions, first locate the file that defines them:
On Windows: [your Logstash install directory]\vendor\bundle\jruby\x.x\gems\logstash-patterns-core-x.x.x\patterns\grok-patterns
The commonly used expressions are explained below.
Common expressions
USERNAME or USER
Username: a string of digits, upper- and lowercase letters, and the special characters . _ -
e.g. 1234, Bob, Alex.Wong
EMAILLOCALPART
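Local part of an email address (before the @). The first character must be an upper- or lowercase letter. The remaining characters can be digits, letters, or the special characters _.+-=:. Note that all-numeric QQ mail accounts, which are common in China, will not match, so the regex has to be modified for them.
e.g. stone, Gary_Lu, abc-123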
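To see the QQ problem concretely, here is a small Python check. The character set follows the description above, and the exact grok definition differs between versions. `EMAILLOCALPART_QQ` is a hypothetical relaxed variant that also allows a leading digit. It is one possible modification, not an official pattern.

```python
import re

# As described above: a letter first, then letters, digits or _ . + = : -
EMAILLOCALPART = re.compile(r"[a-zA-Z][a-zA-Z0-9_.+=:-]+")

# Hypothetical variant that also accepts a leading digit (e.g. all-numeric QQ accounts).
EMAILLOCALPART_QQ = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.+=:-]+")

for local in ["stone", "Gary_Lu", "abc-123", "12345678"]:
    print(local,
          bool(EMAILLOCALPART.fullmatch(local)),
          bool(EMAILLOCALPART_QQ.fullmatch(local)))
```

Only `12345678` differs: it is rejected by the first pattern and accepted by the relaxed one.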
EMAILADDRESS
Email address
e.g. stone@abc.com, Gary_Lu@gmail.com, abc-123@163.com
HTTPDUSER
Apache server user: either an EMAILADDRESS or a USERNAME
INT
Integer, including 0 and positive and negative integers
e.g. 0, -123, 43987
BASE10NUM or NUMBER
Decimal number, integer or fractional
e.g. 0, 18, 5.23
BASE16NUM
Hexadecimal integer
e.g. 0x0045fa2d, -0x3F8709
BASE16FLOAT
Hexadecimal number, integer or fractional
WORD
Word: a run of letters, digits and underscores (regex \w)
e.g. String, 3529345, ILoveYou
NOTSPACE
String containing no whitespace
SPACE
Whitespace. The definition is \s*, so it also matches zero characters.
QUOTEDSTRING or QS
Quoted string
e.g. "This is an apple", 'What is your name?'
UUID
Standard UUID
e.g. 550E8400-E29B-11D4-A716-446655440000
MAC
MAC address, in Cisco format, common (colon-separated) format, or Windows (hyphen-separated) format
IP
IP address, IPv4 or IPv6
e.g. 127.0.0.1, FE80:0000:0000:0000:AAAA:0000:00C2:0002
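The IP definition in the listing below only covers IPv4. Newer logstash-patterns-core releases define IP as IPV6 or IPV4. Below is a quick Python check of that IPv4 regex, copied from the listing. Python's `re` supports the same fixed-width lookarounds, which stop the pattern from matching inside a longer run of digits.

```python
import re

# IPv4-only "IP" definition from the pattern listing (split across lines, same regex).
IP = re.compile(
    r"(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
    r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
    r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
    r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])"
)

print(IP.search("client 192.168.7.10 connected").group())  # 192.168.7.10
print(IP.search("bad 256.1.1.1"))                          # None
```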
HOSTNAME
Hostname
IPORHOST
IP address or hostname
HOSTPORT
Hostname (or IP) plus port
e.g. 127.0.0.1:3306, api.stozen.net:8000
PATH
Path in Unix or Windows format
e.g. /usr/local/nginx/sbin/nginx, c:\windows\system32\clr.exe
URIPROTO
URI protocol (scheme)
e.g. http, ftp
URIHOST
URI host
e.g. www.stozen.net, 10.0.0.1:22
URIPATH
URI path
e.g. //www.stozen.net/abc/, /api.php
URIPARAM
Query (GET) parameters of a URI
e.g. ?a=1&b=2&c=3
URIPATHPARAM
URI path plus query parameters
e.g. //www.stozen.net/abc/api.php?a=1&b=2&c=3
URI
Complete URI
e.g. http://www.stozen.net/abc/api.php?a=1&b=2&c=3
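Below is a rough Python sketch of how these pieces compose into URI. The sub-patterns are copied from the listing below, with URIPROTO's group made non-capturing. HOSTNAME stands in for the full IPORHOST, and `[1-9][0-9]*` stands in for POSINT. The group names `proto`, `host`, `port`, `path` and `params` are illustrative only.

```python
import re

# Sub-patterns from the grok listing (HOSTNAME used in place of IPORHOST).
URIPROTO = r"[A-Za-z]+(?:\+[A-Za-z+]+)?"
HOSTNAME = r"\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)"
URIPATH = r"(?:/[A-Za-z0-9$.+!*'(){},~:;=#%_-]*)+"
URIPARAM = r"\?[A-Za-z0-9$.+!*'|(){},~#%&/=:;_-]*"

# Roughly URIPROTO://URIHOST followed by URIPATHPARAM.
URI = re.compile(
    rf"(?P<proto>{URIPROTO})://"
    rf"(?P<host>{HOSTNAME})(?::(?P<port>[1-9][0-9]*))?"
    rf"(?P<path>{URIPATH})?(?P<params>{URIPARAM})?"
)

for uri in ("http://www.stozen.net/abc/api.php?a=1&b=2&c=3", "ftp://10.0.0.1:22/pub/"):
    print(URI.fullmatch(uri).groupdict())
```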
Date and time expressions
MONTH
Month name
e.g. Jan, January
MONTHNUM
Month number
e.g. 03, 9, 12
MONTHDAY
Day of the month
e.g. 03, 9, 31
DAY
Day-of-week name
e.g. Mon, Monday
YEAR
Year
HOUR
Hour
MINUTE
Minute
SECOND
Second
TIME
Time
e.g. 00:01:23
DATE_US
US date format (month-day-year)
e.g. 10-15-1982, 10/15/1982
DATE_EU
European date format (day-month-year)
e.g. 15-10-1982, 15/10/1982, 15.10.1982
ISO8601_TIMEZONE
ISO 8601 time zone offset
e.g. +10:23, -1023
TIMESTAMP_ISO8601
ISO 8601 timestamp
e.g. 2016-07-03T00:34:06+08:00
DATE
Date: either a US date (%{DATE_US}) or a European date (%{DATE_EU})
DATESTAMP
Full date plus time
e.g. 07-03-2016 00:34:06
HTTPDATE
Date format used in HTTP server access logs
e.g. 03/Jul/2016:00:36:53 +0800
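HTTPDATE and TIMESTAMP_ISO8601 only match the text. Turning it into a real timestamp is the date filter's job. Below, the same two formats are parsed with Python's `datetime` as a rough analogue. Here `%d/%b/%Y:%H:%M:%S %z` plays the role of the Joda-style `dd/MMM/yyyy:HH:mm:ss Z` used by the date filter in the JSON example below. The conversion to UTC mirrors how Logstash stores `@timestamp`.

```python
from datetime import datetime, timezone

# HTTPDATE, as in nginx/Apache access logs
httpdate = datetime.strptime("03/Jul/2016:00:36:53 +0800", "%d/%b/%Y:%H:%M:%S %z")

# TIMESTAMP_ISO8601
iso = datetime.fromisoformat("2016-07-03T00:34:06+08:00")

# @timestamp is kept in UTC, so both values move back 8 hours.
print(httpdate.astimezone(timezone.utc).isoformat())  # 2016-07-02T16:36:53+00:00
print(iso.astimezone(timezone.utc).isoformat())       # 2016-07-02T16:34:06+00:00
```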
Log expressions
LOGLEVEL
Log level
e.g. Alert, alert, ALERT, Error
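Below is a quick Python test of the LOGLEVEL definition exactly as given in the listing below. This older version has no Alert alternative, so "Alert" does not match it. Newer logstash-patterns-core releases add Alert among other levels. The sample log line is made up.

```python
import re

# LOGLEVEL from the listing. [D|d] is a character class, so it also accepts a literal "|".
LOGLEVEL = re.compile(
    r"([D|d]ebug|DEBUG|[N|n]otice|NOTICE|[I|i]nfo|INFO|[W|w]arn?(?:ing)?|WARN?(?:ING)?"
    r"|[E|e]rr?(?:or)?|ERR?(?:OR)?|[C|c]rit?(?:ical)?|CRIT?(?:ICAL)?|[F|f]atal|FATAL)"
)

line = "2021-05-24 15:50:47 WARNING disk usage above 90%"
m = LOGLEVEL.search(line)
print(m.group(1))                         # WARNING
print(bool(LOGLEVEL.fullmatch("Alert")))  # False
```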
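3. Creating your own Grok expressions
In real business scenarios, more and more log formats will show up. Grok's default expressions clearly cannot cover them all (for example national ID numbers or mobile phone numbers), so we need to add some expressions of our own.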
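Expression | Regular expression | Description
DATE_CHS | %{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY} | Date format commonly used in China (year first)
ZIPCODE_CHS | [1-9]\d{5} | Chinese postal code (6 digits, not starting with 0)
GAME_ACCOUNT | [a-zA-Z][a-zA-Z0-9_]{4,15} | Game account: a letter followed by 4-15 letters, digits or underscores (5-16 characters in total)
There are many more. Adapt them to your own business needs.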
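In Logstash, custom patterns are normally registered with the grok filter's `patterns_dir` or `pattern_definitions` option. Before doing that, it helps to test the regexes on their own. The Python check below inlines `%{YEAR}`, `%{MONTHNUM}` and `%{MONTHDAY}` by hand, using the definitions from the listing below. The sample strings are made up.

```python
import re

# DATE_CHS with %{YEAR}, %{MONTHNUM}, %{MONTHDAY} replaced by their listing definitions.
DATE_CHS = re.compile(
    r"[0-9]+[./-](?:0?[1-9]|1[0-2])[./-](?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])"
)
ZIPCODE_CHS = re.compile(r"[1-9]\d{5}")
GAME_ACCOUNT = re.compile(r"[a-zA-Z][a-zA-Z0-9_]{4,15}")

checks = [
    (DATE_CHS, "2021.05.24"),
    (DATE_CHS, "2021-5-9"),
    (ZIPCODE_CHS, "100080"),
    (ZIPCODE_CHS, "012345"),    # leading 0: rejected
    (GAME_ACCOUNT, "player_01"),
    (GAME_ACCOUNT, "9player"),  # starts with a digit: rejected
    (GAME_ACCOUNT, "abcd"),     # only 4 characters in total: rejected
]
for pattern, text in checks:
    print(f"{text!r:14} {bool(pattern.fullmatch(text))}")
```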
Built-in pattern definitions from the official grok-patterns file
USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b
POSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
#QUOTEDSTRING (?:(?<!\\)(?:"(?:\\.|[^\\"])*"|(?:'(?:\\.|[^\\'])*')|(?:`(?:\\.|[^\\`])*`)))
QUOTEDSTRING (?:(?<!\\)(?:"(?:\\.|[^\\"]+)*"|(?:'(?:\\.|[^\\']+)*')|(?:`(?:\\.|[^\\`]+)*`)))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
# Networking
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
IP (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
HOST %{HOSTNAME}
IPORHOST (?:%{HOSTNAME}|%{IP})
HOSTPORT (?:%{IPORHOST=~/\./}:%{POSINT})
# paths
PATH (?:%{UNIXPATH}|%{WINPATH})
UNIXPATH (?:/(?:[\w_%!$@:.,-]+|\\.)*)+
LINUXTTY (?:/dev/pts/%{NONNEGINT})
BSDTTY (?:/dev/tty[pq][a-z0-9])
TTY (?:%{BSDTTY}|%{LINUXTTY})
WINPATH (?:[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
URIPROTO [A-Za-z]+(\+[A-Za-z+]+)?
URIHOST %{IPORHOST}(?::%{POSINT:port})?
# uripath comes loosely from RFC1738, but mostly from what Firefox
# doesn't turn into %XX
URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=#%_-]*)+
#URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)?
URIPARAM \?[A-Za-z0-9$.+!*'|(){},~#%&/=:;_-]*
URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?
# Months: January, Feb, 3, 03, 12, December
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
# Days: Monday, Tue, Thu, etc...
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
# Years?
YEAR [0-9]+
# Time: HH:MM:SS
#TIME \d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?
# I'm still on the fence about using grok to perform the time match,
# since it's probably slower.
# TIME %{POSINT<24}:%{POSINT<60}(?::%{POSINT<60}(?:\.%{POSINT})?)?
HOUR (?:2[0123]|[01][0-9])
MINUTE (?:[0-5][0-9])
# '60' is a leap second in most time standards and thus is valid.
SECOND (?:(?:[0-5][0-9]|60)(?:[.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND (?:%{SECOND}|60)
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[PMCE][SD]T)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
# Syslog Dates: Month Day HH:MM:SS
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
PROG (?:[\w._/%-]+)
SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
SYSLOGHOST %{IPORHOST}
SYSLOGFACILITY <%{POSINT:facility}.%{POSINT:priority}>
HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT:ZONE}
# Shortcuts
QS %{QUOTEDSTRING}
# Log formats
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
COMBINEDAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{URIPATHPARAM:request}(?: HTTP/%{NUMBER:httpversion})?|-)" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "(?:%{URI:referrer}|-)" %{QS:agent}
# Log Levels
LOGLEVEL ([D|d]ebug|DEBUG|[N|n]otice|NOTICE|[I|i]nfo|INFO|[W|w]arn?(?:ing)?|WARN?(?:ING)?|[E|e]rr?(?:or)?|ERR?(?:OR)?|[C|c]rit?(?:ical)?|CRIT?(?:ICAL)?|[F|f]atal|FATAL)
Example: JSON-format log
{"@timestamp":"2021-08-28T21:17:31+08:00","host":"172.31.2.107","clientip":"172.31.0.1","size":0,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"172.31.2.107","url":"/web/index.html","domain":"172.31.2.107","xff":"-","referer":"-","status":"304"}
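Because each line is already JSON, no regex is needed. The json filter turns the keys into fields and keeps the JSON types. Below is a Python analogue using the standard `json` module, for illustration only.

```python
import json

raw = '{"@timestamp":"2021-08-28T21:17:31+08:00","host":"172.31.2.107","clientip":"172.31.0.1","size":0,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"172.31.2.107","url":"/web/index.html","domain":"172.31.2.107","xff":"-","referer":"-","status":"304"}'

event = json.loads(raw)  # every key becomes a field, with its JSON type preserved
for key in ("clientip", "url", "status", "size", "responsetime"):
    print(f"{key} => {event[key]!r} ({type(event[key]).__name__})")
```

`status` stays a string because the nginx log line quotes it, while `size` and `responsetime` are numbers. This matches the event printed at the end of this example.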
Processing it with the json filter plugin:
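input {
  redis {
    data_type => "list"
    key => "qq-m44-nginx-log"
    host => "172.31.2.106"
    port => "6379"
    db => "3"
    password => "123456"
    codec => json
  }
}
# Filters
filter {
  # Parse the nginx JSON line carried in "message" into top-level fields,
  # then drop the raw message and Filebeat metadata that are no longer needed.
  json {
    source => "message"
    remove_field => ["message","@version","path","beat","input","log","offset","prospector","source","tags"]
  }
  # Only takes effect if the event has a "timestamp" field in HTTPDATE format
  # (e.g. from a grok-parsed access log). This JSON log has no such field; the json
  # filter already uses the log's own ISO 8601 "@timestamp" key for the event time.
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
output {
  # Route events by the custom field set in Filebeat (fields.app)
  if [fields][app] == "nginx-errorlog" {
    elasticsearch {
      hosts => ["172.31.2.101:9200"]
      index => "qq-123test-filebeat-nginx-errorlog-%{+YYYY.MM.dd}"
    }
  }
  if [fields][app] == "nginx-accesslog" {
    elasticsearch {
      hosts => ["172.31.2.101:9200"]
      index => "qq-123test-filebeat-nginx-accesslog-%{+YYYY.MM.dd}"
    }
  }
}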
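In the output section, `%{+YYYY.MM.dd}` is formatted from the event's `@timestamp` in UTC, not local time. With +08:00 logs, anything written before 08:00 local time therefore goes into the previous day's index. The Python sketch below mimics the routing and index naming. `INDEX_BY_APP` and `index_for` are hypothetical names, and this is not Logstash code.

```python
from datetime import datetime, timedelta, timezone

# Index templates from the output section; {date} stands in for %{+YYYY.MM.dd}.
INDEX_BY_APP = {
    "nginx-errorlog": "qq-123test-filebeat-nginx-errorlog-{date}",
    "nginx-accesslog": "qq-123test-filebeat-nginx-accesslog-{date}",
}

def index_for(event):
    """Pick the index the way the output conditionals do: by [fields][app], dated in UTC."""
    template = INDEX_BY_APP.get(event.get("fields", {}).get("app"))
    if template is None:
        return None  # neither branch matches, so the event is not written anywhere
    utc = event["@timestamp"].astimezone(timezone.utc)
    return template.format(date=utc.strftime("%Y.%m.%d"))

cst = timezone(timedelta(hours=8))  # UTC+08:00, the offset used by the nginx logs above

sample = {"@timestamp": datetime(2021, 8, 29, 5, 31, 29, tzinfo=timezone.utc),
          "fields": {"group": "n125", "app": "nginx-accesslog"}}
early = {"@timestamp": datetime(2021, 8, 29, 7, 0, tzinfo=cst),
         "fields": {"app": "nginx-accesslog"}}

print(index_for(sample))  # qq-123test-filebeat-nginx-accesslog-2021.08.29
print(index_for(early))   # qq-123test-filebeat-nginx-accesslog-2021.08.28
```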
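After accessing nginx, the event printed to the terminal looks like this. The rubydebug-style output comes from a stdout output used while testing, such as stdout { codec => rubydebug }, which the config above does not include.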
{
"agent" => {
"name" => "es-web1.example.local",
"type" => "filebeat",
"ephemeral_id" => "2a8806fd-48de-46e0-bdde-502aa74b4c83",
"version" => "7.12.1",
"hostname" => "es-web1.example.local",
"id" => "51f9df27-4170-4844-ba12-c719de1f4410"
},
"domain" => "172.31.2.107",
"status" => "304",
"upstreamtime" => "-",
"size" => 0,
"xff" => "-",
"ecs" => {
"version" => "1.8.0"
},
"@timestamp" => 2021-08-29T05:31:29.000Z,
"clientip" => "172.31.0.1",
"referer" => "-",
"responsetime" => 0.0,
"upstreamhost" => "-",
"http_host" => "172.31.2.107",
"url" => "/web/index.html",
"host" => "172.31.2.107",
"fields" => {
"group" => "n125",
"app" => "nginx-accesslog"
}
}