Original article: https://blog.csdn.net/napoay/article/details/62885899
1. Introduction
Grok is by far the best way to turn messy, unstructured logs into something structured and queryable. It works well for parsing files in virtually any format, such as syslog logs, Apache and other web-server logs, and MySQL logs.
Grok ships with a library of more than 120 built-in patterns (regular expressions); see https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns.
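Every Grok pattern is written as %{SYNTAX:SEMANTIC}, where SYNTAX is the name of a pattern from that library and SEMANTIC is the field name the matched text is stored under. For example, the fragment below matches an IP address and stores it in a field named client, then matches a number and stores it in a field named duration:

%{IP:client} %{NUMBER:duration}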
2. A Getting-Started Example
Here is a Tomcat access log line (in the Apache combined log format):
83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
It is shipped from Filebeat to Logstash, which is configured as follows:
input {
    beats {
        port => "5043"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}
output {
    stdout { codec => rubydebug }
}
The message field in the filter holds each individual log line, and %{COMBINEDAPACHELOG} is the pattern used to parse it. The full definition of COMBINEDAPACHELOG can be found at https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/httpd. After parsing:
{
        "request" => "/presentations/logstash-monitorama-2013/images/kibana-search.png",
          "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
         "offset" => 325,
           "auth" => "-",
          "ident" => "-",
     "input_type" => "log",
           "verb" => "GET",
         "source" => "/path/to/file/logstash-tutorial.log",
        "message" => "83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
           "type" => "log",
           "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
       "referrer" => "\"http://semicomplete.com/presentations/logstash-monitorama-2013/\"",
     "@timestamp" => 2016-10-11T21:04:36.167Z,
       "response" => "200",
          "bytes" => "203023",
       "clientip" => "83.149.9.216",
       "@version" => "1",
           "beat" => {
        "hostname" => "My-MacBook-Pro.local",
            "name" => "My-MacBook-Pro.local"
    },
           "host" => "My-MacBook-Pro.local",
    "httpversion" => "1.1",
      "timestamp" => "04/Jan/2015:05:13:42 +0000"
}
As another example, consider the following log line:
55.3.244.1 GET /index.html 15824 0.043
This line can be split into five parts: the IP (55.3.244.1), the method (GET), the requested path (/index.html), the byte count (15824), and the request duration (0.043). The parsing pattern (regular-expression match) for this line is:
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
Written into the filter:
filter {
    grok {
        match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
    }
}
After parsing:
client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043
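To try this pattern without Filebeat, a minimal standalone pipeline can read test lines from the console instead; this is only a sketch, and the stdin input is my substitution rather than part of the original setup:

input {
    stdin { }                      # read test lines typed into the console
}
filter {
    grok {
        match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
    }
}
output {
    stdout { codec => rubydebug }  # print each parsed event
}

Pasting the sample line 55.3.244.1 GET /index.html 15824 0.043 into the console should print an event containing the five fields above.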
3. Parsing Logs in Arbitrary Formats
The steps for parsing a log in an arbitrary format are:
- First decide how the log should be split up, i.e. how many parts a log line breaks into.
- Analyze each part: if an existing Grok pattern fits, use it directly; if there is no ready-made pattern, define a custom one.
- Learn to debug patterns in the Grok Debugger.
Here is an example using two log lines:
2017-03-07 00:03:44,373 4191949560 [ CASFilter.java:330:DEBUG] entering doFilter()
2017-03-16 00:00:01,641 133383049 [ UploadFileModel.java:234:INFO ] 上報內容准備寫入文件
Split principle:
2017-03-16 00:00:01,641: timestamp
133383049: sequence number
UploadFileModel.java: Java class name
234: line number
INFO: log level
entering doFilter(): log message
The first five fields can use existing Grok patterns: TIMESTAMP_ISO8601, NUMBER, JAVAFILE, NUMBER, and LOGLEVEL. The last field needs a custom regular expression: everything after the closing ] that follows the log level, whether Chinese or English, is treated as the log message. The syntax for a custom pattern is:
(?<field_name>the pattern here)
The content of the last field is captured as info, with the following regex:
(?<info>([\s\S]*))
The complete pattern matching the two log lines above is shown next, where \s* is used to absorb the extra whitespace.
\s*%{TIMESTAMP_ISO8601:time}\s*%{NUMBER:num} \[\s*%{JAVAFILE:class}\s*\:\s*%{NUMBER:lineNumber}\s*\:%{LOGLEVEL:level}\s*\]\s*(?<info>([\s\S]*))
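Dropped into a Logstash grok filter, the full pattern could be wired up as below. This is only a sketch; the field names (time, num, class, lineNumber, level, info) simply follow the pattern above:

filter {
    grok {
        match => {
            "message" => "\s*%{TIMESTAMP_ISO8601:time}\s*%{NUMBER:num} \[\s*%{JAVAFILE:class}\s*\:\s*%{NUMBER:lineNumber}\s*\:%{LOGLEVEL:level}\s*\]\s*(?<info>([\s\S]*))"
        }
    }
}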
Regex-based parsing is error-prone, so it is strongly recommended to debug your patterns in the Grok Debugger before deploying them.