5. Logstash Log Collection in Practice
Before diving into Logstash, we need to understand a few basic concepts.
The basic Logstash collection pipeline: input --> codec --> filter --> codec --> output
1. input: where to collect logs from.
2. filter: filter events before they are sent out.
3. output: send events to Elasticsearch or to a message queue such as Redis.
4. codec: print events to the console, which is convenient for testing while practicing.
5. When the data volume is small, collect logs into monthly indices.
Processing flow (diagram)
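To make the four stages concrete, here is a minimal sketch of a pipeline config that wires them together (the grok filter is purely illustrative and not part of this tutorial's setup):

# Minimal sketch: input -> filter -> output, with a codec on the output side
input {
    stdin { }                                              # input: read events typed on the console
}
filter {
    grok {                                                 # filter: parse the event before shipping (illustrative)
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}
output {
    stdout { codec => rubydebug }                          # codec + output: pretty-print to the console
}

Only input and output are required; the filter stage can be dropped entirely, as the examples below do.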
Installing Logstash
Logstash requires a Java environment, so install it directly with yum.
1. Install Java
[root@linux-node1 ~]# java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-b15)
OpenJDK 64-Bit Server VM (build 25.111-b15, mixed mode)
2. Download and install the GPG key
[root@linux-node1 ~]# rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
3. Add the Logstash yum repository
[root@linux-node1 ~]# cat /etc/yum.repos.d/logstash.repo
[logstash-2.3]
name=Logstash repository for 2.3.x packages
baseurl=https://packages.elastic.co/logstash/2.3/centos
gpgcheck=1
gpgkey=https://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1
4. Install Logstash
[root@linux-node1 ~]# yum install -y logstash
Note: the package mirror is hosted overseas. If the download is slow or fails, use a proxy or a host in the Hong Kong region to download the rpm package and install it; installing from the rpm still requires the yum repository above.
The rubydebug codec is usually used for foreground output, demonstration, and testing:
[root@linux-node1 /]# /opt/logstash/bin/logstash -e 'input { stdin {} } output { stdout { codec => rubydebug } }'
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N    # this JDK warning simply means the VM could use more CPUs
Settings: Default pipeline workers: 1
Pipeline main started
hello                                # typed input
{
       "message" => "hello",
      "@version" => "1",
    "@timestamp" => "2017-01-03T17:00:24.285Z",
          "host" => "linux-node1.example.com"
}
Writing the content into Elasticsearch
[root@linux-node1 /]# /opt/logstash/bin/logstash -e 'input { stdin {} } output { elasticsearch { hosts => ["192.168.230.128:9200"] } }'
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
Settings: Default pipeline workers: 1
Pipeline main started
123123
hehehehe
123hehe
hehe123
kkkksisi
Browse the data to verify what was written
Writing to stdout and Elasticsearch at the same time
Write one copy into Elasticsearch and output another copy locally, keeping a plain text file on disk — this removes the need to periodically back up Elasticsearch to a remote location. Keeping a plain text copy has three big advantages: 1) text is the simplest format; 2) text can be reprocessed later; 3) text achieves the highest compression ratio.
[root@linux-node1 /]# /opt/logstash/bin/logstash -e 'input { stdin {} } output { elasticsearch { hosts => ["192.168.230.128:9200"] } stdout { codec => rubydebug } }'
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
Settings: Default pipeline workers: 1
Pipeline main started
123hehe
{
       "message" => "123hehe",
      "@version" => "1",
    "@timestamp" => "2017-01-03T17:58:07.237Z",
          "host" => "linux-node1.example.com"
}
wo si ceshi
{
       "message" => "wo si ceshi",
      "@version" => "1",
    "@timestamp" => "2017-01-03T17:58:30.635Z",
          "host" => "linux-node1.example.com"
}
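The demo above prints the local copy to the console. To actually keep a text file on disk, the file output plugin can be added alongside elasticsearch — a minimal sketch, assuming a writable directory (the path below is hypothetical):

output {
    elasticsearch { hosts => ["192.168.230.128:9200"] }        # one copy to Elasticsearch
    file {
        path => "/var/log/logstash/stdin-%{+YYYY-MM-dd}.log"   # hypothetical path: one local text file per day
    }
}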
Refresh and check
We have now seen several Logstash outputs. The next step is to put them into a configuration file — we can't keep typing everything on the command line.
Configuration documentation on the official site:
https://www.elastic.co/guide/en/logstash/2.3/configuration.html
[root@linux-node1 /]# cd /etc/logstash/conf.d/
[root@linux-node1 conf.d]# cat 01-logstash.conf
input { stdin { } }                                    # standard input
output {
    elasticsearch { hosts => ["localhost:9200"] }      # write to Elasticsearch
    stdout { codec => rubydebug }                      # write to stdout
}
[root@linux-node1 conf.d]# /opt/logstash/bin/logstash -f /etc/logstash/conf.d/01-logstash.conf    # -f specifies the config file
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
Settings: Default pipeline workers: 1
Pipeline main started
hehe
{
       "message" => "hehe",
      "@version" => "1",
    "@timestamp" => "2017-01-03T19:36:03.909Z",
          "host" => "linux-node1.example.com"
}
Refresh the browser to view
Configuration file syntax

Structure of a config file:

# This is a comment. You should use comments to describe
# parts of your configuration.
input { ... }
filter { ... }
output { ... }

The input and output sections are required; filter is optional. Each section is delimited by braces. A plugin configuration consists of the plugin name followed by a block of settings for that plugin. For example, this input section configures two file inputs:

input {                 # this uses the "file" input plugin
    file {
        path => "/var/log/messages"
        type => "syslog"
    }
    file {
        path => "/var/log/apache/access.log"
        type => "apache"
    }
}

# Array
An array can be a single string value or multiple values. If you specify the same setting multiple times, it appends to the array. Example:
path => ["/var/log/messages", "/var/log/*.log"]    # wildcards are allowed
path => "/data/mysql/mysql.log"                    # the same setting may appear more than once
This example configures path as an array containing an element for each of the three strings.

# Boolean
A boolean must be either true or false. Note that the true and false keywords are not enclosed in quotes. Example:
ssl_enable => true

# Bytes
If no unit is specified, the integer string represents the number of bytes. Examples:
my_bytes => "1113"     # 1113 bytes
my_bytes => "10MiB"    # 10485760 bytes
my_bytes => "100kib"   # 102400 bytes
my_bytes => "180 mb"   # 180000000 bytes

# Codec
A codec is the name of a Logstash codec used to represent the data. Codecs can be used in both inputs and outputs. Example:
codec => "json"

# Hash
A hash is a collection of key value pairs specified in the format "field1" => "value1". Note that multiple key value entries are separated by spaces rather than commas. Example:
match => {
    "field1" => "value1"
    "field2" => "value2"
    ...
}

# Number
Numbers must be valid numeric values (floating point or integer). Example:
port => 33

# Password
A password is a string with a single value that is not logged or printed. Example:
my_password => "password"

# Path
A path is a string that represents a valid operating system path. Example:
my_path => "/tmp/logstash"

# String
A string must be a single character sequence. Note that string values are enclosed in quotes, either double or single. Literal quotes inside a string need to be escaped with a backslash. Example:
name => "Hello world"
name => 'It\'s a beautiful day'

# Comments
Comments are the same as in perl, ruby, and python. A comment starts with a # character, and does not need to be at the beginning of a line. For example:
# this is a comment
input { # comments can appear at the end of a line, too
    # ...
}
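As a quick sketch that exercises several of these value types in one config (the "env" field added by mutate is illustrative, not part of this tutorial):

input {
    file {
        path => ["/var/log/messages", "/var/log/*.log"]    # array
        start_position => "beginning"                      # string
    }
}
filter {
    mutate {
        add_field => { "env" => "test" }                   # hash (illustrative field name and value)
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
    }
    stdout { codec => rubydebug }                          # codec
}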
Input plugins: the file plugin

# https://www.elastic.co/guide/en/logstash/2.3/input-plugins.html

Required configuration options:
file {
    path => ...
}

Available configuration options:

# add_field
Value type is hash. Default value is {}. Adds a field to an event.

# delimiter
Value type is string. Default value is "\n". Sets the line delimiter; the default is the newline character.

# discover_interval
Value type is number. Default value is 15. How often (in seconds) to check for changes in the watched files; only files that have changed are collected, checked every 15 seconds by default.

# sincedb_write_interval
Value type is number. Default value is 15. How often (in seconds) to write the sincedb file; every 15 seconds by default.

# exclude
Value type is array. There is no default value for this setting. Exclusions (matched against the filename, not the full path). Filename patterns are valid here, too. For example, if you have
path => "/var/log/*"
you might want to exclude gzipped files:
exclude => "*.gz"

# sincedb_path
Value type is string. There is no default value. The path to the sincedb database file (which tracks the current position in the monitored log files). sincedb is a hidden file; by default it is written to a path matching $HOME/.sincedb*. Note: it must be a file path, not a directory path.

# start_position
Value type is string, one of ["beginning", "end"]. Where to begin reading a file: from the head or (the default) from the tail. What does this mean? For an empty file, collection starts at the tail, so each newly written line is collected as it arrives. If the file already has content and no option is set, collection still starts at the tail and the existing content is skipped. To also collect the existing content, set this option to "beginning".
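A sketch combining several of these options in one file input (the sincedb path is hypothetical):

input {
    file {
        path => ["/var/log/*.log", "/var/log/messages"]      # array of paths; wildcards allowed
        exclude => "*.gz"                                    # skip gzipped files by filename
        start_position => "beginning"                        # also collect content that already exists
        sincedb_path => "/var/lib/logstash/.sincedb_demo"    # hypothetical; must be a file path, not a directory
        discover_interval => 15                              # check for file changes every 15 seconds
    }
}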
Collecting system logs with Logstash
[root@linux-node1 ~]# cat file.conf
input {
    file {
        path => "/var/log/messages"                 # path to the system log
        type => "system"                            # set the type: system logs
        start_position => "beginning"               # collect from the beginning
    }
}
output {
    elasticsearch {
        hosts => ["192.168.230.128:9200"]
        index => "system-%{+YYYY.MM.dd}"            # index name; the date pattern makes Logstash create daily indices automatically
    }
}
[root@linux-node1 ~]# /opt/logstash/bin/logstash -f file.conf

This generates the index system-2017.01.03.
Contents of the index
Note: a file is collected line by line, but in Elasticsearch terminology each line is an event, not a line.
Collecting Java logs with Logstash
/var/log/elasticsearch/check-cluster.log is the Java log that Elasticsearch writes for itself. We will collect it, building on the system log configuration above.
[root@linux-node1 ~]# cat file.conf
input {
    file {
        path => "/var/log/messages"
        type => "system"                                      # set the type
        start_position => "beginning"
    }
    file {
        path => "/var/log/elasticsearch/check-cluster.log"    # path to the Java log
        type => "es-error"                                    # set the type
        start_position => "beginning"                         # collect from the beginning
    }
}
# Use the type to route events: system events go to the system index, es-error events to the es-error index
output {
    if [type] == "system" {
        elasticsearch {
            hosts => ["192.168.230.128:9200"]
            index => "system-%{+YYYY.MM.dd}"
        }
    }
    if [type] == "es-error" {
        elasticsearch {
            hosts => ["192.168.230.128:9200"]
            index => "es-error-%{+YYYY.MM.dd}"
        }
    }
}
Detailed usage of if conditionals:
https://www.elastic.co/guide/en/logstash/2.3/event-dependent-configuration.html
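Beyond the simple equality test used above, conditionals also support else/else if branches and membership tests — a brief sketch following the patterns documented there (the fallback branch and the catchall index name are illustrative):

output {
    if [type] == "system" {
        elasticsearch {
            hosts => ["192.168.230.128:9200"]
            index => "system-%{+YYYY.MM.dd}"
        }
    } else if "error" in [message] {                   # substring membership test on a field
        elasticsearch {
            hosts => ["192.168.230.128:9200"]
            index => "catchall-error-%{+YYYY.MM.dd}"   # hypothetical index name
        }
    } else {
        stdout { codec => rubydebug }                  # fall back to the console
    }
}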
[root@linux-node1 ~]# /opt/logstash/bin/logstash -f file.conf
Settings: Default pipeline workers: 1
Pipeline main started
Browse the data and check the index nodes
But now there is a problem: Java logs contain stack traces, and each line of a trace is collected as a separate event. Since files are collected line by line, the trace is split apart and unreadable — developers cannot piece it back together.
What we want is the log in its original multi-line form, not split line by line as it appears in Elasticsearch.
Let's test this first. How do we merge multiple lines into one event? Notice that in the log format above, everything from one [timestamp] to the next [timestamp] is a single event, so we introduce the multiline codec plugin:

[root@linux-node1 conf.d]# cat multiline.conf
input {
    stdin {
        codec => multiline {
            pattern => "^\["        # lines starting with a bracket; the backslash escapes it
            negate => true
            what => "previous"
        }
    }
}
output {
    stdout {
        codec => "rubydebug"
    }
}
What does this mean? A new event starts at each line beginning with "["; everything up to the next line beginning with "[" belongs to one complete event. As shown in the figure below: typing [1], [2], and so on triggers output, while input without a leading bracket is buffered and not yet output.
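In other words, the three settings read like this (an annotated restatement; the comments are mine):

codec => multiline {
    pattern => "^\["     # a regex tested against every incoming line
    negate => true       # invert the match: act on lines that do NOT start with "["
    what => "previous"   # ...and append those lines to the previous event
}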
Once it tests well, merge it into the earlier file.conf.
[root@linux-node1 ~]# mv file.conf all.conf    # rename it first; we will keep extending this file
[root@linux-node1 ~]# cat all.conf
input {
    file {
        path => "/var/log/messages"
        type => "system"
        start_position => "beginning"
    }
    file {
        path => "/var/log/elasticsearch/check-cluster.log"
        type => "es-error"
        start_position => "beginning"
        codec => multiline {                   # these lines are the addition
            pattern => "^\["
            negate => true
            what => "previous"
        }
    }
}
output {
    if [type] == "system" {
        elasticsearch {
            hosts => ["192.168.230.128:9200"]
            index => "system-%{+YYYY.MM.dd}"
        }
    }
    if [type] == "es-error" {
        elasticsearch {
            hosts => ["192.168.230.128:9200"]
            index => "es-error-%{+YYYY.MM.dd}"
        }
    }
}
[root@linux-node1 ~]# /opt/logstash/bin/logstash -f all.conf    # run with the updated config and refresh to check

The configuration works, but the output is still not easy to read, so we bring in Kibana.
To make viewing easier, let's learn Kibana first.