Suppose each line of the log file is a JSON record, for example:
```json
{"Method":"JSAPI.JSTicket","Message":"JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw","CreateTime":"2015/10/13 9:39:59","AppGUID":"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d","_PartitionKey":"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d","_RowKey":"1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c","_UnixTS":1444700398710}
```
With the default configuration, after Logstash processes the line and inserts it into Elasticsearch, querying the document returns this:
```json
{
  "_index": "logstash-2015.10.16",
  "_type": "voip_feedback",
  "_id": "sheE9eXiQASMDVtRJ0EYcg",
  "_version": 1,
  "found": true,
  "_source": {
    "message": "{\"Method\":\"JSAPI.JSTicket\",\"Message\":\"JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw\",\"CreateTime\":\"2015/10/13 9:39:59\",\"AppGUID\":\"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d\",\"_PartitionKey\":\"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d\",\"_RowKey\":\"1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c\",\"_UnixTS\":1444700398710}",
    "@version": "1",
    "@timestamp": "2015-10-16T00:39:51.252Z",
    "type": "voip_feedback",
    "host": "ipphone",
    "path": "/usr1/data/voip_feedback.txt"
  }
}
```
That is, the entire JSON record is stored as a single string under "message". What I want instead is for Logstash to parse the JSON record automatically and put each field into Elasticsearch on its own. There are three configuration approaches that achieve this.
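For context, a minimal end-to-end pipeline around the snippets below might look like the following sketch; the file path, type, and sincedb path are the ones used throughout this post, the `elasticsearch` host is assumed to be localhost, and the syntax is that of the Logstash 1.4.x series used here:

```
input {
  file {
    type => "voip_feedback"
    path => ["/usr1/data/voip_feedback.txt"]
    sincedb_path => "/home/jfy/soft/logstash-1.4.2/voip_feedback.access"
  }
}

output {
  elasticsearch { host => "localhost" }
}
```

The three options below are variations on the `file` input block or additions to a `filter` section of this pipeline.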
Option 1: set format => json directly (note that the format option belongs to the 1.x file input and was later removed in favor of codecs)
```
file {
  type => "voip_feedback"
  path => ["/usr1/data/voip_feedback.txt"]
  format => json
  sincedb_path => "/home/jfy/soft/logstash-1.4.2/voip_feedback.access"
}
```
With this configuration the query result is:
```json
{
  "_index": "logstash-2015.10.16",
  "_type": "voip_feedback",
  "_id": "NrNX8HrxSzCvLl4ilKeyCQ",
  "_version": 1,
  "found": true,
  "_source": {
    "Method": "JSAPI.JSTicket",
    "Message": "JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw",
    "CreateTime": "2015/10/13 9:39:59",
    "AppGUID": "cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d",
    "_PartitionKey": "cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d",
    "_RowKey": "1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c",
    "_UnixTS": 1444700398710,
    "@version": "1",
    "@timestamp": "2015-10-16T00:16:11.455Z",
    "type": "voip_feedback",
    "host": "ipphone",
    "path": "/usr1/data/voip_feedback.txt"
  }
}
```
As you can see, the JSON record has been parsed directly into individual fields under _source, but the original raw record is not preserved.
Option 2: use codec => json
```
file {
  type => "voip_feedback"
  path => ["/usr1/data/voip_feedback.txt"]
  sincedb_path => "/home/jfy/soft/logstash-1.4.2/voip_feedback.access"
  codec => json {
    charset => "UTF-8"
  }
}
```
The result is the same as with the first option: the fields are parsed out, and the original record is not preserved.
Option 3: use the json filter
```
filter {
  if [type] == "voip_feedback" {
    json {
      source => "message"
      #target => "doc"
      #remove_field => ["message"]
    }
  }
}
```
With this configuration the query result looks like this:
```json
{
  "_index": "logstash-2015.10.16",
  "_type": "voip_feedback",
  "_id": "CUtesLCETAqhX73NKXZfug",
  "_version": 1,
  "found": true,
  "_source": {
    "message": "{\"Method222\":\"JSAPI.JSTicket\",\"Message\":\"JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw\",\"CreateTime\":\"2015/10/13 9:39:59\",\"AppGUID\":\"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d\",\"_PartitionKey\":\"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d\",\"_RowKey\":\"1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c\",\"_UnixTS\":1444700398710}",
    "@version": "1",
    "@timestamp": "2015-10-16T00:28:20.018Z",
    "type": "voip_feedback",
    "host": "ipphone",
    "path": "/usr1/data/voip_feedback.txt",
    "Method222": "JSAPI.JSTicket",
    "Message": "JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw",
    "CreateTime": "2015/10/13 9:39:59",
    "AppGUID": "cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d",
    "_PartitionKey": "cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d",
    "_RowKey": "1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c",
    "_UnixTS": 1444700398710,
    "tags": [
      "111",
      "222"
    ]
  }
}
```
Here the original record is preserved under "message", and the parsed fields are stored alongside it. If you are sure you do not need to keep the raw record, add remove_field => ["message"].
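Concretely, the same filter with remove_field enabled would look like this sketch; after a successful parse the raw "message" string is dropped and only the extracted fields remain:

```
filter {
  if [type] == "voip_feedback" {
    json {
      source => "message"
      # drop the raw record once it has been parsed into fields
      remove_field => ["message"]
    }
  }
}
```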
Comparing the three approaches, the most direct and convenient is setting format => json in the file input.
One more thing to note: when inserting data into ES, Logstash by default adds three fields under _source: type, host, and path. If the JSON content itself also contains type, host, or path fields, the parsed values will overwrite Logstash's defaults. The type field matters most, because it is also used as the index's type: once overwritten, the type inserted into ES is whatever the JSON record contained, no longer the type value configured in the Logstash config.
In that case, set target on the json filter. With target set, the parsed JSON content is no longer placed directly under _source, but under the configured field, here "doc":
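The filter that produces this layout is simply the earlier json filter with target uncommented; "doc" is an arbitrary field name of your choosing:

```
filter {
  json {
    source => "message"
    # parsed fields land under "doc" instead of the event root,
    # so they cannot clobber type, host, or path
    target => "doc"
  }
}
```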
```json
{
  "_index": "logstash-2015.10.20",
  "_type": "3alogic_log",
  "_id": "xfj3ngd5S3iH2YABjyU6EA",
  "_version": 1,
  "found": true,
  "_source": {
    "@version": "1",
    "@timestamp": "2015-10-20T11:36:24.503Z",
    "type": "3alogic_log",
    "host": "server114",
    "path": "/usr1/app/log/mysql_3alogic_log.log",
    "doc": {
      "id": 633796,
      "identity": "13413602120",
      "type": "EAP_TYPE_PEAP",
      "apmac": "88-25-93-4E-1F-96",
      "usermac": "00-65-E0-31-62-5D",
      "time": "20151020-193624",
      "apmaccompany": "TP-LINK TECHNOLOGIES CO.,LTD",
      "usermaccompany": ""
    }
  }
}
```
This way the type, host, and path values under _source are not overwritten.
And in Kibana the fields are displayed with names like doc.type, doc.id, and so on.
Addendum: keeping unparseable JSON out of Elasticsearch
```
output {
  stdout { codec => rubydebug }

  # do not index records whose JSON failed to parse
  if "_jsonparsefailure" not in [tags] {
    elasticsearch { host => "localhost" }
  }
}
```
Reposted from: http://blog.csdn.net//jiao_fuyou/article/details/49174269
Since my own project only deals with JSON-string logs, I'll add that while gathering material online I also found some references on handling syslog-type logs and plain printed-string log formats; I'm keeping the links here in case I need them later.
Parsing logs with Logstash's grok patterns, and issues encountered with Kibana:
http://udn.yyuap.com/doc/logstash-best-practice-cn/filter/grok.html
https://github.com/elastic/logstash/blob/v1.1.9/patterns/grok-patterns
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
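As a taste of what those links cover, a minimal grok filter for a syslog-style line might look like the sketch below. The pattern names (%{SYSLOGTIMESTAMP}, %{SYSLOGHOST}, etc.) come from Logstash's stock grok-patterns file linked above; the field names after each colon are my own choices:

```
filter {
  grok {
    # matches lines like: "Oct 20 19:36:24 server114 sshd[1234]: Accepted password for root"
    match => [ "message", "%{SYSLOGTIMESTAMP:ts} %{SYSLOGHOST:logsource} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:msg}" ]
  }
}
```

Unlike the JSON cases above, grok requires writing a pattern that matches your log format exactly; lines that fail to match are tagged with _grokparsefailure, which can be filtered in the output in the same way as _jsonparsefailure.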