Contents
- Syntax
- Test data
- Configuration options
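The mutate plugin performs transformations on fields, including renaming, removing, replacing, and modifying them. It is one of the most commonly used plugins.
For example:
- You have used a Grok expression to split a Tomcat log line into fields and want to convert the status code, byte size, or response time to integers;
- You have used a regular expression to split log lines into fields, but the field values mix upper and lower case, which is clearly not much help for Elasticsearch full-text search, so you can use this plugin to convert the field contents to lowercase.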
This article has moved to: http://www.bdata-cap.com/newsinfo/1712678.html
Syntax
All options must be wrapped in a mutate block, as shown below:
mutate {}
The available configuration options are listed in the table below:
Setting | Input type | Required | Default value |
add_field | hash | No | {} |
add_tag | array | No | [] |
convert | hash | No | |
gsub | array | No | |
join | hash | No | |
lowercase | array | No | |
merge | hash | No | |
periodic_flush | boolean | No | false |
remove_field | array | No | [] |
remove_tag | array | No | [] |
rename | hash | No | |
replace | hash | No | |
split | hash | No | |
strip | array | No | |
update | hash | No | |
uppercase | array | No | |
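Of these, add_field, remove_field, add_tag, and remove_tag are available in every Logstash filter plugin. They take effect only after the filter has been applied successfully. Although Logstash calls these plugins filters, they do far more than filtering.
Tags are markers: when you process a field and expect to do further processing later, you tag the event first. Each event carries a tags array that holds the tags added along the way, whether by Logstash itself or by you. For example, if you parse a log line with grok and parsing fails, Logstash adds a _grokparsefailure tag on its own. In the output stage you can then skip the events that failed to parse.
Fields, by contrast, are what you operate on when, for example, you want to create a new field from existing ones. More on this later.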
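The tag-based handling described above might be sketched as follows. This is a minimal sketch, assuming the events pass through a grok filter and are printed to stdout as in the rest of this article; the conditional simply skips every event that carries the _grokparsefailure tag.

```
output {
  # Only print events that grok parsed successfully
  if "_grokparsefailure" not in [tags] {
    stdout {
      codec => rubydebug
    }
  }
}
```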
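You will also notice that every option in the table above is either a verb or a verb-object phrase. As you may have guessed, each option presumably maps to a Ruby function, and whatever follows "=>" is its list of arguments (you would probably do the same if you wrote the program). The first argument is the field the function acts on; the arguments from the second one onward are the actual parameters.
What is a field? Suppose you parse a Tomcat access log: after a line is split, the client IP, byte size, response time, and so on are each stored in a named variable. That variable is a field.
The options are described one by one below.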
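To make the option syntax concrete, here is a small sketch of the two value shapes used throughout this article. The target name client_ip is only an illustrative assumption; the other field names come from the Grok expression used later on.

```
filter {
  mutate {
    # Hash-valued option: the key is the field, the value is the argument
    rename => { "clientip" => "client_ip" }
    # Array-valued option: a plain list of field names
    lowercase => ["http_method"]
  }
}
```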
Test data
Assume the following Tomcat access log:
192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET "/goLogin" "" 8080 200 1692 23 "http://10.1.8.193:8080/goMain" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"
192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET "/js/common/jquery-1.10.2.min.js" "" 8080 304 - 67 "http://10.1.8.193:8080/goLogin" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"
192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET "/css/common/login.css" "" 8080 304 - 75 "http://10.1.8.193:8080/goLogin" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"
192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET "/js/system/login.js" "" 8080 304 - 53 "http://10.1.8.193:8080/goLogin" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"
It was produced by the following Tomcat configuration:
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
prefix="localhost_access_log." suffix=".txt"
pattern="%h %l %u %t %m &quot;%U&quot; &quot;%q&quot; %p %s %b %D &quot;%{Referer}i&quot; &quot;%{User-Agent}i&quot;" />
If the log is parsed with the following Grok expression:
%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}
the result is:
{
"message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
"@version" => "1",
"@timestamp" => "2016-05-17T08:26:07.794Z",
"host" => "vcyber",
"clientip" => "192.168.6.25",
"identd" => "-",
"auth" => "-",
"timestamp" => "24/Apr/2016:01:25:53 +0800",
"http_method" => "GET",
"request" => "\"/goLogin\"",
"request_query" => "\"\"",
"port" => "8080",
"statusCode" => "200",
"bytes" => "1692",
"reqTime" => "23",
"referer" => "\"http://10.1.8.193:8080/goMain\"",
"userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
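Note the data types of the fields after the split. The port, statusCode, bytes, and reqTime fields should be (or at least would best be) numbers, but for now they are left as strings; converting them is covered later. The examples below all build on this setup.
Configuration options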
add_field
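- The value is a hash, that is, key-value pairs, for example add_field => {"field1"=>"value1" "field2"=>"value2"}. Entries are separated by whitespace, not commas.
- The default value is an empty hash, {}.
Adds new fields.
Example: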
input {
stdin {
}
}
filter {
grok {
match=>["message","%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}"]
}
mutate {
add_field=>{
"SayHi"=>"Hello , %{clientip}"
}
}
}
output{
stdout{
codec=>rubydebug
}
}
Note the mutate block. Parsing the Tomcat access log shown earlier with this configuration produces the following result:
{
"message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
"@version" => "1",
"@timestamp" => "2016-05-17T04:52:02.031Z",
"host" => "vcyber",
"clientip" => "192.168.6.25",
"identd" => "-",
"auth" => "-",
"timestamp" => "24/Apr/2016:01:25:53 +0800",
"http_method" => "GET",
"request" => "\"/goLogin\"",
"request_query" => "\"\"",
"port" => "8080",
"statusCode" => "200",
"bytes" => "1692",
"reqTime" => "23",
"referer" => "\"http://10.1.8.193:8080/goMain\"",
"userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
"SayHi" => "Hello , 192.168.6.25"
}
There is now an additional SayHi field. Its name is hard-coded here, but it can also be dynamic. If you change
"SayHi"=>"Hello , %{clientip}"
to:
"another_%{clientip}"=>"Hello , %{clientip}"
you get the following result:
{
"message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
"@version" => "1",
"@timestamp" => "2016-05-17T06:38:04.427Z",
"host" => "vcyber",
"clientip" => "192.168.6.25",
"identd" => "-",
"auth" => "-",
"timestamp" => "24/Apr/2016:01:25:53 +0800",
"http_method" => "GET",
"request" => "\"/goLogin\"",
"request_query" => "\"\"",
"port" => "8080",
"statusCode" => "200",
"bytes" => "1692",
"reqTime" => "23",
"referer" => "\"http://10.1.8.193:8080/goMain\"",
"userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
"another_192.168.6.25" => "Hello , 192.168.6.25"
}
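Although this example is somewhat contrived, it shows that the values of existing fields can be used to build both the name and the value of a new field. The example above adds a single field; you can also add several at once: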
add_field=>{
"another_%{clientip}"=>"Hello , %{clientip}"
"another_%{http_method}"=>"Hello, %{http_method}"
}
add_tag
- The value is an array
- The default value is an empty array, []
Adds new tags.
Example:
mutate {
add_tag=>[
"foo_%{clientip}"
]
}
This produces the following result:
{
"message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
"@version" => "1",
"@timestamp" => "2016-05-17T06:48:43.278Z",
"host" => "vcyber",
"clientip" => "192.168.6.25",
"identd" => "-",
"auth" => "-",
"timestamp" => "24/Apr/2016:01:25:53 +0800",
"http_method" => "GET",
"request" => "\"/goLogin\"",
"request_query" => "\"\"",
"port" => "8080",
"statusCode" => "200",
"bytes" => "1692",
"reqTime" => "23",
"referer" => "\"http://10.1.8.193:8080/goMain\"",
"userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
"tags" => [
[0] "foo_192.168.6.25"
]
}
As with add_field, several tags can be added at once.
Note that add_tag takes an array [], not a hash {}.
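Adding several tags at once might look like the sketch below. The tag names tomcat_access and method_ are illustrative assumptions; http_method is the field extracted by the Grok expression above.

```
filter {
  mutate {
    # One static tag and one tag built from an existing field
    add_tag => ["tomcat_access", "method_%{http_method}"]
  }
}
```

For the first test line, the tags array should then contain tomcat_access and method_GET.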
convert
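- The value is a hash
- No default value
Converts a field's data type.
When converting to boolean, the accepted values are:
- true, t, yes, y, and 1
- false, f, no, n, and 0
Fields can also be converted to integer, float, and string.
Example: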
mutate {
#convert=>["reqTime","integer","statusCode","integer","bytes","integer"]
convert=>{"port"=>"integer"}
}
convert can be written in two ways: as an array, where the elements are read in pairs, or as a hash. The result is:
{
"message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
"@version" => "1",
"@timestamp" => "2016-05-17T09:06:25.360Z",
"host" => "vcyber",
"clientip" => "192.168.6.25",
"identd" => "-",
"auth" => "-",
"timestamp" => "24/Apr/2016:01:25:53 +0800",
"http_method" => "GET",
"request" => "\"/goLogin\"",
"request_query" => "\"\"",
"port" => 8080,
"statusCode" => "200",
"bytes" => "1692",
"reqTime" => "23",
"referer" => "\"http://10.1.8.193:8080/goMain\"",
"userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
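Note:
- The port field no longer has double quotes around its value.
- The value types of mutate options are deliberately simple: each is either a hash (key-value pairs) or an array. For example, in convert=>["reqTime","integer","statusCode","integer"] the elements are read in pairs, the first naming the field and the second the target type, with no nested or composite types. The author's intent seems to be simplicity: richer data types may look convenient, but they come at a cost. Simple is fine as long as the convention is agreed on. Many Logstash plugins and their options work this way.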
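As a further illustration of the hash form, the sketch below converts several fields in one convert option, including a boolean. The field is_cached is hypothetical and does not appear in the test data; it stands for any field holding values such as "yes" or "0".

```
filter {
  mutate {
    # statusCode and reqTime come from the Grok expression above;
    # is_cached is a hypothetical field used only to show the boolean type
    convert => {
      "statusCode" => "integer"
      "reqTime" => "float"
      "is_cached" => "boolean"
    }
  }
}
```

The comment sits above the option rather than between the hash entries, which keeps the hash itself to plain key-value pairs.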
gsub
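- The value is an array
- No default value
String substitution. The pattern may be either a regular expression or a plain string. gsub applies only to strings; if the field is not a string, nothing happens and no error is raised.
The value is an array read in groups of three: the field name, the string (or regular expression) to match, and the replacement string.
Example: when parsing Tomcat logs, the byte size of a resource may be "-". It therefore needs to be replaced with 0 before convert turns the field into a number.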
input {
stdin {
}
}
filter {
grok {
match=>["message","%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}"]
}
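mutate {
# Run gsub in its own block: within a single mutate block, convert is applied before gsub
gsub=>["bytes","^-$","0"]
}
mutate {
convert=>["port","integer","reqTime","integer","statusCode","integer","bytes","integer"]
}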
}
output{
stdout{
codec=>rubydebug
}
}
This produces the following result:
{
"message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/js/common/jquery-1.10.2.min.js\" \"\" 8080 304 - 67 \"http://10.1.8.193:8080/goLogin\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
"@version" => "1",
"@timestamp" => "2016-05-17T09:17:21.745Z",
"host" => "vcyber",
"clientip" => "192.168.6.25",
"identd" => "-",
"auth" => "-",
"timestamp" => "24/Apr/2016:01:25:53 +0800",
"http_method" => "GET",
"request" => "\"/js/common/jquery-1.10.2.min.js\"",
"request_query" => "\"\"",
"port" => 8080,
"statusCode" => 304,
"bytes" => 0,
"reqTime" => 67,
"referer" => "\"http://10.1.8.193:8080/goLogin\"",
"userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
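gsub is also handy for the double quotes that the Grok expression leaves around request and referer. The following is a minimal sketch, assuming the fields from the Grok expression above; the pattern is written as a single-quoted string so that the double quote needs no escaping.

```
filter {
  mutate {
    # Groups of three: field name, pattern, replacement
    gsub => [
      "request", '"', "",
      "referer", '"', ""
    ]
  }
}
```

With the first test line, request would be expected to become /goLogin without the surrounding quotes.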
join
- The value is a hash
- No default value
Joins an array into a string with a separator. If the field is not an array, nothing happens.
Example:
filter { mutate { join =>{"fieldname"=>","}}}
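To see join on the test data, one option is to split a field into an array first and then join it with a different separator. This is only a sketch; it uses two mutate blocks so that the order of the two steps is explicit.

```
filter {
  mutate {
    # "192.168.6.25" becomes ["192", "168", "6", "25"]
    split => { "clientip" => "." }
  }
  mutate {
    # Join the array elements back together with "-"
    join => { "clientip" => "-" }
  }
}
```

For the test data, clientip should end up as the string 192-168-6-25.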
lowercase and uppercase
- The value is an array
- No default value
Converts a string to lowercase or uppercase.
Example:
filter {
mutate {
lowercase =>["fieldname"]
}
}
Example:
filter {
mutate {
uppercase =>["fieldname"]
}
}
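Applied to the test data, a lowercase conversion might look like this sketch, which uses the http_method field from the Grok expression above:

```
filter {
  mutate {
    # "GET" becomes "get"
    lowercase => ["http_method"]
  }
}
```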
merge
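- The value is a hash
- No default value
Merges two array or hash fields. The merged result is an array. Three cases apply:
- an array and a string can be merged
- a string and a string can be merged
- an array and a hash cannot be merged
Example: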
mutate {
add_field=>{"arr_clientip"=>"%{clientip}"}
add_field=>{"arrmstr_clientip"=>"%{clientip}"}
add_field=>{"arrmarr_clientip"=>"%{clientip}"}
#merge=>{"merge_clientip"=>"clientip"}
}
mutate {
split=>{"arr_clientip"=>"."}
split=>{"arrmstr_clientip"=>"."}
split=>{"arrmarr_clientip"=>"."}
}
mutate {
merge=>{"arrmstr_clientip"=>"clientip"}
merge=>{"arrmarr_clientip"=>"arr_clientip"}
}
The value of the field after => is merged into the field before it.
This produces the following result:
{
"message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
"@version" => "1",
"@timestamp" => "2016-05-18T02:53:35.671Z",
"host" => "vcyber",
"clientip" => "192.168.6.25",
"identd" => "-",
"auth" => "-",
"timestamp" => "24/Apr/2016:01:25:53 +0800",
"http_method" => "GET",
"request" => "\"/goLogin\"",
"request_query" => "\"\"",
"port" => "8080",
"statusCode" => "200",
"bytes" => "1692",
"reqTime" => "23",
"referer" => "\"http://10.1.8.193:8080/goMain\"",
"userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
"arr_clientip" => [
[0] "192",
[1] "168",
[2] "6",
[3] "25"
],
"arrmstr_clientip" => [
[0] "192",
[1] "168",
[2] "6",
[3] "25",
[4] "192.168.6.25"
],
"arrmarr_clientip" => [
[0] "192",
[1] "168",
[2] "6",
[3] "25",
[4] "192",
[5] "168",
[6] "6",
[7] "25"
]
}
periodic_flush
- The value is a boolean
- The default value is false
Calls the filter's flush method at regular intervals. Optional.
remove_field
- The value is an array
- The default value is an empty array, []
Removes fields.
Example: remove the message field.
mutate {
remove_field=>["message"]
}
This produces the following result:
{
"@version" => "1",
"@timestamp" => "2016-05-18T02:04:16.879Z",
"host" => "vcyber",
"clientip" => "192.168.6.25",
"identd" => "-",
"auth" => "-",
"timestamp" => "24/Apr/2016:01:25:53 +0800",
"http_method" => "GET",
"request" => "\"/goLogin\"",
"request_query" => "\"\"",
"port" => "8080",
"statusCode" => "200",
"bytes" => "1692",
"reqTime" => "23",
"referer" => "\"http://10.1.8.193:8080/goMain\"",
"userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
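The message field is gone. It holds the original log line, so keeping it would mean storing the log twice: once unsplit and once split.
Several fields can of course be removed at once.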
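Removing several fields in one step could be sketched as follows; the choice of identd and auth is only an example, picked because both are "-" in the test data.

```
filter {
  mutate {
    # Drop the raw line and two fields that carry no information here
    remove_field => ["message", "identd", "auth"]
  }
}
```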
remove_tag
- The value is an array
- The default value is an empty array, []
Removes tags.
Example:
filter {
mutate {
remove_tag =>["foo_%{somefield}"]}}
Several tags can also be removed at once:
filter {
mutate {
remove_tag =>["foo_%{somefield}","sad_unwanted_tag"]}}
rename
- The value is a hash
- No default value
Renames one or more fields.
Example:
input {
stdin {
}
}
filter {
grok {
match=>["message","%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}"]
}
mutate {
rename=>{"clientip"=>"host"}
}
}
output{
stdout{
codec=>rubydebug
}
}
This produces the following result:
{
"message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
"@version" => "1",
"@timestamp" => "2016-05-17T09:29:44.018Z",
"host" => "192.168.6.25",
"identd" => "-",
"auth" => "-",
"timestamp" => "24/Apr/2016:01:25:53 +0800",
"http_method" => "GET",
"request" => "\"/goLogin\"",
"request_query" => "\"\"",
"port" => "8080",
"statusCode" => "200",
"bytes" => "1692",
"reqTime" => "23",
"referer" => "\"http://10.1.8.193:8080/goMain\"",
"userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
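In the Grok expression the client IP is called clientip, but mutate can rename it to host. Note that this overwrites the original host value (vcyber), as the result above shows.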
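Since rename accepts a hash, several fields can be renamed in one option. In this sketch the new names status_code and req_time are illustrative assumptions:

```
filter {
  mutate {
    # Old field name on the left, new field name on the right
    rename => {
      "statusCode" => "status_code"
      "reqTime" => "req_time"
    }
  }
}
```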
replace
- The value is a hash
- No default value
Replaces the value of the specified field with a new value.
Example:
input {
stdin {
}
}
filter {
grok {
match=>["message","%{IPORHOST:clientip} %{NOTSPACE:identd} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] %{WORD:http_method} %{NOTSPACE:request} %{NOTSPACE:request_query|-} %{NUMBER:port} %{NUMBER:statusCode} (%{NOTSPACE:bytes}|-) %{NUMBER:reqTime} %{QS:referer} %{QS:userAgent}"]
}
mutate {
replace=>{"message"=>"%{clientip}: My new Message."}
}
}
output{
stdout{
codec=>rubydebug
}
}
This produces the following result:
{
"message" => "192.168.6.25: My new Message.",
"@version" => "1",
"@timestamp" => "2016-05-18T01:55:34.566Z",
"host" => "vcyber",
"clientip" => "192.168.6.25",
"identd" => "-",
"auth" => "-",
"timestamp" => "24/Apr/2016:01:25:53 +0800",
"http_method" => "GET",
"request" => "\"/goLogin\"",
"request_query" => "\"\"",
"port" => "8080",
"statusCode" => "200",
"bytes" => "1692",
"reqTime" => "23",
"referer" => "\"http://10.1.8.193:8080/goMain\"",
"userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
The value of the message field has changed.
split
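- The value is a hash
- No default value
Splits a string into an array using a separator character or string. It applies only to strings.
Example: split the client IP into an array on the period character.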
mutate {
split=>{"clientip"=>"."}
}
This produces the following result:
{
"message" => "192.168.6.25 - - [24/Apr/2016:01:25:53 +0800] GET \"/goLogin\" \"\" 8080 200 1692 23 \"http://10.1.8.193:8080/goMain\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\"",
"@version" => "1",
"@timestamp" => "2016-05-18T01:58:40.687Z",
"host" => "vcyber",
"clientip" => [
[0] "192",
[1] "168",
[2] "6",
[3] "25"
],
"identd" => "-",
"auth" => "-",
"timestamp" => "24/Apr/2016:01:25:53 +0800",
"http_method" => "GET",
"request" => "\"/goLogin\"",
"request_query" => "\"\"",
"port" => "8080",
"statusCode" => "200",
"bytes" => "1692",
"reqTime" => "23",
"referer" => "\"http://10.1.8.193:8080/goMain\"",
"userAgent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0\""
}
strip
- The value is an array
- No default value
Removes leading and trailing whitespace from fields.
Example:
filter {
mutate {
strip =>["field1","field2"]}}
update
- The value is a hash
- No default value
Updates an existing field with a new value. If the field does not exist, no action is taken.
Example:
filter { mutate { update =>{"sample"=>"My new message"}}}
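The difference between an existing and a missing field can be sketched on the test data as follows. The name no_such_field is hypothetical and is assumed not to exist in the event.

```
filter {
  mutate {
    # message exists in every event, so its value is overwritten
    update => { "message" => "My new message" }
  }
  mutate {
    # no_such_field is assumed to be absent, so this option should leave the event unchanged
    update => { "no_such_field" => "ignored" }
  }
}
```

This is where update differs from replace: as far as I know, replace sets the field even when it does not exist yet, whereas update only touches fields that are already present.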