07_Flume: regex interceptors in practice


Practice 1: regex filter interceptor

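1. Target scenario

What the regex filter interceptor does:

1) Matches the event body against the regular expression given in the configuration
2) With excludeEvents = true (as configured below), drops the event if the body matches
3) Lets the event through if the body does not match

With excludeEvents = false (the default) the logic is reversed: only matching events are kept.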
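
The keep/drop rule above can be sketched in a few lines of Python. This is only a model of the behaviour described here, not Flume's code: the function name `regex_filter` and the sample bodies are made up for illustration, and the regex is searched for anywhere in the body.

```python
import re

def regex_filter(bodies, pattern, exclude_events=False):
    """Return the bodies that would pass a regex_filter-style interceptor (illustrative model)."""
    regex = re.compile(pattern)
    kept = []
    for body in bodies:
        matched = regex.search(body) is not None
        # exclude_events=True  -> matching events are dropped
        # exclude_events=False -> only matching events are kept
        if matched != exclude_events:
            kept.append(body)
    return kept

# Hypothetical sample bodies; only "1234" consists purely of digits.
bodies = ["1234", "abc", "12ab"]
print(regex_filter(bodies, r"^[0-9]*$", exclude_events=True))   # ['abc', '12ab']
print(regex_filter(bodies, r"^[0-9]*$", exclude_events=False))  # ['1234']
```

Note that `^[0-9]*$` also matches an empty body, so with excludeEvents = true an empty event would be dropped as well.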

 

2. Flume agent configuration file

# 01 define agent name, source/sink/channel 
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# 02 source, http, jsonhandler
a1.sources.r1.type = http
a1.sources.r1.bind = master
a1.sources.r1.port = 6666
a1.sources.r1.handler = org.apache.flume.source.http.JSONHandler

# 03 regex filter interceptor, match event body for filter
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = ^[0-9]*$
# filter matched event
a1.sources.r1.interceptors.i1.excludeEvents = true

# 04 logger sink
a1.sinks.k1.type = logger

# 05 channel, memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# 06 bind source, sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

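3. Verifying the regex filter interceptor

1) Use curl -X POST -d '<JSON data>' to send HTTP requests with different bodies, exactly one of which matches the regex

2) Watch the events printed on the terminal: the event whose body is 1234 was filtered out and does not appear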
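
The requests in step 1 can be sketched as follows. JSONHandler expects a JSON array of events, each with a "headers" object and a "body" string. The host master and port 6666 come from the configuration above; the helper names and the sample bodies other than 1234 are assumptions made for illustration.

```python
import json
import urllib.request

def build_payload(bodies):
    """Encode bodies as the JSON array of events that JSONHandler accepts."""
    return json.dumps([{"headers": {}, "body": body} for body in bodies])

def post_events(bodies, url="http://master:6666"):
    """POST the events to the HTTP source and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_payload(bodies).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status

# Hypothetical test bodies: only "1234" matches ^[0-9]*$ and should be dropped.
sample_bodies = ["1234", "abc", "12ab"]
print(build_payload(sample_bodies))
# With the agent running, send them with: post_events(sample_bodies)
```

The equivalent curl call for a single event would be curl -X POST -d '[{"headers":{},"body":"1234"}]' http://master:6666.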

 

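4. Official documentation for the regex filter interceptor

The Flume User Guide documents this interceptor under "Regex Filtering Interceptor"; its properties are type (regex_filter), regex (default ".*") and excludeEvents (default false).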

 

 

Practice 2: regex extractor interceptor

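1. Target scenario

What the regex extractor interceptor does:
1) Matches the event body against the regular expression given in the configuration
2) If the body matches, pairs each captured group with the key given for it in the configuration file and adds the resulting key:value pairs to the event header
3) Leaves the event body unchanged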
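
A minimal Python sketch of these three steps, assuming the groups are paired with the key names in order. The function name `regex_extract` is hypothetical and this is a model of the described behaviour, not Flume's implementation.

```python
import re

def regex_extract(headers, body, pattern, names):
    """Return (new_headers, body): captured groups are added to the headers, the body is untouched."""
    new_headers = dict(headers)
    match = re.search(pattern, body)
    if match:
        # Pair the first group with the first key name, the second with the second, and so on.
        for name, value in zip(names, match.groups()):
            if value is not None:
                new_headers[name] = value
    return new_headers, body

PATTERN = r"(^[a-zA-Z]*)\s([0-9]*$)"  # raw string, so a single backslash is enough here

print(regex_extract({}, "shayzhang 1234", PATTERN, ["word", "digital"]))
# ({'word': 'shayzhang', 'digital': '1234'}, 'shayzhang 1234')
print(regex_extract({}, "hello", PATTERN, ["word", "digital"]))
# ({}, 'hello')
```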

 

2. Flume agent configuration file

# 01 define agent name, source/sink/channel 
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# 02 source, http, jsonhandler
a1.sources.r1.type = http
a1.sources.r1.bind = master
a1.sources.r1.port = 6666
a1.sources.r1.handler = org.apache.flume.source.http.JSONHandler

# 03 regex extractor interceptor, match event body to extract letters and digits
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
# the regex matches and captures two groups, so a match yields two parts;
# note that the backslash in \s (whitespace) must itself be escaped
a1.sources.r1.interceptors.i1.regex = (^[a-zA-Z]*)\\s([0-9]*$)
# specify key for 2 matched part
a1.sources.r1.interceptors.i1.serializers = s1 s2
# key name
a1.sources.r1.interceptors.i1.serializers.s1.name = word
a1.sources.r1.interceptors.i1.serializers.s2.name = digital

# 04 logger sink
a1.sinks.k1.type = logger

# 05 channel, memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# 06 bind source, sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
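
Why the regex in the configuration above is written with \\s: assuming the file is read with Java's properties format, a backslash is an escape character there, and a backslash before an ordinary character is silently dropped. The sketch below is a simplified, hand-written model of that unescaping (it ignores \uXXXX escapes and line continuations); `unescape_property_value` is a made-up name, not a Flume or Java API.

```python
import re

def unescape_property_value(raw):
    """Simplified model of backslash handling in a Java .properties value."""
    simple = {"t": "\t", "n": "\n", "r": "\r", "f": "\f"}
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Known escapes are translated; for anything else the backslash is dropped.
            out.append(simple.get(nxt, nxt))
            i += 2
            continue
        out.append(ch)
        i += 1
    return "".join(out)

doubled = unescape_property_value(r"(^[a-zA-Z]*)\\s([0-9]*$)")
single = unescape_property_value(r"(^[a-zA-Z]*)\s([0-9]*$)")
print(doubled)  # (^[a-zA-Z]*)\s([0-9]*$)
print(single)   # (^[a-zA-Z]*)s([0-9]*$)
print(re.search(doubled, "shayzhang 1234") is not None)  # True
print(re.search(single, "shayzhang 1234") is not None)   # False
```

With a single backslash the loaded pattern would look for a literal s between the two groups, so the body "shayzhang 1234" would no longer match.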

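3. Verifying the regex extractor interceptor

1) Use curl -X POST -d '<JSON data>' to send an HTTP request whose body is "shayzhang 1234"; the regular expression captures shayzhang and 1234

2) Watch the event the logger prints to the terminal: two entries, word:shayzhang and digital:1234, have been added to the header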

 

