The previous article, "How do dozens of business lines collect and process their logs?", covered Flume's many application scenarios. This article walks through setting up a single-machine log system.
Environment
CentOS 7.0
Java 1.8
Download
Download from the official site: http://flume.apache.org/download.html
The latest version at the time of writing is apache-flume-1.7.0-bin.tar.gz.
After downloading, upload the archive to /usr/local/ on the CentOS machine, extract it there, and rename the extracted directory to flume170, giving /usr/local/flume170.
tar -zxvf apache-flume-1.7.0-bin.tar.gz
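For completeness, a minimal sketch of the full extract-and-rename sequence (assuming the archive was uploaded to /usr/local; apache-flume-1.7.0-bin is the directory name the tarball extracts to):
cd /usr/local
tar -zxvf apache-flume-1.7.0-bin.tar.gz
mv apache-flume-1.7.0-bin flume170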
Installation and configuration
Edit the flume-env.sh configuration file; the main change is setting the JAVA_HOME variable:
JAVA_HOME=/usr/lib/jvm/java8
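Note that a fresh unpack ships only a template for this file, so create it first; a minimal sketch, using the layout above:
cd /usr/local/flume170/conf
cp flume-env.sh.template flume-env.sh
vi flume-env.sh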
Set the Flume environment variables.
Open the profile:
vi /etc/profile
Add:
export FLUME=/usr/local/flume170
export PATH=$PATH:$FLUME/bin
Then make the variables take effect:
source /etc/profile
Verify that the installation succeeded:
flume-ng version
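If the PATH is set up correctly, this prints Flume's version banner, which should report 1.7.0 for this install.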
A small test example
This follows a spool-type example found online.
The spooling directory source watches the configured directory for new files and reads the data out of them. Two caveats:
1) A file copied into the spool directory must not be opened and edited afterwards.
2) The spool directory must not contain subdirectories.
Create the agent configuration file:
# vi /usr/local/flume170/conf/spool.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /usr/local/flume170/logs
a1.sources.r1.fileHeader = true
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
spoolDir sets the directory to monitor: when a file arrives, the source reads its contents and sends them on through the sink; once the file has been fully consumed, it is renamed with a .COMPLETED suffix.
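Because fileHeader = true, each event also carries a file header naming the file it came from; this shows up in the logger output further down.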
Start Flume agent a1:
/usr/local/flume170/bin/flume-ng agent -c . -f /usr/local/flume170/conf/spool.conf -n a1 -Dflume.root.logger=INFO,console
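Here -c names the configuration directory (in this case the directory the command is run from), -f the agent configuration file, and -n the agent to run, which must match the a1 prefix used in spool.conf; the -D flag routes Flume's own logging to the console at INFO level.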
Add a file to the /usr/local/flume170/logs directory:
# echo "spool test1" > /usr/local/flume170/logs/spool_text.log
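Once the source has consumed the file, it is renamed in place; a directory listing should confirm this (output sketch, assuming only this one file was dropped in):
ls /usr/local/flume170/logs
spool_text.log.COMPLETED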
The console should show output along these lines:
14/08/10 11:37:13 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:13 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:14 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /usr/local/flume170/logs/spool_text.log to /usr/local/flume170/logs/spool_text.log.COMPLETED
14/08/10 11:37:14 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:14 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:14 INFO sink.LoggerSink: Event: { headers:{file=/usr/local/flume170/logs/spool_text.log} body: 73 70 6F 6F 6C 20 74 65 73 74 31 spool test1 }
14/08/10 11:37:15 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:15 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:16 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:16 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:17 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
Output like the above means the agent is running. The whole installation is straightforward; most of the work is in the configuration.
A distributed setup additionally requires wiring agents together through matching sources and sinks.
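The standard way to chain Flume agents is to point an Avro sink on each upstream agent at an Avro source on the aggregator. A minimal sketch, assuming a hypothetical aggregator host collector01 listening on port 4545 (channels defined as in spool.conf):
# business-line agent: swap the logger sink for an Avro sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = collector01
a1.sinks.k1.port = 4545
# aggregator agent: receive the upstream events over Avro
a2.sources.r1.type = avro
a2.sources.r1.channels = c1
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 4545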

As the figure above shows, the logs produced by each business line's Flume agent are collected by a second, aggregating Flume agent; the consolidated stream is then sent to Kafka for unified processing and finally stored in HDFS or HBase. The Flume tier in each business line can be load-balanced and run in an active/standby pair, so the topology scales well.
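Flume 1.7 ships a Kafka sink, so the aggregator in this topology can forward directly to Kafka. A minimal sketch, assuming a hypothetical broker kafka01:9092 and a topic named biz-logs:
# aggregator agent: forward the consolidated stream to Kafka
a2.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a2.sinks.k1.channel = c1
a2.sinks.k1.kafka.bootstrap.servers = kafka01:9092
a2.sinks.k1.kafka.topic = biz-logs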