Flume監聽文件目錄sink至hdfs配置


一:flume介紹

        Flume是一個分布式、可靠、和高可用的海量日志聚合的系統,支持在系統中定制各類數據發送方,用於收集數據;同時,Flume提供對數據進行簡單處理,並寫到各種數據接受方(可定制)的能力。,Flume架構分為三個部分 源-Source,接收器-Sink,通道-Channel

 

二:配置文件

    此配置文件source為一個目錄,注意,該目錄下的文件應為只讀,不可寫,且文件名不能相同,采用的channels為file,sink為hdfs,此處往hdfs寫的策略是當時間達到3600s或者文件大小達到128M。

agent1.sources = spooldirSource
agent1.channels = fileChannel
agent1.sinks = hdfsSink

agent1.sources.spooldirSource.type=spooldir
agent1.sources.spooldirSource.spoolDir=/data/lwq/new_log
agent1.sources.spooldirSource.channels=fileChannel

agent1.sinks.hdfsSink.type=hdfs
agent1.sinks.hdfsSink.hdfs.path=hdfs://dev228:8020/raw/lwq/%y-%m-%d
agent1.sinks.hdfsSink.hdfs.filePrefix=lwq
agent1.sinks.sink1.hdfs.round = true
# Number of seconds to wait before rolling current file (0 = never roll based on time interval)
agent1.sinks.hdfsSink.hdfs.rollInterval = 3600
# File size to trigger roll, in bytes (0: never roll based on file size)
agent1.sinks.hdfsSink.hdfs.rollSize = 128000000
agent1.sinks.hdfsSink.hdfs.rollCount = 0
agent1.sinks.hdfsSink.hdfs.batchSize = 1000

#Rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time.
agent1.sinks.hdfsSink.hdfs.roundValue = 1
agent1.sinks.hdfsSink.hdfs.roundUnit = minute
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfsSink.channel=fileChannel
agent1.sinks.hdfsSink.hdfs.fileType = DataStream


agent1.channels.fileChannel.type = file
agent1.channels.fileChannel.checkpointDir=/usr/share/apache-flume-1.5.0-bin/checkpoint
agent1.channels.fileChannel.dataDirs=/usr/share/apache-flume-1.5.0-bin/dataDir

 

三:啟動命令

  

1 ${FLUME_HOME}/bin/flume-ng agent --conf ./conf/ -f conf/flume-site.xml -Dflume.root.logger=DEBUG,console -n agent1 > log.log 2>&1 &

 

     

 

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM