Flume Example: Collecting Data from a Specific Directory into HDFS



1. Prepare the environment

  CentOS 7, JDK 1.7, Hadoop 2.6.1, apache-flume-1.6.0-bin.tar.gz

2. Write the configuration file

  Create the configuration file (named hdfs-logger.conf, as referenced by the run command below) in the /home/flume/conf directory:

# Name the three components
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# Configure the source
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /home/data
agent1.sources.source1.fileHeader = false

# Configure the interceptor
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = host
agent1.sources.source1.interceptors.i1.hostHeader = hostname

# Configure the sink
# Output directories are named by event time (see useLocalTimeStamp below).
# Note: the comment must sit on its own line; a trailing comment after the
# value would be read as part of the path.
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://server1:9000/flume/collection/%y-%m-%d/%H-%M
agent1.sinks.sink1.hdfs.filePrefix = access_log
agent1.sinks.sink1.hdfs.maxOpenFiles = 5000
agent1.sinks.sink1.hdfs.batchSize = 100
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.rollSize = 102400
agent1.sinks.sink1.hdfs.rollCount = 1000000
agent1.sinks.sink1.hdfs.rollInterval = 60
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true


# Configure the channel
agent1.channels.channel1.type = memory
agent1.channels.channel1.keep-alive = 120
agent1.channels.channel1.capacity = 500000
agent1.channels.channel1.transactionCapacity = 600


# Wire the components together
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
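The %y-%m-%d/%H-%M escapes in hdfs.path are expanded from the event timestamp, which here comes from the agent's local clock because useLocalTimeStamp = true. As a minimal sketch of the resulting directory layout, the same strftime-style codes can be fed to date(1); server1 is the NameNode host used in the config above:

```shell
# Mimic Flume's time-escape expansion with date(1); the codes
# %y-%m-%d/%H-%M match the ones used in hdfs.path above.
subdir=$(date +%y-%m-%d/%H-%M)
# An event arriving right now would land under this HDFS directory:
echo "hdfs://server1:9000/flume/collection/${subdir}"
```

So events collected at different minutes end up in different HDFS subdirectories, which is why the sink also needs roll settings to close files within each time bucket.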

  Create the data directory under /home (this is the spoolDir set in the source configuration).

3. Run the agent

  Run the following command from the /home/flume directory:

 bin/flume-ng agent -c conf -f conf/hdfs-logger.conf -n agent1 -Dflume.root.logger=INFO,console

  Once the agent has started successfully, drop .txt files into the data directory; the spooling directory source ingests each new file and renames it with a .COMPLETED suffix when done.
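The spooling directory source treats any file it sees as complete and unchanging, so a file should be built elsewhere and then moved in (a rename is atomic within the same filesystem). A sketch of that pattern follows; mktemp -d stands in for /home/data and the staging area so the sketch runs anywhere, and the access_*.txt name and sample line are illustrative only:

```shell
# Stage a log file and move the finished file into the spool directory.
SPOOL_DIR=$(mktemp -d)            # /home/data on the example machine
STAGE_DIR=$(mktemp -d)            # staging area on the same filesystem
echo "sample log line" > "$STAGE_DIR/access.txt"
# mv, not cp: the source must never see a half-written file.
mv "$STAGE_DIR/access.txt" "$SPOOL_DIR/access_$(date +%s).txt"
ls "$SPOOL_DIR"
```

Copying a file directly into the spool directory risks Flume reading it mid-write, which makes the source raise an error and stop processing that file.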

4. Check the results

  In HDFS, inspect the collected files under the flume directory, for example with hdfs dfs -ls -R /flume/collection.

5. Troubleshooting

  If writes to HDFS fail because the NameNode is in safe mode, you may see:

Resources are low on NN. Please add or free up more resources then turn off safe mode manually.
NOTE:  If you turn off safe mode before adding resources, the NN will immediately return to safe mode. 
Use "hdfs dfsadmin -safemode leave" to turn safe mode off.

  As the message itself suggests, turn safe mode off from the Hadoop installation directory (the older bin/hadoop dfsadmin spelling still works but is deprecated in favor of hdfs dfsadmin):

 bin/hdfs dfsadmin -safemode leave

  Note that if the NameNode really is low on resources, you must free up disk space first; otherwise it will immediately re-enter safe mode, as the warning states.

  

 

